Cancer Res. Author manuscript; available in PMC May 1, 2012.
Published in final edited form as:
PMCID: PMC3085580
NIHMSID: NIHMS279021
Large Scale Exploration of Gene-Gene Interactions in Prostate Cancer Using a Multi-stage Genome-wide Association Study
Julia Ciampa,1 Meredith Yeager,2 Laufey Amundadottir,2 Kevin Jacobs,2 Peter Kraft,3 Charles Chung,2 Sholom Wacholder,1 Kai Yu,1 William Wheeler,1 Michael J. Thun,4 W. Ryan Divers,4 Susan Gapstur,4 Demetrius Albanes,1 Jarmo Virtamo,5 Stephanie Weinstein,1 Edward Giovannucci,6 Walter C. Willett,6 Geraldine Cancel-Tassin,7 Olivier Cussenot,7 Antoine Valeri,7 David Hunter,3 Robert Hoover,1 Gilles Thomas,1 Stephen Chanock,2 and Nilanjan Chatterjee1*
1 Epidemiology and Biostatistics Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20892, USA
2 Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, NCI-Frederick, Frederick, MD 21702, USA
3 Program in Molecular and Genetic Epidemiology, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
4 Department of Epidemiology and Surveillance Research, American Cancer Society, Atlanta, GA 30329
5 Department of Health Promotion and Chronic Disease Prevention, National Public Health Institute, Helsinki, Finland
6 Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts
7 CeRePP Hopital Tenon, Assistance Publique-Hôpitaux de Paris, Paris, France
* Corresponding author: chattern/at/mail.nih.gov; phone, 301-402-7933; fax, 301-402-0081
Recent genome-wide association studies have identified independent susceptibility loci for prostate cancer (CaP) that could influence risk through interaction with other, possibly undetected, susceptibility loci. We explored evidence of interaction between pairs of 13 known susceptibility loci and single nucleotide polymorphisms (SNPs) across the genome to generate hypotheses about the functionality of CaP susceptibility regions. We used data from Cancer Genetic Markers of Susceptibility: Stage I included 523,841 SNPs in 1175 cases and 1100 controls; Stage II included 27,383 SNPs in an additional 3941 cases and 3964 controls. Power calculations assessed the magnitude of interactions our study is likely to detect. Logistic regression was used with alternative methods that exploit constraints of gene-gene independence between unlinked loci to increase power. Our empirical evaluation demonstrated that an empirical Bayes (EB) technique is powerful and robust to possible violation of the independence assumption. Our EB analysis identified several noteworthy interacting SNP pairs, although none reached genome-wide significance. We highlight a Stage II interaction between the major CaP susceptibility locus in the subregion of 8q24 that contains POU5F1B and an intronic SNP in the transcription factor EPAS1, which has potentially important functional implications for 8q24. Another noteworthy result involves interaction of a known CaP susceptibility marker near the prostate protease genes KLK2 and KLK3 with an intronic SNP in PRRX2. Overall, the interactions we have identified merit follow-up study, particularly the EPAS1 interaction, which has implications not only in CaP but also in other epithelial cancers that are associated with the 8q24 locus.
Recent genome-wide association studies (GWAS) have identified multiple single nucleotide polymorphisms (SNPs) associated with risk of prostate cancer (CaP) (1–8). Functional variants in linkage disequilibrium (LD) with the established markers may directly contribute to CaP risk through interactions with other, yet undetected, susceptibility loci. Large datasets are required for exploration of interactions between known risk alleles and the remainder of the genome. Such large scale studies could give rise to the discovery of novel susceptibility regions and a better understanding of the biology of CaP. In this report, we present the first results from a study of gene-gene interactions in the etiology of CaP using data from Stages I and II of the Cancer Genetic Markers of Susceptibility (CGEMS) Initiative.
To identify SNPs that may interact with established susceptibility regions to affect CaP risk, we conducted a series of conditional genome scans. The susceptibility (conditioning) regions included nine gene regions and four independent regions within the chromosomal region 8q24. All have demonstrated strong associations with CaP in recent GWAS, including but not limited to CGEMS (Table 1).
Table 1
Summary of thirteen conditioning regions individually studied in conditional genome scans.
The region of 8q24 warrants close attention because it contains independent risk markers for prostate and additional cancers (Figure 1). Despite its strong associations with multiple cancers, the physiologic function of 8q24 remains an area of investigation. Molecular studies have focused on Region 3 of 8q24, which is associated with several epithelial cancers, including prostate and colorectal. The most significant marker in that region, rs6983267, is part of a consensus binding sequence for TCF (9), a family of transcription factors that are nuclear targets of WNT signaling. The risk allele has been shown to participate in long-range regulation of the WNT-targeted oncogene MYC that is telomeric to the regions of 8q24 associated with multiple cancers (10–12). To a lesser degree than MYC, POU5F1B (also called POU5F1P1) has drawn attention as a plausible candidate gene to explain the underlying biology of the association signals (13–16). It is the only confirmed gene within the extended 8q24 region, located specifically in Region 3. Until recently POU5F1B was classified as a highly homologous (15) pseudogene of POU5F1 (also called OCT4 or OCT3), which is a central gene in the regulation of stem cell pluripotency (17) (Figure 2a). Recent reports on POU5F1B have demonstrated that it produces a protein with similar function to POU5F1 (14) and that it is over-expressed in CaP (13).
Figure 1
Linkage disequilibrium and cancer susceptibility pattern for 8q24 region
Figure 2
Simplified schematics of the pluripotency network with emphasis on POU5F1
We applied three methods that are available for exploring gene-gene interactions in case-control studies. Traditionally, logistic regression has been the most popular method for analysis of case-control data. In recent years, a number of reports have noted that the power for exploring gene-gene interactions from case-control studies can be greatly enhanced by alternative methods that exploit the assumption of gene-gene independence between distant loci (18, 19). These methods, however, can be very sensitive to violation of the underlying gene-gene independence assumption (20). Our analysis provides the first empirical assessment of the performance of these alternative methods for large scale exploration of gene-gene interactions in a GWAS setting.
CGEMS Stage I
The details of CGEMS Stage I have been published previously (6). Briefly, the subjects available for this study included 1175 cases and 1100 controls of European ancestry from the Prostate, Lung, Colon and Ovarian Screening Trial. They were genotyped using two Illumina chips (HumanHap300 and HumanHap240) that together comprised 523,841 autosomal SNPs.
CGEMS Stage II
The details of CGEMS Stage II have been published previously (5). Briefly, the subjects included an additional 3941 cases and 3964 controls of European ancestry. They represent four studies (case/control): Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study in Finland (929/921), Health Professionals Follow-up Study in America (596/611), American Cancer Society Cancer Prevention Study II Nutrition Cohort (1760/1775), and CeRePP French Prostate Case-Control Study (656/657). Subjects were genotyped on 27,383 autosomal SNPs that showed evidence of association in single-SNP Stage I analyses (p<0.05).
We performed a series of conditional genome scans, using data from CGEMS Stage I or Stage II. We chose to study the 13 regions that have already been shown to be strongly associated with risk of CaP in CGEMS or other recent GWAS (Table 1). For each region except EEFSEC, we used the SNP from the original publication. For EEFSEC, we chose the most significant SNP within the gene in the combined CGEMS Stage I and II analysis.
In each scan, we “conditioned” on the most notable SNP from each of 13 well-established susceptibility regions and tested for interaction of the conditioning SNP with all the remaining “scan” SNPs across the genome. Tests for interaction for each scan SNP used a logistic regression model that included the main effect of the conditioning SNP, the main effect of a scan SNP, an interaction term that captured any non-multiplicative effect between the two markers, and adjusting covariates for study and DNA extraction. We assumed alleles within a locus affect risk of the disease in an additive fashion (on the logistic scale) and, thus, coded genotype data for each SNP as a continuous variable and the interaction as the product of the allele counts. In addition to performing standard tests for interactions, we also conducted “omnibus” (21) tests that simultaneously evaluated the significance of the main- and interaction- effect of the scan SNP.
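The model described above can be written out explicitly. The notation is introduced here for illustration and is not taken from the original text: $G_c$ and $G_s$ denote the allele counts (0, 1 or 2) at the conditioning and scan SNPs, $D$ is case status, and $Z$ collects the adjusting covariates (study and DNA extraction).

```latex
\operatorname{logit} P(D = 1 \mid G_c, G_s, Z)
  = \beta_0 + \beta_c G_c + \beta_s G_s + \beta_{cs}\, G_c G_s + \gamma^{\top} Z
```

In this notation the standard interaction test evaluates $H_0\colon \beta_{cs} = 0$, the omnibus test evaluates $H_0\colon \beta_s = \beta_{cs} = 0$, and $\exp(\beta_{cs})$ is the multiplicative interaction odds ratio per unit of the allele-count product.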
For statistical inference under the logistic model, we used three alternative methods: (a) Unconstrained maximum-likelihood (UML), (b) Constrained maximum-likelihood (CML) and (c) Empirical Bayes (EB). The UML method corresponds to the traditional prospective logistic regression analysis that obtains maximum-likelihood estimates of the odds ratio parameters with no constraint on the joint distribution of scan and conditioning SNPs in the underlying population. The CML method obtains the maximum-likelihood estimates of the same odds ratio coefficients assuming gene-gene independence in the underlying population between the scan and conditioning SNPs (22). While the advantage of the UML method is that its validity does not require any assumption of joint genotype distribution in the underlying population, the CML and analogous methods are known to be much more powerful when the underlying assumptions of independence are valid (19, 23). For the CML analysis, we excluded scan SNPs within 500Kb of the conditioning SNP to minimize gene-gene dependence due to physical proximity. For assessment and estimation of odds ratio interactions, the CML method essentially produces inferences very similar to the popular case-only method (18, 19). Unlike the case-only approach, however, the CML method yields both interaction coefficients and main effects, which are needed for performing omnibus tests as well as for contextual interpretation of interaction.
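To make the case-only idea concrete, the following is a minimal sketch rather than the method used in this study. It assumes a simplified setting (binary carrier status at each SNP instead of additive allele counts, no covariates, gene-gene independence in the source population and a rare disease), in which the interaction odds ratio is estimated by the SNP-SNP odds ratio among cases alone. The function name and the counts are hypothetical.

```python
import math

def case_only_interaction(n11, n10, n01, n00):
    """Case-only estimate of a multiplicative interaction odds ratio.

    Arguments are counts of CASES cross-classified by carrier status:
    n11 = carrier at both SNPs, n10 = conditioning SNP only,
    n01 = scan SNP only, n00 = carrier at neither.
    Valid only under gene-gene independence in the source population.
    """
    counts = (n11, n10, n01, n00)
    if any(n <= 0 for n in counts):
        raise ValueError("all four cell counts must be positive")
    # Cross-product odds ratio of the 2x2 table among cases.
    or_int = (n11 * n00) / (n10 * n01)
    # Large-sample (Woolf) standard error of the log odds ratio.
    se_log = math.sqrt(sum(1.0 / n for n in counts))
    # Wald statistic for H0: no multiplicative interaction.
    z = math.log(or_int) / se_log
    return or_int, se_log, z

# Hypothetical counts, for illustration only.
or_int, se_log, z = case_only_interaction(30, 20, 20, 30)
```

Because controls do not enter the estimate, any SNP-SNP correlation in the source population (for example from population stratification) is read directly as interaction, which is the sensitivity to the independence assumption discussed in the text.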
The third approach we considered involved the recently proposed EB method which exploits the gene-gene or gene-environment independence in a more data-adaptive fashion so that bias can be avoided when the assumptions of independence are violated in the underlying population (24). The method obtains parameter estimates by taking weighted averages of those from the UML and CML methods where the weights depend on two key quantities: (a) the bias of the CML method, which can be estimated by the difference between its estimates and those obtained from the UML method, and (b) the variance of the parameter estimates from the UML method. As the magnitude of the bias of the CML method increases, the EB method puts more weight on the UML method. Previous simulation studies have suggested that the EB method can strike a good balance between bias and efficiency in large scale studies where the assumptions of independence are likely to be satisfied for most, but not all, combinations of gene-gene and gene-environment factors under study (25). The parameter estimates and standard errors for logistic regression coefficients were obtained in the R package called CaseControl.Genetics (26) that implements all three methods.
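The weighting idea can be sketched for a single coefficient as follows. This is an illustrative scalar form, assumed here for exposition: the weight on the CML estimate is taken to be the UML variance divided by the sum of that variance and the squared UML-CML difference. The published estimator and the R package operate on parameter vectors with full covariance matrices, and the function name below is hypothetical.

```python
def eb_shrinkage(beta_uml, var_uml, beta_cml):
    """Scalar sketch of an empirical-Bayes-type average of UML and CML.

    beta_uml : unconstrained ML estimate of the coefficient
    var_uml  : estimated variance of beta_uml (must be positive)
    beta_cml : ML estimate under the gene-gene independence constraint
    """
    if var_uml <= 0:
        raise ValueError("var_uml must be positive")
    # Estimated bias of the constrained estimate.
    bias = beta_uml - beta_cml
    # Weight on CML shrinks toward 0 as the estimated bias grows.
    w_cml = var_uml / (var_uml + bias ** 2)
    return w_cml * beta_cml + (1.0 - w_cml) * beta_uml

# Hypothetical inputs: close agreement keeps the estimate near CML,
# strong disagreement pulls it back toward UML.
close = eb_shrinkage(beta_uml=0.12, var_uml=0.04, beta_cml=0.10)
far = eb_shrinkage(beta_uml=1.00, var_uml=0.01, beta_cml=0.00)
```

When the two estimates agree, nearly all weight goes to the more efficient CML estimate; when they disagree by much more than the UML standard error, the result reverts to UML, which is the behavior described in the paragraph above.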
For each conditional region, we ran two separate scans, one for all 523,841 Stage I SNPs and the other for the subset of 27,383 SNPs that were available in both Stages I and II. We performed the scan for Stage II SNPs using only the Stage II data to make the analysis independent of the selection effect from Stage I. For each scan, we considered Bonferroni adjustment for the appropriate number of SNPs to declare genome-wide significance at p-value<0.05 level. To give higher priority for potential “cis” effects, we separately examined the significance of interaction for scan SNPs within 500Kb of the conditioning gene(s) by adjusting for multiple testing only within that region, using the EB method.
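The Bonferroni thresholds implied by the SNP counts above can be computed directly. This is a minimal sketch; the function name is hypothetical, and the SNP counts are the nominal ones quoted in the text, so the result may differ slightly from thresholds based on the number of SNPs actually tested in a given scan.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise
    error rate at `alpha` across `n_tests` tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Nominal SNP counts from the two scans described above.
stage1_alpha = bonferroni_threshold(0.05, 523_841)
stage2_alpha = bonferroni_threshold(0.05, 27_383)
print(f"Stage I: {stage1_alpha:.2e}  Stage II: {stage2_alpha:.2e}")
```

The two thresholds are on the order of 1e-7 and 1e-6 respectively, which shows how much less stringent the per-SNP criterion is in the smaller Stage II scan.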
As we aimed to generate hypotheses about the functionality of susceptibility regions in the etiology of CaP, we conducted literature reviews and bioinformatics searches to investigate potential biological mechanisms that could underlie the interactions we observed. For SNPs in or near gene regions, we considered whether those genes were known or thought to participate in processes that relate to cancer or to the function of the relevant conditioning region. This work can help prioritize interacting SNP pairs for follow-up study. It guided our follow-up analysis on the top interaction result for 8q24 Region 3. Using Stage II data, we jointly modeled the top SNPs in ESRRB and SALL4 for interaction with 8q24 Region 3 in a single logistic regression that we analyzed using UML. We focused on the transcription factors ESRRB and SALL4 because, like the top-ranking EPAS1 (also known as HIF-2A), they are positive regulators of POU5F1 (27, 28) (Figure 2). They differ from EPAS1 in that they do not require hypoxic stimuli and that they are also target genes of POU5F1 (29). We examined the top SNP from each gene based on Stage I main effects results (ESRRB p-value=0.001, SALL4 p-value=0.28). Both SNPs are located in an intron of their respective gene. rs7155416 in ESRRB on chromosome 14q24 has a minor allele frequency (MAF) of 0.12 and rs6021460 in SALL4 on chromosome 20q13 has a MAF of 0.44.
Inspection of the quantile-quantile plots for omnibus and interaction tests for all conditional genome scans suggests that, in general, none of the methods was affected by large scale systematic bias or overdispersion. However, in some instances, the CML method showed more statistically significant associations than would be expected by chance (see, for example, Figure 3) even when we excluded all scan SNPs on the same chromosome as the conditioning SNP. In contrast, the UML and data-adaptive EB methods did not show any evidence of such excess significance. A possible explanation for this phenomenon is population stratification that could cause long range dependence, violating the underlying assumption of gene-gene independence of the CML method (30). In the subsequent sections, we report the main findings of our conditional scans based on the EB method, which is known to be more powerful than standard logistic regression analysis and yet, unlike CML type methods, is resistant to bias from violation of the gene-gene independence assumption, whether due to population stratification or otherwise.
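For readers who want to reproduce this kind of diagnostic, the coordinates of a quantile-quantile plot can be computed as in the sketch below. It assumes the common convention of comparing sorted observed p-values with the expected uniform quantiles i/(m+1) on the -log10 scale; the plotting convention used for Figure 3 is not stated in the text, and the function name is hypothetical.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values.

    Under the global null, p-values are uniform on (0, 1), so the
    i-th smallest of m is expected to be near i / (m + 1).
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(p_values)  # smallest p-value first
    points = []
    for i, p in enumerate(observed, start=1):
        expected = i / (m + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points

# Hypothetical p-values lying exactly on the null expectation.
pts = qq_points([0.5, 0.25, 0.75])
```

Points that lie above the diagonal at the upper end of the plot indicate more small p-values than expected by chance, which is the pattern reported for the CML method.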
Figure 3
Quantile-quantile plots for interaction p-values from genome scan conditional on one susceptibility SNP near MSMB
In the Stage I analysis, one SNP, rs2002865, reached genome-wide significance for interaction after Bonferroni correction (p-value=9.14e-10) in the conditional scan of 8q24 Region 4. A second SNP, rs4960563, in strong LD with rs2002865 (r2=0.68), was also highly significant for interaction with the same 8q24 conditioning region (p-value=3.79e-6). Both interactions, however, failed to replicate (p-value>0.05) when the SNPs were followed up with further genotyping in a subset of the Stage II sample. Within the extended JAZF1 conditioning region, one SNP met “region-wide” significance for potential cis-interaction: rs4857841 on chromosome 7p15. It also failed to replicate in follow-up.
In the Stage II conditional scans, no SNP reached genome-wide significance for interaction. A list of top-ranking SNPs (p-value < 1.0e-4) from each conditional scan is shown in Table 2. The most notable finding considering biologic plausibility was an interaction between rs6983267 in 8q24 Region 3 which contains POU5F1B and rs4953347 in the first intron of EPAS1 which upregulates POU5F1 (p-value=9.69e-5, multiplicative odds ratio=1.13). In a follow-up analysis we detected a significant interaction between 8q24 Region 3 and both ESRRB and SALL4 (Table 3), which have functional similarities to EPAS1 (Figure 2). Also of note is the top-ranking SNP for interaction with the KLK2-KLK3 conditioning region: rs1558874 intronic to PRRX2 (p-value=4.80e-5, multiplicative odds ratio=1.33). Those SNPs demonstrated evidence of an interaction in the same direction and of similar magnitude in the independent CGEMS Stage I data (p-value=0.047, multiplicative odds ratio=1.27).
Table 2
Results for top SNPs in CGEMS Stage II grouped by conditional scans and ranked by interaction p-values (<1.0e-4).
Table 3
Results from multivariate logistic regression analysis of SNPs in 8q24 Region 3, ESRRB and SALL4.
We highlight two region-wide significant results from our investigation of potential “cis” interactions. The first involves rs4314620 in the extended 8q24 region. It showed significant interaction in the conditional scans for Region 2 (p-value=0.003), Region 3 (p-value=0.0004) and Region 4 (p-value=0.02) (Figure 1). An omnibus test for rs4314620 that includes its main effect and interaction with each of the 8q24 conditioning SNPs was highly significant (p-value=2.48e-5). The second result involves rs17714461 for the KLK2-KLK3 region (p-value=7.14e-4). That SNP resides in chromosome 19q13, located ~15Kb from KLK4. It is ~60Kb from the conditioning SNP with which it does not demonstrate LD (r2<0.001).
Our study presents one of the first large scale explorations of gene-gene interactions in the setting of a multi-stage GWAS. Our analysis identifies a list (Table 2) of pair-wise SNP interactions that, through follow-up study, may elucidate the functional relevance of CaP susceptibility SNPs. Our analysis also provides insights into future methodological challenges that large scale studies will face in establishing conclusive interaction with either the primary or a secondary trait. It presents an empirical evaluation of modern methods for interaction analyses using case-control data.
The results we report for the conditional scans were obtained using a recently proposed EB method. We focused on that method because previous simulations demonstrated it is more powerful than standard logistic regression (25) and our empirical evaluation suggests it is more robust than case-only type methods. Our observation that the CML method can suffer bias even when scan and conditioning SNPs are on different chromosomes is particularly cautionary. The robustness of the EB method is expected to be similarly beneficial in studies of gene-environment interactions for which it can be difficult to assess an independence assumption.
Perhaps the most noteworthy result of our study is the top SNP for interaction with 8q24 Region 3 in Stage II: rs4953347, which is intronic to EPAS1. That gene belongs to a family of hypoxia-inducible factors that promote key carcinogenic processes such as angiogenesis and metastasis (31). Under hypoxic conditions, which are common in malignant tumors, EPAS1 directly binds and activates POU5F1 (32, 33). By activating POU5F1, EPAS1 has been shown to promote tumorigenesis (34). Both EPAS1 and POU5F1B are over-expressed in CaP, but POU5F1 is not expressed in either healthy or malignant prostate tissue (13, 31). Given these data, we propose that POU5F1B mediates the observed EPAS1-8q24 Region 3 interaction. Our hypothesis involves an assumption that EPAS1 participates in the regulation of POU5F1B, which is currently poorly understood.
Kastler et al. suggested the over-expression of POU5F1B in CaP may mimic the ectopic expression of POU5F1 (13), which has been shown to promote epithelial tumors (35). That hypothesis aligns well with reports that CaP progression involves the reactivation of embryonic pathways (36) because POU5F1 is central to the regulation of stem cell pluripotency (17) and its encoded transcription factor is functionally similar to the protein of POU5F1B (37). Specifically, multipotential progenitor cells are thought to be seeds of tumorigenesis in CaP (37) and ectopic POU5F1 expression is thought to promote epithelial tumorigenesis by inhibiting the differentiation of progenitor cells (35). Given these data, we cautiously hypothesize that the CaP association of 8q24 Region 3 involves a type of pluripotency network centered on POU5F1B rather than on POU5F1. Our follow-up analysis of 8q24 Region 3 with ESRRB and SALL4 offers preliminary support of the hypothesis because those genes, which function as both regulators and targets of POU5F1, demonstrate a significant interaction with 8q24 Region 3.
The preceding results should be interpreted cautiously. Future effort is needed to replicate the finding of statistical interaction for EPAS1 and 8q24 Region 3 in independent studies. To obtain conclusive evidence, the sample size for those studies needs to be large because of the modest magnitude of the anticipated interaction. Even if our findings replicate, they cannot provide direct evidence for the proposed model underlying the interaction. Additional functional studies would be needed. We believe these future studies should consider not only CaP but all epithelial cancers associated with 8q24 Region 3 because, in subsets of those cancers, EPAS1 is over-expressed (31), mRNA transcripts of POU5F1B have been detected (15), and embryonic pathways are implicated (38). Notably, all those features characterize colon cancer.
The Stage II analysis produced other notable results, including two for the conditioning region KLK2-KLK3. The region-wide significant result for rs17714461 is noteworthy because its nearby gene, KLK4, has been shown to stimulate cellular proliferation in CaP in conjunction with KLK2 (39), and additional reports have linked KLK4 to various aspects of CaP progression, including mesenchymal transition, invasion and metastasis (40–42). The top SNP for interaction in the KLK2-KLK3 conditional scan was rs1558875, an intronic SNP in PRRX2, a gene that is also associated with cellular proliferation (43). This interaction replicated in our relatively small, independent CGEMS Stage I analysis. These preliminary results suggest that the KLK2-KLK3 susceptibility region contributes to CaP risk through interaction with genes involved in cellular proliferation. Functional follow-up studies and additional replication efforts are warranted.
A second notable finding from our investigation of potential “cis” interactions involved rs4314620 for the extended 8q24 region. Its pair-wise interactions with three independent known risk alleles for CaP within the 8q24 region suggest different 8q24 susceptibility loci may be related by some common underlying biologic mechanism. It is hard to speculate what that mechanism may be because it is a gene-poor region, but one possible explanation for the observed interactions is long-range gene regulation. rs4314620 resides in a sub-region of 8q24 that is associated with bladder cancer (Figure 1) (44) and contains two regulatory regions for the oncogene MYC that flanks 8q24 Region 4 (45).
In our analysis of CGEMS Stage I data, one SNP-pair exceeded genome-wide significance for interaction in the conditional scan for 8q24 Region 4. The result was unlikely to be due to genotyping error as a second SNP in strong LD with the original signal also showed strong significance. Yet, the interaction failed to replicate when we genotyped the SNPs in an additional 2439 cases and 2241 controls in CGEMS Stage II. This example illustrates the challenge of employing rank p-values for prioritization of interaction as well as of establishing the threshold needed for a conclusive finding. We note that Stage I of CGEMS, which included 1175 cases and 1100 controls, was underpowered for study of interactions (Figure 4). In the future, one way of reducing such false positives would be to consider Bayesian methods (46, 47) that can incorporate both power and biological plausibility into measures of statistical significance. Another strategy to gain power, particularly for initial GWAS stages, is meta-analysis of multiple studies, enabling one to increase sample size while retaining the full array of SNPs.
Figure 4
Power curves for detecting interactions in CGEMS at genome wide significance
Due to the scarcity of highly significant findings, we carefully examined the power of CGEMS Stages I and II to detect interactions at genome-wide significance levels (alpha = 1.0e-7 and 1.85e-6 for Stage I and II, respectively; Figure 4). In these calculations, we focused only on quantitative interactions where the effect of one locus can be modified, but not reversed, by the other locus and vice versa. Stage I of CGEMS had virtually no power to detect interaction odds ratios in the scenarios we examined which ranged 1.13–2.05. The larger Stage II, in contrast, had high power for detecting modest to large interaction odds ratio (≥ 1.7) even after accounting for the fact that some power is lost due to selection of the SNPs at Stage I by main effect only. It is notable that under a model of quantitative interaction, a larger interaction odds ratio also corresponds to larger main effects. Thus, the loss of power at Stage I due to the selection by main effect is often small when the interaction odds ratio and MAFs are reasonably large. In contrast, in the presence of qualitative interaction, the main effects of loci could be very weak or even non-existent even when the interaction odds ratio is large. The power for detecting such loci in our analysis is low, as the probability of selecting them for Stage II is low.
We conclude that the susceptibility SNPs we have studied are unlikely to have quantitative interactions of large magnitude with other SNPs in the genome. Theoretical calculations (48), as well as a lack of findings of epistasis for other diseases, also point towards the possibility that large non-multiplicative or non-additive effects may not be abundant in the etiology of complex traits. It is possible that our study has missed qualitative interactions, but the biologic plausibility of the presence of many such extreme types of interaction is questionable. It is also possible that epistasis, if it plays an important role in the etiology of CaP, will have a much more complex form than the pairwise SNP-SNP interactions we studied. Finding such higher order interactions in large scale studies, however, will remain an intrinsically challenging problem because of both the computationally daunting task of exploring all possible multi-locus models and the requirement for extremely large sample sizes that will be necessary to achieve sufficient power while minimizing the chance of false positives.
In the future, detecting evidence of gene-gene interactions through study of statistical interactions between SNP markers will likely require very large sample sizes that are achievable only by sharing individual level data in consortiums of GWAS. For smaller scale studies, the exercise of exploring gene-gene interactions is unlikely to lead to definitive findings, but it can be useful in generating lists of loci that require follow-up in replication studies. Incorporating biological knowledge from reliable pathway and network databases could enhance the power for detection, validation, and interpretation of interaction.
Our exploration of gene-gene interactions in CGEMS identified a list of SNPs that require future replication effort with varying degrees of priority (Table 2). We hope its public availability will motivate replication studies. The EB method we highlight is appropriate for those analyses, as it is both powerful and robust. We consider our most notable finding to be an interaction between SNPs in EPAS1 and 8q24 Region 3 because it generates a preliminary hypothesis about the poorly understood association of 8q24 Region 3 with multiple epithelial cancers that centers on the recently characterized gene POU5F1B. A second result with high priority for follow-up is an interaction between PRRX2 and the KLK2-KLK3 region. It suggests that the functional relevance of the KLK2-KLK3 susceptibility region in the etiology of CaP may involve cellular proliferation.
Acknowledgments
This study utilized the high-performance computation capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD.
1. Duggan D, Zheng SL, Knowlton M, Benitez D, Dimitrov L, Wiklund F, et al. Two genome-wide association studies of aggressive prostate cancer implicate putative prostate tumor suppressor gene DAB2IP. J Natl Cancer Inst. 2007;99:1836–44. [PubMed]
2. Eeles RA, Kote-Jarai Z, Giles GG, Olama AA, Guy M, Jugurnauth SK, et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet. 2008;40:316–21. [PubMed]
3. Gudmundsson J, Sulem P, Rafnar T, Bergthorsson JT, Manolescu A, Gudbjartsson D, et al. Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nat Genet. 2008;40:281–3. [PMC free article] [PubMed]
4. Gudmundsson J, Sulem P, Gudbjartsson DF, Blondal T, Gylfason A, Agnarsson BA, et al. Genome-wide association and replication studies identify four variants associated with prostate cancer susceptibility. Nat Genet. 2009;41:1122–6. [PMC free article] [PubMed]
5. Thomas G, Jacobs KB, Yeager M, Kraft P, Wacholder S, Orr N, et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet. 2008;40:310–5. [PubMed]
6. Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet. 2007;39:645–9. [PubMed]
7. Yeager M, Chatterjee N, Ciampa J, Jacobs KB, Gonzalez-Bosquet J, Hayes RB, et al. Identification of a new prostate cancer susceptibility locus on chromosome 8q24. Nat Genet. 2009;41:1055–7. [PMC free article] [PubMed]
8. Zheng SL, Stevens VL, Wiklund F, Isaacs SD, Sun J, Smith S, et al. Two independent prostate cancer risk-associated Loci at 11q13. Cancer Epidemiol Biomarkers Prev. 2009;18:1815–20. [PMC free article] [PubMed]
9. Pomerantz MM, Ahmadiyeh N, Jia L, Herman P, Verzi MP, Doddapaneni H, et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat Genet. 2009;41:882–4. [PMC free article] [PubMed]
10. Ahmadiyeh N, Pomerantz MM, Grisanzio C, Herman P, Jia L, Almendro V, et al. 8q24 prostate, breast, and colon cancer risk loci show tissue-specific long-range interaction with MYC. Proc Natl Acad Sci U S A. 2010;107:9742–6. [PubMed]
11. Sotelo J, Esposito D, Duhagon MA, Banfield K, Mehalko J, Liao H, et al. Long-range enhancers on 8q24 regulate c-Myc. Proc Natl Acad Sci U S A. 2010;107:3001–5. [PubMed]
12. Wright JB, Brown SJ, Cole MD. Upregulation of c-MYC in cis through a Large Chromatin Loop Linked to a Cancer Risk-Associated Single-Nucleotide Polymorphism in Colorectal Cancer Cells. Mol Cell Biol. 2010;30:1411–20. [PMC free article] [PubMed]
13. Kastler S, Honold L, Luedeke M, Kuefer R, Moller P, Hoegel J, et al. POU5F1P1, a putative cancer susceptibility gene, is overexpressed in prostatic carcinoma. Prostate. 2009;70:666–74. [PubMed]
14. Panagopoulos I, Moller E, Collin A, Mertens F. The POU5F1P1 pseudogene encodes a putative protein similar to POU5F1 isoform 1. Oncol Rep. 2008;20:1029–33. [PubMed]
15. Suo G, Han J, Wang X, Zhang J, Zhao Y, Zhao Y, et al. Oct4 pseudogenes are transcribed in cancers. Biochem Biophys Res Commun. 2005;337:1047–51. [PubMed]
16. Zheng SL, Sun J, Cheng Y, Li G, Hsu FC, Zhu Y, et al. Association between two unlinked loci at 8q24 and prostate cancer risk among European Americans. J Natl Cancer Inst. 2007;99:1525–33. [PubMed]
17. Pan GJ, Chang ZY, Scholer HR, Pei D. Stem cell pluripotency and transcription factor Oct4. Cell Res. 2002;12:321–9. [PubMed]
18. Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol. 1996;144:207–13. [PubMed]
19. Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med. 1994;13:153–62. [PubMed]
20. Albert PS, Ratnasinghe D, Tangrea J, Wacholder S. Limitations of the case-only design for identifying gene-environment interactions. Am J Epidemiol. 2001;154:687–93. [PubMed]
21. Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ. Exploiting gene-environment interaction to detect genetic associations. Hum Hered. 2007;63:111–9. [PubMed]
22. Chatterjee N, Carroll RJ. Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies. Biometrika. 2005;92:399–418.
23. Umbach DM, Weinberg CR. Designing and analysing case-control studies to exploit independence of genotype and exposure. Stat Med. 1997;16:1731–43. [PubMed]
24. Mukherjee B, Chatterjee N. Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics. 2008;64:685–94. [PubMed]
25. Mukherjee B, Ahn J, Gruber SB, Rennert G, Moreno V, Chatterjee N. Tests for gene-environment interaction from case-control data: a novel study of type I error, power and designs. Genet Epidemiol. 2008;32:615–26. [PubMed]
26. CGEN. [Internet] Maryland: National Cancer Institute; 2010. [cited 19 July 2010]. Available from http://dceg.cancer.gov/about/staff-bios/chatterjee-nilanjan.
27. Zhang X, Zhang J, Wang T, Esteban MA, Pei D. Esrrb activates Oct4 transcription and sustains self-renewal and pluripotency in embryonic stem cells. J Biol Chem. 2008;283:35825–33. [PubMed]
28. Zhang J, Tam WL, Tong GQ, Wu Q, Chan HY, Soh BS, et al. Sall4 modulates embryonic stem cell pluripotency and early embryonic development by the transcriptional regulation of Pou5f1. Nat Cell Biol. 2006;8:1114–23. [PubMed]
29. Sharov AA, Masui S, Sharova LV, Piao Y, Aiba K, Matoba R, et al. Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data. BMC Genomics. 2008;9:269. [PMC free article] [PubMed]
30. Bhattacharjee S, Wang Z, Ciampa J, Kraft P, Chanock S, Yu K, et al. Using principal components of genetic variation for robust and powerful detection of gene-gene interactions in case-control and case-only studies. Am J Hum Genet. 2010;86:331–42. [PubMed]
31. Rankin EB, Giaccia AJ. The role of hypoxia-inducible factors in tumorigenesis. Cell Death Differ. 2008;15:678–85. [PMC free article] [PubMed]
32. Simon MC, Keith B. The role of oxygen availability in embryonic development and stem cell function. Nat Rev Mol Cell Biol. 2008;9:285–96. [PMC free article] [PubMed]
33. Forristal CE, Wright KL, Hanley NA, Oreffo RO, Houghton FD. Hypoxia inducible factors regulate pluripotency and proliferation in human embryonic stem cells cultured at reduced oxygen tensions. Reproduction. 2010;139:85–97. [PMC free article] [PubMed]
34. Covello KL, Kehler J, Yu H, Gordan JD, Arsham AM, Hu CJ, et al. HIF-2alpha regulates Oct-4: effects of hypoxia on stem cell function, embryonic development, and tumor growth. Genes Dev. 2006;20:557–70. [PubMed]
35. Hochedlinger K, Yamada Y, Beard C, Jaenisch R. Ectopic expression of Oct-4 blocks progenitor-cell differentiation and causes dysplasia in epithelial tissues. Cell. 2005;121:465–77. [PubMed]
36. Schaeffer EM, Marchionni L, Huang Z, Simons B, Blackman A, Yu W, et al. Androgen-induced programs for prostate epithelial growth and invasion arise in embryogenesis and are reactivated in cancer. Oncogene. 2008;27:7180–91. [PMC free article] [PubMed]
37. van Leenders GJ, Schalken JA. Epithelial cell differentiation in the human prostate epithelium: implications for the pathogenesis and therapy of prostate cancer. Crit Rev Oncol Hematol. 2003;46(Suppl):S3–10. [PubMed]
38. Ricci-Vitiani L, Lombardi DG, Pilozzi E, Biffoni M, Todaro M, Peschle C, et al. Identification and expansion of human colon-cancer-initiating cells. Nature. 2007;445:111–5. [PubMed]
39. Mize GJ, Wang W, Takayama TK. Prostate-specific kallikreins-2 and -4 enhance the proliferation of DU-145 prostate cancer cells through protease-activated receptors-1 and -2. Mol Cancer Res. 2008;6:1043–51. [PubMed]
40. Gao J, Collard RL, Bui L, Herington AC, Nicol DL, Clements JA. Kallikrein 4 is a potential mediator of cellular interactions between cancer cells and osteoblasts in metastatic prostate cancer. Prostate. 2007;67:348–60. [PubMed]
41. Wang W, Mize GJ, Zhang X, Takayama TK. Kallikrein-related peptidase-4 initiates tumor-stroma interactions in prostate cancer through protease-activated receptor-1. Int J Cancer. 2009 [PubMed]
42. Whitbread AK, Veveris-Lowe TL, Lawrence MG, Nicol DL, Clements JA. The role of kallikrein-related peptidases in prostate cancer: potential involvement in an epithelial to mesenchymal transition. Biol Chem. 2006;387:707–14. [PubMed]
43. Stelnicki EJ, Arbeit J, Cass DL, Saner C, Harrison M, Largman C. Modulation of the human homeobox genes PRX-2 and HOXB13 in scarless fetal wounds. J Invest Dermatol. 1998;111:57–63. [PubMed]
44. Kiemeney LA, Thorlacius S, Sulem P, Geller F, Aben KK, Stacey SN, et al. Sequence variant on 8q24 confers susceptibility to urinary bladder cancer. Nat Genet. 2008;40:1307–12. [PubMed]
45. Hallikas O, Palin K, Sinjushina N, Rautiainen R, Partanen J, Ukkonen E, et al. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell. 2006;124:47–59. [PubMed]
46. Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst. 2004;96:434–42. [PubMed]
47. Wakefield J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am J Hum Genet. 2007;81:208–27. [PubMed]
48. Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 2008;4:e1000008. [PMC free article] [PubMed]
49. Boiani M, Scholer HR. Regulatory networks in embryo-derived pluripotent stem cells. Nat Rev Mol Cell Biol. 2005;6:872–84. [PubMed]
50. Kang J, Shakya A, Tantin D. Stem cells, stress, metabolism and cancer: a drama in two Octs. Trends Biochem Sci. 2009;34:491–9. [PubMed]