Prostate cancer (PrCa) is the most common non-skin cancer diagnosed among males in developed countries and the second leading cause of cancer mortality, yet little is known regarding its etiology and factors that influence clinical outcome. Genome-wide association studies (GWAS) of PrCa have identified at least 30 distinct loci associated with small differences in risk. We conducted a GWAS in 2782 advanced PrCa cases (Gleason grade ≥ 8 or tumor stage C/D) and 4458 controls with 571 243 single nucleotide polymorphisms (SNPs). Based on in silico replication of 4679 SNPs (Stage 1, P < 0.02) in two published GWAS with 7358 PrCa cases and 6732 controls, we identified a new susceptibility locus associated with overall PrCa risk at 2q37.3 (rs2292884, P= 4.3 × 10−8). We also confirmed a locus suggested by an earlier GWAS at 12q13 (rs902774, P= 8.6 × 10−9). The estimated per-allele odds ratios for these loci (1.14 for rs2292884 and 1.17 for rs902774) did not differ between advanced and non-advanced PrCa (case-only test for heterogeneity P= 0.72 and P= 0.61, respectively). Further studies will be needed to assess whether these or other loci are differentially associated with PrCa subtypes.
Prior to recent genome-wide association studies (GWAS), only three major prostate cancer (PrCa) risk factors were known, namely age, African ancestry and family history. Familial and twin studies suggested the importance of inherited factors for PrCa development (1–2), prognosis and tumor aggressiveness (3). A critical clinical question in PrCa has been to determine who will develop aggressive as opposed to indolent disease, as men with poorly differentiated or advanced-stage tumors are more likely to die from this type of cancer (4–5). Family studies have identified at least three promising regions linked to the aggressive form of PrCa, but these results have not been definitively replicated (6–9).
Over the past 3 years, GWAS have successfully identified at least 30 independent common PrCa susceptibility alleles across the genome, each of which confers a small increase in risk for both advanced and non-advanced PrCa (10–26). There is preliminary evidence that three loci could be associated with the risk for advanced disease alone, namely those on 9q33.2 (harboring the candidate gene, DAB2IP; 24), 22q13.1 (27) and 17p12 (25), but further work is required to confirm these observations. The known PrCa susceptibility loci explain a modest fraction of the 2-fold familial relative risk observed for first-degree relatives of affected individuals (11).
In an attempt to identify additional loci that influence susceptibility to PrCa overall, and, in particular, advanced PrCa, we conducted a genome-wide scan using cases with advanced disease (Gleason grade ≥ 8 or tumor stage C/D) from the National Cancer Institute Breast & Prostate Cancer Cohort Consortium (BPC3). To validate associations observed in the initial GWAS, we performed in silico replication for the most promising markers using data from two previous GWAS conducted in the UK and Australia (11–12) and the Cancer Prostate in Sweden (CAPS) Study (23). We identified a new region associated with the risk of PrCa at 2q37.3 and confirmed a proposed association at 12q13 that had not previously achieved genome-wide significance (12). These loci were associated with both advanced and non-advanced PrCa, and neither locus showed compelling evidence for heterogeneity in effects by tumor type.
For the advanced PrCa GWAS (denoted Stage 1), 2891 advanced PrCa cases and 4592 controls of European ancestry from the seven BPC3 cohorts (see Supplementary Material, Table S1) were genotyped across four genotyping centers, mostly using the Illumina HumanHap610 quad array. After applying quality control filters (see Materials and Methods), 571 243 single nucleotide polymorphisms (SNPs) were analyzed in 2782 advanced PrCa cases and 4458 controls. Genotype frequencies in cases and controls were compared using a 1-d.f. trend test within each cohort, and the results were combined using fixed-effect meta-analysis (see Materials and Methods). After adjustment for differential population structure, there was little evidence for inflation in the genomic control test statistic (λ1000 = 1.01, Supplementary Material, Fig. S1).
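The 1-d.f. trend test compares the genotype distributions of cases and controls while weighting each genotype by its allele dosage (0, 1 or 2 copies of the tested allele). As a minimal sketch, assuming unadjusted genotype counts from a single study, the Cochran-Armitage form below is closely related to the score test from a logistic regression on allele dosage; it omits the principal-component adjustment used in Stage 1, and the function name `trend_test` is hypothetical.

```python
import math


def trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """1-d.f. Cochran-Armitage trend test for one biallelic SNP in one study.

    case_counts, control_counts: numbers of subjects carrying 0, 1 and 2 copies
    of the tested allele. Returns (chi-square statistic, two-sided P-value).
    """
    r, s = case_counts, control_counts
    n = [ri + si for ri, si in zip(r, s)]  # genotype totals
    R, S = sum(r), sum(s)                  # total cases, total controls
    N = R + S
    # Dosage-weighted contrast between observed case counts and their null expectation
    T = sum(w * (ri * S - si * R) for w, ri, si in zip(weights, r, s))
    # Variance of T under the null hypothesis of no association
    var_T = (R * S / N) * (
        sum(w * w * ni * (N - ni) for w, ni in zip(weights, n))
        - 2 * sum(weights[i] * weights[j] * n[i] * n[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    chi2 = T * T / var_T
    # Upper tail of a chi-square with 1 d.f. equals P(|Z| > sqrt(chi2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, `trend_test((10, 20, 30), (30, 20, 10))` returns a statistic of 20.0, reflecting the excess of two-copy carriers among the cases.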
We observed evidence of association with advanced PrCa for the majority of previously reported PrCa loci (Supplementary Material, Table S2). For the confirmed loci, the estimated per-allele odds ratios for advanced PrCa were consistent with those from the original reports in which both advanced and non-advanced PrCa were combined. For 8 of the 30 previously reported regions (2p21, 2p15, 4q22.3, 4q24, 8p21, 13q22.1, 17p12 and 19q13.33 [KLK3]), including the proposed advanced-only region 17p12, we observed no statistically significant association (P≥ 0.06) with advanced PrCa in our prospectively ascertained advanced cases and controls. We also observed no association between the proposed advanced-only marker at 22q13.1 and advanced PrCa after excluding subjects that overlapped with the initial report of that marker (P= 0.39). The failure to observe associations in these regions may be due to limited power, differences in linkage disequilibrium structure (the 5p15, 6q22 and 13q22 regions were identified in a recent GWAS in Japanese men) or differences in observed per-allele odds ratios between advanced and non-advanced disease (28–29).
One previously proposed advanced-only marker on chromosome 9q33.2 was nominally statistically significantly associated with advanced PrCa after removing samples that overlapped with the original reports for this locus (rs1571801, P= 1.4 × 10−3; Supplementary Material, Table S2). This SNP had been genotyped in the full BPC3 cohort (10 501 cases and 10 831 controls), including both advanced and non-advanced PrCa cases, as part of an earlier validation study (30); no evidence for differences in association by Gleason grade (<8 versus ≥8; P= 0.50) or tumor stage (A/B versus C/D; P= 0.62) was found.
To identify additional PrCa risk markers in these previously reported regions, we examined the association with advanced PrCa in the Stage 1 data for 41 756 high-quality imputed SNPs (Supplementary Methods) located within 1 cM of any of the published risk markers listed in Supplementary Material, Table S2. We analyzed each imputed SNP twice: once without adjusting for any published marker and once adjusting for the published markers in the region containing the SNP. Supplementary Material, Fig. S2 presents the unconditional (dark grey) and conditional (blue) results within 1 cM of the index signal for each of the 30 regions. Of the 22 regions where a previously reported SNP was nominally significantly associated with advanced PrCa in Stage 1, three (3p12.1, 6q22.1 and 11p15.5) exhibited strong evidence that a novel SNP was a better PrCa marker than the previously reported marker (approximate Bayes factor > 10; Supplementary Material, Table S3). However, after conditioning on known markers, only 1 of the 22 regions (8q24.21) contained novel markers that were suggestively associated with advanced PrCa (P< 10−4; Supplementary Material, Table S3). This highlights the complex genetic architecture of these regions, which will require additional studies with dense genotyping in large samples to deconstruct (19,31–33).
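The comparison of a novel SNP against a previously reported marker can be illustrated with Wakefield's approximate Bayes factor, which needs only each SNP's estimated log odds ratio and its standard error. The text does not say which approximation was used, so this is a swapped-in, illustrative choice: the prior standard deviation of 0.2 for the true log odds ratio (roughly a 95% prior belief that the odds ratio lies below about 1.5) and both function names are assumptions. If a region contains a single association signal, the ratio of the two SNPs' Bayes factors indicates how much better one SNP explains the data than the other.

```python
import math


def log_bf_association(log_or, se, prior_sd=0.2):
    """Log Bayes factor for association (H1) versus no association (H0) using
    Wakefield's approximate Bayes factor, with an assumed N(0, prior_sd^2)
    prior on the true log odds ratio under H1."""
    v = se ** 2           # sampling variance of the estimate
    w = prior_sd ** 2     # prior variance of the true effect
    z2 = (log_or / se) ** 2
    # ABF(H0:H1) = sqrt((V+W)/V) * exp(-z^2 W / (2 (V+W))); return minus its log
    return -0.5 * math.log((v + w) / v) + z2 * w / (2 * (v + w))


def bf_novel_vs_known(novel, known, prior_sd=0.2):
    """Bayes factor favouring the novel SNP over the known SNP as the marker of
    a single signal; each argument is a (log_or, se) pair from the same samples."""
    return math.exp(log_bf_association(*novel, prior_sd=prior_sd)
                    - log_bf_association(*known, prior_sd=prior_sd))
```

Under this sketch, a value above 10 would correspond to the threshold quoted in the text; the result depends on the assumed prior.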
No novel regions were detected at genome-wide significance in Stage 1 (P< 5 × 10−8). We performed an in silico replication in two existing GWAS (11–12,23) for the 5000 most significant SNPs after removing previously reported SNPs and filtering out correlated SNPs with r2 > 0.2 (see Materials and Methods).
Results from analyses combining both non-advanced and advanced PrCa cases as well as those examining only advanced PrCa cases are presented in Table 1 for the two novel loci that achieved genome-wide significance. A novel region associated with PrCa risk overall was identified at 2q37 (rs2292884; P= 4.3 × 10−8). We also confirmed an association between rs902774 at 12q13 and PrCa (P= 4.7 × 10−9). This SNP was highlighted in a previous report of the GWAS conducted in the UK and Australia (12); despite suggestive evidence for association in the first stage of that study, it did not achieve genome-wide significance in the combined first and second stages (denoted UK1 and UK2+Aus in Table 1). The significant heterogeneity in per-allele odds ratio estimates for rs902774 across the four studies included in this report (P= 2.6 × 10−3) was also driven by the UK2+Aus samples; we observed no significant heterogeneity in this effect among BPC3 studies (P= 0.33; Supplementary Material, Table S4) or among the BPC3, UK1 and CAPS studies (P= 0.11).
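Between-study heterogeneity of per-allele odds ratios, as in the P-values quoted above, is commonly tested with Cochran's Q statistic computed alongside an inverse-variance fixed-effect meta-analysis. The text does not state which heterogeneity test was used, so Q is an assumption here, as are the function names. The sketch takes per-study log odds ratios and standard errors and requires at least two studies; the chi-square tail function uses a simple series and is not tuned for extremely small P-values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees
    of freedom, via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= y / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(y) - y - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios,
    with Cochran's Q test for between-study heterogeneity."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    z = pooled / pooled_se
    # Q: weighted squared deviations of study estimates from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return {
        "odds_ratio": math.exp(pooled),
        "p_association": math.erfc(abs(z) / math.sqrt(2)),
        "q": q,
        "p_heterogeneity": chi2_sf(q, len(log_ors) - 1),
    }
```

Under the null of a common effect, Q follows a chi-square distribution with one fewer degree of freedom than the number of studies, which is why removing an outlying study can change the heterogeneity P-value sharply.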
The per-allele odds ratios for advanced and non-advanced PrCa were not significantly different for either of these loci (case-only test for heterogeneity P > 0.08; Table 1). However, this heterogeneity test is biased towards the null, because the loci were selected in analyses that combined advanced and non-advanced PrCa cases to reach genome-wide significance. Additional studies will be needed to evaluate effects specific to aggressive and non-aggressive disease.
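In a case-only analysis, the genotype distribution of advanced cases is compared directly with that of non-advanced cases; under the usual assumptions (a shared control population and a rare disease), the resulting odds ratio estimates the ratio of the subtype-specific per-allele odds ratios. As a simplified sketch, the allelic Wald test below uses allele counts rather than the dosage-based logistic regression described in Materials and Methods; the input format and function name are assumptions.

```python
import math


def case_only_heterogeneity(adv_alleles, nonadv_alleles):
    """Wald test comparing risk-allele frequency between advanced and
    non-advanced cases.

    Each argument is (risk_allele_count, other_allele_count). The returned odds
    ratio estimates OR_advanced / OR_non-advanced; P is two-sided.
    """
    a, b = adv_alleles
    c, d = nonadv_alleles
    log_ratio = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    z = log_ratio / se
    return math.exp(log_ratio), math.erfc(abs(z) / math.sqrt(2))
```

A ratio near 1 with a large P-value is consistent with similar effects in both subtypes, but, as noted above, loci selected on combined evidence are less likely to show a difference.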
We examined the association between these two genome-wide significant SNPs and PrCa in three non-European populations: African Americans (1071 cases, 1074 controls), Latinos (1043 cases, 1057 controls) and Japanese (1033 cases, 1042 controls) (see Materials and Methods). No significant association was observed for either of these markers in any of the three non-European populations (Supplementary Material, Table S5 and Fig. S3), which may be due to low power or to differences in allele frequencies or linkage disequilibrium. However, the direction of association between the 2q37 SNP rs2292884 and PrCa risk was consistent across all four populations, and a meta-analysis of the African American, Latino and Japanese results suggested that this SNP was also associated with PrCa in these non-European populations (Supplementary Material, Table S5; P= 7.1 × 10−3). This region will need to be further evaluated in larger samples of both European and non-European ancestry.
Eight additional markers located in or near known risk regions achieved genome-wide significance (P< 5 × 10−8) in analyses combining Stage 1 and the in silico replication studies (Table 2). By design, the linkage disequilibrium between these markers and known markers was low (r2 < 0.2). After conditioning on known risk alleles, the statistical significance for the new markers dropped by three to eight orders of magnitude. None remained genome-wide significant after conditioning, with the exception of rs651164 at 6q25.3, which had been identified as a possible PrCa marker in a previous report (11). Several recombination hotspots separate rs651164 and the previously reported rs9364554 (12), which lie more than 252 kb apart (Supplementary Material, Fig. S4); consequently, the linkage disequilibrium between these two markers is low (r2 = 0.00 and |D'| = 0.16 in the HapMap CEU sample). This suggests that the 6q25.3 region contains two statistically independent markers of PrCa risk, rs651164 and rs9364554. Also of note, rs7629490 at 3p11 remained strongly associated with the risk of PrCa (P= 1.2 × 10−7) after conditioning on the previously reported PrCa risk marker rs2660753 (12). The SNP rs7629490 is located 130 kb centromeric to rs2660753. Although these two SNPs lie in a large block of limited recombination (spanning 320 kb from rs1370041 at chr3:87,171,855 to rs4858957 at chr3:87,491,848; Supplementary Material, Fig. S4), they are weakly correlated (r2 = 0.05; |D'| = 0.38). Further fine-mapping studies are required to identify the mechanism underlying the statistical associations between PrCa and multiple, weakly correlated markers in these regions.
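The r2 and |D'| values quoted above summarize linkage disequilibrium in different ways: |D'| measures how close a pair of SNPs is to the maximum disequilibrium allowed by their allele frequencies, whereas r2 measures how well one SNP predicts the other. A minimal sketch, assuming phased haplotype frequencies are already available (for example, estimated from a reference panel such as HapMap); the function name is hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # deviation from independence
    # Largest |D| compatible with the allele frequencies, for the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

For example, with allele frequencies of 0.5 and 0.05 and every haplotype carrying B also carrying A, `ld_stats(0.05, 0.5, 0.05)` gives |D'| = 1 but r2 = 1/19 (about 0.05), illustrating how SNPs in a block of limited recombination can still be weakly correlated.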
We conducted a GWAS of PrCa using 2782 cases with advanced PrCa (Gleason grade ≥ 8 or stage C/D) and 4458 controls of European ancestry nested in seven prospective cohort studies, followed by in silico replication in two previous GWAS (11–12,23). We confirmed a previously proposed PrCa marker (rs902774) at 12q13 and identified a new PrCa marker (rs2292884) at 2q37. Our findings are consistent with the expectation that further GWAS will continue to discover additional low-penetrance, common alleles associated with the risk for PrCa (34).
At 2q37, rs2292884 is a missense SNP (Arg347His) located in exon 9 of the melanophilin gene (MLPH) (Fig. 1A). MLPH is a member of the exophilin subfamily of Rab effector proteins. It forms a ternary complex with the small Ras-related GTPase Rab27A and myosin Va; in melanocytes, this complex tethers pigment-producing melanosomes to the actin cytoskeleton. Mouse models have demonstrated that functional MLPH is required for normal pigmentation of the hair and skin (35). A homozygous mutation in the MLPH gene has been associated with Griscelli syndrome type 3, a human hypopigmentation disorder (36). Other genes nearby on chromosome 2q37 include prolactin-releasing hormone (PRLH) and RAB17, a member of the Ras oncogene family that regulates membrane trafficking. None of these candidate genes has previously been implicated in PrCa carcinogenesis. The SNP rs2292884 was associated with overall PrCa risk at the conventional threshold for genome-wide statistical significance (P< 5 × 10−8). Because this threshold is designed to account for the large number of tests performed in a GWAS, it provides some reassurance that this result is not a false positive due to multiple testing. However, given the modest odds ratio and the fact that the P-value is near the significance threshold, replication of this association in future, large studies will be important.
The 12q13 SNP rs902774 lies in the type II keratin gene cluster (Fig. 1B), which encodes proteins that contribute to the formation of cytoplasmic filaments in epithelial cells. These filaments maintain cellular integrity and function in signal transduction and cellular differentiation. The closest gene (7 kb telomeric) is KRT8, which has a role in apoptosis (37). Other nearby genes include EIF4B, which plays an important role in RNA metabolism and is necessary for cell proliferation and survival (38), and TENC1, a focal adhesion molecule that regulates cell motility. None of these nearby genes has yet been implicated in PrCa.
It is notable that although our Stage 1, with 2782 cases, is the largest GWAS of advanced PrCa to date and our replication sample contained an additional 2662 advanced PrCa cases, we identified no loci that were primarily associated with advanced PrCa. Moreover, the estimated odds ratios for previously identified risk alleles in our GWAS are similar to those from the original reports, even though our GWAS consisted of advanced PrCa cases and controls drawn from prospective cohort studies, whereas previous studies included both advanced and non-advanced cases and controls, mostly ascertained retrospectively (Fig. 2). Because we had over 90% power to detect a marker solely associated with advanced PrCa with a minor allele frequency of 40% and a per-allele odds ratio of 1.18 (and over 50% power to detect an odds ratio of 1.13; Supplementary Methods Fig. S4), these results suggest that there are currently no known common markers differentially associated with advanced or non-advanced PrCa. This contrasts with breast cancer, where distinct markers are associated with estrogen-receptor-positive and estrogen-receptor-negative tumors (39). There are several possible reasons for this observation: the determinants of aggressive disease may act later in the pathogenesis of PrCa, with germline variants primarily influencing the initiation of disease; current definitions of aggressive disease, although clinically useful, may not reflect etiologic differences among prostate tumors; clinical heterogeneity in diagnosis may outweigh the small effect of any individual locus; or other distinctions among tumors not evaluated here (such as molecular tumor markers) may be associated with genetic heterogeneity.
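Power statements of this kind can be approximated with a normal approximation to the allelic log odds ratio at a single stage and a single significance threshold. In this rough sketch, the sample sizes, the genome-wide significance level and the assumption that the 40% frequency refers to the risk allele in controls are illustrative choices. The paper's power calculations (Supplementary Methods) reflect its own multistage design, so this sketch is not expected to reproduce the quoted 90% and 50% figures. All function names are hypothetical.

```python
import math


def norm_sf(x):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def norm_isf(p, lo=0.0, hi=40.0):
    """z such that P(Z > z) = p, found by bisection (for 0 < p < 0.5)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if norm_sf(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def allelic_power(n_cases, n_controls, raf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test at level alpha.

    raf: assumed risk-allele frequency in controls; odds_ratio: per-allele OR.
    """
    p0 = raf
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # risk-allele frequency in cases
    # Variance of the allelic log OR from 2 * n alleles per group
    se = math.sqrt(1 / (2 * n_cases * p1 * (1 - p1))
                   + 1 / (2 * n_controls * p0 * (1 - p0)))
    z_crit = norm_isf(alpha / 2)
    # Probability that the test statistic exceeds the critical value (effect side only)
    return norm_sf(z_crit - abs(math.log(odds_ratio)) / se)
```

Evaluating `allelic_power` with one stage's sample size at a time shows how strongly power for per-allele odds ratios near 1.1 to 1.2 depends on adding replication samples.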
GWAS of PrCa have identified many regions in the genome harboring susceptibility alleles that confer risk for PrCa, but these studies have not conclusively identified regions that only confer risk of clinically relevant, advanced PrCa, as measured by high Gleason score or advanced stage. Based on current observations, the PrCa susceptibility loci identified to date may principally influence early stages of tumor development rather than disease progression. Larger studies will be needed to identify germline variants with modest effects that influence Gleason score, stage and survival after diagnosis with PrCa.
The advanced PrCa cases included in Stage 1 were drawn from seven cohorts in the USA and Europe (Supplementary Material, Table S1 and Supplementary Methods). Cases were men of European ancestry identified through cancer registries or by self-report and confirmed through medical record review. Clinical characteristics were abstracted from medical records. Advanced PrCa was defined as having either high histologic grade (Gleason score ≥ 8) or extra-prostatic extension (Stage C/D). Controls were men without a diagnosis of PrCa and of European ancestry.
The Stage 2 in silico replication included subjects of European ancestry from two previously reported GWAS of PrCa (11–12,23,24) (Supplementary Methods). The first GWAS consisted of two stages, with cases and controls drawn from the UK (12) (UK1) and both the UK and Australia (11) (UK2+Aus), respectively. For the second, cases and controls were taken from the CAPS study (23). Non-advanced cases from the BPC3 that were genotyped as part of the Cancer Genetic Markers of Susceptibility GWAS (19) were also included in the overall PrCa analyses. To explore associations in other populations, non-European cases and controls were taken from the Multi-Ethnic Cohort, scanned as part of an ongoing GWAS of PrCa funded by the NHGRI's GENEVA program (Supplementary Methods).
Each of the participating studies obtained informed consent from study participants and approval from their respective institutional review boards for this study.
All of the subjects from four BPC3 cohorts (1239 cases and 1188 controls), most of the subjects from a fifth (656 cases and 409 controls) and the cases from the two remaining cohorts (249 and 437 cases, respectively) were genotyped specifically for this study using Illumina Human 610-Quad BeadChips (Illumina, Inc.). The remaining cases and controls had been genotyped previously (Supplementary Methods). De novo genotyping was performed at four centers using genomic DNA extracted from blood and buccal specimens. Cases and controls were randomly distributed across genotyping plates. Genotypes on the Illumina platform were called using the GenCall algorithm as implemented in GenomeStudio version 2009.1 (Illumina, Inc.). GenCall quality metrics, sample heterozygosity, marker and sample completion rates, minor allele frequencies and departures from Hardy–Weinberg equilibrium were used to filter under-performing SNPs and samples; these filters also removed SNPs with genotyping differences across arrays (Supplementary Methods). Subjects with evidence of significant non-European ancestry or population structure were also excluded (Supplementary Methods Fig. S2).
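Among the quality control filters listed above, departure from Hardy–Weinberg equilibrium is typically assessed with a 1-d.f. test comparing observed genotype counts with those expected from the estimated allele frequency. A minimal sketch using the chi-square goodness-of-fit form; the thresholds and exact test actually applied are given in the Supplementary Methods, exact tests are often preferred for rare variants, and the function name is hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """1-d.f. chi-square test for Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi-square statistic, P-value); monomorphic SNPs return (0.0, 1.0).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    if p in (0.0, 1.0):
        return 0.0, 1.0  # test undefined without both alleles
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A marked deficit of heterozygotes, as in `hwe_chi2(50, 0, 50)`, yields a very small P-value and would typically flag the SNP for genotyping error.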
The UK Stage I samples were genotyped using the Illumina HumanHap550 platform, while the UK and Australian Stage II samples were genotyped using a custom Infinium iSelect array. CAPS samples were genotyped using the Affymetrix 500K GeneChip array. Quality control procedures for these studies have been reported previously (11–12,23). The non-European cases and controls from the MEC were genotyped on the Illumina 610 platform.
In Stage 1, we tested associations between each SNP and advanced PrCa separately for each study using a 1-d.f. trend test from a logistic regression, adjusted for the second principal component of genetic covariance in the combined sample, which was the only one of the top 10 principal components nominally significantly associated with PrCa (P= 0.01). For SNPs present in at least four studies, the overall estimate of the per-allele odds ratio and the corresponding association test were calculated using fixed-effect meta-analysis. Inflation of the chi-squared statistic was assessed using the genomic control approach, scaled to a standard sample size of 1000 cases and 1000 controls (40).
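The genomic control inflation factor λ is the median of the observed 1-d.f. test statistics divided by its expected value under the null (about 0.455). Rescaling to 1000 cases and 1000 controls makes λ comparable across studies of different sizes. The formula below is the rescaling commonly used for λ1000 and is assumed, not confirmed by the text, to match reference 40; the function names are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 d.f.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_control_lambda(chi2_stats):
    """Observed median 1-d.f. statistic divided by its expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to the inflation expected in a study of 1000 cases
    and 1000 controls."""
    return 1 + (lam - 1) * (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
```

Because Stage 1 had more than 1000 cases and more than 1000 controls, λ1000 is closer to 1 than the unscaled λ whenever λ exceeds 1.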
Five thousand SNPs were chosen for in silico replication in Stage 2 after removing 53 known PrCa-associated SNPs and any markers correlated with them (r2 > 0.2). The remaining SNPs were ranked by P-value and further pruned by removing SNPs in LD (r2 > 0.2) with higher-ranked signals, yielding 4976 SNPs. We also performed an ancillary GWAS using 2 388 194 imputed SNPs (see Supplementary Methods) and identified 23 additional significant SNPs that were not correlated with any of the selected genotyped SNPs. We included these in the replication set, along with rs4054823, which has been reported to be associated with advanced PrCa (41). A total of 4679 of these SNPs were genotyped or imputed in the replication studies. Logistic regression assuming a log-additive genetic model was performed in each study, using expected allele dosage to account for imputation uncertainty, first comparing all PrCa cases with controls and then comparing only advanced PrCa cases with controls. Fixed-effect meta-analysis was used to combine evidence for association across the replication studies and the Stage 1 studies. Genome-wide significant SNPs (P< 5.0 × 10−8) in known PrCa regions were further evaluated for independent associations by conditioning on known PrCa SNPs (Table 2). Case-only analyses were used to assess whether SNP effects differed between advanced and non-advanced PrCa.
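The pruning step described above can be sketched as a greedy pass over SNPs in order of increasing P-value, keeping a SNP only if it is not in LD (r2 > 0.2) with any SNP already kept. In this minimal sketch, the r2 lookup is assumed to come from an external LD reference, and production pipelines usually restrict comparisons to a genomic window, which is omitted here; the function name and parameters are hypothetical.

```python
def prune_by_ld(snps, r2, threshold=0.2, max_snps=None):
    """Greedy LD pruning of association results.

    snps: iterable of (snp_id, p_value) pairs.
    r2: callable returning the LD r^2 between two SNP ids.
    Keeps SNPs in order of increasing P-value, skipping any SNP whose r^2 with
    an already-kept SNP exceeds threshold; stops once max_snps SNPs are kept.
    """
    kept = []
    for snp_id, _ in sorted(snps, key=lambda item: item[1]):
        if all(r2(snp_id, other) <= threshold for other in kept):
            kept.append(snp_id)
            if max_snps is not None and len(kept) == max_snps:
                break
    return kept
```

Ranking before pruning ensures that, within each correlated group, the most significant SNP is the one carried forward to replication.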
The BPC3 was supported by the U.S. National Institutes of Health, National Cancer Institute (cooperative agreements U01-CA98233 to D.J.H., U01-CA98710 to M.J.T., U01-CA98216 to E.R., and U01-CA98758 to B.E.H., and Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics). This work was supported by Cancer Research UK Grants C5047/A7357, C1287/A10118, C5047/A3354, C5047/A10692, C16913/A6135, and C16913/A6835. We would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We acknowledge NHS funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. We also acknowledge grant support from The National Health and Medical Research Council, Australia (209057, 251533, 450104, 390130), VicHealth, The Cancer Council Victoria, Cancer Council Queensland, The Whitten Foundation, and Tattersall's. The ProtecT study is ongoing and is funded by the Health Technology Assessment Programme (projects 96/20/06, 96/20/99). The ProtecT trial and its linked ProMPT and CAP (Comparison Arm for ProtecT) studies are supported by Department of Health, England; Cancer Research UK grant number C522/A8649, Medical Research Council of England grant number G0500966, ID 75466 and The NCRI, UK. The epidemiological data for ProtecT were generated through funding from the Southwest National Health Service Research and Development. DNA extraction in ProtecT was supported by USA Dept of Defense award W81XWH-04-1-0280, Yorkshire Cancer Research and Cancer Research UK. The bio-repository from ProtecT is supported by the NCRI (ProMPT) study and the Cambridge BMRC grant from NIHR.
Financial support for CAPS was provided through a grant from the Swedish Research Council (grant no K2010-70X-20430-04-3 and 70867901), the Swedish Cancer Foundation (grant no 09-0677), the Hedlund Foundation, Söderberg Foundation, Enqvist Foundation, ALF funds from the Stockholm County Council. W.B.I. was supported by National Cancer Institute grants CA112517 and CA133009.
The authors would like to thank all the study participants in BPC3, CAPS study and the UK and Australian studies. The authors would like to acknowledge the tremendous contribution of all members of the ProtecT study research group. We should like to acknowledge the NCRN nurses and Consultants for their work in the UKGPCS study.
Conflict of Interest statement. None declared.