Prostate cancer genetics is relatively advanced in that several large GWAS and multiple large-scale replication studies have been published to date; however, the majority of these studies were of men of European descent [5]. To validate previous GWAS findings and identify additional SNPs associated with PCaP, we selected and analyzed 800 SNPs with the assistance of SNPinfo web tools [18]. This panel included 32 replicated SNPs representing 21 distinct chromosomal regions reported by previous GWAS, and 35 flanking SNPs in these regions. The genotypes for AA and EA CaP cases were compared to iControlDB controls, a publicly available dataset of individual genotypes from selected racial groups that has been established for use in genetic association studies. This database has been used in 19 published association studies and shown to produce results that are comparable to those reported in matched case-control analyses (see supplementary text for the list of related peer-reviewed publications). Similar to the methods employed by other studies, we controlled for population stratification by removing outliers based on ancestry proportion estimates from STRUCTURE analysis. Both PCaP AA and EA cases are genetically well matched to iControlDB AA and EA controls (Supplementary Figure 1). Although use of iControlDB controls has been established in multiple publications, these men were not explicitly screened for prostate cancer and thus may harbor undetected disease. Such misclassification of controls would be expected to lead to a slight bias toward the null and could reduce the number of GWAS hits confirmed in this case-control association analysis.
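The direction of this bias can be made explicit with a short calculation. This is a sketch under stated assumptions, not part of the original analysis: the symbols p_1, p_0, and π are introduced here purely for illustration, and the undetected cases among the controls are assumed to carry the risk allele at the same frequency as ascertained cases.

```latex
% p_1 : risk-allele frequency in cases
% p_0 : risk-allele frequency in truly disease-free controls
% \pi : assumed fraction of controls with undetected disease (hypothetical symbol)
\begin{align*}
p_0^{\mathrm{obs}} &= (1-\pi)\,p_0 + \pi\,p_1, \\
p_1 - p_0^{\mathrm{obs}} &= (1-\pi)\,(p_1 - p_0).
\end{align*}
```

Under these assumptions the observed case-control difference in allele frequency shrinks by the factor (1 − π), so when π is small the attenuation toward the null is modest.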
Given that 32 SNPs in our panel had already been established by previous GWAS, we used the 0.05 significance level when testing for association. Nearly half of the 32 SNPs achieved nominal significance at the P = 0.05 level in EA men. Despite AA men having a higher incidence of prostate cancer, no CaP GWAS of AA has been published to date, and AA men have been underrepresented in replication studies. There have been a total of 5 replication studies examining European GWAS hits that have included African Americans [8]. These studies collectively examined 24 of the 32 replicated SNPs (Supplementary Table 2) surveyed in our panel and reported 6 SNPs (rs2660753 chr 3, rs6983267 chr 8, rs10896449 chr 11, rs4430796 chr 17, rs2735839 chr 19 and rs5945572 chr X) that showed significant evidence of CaP association in at least one study of AA. In our study, 4 of the 32 SNPs demonstrated CaP associations in AA, including one (rs6983267 on chr 8) of the 6 SNPs previously reported. The remaining 3 SNPs (rs7017300 chr 8, rs1859962 chr 17 and rs6501455 chr 17) are here identified as risk factors for CaP in AA for the first time. Thus, there are now a total of 9 SNPs that have been associated with CaP in AA.
The lack of confirmation in AA for many of the 32 European-based GWAS SNPs, and the inconsistency of the associated genetic variants identified in AA populations, may be explained in part by the relatively small number of studies of AA reported to date as well as the relatively small sample size within each study. More importantly, the lack of association may reflect the fact that linkage disequilibrium (LD) structure often differs between EA and AA populations, and GWAS hits are typically marker SNPs in LD with causal alleles. Therefore, differences in LD structure between EA and AA populations may diminish the strength of associations when European-based marker SNPs are applied to AA populations. Thus, even if EA and AA share common causal alleles, the set of marker SNPs that show strong association in EA may show little or no association in AA. It is interesting to note that in our analysis we found an unexpected pattern: SNPs with discordant risk alleles between previous EA GWAS and PCaP AA tend to have large association P values, and SNPs with concordant risk alleles tend to have small association P values. We demonstrate through simulation that when there is no difference in allele frequency between cases and controls, associations with discordant risk alleles will produce the expected random distribution of P values. However, when allele frequencies differ between cases and controls (indicating an association with disease), associations with discordant risk alleles will be skewed toward high P values. Thus, although we only observe associations between CaP and 4 SNPs among AA, the distribution of discordant risk alleles may suggest that this set of 32 SNPs defines loci important for CaP risk in AA.
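The simulation argument above can be illustrated with a minimal sketch. This is not the simulation used in the study: the allele frequency, effect size (`delta`), sample size, and the normal-approximation allele test are all assumptions chosen for illustration, and the function names are hypothetical. A replicate is called discordant when the sampled data point to the opposite allele from the one defined as the risk allele.

```python
import math
import random
import statistics


def allele_test(case_alt, ctrl_alt, n_alleles):
    """Two-proportion z-test on risk-allele counts (normal approximation).

    Returns (two-sided P value, z); z > 0 means the allele is more
    frequent in cases than in controls.
    """
    pooled = (case_alt + ctrl_alt) / (2 * n_alleles)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_alleles)
    if se == 0:
        return 1.0, 0.0
    z = (case_alt - ctrl_alt) / n_alleles / se
    return math.erfc(abs(z) / math.sqrt(2)), z


def simulate(delta, p_ctrl=0.30, n_alleles=1000, n_reps=2000, seed=0):
    """Return (fraction discordant, median P among discordant replicates).

    The designated risk allele has frequency p_ctrl in controls and
    p_ctrl + delta in cases (all values are illustrative assumptions).
    A replicate is discordant when the sampled data point the other
    way, i.e. z < 0.
    """
    rng = random.Random(seed)
    discordant_p = []
    for _ in range(n_reps):
        case_alt = sum(rng.random() < p_ctrl + delta for _ in range(n_alleles))
        ctrl_alt = sum(rng.random() < p_ctrl for _ in range(n_alleles))
        p_value, z = allele_test(case_alt, ctrl_alt, n_alleles)
        if z < 0:
            discordant_p.append(p_value)
    if not discordant_p:
        return 0.0, float("nan")
    return len(discordant_p) / n_reps, statistics.median(discordant_p)


if __name__ == "__main__":
    for delta in (0.0, 0.03):
        frac, med = simulate(delta)
        print(f"delta={delta:.2f}  discordant={frac:.3f}  median P={med:.3f}")
```

With no true frequency difference, about half of the replicates should be discordant and their P values should be spread roughly evenly, giving a median near 0.5. With a true difference, discordance should become rare and occur mainly when the estimated effect is close to zero, which pushes the P values of discordant replicates upward.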
Given the different LD structure of the AA population, we also included a set of 35 SNPs adjacent to the 32 GWAS SNPs. Two SNPs reached study-wide significance in AA and one in EA, and all came from different subregions of 8q24. These 3 flanking SNPs produce a stronger signal than their corresponding replication SNPs (which were also significant); however, the flanking and replication SNPs are not in strong LD. Thus, these flanking SNPs may provide additional information for fine mapping of causal alleles in these chromosomal regions.
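For readers less familiar with how LD between a flanking SNP and a replication SNP is quantified, the following is a minimal sketch of the standard pairwise measures D and r². It assumes phased haplotype counts are available, which is a simplification (unphased genotype data require haplotype frequency estimation first). The function name and the example counts are hypothetical and are not taken from the study data.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic SNPs from phased haplotype counts.

    A/a are the alleles at the first SNP and B/b at the second;
    n_AB is the number of haplotypes carrying both A and B, and so on.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / total          # frequency of allele A
    p_B = (n_AB + n_aB) / total          # frequency of allele B
    d = n_AB / total - p_A * p_B         # deviation from independence
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d, d * d / denom


if __name__ == "__main__":
    # Hypothetical counts for the same SNP pair in two populations.
    for label, counts in (("population 1", (45, 5, 5, 45)),
                          ("population 2", (30, 20, 20, 30))):
        d, r2 = ld_stats(*counts)
        print(f"{label}: D={d:.2f}, r2={r2:.2f}")
```

In this made-up example the same pair of SNPs has r² = 0.64 in one population and r² = 0.04 in the other, the kind of difference that could let a marker tag a causal allele well in one population and poorly in another.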
In addition to examining GWAS hits and related flanking SNPs, additional CaP SNPs were sought using our web-based SNP selection software tool SNPinfo [18]. This program allows researchers to combine GWAS information with linkage and functional data along with population-specific LD information for SNP selection. In constructing our panel, 5 SNPs (rs11649743, rs4857841, rs12543663, rs8102476 and rs620861) were included in 5 chromosomal regions that were later reported as GWAS hits [10], highlighting the utility of our selection approach. However, 2 of these SNPs (rs8102476 and rs620861) were subsequently excluded because of poor Illumina design scores. We found two SNPs that reached study-wide significance, one in EA (rs1472606) and one in AA (rs9351265). SNP rs1472606 is located on chromosome 5q35 within a reported copy number variant [29], about 90 kb from the transcription start site of HRH2, a G protein-coupled histamine receptor gene. This SNP was previously demonstrated to have strong evidence of linkage in 606 CaP families with early age at diagnosis (≤ 65 years) [30]. To our knowledge, this is the first population-based study to identify this SNP as having a strong association with CaP risk, although the CGEMS GWAS showed some signal for this SNP as well (P = 0.0016, rank = 1098) [6]. The second SNP, rs9351265, is located at chromosome 6q16.1 in a gene-poor region 800 kb upstream from the transcription start site of MAP3K7. Although this SNP has not previously been examined in AA, both the CGEMS follow-up study and Thomas et al. [7] found evidence of association with CaP in Europeans (CGEMS P = 0.00067, rank = 135; Thomas et al. P < 0.001, rank = 184). In addition, Liu et al. [31] found a deletion 820 kb from rs9351265 associated with high-grade prostate cancers.
There has been growing interest in the use of genetic profiles for personalized medicine. Existing genetic panels are being marketed for prediction of disease risk, although the predictive power of many of these has yet to be clearly demonstrated [32]. For CaP, Zheng et al. [33] suggested that an individual's allele counts for 5 SNPs correlated with increasing risk in Swedish men. In a subsequent study of US men, Salinas et al. confirmed that these 5 SNPs were significantly associated with risk, but the ROC curves obtained using clinical variables (AUC = 0.63) were not improved by inclusion of SNP information (AUC = 0.66) [34]. It has been suggested that an AUC > 0.75 may provide an appropriate threshold for screening tools in high-risk populations, while an AUC > 0.99 may be required for general population screening [32]. In our study, using a much larger panel of 32 SNPs whose association with prostate cancer had been established a priori in previous GWAS replication studies, risk allele counts differed significantly between cases and controls (Europeans P = 1.7 × 10⁻¹¹). Despite the profound difference in allele counts, the ROC curve analysis of our data shows poor discriminatory power for both EA (AUC = 0.60; 95% CI: 0.57-0.63) and AA (AUC = 0.56; 95% CI: 0.53-0.60). In part, this may be due to disease heterogeneity, if multiple subtypes of CaP have distinct genetic and environmental risk determinants. In addition, genetic heterogeneity is likely in CaP, meaning that different variations in the same gene, or variations across multiple genes (a number of which have yet to be identified), may also contribute to genetic susceptibility. While finer mapping may provide better characterization of causative SNPs and improve the clinical utility of CaP genetic panels, it is clear that our current panel is not adequate for general clinical use, even among high-risk individuals.
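The gap between a highly significant difference in risk allele counts and a modest AUC can be illustrated with a small sketch. It is not the analysis pipeline used in this study: the AUC is computed in its Mann-Whitney form directly from per-person risk allele counts, and the simulated allele frequencies (0.30 in controls, 0.32 in cases), the sample sizes, and the function names are hypothetical values chosen only for illustration.

```python
import random
from collections import Counter


def auc_from_counts(case_counts, control_counts):
    """AUC as P(case > control) + 0.5 * P(tie), the Mann-Whitney form."""
    if not case_counts or not control_counts:
        raise ValueError("both groups must be non-empty")
    ctrl = Counter(control_counts)  # tally controls by risk-allele count
    wins = 0.0
    for s in case_counts:
        for c, k in ctrl.items():
            if s > c:
                wins += k
            elif s == c:
                wins += 0.5 * k
    return wins / (len(case_counts) * len(control_counts))


def simulate_counts(n_people, risk_freq, n_snps, rng):
    """Risk-allele count per person: 2 alleles at each of n_snps loci."""
    return [
        sum(rng.random() < risk_freq for _ in range(2 * n_snps))
        for _ in range(n_people)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical frequencies, chosen only for illustration.
    controls = simulate_counts(1000, 0.30, 32, rng)
    cases = simulate_counts(1000, 0.32, 32, rng)
    print(f"AUC = {auc_from_counts(cases, controls):.3f}")
```

With these made-up frequencies, the mean risk allele count differs between groups by about 1.3 alleles against a standard deviation near 3.7. The two distributions therefore overlap heavily and the AUC should fall near 0.6, even though a mean difference of that size would be highly significant with 1000 individuals per group.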