We recently carried out an admixture scan in African Americans with prostate cancer¹, highlighting a 3.8-Mb region of chromosome 8 (125.68–129.48 Mb in build 35 of the reference sequence) as containing risk alleles that are highly differentiated in frequency between West Africans and European Americans (Supplementary Table 1 online). Independently, another group² localized the same region via linkage analysis and identified specific variants in a region spanning 128.54–128.62 Mb (denoted ‘region 1’) that were associated with increased risk of prostate cancer. We replicated the associations after genotyping the same variants in independent samples¹. However, our data and analyses indicated that the variants in region 1 are insufficient to explain the magnitude of the admixture signal in African Americans with prostate cancer.
To search for additional variants that might also contribute to risk at 8q24, we selected SNPs to capture common genetic variation across the admixture peak based on data from the International HapMap Project (see Methods). We genotyped a total of 1,521 variants (including the alleles of microsatellite DG8S737) in 1,175 African American affected individuals with age at diagnosis <72 years and 837 African American controls. We genotyped the same variants in 465 European American cases and 446 European American controls.
Analysis of these data identified a cluster of genetic variants that we denote ‘region 2’ in a span of linkage disequilibrium from 128.14 to 128.28 Mb. These variants are hundreds of kilobases away from the region 1 described in ref. 2, and the strongest single-SNP association is significant at P = 6.5 × 10⁻⁷ (Supplementary Table 2 online). We followed up by genotyping the most associated SNPs in additional cases and controls from five populations: African Americans, Japanese Americans, Native Hawaiians, Latinos and European Americans (for a total sample size of 4,266 individuals with prostate cancer and 3,252 controls) (see Methods and Supplementary Table 3 online). Analysis of the data, correcting for the potentially confounding covariates of genome-wide ancestry proportion and local ancestry proportion in the admixed African American, Native Hawaiian and Latino populations (see Methods and Supplementary Methods online), further strengthened the evidence for association, with the strongest single-SNP association at rs16901979 (P = 1.5 × 10⁻¹⁸). The risk allele at this SNP is more common in West Africans (54%) than in European Americans (3%; frequencies are from HapMap), suggesting that variants in region 2 might contribute to the admixture signal at 8q24 we previously detected in African Americans¹.
To clarify the genetic risk for prostate cancer due to variants in regions 1 and 2, and to screen for additional prostate cancer risk variants within the admixture peak, we increased the number of samples and SNPs typed in all five populations. In African Americans and European Americans, we increased the number of variants to 2,111. In Japanese Americans and Native Hawaiians, we carried out a new linkage disequilibrium scan across the admixture peak with the goal of capturing all common variation present in HapMap Japanese samples, genotyping 1,565 variants³. In Latinos, we genotyped 275 variants focused on regions of highest interest. To choose SNPs for follow-up genotyping, we not only mined variation in the HapMap database but also used information from an effort to genotype previously uncharacterized genetic variation in the regions of highest interest. To discover new polymorphisms, we sequenced eight African American individuals with prostate cancer and eight African American controls over 282 kb and also sequenced the exonic regions of genes under the admixture peak (Supplementary Methods); we then characterized these variants in samples from the HapMap West African, Japanese and European American populations (genotyping data for 547 newly characterized polymorphisms are provided in Supplementary Table 4 online). Our genotyping in prostate cancer cases and controls successfully tagged a high proportion of common variation in HapMap samples across 8q24 (Supplementary Fig. 1 online).
Analysis of these data further clarified the evidence for association in regions 1 and 2 and showed evidence for a third region of association, which we denote ‘region 3’ and define as the linkage disequilibrium span from 128.47 to 128.54 Mb. The SNP associations in region 3 were significant at rs7000448 (P = 3.0 × 10⁻⁷) and rs6983267 (P = 1.6 × 10⁻⁵). The association at rs6983267 was also seen in the Cancer Genetic Markers of Susceptibility (CGEMS) genome-wide prostate cancer scan in European Americans (P = 2.4 × 10⁻⁴); combining the two data sets, the association at rs6983267 was significant at P = 1.0 × 10⁻⁷. Both SNPs differed markedly in frequency between West Africans and European Americans (98% versus 46% in HapMap for rs6983267), suggesting a possible contribution to the admixture signal in African Americans¹. Association scores for the variants in each population separately are given in Supplementary Table 2 and Supplementary Figure 2 online.
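As a rough plausibility check on the combined evidence at rs6983267, the two reported P values can be merged with Fisher's method. This is a swapped-in technique, not the joint analysis of the two data sets described in the text: it assumes the two studies are independent and ignores effect direction and sample size, so it should only land near the reported combined value, not reproduce it. The function name below is illustrative.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) is chi-square distributed with
    2k degrees of freedom. For even degrees of freedom the upper tail has
    a closed form: P(X > x) = exp(-x/2) * sum_{j<k} (x/2)^j / j!
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one P value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    tail_sum = sum(half_x ** j / math.factorial(j) for j in range(k))
    return math.exp(-half_x) * tail_sum


# P values for rs6983267 as reported in the text
p_this_study = 1.6e-5
p_cgems = 2.4e-4
combined = fisher_combined_p([p_this_study, p_cgems])
print(f"Fisher combined P = {combined:.1e}")
```

Because Fisher's method discards the information a pooled analysis uses, a modest gap between this figure and the reported P = 1.0 × 10⁻⁷ is expected.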
Although we observed many strong signals of association, it was important to evaluate to what extent these were independent. We performed stepwise logistic regression, incorporating each SNP into the model based on the strength of association, and repeating the analysis of all other SNPs conditional on those already incorporated into the model. We applied this procedure to the variants that had been successfully typed in all five populations in the span 128.1–128.7 Mb, continuing until none of the remaining variants was statistically significant after correcting for 186 hypotheses tested.
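The control flow of such a forward stepwise procedure can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the conditional test is abstracted as a caller-supplied function (in practice, a logistic regression of case status on the candidate SNP plus the already selected SNPs and ancestry covariates), and the SNP names, toy P values and the 0.05 family-wise level are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def forward_stepwise(
    candidates: Sequence[str],
    conditional_p: Callable[[str, List[str]], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Greedy forward selection.

    At each step, test every remaining candidate conditional on the
    variants already selected, add the most significant one, and stop
    once no remaining candidate has a P value below `threshold`.
    """
    selected: List[Tuple[str, float]] = []
    remaining = list(candidates)
    while remaining:
        chosen = [name for name, _ in selected]
        scored = [(conditional_p(name, chosen), name) for name in remaining]
        best_p, best = min(scored)
        if best_p >= threshold:
            break
        selected.append((best, best_p))
        remaining.remove(best)
    return selected


def toy_conditional_p(name: str, chosen: List[str]) -> float:
    """Hypothetical stand-in for a conditional logistic regression test."""
    marginal = {"snp_a": 1e-10, "snp_b": 1e-6, "snp_c": 1e-5, "snp_d": 0.2}
    # snp_b is modelled as a proxy for snp_a: once snp_a is in the model,
    # snp_b carries no residual signal.
    if name == "snp_b" and "snp_a" in chosen:
        return 0.5
    return marginal[name]


# Bonferroni-style cutoff for 186 tests; the 0.05 level is an assumption.
result = forward_stepwise(
    ["snp_a", "snp_b", "snp_c", "snp_d"], toy_conditional_p, threshold=0.05 / 186
)
print(result)
```

In the toy data, `snp_b` is strongly associated on its own but drops out once `snp_a` is conditioned on, which is the behaviour the stepwise procedure is designed to expose.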
Notably, this procedure identified five SNPs with independent P values from 7.9 × 10⁻¹⁹ to 1.5 × 10⁻⁴. After we controlled for the top five SNPs, none of the remaining variants crossed the threshold of statistical significance correcting for 186 hypotheses tested. Nevertheless, we considered two additional variants that achieved nominal significance in a single hypothesis test controlling for the top five SNPs. The allele DG8S737-8 (region 1; P = 3.1 × 10⁻⁸ uncorrected; P = 0.0080 after correcting for the top five alleles) was previously shown to be significantly associated with prostate cancer risk after controlling for other variants in this region². We also believe that rs6983267 is likely to capture additional risk (P = 2.3 × 10⁻⁵ uncorrected; P = 0.035 after correcting for the top five alleles), as it was highly differentiated in frequency between African Americans and European Americans and could potentially contribute to the admixture signal. In African Americans alone, the significance of rs6983267 after controlling for the others was P = 0.0031. Supplementary Table 5 online provides details of the linkage disequilibrium and association patterns among the seven variants we selected for subsequent analysis.
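To make the multiple-testing threshold concrete, the arithmetic below assumes a Bonferroni correction at a family-wise level of 0.05; the text gives only the number of hypotheses (186), so the 0.05 level is an assumption. Under it, the weakest of the five independent signals passes the corrected threshold, whereas the conditional P values for DG8S737-8 and rs6983267 are significant only nominally.

```latex
\[
\alpha_{\mathrm{Bonf}} = \frac{0.05}{186} \approx 2.7 \times 10^{-4},
\qquad
1.5 \times 10^{-4} \;<\; \alpha_{\mathrm{Bonf}} \;<\; 0.0080 \;<\; 0.035 .
\]
```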
Table 2: Independent contribution of seven alleles to 8q24 association
The evidence for association at the seven variants is summarized in Table 2. (Supplementary Table 6 online presents the same analysis restricted to the prospectively collected Multiethnic Cohort (MEC).) Mutual adjustment for risk variants is consistent with each independently contributing to risk, although the odds ratios were heterogeneous across populations for rs13254738, rs6983561 and rs10090154, suggesting that we may not yet have genotyped the true causal variants² or that gene-gene or gene-environment interactions may differ across populations. For each variant, we did not find any evidence for a departure from multiplicative effects per allele or for epistatic interaction of the risk alleles within or across regions (Supplementary Table 7 online). In African Americans, these seven variants were sufficient to account for our previously described signal of admixture association (Supplementary Table 7).
We used these results to build a quantitative model of prostate cancer risk associated with different genotypes. To estimate the distribution of risk relative to noncarriers of any allele in each of the populations, we used the empirically observed distribution of genotypes at the seven variants in control samples from that population. Combinations of these risk alleles span a more than fivefold range of risk in many populations, with both extremes of risk common (>5% frequency) in some populations. The population attributable risk (PAR), defined as the expected reduction in prostate cancer incidence if the risk alleles did not exist in the population, is 68% in African Americans, 60% in Japanese Americans, 45% in Native Hawaiians, 46% in Latinos and 32% in European Americans.
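The PAR definition above can be sketched numerically. If every individual had the noncarrier risk, incidence would fall by a factor equal to the mean relative risk in the population, so PAR = 1 − 1/mean(RR). The sketch assumes that control genotypes represent the population, that odds ratios approximate relative risks, and that effects are multiplicative per allele; the odds ratios and genotypes below are hypothetical, not values from the study.

```python
from math import prod
from typing import Sequence


def relative_risk(allele_counts: Sequence[int], per_allele_or: Sequence[float]) -> float:
    """Risk relative to a noncarrier under a multiplicative per-allele model."""
    return prod(o ** c for o, c in zip(per_allele_or, allele_counts))


def population_attributable_risk(
    control_genotypes: Sequence[Sequence[int]], per_allele_or: Sequence[float]
) -> float:
    """PAR = 1 - 1 / mean(RR), averaging over observed control genotypes."""
    if not control_genotypes:
        raise ValueError("need at least one control genotype")
    total = sum(relative_risk(g, per_allele_or) for g in control_genotypes)
    mean_rr = total / len(control_genotypes)
    return 1.0 - 1.0 / mean_rr


# Hypothetical inputs, NOT values from the study: two variants, six controls.
# Each tuple holds the number of risk alleles (0, 1 or 2) carried at each variant.
hypothetical_or = [1.5, 2.0]
hypothetical_controls = [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (2, 1)]
par = population_attributable_risk(hypothetical_controls, hypothetical_or)
print(f"PAR = {par:.1%}")
```

Averaging over the observed multilocus genotypes, instead of multiplying single-variant allele frequencies, keeps any linkage disequilibrium among the variants intact in the estimate.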
Finally, we tested for association of the seven risk alleles in the three regions with specific phenotypes of prostate cancer: age at diagnosis, family history of prostate cancer in a first-degree relative, stage at diagnosis and tumor grade (Supplementary Table 8 online). When we considered all populations together, associations with all variants except rs6983267 and rs7000448 were nonsignificantly greater among younger affected individuals (that is, those younger than the median age of 68 years). For African Americans, the effects of rs13254738 and rs6983561 were significantly greater for those diagnosed at younger ages (P = 0.02; Supplementary Table 8), consistent with the stronger admixture signal that we observed previously among younger African American cases¹. The effect of rs6983561 was modestly greater among those without a first-degree family history (P = 0.04) and among those with high-stage disease (P = 0.02). We also detected an association of DG8S737-8 with tumor grade (Gleason score >7, P = 0.04), providing some support for the previous finding that the genetic variants at 8q24 are associated with cellular differentiation in prostate tumors². Findings from these stratified analyses will need to be replicated in other large studies.
What could explain the presence of independent sets of alleles at 8q24 that together contribute to prostate cancer risk in multiple populations but that do not lie in known genes? It is possible that there are multiple unknown prostate cancer susceptibility genes in 8q24, which by chance occur within a few hundred kilobases. More likely, the variants converge on a common biological mechanism, and these regions may independently influence the regulation of the same nearby cancer-causing gene (for example, the protooncogene MYC). Somatic amplifications at 8q are among the most common acquired events in prostate tumors⁴,⁵, and a speculative model is that risk alleles make the entire region (including cancer-related genes like MYC) prone to amplification.
These results are also notable from the point of view of gene mapping, demonstrating the great power that comes from mapping in multiethnic populations. Region 1 was originally localized in populations of European descent²; we were alerted to regions 2 and 3 by an admixture and fine-mapping scan in African Americans; and the Japanese provided the strongest statistical signals of association of any single population. In the initial identification² of risk variants in region 1, the PAR measured in European-derived populations was 8%. With the consideration of the seven variants we describe here, it increases to 32%. The effect of the locus on prostate cancer is even greater in non-European populations, with the PAR as high as 68% in African Americans. The difference in PAR at this locus may contribute to the higher incidence rate of prostate cancer in African Americans than in European Americans, although further studies will be necessary to prove that genetic factors account for the epidemiological differences. The large effects of the alleles in multiple populations demonstrate the importance of 8q24 in prostate cancer. It should now be a priority to further elucidate the contribution of genetic variation at 8q24 to risk and to understand the biological mechanism by which these variants lead to cancer.