Genotyping of a 615 kb region within 8q24 with 49 haplotype-tagged single-nucleotide polymorphisms (SNPs) in 2109 samples (797 cases and 1312 controls) of two ethnic/racial groups found SNPs that are significantly associated with the risk for prostate cancer (PCa). The highest significance in Caucasian men was found for rs6983267; the AA genotype reduced the risk for PCa [odds ratio (OR)=0.48, 95% confidence interval (CI)=0.35–0.65, P=2.74×10−6]. This SNP also had a significant effect independent of other SNPs in the region in this group. In Hispanic men, rs7837328 and rs921146 showed independent effects (OR=2.55, 95% CI=1.51–4.31, P=4.33×10−4, OR=2.09, 95% CI=1.40–3.12, P=3.13×10−4, respectively). Significant synergistic effects for increasing numbers of high-risk alleles were found in both ethnicities. Haplotype analysis revealed that major haplotypes containing the non-risk alleles conferred protection against PCa. We found high linkage disequilibrium between significant SNPs within the region and SNPs within the CUB and Sushi Multiple Domains 1 gene (CSMD1), on the short arm of chromosome 8 in both ethnicities. These data suggest that multiple interacting SNPs within 8q24, as well as different regions on chromosome 8 far beyond this 8q24 candidate region, may confer increased risk of PCa. This is the first report to investigate the involvement of 8q24 variants in the susceptibility for PCa in Hispanic men.
The distal long arm of chromosome 8 has been implicated by genome-wide association studies in several cancers, including colorectal, prostate, ovarian and smoking-related carcinogenesis. Multiple independent studies have demonstrated compelling evidence that genetic variations in at least three regions of 8q24 independently influence the risk of prostate cancer (PCa) [meta-analysis by Cheng et al. (1)]. Some of the associations within 8q24 were found to be population specific, in particular the most centromeric region appears to play a more significant role in non-European populations (2). Associations of 8q24 variants with aggressive PCa and/or increased tumor grade have been reported but are yet to be confirmed (1).
Despite the evidence of the importance of the 8q24 region in PCa risk, none of the associated single-nucleotide polymorphisms (SNPs) appear to cause functional changes. Furthermore, the region is known to be gene poor and so far, only one gene has been reported to be located in the associated region, the pseudogene POU5F1P1, a retrotransposed copy of the POU-domain transcription factor gene POU5F1 (3). POU5F1, located at 6p21, has been shown to promote tumor growth and to play a role in maintaining stem cell pluripotency, self-renewal and chromatin structure (4,5). An increased number of cells expressing Oct4A, a splice variant of the POU5F1 (Oct3/4) gene, have been found in PCa (6). This 8q24 region also contains several annotated genes: the family with sequence similarity 84 member B gene (FAM84B), the oncogene MYC and the transmembrane protein 75 (TMEM75). FAM84B (alias BCMP101) is involved in the formation of DNA repair complex (7,8) and is overexpressed in breast cancer (9). The MYC proto-oncogene regulates expression of numerous target genes that control key cellular functions, including cell growth and cell cycle progression. MYC also has a critical role in DNA replication. Deregulated MYC expression resulting from various types of genetic alterations leads to constitutive MYC activity in several cancers and promotes oncogenesis (10). However, studies suggest that the 8q24 risk alleles do not affect MYC expression (11,12).
Because the biological mechanisms underlying the 8q24 associations with cancer remain unclear and 8q24 is the most frequently gained chromosomal region in prostate tumors (13), the risk variants may predispose to PCa through increased genome instability and/or may be markers for the true causal factors. Indeed, it has been shown that linkage disequilibrium (LD) stretches far beyond the interval of 8q24 where associations have been shown (14); as such, other regions of chromosome 8 might contain causal variants and/or underlying genes. It has been shown that gain of 8q is often accompanied by the allelic loss of 8p in PCa (15,16). Begley et al. (17) further showed that transcription of genes in 8p or 8q was downregulated in cells hemizygous for 8p and upregulated in cells carrying three copies of 8q, respectively. In addition, deletions within 8p are the most common deletion event in the genome of prostate tumors (18). Sun et al. (16) found that 30% of PCas had a deletion at 8p21.3 and that the deleted region spans a large interval extending into 8p23.3 and 8p21.1 and contains many genes.
We genotyped 49 tagged SNPs and one microsatellite marker DG8S737 covering 615 kb of the 8q24 region previously shown to be involved in PCa risk in 2109 samples (797 cases and 1312 controls) of non-Hispanic Caucasian (Caucasians) or Hispanic Caucasian (Hispanics) origin. Our goals were to confirm previous associations and determine population specificity, in particular in Hispanics who have not been analyzed for this region. We also determined regions on chromosome 8p that are in LD with the region under study.
Study subjects included men in the San Antonio Center for Biomarkers of Risk of Prostate Cancer (SABOR) cohort. SABOR is funded by the National Cancer Institute and has been prospectively enrolling healthy male volunteers since 2001. On each annual visit, a digital rectal examination was performed and serum prostate-specific antigen level was determined. From this cohort, 197 incident cases (136 Caucasians and 61 Hispanics) were available. We also included 600 cases with a known history of PCa who were enrolled within the same time period in a parallel study of prevalent PCa. Institutional review board approval and informed consent from subjects were obtained in both studies. Cases had biopsy-confirmed PCa and controls consisted of male volunteers at least 45 years old who had normal digital rectal examination and prostate-specific antigen level ≤2.5 ng/ml on all study visits. Race/ethnicity was self-reported on a questionnaire completed at the time of enrollment. A total of 1441 Caucasians (601 cases and 840 controls) and 668 Hispanics (196 cases and 472 controls) were included in this analysis. Clinical characteristics of subjects are summarized in Table I. Study age among controls was the age at last follow-up, and age among cases was the age at PCa diagnosis; controls were younger than PCa cases with a mean age (standard deviation, SD) of 60.9 (8.8) years and 66.0 (8.3) years, respectively (P<0.0001).
DNA was isolated from whole-blood cells using a QIAamp DNA Blood Maxi Kit (QIAGEN, Valencia, CA). Forty-nine tagged SNPs covering the region 127992902–128608542 bp on chromosome 8 were selected using Haploview with the following criteria: (i) a minor allele frequency >0.05 in order to gain more statistical power; (ii) an r2 threshold of 0.8 and a log of odds (LOD) threshold for multimarker testing of 3.0; (iii) a minimum distance between tags of 60 basepairs; (iv) SNPs for which an association with PCa has been reported were included and (v) we used the 2- and 3-marker haplotype tagging option (http://www.broad.mit.edu/mpg/haploview/). The selection was based on the information on the European population as provided by HapMap retrieved from NCBI dbSNP Build 127 (www.hapmap.org). Genotyping was performed with the Golden Gate assay of the VeraCode technology using the BeadXpress Reader System according to the manufacturer's protocol (Illumina, San Diego, CA). Primers and probe sequences are available upon request. For the microsatellite marker DG8S737, the primers were described by Amundadottir et al. (19) and genotyping was performed as described previously by Wang et al. (20). To ensure reliability of the results, duplicate samples and/or known genotyped samples were included in the analysis as quality controls.
Haploview version 4 beta 15 was used to check for Hardy–Weinberg equilibrium (HWE) for each SNP and to measure LD between the SNPs within the studied region for each race/ethnicity [(21), http://www.broad.mit.edu/mpg/haploview/].
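The HWE check can be illustrated with a minimal sketch. It assumes the textbook chi-square goodness-of-fit test with one degree of freedom, which may differ from the test that Haploview applies, and it is written in Python rather than taken from the software used in this study. The function name and the genotype counts in the usage note are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at one biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Allele frequency of the first allele, estimated from genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `hwe_chi_square(25, 50, 25)` describes controls in exact equilibrium (statistic 0), whereas a sample with no heterozygotes such as `hwe_chi_square(50, 0, 50)` would fail a P>0.01 threshold.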
The allele frequency for each SNP was determined in each ethnic group and the frequencies among the case–control groups were compared using the chi-square test. Association analyses were performed using R statistical software version 2.8.1 and were stratified by ethnicity. The odds ratio (OR) and its 95% confidence interval (CI) were estimated by unconditional logistic regression as a measure of the associations between genotypes and PCa risk. Associations by Gleason grade (Gleason score ≥7 versus <7) or prognosis (defined as Gleason score of 7 or higher or stage T3b or higher) were examined by logistic regression in case-only analyses. We used additive, dominant and recessive models in the test analysis and only considered results with a minimum of five individuals for a specific model. For the microsatellite DG8S737, we performed both the −8 allelic association (comparing allele −8 versus all other alleles) and a Wald test in which the counts of all alleles were considered. To correct for multiple testing, we used the method of Storey et al. (22) based on the concept of false discovery rate. This estimation showed that for P<0.03, the probability that the association is a true positive is >90% in the whole sample group. To test for the independent effect of a significant SNP while adjusting for other SNPs, we used a generalized linear model function from the R statistical package in which each selected SNP is entered into a single multivariable logistic regression model. SNPs in the model were taken to have additive effects.
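The three genetic models and the OR estimate can be sketched as follows. This is an unadjusted illustration in Python, not the age-adjusted R analysis used here: for a single binary predictor, the OR from unconditional logistic regression equals the cross-product ratio of the 2×2 table, and the Wald interval equals the Woolf interval computed below. The function names and the counts in the usage note are hypothetical.

```python
import math


def code_genotype(risk_allele_count, model):
    """Map a risk-allele count (0, 1 or 2) to the predictor for one model."""
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("risk_allele_count must be 0, 1 or 2")
    if model == "additive":
        return risk_allele_count  # per-allele effect
    if model == "dominant":
        return 1 if risk_allele_count >= 1 else 0  # one copy suffices
    if model == "recessive":
        return 1 if risk_allele_count == 2 else 0  # two copies required
    raise ValueError("unknown model: " + model)


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted OR with a Woolf (log-scale Wald) 95% CI from a 2x2 table."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With made-up counts, `odds_ratio_ci(20, 80, 10, 90)` gives an OR of 2.25. Adjusting for age, as in this study, requires fitting the full regression model rather than a single table.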
The cumulative effect of combined genotypes on PCa risk was estimated by counting the number of genotypes associated with PCa, on the basis of the best-fitting genetic inheritance from single SNP analysis. ORs and their 95% CIs were calculated for men carrying any combination of one, two or more alleles associated with PCa as compared with men carrying none of the risk alleles using the unconditional logistic regression analysis. We selected SNPs that were not in LD with each other (D′<0.8). If several SNPs presented higher LD values, we selected the most significant SNP.
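The counting step can be illustrated with a short Python sketch. It is unadjusted for age, and the SNP labels, risk-genotype sets and function names are hypothetical placeholders rather than the SNPs or code used in this study. Subjects with a missing genotype are dropped, in line with the handling of missing data described for all analyses.

```python
from collections import Counter


def count_risk_genotypes(genotypes, risk_genotypes):
    """Count how many selected SNPs carry a risk genotype in one subject.

    genotypes: dict mapping SNP label -> genotype string such as "AG".
    risk_genotypes: dict mapping SNP label -> set of risk genotype strings
    with alleles in alphabetical order (e.g. {"AG", "GG"}).
    Returns None if any selected SNP is missing for the subject.
    """
    total = 0
    for snp, risk_set in risk_genotypes.items():
        genotype = genotypes.get(snp)
        if not genotype:
            return None  # missing data: exclude the subject
        if "".join(sorted(genotype)) in risk_set:
            total += 1
    return total


def odds_ratios_by_count(subjects):
    """Unadjusted OR for each risk-genotype count versus a count of zero.

    subjects: iterable of (is_case, risk_count) pairs.
    Returns a dict count -> OR, or None where a cell is empty.
    """
    cases, controls = Counter(), Counter()
    for is_case, risk_count in subjects:
        if risk_count is None:
            continue
        (cases if is_case else controls)[risk_count] += 1
    result = {}
    for k in sorted(set(cases) | set(controls)):
        if k == 0:
            continue
        cells = (cases[k], controls[k], cases[0], controls[0])
        if min(cells) == 0:
            result[k] = None
        else:
            result[k] = (cases[k] * controls[0]) / (controls[k] * cases[0])
    return result
```

The best-fitting inheritance model for each SNP determines which genotypes enter its risk set: a dominant model includes the heterozygote, a recessive model only the risk homozygote.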
Logistic regression was used to calculate the ORs of the haplotypes, using the method implemented in the haplo.ccs package (23). Only major haplotypes, with an estimated frequency of >5%, are considered in this report. The model was fit for each major haplotype so that the OR of each major haplotype was computed relative to a reference group consisting of all other haplotypes including rare haplotypes. Three genetic models (additive, dominant and recessive) were tested.
For all statistical analyses, age was used as covariate. Individuals with missing data for a particular analysis were removed from the analysis. All statistical tests were two sided and significance was set at P<0.05. To test whether possible associations were true positives or due to confounding/admixture association in Hispanics, we adjusted for the proportion of ancestry based on the genotyping results of 64 ancestry informative markers (J.Beuten, I.Halder, K.S.Weldon, R.J.Leach, I.M.Thompson, M.Stern, D.M.Lehman, in preparation).
To measure LD between the markers across the whole of chromosome 8, we used the snpMatrix package in R [http://www.bioconductor.org/packages/2.3/bioc/html/snpMatrix.html, (24)]. Genotypes and frequency information for phases 1 and 2 of HapMap chromosome 8 data were downloaded in NCBI build 36 (dbSNP b126) coordinates for the European and Mexican populations (http://ftp.hapmap.org/genotypes/2008-10_phaseII/fwd_strand/non-redundant/). The 8q24 region under study was used as the reference interval against which the remainder of chromosome 8 was analyzed for LD. A cutoff of LOD>3 was used to select regions on chromosome 8 in LD with the 8q24 region under study. For each ethnicity, we determined the LD across chromosome 8 for the SNPs that we found to be significant after correction for multiple testing from the single SNP analysis. (Note: rs7013278 and rs10094059 were not present in the HapMap data of Europeans, and rs7013278 was not present in the HapMap data of Mexicans.)
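For orientation, the pairwise LD measures referred to in this report (D′, r2 and a LOD score) can be sketched for a single pair of biallelic SNPs. The Python sketch below assumes phased haplotype counts are available, which is a simplification: tools that work from unphased genotypes first estimate haplotype frequencies, so their values may differ. The function name and counts are hypothetical.

```python
import math


def ld_stats(n_ab, n_a_other, n_other_b, n_other_other):
    """D', r^2 and LOD for two biallelic SNPs from phased haplotype counts.

    Counts are for haplotypes (A,B), (A,b), (a,B) and (a,b), in that order.
    LOD is the log10 likelihood ratio of the observed haplotype frequencies
    versus the frequencies expected under linkage equilibrium.
    """
    counts = (n_ab, n_a_other, n_other_b, n_other_other)
    n = sum(counts)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = (n_ab + n_a_other) / n  # frequency of allele A at SNP 1
    p_b = (n_ab + n_other_b) / n  # frequency of allele B at SNP 2
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = n_ab / n - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    expected = (p_a * p_b, p_a * (1 - p_b),
                (1 - p_a) * p_b, (1 - p_a) * (1 - p_b))
    lod = sum(c * math.log10((c / n) / e)
              for c, e in zip(counts, expected) if c > 0)
    return d_prime, r2, lod
```

Under these assumptions, a pair of SNPs would pass the LOD>3 cutoff when the observed haplotype counts are at least 1000 times more likely under the estimated frequencies than under equilibrium.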
A hypothetical function for SNPs that were significant in the single SNP analysis after correction for multiple testing was assessed using in silico analysis of transcription factor-binding sites: both possible alleles of each SNP were tested for their binding capability to human transcription factors using the web tool ‘transcription element search system’ [(25), http://www.cbil.upenn.edu/cgi-bin/tess/tess]. Options employed were 21 bases of genomic sequence around each SNP (10 bases on either side of the SNP) and string-based search query with default settings. A log-likelihood score of >16 for a good match (deficit 1.0 or less) and >18 for a mismatch (deficit >1.0) was used as cutoff value for reporting.
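Preparing the query sequences for such a search amounts to building one 21-base window per allele. The Python sketch below shows that step only, with a hypothetical function name and a made-up sequence; it does not reproduce the search or scoring performed by the web tool.

```python
def allele_windows(sequence, snp_index, alleles, flank=10):
    """Build one query window per allele, centered on the SNP.

    sequence: genomic sequence string containing the SNP.
    snp_index: 0-based position of the SNP within the sequence.
    alleles: iterable of single-base alleles, e.g. ("A", "G").
    flank: bases kept on each side (10 gives a 21-base window).
    """
    if snp_index - flank < 0 or snp_index + flank >= len(sequence):
        raise ValueError("sequence too short for the requested flank")
    left = sequence[snp_index - flank:snp_index]
    right = sequence[snp_index + 1:snp_index + flank + 1]
    # Substitute each allele at the central position.
    return {allele: left + allele + right for allele in alleles}
```

Searching both windows and comparing the predicted sites shows which binding sites are gained or lost with each allele.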
Forty-nine SNPs were genotyped in 2380 samples. All SNPs were in HWE (P>0.01) in the controls of each ethnicity/race, except for rs7825118 and rs7017671, which showed deviation from HWE in Caucasian and Hispanic controls, respectively. Although the error rate was <0.2%, SNPs that were not in HWE were omitted from further statistical analyses in the respective study groups. Table II displays minor allele frequencies of the SNPs estimated in both ethnicities. Significant case–control differences in allele frequencies at a level <0.05 were observed for 16 polymorphisms in Caucasians and for 14 SNPs within Hispanics.
When analyzed individually, 16 SNPs in Caucasians and 12 SNPs in Hispanics were significantly associated with PCa risk at the P<0.05 level (supplementary Table I is available at Carcinogenesis Online). After correction for multiple testing, 14 SNPs remained significant in Caucasians (P values between 0.03–2.74×10−6). The most significant result was obtained for rs6983267 for which the AA genotype reduces the risk for PCa (OR=0.48, 95% CI=0.35–0.65, P=2.74×10−6). In Hispanics, 11 SNPs remained significant after correction for multiple testing (P values 0.028–1.84×10−4). SNP rs921146 was more significantly associated with PCa risk than rs6983267 in this ethnic group and carriers of the CC genotype have a 3.84-fold increase in risk (95% CI=1.17–12.61, P=0.026; Supplementary Table I is available at Carcinogenesis Online). Although less significant, the OR was even higher for rs4871799 (OR=3.06, 95% CI=1.57–9.94, P=0.003). Including the proportion of Native American ancestral background as covariate in the Hispanic sample did not show a difference in outcome with results that were not conditioned on this variable. Significance for SNPs rs11985829, rs6983267, rs7013278, rs7837328, rs4871022 and rs1447293 was found in both Caucasians and Hispanics. Figure 1 shows a diagram of the single SNP results (panel A) as well as the LD across the region (panel B) for each ethnic/racial group. A plot of the LD of the significant SNPs after false discovery rate is shown in panel C.
Comparing associations of Gleason grade ≥7 versus Gleason grade <7 among PCa cases, trends toward greater significance for higher Gleason grade in Caucasians and lower Gleason grade in Hispanics were found, but data were not consistent for all SNPs (supplementary Table II is available at Carcinogenesis Online). A similar finding was seen when looking at prognosis among the PCa cases where a slight increase in significance in the Caucasians and a slight decrease in significance in Hispanics for bad prognosis was observed. Those outcomes were again not consistent for all SNPs and must be interpreted with caution due to small sample sizes (supplementary Table II is available at Carcinogenesis Online).
After conditioning on other significant SNPs not in LD with each other, and thus testing whether the statistically significant associations with PCa were independent in our groups, rs6983267 showed a main effect independent of other significant SNPs in Caucasians (P=0.0004), and both rs7837328 (P=0.009) and rs921146 showed significant independent associations in Hispanics (P=0.0009 and P=0.003, respectively; data not shown).
To evaluate cumulative effects of the risk alleles defined in the single SNP analysis, we performed an age-adjusted multivariate logistic regression on combinations of risk alleles compared with the combination with no risk allele as reference. In Caucasians, the combination of the five risk genotypes, not in LD with each other, showed a significant association with PCa (Ptrend=3.6×10−8) and a 3.18-fold increase in risk (95% CI=2.11–4.79) was observed for men with all five risk alleles as compared with men without any risk alleles (Table III). A similar observation was found in the Hispanics, where the significant association between the combination of the risk alleles of three SNPs and PCa increases the risk significantly (OR=4.98, 95% CI=2.52–9.85, Ptrend=3.84×10−6).
SNPs that were found to be significantly associated with PCa risk after correction for multiple testing and were not in LD with each other were included in haplotypes to examine the joint effect of variant alleles on risk in our population; these are the same SNPs as used for analyzing the cumulative effect. A major haplotype (28%) A-G-G-G-A for the SNPs rs6983267-rs6985419-rs7357486-rs10109622-rs1447293 showed a significant decrease in risk for PCa (OR=0.68, 95% CI=0.58–0.81, P=1.65×10−5; Table IV) in the Caucasians under the additive model. All alleles of the haplotype are the respective non-risk alleles as found in the single SNP analysis. In Hispanics, the major haplotype G-A-G (43%) for rs7837328-rs921146-rs6981321 is significantly associated with disease risk with an OR of 0.71 (95% CI=0.55–0.91, P=0.007) under the additive model and carries all three alleles associated with decreased risk in single SNP analysis.
To investigate possible functional implications of the significant SNPs, a search for transcription factor-binding sites using transcription element search system in non-coding genomic regions was performed. Two of 14 SNPs, rs7837328 and rs10094059 that show significant association with PCa in Caucasians, and three significant SNPs, rs7837328, rs921146 and rs6981321, in Hispanics have changes in transcription factor-binding properties related to allelic alterations (data not shown). The presence of the G allele in rs7837328 creates a binding site for the repressor of the Interferon (IFN)-beta gene (26). The IFN-regulatory factor 1-binding site, which is a positive regulator of IFN-beta and IFN-induced genes, is present for the G allele in rs10094059 (27,28). A binding site for FOXJ2, a transcriptional activator, is present for the A allele in rs921146 (29). SNP rs6981321 has an albumin negative factor-binding site, which is a negative regulator, for the C allele (30).
The pairwise LD calculations from the SNPmatrix tool in R indicated that several of the significant SNPs (after false discovery rate correction) within the 8q24 region studied in this report were in high LD (LOD>3) with SNPs located within the CUB and Sushi Multiple Domains 1 gene (CSMD1) in both Caucasians and Hispanics. This gene is located on the short arm of chromosome 8, within the 8p23 region. In particular, rs11985829, rs6985419, rs1447293 and rs6981424 were in high LD with 10 SNPs within the CSMD1 in Caucasians, and rs11985829, rs10956372 and rs6981321 were in high LD with nine SNPs within CSMD1 in Hispanics (Table V).
We also found high LD (LOD>3) between significant SNPs and SNPs within the β-defensin-1 gene (DEFB1) located at 8p23 in Caucasians, within the CUB and Sushi multiple domains 3 gene (CSMD3) at 8q23 in Hispanics and within the pleckstrin and Sec7 domain containing 3 gene (PSD3) located at 8p22 in both Caucasians and Hispanics. High LD (LOD>5) was also observed between 8q24 SNPs and SNPs within the TMEM75 gene downstream of the region investigated in this study. In Caucasians, the highest LD was found for rs11985829 with rs7825794 and rs16903109 (LOD score of 5.36). A LOD score of 5.37 was found for LD between rs6981321 and rs2720672, with the latter located in TMEM75, in the Hispanics (data not shown).
Genotyping of a 615 kb region within 8q24 with 49 haplotype-tagged SNPs in 2109 samples (797 cases and 1312 controls) of two ethnic/racial groups found SNPs that are significantly associated with the risk for PCa. The highest significance in Caucasian men was found for rs6983267 for which the A allele is associated with a decreased risk for PCa. This SNP shows an independent effect from other significant SNPs in the studied region in this ethnic group. SNP rs6983267 is also significantly associated with a decreased risk for PCa in Hispanic men. However, rs7837328 and rs921146 reached higher significance and both SNPs showed independent effects, which was not found for rs6983267 in this ethnic group.
These results are consistent with previous reports showing significant association of rs6983267 in different ethnicities, including European American, Asian Indian and Japanese (31–35). In addition to confirming previous findings, these data now show this marker to play a role in the susceptibility of PCa in Hispanic men; to our knowledge, this is the first report of the involvement of this SNP in this ethnic group.
SNP rs6983267 has been shown to be associated with high Gleason and advanced PCa (1,34), albeit with inconsistent outcomes. Our results suggest that there might be a more pronounced effect of rs6983267 in Caucasian and Hispanic cases when measuring Gleason grade or prognosis. We found a trend toward increased risk of high Gleason grade and poor prognosis in Caucasians, whereas the opposite (decreased risk) was found in Hispanics. These data should be cautiously interpreted due to the small sample sizes; and thus, the association of rs6983267 with clinical features of PCa remains an open question.
One of the two SNPs that showed an independent effect in Hispanics, rs7837328, has been shown to confer susceptibility to PCa in Europeans (32). However, rs921146 has not to date been reported to be significantly associated with PCa risk. Of interest is that this marker changes a transcription factor-binding site for FOXJ2, a transcriptional activator. Further studies may identify the effect of this allele-specific change and its contribution to prostate carcinogenesis.
The frequencies of the A allele of rs1447295, previously found to be a risk allele for PCa, among the Caucasian cases and controls in this study (11.6 and 9.3%, respectively) were similar to those reported by Amundadottir et al. (19). The allele frequency distribution of the microsatellite marker DG8S737 was similar to that found in previous reports (19,20,36). While the frequencies were consistent with previous reports, neither the −8 allele at microsatellite marker DG8S737 nor SNP rs1447295 was significantly associated with PCa risk in any of our ethnic groups.
Analysis of the significant SNPs in each ethnic group that are not in LD with each other further indicates a significant synergistic effect for increasing numbers of potential high-risk genotypes in both ethnicities. In Caucasians, a >3-fold increase in the risk of PCa (OR=3.18, P=3.6×10−8) was found for carriers of all risk alleles of the five significant SNPs (rs6983267, rs6985419, rs7357486, rs10109622 and rs1447293). Carriers of all three risk alleles for rs7837328, rs921146 and rs6981321 had a 4.98-fold increased risk of PCa in Hispanics (P=3.84×10−6). Analysis of the significant single SNPs in each ethnic group revealed major haplotypes containing the non-risk alleles, which were significantly protective against PCa. Both findings indicate that multiple interacting SNPs within 8q24 most probably confer increased risk of PCa.
The significant findings within 8q24, together with the fact that this region is gene poor, have led many to propose hypotheses about this region's involvement in PCa risk (2,14). The results from SNPmatrix of the LD across chromosome 8 in this report emphasize the importance of a possible involvement of other regions far beyond the candidate 8q24 region. Indeed, we found that several SNPs in this region are in high LD with SNPs on the short arm of chromosome 8, a region often deleted in PCa (16,18,37). Of interest is that four SNPs found significant in Caucasians and three significant SNPs in Hispanics showed high LD (LOD>3) with SNPs located within the CSMD1 gene on chromosome 8p23 when using ethnic-specific genotype data from HapMap. The CSMD1 gene, which contains multiple CUB and Sushi domains, encodes a large, type I transmembrane protein located on the surfaces of neuronal and epithelial cells; this protein's function is unknown but is thought to participate in cell migration (38). No association studies between CSMD1 variants and risk of PCa have been reported so far. The gene has been shown to extend into the minimal regions of deletions within 8p23 and has been suggested as a candidate for a suppressor of multiple types of cancer (39,40). Decreased expression of CSMD1 is associated with advanced PCa (39). Both the sequence of the gene and the organization of the protein are highly conserved in the mouse. In light of these observations, further studies are warranted to clarify this interaction of CSMD1 with the 8q24 region.
Of note is that SNPs in other genes across chromosome 8 were found that showed high LD (LOD>3) with those studied in this report. In particular, SNPs within the β-defensin-1 gene (DEFB1) located at 8p23 in Caucasians, within the pleckstrin and Sec7 domain containing 3 gene (PSD3) located at 8p22 in both Caucasians and Hispanics and the CSMD3 gene at 8q23 in Hispanics were in strong LD with 8q24 SNPs. Cancer-specific loss of DEFB1, which plays an important role in the innate and adaptive immune response, has been found in prostatic carcinomas and is suggested to play a specific role in tumor suppression of advanced PCa via a pathway involving cMYC and PAX2 (41,42). A meta-analysis in breast cancer metastasis indicated that the PSD3 is significantly downregulated (43). The CSMD3 gene has previously been shown to have upregulated expression associated with chromosomal gains (44). Consistent with the findings of Camp et al. (14), LD was also found between the 8q24 region under study and the more telomeric located gene TMEM75.
A limitation of our study is that selection of the 49 tagged SNPs, covering a 615 kb region within the 8q24 candidate region, was based on HapMap data of the European population. Because LD patterns are ethnic specific, we might have missed some non-tagged SNPs in Hispanics and therefore the SNPs selected in the study may not fully represent all tagged variants in this ethnic group. In addition, our selection of SNPs differs from other reports and thus a complete comparison with several previously reported SNPs was not possible. However, even with these weaknesses, our findings, coupled with data from previous reports, indicate that variants within 8q24 play a significant role in the susceptibility to PCa.
In summary, our results support the hypothesis that variants in the 8q24 region play an important role in susceptibility to PCa in several ethnic groups. This study is the first to confirm the importance of this region in Hispanic men. Our findings further suggest that multiple interacting SNPs, both within 8q24 and in different regions on chromosome 8 far beyond this 8q24 candidate region, most probably confer increased risk of PCa.
Early Detection Research Network of the National Cancer Institute (U01 CA086402); American Cancer Society (TURSG-03-152-01-CCE), entitled ‘The Role of Genetic Variation in Prostate Cancer among Hispanics and Blacks’; Cancer Support Grant from the NCI number (P30 CA54174).
The participation of all study subjects in SABOR and in the prevalent prostate cancer studies at the University of Texas Health Science Center at San Antonio is gratefully acknowledged. The study could not have been accomplished without the skilled assistance of the SABOR clinical staff.
Conflict of Interest Statement: None declared.