Growing evidence suggests that single nucleotide polymorphisms (SNPs) in nucleotide excision repair (NER) pathway genes play an important role in bladder cancer etiology. However, only a limited number of genes and variations in this pathway have been evaluated to date.
In this study, we applied a comprehensive pathway-based approach to assess the effects of 207 tagging and potentially functional SNPs in 26 NER genes on bladder cancer risk using a large case-control study consisting of 803 bladder cancer cases and 803 controls.
A total of 17 SNPs were significantly associated with altered bladder cancer risk at P<0.05, of which 7 SNPs retained noteworthiness after being assessed by a Bayesian approach for the probability of false discovery. The most noteworthy SNP was rs11132186 in the ING2 gene. Compared to the major allele-containing genotypes, the odds ratio (OR) was 0.52 (95% confidence interval [CI] 0.32–0.83, P = 0.005) for the homozygous variant genotype. Three additional ING2 variants also exhibited significant associations with bladder cancer risk. Significant gene-smoking interactions were observed for three of the top 17 SNPs. Furthermore, through an exploratory classification and regression tree (CART) analysis, we identified potential gene-gene interactions.
We conducted a large association study of NER pathway with bladder cancer risk and identified several novel predisposition variants. We identified potential gene-gene and gene-environment interactions in modulating bladder cancer risk. Our results reinforce the importance of a comprehensive pathway-focused and tagging SNP-based candidate gene approach to identify low-penetrance cancer susceptibility loci.
Bladder cancer ranks ninth in worldwide cancer incidence. It is the seventh most common cancer in men and seventeenth in women 1. In the United States, it is the fifth most common cancer, fourth in men and eleventh in women 2. Environmental exposures account for the majority of bladder cancer cases. For example, around half of bladder cancer incidence in men and one third in women is attributable to tobacco smoking. In addition, occupational exposures to aromatic amines and polycyclic aromatic hydrocarbons, and polluted drinking water containing arsenic and chlorination by-products, also contribute significantly to the development of bladder cancer 3. Other environmental risk factors such as dietary factors, hair dye usage, artificial sweeteners, and phenacetin-containing analgesic drugs have also been reported, although the associations have not been consistent across studies 3.
Although the molecular mechanisms underlying these bladder cancer etiological factors are not fully understood, it is widely recognized that environmental carcinogens induce DNA damage that leads to genomic instability. Tobacco carcinogens mainly induce bulky DNA adducts, and nucleotide excision repair (NER) is the major cellular pathway for repairing them. Other cellular DNA repair pathways, such as base excision repair (BER) and double-strand break repair (DSBR), also play important roles in the prevention of bladder carcinogenesis by repairing single-strand and double-strand DNA breaks caused by smoking, reactive oxygen species (ROS), ionizing radiation, and other DNA-damaging agents 4–7. In bladder cancer, the NER pathway is one of the most commonly studied pathways; it repairs bulky DNA lesions such as pyrimidine dimers, photoproducts, larger chemical adducts, and cross-links. The NER pathway is also critical to the maintenance of genomic stability. There are two types of NER processes in human cells: global genomic repair (GGR) and transcription-coupled repair (TCR). Both processes have four major steps: (1) recognition of DNA lesions by a complex of interacting proteins, including XPC-RAD23B, XPA and RPA in GGR, or ERCC6 and Cockayne syndrome type A (CSA) in TCR; (2) unwinding of DNA strands within the region of the lesion by the transcription factor IIH (TFIIH) complex, which includes the XPD and XPB proteins; (3) elimination of damaged DNA fragments by a protein complex including ERCC1, XPF, and XPG; and (4) synthesis of new DNA strands by various DNA polymerases 8. Defects in these critical genes have been frequently identified in many cancers, including bladder cancer 9, 10.
Single nucleotide polymorphisms (SNPs) of NER pathway genes have been implicated in bladder cancer etiology 9. For example, we and others have previously reported that SNPs in major NER genes such as XPC, XPD, CCNH, and RAD23B were significantly associated with bladder cancer risk 11–15. However, these studies mostly took a limited candidate gene approach that evaluated a small number of genes and potentially functional SNPs. In this study, we performed a comprehensive pathway-based study to assess the effects of 207 haplotype-tagging and potentially functional SNPs in 26 major genes in the NER pathway on bladder cancer risk in a total of 803 bladder cancer patients and 803 healthy controls. We then conducted a series of exploratory analyses to evaluate the cumulative effects and interactions of these variants on bladder cancer risk.
The study participants were derived from an ongoing hospital-based bladder cancer case-control study. The cases were newly diagnosed, histologically confirmed, and previously untreated bladder cancer patients recruited at The University of Texas MD Anderson Cancer Center and Baylor College of Medicine between 1999 and 2007. There were no age, gender, ethnicity, or cancer-stage restrictions on recruitment. The controls were healthy individuals with no prior history of cancer (except non-melanoma skin cancer) recruited from Kelsey-Seybold Clinic, the largest multi-specialty, managed-care physician group in the Houston metropolitan area. Controls were matched to cases by age (±5 years), gender and ethnicity. To control for confounding by population stratification, we restricted both cases and controls to self-reported non-Hispanic Caucasians for the current analysis. These cases and controls have been included in our recent genome-wide association study of bladder cancer, and there was no evidence of population substructure among cases and controls 16. The potential controls were first surveyed by a Kelsey-Seybold staff member during clinical registration using a short questionnaire to elicit their willingness to participate in the study and to provide preliminary demographic data for matching. They were contacted by telephone at a later date to schedule an interview appointment at a Kelsey-Seybold clinic convenient to the participant. The response rate for the ongoing study was 92% for cases and 76.7% for controls. All cases and controls in this study completed a 45-minute structured questionnaire administered by trained MD Anderson staff interviewers. The questionnaire collected information about demographics, smoking history, family history of cancer, and medical history. At the end of the interview, a 40-mL blood sample was drawn into a coded heparinized tube and sent to the laboratory for immediate DNA extraction and molecular analyses.
The study was approved by all relevant institutional review boards, and signed informed consent was obtained from each participant.
A comprehensive list of genes in the NER pathway was developed through the interrogation of the Gene Ontology (GO) database (http://www.geneontology.org) and PubMed-based literature review, as previously described 17. Tagging SNPs were selected by the binning algorithm of LDSelect software (http://droog.gs.washington.edu/ldSelect.pl) with an r² threshold of 0.8 and a minor allele frequency (MAF) > 0.05 within 10 kb upstream of the 5′ untranslated region (UTR) and 10 kb downstream of the 3′ UTR of each gene. We also included a few non-synonymous SNPs identified in dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/). Since this is a gene-centered candidate region search, we achieved 100% coverage of our targeted genomic regions. The number of SNPs for each gene region was as follows: CCNH, 3; CDK7, 2; DDB2, 9; ERCC1, 2; ERCC2, 8; ERCC3, 5; ERCC4, 8; ERCC5, 14; ERCC6, 12; ERCC8, 12; GTF2H1, 6; GTF2H3, 3; GTF2H4, 6; GTF2H5, 3; ING2, 6; LIG1, 9; MMS19L, 4; MNAT1, 9; RAD23A, 3; RAD23B, 17; RPA1, 15; RPA2, 3; RPA3, 28; XAB2, 5; XPA, 7; XPC, 8.
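The idea behind r²-based tag SNP binning can be illustrated with a minimal greedy sketch. It is not the LDSelect implementation: the SNP names and pairwise r² values are hypothetical, the MAF filter is assumed to have been applied beforehand, and the function `greedy_ld_bins` is introduced here only for illustration.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Greedy LD binning: repeatedly pick the SNP that covers the most
    not-yet-binned SNPs at r2 >= threshold and make it the tag of a bin."""
    remaining = set(snps)
    bins = []
    while remaining:
        def partners(s):
            # A SNP always covers itself; pairs missing from r2 count as 0.
            return {t for t in remaining
                    if t == s or r2.get(frozenset((s, t)), 0.0) >= threshold}
        # Sorting makes tie-breaking deterministic (first name wins).
        tag = max(sorted(remaining), key=lambda s: len(partners(s)))
        members = partners(tag)
        bins.append((tag, sorted(members)))
        remaining -= members
    return bins

# Hypothetical pairwise r2 values for five SNPs in one gene region.
r2 = {
    frozenset(("snpA", "snpB")): 0.92,
    frozenset(("snpA", "snpC")): 0.85,
    frozenset(("snpB", "snpC")): 0.70,
    frozenset(("snpD", "snpE")): 0.40,
}
bins = greedy_ld_bins(["snpA", "snpB", "snpC", "snpD", "snpE"], r2)
for tag, members in bins:
    print(tag, "tags", members)
```

In this toy region, one tag SNP stands in for three correlated SNPs, while the two weakly correlated SNPs each need to be genotyped directly.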
Genomic DNA was extracted from peripheral blood using the Qiagen Whole Blood DNA Extraction Kit (Qiagen, Valencia, CA). The genotyping was performed using the iSelect Infinium II platform according to Illumina’s protocol, as described previously 12, 17. Briefly, 750 ng of DNA from each sample was amplified 1,000–1,500 fold overnight. The amplified DNA was fragmented, precipitated, and resuspended before being hybridized to the iSelect Beadchip, which contains SNP-locus-specific oligonucleotide primers (50 bp long) covalently attached to the bead surface. After specific hybridization of genomic DNA to the bead array, each SNP locus-specific primer (attached to beads) was extended with a single hapten-labeled dideoxynucleotide in a single base extension reaction. Incorporated haptens were converted to fluorescent signal by multilevel immunohistochemistry staining and imaged using the BeadStation Scanner. The genotype for each SNP was auto-called using the BeadStudio software package and processed for further statistical analyses. The average call rate for SNPs was >99%. Individuals with >5% missing genotypes and SNPs with >5% missing calls were excluded from downstream analyses. A randomly selected 2% of samples were run in duplicate, and the concordance of SNP genotype calls was >99.9% for duplicated samples.
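The missing-data exclusions described above can be sketched as a simple two-pass filter. The data layout, the names `missing_rate` and `qc_filter`, and the order of the passes (samples first, then SNPs) are assumptions made for illustration; the source does not state the order used.

```python
def missing_rate(calls):
    """Fraction of genotype calls that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)

def qc_filter(genotypes, max_missing=0.05):
    """genotypes: dict sample -> dict SNP -> call (None means no call).
    Drops samples, then SNPs, whose missing rate exceeds max_missing."""
    samples = list(genotypes)
    snps = list(next(iter(genotypes.values())))
    kept_samples = [s for s in samples
                    if missing_rate([genotypes[s][m] for m in snps]) <= max_missing]
    kept_snps = [m for m in snps
                 if missing_rate([genotypes[s][m] for s in kept_samples]) <= max_missing]
    return kept_samples, kept_snps

# Hypothetical 40 x 40 genotype matrix with a few missing calls.
samples = [f"s{i}" for i in range(40)]
snps = [f"m{j}" for j in range(40)]
genotypes = {s: {m: "AA" for m in snps} for s in samples}
for m in snps[10:14]:
    genotypes["s0"][m] = None      # s0: 4/40 = 10% missing, excluded
for s in samples[1:4]:
    genotypes[s]["m1"] = None      # m1: missing in 3 samples, excluded
genotypes["s5"]["m2"] = None       # m2: missing in 1 sample, retained

kept_samples, kept_snps = qc_filter(genotypes)
print(len(kept_samples), "samples and", len(kept_snps), "SNPs pass QC")
```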
Statistical analyses were performed using Intercooled STATA software, version 10.1 (STATA Corp., College Station, TX) and SAS/Genetics, version 9.0 (SAS Institute). The χ² test was used to assess the differences between cases and controls with regard to categorical variables such as gender and smoking status. Student’s t test was used for continuous variables, including age and pack-years. Hardy-Weinberg equilibrium (HWE) was tested using a goodness-of-fit χ² test. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using multivariable logistic regression to identify SNPs with significant associations with bladder cancer risk, adjusting for confounding factors including age, gender, and smoking status. For each SNP, genotypes containing the homozygous major allele were used as the reference group to calculate the ORs and 95% CIs for genotypes containing the variant allele. The definitions of smoking status were the same as previously described 18. P values were calculated using the likelihood ratio test comparing the models with and without the variables of interest in multivariable logistic regression. Dominant, recessive, and additive models were tested for each SNP, and the reported P value was the smallest of the three tests. We assessed the noteworthiness of an observed significant association using the Bayesian False Discovery Probability (BFDP) approach proposed by Wakefield 19. The approach has been adopted in a number of epidemiologic studies to control for multiple testing 20–25. We used all P values from the different inheritance models (additive, recessive, and dominant) to perform the BFDP analyses. We adjusted for all 207 tests at the SNP level. We assumed a range of prior probabilities from 0.01 to 0.05. We set the prior probability that the OR is greater than 2.0 to 0.025.
An association was declared significant if the BFDP was below 0.8 at a prior probability of 0.05 (considered a moderate prior for a pathway-based association study) 19. The test for interaction between genotypes and smoking was done by including an interaction term in the logistic regression. We tested for interactions between SNPs and smoking status (never and ever) and between SNPs and pack-years of smoking (as a continuous variable). For the SNPs with significant main effects, we reported results of stratified analyses by smoking status and the interactions of these SNPs with smoking status. We also reported SNPs with nominally significant interactions with pack-years of smoking (P<0.05). In addition, we performed exploratory classification and regression tree (CART) analysis to identify potential gene-gene interactions using the HelixTree software (version 4.1.0, Golden Helix). CART is a binary recursive partitioning method that produces a decision tree to identify subgroups of subjects at different risk levels. Specifically, the recursive partitioning algorithm starts at the first node (with the entire data set) and determines the first locally optimal split and each subsequent split of the dataset, with multiplicity-adjusted P values to control tree growth. We used P=0.001 to grow the tree and q=0.05 to prune the over-grown tree and control the final tree size. All P values in this study were two-sided.
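To make the BFDP calculation concrete, the following is a minimal sketch based on Wakefield's approximate Bayes factor. It rests on several assumptions that the text does not confirm: the standard error is back-calculated from the reported 95% CI, the prior on the log OR is normal with mean zero, and the statement about an OR greater than 2.0 is read as an upper-tail prior probability of 0.025. The function names are hypothetical, and the authors' actual inputs may have differed.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96

def se_from_ci(lower, upper):
    """Approximate SE of log(OR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z975)

def prior_variance(or_upper=2.0, tail_prob=0.025):
    """Variance W of a N(0, W) prior on log(OR) such that
    P(OR > or_upper) = tail_prob (an assumed reading of the text)."""
    z = NormalDist().inv_cdf(1 - tail_prob)
    return (math.log(or_upper) / z) ** 2

def bfdp(odds_ratio, lower, upper, prior_h1, w):
    """Bayesian false discovery probability from an OR and its 95% CI."""
    v = se_from_ci(lower, upper) ** 2
    z2 = math.log(odds_ratio) ** 2 / v
    # Approximate Bayes factor for the null relative to the alternative.
    abf = math.sqrt((v + w) / v) * math.exp(-z2 * w / (2 * (v + w)))
    prior_odds_null = (1 - prior_h1) / prior_h1
    return abf * prior_odds_null / (abf * prior_odds_null + 1)

w = prior_variance()
# OR and CI reported for rs11132186 (recessive model).
moderate = bfdp(0.52, 0.32, 0.83, prior_h1=0.05, w=w)
strict = bfdp(0.52, 0.32, 0.83, prior_h1=0.01, w=w)
print(f"BFDP at prior 0.05: {moderate:.2f}; at prior 0.01: {strict:.2f}")
```

Under these assumptions the top ING2 SNP falls below the 0.8 threshold at a prior of 0.05 but not at the stricter prior of 0.01, which shows how strongly the conclusion depends on the chosen prior.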
Our participants consisted of 803 Caucasian bladder cancer patients and 803 age (±5 years) and gender frequency-matched Caucasian controls (Table 1). Cases had a significantly higher percentage of current smokers and recent quitters than controls (23.3% vs. 8.3%, P=5.2×10⁻²¹). Among ever smokers, cases reported significantly higher levels of cigarette consumption than controls (mean pack-years: 43.0 vs. 29.9, P=2.8×10⁻¹²).
Among the 207 NER SNPs, 17 (8.2%) were significantly associated with bladder cancer risk at the 5% level (Table 2), among which one SNP (rs4151330) had a statistically significant deviation from HWE (P<0.05) in controls, consistent with that expected by chance. The most noteworthy finding of our study was related to the ING2 gene. Four of the six ING2 SNPs conferred a significantly altered risk of bladder cancer; the most significant one was rs11132186, a SNP located in the 3′ region of ING2. Under a recessive genetic model, the homozygous variant genotype of rs11132186 was associated with a reduced risk of bladder cancer (OR=0.52, 95% CI 0.32–0.83, P=0.005). Another ING2 SNP in the 3′ region, rs11735038, also conferred a reduced bladder cancer risk (OR=0.66, 95% CI 0.49–0.90, P=0.008) under a recessive genetic model. The other two significant ING2 SNPs were rs6854224 in the 5′ region (OR=0.70, 95% CI 0.53–0.93, P=0.013, recessive model) and rs11732255 in the 3′ region (OR=0.84, 95% CI 0.72–0.98, P=0.025, additive model) (Table 2). Two additional SNPs exhibited a best-fitting P value less than 0.01 and conferred an increased bladder cancer risk: rs11039130 in the 5′ region of DDB2 (OR=1.64, 95% CI 1.14–2.35, P=0.007) and rs4150667 in the intron of GTF2H1 (OR=1.55, 95% CI 1.12–2.15, P=0.008). We further assessed the noteworthiness of these significant associations using BFDP. We found that 7 of the 17 significant SNPs had a BFDP value less than 0.8 (range 0.72–0.80), suggesting the potential noteworthiness of their associations with bladder cancer risk (Table 2). Table 3 shows the detailed distributions of the different genotypes (homozygous major allele, heterozygote, and homozygous variant) of these 7 SNPs in cases and controls, the associations of each genotype with bladder cancer risk, and the best-fitting model for each SNP.
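The HWE check referred to above is a one-degree-of-freedom goodness-of-fit χ² test on genotype counts in controls. A minimal sketch follows; the genotype counts are hypothetical (they are not taken from Table 3), and the function name `hwe_chi2` is introduced only for illustration.

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.
    Assumes a polymorphic SNP (both alleles observed), so that no
    expected count is zero. Returns (chi2, p_value) with 1 df."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control genotype counts (803 controls in total).
chi2_stat, p_val = hwe_chi2(400, 320, 83)
print(f"chi2 = {chi2_stat:.2f}, P = {p_val:.3f}")
```

With 17 SNPs each tested at the 5% level, roughly one deviation from HWE is expected by chance alone (17 × 0.05 ≈ 0.85), which is the sense in which a single deviating SNP is consistent with chance.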
We also performed stratified analysis by smoking status (data not shown). There was substantial overlap of top SNPs between the analysis of the overall population and that of smokers. For example, the top two SNPs in smokers, DDB2 rs1685404 (OR=0.67, 95% CI, 0.52–0.86, P=0.0021 under dominant model) and GTF2H1 rs4150667 (OR=1.90, 95% CI, 1.24–2.89, P=0.0024 under recessive model), were the 8th and 4th most significant SNPs in the overall analysis, respectively. For never smokers, the most significant SNP, XPA rs10817938 (OR=0.26, 95% CI, 0.12–0.60, P=0.0003 under dominant model), was not significant in the overall analysis (P=0.67). We then tested interactions between significant SNPs and smoking status in modulating bladder cancer risk. Many of the top SNPs exhibited similar effects on bladder cancer risk in never and ever smokers, but there were significant SNP-smoking interactions for a few SNPs (Table 2). CCNH rs10065575 was the 3rd most significant SNP in never smokers (P=0.0012) but was not significant in ever smokers, and the test of interaction showed a significant interaction of this SNP with smoking status (P=0.034, Table 2). Significant interactions with smoking status were also observed for DDB2 rs1685404 (P=0.038) and ERCC6 rs4453140 (P=0.009) (Table 2).
We then performed a detailed analysis of the interactions between all the assayed SNPs and pack-years of smoking (as a continuous variable). Table 4 lists the 17 SNPs showing nominally significant interactions with pack-years (P<0.05). The most significant SNP was rs4151150 in MNAT1, which exhibited a significant negative interaction with pack-years (β=−0.026, P=0.0018). Among the 17 SNPs, only rs1685404 in DDB2 showed a significant main effect. The other two SNPs with significant main effects as well as significant interactions with smoking status (CCNH rs10065575 and ERCC6 rs4453140 in Table 2) were not significant in this analysis, suggesting that quantitative measures of smoking provide more information than smoking status. Since we did not have sufficient power to detect SNP-smoking interactions, this analysis was exploratory and no multiple testing correction was applied.
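One way to read the interaction coefficient is as a multiplicative change in the SNP's odds ratio per unit of smoking exposure. The short sketch below assumes that β is the coefficient of a SNP × pack-years product term on the log-odds scale, per one pack-year and per unit of the genotype coding; the source does not spell out these units, and the function name is hypothetical.

```python
import math

def or_ratio_for_packyears(beta_interaction, delta_packyears):
    """Factor by which the SNP's odds ratio is multiplied when smoking
    exposure increases by delta_packyears, given the coefficient of the
    SNP x pack-years product term in a logistic model."""
    return math.exp(beta_interaction * delta_packyears)

beta = -0.026  # reported for MNAT1 rs4151150
for delta in (10, 30):
    factor = or_ratio_for_packyears(beta, delta)
    print(f"+{delta} pack-years: SNP odds ratio multiplied by {factor:.2f}")
```

Under this reading, the genotype-associated odds ratio shrinks by about a quarter for every additional 10 pack-years; the absolute odds ratio at a given exposure also depends on the SNP's main-effect coefficient, which is not given in the text.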
We performed a haplotype analysis for ING2 and DDB2, the two genes with several significant SNPs (Table 5). We calculated the pairwise D′ to measure the linkage disequilibrium between SNPs in each gene and found no strong linkage disequilibrium (D′ range, 0.13–0.51 for ING2 SNPs and 0.18–0.31 for DDB2 SNPs; data not shown). A total of 12 and 9 haplotypes with a frequency of more than 1% were identified for ING2 and DDB2, respectively. For ING2, compared to the most common haplotype H1, three minor haplotypes, H8, H10, and H11, were associated with decreased bladder cancer risk, with ORs of 0.60 (95% CI 0.39–0.93), 0.43 (0.20–0.90), and 0.42 (0.20–0.89), respectively. For DDB2, compared to H1, haplotype H6 was associated with a decreased bladder cancer risk (OR=0.67, 95% CI 0.50–0.91, P=0.009) (Table 5).
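For reference, pairwise D′ can be computed from haplotype and allele frequencies as in the sketch below. It assumes that phased haplotype frequencies are already available (in practice they are estimated from unphased genotypes, and the software used for that step is not named in the text); the frequencies in the example are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute normalized linkage disequilibrium |D'| between two
    biallelic loci. p_ab is the frequency of the haplotype carrying
    allele A at locus 1 and allele B at locus 2; p_a and p_b are the
    corresponding allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max

# Hypothetical frequencies: moderate LD, similar to the reported ranges.
moderate_ld = d_prime(p_ab=0.18, p_a=0.3, p_b=0.4)
print(f"D' = {moderate_ld:.2f}")
```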
CART analysis uses a binary recursive partitioning method to identify subgroups of high-risk subjects and detects higher-order interactions among a large number of variables. Figure 1A depicts the resulting tree structure generated by CART analysis. The initial split was rs1051315 in the RPA1 gene. In subjects with the homozygous major allele-containing genotype of rs1051315, the tree structure was further generated according to the genotype information of CCNH rs10065575, ERCC2 rs1799787, RPA1 rs3744467, ING2 rs6854224, MNAT1 rs1885094, and MNAT1 rs4151330, resulting in different subgroups (terminal nodes), each with a distinct combination of genotypes and a different risk estimate. Figure 1B summarizes the risk estimates for individuals in each terminal node. Compared with individuals in terminal node 1, individuals in terminal node 3 exhibited a significantly increased bladder cancer risk (OR=2.58, 95% CI, 1.56–4.26, P=2.1×10⁻⁴), whereas individuals in terminal node 6 had a significantly reduced risk (OR=0.42, 95% CI, 0.26–0.67, P=4.0×10⁻⁴). We tested the interactions between SNPs identified from this CART analysis. There were significant interactions between CCNH rs10065575 and ING2 rs6854224 (P=0.014), between ING2 rs6854224 and MNAT1 rs4151330 (P=0.017), and between CCNH rs10065575 and MNAT1 rs1885094 (P=0.043). These interactions resulted in terminal nodes 4 to 7. Due to the post-hoc data mining nature of CART analysis, these results are exploratory.
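The core step of recursive partitioning, choosing the locally optimal split at a node, can be sketched as follows. This is a simplified illustration and not the HelixTree algorithm: it scores candidate splits with an unadjusted 2×2 χ² statistic rather than multiplicity-adjusted P values, it handles a single node only, and the SNP names and subject counts are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]] (no correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def best_split(subjects, snps):
    """subjects: list of (genotypes, is_case), where genotypes maps each
    SNP to its variant allele count (0, 1 or 2). Tries two binary splits
    per SNP and returns (chi2, snp, label) for the strongest one."""
    best = None
    for snp in snps:
        for cutoff, label in ((1, "carrier vs non-carrier"),
                              (2, "homozygous variant vs rest")):
            a = b = c = d = 0
            for genotypes, is_case in subjects:
                in_group = genotypes[snp] >= cutoff
                if in_group and is_case:
                    a += 1
                elif in_group:
                    b += 1
                elif is_case:
                    c += 1
                else:
                    d += 1
            stat = chi2_2x2(a, b, c, d)
            if best is None or stat > best[0]:
                best = (stat, snp, label)
    return best

# Hypothetical node with 50 cases and 50 controls.
subjects = (
    [({"snp1": 1, "snp2": 0}, True)] * 30
    + [({"snp1": 0, "snp2": 1}, True)] * 20
    + [({"snp1": 1, "snp2": 1}, False)] * 10
    + [({"snp1": 0, "snp2": 0}, False)] * 40
)
stat, snp, label = best_split(subjects, ["snp1", "snp2"])
print(f"best split: {snp} ({label}), chi2 = {stat:.2f}")
```

A full tree would apply the same search recursively within each resulting subgroup and stop when no candidate split passes the growth threshold, followed by pruning.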
In this study, we assessed the effects of a comprehensive panel of 207 SNPs in 26 genes in the NER pathway on the risk of bladder cancer. ING2 gene was the most noteworthy finding, with four of six evaluated SNPs exhibiting significant associations with bladder cancer risk. Furthermore, we found potential gene-smoking interaction and higher-order interactions among these NER SNPs in the modulation of bladder cancer susceptibility.
Our findings for ING2 are biologically plausible. ING2 is a member of the inhibitor of growth (ING) gene family and encodes a putative tumor suppressor protein involved in the regulation of DNA repair, cell cycle progression, apoptosis, and epigenetic functions in a p53-dependent manner. The ING2 gene was first cloned in 1998 as a homologue of the first ING family member, ING1 26. The ING2 gene is 6 kb in length and located in chromosome region 4q35.1. The implication of ING2 in NER was first established by Wang et al., who found that overexpression of the ING2 gene significantly enhanced the repair of UV-induced DNA damage 27. This function of ING2 is dependent on the normal functions of p53 protein, since small interfering RNA (siRNA)-mediated degradation of either ING2 or p53 abolished the observed repair capacity 27. Furthermore, Wang et al. showed that ING2 protein is not a component of the NER core protein complex; instead, ING2 enhances NER by recruiting XPA to the core complex 27. The function of ING2 in DNA repair is also mediated by the interaction with trimethylated and dimethylated H3K4, which stabilizes the mSin3a-HDAC1 protein complex to enhance the transcriptional activity of many relevant genes 28. In addition to its involvement in NER, ING2 has also been implicated in G1 phase cell cycle arrest through increasing the transcriptional activation ability of p53 29. Furthermore, ING2 interacts with phosphoinositides to activate the p53-dependent apoptosis pathway 30. In our study, all 4 significant ING2 SNPs, which are not in strong linkage disequilibrium, were associated with reduced risks of bladder cancer. Among them, 3 SNPs are located in the 3′ region of the gene and one is located in the 5′ region of ING2, 5.5 kb upstream of the translation start codon. Since the promoter and enhancer sequences of ING2 have not been well characterized experimentally, it remains to be determined whether these SNPs have any functional significance.
It is more likely that they are tagging SNPs rather than the causal variants. Therefore, high-density mapping in combination with functional characterization is warranted to further elucidate the molecular mechanisms underlying the association between ING2 SNPs and bladder cancer risk observed in this study.
Another noteworthy gene we found in this study is DDB2. The DDB2 gene contains 8 exons and spans approximately 24 kb in chromosomal region 11p12-p11 31. As a response protein to DNA damage induced by genotoxic agents such as UV irradiation, DDB2 interacts with CUL4A, RBX1, and COPS2 to form a protein complex that binds to chromatin and initiates the NER process 32. DDB2 enhances the DNA binding activity of DDB1 and serves as a crucial component in the p53-mediated DNA repair process 33, 34. Two SNPs of DDB2 exhibited a significant association with bladder cancer risk. One SNP, rs11039130, located ~7 kb upstream of the transcription start site, was associated with a 1.64-fold increased risk under a recessive model. The other SNP, rs1685404, is located in the 3′ region of the gene and conferred a reduced bladder cancer risk under a dominant model. Several studies have reported that the transcription of the DDB2 gene is tightly regulated by a wide array of transcription factors such as the E2F family, BRCA1, SP1, Myc, and NF1 35–37. However, whether rs1685404 is a direct causative locus or a surrogate remains to be determined through further fine mapping and functional characterization.
We performed stratified analysis by smoking status and tested SNP-smoking (smoking status and pack-years) interactions. Due to the limited power to detect SNP-smoking interactions, this analysis was exploratory and further validation with a larger sample size is needed. Interestingly, the effects of the CCNH and ERCC6 SNPs were stronger in never smokers, whereas the effect of the DDB2 SNP was only evident in ever smokers, and other SNPs showed similar effects in never and ever smokers. This observation is in line with a recent genome-wide association study (GWAS) of bladder cancer in which the NAT2 slow acetylator genotype was associated with an increased bladder cancer risk in ever smokers but not never smokers, but the effect of the GSTM1 null genotype was the strongest in never smokers and grew progressively weaker in former and current smokers 38. Eight other GWAS-confirmed SNPs showed similar effects among never and ever smokers 38. It is intriguing that different genotypes in carcinogen metabolism and DNA repair exhibited differential effects on bladder cancer risk among never and ever smokers. Other exposures may explain such interactions. Occupational exposure is the second major environmental risk factor for bladder cancer. Our previous publication has shown that prolonged exposure to diesel fuel or fumes on a regular basis and exposure to tar/mineral oil, dry cleaning fluids, leather and tanning solutions, rubber products, glues, pesticides, insecticides, or herbicides, fertilizers, arsenic, zinc, radioactive materials, and aromatic amines were all associated with an increased risk of bladder cancer 39. It would be interesting to assess the NER SNPs and bladder cancer risk in the context of these different DNA-damaging exposures. Since only a small percentage of our study population had these occupational exposures, the power to detect significant associations in exposed populations was limited. Future studies are warranted to address this question.
We also conducted exploratory CART analyses to assess potential higher-order gene-gene interactions within the NER pathway genes. The process of tumorigenesis in sporadic cancers is a multifactorial and multistep process that involves complicated interactions of various low-penetrance genetic and environmental components. The CART analysis identified subsets of individuals with different cancer risks based on different combinations of genotypes, with the ORs of individuals in each terminal node ranging from 0.42 to 2.58 (Figure 1). A paired interaction analysis supports some of the SNP-SNP interactions. These data suggest that gene-gene interactions play an important role in bladder cancer etiology. Nevertheless, given that CART analysis is a post-hoc data mining tool that was applied to the same dataset, the results are preliminary and should be interpreted with caution.
The strengths of our study include a large and homogeneous study population, a comprehensive panel of genes in the relatively well-characterized NER pathway, and the use of an htSNP-based genotyping approach. The limitation of this study is that although we applied the BFDP approach, one of several available statistical methods, to control for multiple testing, it is possible that some of our reported SNPs are false-positive findings. The main reason for our choice of BFDP is that the noteworthiness threshold defined in the BFDP approach accounts for the costs of both false discovery and non-discovery. Alternative approaches to correct for multiple testing, such as the Bonferroni correction for independent tests and the P(ACT) method to compute P values adjusted for correlated tests 40, do not consider the cost of non-discovery and are more conservative. Regardless of the method used for correcting multiple testing, the ultimate way to eliminate false-positive findings is through independent validation. We reported both the significant SNPs after multiple testing correction by the BFDP method and the nominally significant SNPs in the main effect and SNP-smoking interaction analyses. External validations in independent epidemiology studies with adequate sample sizes are warranted to confirm the results of our study.
Funding sources: This study was supported by National Cancer Institute grants CA131335, CA74880, CA91846, and CA127615.
Financial disclosures: The authors have no financial disclosures related to the content of this article.