The overall case group had a higher percentage of men than the control group, and a larger fraction were current or former smokers (table ). The frequency of the minor allele in the controls by study (NH/Italy) was APE1-148 0.47/0.42, XRCC1-399 [...], XRCC1-194 0.07/0.08, XRCC3-241 0.38/0.38, XPD-751 0.36/0.42 (table ). While we observed slight differences in frequency for the XPD-751 and APE1-148 polymorphisms between the two studies, the frequency of the other polymorphisms was similar. Chi-square tests of Hardy-Weinberg equilibrium among controls gave the following p values: APE1-148 p = 0.69, XRCC1-399 p = 0.01, XRCC1-194 p = 0.09, XRCC3-241 p = 0.34, XPD-751 p = 0.08. Deviations from the expected genotype frequency distribution for XRCC1 have been observed previously in other study populations [43
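The Hardy-Weinberg check reported above can be illustrated with a minimal sketch. It assumes a biallelic SNP, genotype counts among controls as the only input, and a one-degree-of-freedom chi-square goodness-of-fit test; the function name and argument names are illustrative and are not taken from the study's analysis code.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Assumes both alleles are present, so no expected count is zero.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A small p value, such as the one reported for XRCC1-399, indicates that the observed genotype counts depart from the proportions expected from the allele frequencies.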
Selected characteristics of bladder cancer cases and controls by study
Odds ratios (95%CI) for bladder cancer in relation to DNA repair gene polymorphisms overall and by smoking status
Results of our logistic regression analysis of the single genotype effects for the pooled dataset are shown in table , overall and then stratified by smoking status. There was no significant heterogeneity between studies (table ). Importantly, none of the coefficients from the logistic regression models with study-specific slopes differed by more than 10% from those of the model adjusted for age, gender, smoking status and study location. Therefore, our final analysis was based on the more parsimonious models.
The base excision repair (BER) pathway polymorphism APE1-148 was unrelated to bladder cancer risk overall (table ) or in either study (NH 1.0 (95% CI 0.7–1.3), Italy 0.9 (95% CI 0.5–1.7)). The odds ratio for XRCC1-399 variants was slightly below one (table , NH 0.9 (95% CI 0.6–1.2), Italy 0.8 (95% CI 0.5–1.3)). XRCC1-194 variants were rare, with an overall OR slightly below one (table ; strata were too small to compute risks for NH and Italy separately). In the double strand break (DSB) repair pathway, individuals carrying the XRCC3-241 variant had a slightly elevated risk of bladder cancer overall (table ; NH 1.1 (95% CI 0.8–1.5), Italy 1.7 (95% CI 1.0–2.7)), and the risk was highest among XRCC3-241 variant homozygous current smokers (table ). We did not observe a clear association with the nucleotide excision repair (NER) pathway polymorphism XPD-751 (table , NH 1.2 (95% CI 0.9–1.7), Italy 1.1 (95% CI 0.7–1.9)). We did not detect any statistically significant interactions between smoking and any of the genotypes. Further, odds ratios did not differ markedly by gender, and the odds ratios for analyses restricted to males were similar to those performed on the entire cohort (data not shown).
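For readers less familiar with the notation "OR (95% CI)", the sketch below computes a crude odds ratio and a Woolf (log-based) confidence interval from a 2×2 table. This is an unadjusted simplification: the odds ratios reported here come from logistic regression adjusted for age, gender, smoking status and study location, which this example does not reproduce. The function name and the cell labels are assumptions made for illustration.

```python
import math

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval.

    a: variant-genotype cases      b: reference-genotype cases
    c: variant-genotype controls   d: reference-genotype controls
    z = 1.96 gives an approximate 95% interval.
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that includes 1.0, as for most of the single-genotype estimates above, is compatible with no association at the corresponding significance level.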
To evaluate the large number of possible combinations of genotypes, we used MDR, hierarchical interaction graphs, CART and logic regression approaches. We then used traditional logistic regression to evaluate the interactions between genotypes that were predicted by at least three of these four methods (table ). The interaction predicted by all four methods was reaffirmed by logistic regression (increased risk with XRCC1-399 GG/XRCC3 TT vs. XRCC1-399 GG/XRCC3 CC, adjusted OR 1.9 (95% CI 1.3–2.9), p = 0.001).
Interactions between genotypes by logistic regression by smoking status
In addition to assessing the concordance between the models, we also examined the complementary information provided by each analytic method to detect gene-gene interactions. MDR interaction modeling (table ) identified XPD-751, XRCC1-399 and XRCC3-241 as the combination of SNPs that most accurately predicts bladder cancer status (average prediction error 45%, CVC 8/10, permutation test p = 0.003). Table also indicates that the single most important predictor of bladder cancer risk is smoking status (average prediction error 44%, CVC 10/10, permutation test p = 0.001). Likewise, the strongest two-way interaction shown in the hierarchical interaction graph (fig. A, green arrows) was between XRCC3-241 and XRCC1-399. This interaction remains strong when smoking is included in the model (fig. B). XRCC3 was the most important single gene in the models (fig. A, B). Likewise, the classification tree shown in figure selected XRCC3 for the initial binary split (fig. A SNPs only, fig. B SNPs in current smokers). In figure A, within the XRCC3-CC/CT group, daughter nodes predict increased risk among individuals who are XRCC1-399 GA/AA and XRCC1-194 CC (figure , nodes 32, 35). From the XRCC3-TT branch, XRCC1-399 GA/AA (node 4) or a combination of XRCC1-399 GG and APE1-148 TG/GG is associated with increased risk (nodes 5, 20). As observed previously, the initial split was on smoking status (current smokers vs. former/never smokers). Focusing on current smokers (fig. B), the model with the least misclassification (0%) includes XRCC3-TT, XRCC1-399 GG, and XRCC1-194 CT/TT. We also examined gene-gene interactions in this dataset using logic regression (fig. ). The optimal model predicted two independent sets of interactions: between XRCC1-399 and XPD-751 (tree 1), and between XRCC3 and either one of the two XRCC1 SNPs, 194 or 399 (tree 2).
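The entropy-based quantities behind a hierarchical interaction graph can be sketched as follows. The example assumes the common information-theoretic definitions: the information gain of a variable about case status is the mutual information I(X;Y), and the interaction between two SNPs A and B is the gain of their Cartesian product beyond the two single-SNP gains, I(A,B;Y) − I(A;Y) − I(B;Y). Function names are illustrative, and the values are in bits rather than the percentage of entropy removed shown in the figure.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def interaction_gain(snp_a, snp_b, status):
    """Information about status gained from the joint genotype
    beyond what each SNP provides on its own."""
    joint = list(zip(snp_a, snp_b))  # pairwise Cartesian product of genotypes
    return (mutual_information(joint, status)
            - mutual_information(snp_a, status)
            - mutual_information(snp_b, status))
```

A positive value suggests synergy between the two SNPs (the pair is more informative than the sum of its parts), whereas a negative value suggests redundancy.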
Multifactor dimensionality reduction (MDR) interaction model
Fig. 1 Hierarchical interaction graph of genotypes. The percentage of entropy removed (i.e. information gain) by each variable is visualized for each node (box). The percentage of entropy removed for each pairwise Cartesian product of variables is visualized ...
Fig. 2 Classification and regression tree (CART) model of genotypes. Splitting rules are used to stratify data into subsets of individuals, which are represented in the CART decision tree as nodes. Each ‘child node’ is selected considering only ...
Fig. 3 Logic Regression model of genotype interactions. The algorithm constructs predictors from binary SNP data that are Boolean (logical) combinations of the original genotype data. Logic expressions are depicted as trees with AND/OR operators at each branch ...
Three methods (MDR, hierarchical interaction graph, logic regression) predicted an interaction between XRCC1-399 and XPD-751 (table ). Relative to individuals with XRCC1-399 GG and XPD-751 AA genotypes, those with at least one variant allele for either XRCC1-399 or XPD-751 had increased bladder cancer risk (e.g. XRCC1-399 GA, XPD-751 AA OR 1.5 (95% CI 1.1–2.1)). In the logistic regression analysis, the interaction p value for heterozygotes/variants compared with wild-type was statistically significant (p = 0.008).
We further evaluated potential gene-environment interactions by stratifying our logistic regression analysis of the genotype combinations that were selected in our initial screen by smoking status (table ). Bladder cancer risk was particularly elevated in the current smokers with XRCC1-399 GG/XRCC3 TT genotypes versus XRCC1-399 GG/XRCC3 CC (adjusted OR 4.8 (95% CI 1.9–12.1)). When age, gender and smoking history were added to the initial predictive models with all genotypes, the four analytic methods consistently selected smoking status, followed by male gender and age above 50 years, as most highly predictive for bladder cancer risk (data not shown). The strongest four-factor MDR model without smoking included the polymorphisms XPD-751, XRCC1-399, XRCC3-241, and APE1-148 (average prediction error 46.54%, cross-validation consistency 10/10). The best gene-only model was the two-locus model with XRCC1-399 GA and XRCC3-241 TT as the high-risk genotype combination (average prediction error 47%, cross-validation consistency 9/10).
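The core step of MDR, pooling multilocus genotype cells into high-risk and low-risk groups, can be sketched as below. This is a deliberately reduced illustration: it labels a cell high risk when its case:control ratio exceeds the overall ratio in the data (an assumed default threshold) and reports the plain training misclassification rate. It omits the 10-fold cross-validation and permutation testing that underlie the average prediction errors and cross-validation consistency reported above. All names are hypothetical.

```python
from collections import defaultdict

def mdr_high_risk_cells(genotypes, status, threshold=None):
    """Return the set of multilocus genotype cells labeled high risk.

    genotypes: list of tuples, one tuple of genotypes per individual
    status:    list of 0 (control) / 1 (case), same length
    threshold: case:control ratio above which a cell is high risk;
               defaults to the overall ratio (assumes >= 1 control)
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if threshold is None:
        threshold = n_cases / n_controls
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for cell, s in zip(genotypes, status):
        counts[cell][s] += 1
    return {cell for cell, (ctrl, case) in counts.items()
            if case > threshold * ctrl}

def mdr_training_error(high_risk, genotypes, status):
    """Fraction of individuals misclassified when high-risk cells
    are predicted to be cases and all other cells controls."""
    wrong = sum((cell in high_risk) != bool(s)
                for cell, s in zip(genotypes, status))
    return wrong / len(status)
```

In a full MDR analysis, the cell labels would be learned on the training folds and the error evaluated on the held-out fold, repeated over all folds and over every candidate combination of loci.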
False Positive Report Probability (FPRP)
Table reports the FPRP values calculated using the statistical power to detect an OR of 1.2, 2.0 and 3.0 with an α level equal to the observed p value. The results indicate good reliability for a three-locus gene-only model (XRCC1-399-GG + XRCC3-241-TT + APE1-148-TG/GG vs. the remaining ‘low-risk’ genotypes) in the overall population, even with very low prior probabilities (0.0001) for OR = 2 or 3. Among all the significant two-locus models, the comparison XRCC1-399-GG + XRCC3-241-TT vs. XRCC1-399-GG + XRCC3-241-CC/TT remains noteworthy at a prior probability of 0.01 (OR = 2 or 3), as does the four-locus model involving the XPD, XRCC1, XRCC3 and APE1 genes. On the other hand, other two-locus models (XRCC1-399-GA + XRCC3-241-TT vs. ‘low-risk’ genotypes; XRCC1-399-GG + XRCC3-241-TT vs. XRCC1-399-GG + XRCC3-241-CC/TT) require higher prior probabilities (0.1 for OR = 2 or 3).
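The FPRP calculation can be written compactly, assuming the standard definition FPRP = α(1 − π) / (α(1 − π) + (1 − β)π), where π is the prior probability of a true association, α is set to the observed p value, and 1 − β is the power to detect the specified OR at that α. The sketch takes the power as a given input rather than deriving it from the study data; the function and parameter names are illustrative.

```python
def fprp(alpha, power, prior):
    """False positive report probability.

    alpha: significance level (here, the observed p value)
    power: statistical power (1 - beta) to detect the assumed OR at alpha
    prior: prior probability that the association is real
    """
    false_positive = alpha * (1.0 - prior)  # P(significant and no association)
    true_positive = power * prior           # P(significant and true association)
    return false_positive / (false_positive + true_positive)
```

For a fixed p value and power, lowering the prior probability raises the FPRP, which is why a finding that stays below a chosen FPRP cutoff even at a prior of 0.0001 is considered more robust than one that requires a prior of 0.1.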
False positive report probabilities