N.R., M.G-C., N.C., J.F., D.T.S., and S.J.C. organized and designed the study.
S.J.C., K.B.J., A.H., Z.W., Y-P.F., .L.P-O., L.B., X.W., M.A.T.H., M.C., D.V.D.B., S.G., S.P., R.R.M., I.D.V., T.R., D.T.B., G.C-T., J.G.H., R.K., S.C.E.B., and A.G. conducted and supervised genotyping of samples.
M.G-C., N.C., N.R., K.B.J., M.Y., N.M., D.T.S., and S.J.C. contributed to the design and execution of statistical analysis.
M.G-C., N.R., N.C., N.M., J.F., F.X.R., J.F.F., D.T.S., and S.J.C. wrote the first draft of the manuscript.
N.R., M.G-C., N.M., X.W., J.F., D.V.D.B., F.X.R., G.M., D.B., M.T., L.A.K., P.V., I.D.V., D.A., M.P.P., T.R., M.A.T.H., A.E.K., O.C., K.G., R.K., J.A.T., J.I.M., M.K., A.T., C.S., A.C., R.G-C., J.L., A.J., M.S., M.R.K., A.S., G.A., R.G., A.B., E.J.J., W.R.D., S.M.G., S.J.W., J.V., V.K.C., M.G-D., M.C.P., M.C.S., J.Y., D.H., M.M., C.P.D., B.C., M.C., H.Y., S.H.V., K.K.A., J.A.W., R.R.M., P.S., S.B., K.S., E.R., P.B., S.P., C.N., N.E.A., H.B.B., D.T., N.C., M.T.L., F.C., B.L., A.T., F.C-C., D.T.B., M.T.W.T., M.A.K., S.G., S.P., F.R., C.S., A.A., G.C-T., S.S., J.G.H., H.D., T.F., P.R., E.G., K.K., S.C.E.B., A.G., Z.X., J.I.S-V., M.D.G-P., M.S., G.V., S.P., S.B., R.N.H., J.F.F., D.T.S., and S.J.C. conducted the epidemiologic studies and contributed samples to the bladder cancer GWAS and/or replication.
All authors contributed to the writing of the manuscript.
We conducted a multi-stage, genome-wide association study (GWAS) of bladder cancer with a primary scan of 589,299 single nucleotide polymorphisms (SNPs) in 3,532 cases and 5,120 controls of European descent (5 studies) followed by a replication strategy, which included 8,381 cases and 48,275 controls (16 studies). In a combined analysis, we identified three new regions associated with bladder cancer on chromosomes 22q13.1, 19q12 and 2q37.1: rs1014971 (P=8×10−12) maps to a non-genic region of chromosome 22q13.1; rs8102137 (P=2×10−11) on 19q12 maps to CCNE1; and rs11892031 (P=1×10−7) maps to the UGT1A cluster on 2q37.1. We confirmed four previous GWAS associations on chromosomes 3q28, 4p16.3, 8q24.21 and 8q24.3, validated previous candidate associations for the GSTM1 deletion (P=4×10−11) and a tag SNP for NAT2 acetylation status (P=4×10−11), and demonstrated smoking interactions with both regions. Our findings on common variants associated with bladder cancer risk should provide new insights into mechanisms of carcinogenesis.
Bladder cancer is the fourth most common incident cancer in men1 and its frequent recurrence requires regular screening and interventions. Cigarette smoking and occupational exposure to aromatic amines have been strongly linked to bladder cancer risk.1 A family history of bladder cancer is associated with an approximately two-fold increase in risk; however, multiple-cancer families are rare and no high-penetrance genes have been identified to date2-4. Large meta-analyses of candidate gene studies have provided support for associations between NAT2 slow acetylation phenotype5 (defined by NAT2 haplotypes) and a common gene deletion of GSTM16 with bladder cancer risk7,8. Further, gene-environment interactions have been shown for smoking and NAT2 acetylation, with an increased risk in slow acetylators, apparent only among cigarette smokers7,8.
Previous genome-wide association studies (GWAS) in bladder cancer have identified common variants in four genomic regions on chromosomes 3q289 (TP63), 4p16.3 (TMEM129, TACC3-FGFR3)10, 8q24.219, and 8q24.311 (PSCA) that are associated with risk. Interestingly, the variants on 8q24.21 map to a region centromeric to MYC that has been identified in GWAS of breast, colorectal and prostate cancers, as well as chronic lymphocytic leukemia12-18. Also, in follow-up analyses, an association with bladder cancer risk has been suggested for variants near the TERT-CLPTM1L locus on chromosome 5p15.33, which has also been associated by GWAS with risk for basal cell carcinoma, cutaneous melanoma, lung, brain and pancreatic cancers19-23. However, the previously reported association with bladder cancer did not achieve genome-wide significance.
We conducted a multi-stage GWAS involving 3,532 cases and 5,120 controls of self-described European descent in stage I, and followed up the most notable signals in two stages of replication (stages IIa/b and III) totaling 8,381 cases and 48,275 controls (Figure 1 and Online Methods). Individuals with scan data in stage I were participants in two case-control studies carried out in Spain and the USA (Maine and Vermont component of the New England Bladder Cancer Study) and three prospective cohort studies in the USA and Finland (see Supplementary Table 1 online for details). Replication analyses in stage II were carried out using existing scan data from two earlier studies. First, we evaluated the 100 most significant SNPs (excluding previously reported loci and SNPs with pairwise r2>0.8) in 969 cases and 957 controls from the Texas Bladder Cancer study in the USA (stage IIa)11. Five of these SNPs were further evaluated in a second scan of 1,274 cases and 1,832 controls in The Netherlands (stage IIb)9. Three of the five SNPs were included or tagged at a pair-wise r2>0.8 in the Dutch scan, and risk associations were confirmed for all three. In stage III, the three SNPs plus a tagging SNP for the NAT2 acetylation status were evaluated in 6,141 cases and 45,486 controls from 11 case-control and 3 prospective cohort studies in the USA and Europe (see Figure 1 and Supplementary Table 1).
After quality control analysis of genotypes, we combined the data sets in stage I resulting in 589,299 SNPs available for analysis (based on the common SNPs called from both the Illumina Human1M and Human 610-Quad) in 3,532 cases and 5,120 controls (Online Methods). A logistic regression model was fit for genotype trend effects (1 d.f.) adjusted for study center, age, sex, smoking status (current, former or never) and DNA source (blood/buccal). The quantile-quantile (Q-Q) plot showed little evidence for inflation of the test statistics as compared to the expected distribution (corrected λ1000 subjects=1.021), which minimizes the likelihood of substantial hidden population substructure or differential genotype calling between cases and controls24 (Online Methods and Supplementary Figure 1). A Manhattan plot displays the results of the combined GWAS in stage I (Supplementary Figure 2).
Data from the first stage confirm the associations reported with tag SNPs in the four previously identified genomic regions on chromosomes 3q28 (rs710521)9, 8q24.21 (rs9642880)9, 8q24.3 (rs2294008)11 and 4p16.3 (rs798766)10 as well as a suggested region in 5p15.33 (rs401681; a neighboring SNP, rs2736098, was also reported but data were not available in our study)19 (Table 1 and Supplementary Figure 3). Consistent with prior reports9,10, rs9642880 on 8q24.21 and rs798766 on 4p16.3 were most strongly associated with tumors of low grade/low risk of progression (Supplementary Table 2). A stronger association with low grade/low risk disease was also suggested for rs401681 on 5p15.33 (Supplementary Table 2). In addition, we used a copy number variation TaqMan assay7 to assess the presence of GSTM1 on 1p13.3 to genotype stage I samples, and confirmed an association with increased bladder cancer risk (Table 1).
In a combined analysis based on case/control counts by genotype and study, we estimated odds ratios (ORs) using logistic regression analyses adjusted for study center. Meta-analyses of estimated ORs adjusted for age, sex, smoking status and DNA source produced comparable point estimates (Supplementary Table 3). Our combined analysis of stages I, II and III identified three novel genomic regions on chromosomes 22q13.1, 19q12 and 2q37.1 that were associated with bladder cancer risk below the threshold for genome-wide significance (P<5 × 10−7)25 (Table 2 and Supplementary Figure 4 for study and stage specific estimates, Figure 2). We also confirmed a signal below genome-wide significance for rs1495741, which tags the NAT2 acetylator status26 previously reported as a bladder cancer susceptibility locus on 8p227,8. This SNP is located approximately 10 kb from the 3′ end of NAT2.
The locus on chromosome 22q13.1, rs1014971 (Ptrend=8.4×10−12; OR per C allele =0.88, 95%CI 0.85-0.91), was primarily associated with high-risk tumors (Supplementary Table 2). The locus is located in a non-genic region, approximately 25 kb centromeric of the apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3A gene (APOBEC3A) and 64 kb telomeric of the chromobox homolog 6 gene (CBX6). APOBEC3A belongs to the cytidine deaminase gene family, which can play a role in the initiation of tumorigenesis by deamination of cytosine (C) to uracil (U)27. CBX6 is a component of the chromatin-associated polycomb complex involved in transcriptional repression.
In the combined analysis, we observed an association with rs8102137 on chromosome 19q12 (Ptrend=1.7×10−11; OR per C allele =1.13, 95%CI 1.09-1.17), which maps to the cyclin E1 gene (CCNE1). CCNE1 is a key member of the cyclin/cyclin-dependent kinase (Cdk)/retinoblastoma protein (pRB) pathway which determines the rates of cell cycle transition from G1 to S phase, and is commonly altered in bladder cancer and other tumors28. Cyclin E1 expression in bladder cancer has been associated with high grade or muscle invasive tumors and poor clinical outcome29. Consistently, rs8102137 was most strongly associated with risk of high grade/high risk tumors (Supplementary Table 2).
A third locus is marked by rs11892031 (P=1.0×10−7; OR per C allele =0.84, 95%CI 0.79-0.89) on chromosome 2q37.1 and resides in an intronic region of the UDP-glucuronosyltransferase (UGT) 1A gene locus, which encodes the UGT1A family of proteins. Glucuronidation by UGTs facilitates solubility and removal of substrates such as endo- and xenobiotics (including carcinogens in tobacco smoke) via bile or urine30. Genetic variation in UGT1A has been associated with predisposition to severe gastrointestinal toxicity of the anticancer drug irinotecan31. The UGT1A locus is represented by at least nine highly homologous transcripts, collectively known as UGTs, generated by alternative splicing. Tissue-specific loss or decreased expression of UGTs has been associated with several gastrointestinal cancers and bladder cancer32-34, as well as experimentally induced bladder cancer in animal models35.
Previously, a promising signal in the CLPTM1L-TERT locus on chromosome 5p15.33 was reported in a region in which common variants have been associated with multiple cancers in recent GWAS19-23. In addition, rare mutations in TERT have been linked to dyskeratosis congenita (a bone marrow failure syndrome), idiopathic pulmonary fibrosis, acute myelogenous leukemia and chronic lymphocytic leukemia36-39. In the first stage of this GWAS, we observed a moderately significant effect for rs401681 (P= 2.9 × 10−3), which was at genome-wide significance when combined with the Rafnar et al. data (P = 5.0 × 10−7; OR per C allele 1.11, 95% CI 1.07-1.16) (Table 1, Supplementary Figure 3).
The risk associated with GSTM1 and NAT2 varied in strength across categories of cigarette smoking, whereas genotype risk associations by smoking categories were of similar magnitude for the eight susceptibility loci identified by GWAS (Supplementary Table 4). In a combined analysis, the risk association with GSTM1 deletion was strongest in never smokers (OR=1.75, 95%CI=1.44-2.13), and progressively weaker in former (OR=1.55, 95%CI=1.35-1.78) and current smokers (OR=1.25, 95%CI=1.07-1.46; Pinteraction = 0.008 for current vs. never smokers; Table 3). The stronger association of the GSTM1 deletion among non-smokers is a novel observation that was not evident in previous case-only meta-analyses7. rs1495741, located near the 3′ end of NAT2, is a marker of the NAT2 phenotype associated with bladder cancer risk26. The rs1495741 GG genotype marking the slow acetylation phenotype, compared to the combined AG/AA genotypes corresponding to the intermediate/rapid acetylation phenotypes, showed a highly significant (P=5.5×10−7) association with increased bladder cancer risk that was limited to cigarette smokers (OR=1.24, 95% CI=1.16-1.32, P=4.3×10−11; Pinteraction=6.3×10−5) (Supplementary Figure 5 and Supplementary Table 3). This interaction is consistent with the role of NAT2 in the detoxification of bladder carcinogens such as aromatic amines from tobacco smoke.
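As a rough plausibility check on the GSTM1 interaction described above, the difference between two stratum-specific odds ratios can be tested on the log scale, with standard errors recovered from the published 95% confidence intervals. The sketch below is an approximation under stated assumptions (independent strata, Wald-type intervals, normality of the log OR). It is not the grouped-data logistic regression with interaction terms used in this study, and the function names are illustrative only.

```python
import math

def se_from_ci(lower, upper, z=1.959964):
    """Standard error of log(OR), recovered from a Wald-type 95% CI (assumption)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

def or_difference_p(or_a, ci_a, or_b, ci_b):
    """Two-sided P value for log(OR_a) - log(OR_b), assuming independent strata."""
    diff = math.log(or_a) - math.log(or_b)
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))

# GSTM1 deletion: never smokers vs. current smokers (ORs and CIs from the text)
p = or_difference_p(1.75, (1.44, 2.13), 1.25, (1.07, 1.46))
print(f"approximate P for heterogeneity = {p:.4f}")
```

With the reported estimates, this approximation gives a P value of roughly 0.008, in line with the Pinteraction reported for current vs. never smokers.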
Our three-stage study had adequate power to detect variants of moderate effect sizes over a range of common allele frequencies. For the newly discovered SNP markers, the power to detect the observed associations at a level of genome-wide significance was 54%, 30%, 30% and 6% for rs1014971, rs1495741, rs8102137 and rs11892031, respectively. In light of the limited power to discover SNPs with modest effect sizes, additional loci with similar effect sizes will likely be identified with larger scale GWAS. Based on a recent estimator40 that incorporates novel and previously reported loci together, we estimate that approximately two dozen additional bladder cancer susceptibility SNP markers of similar magnitude and frequencies might be discovered. Future studies should be powered with adequate sample size to detect additional variants.
With the exception of the GSTM1 deletion, relative risk estimates for novel loci are based on associations using tag SNPs, which most likely underestimate the association with biologically important alleles. Accordingly, further studies are needed to define the functional variants and the clinical utility of risk models that combine genetic markers with epidemiologic risk factors for bladder cancer (i.e. smoking, occupational and environmental exposures, family history). Our combined analysis of 12,254 individuals with bladder cancer and 53,395 controls has uncovered three new genomic regions associated with bladder cancer risk. Fine-mapping studies of these three regions are needed to identify candidate variants for functional studies that should shed light into biological mechanisms for the associations reported through GWAS. This knowledge could establish the foundation for developing improved preventive, diagnostic and/or therapeutic approaches.
The bladder cancer GWAS was supported by the intramural research program of the National Institutes of Health, National Cancer Institute.
This project has been funded in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
Participants were drawn from 21 studies (Supplementary Table 1). For stage I, cases were defined as histologically confirmed primary carcinoma of the urinary bladder including carcinoma in situ (ICD-O-2 topography codes C67.0-C67.9 or ICD9 codes 188.1-188.9). Each participating study obtained informed consent from study participants and approval from its Institutional Review Board (IRB) for this study. For stage I only, participating studies obtained institutional certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS).
For stage I, genome-wide genotyping was conducted using three chips: SBCS (HumanHap 1 Million); NEBC-ME/VT (Human Hap 610-Quad); ATBC, CPS-II and PLCO cases (Human Hap 610); and controls from CGEMS/GEI for PLCO (Human Hap 550-r equivalent). DNA samples were selected for genotyping based on pre-genotyping quality control measures performed for GWAS at the Core Genotyping Facility of the NCI. A total of 4,089 blood samples and 2,813 buccal samples were analyzed. Repeat genotyping was performed on 38 blood samples (19 cases and 19 controls) and 10 buccal samples (2 cases and 8 controls) on Illumina 1M chips after suitable metrics identified performance issues. Cancer-free controls (N=2,003) were previously scanned in CGEMS18 and a lung cancer GWAS21.
Genotype clusters were estimated with samples by study with preliminary completion rates greater than 98% per individual study (namely SBCS, NEBC-ME/VT, PLCO, ATBC and CPS-II). Genotypes for the analytical build were based on study specific clustering. SNP assays with locus call rates lower than 90% were excluded.
SNPs with extreme departures from Hardy-Weinberg proportions (P<1×10−7) were excluded from the association analysis because of the increased likelihood of spurious associations arising from problematic assays or genotype calling.42 Additional participants were excluded based on: 1) completion rates lower than 94-96% (n=203 samples); 2) heterozygosity of less than 22% or greater than 35% (n=12); 3) inter-study unexpected duplicates (n=5); 4) phenotype exclusions (due to ineligibility or incomplete information) (n=94).
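The two filters described above can be illustrated with a minimal sketch. It assumes a 1-d.f. chi-square goodness-of-fit test for Hardy-Weinberg proportions; the text does not state which test was used (an exact test is a common alternative), nor which samples were tested. The function names are hypothetical, and the thresholds are those given in the text.

```python
import math

def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P value of a 1-d.f. chi-square test for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_snp_qc(n_aa, n_ab, n_bb, hwe_threshold=1e-7):
    """Keep a SNP unless its HWE P value falls below the threshold."""
    return hwe_chisq_p(n_aa, n_ab, n_bb) >= hwe_threshold

def passes_heterozygosity(het_rate, low=0.22, high=0.35):
    """Keep a sample whose heterozygosity lies within [low, high]."""
    return low <= het_rate <= high

# Hypothetical genotype counts: the first SNP is in exact HWE, the second has no heterozygotes
print(passes_snp_qc(250, 500, 250), passes_snp_qc(500, 0, 500))
```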
Assessment of population structure of study participants was performed with STRUCTURE43 by seeding the analysis with founder genotypes from three HapMap populations (Phase I and II build 26).44 A set of 12,898 SNPs with extremely low pair-wise correlation (r2<0.004) was selected for this analysis.45-47 A total of 55 participants (43 cases and 12 controls) were estimated to have less than 85% HapMap CEU admixture (Supplementary Figure 6). Principal component analysis (PCA) of scanned subjects (excluding inferred sib and half-sib pairs) was performed with GLU (a similar procedure to EIGENSTRAT)45,46 and did not reveal notable eigenvectors. Consequently, a study-specific indicator was used for the stage I analysis46.
We estimated the inflation of the test statistic, λ, adjusted to a sample size of 1,000 cases and 1,000 controls as per the method of de Bakker et al.48:

λ(corrected) = 1 + (λ − 1) × [1/n_case + 1/n_cont] / [2 × 10^−3]

where n_case and n_cont are the numbers of cases and controls. The corrected estimate, λ1000, is 1.021, while the uncorrected λ is 1.086 (Supplementary Figure 1).
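The rescaling can be checked numerically with the stage I sample sizes. The sketch below is illustrative: `lambda_1000` implements the correction formula given above, whereas `genomic_inflation` uses the conventional definition of λ as the median observed 1-d.f. chi-square statistic divided by its expected median, which is an assumption because the text does not spell out how the uncorrected λ was computed. Both function names are hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 d.f.

def genomic_inflation(chi2_stats):
    """Conventional lambda: median observed statistic / expected median (assumption)."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1,000 cases and 1,000 controls."""
    return 1.0 + (lam - 1.0) * (1.0 / n_cases + 1.0 / n_controls) / (2.0 * 1e-3)

# Stage I: uncorrected lambda of 1.086 with 3,532 cases and 5,120 controls
corrected = lambda_1000(1.086, 3532, 5120)
print(round(corrected, 3))  # 1.021
```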
Twenty participant pairs were identified as potential relatives based on genotype sharing in excess of theoretical expectations. A set of 4,546 SNPs was selected (with completion rates >95%, MAF>0.3 and r2<0.01 in the three HapMap populations) and used to run PREST49 to formally test for cryptic relatedness. Nineteen unexpected full-sib pairs and 1 parent-child pair were identified and excluded from PCA (but included in the association analysis). A total of 243 expected duplicates (including 6 triplicates in ATBC) were evaluated and yielded a concordance rate of 99.99%.
The final participant count for stage I analysis was 3,532 cases and 5,120 controls (Supplementary Table 1). The number of SNPs available for association analysis in all studies but SBCS was 589,299. In the SBCS, genotyped with the Infinium HumanHap 1 M chip, after quality control metrics were applied, 1,002,634 SNPs were available and 571,643 overlapped exactly with the 610Quad/550k data.
TaqMan custom genotyping assays (ABI, Foster City, CA) were designed and optimized for 4 SNPs, including the tag SNP for NAT2. In an analysis of 1,107 samples from three studies, the comparison of the Illumina calls with the TaqMan assays showed an average concordance rate of 99.4% (range 99.2-99.8%); no shifts from wild type to homozygotes were observed. The Illumina Infinium cluster plots for the four novel associations, rs1014971, rs8102137, rs11892031 and rs1495741 are shown in Supplementary Figure 7.
Association analyses for stage I were conducted using logistic regression, adjusted for age (in five-year categories), sex, smoking (current, former or never), DNA source (buccal/blood) and study. Each SNP genotype was coded as a count of minor alleles, with the exception of X-linked SNPs among men that were coded as 2 if the participant carried the minor allele and 0 if he carried the major allele.50 A score test with one degree of freedom was performed on all genetic parameters in each model to determine statistical significance. We assessed heterogeneity in genetic effects across studies using the I2 statistic. For the inclusion of stage II and III data, we used genotype counts by case-control status and study, and conducted a fixed effects meta-analysis. We also conducted a meta-analysis based on estimates of allelic odds ratio adjusted by age, sex, smoking status, DNA source and study; the estimates did not materially differ from the fixed-effects meta-analysis (Supplementary Table 3).
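A fixed-effects meta-analysis and the I2 heterogeneity statistic can be sketched as follows. The sketch assumes the standard inverse-variance weighting of per-study log odds ratios and the usual definition of I2 from Cochran's Q; the text does not specify the exact computation used, so this is one common way to do it rather than the study's own code. The function name and the input values are hypothetical.

```python
import math

def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance fixed-effects pooling of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in std_errs]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q and I2 = max(0, (Q - df) / Q), expressed as a percentage
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if (df > 0 and q > 0) else 0.0
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return {"log_or": pooled, "se": pooled_se, "or": math.exp(pooled),
            "q": q, "i2": i2, "p": p_value}

# Hypothetical per-study estimates (log OR, standard error), for illustration only
result = fixed_effect_meta([0.10, 0.14, 0.12], [0.05, 0.06, 0.04])
print(f"pooled OR = {result['or']:.3f}, I2 = {result['i2']:.1f}%")
```

An I2 near 0% indicates that the between-study variation in effect estimates is no larger than expected from sampling error, which is the setting in which a fixed-effects summary is usually considered appropriate.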
Polytomous logistic regression was used to obtain estimates of effect for different tumor subtypes. Case-only analyses with tumor type as an outcome were used to test for differences in effect size across subtypes. Models for tumor grade constrained the effect size to increase linearly across levels. Genotype-smoking interactions were assessed using logistic regression for grouped data adjusted by study and including interaction terms. Forest plots by smoking, including summary estimates from fixed effects meta-analyses, are also shown for rs1495741.
Data analysis and management were performed with GLU (Genotyping Library and Utilities version 1.0), a suite of tools available as an open-source application for management, storage and analysis of GWAS data, and STATA.
SequenceLDhot51, which uses an approximate marginal likelihood method52, was used to compute likelihood ratio (LR) statistics for a set of putative hotspots across the region of interest. We sequentially analyzed subsets of 100 controls of European descent (by pooling 5 controls from each study). We used PHASE v2.1 to infer the haplotypes as well as background recombination rates. The analysis was repeated with five non-overlapping sets of 100 pooled controls.
The CGEMS data portal provides access to individual level data for investigators from certified scientific institutions after approval of their submitted Data Access Request.
CGEMS portal: http://cgems.cancer.gov/
On behalf of all the authors, MG-C declares no competing financial interests.
Please see Supplementary Note for information on support for individual studies that participated in the effort.