A recent genome-wide association study of bladder cancer identified the UGT1A gene cluster on chromosome 2q37.1 as a novel susceptibility locus. The UGT1A cluster encodes a family of UDP-glucuronosyltransferases (UGTs), which facilitate cellular detoxification and removal of aromatic amines. Bioactivated forms of aromatic amines found in tobacco smoke and industrial chemicals are the main risk factors for bladder cancer. The association within the UGT1A locus was detected by a single nucleotide polymorphism (SNP) rs11892031. Now, we performed detailed resequencing, imputation and genotyping in this region. We clarified the original genetic association detected by rs11892031 and identified an uncommon SNP rs17863783 that explained and strengthened the association in this region (allele frequency 0.014 in 4035 cases and 0.025 in 5284 controls, OR = 0.55, 95%CI = 0.44–0.69, P = 3.3 × 10−7). Rs17863783 is a synonymous coding variant Val209Val within the functional UGT1A6.1 splicing form, strongly expressed in the liver, kidney and bladder. We found the protective T allele of rs17863783 to be associated with increased mRNA expression of UGT1A6.1 in in-vitro exontrap assays and in human liver tissue samples. We suggest that rs17863783 may protect from bladder cancer by increasing the removal of carcinogens from bladder epithelium by the UGT1A6.1 protein. Our study shows an example of genetic and functional role of an uncommon protective genetic variant in a complex human disease, such as bladder cancer.
With 70 530 new cases and 14 680 deaths in 2010, bladder cancer (MIM 109800) is the fifth most common cancer in the USA (1). The disease is well treatable if detected early, but the high recurrence rates, life-long surveillance and treatment add up to a cost of 4 billion dollars a year, which is estimated to be higher than for other cancers in the USA (2,3).
The involvement of environmental risk factors in bladder cancer etiology was first suggested in 1895 by a German surgeon Ludwig Rehn who reported a high occurrence of bladder cancer among dye industry workers (4). This risk was later attributed to exposures to aromatic amines, such as 2-naphthylamine, 4-aminobiphenyl, 4-nitrobiphenyl, 4,4-diaminobiphenyl and benzidine, found in industrial chemicals (5). The same chemicals are found in tobacco smoke, which is now considered the main risk factor for bladder cancer (6,7). Aromatic amines are converted into biologically active carcinogens during a two-stage cellular detoxification/bioactivation process. The first stage is a hepatic N-hydroxylation of aromatic amines by the CYP1A2 enzyme, which belongs to the cytochrome P450 phase I detoxification system (8). The second stage is an enzymatic conjugation of the N-hydroxylated aromatic amines by phase II detoxification enzymes, such as N-acetyltransferases (NATs), glutathione transferases (GSTs) and UDP-glucuronosyltransferases (UGTs). The conjugation facilitates the excretion of the N-hydroxylated intermediates via stool and urine (9). However, direct exposure to the urine enriched by these highly unstable conjugates can initiate oncogenic transformation of bladder epithelium, and lead to cancer (6,7).
Familial aggregation and twin studies of bladder cancer suggest that genetic factors play a role in its etiology (10,11). Specifically, alterations within the cellular detoxification system can determine individual response to environmental exposures. Genetic variants within the phase II detoxification genes NAT2 and GSTM1 have already been identified as risk factors for bladder cancer (12–16). It is not surprising that the UGT1A gene cluster on chromosome 2q37.1 has now been linked with bladder cancer susceptibility (17). These findings suggest that cellular detoxification in humans is mediated by several distinct pathways, and alterations within these pathways could affect bladder cancer risk.
In this study, we identified a single nucleotide polymorphism (SNP), rs17863783, which explained and strengthened the genetic association of the UGT1A region with the risk for bladder cancer. The associated T allele of rs17863783 is a coding synonymous variant (Val209Val) that affects mRNA expression of a functional splicing form, UGT1A6.1. We suggest that the molecular phenotype of this genetic association is related to increased clearance of carcinogens from bladder epithelium by the UGT1A6.1 protein. Our study exemplifies a genetic and functional contribution of an uncommon protective genetic variant to bladder cancer.
The genetic association with bladder cancer within the UGT1A gene cluster was detected for a SNP rs11892031 (17). Since multiple coding variants within the UGT1A genes have been previously linked with enzymatic activity for different pharmacological and environmental substrates (18), we hypothesized that rs11892031 might be in linkage disequilibrium (LD) with one or more of these functional variants. Thus, we conducted a fine-mapping study to comprehensively catalog genetic variants within the UGT1A locus, refine the bladder cancer genetic association and search for a functional link between this genetic association and bladder cancer risk.
The UGT1A region includes nine highly similar protein-coding and four non-coding genes, each with a unique alternative first exon followed by a set of common exons 2–5 (19) (Fig. 1A). Rs11892031 localizes to the first intron of both the UGT1A8 and UGT1A10 genes and upstream of UGT1A9. The activity and specificity of UGT1A proteins are largely determined by their substrate-binding domains, which are entirely encoded by the nine alternative first exons of the corresponding UGT1A genes. Because of the high paralogy within the UGT1A family of genes, some of the 134 non-synonymous and 71 synonymous coding SNPs across these exons included in the current build 132 of the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/) might represent misalignment of highly similar genomic sequences, rather than true genetic polymorphisms. To comprehensively catalog and verify coding variants in this region, we generated highly specific long-range amplicons and sequenced alternative first exons of each of the UGT1A genes in 44 bladder cancer cases and 30 trios from the HapMap European (CEU) set (www.hapmap.org). From the 156 kb UGT1A cluster (chr2:234,191,000–234,347,000, hg18), we sequenced 10 exons that covered 14 358 bp (9.2%) of this region. We reasoned that non-exonic variants located within unique sequences would be well imputed based on the current reference sets (HapMap 3 and 1000 Genomes (20,21)), while variants from the highly similar exonic sequences should be refined and cataloged first. We detected 46 known exonic SNPs (27 non-synonymous and 19 synonymous, Supplementary Material, Table S1), but did not identify novel variants present in more than one sample. Based on the LD pattern, we selected 18 SNPs to represent all 46 exonic variants in the UGT1A region. These 18 SNPs were genotyped in 1055 cases and 962 controls from the Spanish Bladder Cancer Study (SBCS) used in the stage 1 genome-wide association study (GWAS) (17).
Genotyping in this large set of samples was mostly done by Sanger sequencing of long-range polymerase chain reaction (PCR) fragments because several variants could be scored from the same amplicons, and/or genotyping by other methods was difficult due to high sequence similarity between alternative first exons of UGT1A genes. Our sequencing of 2017 samples did not reveal additional genetic variants. We ignored several genetic variants observed just once and by this we might have missed some very rare variants. For exonic variants with minor allele frequency (MAF)>0.01, we detected 46 SNPs, which is similar to 40 variants in the 1000 Genomes project, and 42 variants in the Exome Variant Server (http://snp.gs.washington.edu/EVS/). Based on the SBCS data enriched for coding variants across the locus (Supplementary Material, Table S2), we imputed these variants in the remaining samples in stage 1 GWAS (2477 cases/4158 controls). Using the combined HapMap 3 CEU and 1000 Genomes reference panels, we also imputed all remaining variants within 356 kb (156 kb of the UGT1A cluster ± 100 kb, chr2:234,091,000–234,447,000, hg18) in the entire set of stage 1 samples in the bladder cancer GWAS (3532 cases/5120 controls).
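The region sizes and sample counts quoted above follow from simple arithmetic, which the short sketch below reproduces. It uses only numbers stated in the text; the variable names are illustrative and not taken from the study's own analysis code.

```python
# Coordinates (hg18) and counts as quoted in the text; names are illustrative.
cluster_start, cluster_end = 234_191_000, 234_347_000
cluster_bp = cluster_end - cluster_start      # size of the UGT1A cluster
sequenced_bp = 14_358                         # exonic sequence covered
flank_bp = 100_000                            # flank added on each side

# Fraction of the cluster covered by sequencing, in percent
coverage_pct = round(100 * sequenced_bp / cluster_bp, 1)

# Window used for imputation: cluster plus both flanks
imputation_window_bp = cluster_bp + 2 * flank_bp

# Stage 1 samples outside the sequenced SBCS subset
stage1_cases, stage1_controls = 3532, 5120
sbcs_cases, sbcs_controls = 1055, 962
remaining = (stage1_cases - sbcs_cases, stage1_controls - sbcs_controls)

print(cluster_bp, coverage_pct, imputation_window_bp, remaining)
```

The values agree with the 156 kb cluster, the 9.2% coverage, the 356 kb imputation window and the 2477 cases/4158 controls given in the text.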
The initial GWAS included 166 SNPs in the UGT1A region; using imputation, we extended this panel to 1170 SNPs presented on the LD plot (Supplementary Material, Fig. S1) and then performed association analysis (Supplementary Material, Table S3). In the combined set of 4035 cases and 5284 controls, the strongest association was observed for a set of 28 uncommon SNPs in high LD with each other (r2 > 0.9) but in moderate LD with rs11892031 (0.14 < r2 ≤ 0.29) (Fig. 1B, Supplementary Material, Table S4). Of these 28 markers, only rs17863783 is a coding SNP, while no functional significance could be predicted for the remaining 27 variants (Supplementary Material, Table S4). Rs17863783, with MAF of 2.5%, was genotyped in the original GWAS but was excluded from the analysis because of apparent incomplete genotyping and a standard exclusion threshold of MAF < 5% (17). Here, we fully genotyped this marker in all of our samples. To ensure correct genotyping of this uncommon variant, we cross-validated genotypes in a subset of samples by three methods: Illumina chip, Sanger sequencing and TaqMan genotyping (Supplementary Material, Fig. S2 and Table S5). Association for rs17863783 (P = 3.3 × 10−7; OR = 0.55, 95%CI = 0.44–0.69) was stronger than for the original GWAS marker, rs11892031 (P = 7.7 × 10−5; OR = 0.79, 95%CI = 0.70–0.89) (Table 1 and Supplementary Material, Table S6). Both these SNPs are uncommon variants with frequencies of minor protective alleles in controls of 8.5 and 2.5% for rs11892031 and rs17863783, respectively. There is only moderate LD between these SNPs, D′ = 0.961 and r2 = 0.228 in the combined GWAS set. To further evaluate whether these SNPs represent the same association signal, we performed a conditional analysis adjusting for the effect of the other variant. Adjustment for rs11892031 attenuated the signal for rs17863783 (P = 1.52 × 10−4; OR = 0.61, 95%CI = 0.47–0.79 after adjustment, Table 1, Fig. 1C), while the loss of signal for rs11892031 after adjustment for rs17863783 (P = 8.32 × 10−2, OR = 0.89, 95%CI = 0.78–1.02 after adjustment, Table 1, Fig. 1D) suggests that these two variants represent the same association. There was no evidence of additional association signal within the UGT1A region after adjustment for rs17863783 (Fig. 1D). We also analyzed haplotypes constructed with rs11892031 and 18 selected coding SNPs that represent all the 46 coding SNPs in this region. The protective T allele of rs17863783 was found only on a haplotype with the C allele of rs11892031 and only this haplotype showed a significant protective effect. No association was detected for a haplotype with the C allele of rs11892031 but without the T allele of rs17863783, or any other haplotype (Table 2). Our results suggest that rs17863783, or other variants in strong LD with it, could explain the genetic association initially captured by rs11892031.
The protective effect of rs17863783 was stronger among smokers (OR = 0.51; 95%CI = 0.40–0.66, P = 3.3 × 10−7) compared with non-smokers (OR = 0.72, 95%CI = 0.43–1.19, P = 0.2), but the interaction between rs17863783 and smoking status was not statistically significant (Table 3). This might be due to the low allele frequency of rs17863783, the predominance of smokers among bladder cancer cases, and other causes of bladder cancer in non-smokers. A genetic variant rs1495741 within the NAT2 gene has previously been associated with bladder cancer and slow acetylation of aromatic amines by the NAT2 enzyme (13). In our samples, the association for rs17863783 was similar in individuals with rapid/intermediate and slow acetylation, classified by rs1495741 genotypes of NAT2, and this effect was not modified by smoking status (Supplementary Material, Table S7).
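As a rough plausibility check on the effect size for rs17863783, a crude allelic odds ratio can be computed from the allele frequencies reported for this SNP (0.014 in 4035 cases and 0.025 in 5284 controls). The sketch below is not the authors' analysis: the published estimate comes from a covariate-adjusted logistic regression, whereas this uses expected allele counts reconstructed from rounded frequencies and a Woolf (log-normal) confidence interval. The function name is hypothetical.

```python
import math

def allelic_odds_ratio(freq_cases, n_cases, freq_controls, n_controls, z=1.96):
    """Crude allelic odds ratio with a Woolf (log-normal) confidence interval."""
    # Expected allele counts: every individual contributes two alleles.
    a = 2 * n_cases * freq_cases              # minor alleles in cases
    b = 2 * n_cases * (1 - freq_cases)        # major alleles in cases
    c = 2 * n_controls * freq_controls        # minor alleles in controls
    d = 2 * n_controls * (1 - freq_controls)  # major alleles in controls

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

or_est, ci_low, ci_high = allelic_odds_ratio(0.014, 4035, 0.025, 5284)
print(f"OR = {or_est:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

The crude estimate lands close to the adjusted values reported in the text, which is consistent with the covariates having little influence on this particular estimate, although the unadjusted calculation cannot demonstrate that.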
UGT1A6 has two splicing mRNA isoforms, UGT1A6.1 and UGT1A6.2. The bladder cancer-associated rs17863783 is a synonymous variant (Val209Val) located within the long isoform (UGT1A6.1, NM_001072) that encodes a full-length protein of 532 amino acids. The short form (UGT1A6.2, NM_205862) encodes a protein of 265 amino acids, which is missing a substantial portion of the highly conserved substrate-binding domain, fully encoded by the first exon (Supplementary Material, Fig. S3). UGT1A6 protein expression usually refers to UGT1A6.1 in the literature, because UGT1A6.2 lacks most of the exon 1 and is unlikely to be recognized by antibodies. UGT1A6 mRNA expression can refer to both UGT1A6.1 and UGT1A6.2 splicing forms, depending on the specific method of detection.
We considered the exonic rs17863783 to be the strongest functional candidate from the associated block of 28 linked SNPs, and performed functional evaluation of this variant. Even though synonymous substitutions do not change the encoded protein sequence, they may influence disease risk by altering exonic splicing enhancers (ESEs) that bind splicing factors, regulate inclusion of exons or modify expression levels of specific transcripts, without affecting splicing sites (22). Using ESE finder 3.0 software (22), we predicted a differential interaction between rs17863783 alleles and splicing factors (Supplementary Material, Fig. S4). To experimentally evaluate the effect of rs17863783 on splicing and expression of UGT1A6 transcripts, we created allelic exontrap splicing minigenes that included 2.3 kb genomic fragments surrounding rs17863783 and both alternative first exons of UGT1A6. After transient transfection into HeLa (cervical cancer), 293T (normal embryonic kidney), J82 (bladder cancer) and HepG2 (liver cancer) human cell lines, the transcripts produced by the minigenes were analyzed for quantitative mRNA expression of both isoforms. In all cell lines tested, the presence of the protective T allele significantly increased the expression of UGT1A6.1 compared with minigenes with the risk G allele. Expression of UGT1A6.2 was not affected by rs17863783 alleles (Fig. 2A and B). These minigenes did not include any of the other 27 variants in high LD with rs17863783, indicating that the functional effect could be attributed to rs17863783 alone. While this does not exclude the possibility of some other functional variants in this region, our results showed that rs17863783 has a critical impact on the function of UGT1A6.1, mechanisms of cellular detoxification and susceptibility to bladder cancer. The UGT1A6.1 protein is primarily expressed in the liver, kidney and bladder tissue (Fig. 3A), in agreement with the mRNA expression we detected in a panel of human tissues and cell lines (Fig. 3B, Supplementary Material, Table S8). Expression of both splicing forms, UGT1A6.1 and UGT1A6.2, was similar between normal and tumor bladder samples, suggesting that the functional effect of this gene is not disease specific (Supplementary Material, Fig. S4). In normal human liver samples, UGT1A6.1 expression was increased 4-fold in carriers of the uncommon protective T allele of rs17863783 (P = 0.0136, n = 88, Fig. 3C), while no carriers of the uncommon T allele of rs17863783 were found among 44 normal bladder tissue samples available for expression analysis.
The UGT1A locus is well known for its genetic association with severe toxicity to the anti-cancer drug irinotecan (23,24). Genotyping of the marker UGT1A1*28 (rs8175347), a (TA)5–7 repeat within the UGT1A1 promoter region, is now required by the US Food and Drug Administration (FDA) for adjustment of drug dosage and prevention of irinotecan toxicity in susceptible individuals (25). It is reasonable to hypothesize that genetic variants associated with detoxification of irinotecan may also be associated with detoxification of environmental carcinogens and susceptibility to bladder cancer. There have been multiple attempts to identify other markers in this region that could provide similar genetic information and would be easier to genotype than UGT1A1*28 (26–28). Therefore, we used our unique set of 2017 individuals of European descent with complete information for 1170 genetic markers in this region to search for markers in high LD with UGT1A1*28. Four intronic/promoter markers were in a similarly high LD with UGT1A1*28 (r2 = 0.875). Of these markers, rs6742078 and rs887829 have been reported to be strongly associated with blood bilirubin levels (P < 10−324 and P < 10−69) (29,30), but we observed no association for these markers and UGT1A1*28 with bladder cancer in our samples (Supplementary Material, Table S9). Interestingly, of the 46 coding variants we identified in this region, only 3 variants were in a relatively high LD with UGT1A1*28 (0.63 < r2 < 0.67). All three variants were from the UGT1A6 gene (rs1105880, Leu105Leu; rs2070959, Thr181Ala; rs11058879, Arg184Ser) and located in the vicinity of our bladder cancer-associated SNP rs17863783 (Val209Val), suggesting the functional relevance of UGT1A6.1 for different phenotypes. In fact, according to the pharmacogenomics knowledge database (http://www.pharmgkb.org), UGT1A6.1 metabolizes multiple drugs, including irinotecan, the analgesics paracetamol (Tylenol), aspirin and naproxen, and the anti-convulsant drug phenytoin.
In the present study, we report the identification of SNP rs17863783 within a cellular detoxification gene, UGT1A6, as a protective factor from bladder cancer. Exposure to aromatic amines found in industrial chemicals and tobacco smoke is strongly associated with increased risk of bladder cancer (7). UGTs conjugate UDP-glucuronic acid with N-hydroxylated products of diverse substrates, including aromatic amines (31). The conjugated water-soluble glucuronides can then be excreted via stool and urine (9). Until excretion, the urine is stored in the bladder where it comes in direct contact with bladder epithelium. Urine acidity, which depends on diet, body composition and medications (32–34), is a critical factor that determines the stability of glucuronides. At a low urine pH (< 6.0), glucuronides become unstable and quickly dissociate to release N-hydroxylated oncogenic forms of aromatic amines (35), form DNA adducts and initiate carcinogenesis within the bladder epithelium (36). However, the UGT proteins endogenously expressed in bladder epithelium have the ability to conjugate different substrates (37). Our genetic study suggested that of all UGT genes, only UGT1A6.1 showed genetic association with protection from bladder cancer. Furthermore, the UGT1A6.1 functional protein isoform is strongly expressed in human bladder epithelium (38,39) (Fig. 3A and B, Supplementary Material, Table S8), and conjugates chemicals known to be of risk for bladder cancer (31) (Supplementary Material, Table S10). This suggests that even when the bladder epithelium is exposed to the reactive N-hydroxylated products of aromatic amines generated by dissociation of urine glucuronides, endogenously expressed UGT1A6.1 can reconjugate and remove these intermediates from bladder epithelium, thereby preventing carcinogenesis (Supplementary Material, Fig. S6). 
By increasing UGT1A6.1 mRNA expression, the T allele of rs17863783 may help remove carcinogens from bladder epithelium and therefore protect from bladder cancer. Based on the functional role, this variant might be protective only in individuals exposed to particular environmental factors, such as tobacco smoke or chemicals, while remaining neutral in all other situations.
By design, GWAS have been conducted to discover common variants, with MAF > 10%, associated with complex diseases (40), and indeed, most signals detected by cancer GWAS, are loci with SNP markers with MAF > 20% (41). This design strategy is predicated on the ‘common disease-common variant’ theory postulating that complex traits are caused by combinations of many common alleles with small individual effects (42–44). Compared with common variants, uncommon/rare variants are technically more difficult to genotype with the same level of confidence and completion, partly due to technical issues related to confidence of detection of rare alleles and the necessity of extensive validation studies. Statistical analysis of uncommon/rare variants is also more challenging due to lower power and possible effects of random confounding factors (40,45). As a result, commercial genotyping arrays used in GWAS studies are biased towards variants with MAF > 10% and have a poor representation of variants with MAF < 5% (46), or these latter variants are excluded from the analysis. Among 366 GWAS that reported significant association signals (P < 10−7), 275 studies reported association for variants with MAF > 5%, and only 28 GWAS reported 40 SNPs with MAF < 5% (47). The proportion of genetic variation explained in common diseases still appears to be relatively modest (48), in spite of thousands of common variants identified by GWAS (49). Different disease hypotheses have been discussed, and it is now suggested that both common and uncommon/rare variants significantly contribute to genetic susceptibility of common diseases (50–56).
In the original bladder cancer GWAS that analyzed SNPs with MAF > 5%, a common variant at 2q37.1 was reported (17), but due to the standard quality control metrics, the study did not evaluate the uncommon rs17863783 (MAF = 2.5%), which we have now identified as responsible for the association originally detected by a more common SNP, rs11892031 (MAF = 8.5%). This might be considered a ‘synthetic’ association (53,57,58), because the more common variant rs11892031 captures the signal of an uncommon linked SNP, rs17863783 (D′ = 0.96). However, the less common rs17863783 falls on the backbone that contains the rs11892031 alleles (r2 = 0.228), resulting in the detection of the association signal. It is postulated that in the case of a ‘synthetic’ association, the association signal should become stronger when the right variant is interrogated (53). In fact, we detected stronger association for the less common variant rs17863783, and it could explain the original association for rs11892031, but not vice versa (Table 1). Our unbiased search through all variants in this region, not limited to variants in high LD (r2 > 0.8) with rs11892031, has been instrumental in the identification of a probable causal variant, rs17863783. Our GWAS identified the UGT1A region for bladder cancer susceptibility, but the fine-mapping has identified a variant that explained and strengthened the original genetic association and provided a plausible functional mechanism for its effect. The risk G allele of rs17863783 is conserved in 33 of 41 species (Supplementary Material, Fig. S7), while the protective T allele is a derived allele found only in a small percentage of humans, 4.9% of controls and 2.8% of bladder cancer cases. The protective T allele is clearly functional, as it is associated with increased mRNA expression of UGT1A6.1. A recent study concluded that rarer derived variants, with MAF < 8–10%, are more likely to be functional than the more common variants (59). This can be explained by the likely deleterious selective pressure on the derived risk alleles that keeps them at low allele frequencies. Here, the functional derived T allele of rs17863783 is a protective allele. It is possible that newly derived protective variants in detoxification genes, such as UGTs, may be favored by positive selection in the modern environment, which has been substantially altered by humans. Low frequencies of these alleles may reflect the short evolutionary period since the introduction of tobacco smoking and industrial chemicals into the human environment. This can also indicate that human-specific environmental factors, such as chemicals, drugs and dietary components, might have weak deleterious effects that result in minor positive selection pressure on genetic variants that regulate metabolism of these substrates. By expanding our analysis to the broader UGT1A region, we tested and excluded the possibility that the same genetic variants underlie mechanisms responsible for bladder cancer susceptibility and detoxification of the anti-cancer drug irinotecan.
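The combination of a high D′ and a low r2 reported for rs11892031 and rs17863783 is what one would expect when a rare allele sits almost entirely on the background of a more common one. The sketch below illustrates this with the standard definitions of the two LD measures, using the minor allele frequencies in controls (2.5 and 8.5%) and D′ = 0.961. It assumes that the two minor alleles are positively coupled, as the haplotype analysis indicates; the function name is hypothetical, and because the inputs are rounded control frequencies rather than the combined data, the result is only expected to approximate the reported r2 of 0.228.

```python
def r2_from_dprime(p_a, p_b, d_prime):
    """r^2 between two biallelic SNPs from minor allele frequencies and a positive D'."""
    q_a, q_b = 1 - p_a, 1 - p_b
    # Largest possible D when the two minor alleles are in coupling phase
    d_max = min(p_a * q_b, q_a * p_b)
    d = d_prime * d_max
    return d * d / (p_a * q_a * p_b * q_b)

# rs17863783 T allele (2.5%) and rs11892031 C allele (8.5%) in controls
r2_observed = r2_from_dprime(0.025, 0.085, 0.961)
# Ceiling on r^2 for these frequencies, reached only if D' = 1
r2_ceiling = r2_from_dprime(0.025, 0.085, 1.0)

print(round(r2_observed, 3), round(r2_ceiling, 3))
```

Even with complete LD (D′ = 1), the difference in allele frequencies caps r2 well below the r2 > 0.8 threshold commonly used to select proxies, which is why a search restricted to high-r2 proxies of the GWAS marker would not have reached rs17863783.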
In conclusion, we performed a detailed fine-mapping analysis of the UGT1A locus reported in our recent bladder cancer GWAS, identified an uncommon protective functional genetic variant, rs17863783, that greatly accounted for the initial GWAS signal, and provided the first link to the underlying molecular phenotype of this association. Although we provide compelling genetic and functional evidence for rs17863783, this does not exclude the possibility of existence of other functionally important variants in this region. The combination of common, uncommon and rare variants will eventually extend our understanding of human disease and begin to map the genomic architecture of a complex disease, such as bladder cancer. Furthermore, understanding the impact of environmental exposures should be instrumental in the functional interpretation of genetic associations identified by GWAS.
Stage 1 GWAS bladder cancer cases and controls of European descent were drawn from five studies in the USA and Europe, as previously described (17): SBCS (1106 cases/1050 controls), Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO, 708 cases/1874 controls), The American Cancer Society Cancer Prevention Study II Nutrition Cohort (CPS-II, 687 cases/730 controls), New England Bladder Cancer Study (NEBCS-ME,VT, 630 cases/759 controls) and Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC, 401 cases/707 controls). Additional GWAS follow-up samples were drawn from: Health Professionals Follow-up Study (HPFS, 113 cases/115 controls), New England Bladder Cancer Study (NEBCS-NH, 355 cases/374 controls) and Nurses' Health Study (NHS, 63 cases/57 controls). HapMap DNA samples from 30 European trios (CEU) used for sequencing and genotyping were purchased from the Coriell Institute for Medical Research (Camden, NJ, USA). As previously described (17), each participating study obtained informed consent from study participants and approval from its respective Institutional Review Board for this study. For stage 1 only, participating studies obtained institutional certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS).
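The per-study counts listed above can be tallied to confirm the stage 1 and follow-up totals used elsewhere in the text. The sketch below uses only the numbers given in this paragraph; the dictionary layout and helper name are illustrative.

```python
# (cases, controls) per study, as listed in the text
stage1 = {
    "SBCS": (1106, 1050),
    "PLCO": (708, 1874),
    "CPS-II": (687, 730),
    "NEBCS-ME,VT": (630, 759),
    "ATBC": (401, 707),
}
follow_up = {
    "HPFS": (113, 115),
    "NEBCS-NH": (355, 374),
    "NHS": (63, 57),
}

def totals(studies):
    """Sum cases and controls across a dictionary of studies."""
    cases = sum(n_cases for n_cases, _ in studies.values())
    controls = sum(n_controls for _, n_controls in studies.values())
    return cases, controls

stage1_totals = totals(stage1)
follow_up_totals = totals(follow_up)
print(stage1_totals, follow_up_totals, sum(follow_up_totals))
```

The stage 1 sums match the 3532 cases/5120 controls used for imputation, and the follow-up sets add up to the 1077 additional samples genotyped for rs17863783.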
Paired (normal/tumor) bladder tissue samples from 44 anonymous bladder cancer patients were purchased from Asterand (Detroit, MI, USA) under exemption #4715 by the NIH Office of Human Subject Research. Previously described liver samples (60) were provided by the University of Minnesota. DNA from normal tissue samples was prepared with a Gentra kit (Qiagen) and used for sequencing and genotyping. Samples of total RNA from 17 non-cancerous human tissues (skeletal muscle, spleen, adrenal gland, kidney, brain, pancreas, heart, small intestine, stomach, bladder, colon, prostate, liver, lung and breast) were purchased from Clontech (Mountain View, CA, USA) or BioChain (Hayward, CA, USA). Samples of total RNA from the NCI-60 set of cell lines (61) were provided by the Molecular Targets Team, Developmental Therapeutics Program, Division of Cancer Treatment and Diagnosis (DCTD/NCI/NIH). All other cell lines were purchased from the American Type Culture Collection (ATCC) and were maintained according to the recommended conditions. For each sample, 1–2 μg of DNase-treated total RNA was converted into cDNA with random hexamers and SuperScript III reverse transcriptase (Invitrogen). cDNA samples were diluted with nuclease-free water, and cDNA corresponding to 5 ng of total RNA was used for each quantitative reverse transcriptase PCR (qRT–PCR).
Long-range amplicons of ~1.3 kb covering each of the UGT1A exons and flanking intronic sequences were generated with specific primers and conditions (Supplementary Material, Table S11). PCR fragments were confirmed by agarose gel, and sequenced with 3730xl DNA Analyzer (Applied Biosystems). Sequence analysis was performed with Sequencher 4.2 software (Gene Code, MI, USA) and all genetic variants were scored manually by two people, independently. The DNA samples from cases and controls were mixed on genotyping plates, and the sample status was blinded to the laboratory investigators. Although rs17863783 was present on the Illumina chip, the genotyping was incomplete (~75%). For this study, we genotyped the marker in all samples in stage 1 GWAS plus 1077 additional samples from three of the follow-up sets (HPFS, NEBCS-NH and NHS) (17). The default genotyping method for this marker was by a TaqMan allelic discrimination assay, in 384-well format. For 5 μl reactions we used 5 ng DNA, 2× genotyping buffer and a genotyping assay C__25972736_20 (all from Applied Biosystems), according to the instructions. To ensure correct genotype clustering and scoring for rs17863783, each genotyping plate contained control samples with known genotypes, NA19194 (T/T) and NA19116 (T/T) from the HapMap YRI panel. The TaqMan genotyping results were validated by two other platforms (Illumina chip and Sanger sequencing). A concordance rate of 99.2–100% confirmed the high quality of genotyping by the three methods (Supplementary Material, Fig. S2 and Table S5). Four additional SNPs were genotyped by Illumina chip and confirmed by sequencing of ~2000 samples and used as additional controls for genotyping concordance (Supplementary Material, Fig. S2 and Table S6).
We used IMPUTE2 software (62) to estimate genotypes of SNPs not directly genotyped in the UGT1A region. Genotypes of 166 SNPs from this region (chr2:234,091,000–234,447,000) have been generated by the stage 1 bladder cancer GWAS in 3461 cases and 4694 controls (17). We imputed 1004 additional SNPs in this region for the entire stage 1 GWAS samples using a combined set of reference panels: 1000 Genomes Project [June 2010 release (21)], HapMap Phase 3 CEU [second February 2009 release (20)] and a subset of the stage 1 GWAS samples (SBCS, n = 2017) in which 18 exonic SNPs were completely genotyped by sequencing. We evaluated the imputation performance using the average posterior probability for the best-guessed genotypes, and the IMPUTE2-info score, which is associated with the imputed allele frequency estimate and ranges from 1 to 0 (high to low confidence). Markers with posterior probability <0.9 or IMPUTE2-info score <0.9 were excluded from the association analysis.
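The exclusion rule for imputed markers described above amounts to a simple two-condition filter. The sketch below applies it to a few made-up markers; the marker names and scores are hypothetical and serve only to show how the 0.9 thresholds act, and the code does not parse actual IMPUTE2 output files.

```python
THRESHOLD = 0.9  # cut-off stated in the text for both quality measures

def passes_qc(mean_posterior, info_score, threshold=THRESHOLD):
    """Keep a marker only if neither quality measure falls below the threshold."""
    return mean_posterior >= threshold and info_score >= threshold

# (name, average posterior probability, info score); hypothetical values
markers = [
    ("snp_a", 0.99, 0.97),
    ("snp_b", 0.95, 0.85),   # fails on info score
    ("snp_c", 0.88, 0.93),   # fails on posterior probability
    ("snp_d", 0.90, 0.90),   # exactly at the threshold, retained
]
kept = [name for name, posterior, info in markers if passes_qc(posterior, info)]

# Panel size: directly genotyped GWAS SNPs plus imputed SNPs (from the text)
genotyped_snps, imputed_snps = 166, 1004
panel_size = genotyped_snps + imputed_snps

print(kept, panel_size)
```

The panel size of 1170 corresponds to the number of markers shown on the LD plot and used in the association analysis.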
Fisher's exact tests of the Hardy–Weinberg equilibrium (HWE) for controls and for the entire set were conducted for all markers. Only one marker showed significant deviation from HWE (P < 0.001), and it was flagged but retained in the analysis. LD measures (D′ and r2) were estimated using Haploview (63). GTOOL (http://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html) was used to combine all the imputed variants (with >90% imputation certainty) and actual genotyping data. Association with bladder cancer risk was tested under a dominant protective model (one copy of the protective allele is sufficient for the protective effect) using PLINK (64) and SAS/STAT system version 9.2 (SAS Institute Inc., Cary, NC, USA), with adjustment for age (in 5-year categories), gender, study sites and smoking habit (current, former or never). In the original bladder cancer GWAS (17), it was found that study sites best approximate the eigenvalues of principal component analysis to control for population stratification. Thus, we used study sites for similar adjustment in our analyses. To test for the presence of independent association signals for bladder cancer risk in the 2q37.1 region, we conditioned on the original GWAS signal (rs11892031) in a logistic regression model for the additive effect, with adjustment for the same covariates. Genotype–smoking interactions were assessed by stratifying individuals as current, former, ever or never smokers for association testing, and by including interaction terms in logistic regression models adjusted for the same covariates. Genotyping data of SNP rs1495741 in the NAT2 gene were retrieved from the original GWAS (17) to stratify individuals as rapid/intermediate (rs1495741 AG/GG) and slow (rs1495741 AA) acetylators. NAT2–UGT1A interactions were tested in a logistic regression model with adjustment for the same covariates along with interaction terms.
Haplotype-specific odds ratios and P-values were estimated using PLINK (64) for each haplotype with a frequency >1% versus all other haplotypes combined, and a single omnibus test was used to jointly estimate overall haplotype effects.
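The "each haplotype versus all others" comparison can be sketched in Python as follows. This is a crude, unadjusted illustration that assumes haplotype counts in cases and controls are already known; PLINK estimates haplotypes from unphased genotypes, so this sketch does not reproduce its procedure. The function name and all counts are hypothetical.

```python
from math import exp, log, sqrt


def haplotype_vs_rest(case_counts, control_counts, min_freq=0.01):
    """Crude odds ratio and Woolf 95% CI for each haplotype versus all others."""
    total_cases = sum(case_counts.values())
    total_controls = sum(control_counts.values())
    results = {}
    for hap in sorted(set(case_counts) | set(control_counts)):
        a = case_counts.get(hap, 0)      # this haplotype in cases
        b = total_cases - a              # all other haplotypes in cases
        c = control_counts.get(hap, 0)   # this haplotype in controls
        d = total_controls - c           # all other haplotypes in controls
        freq = (a + c) / (total_cases + total_controls)
        if freq <= min_freq or min(a, b, c, d) == 0:
            continue  # skip rare haplotypes and tables with an empty cell
        odds_ratio = (a * d) / (b * c)
        se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        ci = (exp(log(odds_ratio) - 1.96 * se), exp(log(odds_ratio) + 1.96 * se))
        results[hap] = (odds_ratio, ci)
    return results


# Hypothetical haplotype counts (chromosomes, not individuals).
cases = {"H1": 645, "H2": 300, "H3": 50, "H4": 5}
controls = {"H1": 595, "H2": 300, "H3": 100, "H4": 5}
for hap, (odds_ratio, (low, high)) in haplotype_vs_rest(cases, controls).items():
    print(f"{hap}: OR = {odds_ratio:.2f}, 95% CI = {low:.2f}-{high:.2f}")
```

Haplotype H4 is omitted from the output because its pooled frequency (0.5%) does not exceed the 1% threshold.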
Expression of UGT1A6 mRNA in human tissues and cell lines was measured with TaqMan expression assays Hs01592477_m1 for UGT1A6.1 (NM_001072.3) and Hs01651483_m1 for UGT1A6.2 (NM_205862.1). Endogenous controls beta-2-microglobulin (B2M, assay Hs00187842_m1) and cyclophilin (PPIA, assay 4326316E) were used for normalization of expression. For all assays, reactions with water and with 10 ng of genomic DNA from pooled HapMap samples were used as negative controls. Expression was detected on the ABI PRISM 7900HT SDS (Applied Biosystems) with cDNA prepared from 5 ng of total RNA, 0.25 µl of 20× TaqMan gene expression assay and 2.5 µl of 2× Gene Expression Master Mix in a 5 µl reaction volume. Expression was measured in four technical replicates and the average values were used for the analysis.
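One common way to combine averaged technical replicates with endogenous controls is the ΔCt calculation sketched below in Python. The use of 2^(−ΔCt), the averaging of the two reference genes and all Ct values are assumptions made for illustration; the text above states only that B2M and PPIA were used for normalization and that replicate averages were analyzed.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts):
    """Return 2^(-dCt) for a target assay against averaged reference genes.

    Assumes roughly 100% amplification efficiency for every assay.
    """
    target = mean(target_cts)  # average of the technical replicates
    reference = mean(mean(cts) for cts in reference_cts.values())
    delta_ct = target - reference
    return 2 ** (-delta_ct)


# Hypothetical Ct values for one sample, four technical replicates each.
ugt1a6_1 = [26.1, 26.0, 25.9, 26.0]
controls = {
    "B2M": [20.0, 20.1, 19.9, 20.0],
    "PPIA": [22.0, 22.1, 21.9, 22.0],
}
print(relative_expression(ugt1a6_1, controls))
```

With these values the target lies about five cycles above the averaged controls, which corresponds to a relative expression of roughly 0.031.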
Screening for ESEs was performed with a web-based bioinformatic tool (http://rulai.cshl.edu/cgibin/tools/ESE3) using 50 bp DNA sequences carrying either the T or the G allele of rs17863783.
A 2.3 kb genomic DNA fragment surrounding rs17863783 and containing the alternative first exons of UGT1A6.1 and UGT1A6.2 was amplified with specific primers (Supplementary Material, Table S11) in 60 HapMap individuals from a European population (CEU). Sequencing of these fragments detected four exonic SNPs in three haplotypes. The PCR products representing the haplotypes were cloned into an Exontrap vector (MoBiTec, Göttingen, Germany) using XhoI and BamHI restriction sites. After validation by sequencing, the constructs were transfected into 293T, HeLa, J82 and HepG2 cell lines. Transfections were performed with LTX and PLUS transfection reagents (Invitrogen) for HeLa, J82 and HepG2 cells and with Lipofectamine 2000 transfection reagent (Invitrogen) for 293T cells, in 12 biological replicates for each cell line and construct. The cells were seeded in a 96-well plate at a cell density of 1 × 10⁵, transfected the next day with 200 ng of constructs and harvested 48 h post-transfection. Total RNA was extracted on a QIAcube with the RNeasy protocol combined with DNase treatment (Qiagen). For each sample, 0.5–1 μg of total RNA was converted into cDNA with SuperScript III reverse transcriptase (Invitrogen) using a vector-specific primer (Supplementary Material, Table S11). cDNA samples were diluted with nuclease-free water, and cDNA corresponding to 10–20 ng of total RNA was used for each quantitative SYBR Green qRT–PCR. Three assays were measured for each sample: a common assay and two assays for specific splicing forms (Supplementary Material, Table S11). All expression assays were designed to uniquely quantify transcripts generated in vitro during the Exontrap experiment, but not endogenous UGT1A6 transcripts.
This project has been funded in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Support for individual studies that participated in the effort is as follows: SBCS (D.T.S.)—Intramural Research Program of the National Institutes of Health, National Cancer Institute, Division of Cancer Epidemiology and Genetics and intramural contract number NCI N02-CP-11015; FIS/Spain 98/1274, FIS/Spain 00/0745, PI061614 and G03/174, Fundació Marató TV3, Red Temática Investigación Cooperativa en Cáncer (RTICC), Consolíder ONCOBIO, EU-FP7-201663; and RO1-CA089715 and CA34627. NEBCS (D.T.S.)—Intramural Research Program of the National Institutes of Health, National Cancer Institute, Division of Cancer Epidemiology and Genetics and intramural contract number NCI N02-CP-01037. PLCO (M.P.P.), (N.C.)—The NIH Genes, Environment and Health Initiative (GEI) partly funded DNA extraction and statistical analyses (HG-06-033-NCI-01 and RO1HL091172-01), genotyping at the Johns Hopkins University Center for Inherited Disease Research (U01HG004438 and NIH HHSN268200782096C) and study coordination at the GENEVA Coordination Center (U01 HG004446) for EAGLE and part of PLCO studies. Genotyping for the remaining part of PLCO and all ATBC and CPS-II samples was supported by the Intramural Research Program of the National Institutes of Health, NCI, Division of Cancer Epidemiology and Genetics. The PLCO is supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics and by contracts from the Division of Cancer Prevention, National Cancer Institute, National Institutes of Health. ATBC (D.A.)—This research was supported in part by the Intramural Research Program of the NIH and the National Cancer Institute.
Additionally, this research was supported by US Public Health Service contracts N01-CN-45165, N01-RC-45035 and N01-RC-37004 from the National Cancer Institute, Department of Health and Human Services. NHS & HPFS (I.D.V.)—CA055075 and CA087969.
The NCI bladder cancer GWAS and follow-up studies are supported by the intramural research program of the National Institutes of Health, National Cancer Institute.
The following individuals are acknowledged for their support: Francisco Real (Molecular Pathology Programme, Centro Nacional de Investigaciones Oncológicas, Madrid, Spain); Marie-Joseph Horner (DCEG, NCI/NIH, Rockville, MD, USA); Adam Mumy (DCEG, NCI/NIH, Rockville, MD, USA); Natalia Orduz (DCEG, NCI/NIH, Rockville, MD, USA); Leslie Carroll (Information Management Services, Silver Spring, MD, USA); Gemma Castaño-Vinyals (Institut Municipal d'Investigació Mèdica, Barcelona, Spain); Fernando Fernández (Institut Municipal d'Investigació Mèdica, Barcelona, Spain); Paul Hurwitz (Westat, Inc., Rockville, MD, USA); Charles Lawrence (Westat, Inc., Rockville, MD, USA); Marta Lopez-Brea (Marqués de Valdecilla University Hospital, Santander, Cantabria, Spain); Anna McIntosh (Westat, Inc., Rockville, MD, USA); Angeles Panadero (Hospital Ciudad de Coria, Coria (Cáceres), Spain); Fernando Rivera (Marqués de Valdecilla University Hospital, Santander, Cantabria, Spain); Robert Saal (Westat, Rockville, MD, USA); Maria Sala (Institut Municipal d'Investigació Mèdica, Barcelona, Spain); Kirk Snyder (Information Management Services, Inc., Silver Spring, MD, USA); Anne Taylor (Information Management Services, Inc., Silver Spring, MD, USA); Montserrat Torà (Institut Municipal d'Investigació Mèdica, Barcelona, Spain); Jane Wang (Information Management Services, Silver Spring, MD, USA).
Conflict of Interest statement. The authors have declared that no competing interests exist.