Allelic imbalance (AI) is a powerful tool to identify cis-regulatory variation for gene expression. UGT2B15 is an important enzyme involved in the metabolism of multiple endobiotics and xenobiotics. In this study, we measured the relative expression of two alleles at this gene by using SNP rs1902023:G>T. An excess of the G over the T allele was consistently observed in liver (P<0.001), but not in breast (P=0.06) samples, suggesting that SNPs in strong linkage disequilibrium with G253T regulate UGT2B15 expression in liver. Seven such SNPs were identified by resequencing the promoter and exon 1, which define two distinct haplotypes. Reporter gene assays confirmed that one haplotype displayed ~20% higher promoter activity compared to the other major haplotype in liver HepG2 (P<0.001), but not in breast MCF-7 (P=0.540) cells. Reporter gene assays with additional constructs pointed to rs34010522:G>T and rs35513228:C>T as the cis-regulatory variants; both SNPs were also evaluated in LNCaP and Caco-2 cells. By ChIP, we showed that the transcription factor Nrf2 binds to the region spanning rs34010522:G>T in all four cell lines. Our results provide a good example for how AI can be used to identify cis-regulatory variation and gain insights into the tissue specific regulation of gene expression.
One of the most important goals of genetics is to identify heritable variation in (cis-) regulatory elements for gene expression, especially those that influence common diseases and other clinical phenotypes. To achieve this, numerous association studies between genotype and gene expression levels have been performed in multiple tissues, even on a genome-wide scale [Cheung, et al., 2003; Duan, et al., 2008; Kristensen, et al., 2006; Rockman and Kruglyak, 2006; Schadt, et al., 2008; Stranger, et al., 2007]. If all other effects are uniform across individuals, this approach is expected to be powerful, especially if adequate sample sizes are used. However, due to a variety of factors, including variations in the relevant transcription factors as well as environmental and physiological effects, the power and sensitivity of this approach could be significantly affected. Allelic imbalance (AI, or differential allelic expression, DAE), has been proposed as a powerful and robust complementary approach [Bray and O’Donovan, 2006; Milani, et al., 2007; Pastinen and Hudson, 2004; Stamatoyannopoulos, 2004; Wang and Elbein, 2007; Yan, et al., 2002; Yan and Zhou, 2004]. The rationale behind it is that trans-acting regulatory elements and environmental factors influence gene expression from both chromosomes while cis-regulatory elements affect gene expression in an allele-specific manner [Pastinen and Hudson, 2004; Stamatoyannopoulos, 2004]. Since expression levels from each of the two alleles can be directly compared within the same heterozygous individual and within the same experiment, variation in trans-regulatory elements and in environmental, physiological and experimental effects is not expected to affect the ratio between the two alleles. 
Therefore, if AI is observed between the two alleles of a coding SNP (and if genomic imprinting can be ruled out [Tycko and Morison, 2002]), cis-regulatory variation co-segregating with the marker (or the marker itself) is likely to exist. This approach was recently extended to the genome-wide scale, leading to the finding that ~20–54% of human genes showed AI [Lo, et al., 2003; Pant, et al., 2006; Serre, et al., 2008].
Multiple approaches have been used to detect AI, including SNaPShot [Bray, et al., 2003a; Bray, et al., 2003b; Fukuda, et al., 2006; He, et al., 2005; Johnson, et al., 2008; Liu, et al., 2005; Valle, et al., 2008; Wilkins, et al., 2007; Zhang, et al., 2005], sequencing [Fukuda, et al., 2006; Tuupanen, et al., 2008], pyro-sequencing [Loeuillet, et al., 2007; Mahr, et al., 2006; Wang and Elbein, 2007], iPLEX [Ding, et al., 2004], RNA difference plot [Murakami, et al., 2004], band intensity in gel [Heighway, et al., 2005; Hirota, et al., 2004] and other technologies [Jordheim, et al., 2008; Ware, et al., 2006]. However, since all these methods are end-point readings of PCR, they may not be accurate in quantifying the ratio between the alleles. Microarrays can provide a parallel reading for multiple genes or even on a genome scale with or without amplification [Cheung, et al., 2008; Lo, et al., 2003; Milani, et al., 2007; Pant, et al., 2006; Serre, et al., 2008]. However, the current array technologies may not be sensitive enough to detect all AI accurately, especially if the differences between the two alleles are small to moderate. Real time PCR using Taqman probes to distinguish the two alleles can provide a higher sensitivity [Chen, et al., 2008; Udler, et al., 2007].
In this study, we use AI to identify cis-regulatory variation for the UGT2B15 gene, which codes for a UDP-glucuronosyltransferase (UGT) enzyme. Glucuronidation is an important clearance pathway for numerous endobiotics and xenobiotics, including steroid hormones, bile acid, carcinogens and clinical drugs [Guillemette, 2003; King, et al., 2000; Mackenzie, et al., 2005; Tukey and Strassburg, 2000]. Among the UGT enzymes, UGT2B15 [Chen, et al., 1993; Turgeon, et al., 2000; MIM# 600069] is of particular significance due to its relatively high expression level [Ohno and Nakajin, 2009] and activity [Turgeon, et al., 2001]. The gene coding for this enzyme is mainly expressed in liver, breast, prostate, and colon [Gardner-Stephen and Mackenzie, 2008; Levesque, et al., 1997; Nakamura, et al., 2008b; Ohno and Nakajin, 2009]. In the first exon of this gene, a SNP rs1902023:G>T (at 253 relative to translation start), which causes an amino acid substitution D85Y [Levesque, et al., 1997], has been identified to be common in Caucasian and Asian populations [Iida, et al., 2002; Lampe, et al., 2000] and used to characterize two major haplotypes, UGT2B15*1 and *2 [Levesque, et al., 1997]. The D allele was shown to have similar substrate specificity, but half the enzyme activity compared to the Y allele [Levesque, et al., 1997], and has been proposed to correlate with oxazepam glucuronidation [Court, et al., 2004], male fat mass [Swanson, et al., 2007], and breast [Sparks, et al., 2004] and prostate cancer risk [Hajdinjak and Zagradisnik, 2004; MacLeod, et al., 2000; Park, et al., 2004], although some of these associations could not be replicated [Gsur, et al., 2002; Wegman, et al., 2007]. However, the allele specific expression levels of this gene have never been examined.
In the present study, we found a relative excess of the G allele at the coding variant rs1902023:G>T in liver but not in breast samples, which suggested the presence of cis-regulatory variation in linkage disequilibrium (LD) with this SNP. Re-sequencing of the UGT2B15 exon 1 and promoter regions identified 7 SNPs in near-perfect LD with rs1902023:G>T. By comparing different haplotypes in reporter gene promoter assays, we identified two SNPs that affect gene expression. One of these SNPs was also found to lie within a nuclear factor erythroid derived 2-like 2 (Nrf2) binding site based on a chromatin immunoprecipitation assay. Our results represent an example of how AI may lead to the discovery of new regulatory elements and provide insights into the tissue specific regulation of UGT2B15 expression.
Thirty-one normal liver (3 European American [CA] and 1 African American [AA], 27 unknown) and 81 normal breast (4 CA and 8 AA, 69 unknown) tissue samples were obtained from the University of Chicago Tissue Core Facility; none of the liver and breast samples were from the same individual. RNA and DNA were isolated with the RNeasy Lipid Tissue and QIAamp DNA Mini Kits (Qiagen, USA), respectively. cDNA was synthesized with the High Capacity Reverse Transcription Kit (Applied Biosystems, USA).
The rs1902023:G>T genotype was determined with Taqman assay C_27028164_10 (Applied Biosystems) according to the manufacturer’s protocol. Among all tissues, 15 liver and 49 breast samples were heterozygous for rs1902023:G>T and were included in the AI analysis. Of these, 7 liver and 2 breast samples were from males.
The UGT2B17 copy number variation (CNV) genotype was determined from the ratio between UGT2B17 and peripheral myelin protein 22 (PMP22), a gene that encodes a membrane protein in the nervous system and does not show CNV. In detail, UGT2B17 was quantified by real time PCR with Power SYBR Green (Applied Biosystems) and primer pair 5′-AAAACAGGAAAGAAGAAGAAAAGGG-3′ and 5′-AAAGGAGGAGTCCCATCTTTTG-3′. The primer pair for PMP22 was 5′-CCCTTCTCAGCGGTGTCATC-3′ and 5′-ACAGACCGTCTGGGCGC-3′. For each sample, around 10 ng genomic DNA (gDNA) was used per reaction. For normalization, one DNA sample without UGT2B17 deletion was serially diluted to 60, 30, 15, 7.5, 3.75, 1.88, and 0.94 ng per reaction and used as the standard for both genes. Three HapMap samples representing three different genotypes [McCarroll, et al., 2006] were included as positive controls. All real time PCR and Taqman genotype readings in this study were performed on a StepOne Plus Realtime PCR System (Applied Biosystems).
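The copy number call implied by this ratio can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the assumption that the non-deleted standard carries two copies, and the nearest-integer calling rule are all hypothetical.

```python
def call_ugt2b17_copies(ugt2b17_qty, pmp22_qty, reference_copies=2):
    """Estimate UGT2B17 copy number from standard-curve quantities.

    Both quantities are assumed to be read off a standard curve built from
    a sample without the deletion (assumed here to carry two copies), so a
    UGT2B17/PMP22 ratio near 1 means two copies, near 0.5 one copy, and
    near 0 a homozygous deletion.
    """
    if pmp22_qty <= 0:
        raise ValueError("PMP22 quantity must be positive")
    ratio = ugt2b17_qty / pmp22_qty
    # Hypothetical calling rule: round to the nearest integer copy number
    # and clamp to the range 0..reference_copies.
    copies = round(ratio * reference_copies)
    return max(0, min(copies, reference_copies))
```

In practice the three HapMap positive controls would be used to check that the measured ratios cluster cleanly around 0, 0.5, and 1 before any such rule is applied.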
Allele specific expression was measured in cDNA from heterozygous individuals using the above Taqman assay. The two Taqman probes, each specific for one allele, were labeled with different dyes, and fluorescence was detected by real time PCR. To investigate whether this assay could detect the allele difference accurately, gDNA from two homozygous individuals was mixed at the following ratios: 4:1, 2:1, 1:1, 1:2, and 1:4, and the two alleles in the mixes were quantified by the same method. The gDNA contamination in each cDNA sample was evaluated by SYBR Green-based real time PCR with an intron assay (primer pair 5′-AGGTCTGCAACACAGCACAT-3′ and 5′-CTGGGAGCTATGCCTGGTC-3′). For normalization, a heterozygous genomic DNA sample was serially diluted to 100, 20, 4, 0.8, 0.16, and 0.03 ng per reaction and used as the standard for both assays. Real time PCR was performed in triplicate for each sample and the AI ratio was expressed as Tquantity/Gquantity.
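The quantification step can be illustrated with a short sketch. It assumes the usual standard-curve approach, in which the threshold cycle (Ct) is linear in log10 of the input quantity; the function names and the synthetic Ct values are invented for illustration and are not taken from the study.

```python
import math

def fit_standard_curve(quantities_ng, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to get a quantity from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

def allelic_ratio(ct_t, ct_g, curve_t, curve_g):
    """AI ratio Tquantity/Gquantity, each allele read off its own curve."""
    return quantity_from_ct(ct_t, *curve_t) / quantity_from_ct(ct_g, *curve_g)

# Dilution series from the text; the Ct values are synthetic (hypothetical).
standards_ng = [100, 20, 4, 0.8, 0.16, 0.03]
synthetic_ct = [30.0 - 3.32 * math.log10(q) for q in standards_ng]
curve = fit_standard_curve(standards_ng, synthetic_ct)
```

Because the standard is heterozygous gDNA, each dilution contains equal amounts of the two alleles, so one dilution series can calibrate both probes.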
Fifty-six unrelated HapMap samples (24 YRI, 22 CEU and 10 ASN) were chosen for resequencing. Amplification of UGT2B15 exon 1 and promoter was performed using the primers in Supp. Table S1. After exonuclease I and shrimp alkaline phosphatase (United States Biochemicals, USA) treatment, sequencing was performed using internal primers in Supp. Table S1 and BigDye Terminator v3.1 (Applied Biosystems). In total, 3.9 kb were amplified and resequenced. Polymorphisms were scored by PolyPhred [Stephens, et al., 2006] and confirmed visually. Visual genotype and LD plots were drawn using the Genome Variation Server (http://gvs.gs.washington.edu/GVS/) and haplotypes were inferred using the program PHASE [Stephens and Donnelly, 2003].
A 1.3 kb fragment (from positions −1322 to −12 relative to translation start site) was amplified by nested PCR from individuals with specific haplotypes. The first round of PCR was performed by using PCR primers in Supp. Table S1 and the second round by using primers 5′-CAGTC-GGTACC-TACCTGGATGGCCTATTTCT-3′ and 5′-CAGTC-GCTAGC-GCAATGCTTCTTTTCCAGTT-3′, which introduced restriction sites for KpnI and NheI (New England Biolabs, USA), respectively. PCR was performed by iProof High-Fidelity DNA Polymerase (Bio-Rad, USA) to avoid artificial mutations. After digestion, the segment was cloned into the compatible sites of the pGL3-basic vector (Promega, USA). Mutagenesis at positions −818 and −1139 was performed by QuikChange Site-Directed Mutagenesis Kit (Stratagene, USA) with primer 5′-CTGCAGGAGCAGTACTCTTCCTGCAGAGGG-3′ and 5′-CCCTCTGCAGGAAGAGTACTGCTCCTGCAG-3′, 5′-CAAGCCTTCAGGTCCTGAGGAGAATCTTTGAACCC-3′ and 5′-GGGTTCAAAGATTCTCCTCAGGACCTGAAGGCTTG-3′ (target in bold), respectively, according to the manufacturer’s recommendations. Plasmid DNA was sequenced to exclude any PCR errors and to check the orientation of the haplotypes prior to transfection.
Human hepatocellular carcinoma cell line HepG2 was obtained from American Type Culture Collection (ATCC) and cultured in minimum essential medium (MEM, ATCC) supplemented with 10% fetal bovine serum (FBS, Invitrogen, USA). Human breast cancer cell line MCF-7 was cultured in Dulbecco’s Modified Eagle’s Medium (DMEM; Invitrogen) with 10% FBS and 0.1% insulin (Sigma, Germany). Human prostate adenocarcinoma cell line LNCaP and colon epithelial cell line Caco-2 were maintained in DMEM with 10% FBS. Cells (10⁵) were seeded into 24-well plates 24 hours before transfection. Plasmid constructs (1.9 μg DNA) were transfected using FuGene HD (Roche, USA) according to the manufacturer’s recommendations. Thirty-six hours after transfection, cells were harvested and luciferase activity was read with the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer’s protocol. Plasmid pRL-TK (0.1 μg DNA; Promega) was co-transfected as an internal control and the promoter activity was expressed as the ratio between Firefly and Renilla luciferase. Six replicates were performed for each experiment. Independent t-tests were performed in SPSS 15.0 (SPSS Inc, USA) and the null hypothesis was rejected when P<0.05.
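The normalization described above amounts to a per-well Firefly/Renilla ratio followed by a comparison of construct means. A minimal sketch, with hypothetical function names and no real data:

```python
from statistics import mean

def normalized_activity(firefly, renilla):
    """Per-well promoter activity: Firefly signal divided by Renilla signal."""
    if len(firefly) != len(renilla):
        raise ValueError("each well needs both a Firefly and a Renilla reading")
    return [f / r for f, r in zip(firefly, renilla)]

def relative_activity(test_wells, reference_wells):
    """Mean normalized activity of a construct relative to a reference construct."""
    return mean(test_wells) / mean(reference_wells)
```

A value of about 1.2 from `relative_activity` would correspond to a construct with roughly 20% higher promoter activity than the reference; significance would still need to be assessed across replicate wells, for example with the independent t-test mentioned above.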
Chromatin immunoprecipitation (ChIP) was carried out with the ChIP Assay Kit (Upstate, USA) according to the manufacturer’s protocol. Briefly, 10⁷ cells were cross-linked for 20 minutes with formaldehyde (1% final concentration) followed by addition of glycine for 5 minutes to end the cross-linking. After washing twice with ice-cold phosphate buffered saline (Invitrogen) containing protease inhibitor cocktail (Sigma) and phenylmethylsulfonyl fluoride (PMSF, Fisher, USA), cells were scraped, lysed and sonicated to obtain 200–800 bp fragments in a Sonicator 4000 (MISONIX, USA). The chromatin solution was diluted 10-fold with dilution buffer, and precleared with protein A beads. After centrifuging and transferring the supernatant, 1% of the sample was stored as input and the remaining protein/chromatin complex was captured by rabbit polyclonal anti-Nrf2 antibody (H300, Santa Cruz Biotechnology, USA) or normal rabbit IgG (Santa Cruz) and precipitated by protein A beads. After washing with low salt, high salt, LiCl and TE buffer (twice), the immunoprecipitated protein/chromatin complex was resuspended in elution buffer and cross-links were reversed. Protein was digested by proteinase K (Qiagen) and DNA was recovered with the QIAquick PCR purification kit (Qiagen). The recovered DNA was quantified by real time PCR with iQ SYBR green (Bio-Rad) and primers 5′-CTGCAAGAACAGACCAGCAA-3′ and 5′-CTTCTCCAGTGGGGATGTGT-3′ to assess the enrichment.
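The text does not state how enrichment was computed from the real time PCR readings. The sketch below shows two common conventions, fold enrichment over IgG and percent of input, under the assumption of 100% PCR efficiency (a doubling per cycle); the function names and this choice of method are assumptions.

```python
import math

def fold_enrichment(ct_ip, ct_igg):
    """Enrichment of the antibody pull-down over the IgG control.

    Assumes perfect doubling per cycle, so a Ct that is 3 cycles lower
    than the IgG control corresponds to 8-fold enrichment.
    """
    return 2.0 ** (ct_igg - ct_ip)

def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Immunoprecipitated DNA as a percentage of the starting chromatin.

    The stored input is only a fraction (1% in the protocol above) of the
    chromatin, so its Ct is first adjusted to represent 100% of the input.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)
```

With these definitions, a pull-down that reaches threshold at the same cycle as the 1% input sample corresponds to 1% of input.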
To make sure our AI assay was accurate, we first quantified the two alleles at SNP rs1902023:G>T in gDNA mixes with known ratios. As shown in Supp. Fig. S1, the observed ratio presented a highly linear relation with the expected one (r²=0.994, P<10⁻²⁰), thus confirming that our assay could be used to differentiate two alleles, at least in the 0.25–4 ratio interval.
Because several paralogous genes exist, we could not place the PCR primers in different exons. Therefore, we cannot rule out some gDNA contamination in the PCR product. To evaluate the influence of gDNA on the AI assay, we also quantified gDNA contamination by real time PCR in all tissue samples. In most (>80%) liver and breast samples, gDNA only accounted for <5% and <20% of total UGT2B15 exon 1 copies, respectively, consistent with the fact that UGT2B15 is expressed at relatively high levels in liver and breast [Ohno and Nakajin, 2009]. We also calculated the AI by the formula (Tquantity−gDNA)/(Gquantity−gDNA), which was highly correlated with the result by the formula Tquantity/Gquantity in both liver (r²=0.996, P<10⁻¹⁵) and breast (r²=0.814, P<10⁻¹⁸, data not shown). These results suggest that the gDNA contamination is low and, more importantly, that such a contamination is conservative in a test of AI. Considering this and the fact that including more parameters might introduce additional noise, we decided to use Tquantity/Gquantity to report our AI results.
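Why gDNA contamination is conservative can be seen from a small worked example. In a heterozygote, genomic DNA contributes equally to both allele signals, so it pulls the observed ratio toward 1. The sketch below assumes that the gDNA term in the formula is the amount added to each allele's signal; the function name and numbers are illustrative only.

```python
def corrected_ratio(t_qty, g_qty, gdna_qty):
    """AI ratio after subtracting the gDNA contribution from each allele."""
    if g_qty <= gdna_qty:
        raise ValueError("G quantity must exceed the gDNA contribution")
    return (t_qty - gdna_qty) / (g_qty - gdna_qty)

# Hypothetical sample: cDNA-derived signals of 8 (T) and 10 (G) units,
# plus 0.5 units of gDNA signal added equally to each allele.
observed_t, observed_g, gdna = 8.5, 10.5, 0.5
uncorrected = observed_t / observed_g                     # closer to 1
corrected = corrected_ratio(observed_t, observed_g, gdna)  # cDNA-only ratio
```

Here the uncorrected ratio lies between the corrected ratio and 1, so reporting Tquantity/Gquantity can only understate, never exaggerate, the imbalance.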
As shown in Fig. 1A, all 15 liver samples, which were chosen to be heterozygous for rs1902023:G>T variant, showed an excess of the G over the T allele, with a significant deviation from a 1:1 ratio (Mean±SD, 0.81±0.15; 95% confidence interval 0.73–0.90; P<0.001). On average, the proportion of G allele is 11–36% greater than that for the T allele. This suggests that the G allele of rs1902023 or one in near-perfect LD with it results in higher UGT2B15 expression levels in the liver. In contrast, in the 49 breast samples heterozygous at this SNP (Fig. 1B), the T/G value varies from 0.63 to 1.19. Fifteen of them showed a T/G value greater than 1 while the rest did not. As a consequence, the average T/G value is only slightly lower than 1 (Mean±SD, 0.95±0.16; 95% confidence interval 0.91–1.01; P=0.06). Moreover, when compared to the liver samples, the distribution of the T/G value is quite different (P=0.005), suggesting a distinct mechanism of gene regulation between these two tissues.
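The deviation from a 1:1 ratio can be checked from the summary statistics alone. The sketch below assumes a one-sample t-test against a ratio of 1, which the text does not state explicitly, and takes the two-sided 95% critical value for 14 degrees of freedom (about 2.145) from a standard t table; the function names are hypothetical.

```python
import math

def one_sample_t(mean, sd, n, null_value=1.0):
    """t statistic for H0: population mean equals null_value."""
    standard_error = sd / math.sqrt(n)
    return (mean - null_value) / standard_error

def confidence_interval(mean, sd, n, t_critical):
    """Two-sided confidence interval for the mean, given a t critical value."""
    margin = t_critical * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Liver summary values from the text: mean 0.81, SD 0.15, n = 15.
liver_t = one_sample_t(0.81, 0.15, 15)
liver_ci = confidence_interval(0.81, 0.15, 15, t_critical=2.145)
```

Because the inputs are rounded summary values, the interval recomputed this way may differ slightly from the published one.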
Recent studies have suggested that the UGT2B17 CNV genotype affects UGT2B15 expression [Jakobsson, et al., 2008]. To evaluate whether this CNV has an effect on UGT2B15 AI, we also genotyped the UGT2B17 deletion and performed linear regression analysis. One, 7, and 7 liver samples and 4, 19, and 26 breast samples were homozygous for the deletion, heterozygous, and homozygous for the non-deleted allele, respectively. No correlation between the allele ratio in the AI assay and CNV genotype was observed in either liver (r²=0.008, P=0.747) or breast (r²=0.014, P=0.426, data not shown), which suggests that any effect of the UGT2B17 deletion is on the total UGT2B15 mRNA level rather than on the allelic ratio.
Recent studies also proposed that estrogen regulates UGT2B15 expression [Harrington, et al., 2006]. Therefore, gender might potentially influence UGT2B15 AI. To investigate this possibility, we compared AI between male and female, but no significant difference was observed in either liver (P=0.851) or breast (P=0.871, data not shown), which suggests that gender is not likely to influence AI.
To identify the potential cis-regulatory SNPs in LD with rs1902023:G>T, we resequenced the promoter and exon 1 of UGT2B15 in 56 HapMap samples from three major ethnic groups. In the current human genome build, there are two UGT2B15 genes, which resulted from misassembly of two different chromosomes [Xue, et al., 2008]; we took the first one as a reference in our survey. We identified 40 SNPs, most of which were not present in dbSNP (Fig. 2 and Supp. Table S2). Consistent with previous studies, SNP rs1902023:G>T has a high allele frequency: 42% in YRI, 45% in CEU, and 35% in ASN [Iida, et al., 2002; Lampe, et al., 2000]. Seven other SNPs in the promoter region, rs1580083:A>T (−506 relative to translation start), rs1120265:A>G (−508), rs34010522:G>T (−818), rs35513228:C>T (−1139), rs13112099:C>A (−1397), rs34027331:G>A (−1579), and rs7696472:T>C (−1844), showed near-perfect LD with rs1902023:G>T (r²=0.92 in YRI and r²=1 in CEU and ASN; Fig. 2) and form two major and distinct promoter haplotypes (result not shown). These haplotypes will be referred to as type 1 (−506A, −508A, −818G, −1139C, −1397C, −1579G, −1844T) and type 2 (−506T, −508G, −818T, −1139T, −1397A, −1579A, −1844C) and correspond to the previously reported UGT2B15*1 and *2 haplotypes [Levesque, et al., 1997], respectively. Considering our AI result, we hypothesized that these two promoter types differ in activity and that type 1 has higher activity than type 2. An additional 7 SNPs showed intermediate frequency (>10%, Supp. Table S2), but relatively low LD (all r²<0.3) with the two major haplotypes.
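For readers unfamiliar with the LD statistic, r² between two biallelic SNPs can be computed from phased haplotypes as sketched below. The study itself used the Genome Variation Server and PHASE; this function and its input format are illustrative assumptions.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased two-SNP haplotypes.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs,
    one pair per chromosome.
    """
    n = len(haplotypes)
    # Use the alleles on the first chromosome as the reference alleles.
    a_ref, b_ref = haplotypes[0]
    p_a = sum(h[0] == a_ref for h in haplotypes) / n
    p_b = sum(h[1] == b_ref for h in haplotypes) / n
    p_ab = sum(h[0] == a_ref and h[1] == b_ref for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator
```

When two alleles always travel together on the same chromosome, as for the type 1 and type 2 promoter haplotypes in CEU and ASN, the statistic equals 1; recombinant chromosomes lower it.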
To test the hypothesis that the UGT2B15 promoter contains variants that influence expression levels, we constructed luciferase plasmids containing the two promoter types and transfected them into the HepG2 liver cell line. Previous studies had shown that the ~1.2 kb region upstream of the transcription start site of the UGT2B genes exhibited full promoter activity [Duguay, et al., 2004; Nakamura, et al., 2008a; Turgeon, et al., 2000]; therefore, this region was given higher priority in this study. As shown in Fig. 3A, pUGT2B15*1, cloned from the type 1 promoter, displayed ~20% higher activity than pUGT2B15*2 (P<0.001), which is in agreement with our AI result and indicates that at least one of the four SNPs in the cloned fragment influences UGT2B15 expression in HepG2 cells. We also transfected a rare recombinant haplotype, which will be referred to as pUGT2B15*r and is identical to type 1 at positions −1139 and −818 and to type 2 at positions −508 and −506 (Fig. 3A). The relative promoter activity of pUGT2B15*r is slightly, but not significantly, lower than that of pUGT2B15*1 (P=0.06). In contrast, pUGT2B15*r activity is significantly higher than that of pUGT2B15*2 (P<0.001). These results taken together suggest that the cis-regulatory SNPs are rs34010522:G>T and/or rs35513228:C>T. To distinguish between them, we generated two additional plasmids by site-directed mutagenesis, which differed from pUGT2B15*1 at positions −818 and −1139 and were named pUGT2B15*mg1 and *mg2, respectively. As shown in Fig. 3A, both these constructs had lower promoter activity (both P<0.001) compared with pUGT2B15*1, thus showing that a nucleotide substitution at either SNP has an effect similar to that of pUGT2B15*2 and suggesting that both of them influence expression in the liver.
We also transfected the same plasmids into MCF-7 cells, but no significant difference was observed between pUGT2B15*1 and *2 (P=0.544; Fig. 3B). To test whether rs34010522:G>T and rs35513228:C>T affect expression in breast, we also transfected pUGT2B15*mg1 and *mg2, but no significant difference was observed when compared with *1 (P>0.4 in both cases). This result suggests that these two variants work in a tissue specific way and do not alter promoter activity in breast cells.
To determine whether these two variants regulate UGT2B15 expression in other tissues, we also transfected pUGT2B15*1, *mg1, and *mg2 into LNCaP and Caco-2 cells, in which UGT2B15 is highly expressed [Nakamura, et al., 2008b]. As shown in Fig. 3C, pUGT2B15*mg1 did not show a significant difference compared with *1 (P=0.06) while *mg2 presented a ~14% lower promoter activity (P<0.001), which suggested that only rs35513228:C>T regulates gene expression in prostate cells. Caco-2 yielded a result similar to that of HepG2 (see Fig. 3D), in which pUGT2B15*mg1 and *mg2 displayed a ~18% (P=0.005) and ~21% (P=0.003) lower promoter activity than *1, respectively. Therefore, both variants can regulate UGT2B15 expression in colon cells. In summary, rs35513228:C>T can influence gene expression in liver, prostate, and colon cells while rs34010522:G>T works only in liver and colon cells.
Since the SNPs rs34010522:G>T and rs35513228:C>T were within the promoter region, it was reasonable to hypothesize that they lie within transcription factor binding sites and that they modify transcription factor binding affinity. We used Match [Goessling E, 2001] at the TRANSFAC database (http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi) to search for predicted binding sites for transcription factors and found that rs34010522:G>T resides within a canonical Nrf2 binding site. Interestingly, the paralogous region in the UGT2B7 gene (from position −990 to −858) was recently shown to bind Nrf2 and to play an important role in expression of UGT2B7 induced by oxidative stress [Nakamura, et al., 2008a]. To test whether Nrf2 binds the upstream region of UGT2B15, we performed ChIP assays in all four abovementioned cell lines using an anti-Nrf2 antibody and quantified the enrichment for the predicted Nrf2 binding site by real time PCR. As shown in Fig. 4, the anti-Nrf2 immunoprecipitated chromatin samples were significantly enriched for the region surrounding rs34010522:G>T in HepG2 (P=0.031), MCF-7 (P=0.009), LNCaP (P<0.001), and Caco-2 (P=0.011) cells compared with IgG, suggesting that Nrf2 binds this region in all four cell lines. However, the functional role of rs35513228:C>T remains unclear.
In this study, we used AI to guide our search for cis-regulatory variation of UGT2B15 expression. To achieve this goal, we first characterized the allele-specific expression of the rs1902023:G>T variant in both liver and breast tissues. We then identified a set of SNPs in the promoter region that are in near-perfect LD with the rs1902023:G>T coding variant. These variants and their haplotypes were tested in reporter gene and ChIP assays. The results show that variation in the promoter region affects gene expression in a tissue specific manner. We also show that a sequence element containing one of these variants binds the Nrf2 transcription factor in all tested cells. Further studies are necessary to determine whether these variants explain some of the inter-individual variation in UGT2B15 expression and, in turn, in the metabolism of steroid hormones and clinical drugs.
There are several reasons why our AI assay could have failed to detect the effects of cis-regulatory variation. First, because the assay can only be performed on individuals heterozygous for a coding SNP, the sample size is relatively small. Indeed, sample sizes in published AI studies are typically modest [Bray, et al., 2003b; Fukuda, et al., 2006; He, et al., 2005; Hirota, et al., 2004; Liu, et al., 2005; Loeuillet, et al., 2007; Wilkins, et al., 2007; Zhang, et al., 2005] and smaller than in standard association studies between genotype and total mRNA levels. Because small sample sizes reduce the statistical power to detect an imbalance, true cis-regulatory variants may be missed (i.e. a false negative result). Second, many factors, including diet, clinical drug use, disease status, age, and gender, may affect UGT2B15 gene expression and introduce several sources of inter-individual variation, in addition to genetic sources. Although AI is insensitive to factors that affect gene expression in trans, these factors will increase the experimental and biological variance and, therefore, may reduce the power of an AI assay. Despite these challenges, our AI assay detected a significant and consistent excess of the G over the T allele in liver.
Nonetheless, the significant AI in liver was considered only a starting point for further analyses. In this regard, the results of our reporter gene assays directly comparing the two alleles at promoter SNPs are consistent with and further substantiate the AI findings. Finally, the observation that one of the SNPs of interest lies within a validated binding site for Nrf2 suggests that this cis-regulatory variant affects expression by altering the binding of the transcription factor to the DNA.
AI in breast displayed a much wider distribution across samples, with the T/G ratio ranging from 0.63 to 1.19. The higher variance in breast compared to liver samples (Fig. 1B) may be due to non-genetic factors (e.g. diet, age, etc.), as discussed above, and/or to the low RNA amounts obtained from breast samples, raising the possibility that the lack of significant AI in breast is simply a false negative. To take into account the large technical variance, we performed a binomial test over all technical replicates and no significant result was obtained (P=0.993), thus suggesting that the T/G ratio in breast tissue follows a random pattern. Another possible explanation for the AI results in breast is variable degrees of contamination of adipose cells, where UGT2B15 is also expressed [Tchernof, et al., 1999], across the breast samples. However, since adipose tissue usually yields considerably less RNA than epithelial cells, this explanation seems unlikely (though it cannot be excluded). Finally, the difference in the AI results in breast and liver may be due to a tissue specific regulatory mechanism. This scenario is consistent with the fact that no significant difference was observed in the activity of two major promoter haplotypes in breast MCF-7 cells, suggesting that the inconsistent AI in breast samples is not due solely to technical issues. It is important to keep in mind that additional variants may regulate UGT2B15 gene expression in cis or in trans. Indeed, our resequencing survey identified a number of additional variants in the promoter region that are not in strong LD with rs1902023:G>T (e.g. −378T>C, −477G>A, −497T>C, rs62317005:C>T, and rs7682027:A>G). Further studies are necessary to determine if these variants also influence variation in UGT2B15 expression in liver and, especially, in breast.
The binding site prediction and the ChIP result suggest that the effect of the rs34010522:G>T variant is mediated through the interaction with Nrf2. Nrf2 is a ubiquitous transcription factor [Nguyen, et al., 2005] and has been shown to play an important role in basal and oxidative stress induced expression of phase 2 enzymes [Kwak, et al., 2001; Nakamura, et al., 2008a]. It activates transcription by binding specifically to the antioxidant-response element [ARE; Chan, et al., 2001] in promoter regions. Consistent with these findings, our results suggest that Nrf2 is involved in the regulation of UGT2B15 expression in liver and colon. It is still unclear why Nrf2 binds the region spanning position −818, but does not influence transcription in breast and prostate. A possible explanation is that Nrf2 induces gene expression by forming a complex with other cofactors, such as small Maf proteins and activating transcription factor 4 [ATF4; He, et al., 2001]. If these cofactors are not expressed in breast and prostate, Nrf2 could bind the DNA without inducing transcription. In this regard, it is interesting to note that heme oxygenase-1 (HO-1), whose expression is dependent on Nrf2, also showed different regulation between Hepa (mouse hepatoma) and MCF-7 cell lines [He, et al., 2001], similar to what we observe for UGT2B15. The tissue-specific regulation mediated by Nrf2 may reflect the different importance of oxidative stress in these four tissues. Indeed, a large part of the antioxidative response, such as metabolism of pollutants, chemicals and carcinogens, occurs in the liver and perhaps colon, which also interacts with multiple xenobiotics, but not in breast and prostate.
Recent studies have proposed that rs1902023:G>T predisposes to multiple human phenotypes, including oxazepam glucuronidation [Court, et al., 2004], male fat mass [Swanson, et al., 2007], and breast [Sparks, et al., 2004] and prostate cancer risk [Hajdinjak and Zagradisnik, 2004; MacLeod, et al., 2000; Park, et al., 2004]. Considering the nearly complete LD between rs1902023:G>T and the promoter variants identified in this study, it is possible that the reported associations between rs1902023:G>T and different phenotypes may be due to cis-regulatory in addition to coding variation. Further analyses are necessary to clarify this issue.
We are grateful to Dr. Ralf Kittler, Prof. Shutsung Liao, Prof. Marc B. Bissonnette and Dr. Eden V. Haverfield for providing MCF-7, LNCaP, Caco-2 cell line and UGT2B17 CNV genotyping assay, respectively. We thank Maria Tretiakova, Athma Pai, Dr. Francesca Luca, Ivy Aneas, and Dr. Dagan Loisel for technical advice. We also thank two anonymous reviewers for their helpful comments. This research was supported by NIH grants CA125183 and GM61393.