Single nucleotide polymorphisms (SNPs) associated with polygenic disorders, such as breast cancer (BC), can create, destroy or modify microRNA (miRNA) binding sites; however, the extent to which SNPs interfere with miRNA gene regulation and affect cancer susceptibility remains largely unknown. We hypothesize that disruption of miRNA target binding by SNPs is a widespread mechanism relevant to cancer susceptibility. To test this, we analyzed SNPs known to be associated with BC risk, in silico and in vitro, for their ability to modify miRNA binding sites and miRNA gene regulation; we refer to these as target SNPs. We identified rs1982073-TGFB1 and rs1799782-XRCC1 as target SNPs whose alleles could modulate gene expression through differential interaction with miR-187 and miR-138, respectively. Genome-wide bioinformatics analysis predicted approximately 64% of transcribed SNPs to be target SNPs that can modify (increase or decrease) the binding energy of putative miRNA::mRNA duplexes by over 90%. To assess whether target SNPs are implicated in BC susceptibility, we conducted a case-control population study and observed that the germline occurrence of rs799917-BRCA1 and rs334348-TGFBR1 varies significantly among populations with different risks of developing BC. Luciferase activity of target SNP allelic variants, and protein levels in cancer cell lines with different genotypes, showed differential regulation of the target genes following over-expression of the two interacting miRNAs (miR-638 and miR-628-5p). We therefore propose that transcribed target SNPs alter miRNA gene regulation, and consequently protein expression, contributing to cancer susceptibility through a novel mechanism of subtle gene regulation.
MiRNAs are a family of endogenous, short non-coding RNAs that modulate post-transcriptional gene regulation. They exert their regulatory role on protein-coding gene (PCG) expression by binding to either full or partial complementary sequences primarily in the 3′ untranslated region (UTR), but also inside the coding sequence (CDS) and the 5′ UTR, of the corresponding messenger RNAs (mRNAs). Eventually, this affects mRNA stability and translation (1–3). The role of miRNAs in human cancer pathogenesis has been well established by the identification of genetic alterations in miRNA loci, miRNA expression signatures that define different neoplastic phenotypes and numerous oncogenes and tumor suppressor genes as miRNA targets (4).
The discovery of germline variants that influence cancer susceptibility is currently the subject of intense research (5). The most common variations are single nucleotide polymorphisms (SNPs), which occur approximately once every 100 to 300 base pairs in the human genome and constitute an immense source of information for unraveling the heterogeneity of human cancer pathogenesis, clinical course and response to treatment. Several research groups have studied the association of cancer risk with SNPs located in candidate genes, and it has been estimated that around 50,000–250,000 SNPs, mostly located in or around the 25,000 PCGs, have a putative biological effect (6). SNPs can affect protein function by changing the amino acid sequence (non-synonymous SNPs) or by perturbing regulation (e.g. affecting promoter activity (7), the splicing process (8), or DNA and pre-mRNA conformation). When SNPs occur in 3′ UTRs, they may interfere with mRNA stability and translation by altering polyadenylation and protein::mRNA or miRNA::mRNA regulatory interactions. Recent studies have found 3′ UTR SNPs that affect PCG expression via miRNA gene regulation in different diseases (9–14). This experimental evidence led us to hypothesize that the disruption of miRNA target binding by SNPs located in the 5′ UTR, CDS or 3′ UTR is a widespread mechanism leading to cancer susceptibility and initiation. In support of our hypothesis, the Slack group recently showed that a KRAS variant within a let-7 target site increased the risk of non-small cell lung carcinoma among moderate smokers (15). Moreover, since the 3′ UTRs of PCGs have been insufficiently screened for mutations and polymorphisms, the extent of such abnormalities might be much greater than initially predicted.
Breast cancer (BC) is one of the most common female malignancies, with more than one million new cases diagnosed every year (16). It is widely accepted that the majority of BC risk might be specified by the combined product of multiple low-penetrance alleles (for review see (17)); however, the many variants identified so far cannot by themselves account for the ~80% of familial BC cases that are unrelated to high-penetrance BC susceptibility genes. The role in tumor susceptibility of polymorphic variants located in BC-relevant genes has been extensively addressed ((18, 19) and Supplemental Table S1). However, none of the reports investigating the pathogenetic implications of genomic variation for BC susceptibility has evaluated the role of SNPs in miRNA::mRNA gene regulation and its effect on BC development.
Here we analyzed SNPs associated with BC susceptibility for their ability to affect miRNA binding sites and miRNA::mRNA gene regulation. We identified SNP-dependent miRNA interactions that might explain the pathogenetic relevance of known BC-associated SNPs. We performed a genome-wide scan for SNPs inside PCG transcribed regions (3′ UTR, CDS and 5′ UTR) that could potentially alter miRNA binding (referred to as target SNPs). Finally, we illustrated the functional consequences of target SNPs for miRNA regulation of protein expression, and their possible clinical implications given the differences in their frequency distribution among familial BCs, sporadic BCs and controls.
We obtained mature human miRNA sequences (885 in total) from the NCBI build 36 database (version 13.0). We downloaded SNP data from HapMap (Release 21a). We retrieved 5′ UTR, 3′ UTR and CDS mRNA genomic locations from the UCSC (http://genome.ucsc.edu/) known gene table (human genome assembly NCBI build 36, hg18).
For each SNP, we retrieved two 51-nucleotide sequences, each centered on one allele with 25 nucleotides flanking both sides. We used the miRNA target prediction program miRanda (2) with two different cut-offs (score ≥80, MFE ≤ −16 kcal/mol; and score ≥50, MFE ≤ −5 kcal/mol) to calculate the minimum free energy (MFE) of all possible miRNA::SNP-centered sequence duplexes. For all predicted interactions, we computed the allele-dependent MFE changes. Based on the distribution of MFE changes, we identified the 20th and 80th percentile threshold values for miRNA::target SNP identification (Fig. 1). Based on these calculations, we classified SNP alleles as active or non-active, the active allele being the variant that decreases the MFE and therefore binds the miRNA more strongly.
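The window extraction and allele comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: MFE values are taken as given (in practice they would come from miRanda), and the percentage MFE change is assumed to be expressed relative to the more stable of the two duplexes, so that creation or destruction of a site (one allele with no predicted duplex, MFE taken as 0) scores 100%. All function names are hypothetical.

```python
from __future__ import annotations

FLANK = 25  # nucleotides on each side of the SNP -> 51-nt window


def snp_windows(transcript: str, pos: int, ref: str, alt: str,
                flank: int = FLANK) -> tuple[str, str]:
    """Return the two windows centred on the reference and alternative allele.

    `pos` is the 0-based position of the SNP in `transcript`.
    """
    if transcript[pos] != ref:
        raise ValueError("reference allele does not match the transcript")
    if pos < flank or pos + flank >= len(transcript):
        raise ValueError("SNP too close to a transcript end for a full window")
    left = transcript[pos - flank:pos]
    right = transcript[pos + 1:pos + 1 + flank]
    return left + ref + right, left + alt + right


def mfe_change_percent(mfe_a: float, mfe_b: float) -> float:
    """Allele-dependent MFE change as % of the more stable duplex (assumed definition).

    MFE values are in kcal/mol; 0.0 means no duplex was predicted for that allele.
    """
    strongest = min(mfe_a, mfe_b)  # most negative MFE = most stable duplex
    if strongest >= 0:
        return 0.0  # no duplex predicted for either allele
    return abs(mfe_a - mfe_b) / abs(strongest) * 100.0


def active_allele(allele_a: str, mfe_a: float,
                  allele_b: str, mfe_b: float) -> str | None:
    """The active allele is the one with the lower MFE (stronger miRNA binding)."""
    if mfe_a == mfe_b:
        return None  # no allele-dependent difference
    return allele_a if mfe_a < mfe_b else allele_b
```

For example, with hypothetical MFEs of −20 kcal/mol for a [C] allele and −15 kcal/mol for a [T] allele, `mfe_change_percent(-20.0, -15.0)` returns 25.0 and `active_allele("C", -20.0, "T", -15.0)` returns `"C"`.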
Blood samples from patients with familial and sporadic BC and from control subjects were collected, with their informed consent, at the Istituto Nazionale Tumori (INT), Milan, Italy. Total genomic DNA was purified from peripheral blood leukocytes using the QIAamp® DNA mini kit (Qiagen).
We performed a literature search of the MedLine/PubMed database (20), covering January 2006 to December 2008, to retrieve papers reporting association studies between SNPs and BC risk. We selected SNPs that were confirmed to be associated with BC risk in one or more papers and/or in large-scale genome-wide association studies. Supplemental Table S1 shows the initial list of SNPs used for the subsequent miRanda analysis.
PCR amplification of the transcribed SNP-containing regions was performed on genomic DNA (Platinum Taq High Fidelity; Invitrogen). Primers are available upon request. Sequencing was performed using BigDye Terminator Reaction Chemistry v3.1 on Applied Biosystems 3730 DNA analyzers (Applied Biosystems). All sequence analyses and alignments were performed with the SeqmanPro program, Lasergene version 7.1 (DNASTAR). PCR-amplified SNP-containing regions carrying either the active or the non-active allele were cloned via XbaI into the 3′ UTR of the luciferase gene in the pGL3-control vector (Promega).
All cell lines were obtained from ATCC. Cells were transfected using Lipofectamine 2000 (Invitrogen). Total protein extracts were prepared in 0.5% NP40 lysis buffer. For immunoblotting, protein extracts were separated on 4% to 20% SDS-PAGE gels (Criterion Precast Gel; Bio-Rad) and transferred to nitrocellulose membranes (Bio-Rad). Antibodies against TGFB, TGFBR1 and XRCC1 were from Cell Signaling; against BRCA1 from Calbiochem (clone MS110); and against Vinculin from Santa Cruz (clone N-19). Bands were quantified using GelDoc XR software (Bio-Rad).
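Band quantification is typically followed by normalization to a loading control before miRNA-transfected and control lanes are compared. The sketch below assumes, for illustration only, that each target band (e.g. BRCA1 or TGFBR1) is divided by the Vinculin band from the same lane and that reductions are expressed relative to the scrambled-control lane; the exact normalization behind the reported percentages is not stated in this section, and the function names and intensities are hypothetical.

```python
from __future__ import annotations


def normalized_level(target_intensity: float, loading_intensity: float) -> float:
    """Target band intensity divided by the loading-control band of the same lane."""
    if loading_intensity <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target_intensity / loading_intensity


def percent_reduction(mirna_lane: tuple[float, float],
                      control_lane: tuple[float, float]) -> float:
    """% reduction of the normalized level after miRNA vs. scrambled control.

    Each lane is a (target_intensity, loading_intensity) pair; a negative
    result means the protein level increased (e.g. a stabilizing miRNA).
    """
    treated = normalized_level(*mirna_lane)
    control = normalized_level(*control_lane)
    return (1.0 - treated / control) * 100.0
```

With hypothetical intensities, `percent_reduction((200.0, 1000.0), (1000.0, 1000.0))` is approximately 80, while a miRNA lane with more target signal than the control yields a negative value, the pattern expected for a stabilizing interaction.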
MCF7 cells (200×10⁵ cells per well of a 24-well plate) were co-transfected with 0.4 µg of pGL3 (firefly luciferase), 0.08 µg of pRL-TK (Renilla luciferase) (Promega) and 50 nM of precursor miRNA molecules or scrambled negative control (Ambion). Thirty hours after transfection, cells were lysed in 100 µl of passive lysis buffer according to the dual-luciferase reporter assay protocol (Promega), and luciferase activity was measured with a Veritas luminometer (Turner BioSystems).
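Dual-luciferase readouts are commonly analysed by dividing the firefly signal by the Renilla signal in each well and expressing the miRNA condition relative to the scrambled control. A minimal sketch under that assumption follows; the function names and readings are hypothetical, and the statistical test behind the reported p values is not reproduced here.

```python
from __future__ import annotations

from statistics import mean


def normalized_ratios(firefly: list[float], renilla: list[float]) -> list[float]:
    """Per-well firefly/Renilla ratios, correcting for transfection efficiency."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired by well")
    return [f / r for f, r in zip(firefly, renilla)]


def relative_activity(mirna_ff: list[float], mirna_rl: list[float],
                      scr_ff: list[float], scr_rl: list[float]) -> float:
    """Mean normalized activity with the miRNA, relative to the scrambled control (= 1.0)."""
    return (mean(normalized_ratios(mirna_ff, mirna_rl))
            / mean(normalized_ratios(scr_ff, scr_rl)))


def percent_change(relative: float) -> float:
    """Negative values are decreases (repression); positive values are increases."""
    return (relative - 1.0) * 100.0
```

For instance, hypothetical miRNA wells with firefly readings of 600, 640 and 560 against Renilla readings of 1000, compared with scrambled wells at 1000/1000, give a relative activity of about 0.6, i.e. a change of about −40%.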
Fisher’s exact test was used to determine the association between SNPs and patient disease status. The Cochran-Armitage trend test was used to investigate whether there was a trend in the binomial proportions of cancer patients (familial + sporadic, familial, or sporadic, respectively) across the genotype levels of each SNP. Univariate and multivariate logistic regression models were fitted to evaluate the association of SNP genotype with patient disease status.
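As an illustration of the trend test named above, the sketch below computes the Cochran-Armitage statistic for a 2 × 3 table of cases and controls by genotype, assuming additive scores (0, 1 and 2 copies of the variant allele) and a two-sided normal approximation. The scores, function name and counts are our assumptions for illustration, not the study's code or data.

```python
from __future__ import annotations

import math


def cochran_armitage_trend(cases: list[int], controls: list[int],
                           scores: tuple[float, ...] = (0.0, 1.0, 2.0)) -> tuple[float, float]:
    """Return (z, two-sided p) for a linear trend in case proportion across genotypes."""
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have the same length")
    n = [r + s for r, s in zip(cases, controls)]  # subjects per genotype
    total = sum(n)
    p_bar = sum(cases) / total  # overall proportion of cases
    # T = sum_k t_k * (r_k - n_k * p_bar): observed minus expected cases, weighted by score
    t_stat = sum(t * (r - nk * p_bar) for t, r, nk in zip(scores, cases, n))
    sum_tn = sum(t * nk for t, nk in zip(scores, n))
    sum_t2n = sum(t * t * nk for t, nk in zip(scores, n))
    variance = p_bar * (1.0 - p_bar) * (sum_t2n - sum_tn ** 2 / total)
    if variance <= 0:
        raise ValueError("degenerate table: no variation in genotype or outcome")
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value
```

With hypothetical counts of 10, 20 and 30 cases and 30, 20 and 10 controls across the three genotypes, the statistic is z = √20 ≈ 4.47, a strong positive trend; identical case and control distributions give z = 0 and p = 1.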
Additional methods, including patient information, statistical analysis, RNA extraction, reverse transcription and real-time PCR, are available in Supplemental Materials.
To test our hypothesis that SNPs can affect BC susceptibility by altering miRNA::mRNA binding, we took advantage of genetic variants already known to be associated with BC (see Methods and Supplemental Table S1). For each of these SNPs, we retrieved two 51-nucleotide sequences, one centered on the common allele and the other centered on the variant allele. We scanned the retrieved sequences with miRanda (2) for miRNA::mRNA target sites and calculated the MFE changes induced by the allelic variants (Supplemental Table S2). We observed that the allelic variants can either increase or decrease the magnitude of the MFE of the corresponding RNA duplexes, leading to stronger or weaker miRNA::mRNA binding, respectively. This mechanism can also lead to the creation of a new binding site or the destruction of an existing one. We focused only on those miRNAs (included in Supplemental Table S2) that were expressed in BC samples and cell lines (Supplemental Fig. S1 and data not shown) and selected 3 candidate target SNPs for functional validation, for a total of 16 miRNA::SNP interactions (Supplemental Table S3). Two of the 16 interactions were negative controls, in which binding of the interacting miRNA was not modified by the SNP allelic variants. To verify whether the selected BC-associated SNPs can actually affect miRNA binding, we performed in vitro luciferase reporter assays. For each SNP, we produced two pGL3-SNP constructs (Fig. 2b), carrying either the active or the non-active allele, and co-transfected them in parallel with the predicted interacting miRNAs or the scrambled negative control. rs1982073 inside TGFB1 and rs1799782 inside XRCC1 (Table 1) showed a statistically significant effect on miR-187 and miR-138 activity, respectively (Fig. 2a and Fig. 2c).
The TGFB1 rs1982073 [C] active allele was predicted by miRanda to create a new target site for miR-187, which was absent with the more common [T] variant when stringent settings were used. In the luciferase assay, miR-187 had a statistically significant suppressive effect on the construct carrying the [C] active allele, an effect that was completely absent with the [T] non-active allele (40% decrease in luciferase activity, p=0.006) (Fig. 2c, left panel). This is in agreement with the main body of evidence assigning an inhibitory role to miRNAs in gene expression. Conversely, according to the miRanda prediction, the rs1799782-XRCC1 [T] allele was the active allele, increasing the binding energy of miR-138 compared with the [C] variant. When miR-138 was assayed with the pGL3-rs1799782-XRCC1 constructs, we observed a significant increase in luciferase activity with the [T] allele compared with the [C] allele (34% increase, p=0.0003) (Fig. 2c, right panel). This indicates a stabilizing, rather than inhibitory, role for the stronger binding of miR-138 to the rs1799782-XRCC1 [T] variant.
Subsequently, we tested the effect of these two target SNPs on endogenous TGFB1 and XRCC1 protein levels by transfecting the interacting miRNAs into cancer cell lines carrying different SNP genotypes. Consistent with the luciferase results, miR-187 preferentially down-regulated TGFB1 protein levels in cells carrying the [CC] rs1982073 active genotype, whereas it had an intermediate effect on the [TC] genotype and an opposite effect on the [TT] genotype (Fig. 2d, left panel). Similarly, miR-138 showed a stabilizing effect when over-expressed in a cell line heterozygous for rs1799782-XRCC1: while [CC] carriers displayed decreased XRCC1 protein levels, a cell line carrying the [CT] genotype displayed increased XRCC1 levels, in agreement with the luciferase results (Fig. 2d, right panel). Among the 25 cancer cell lines we screened, we did not find any rs1799782-XRCC1 [TT] homozygous carrier, consistent with the notion that [TT] carriers are protected from developing cancer (21). Taken together, these results suggest that for approximately 15% of the investigated transcribed SNPs (2 of the 14 target SNP interactions predicted to be allele-dependent and tested in our analysis; Fig. 2, Supplemental Table S3 and data not shown), the known association with BC may be biologically due to interference with miRNA binding and therefore with miRNA gene regulation. That only one seventh of the target SNP interactions could be biologically confirmed is not surprising, given the high false-positive rate of all miRNA target prediction programs. Although miRanda is known to produce more false-positive predictions than programs that favor seed complementarity or inter-species target conservation, it has higher sensitivity. We therefore chose miRanda to identify non-canonical miRNA targets (i.e. with low seed complementarity), such as rs1982073-TGFB1 and rs1799782-XRCC1, that would otherwise have been overlooked by prediction programs that consider only the 3′ UTR and require extensive 5′ complementarity (data not shown). Overall, among our in silico predictions, we validated target SNP interactions with a wide range of predicted MFE changes (from 8% up to 42%), with no direct correlation between MFE change and the probability of a true miRNA::target SNP interaction. Therefore, for subsequent analyses, we retained predicted target interactions with an MFE change of at least 8%, the lowest value at which our in vitro experiments detected a significant biological effect. Moreover, at this cut-off, the genome-wide target SNP analysis (see below) showed a clear peak in the MFE change distribution, further supporting a possible biological significance for MFE changes greater than 8% (Supplemental Fig. S2).
Of the 3,839,363 SNPs in the HapMap database (Release 21a), we identified 90,985 that are located in mRNA regions (i.e. transcribed SNPs) and grouped them by genomic location (5′ UTR, CDS or 3′ UTR). Through an integrated bioinformatics approach that includes miRanda predictions (2) (Fig. 1 and Methods), we identified candidate target SNPs that can potentially create, destroy or modify miRNA binding sites through allele-specific MFE changes. Supplemental Tables S4, S5 and S6 present the top 100 allele-specific miRNA::SNP MFE changes for SNPs located in the 3′ UTR, CDS and 5′ UTR, respectively (additional target SNP interactions are available upon request). We observed a highly similar bimodal distribution of MFE change for the 3′ UTR, 5′ UTR and CDS (Supplemental Fig. S2) and considered all SNPs inducing an MFE change of at least 8% as target SNPs. Using this strategy (Fig. 1) and stringent miRanda settings (see Methods), we found that 41%, 6% and 53% of target SNPs were located in the 5′ UTR, CDS and 3′ UTR, respectively. In the MFE change distribution of all transcribed SNPs, the 2.5th and 97.5th percentiles corresponded to an MFE change of at least 94%. Using this 94% cut-off, we found that approximately 64% of transcribed SNPs are predicted to disrupt (increase or decrease) the MFE of putative miRNA::mRNA duplexes. In summary, through this genome-wide bioinformatics approach, we identified a set of putative target SNPs that modify the MFE of miRNA::mRNA binding and may ultimately influence the interaction and regulatory function of miRNAs with their targets.
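The percentile-based cut-off and the per-SNP count described above can be sketched as follows. This assumes that each SNP contributes one signed MFE change per predicted miRNA interaction, that the 2.5th and 97.5th percentiles of the pooled distribution define the two tails, and that a SNP counts as disruptive if any of its interactions falls in either tail; these are our assumptions about the procedure, and the function names and SNP identifiers are hypothetical.

```python
from __future__ import annotations

from statistics import quantiles


def tail_thresholds(signed_changes: list[float]) -> tuple[float, float]:
    """Return the 2.5th and 97.5th percentiles of signed MFE changes (in %)."""
    cuts = quantiles(signed_changes, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def fraction_of_snps_beyond(interactions, low: float, high: float) -> float:
    """Fraction of SNPs with at least one interaction at or beyond either threshold.

    `interactions` is an iterable of (snp_id, signed_change_percent) pairs.
    """
    all_snps: set[str] = set()
    disruptive: set[str] = set()
    for snp_id, change in interactions:
        all_snps.add(snp_id)
        if change <= low or change >= high:
            disruptive.add(snp_id)
    return len(disruptive) / len(all_snps) if all_snps else 0.0
```

Counting SNPs rather than interactions matters here: because one SNP can pair with many miRNAs, the fraction of SNPs with at least one tail interaction can be far larger than the 5% of interactions that lie in the tails.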
Additionally, we performed a similar analysis on the 17 SNPs that Saunders et al. (22) identified inside experimentally verified miRNA target sites, together with the corresponding interacting miRNAs (Supplemental Table S7). We observed that 78% of these SNPs could change the miRNA MFE by at least 8%. Therefore, variant alleles inside experimentally verified miRNA target sites can change the status of miRNA::mRNA interactions and affect gene expression, consistent with our genome-wide bioinformatics predictions.
We sequenced a panel of BC patients (166 sporadic cases and 169 familial BRCA1- and BRCA2-negative probands) and 186 controls (see Supplemental Materials for patient information) for the germline presence of selected target SNPs located in cancer-relevant genes (Table 2). Supplemental Table S8 summarizes the distribution of target SNP genotypes and their association with BC risk in Caucasian cases (sporadic and familial BC) and controls. Logistic regression analysis indicated that rs799917-BRCA1 [T] carriers were more likely to have BC (all cases vs. controls: univariate odds ratio (OR) for [CT] carriers 1.57, p=0.03, still significant in multivariate analysis with OR=1.86, p=0.046; univariate OR for [TT] carriers 1.95, p=0.03), in particular sporadic BC (sporadic cases vs. controls: univariate OR for [CT] carriers 1.98, p=0.007; OR for [TT] carriers 2.81, p=0.003) (Table 3). Furthermore, the analysis suggested that rs334348-TGFBR1 [AG] carriers were more likely to have BC (all cases vs. controls: univariate OR 1.69, p=0.048), in particular familial BC (familial vs. controls: univariate OR 2.2, p=0.005; familial vs. sporadic: univariate OR 1.99, p=0.01), and this association remained significant in the multivariate analysis (familial vs. controls: OR 2.67, p=0.002; familial vs. sporadic: OR 2.17, p=0.02) (Table 3). Despite the small size of the case-control population, this evidence suggests that specific target SNPs are differentially distributed among familial BCs, sporadic BCs and controls.
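For a single SNP compared between two groups, a univariate logistic regression with genotype as the only predictor gives the same odds ratio as the corresponding 2 × 2 table (e.g. [CT] carriers vs. the reference genotype, in cases vs. controls); its Wald p value, however, need not match an exact test. The sketch below computes that crude OR with a Woolf (log-based) 95% confidence interval, together with a two-sided Fisher's exact test as listed in the statistical methods. It is an illustrative stand-in rather than the study's analysis: function names and counts are hypothetical, and the multivariate ORs require a full regression model.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Crude OR and 95% CI for the table [[a, b], [c, d]].

    Rows: carriers, reference genotype; columns: cases, controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the log-OR standard error is undefined")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p: sum of table probabilities not larger than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability of the table with top-left cell x and fixed margins
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, `odds_ratio_woolf(10, 5, 5, 10)` gives an OR of 4.0, and `fisher_exact_two_sided(8, 2, 1, 5)` gives 280/8008 ≈ 0.035. The small relative tolerance keeps tables whose probability equals the observed one from being dropped by floating-point rounding.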
To validate the computational predictions and the biological relevance of target SNPs, we first carried out in vitro luciferase reporter assays. For the identified target SNPs located in BRCA1, TGFBR1 and XIAP (Table 2), the predicted interacting miRNAs significantly (p<0.05) affected pGL3-SNP luciferase activity (Fig. 3a and Supplemental Fig. S3). In all three cases, the miRNAs had a stronger repressive effect when interacting with the active allele. Next, we tested the effect of the two target SNPs found to be associated with BC, rs799917-BRCA1 and rs334348-TGFBR1, on endogenous BRCA1 and TGFBR1 protein levels by transfecting the interacting miRNAs (miR-638 and miR-628-5p) into cancer cell lines carrying different target SNP genotypes. Following miR-638 over-expression, 7 of 9 cell lines (those in which BRCA1 was detected clearly enough for correct band quantification) showed reduced BRCA1 protein levels (Fig. 3b, left panel, and data not shown). Sub-grouping the cell lines responsive to miR-638-induced BRCA1 reduction by rs799917 genotype showed that the [CC] genotype resulted in a stronger reduction of BRCA1 protein levels (61% reduction vs. 79%, p=0.047; Fig. 3c, left panel). Using luciferase and BRCA1 over-expression experiments, we also excluded the possibility that the miR-638-dependent BRCA1 reduction was due to a conserved target site in the BRCA1 3′ UTR predicted by TargetScan (Supplemental Fig. S4), further confirming the functional significance of the target SNP interaction site. Following a similar approach, we evaluated the effect of rs334348 on miR-628-5p regulation of TGFBR1. By over-expressing miR-628-5p in different cell lines, we observed that miR-628-5p behaves as a true repressor of TGFBR1 in a cell-specific manner (Fig. 3b, right panel, and data not shown), and that its repressor activity on TGFBR1 protein levels depends on the rs334348 variant inside the TGFBR1 3′ UTR miRNA target sequence (80% vs. 50% reduction with the [AA] and [CC] genotypes, respectively, p=0.006; Fig. 3c, right panel). The significant effects of miRNAs on protein levels in the presence of the active or non-active alleles (Fig. 2, Fig. 3 and Supplemental Fig. S3) support our hypothesis that genetic variants inside miRNA targets can disrupt miRNA gene regulation and affect protein expression, eventually leading to tumor susceptibility.
In the present study, we identified a new pathogenetic mechanism that may explain the association of certain SNPs with BC susceptibility, drawing on current knowledge of post-transcriptional gene regulation by miRNAs. The evidence from our integrative analysis of the effect of SNPs on miRNA binding delineates a novel role for SNPs in the modulation of gene expression. Supporting this theory, recent work suggested that differences in SNP allele frequency among ethnic groups account for differences in gene expression (23). Two of the 11 cis SNPs identified by Spielman et al. (23) were transcribed SNPs located in 3′ UTRs, and our bioinformatics analysis found that both could induce absolute MFE changes greater than 8% with multiple putative interacting miRNAs (Supplemental Table S9). We therefore speculate that cis SNPs modulate phenotypic diversity in gene expression, at least in part, by altering miRNA target binding, ultimately leading to differences in susceptibility to complex genetic diseases such as BC. Notably, among the target SNPs differentially distributed in BC, we found rs334348, located in the 3′ UTR of TGFBR1. This SNP was recently associated with germline allele-specific expression of TGFBR1, a trait shown to confer an increased risk of colorectal cancer (24). This phenotype could be explained, for example, by an altered miRNA interaction, and therefore supports our hypothesis that SNPs affecting miRNA function contribute to allele-specific protein expression and consequently play a role in tumor susceptibility. A recent study by Kim and Bartel (25), based on transcriptome analyses of heterozygous mouse strains, showed that polymorphic miRNA target sites determine differences in gene expression, which further supports our hypothesis.
The focus of this study is the effect of transcribed SNPs on miRNA::mRNA interactions, and thereby on the regulation of gene expression, with possible implications for tumorigenesis. We found that numerous target SNPs can effectively interfere with miRNA target recognition, and that the distribution of these SNPs varies significantly between BC and control populations, suggesting a potential role in BC susceptibility. It is interesting to note that the BC-associated rs1982073-TGFB1 variant had previously been linked with low TGFB1 expression during the initial phases of tumorigenesis, whereas higher expression levels were found in more advanced, metastatic BC stages (26, 27). This switch in TGFB1 expression could be due to concomitant changes in miR-187 expression levels and could be favored by carrying the [T] variant. Additionally, the described model, together with the pro- or anti-tumorigenic activities of TGFB1 in different phases of tumorigenesis, could explain the conflicting results observed for the association of rs1982073-TGFB1 with BC. Another interesting observation is that stronger binding between a miRNA and the active allele does not necessarily mean repression; in fact, as shown for rs1799782-XRCC1, miR-138 stabilizes XRCC1 expression in the presence of the active allele. This finding is in line with other evidence showing that miRNAs can positively modulate protein expression (28, 29). Furthermore, by searching complete PCG transcripts (5′ UTR, CDS and 3′ UTR) for miRNA target SNPs, we identified miR-638 as a regulator of BRCA1 via a target site inside the CDS (to our knowledge, the first miRNA reported to regulate BRCA1). Although TargetScan predicted a conserved interaction site in the BRCA1 3′ UTR, we found that this site is not functional under the conditions we tested.
Recently, Tay and coworkers also demonstrated the existence and abundance of miRNA target sites inside the CDS of PCGs (3), supporting the possibility that miR-638 regulates BRCA1 in a similar way. Although large case-control studies have not reached a unifying conclusion on the relevance to BC susceptibility of the miRNA target SNP rs799917 inside the BRCA1 CDS (30, 31), its association with tumor susceptibility was recently highlighted in a SNP analysis of DNA repair genes in glioblastoma multiforme samples (32). In the present study, we found that the rs799917 [T] allele is associated with a weaker miR-638-dependent BRCA1 reduction and is linked with BC (Table 3), although weaker repression of the tumor suppressor BRCA1 would be expected to protect against, rather than predispose to, BC. Further investigations are underway to explain this apparent inconsistency; we speculate that it is due to concomitant variations in miR-638 expression levels during tumor development. Moreover, we observed that low-frequency CDS mutations (both synonymous and non-synonymous; data not shown) might exert effects on miRNA gene regulation similar to those of target SNPs (33). This provides new insight into the importance of synonymous mutations, which should be investigated with renewed attention in the future.
Supported by the data presented here, we highlight the possibility that both genetic variants and miRNA expression patterns can account for gene expression differences among distinct populations. Therefore, the newly recognized class of target SNPs may contribute to cancer susceptibility not per se (or as tags for specific haplotypes (30, 31)), but in concert with miRNA expression patterns, which are becoming more readily available in this new era of miRNA profiling.
In summary, considering the multifactorial model of BC susceptibility, we propose that transcribed allelic variants can alter miRNA regulation of PCGs and thus add to miRNA abnormalities. These contribute to tumor susceptibility and general tumor risk through a mechanism of subtle gene regulation that is still not fully appreciated. As the lists of miRNAs and target SNPs keep growing, confirming the effect of target SNPs on gene expression becomes more pressing and will eventually provide deeper insight into the many variables of the molecular pathways relevant to the development of BC and other human malignancies.
G.A.C. is supported as a Fellow at The University of Texas M. D. Anderson Research Trust, as a Fellow of The University of Texas System Regents Research Scholar and by the Ladjevardian Regents Research Scholar Fund. This study was supported in part by a Breast Cancer SPORE Developmental Award to G.A.C. R.D. holds a Philadelphia Healthcare Trust Endowed Chair Position; research in his laboratory is supported by the Philadelphia Healthcare Trust, NHGRI/NIH and the American Cancer Society. We thank Gustavo Baldassarre for the kind gift of the BRCA1-expressing vector and Ellen Ko for English editing.