The significant mortality associated with breast cancer (BCa) suggests a need to improve current research strategies to identify new genes that predispose women to breast cancer. Differential allele-specific expression (DASE) has been shown to contribute to phenotypic variables in humans and recently to the pathogenesis of cancer. We previously reported that nonsense-mediated mRNA decay (NMD) could lead to DASE of BRCA1/2, which is associated with elevated susceptibility to breast cancer. In addition to truncation mutations, multiple genetic and epigenetic factors can contribute to DASE, and we propose that DASE is a functional index for cis-acting regulatory variants and pathogenic mutations, and that global analysis of DASE in breast cancer precursor tissues can be used to identify novel causative alleles for breast cancer susceptibility.
To test our hypothesis, we employed the Illumina® Omni1-Quad BeadChip in paired genomic DNA (gDNA) and double-stranded cDNA (ds-cDNA) samples prepared from eight BCa patient-derived normal mammary epithelial lines (HMEC). We filtered original array data according to heterozygous genotype calls and calculated DASE values using the Log ratio of cDNA allele intensity, which was normalized to the corresponding gDNA. We developed two statistical methods, SNP- and gene-based approaches, which allowed us to identify a list of 60 candidate DASE loci (DASE ≥ 2.00, P ≤ 0.01, FDR ≤ 0.05) by both methods. Ingenuity Pathway Analysis of DASE loci revealed one major breast cancer-relevant interaction network, which includes two known cancer causative genes, ZNF331 (DASE = 2.31, P = 0.0018, FDR = 0.040) and USP6 (DASE = 4.80, P = 0.0013, FDR = 0.013), and a breast cancer causative gene, DMBT1 (DASE=2.03, P = 0.0017, FDR = 0.014). Sequence analysis of a 5′ RACE product of DMBT1 demonstrated that rs2981745, a putative breast cancer risk locus, appears to be one of the causal variants leading to DASE in DMBT1.
Our study demonstrated for the first time that global DASE analysis is a powerful new approach to identify breast cancer risk allele(s).
Breast cancer is the most common cancer and the second most common cause of cancer-related death in women. It is estimated that one out of eight American women will develop breast cancer some time in their lifespan and 3.0% will die from this disease . For the year 2012, about 226,870 new invasive breast cancer diagnoses and 39,970 breast cancer related deaths are expected in the United States . Due to this high prevalence and severe consequences, genetic factors contributing to breast cancer risk have been intensively studied.
Family history is known to be associated with 20% - 30% of breast cancer incidence in the United States . Pedigree analysis of clustered familial cases followed by positional cloning in the 1990s led to the discovery of tumor suppressor genes, BRCA1 and BRCA2, two major breast cancer susceptibility loci. Deleterious mutations in these genes increase the risk of developing breast cancer by more than 10 fold and overall account for 15% - 30% of observed risks in familial breast cancer cases . To discover breast cancer susceptibility alleles that constitute the remainder of genetic risk, genes associated with BRCA1/2 pathways were investigated in BRCA1/2 mutation negative familial cases. Such candidate gene approaches revealed that germline mutations of TP53, PTEN, ATM, CHEK2, BRIP1, PALB2, NBS1 and RAD50 are correlated with breast cancer risk, but to a much more moderate extent than BRCA1 and BRCA2. Therefore, new unbiased genomic approaches are needed for identifying genetic factors that influence breast cancer susceptibility.
Over the last decade, advances in array technologies have resulted in the ability to evaluate the expression of thousands of genes simultaneously. These platforms offer a powerful tool to test multiple biomarkers for breast cancer tumorigenesis and prognosis, as well as targeted breast cancer therapy . However, gene expression assessed by current techniques represents the total level of transcripts produced by both parental alleles. The absolute transcript level failed to resolve potential imbalances in relative allelic contribution to the total expression. This perspective is particularly important for familial breast cancers, where an individual inherits a germline mutation on one parental allele, followed by a somatic mutation of the second allele in the tumor cells. Previously, we have reported that mutant BRCA1 transcripts containing premature stop codons were eliminated or destabilized by nonsense-mediated mRNA decay (NMD)  and could lead to a state of haploinsufficiency. As a result, the ratio between the expressions from the wild-type allele and the corresponding mutant allele was significantly increased, resulting in what we coined differential allele-specific expression (DASE) or allelic imbalance (AI) .
DASE is a common phenomenon in human tissues. Although its contribution to breast cancer susceptibility has been implicated, it has not been studied on a transcriptome-wide scale in breast cancer precursor tissues. Since DASE at a locus may help identify nearby cis-acting transcriptional and epigenetic regulatory sites as well as mutations resulting in nonsense-mediated mRNA decay [16,18], we propose that DASE is a sensitive functional index for genetic variants and can be used as a novel approach to identify risk alleles for breast tumorigenesis. The main objectives of this study are to identify genes with DASE by comparing allele-specific expression (ASE) and to demonstrate that global DASE analysis could be a powerful new approach to identify breast cancer risk alleles.
There have been successful applications of Illumina’s Infinium assay to global DASE analysis [19,20], since it provides genotyping results based on the quantified fluorescent signal intensity of both alleles at a specific SNP site. The samples we used are paired gDNAs and ds-cDNAs derived from eight human mammary epithelial cell lines. In this study, we performed a transcriptome-wide DASE analysis using Illumina’s HumanOmni1-Quad BeadChip platform (Version 1B). Among the total 1,140,419 markers on the Omni1 BeadChip, we focused on SNP markers representing transcribed regions of the female genome for global DASE analysis. Raw data from the array were filtered as described in the Methods section for quality control purposes, and 35,690 qualifying SNPs, representing 8,779 transcribed loci across all eight samples, were retained for the final DASE analysis. The global DASE pattern at each SNP locus is shown as a Circos plot in Figure 1A (detailed data are included in Additional file 1: Table S1). As shown in the DASE distribution histogram (Figure 1B), about 30% of loci have a DASE ≥ 2, which we used as the cut-off to define a locus with a positive DASE event. This result is consistent with previous array studies, which suggest that DASE is a relatively common event across the human genome [20,22].
DASE values of transcribed loci were calculated as described in the Methods section. Although a previous study successfully validated candidate loci exhibiting a fold change of 1.5, we raised the stringency by arbitrarily setting the DASE cut-off at 2, which on the log2 scale corresponds to a 4-fold difference between alleles, to ensure the significance of the findings. Using the SNP-based calculation, 93 SNPs representing 90 transcribed genes showed statistical significance (P ≤ 0.01 and FDR ≤ 0.05) (left panel, Figure 2A). Similarly, using the gene-based calculation, 143 genes exhibited statistical significance (P ≤ 0.01 and FDR ≤ 0.05) (right panel, Figure 2A). However, each method has limitations. For example, the SNP-based DASE measurement may represent only certain transcribed isoforms of a gene, which is particularly true when the targeted SNPs are located in the 5′ or 3′ UTRs of that gene. In the gene-based approach, an outlying DASE value from one SNP could carry too much weight in the final DASE result. To decrease the chance of false positive “hits”, only the gene candidates (60 in total) discovered by both methods (Figure 2B, Table 1) were carried forward for further analysis.
To help interpret the candidate DASE loci in the context of biological processes, pathways and networks, Ingenuity Pathway Analysis (IPA) was performed on our DASE candidates. The result showed that 24 out of 34 protein-coding loci are involved in known molecular interactions. Among those interactions, there are two major networks. Interestingly, one of the major networks covers 9 DASE candidates, including the cancer causative genes ZNF331 and USP6 and the known breast cancer associated gene DMBT1. Most of these candidates are downstream players of sex hormone (β-estradiol) and MMP pathways, suggesting their potential as breast cancer risk alleles (Figure 3). In addition, IPA revealed a variety of biological functions with which candidate DASE loci are significantly associated (P < 0.05, Additional file 2: Table S2). The cellular functions of these genes are wide-ranging, including cell proliferation, cell death, and inflammation.
We utilized Sanger sequencing to validate nine DASE candidates in one major interaction network (Figure 3). In brief, regions flanking DASE-associated SNPs were amplified by PCR, followed by sequencing and comparison of trace chromatograms between paired gDNA and cDNA samples. The DASE value between two alleles in a sample at a given SNP was evaluated by measuring the peak height of each allele in the chromatograms from the cDNA sample, normalized to that from the genomic DNA sample. Figure 2C shows examples of sequencing trace files from the validation of DASE in ZNF331, USP6, and DMBT1. A positive DASE event by sequencing is defined as the height of the peak representing one allele being less than half of the peak height of the other allele. As summarized in Table 2, sequencing confirmed DASE at the 9 candidate loci in 31 of the 39 (79%) samples in which DASE had been identified by global DASE analysis. These results support the validity of our approach of identifying DASE loci with a high-density SNP array.
Candidate DASE locus DMBT1 was identified and validated by analyzing SNP rs11523871 (Figure 2C). A nearby SNP, rs2981745 (C>T) in the 5′ UTR of DMBT1, has been reported to be associated with increased breast cancer risk, and rs2981745-T has decreased promoter activity compared with rs2981745-C. Since rs2981745 is not covered by the HumanOmni1 BeadChip, we carried out additional genotyping across all eight HMECs and found rs2981745 heterozygous in 6 HMECs that are also heterozygous for rs11523871 (data not shown). Sequence analysis of the DMBT1 5′ RACE product revealed that rs11523871-C co-occurs with rs2981745-A in all six HMECs, which suggests that the DASE observed in DMBT1 was caused by the loss of expression of the rs2981745-T allele (Figure 4). To examine whether any variants in the DMBT1 3′ UTR could also contribute to DASE in DMBT1, we fully sequenced the 3′ UTR in the genomic DNAs of all 8 HMECs and identified two common SNPs, rs8441 and rs7383, both present in samples HMEC-1 and HMEC-5 (Additional file 3: Figure S1). We further sequenced the same regions in cDNA products isolated from these two HMEC lines. The results from HMEC-1 are consistent with the DASE pattern discovered by analyzing heterozygous rs11523871 in this sample (Table 2). Importantly, the results from HMEC-5, in which both rs11523871 and rs2981745 are homozygous, revealed typical bi-allelic expression (Additional file 3: Figure S1). We then used an online miRNA targeting tool, Probability of Interaction by Target Accessibility (PITA), to compare the effects of these variants on miRNA binding, and only very subtle differences in miRNA targeting were found among all genotype combinations (Additional file 4: Figure S2). Taken together, we concluded that in our study, variants in the DMBT1 3′ UTR are unlikely to contribute to DASE. Our data suggest that rs2981745 is one of the causative variants, if not the only one, for DASE in DMBT1.
In our current study, we demonstrated transcriptome-wide DASE analysis as a novel approach to identify breast cancer susceptibility loci. The HumanOmni1-Quad BeadChip we utilized has state-of-the-art coverage of common SNPs in the human transcriptome. Our study identified 60 candidate DASE loci by both SNP- and gene-based methods (Table 1 and Figure 2). Pathway analysis revealed one major DASE gene network which is likely associated with breast tumorigenesis (Figure 3). Using PCR and Sanger sequencing, we successfully validated the DASE predictions in this breast cancer-related network, which includes the cancer causative genes ZNF331 and USP6, and the breast cancer risk associated gene DMBT1 (Table 2). By analyzing the 5′ UTR of DMBT1, we identified rs2981745 as a likely causal variant for the DASE in DMBT1 (Figure 4). We have therefore presented an example supporting our original expectation that DASE analysis may lead to the discovery of functional DASE-causing variants. DASE in ZNF331 has possibly resulted from genomic imprinting, as indicated by studies in human extraembryonic tissues, but we were not able to verify this speculation because genetic material from the parents was not available. Nevertheless, we report this phenomenon for ZNF331 in primary cultures of adult human tissue for the first time. We did not further investigate the 26 non-coding DASE candidate loci in the current study, largely because very little information was available to help interpret their roles. However, several studies in recent years have highlighted the importance of non-coding RNAs in cellular processes [27,28]. Considering the high percentage (nearly 50%) of such loci in our candidate list, an important next step is to validate the non-coding candidate DASE loci and study their likely roles in breast tumorigenesis.
As the Illumina HumanOmni1-Quad array was originally designed to target genomic DNA sequences rather than cDNA sequences, intronic SNPs and many SNPs at exon/intron boundaries are left out of DASE profiling. Although the “usable” SNPs cover the majority of transcripts, the coverage could be further increased with newly developed platforms carrying additional SNP markers. Furthermore, the probes on the Illumina SNP array are 50-mers, and they are unlikely to pick up small transcripts such as miRNAs. Therefore, it would be logical to consider designing customized BeadChips with pre-selected probes to improve both cost-effectiveness and specificity, and to reduce the data processing load for global DASE profiling. In addition, we observed fluorescent interference between the X and Y channels during data analysis due to the initial probe design, which could be alleviated by canonical data normalization. Despite these limitations, the Infinium-based BeadChip is still a practical platform for whole-transcriptome DASE analysis, as indicated by the successful validation using Sanger sequencing (Table 2).
As mentioned in the Introduction, currently known high and moderate penetrance risk alleles explain only a fraction of familial breast cancer incidence, and any additional susceptibility genes of this kind are likely to be very rare. Thus, it is plausible that the remaining cases could be explained by the synergistic effects of multiple low penetrance alleles, each conferring an elevated risk of <1.5 fold. The completion of the Human Genome Project and the fast development of SNP array technologies have made it practical to perform genome-wide association studies (GWAS) to identify genetic factors that account for breast tumorigenesis. Since the first wave of GWAS searching for such alleles, 22 loci have been reported by 16 studies to be significantly associated with breast cancer risk [31-34]. Among those candidate risk loci, nearly half (2q35, 3p24.1, 5p12, 5q11.2, 6q25.1, 8q24.21, 10q26.13, 11p15.5, 16q21.1-21.2) were reproducible in multiple independent studies, which indicates the reliability of GWAS predictions. Despite these successes, the mechanisms by which these GWA variants affect breast cancer pathogenesis are often unknown, because in most cases it is not clear which gene(s) are associated with the GWA signals [35,36]. Therefore, understanding the function of these breast cancer associated variants and the mechanisms by which they contribute to breast cancer is a logical next step in validating GWAS findings. Until now, most of them have been merely SNP tags for as yet unknown breast cancer causal variants. Two exceptions so far are rs2981582-A and rs1219648-G at the FGFR2 locus. The discovery of these two common low risk variants eventually led to rs2981576 and rs2981578 being pinpointed as causal variants by genetic mapping and ChIP assay.
Evaluation of the functional impact of putative causal variants identified by GWAS could be very challenging, as many GWA signals (e.g., 8q24) map some distance from the nearest coding regions and are likely to mediate disease predisposition through remote regulatory effects on transcription. In addition, the causal alleles involved in breast cancer susceptibility are likely to have moderate molecular and cellular effects, and the measurable effects of an allele in a given functional assay may not by themselves demonstrate a causal role in breast cancer pathogenesis. Therefore, novel strategies are required to functionally characterize the multiple GWA signals in a genome-wide manner for convincing genetic evidence. Based on previous reports [24,40], our study has pinpointed a likely causal variant for DASE in DMBT1 (Figure 4). In addition, genetic mapping in mice has indicated that DMBT1 is a candidate modifier of mammary tumors and breast cancer risk, and our results further support the view that allelic loss of expression in DMBT1 could contribute to breast cancer development, which is consistent with reports from previous studies [42,43]. These findings support the idea that global DASE profiling could be a powerful approach to validate GWAS findings when the map of DASE loci is overlaid with existing GWAS data.
In our study, we have compared top DASE loci identified in our genome-wide ASE studies with the current cancer genome database provided by Cancer Gene Census (Sanger Institute) and have identified two DASE loci (USP6 and ZNF331) listed as cancerous genes . Somatic mutations in USP6 (p.V678A and p.R475Q) and in ZNF331 (p.G193E) have been reported in 1% of primary breast cancer and 2% of ovarian cancer, respectively . These findings suggest that global DASE profiling could be a powerful tool in combination with currently available cancer genome databases to identify novel breast cancer “driver” genes. Furthermore, on-going cancer genome sequencing projects (e.g. Cancer Genome Atlas by NCI) have identified thousands of so-called variants of unknown/uncertain significance (VUS), including variations typically characterized by a single base change, or a change to several intronic bases. This large amount of VUS data produced by the next generation sequencing-based cancer genome projects has been termed “overkill” from a clinical perspective . Currently, the consequence of most VUS in oncogenesis has yet to be established. As DASE is a sensitive functional index for pathogenic mutations, it can be applied for validating these VUS at a genome-wide scale as well.
We have demonstrated for the first time that global DASE analysis is a novel approach to identify breast cancer risk alleles. The results from our study are very promising and we expect our strategy will help validate the functional variants identified by the GWAS and Cancer Genome Projects. Importantly, the research strategy developed here could be easily applied to investigating susceptibility for many other types of cancers.
For the studies reported here, eight HMEC lines that tested negative for BRCA1/2 mutations were utilized as starting material for DASE analysis. Under a protocol approved by the Institutional Review Board (IRB) at Fox Chase Cancer Center, we routinely derived primary HMEC lines from adjacent or contralateral normal mammary tissue of breast cancer patients using an established commercial protocol for EpiCult®-B human mammary epithelial cell culture (Stemcell Technologies, BC, Canada). Established primary HMEC lines were maintained in medium containing 1:1 DMEM/F12 (Life Technologies, Carlsbad, CA), 2.438 g/L sodium bicarbonate, 5% chelated horse serum, 20 ng/ml EGF (BD Biosciences, San Jose, CA), 100 ng/ml cholera toxin (Sigma-Aldrich, St. Louis, MO), 10 mg/L insulin (Sigma-Aldrich, St. Louis, MO), 0.5 mg/L hydrocortisone (Sigma-Aldrich, St. Louis, MO), Antibiotic-Antimycotic (Life Technologies, Carlsbad, CA), and 0.04 mM calcium chloride (Sigma-Aldrich, St. Louis, MO).
Genomic DNAs (gDNAs) were isolated from HMEC lines by phenol-chloroform extraction as previously described. Total RNAs were isolated from HMEC lines by cell lysis in guanidinium isothiocyanate buffer supplemented with 2-mercaptoethanol (BME), followed by phenol-chloroform extraction using a previously described protocol. After re-dissolving, RNAs were treated with DNase (TURBO DNA-free kit, Ambion, Austin, TX) to remove possible genomic DNA contamination. The concentrations of genomic DNA and RNA stocks were measured using an ND-1000 spectrometer (NanoDrop, Wilmington, DE). To perform the DASE profiling, double-stranded (ds)-cDNAs were synthesized from total RNAs using the SuperScript® Double-Stranded cDNA Synthesis Kit (Life Technologies, Carlsbad, CA) and random hexamers following the manufacturer’s instructions.
DASE profiling was performed using Illumina’s HumanOmni1-Quad BeadChip SNP array platform (Illumina, San Diego, CA), which has more than one million SNP loci, including more than 120,000 SNPs in transcribed regions. For each HMEC, ds-cDNA (derived from 20–50 μg total RNA) and 200 ng gDNA were loaded to the BeadChip according to manufacturer’s instructions. Samples of gDNA and ds-cDNA to be used for the parallel genotyping and DASE profiling were denatured, neutralized and then underwent PCR-free whole-genome amplification followed by fragmentation according to the Infinium HD Assay Super Protocol. The ds-cDNA and gDNA pairs from each sample were individually hybridized to BeadChips and processed following standard Infinium procedures. Raw data from the assay was generated by scanning processed BeadChips using an iScan Reader. The scanned images were processed in the genotyping module (Ver. 3.3.7) of BeadStudio software (Ver. 220.127.116.11) to export a tab delimited file consisting of the SNP locus, the genotypes and quantified fluorescent signal intensities (Xraw, Yraw). This genome-wide map illustrating the global DASE distribution was drawn using a visualization tool, Circos .
Raw data were filtered before DASE calculation. Firstly, data from CNV (copy number variation) markers, Y chromosomal markers and markers that are not located in transcribed regions were discarded. Secondly, to avoid possible false positives from background noise, a cut-off bar of combined signal intensities from the ds-cDNA sample (Xraw + Yraw ≥ 500) was imposed to filter out non-expressed SNPs. In addition, readings from SNP sites with ambiguous genotyping results were removed. For each sample, raw signal intensities corresponding to ds-cDNA and gDNA for each allele at each SNP site were background corrected. After these pre-processing steps, specific ds-cDNA allele intensities were normalized to their corresponding gDNA allele intensities to eliminate probe specific effects and potential variations occurring during BeadChip scanning. The DASE value between two alleles X and Y in a sample of a given SNP was then calculated as the absolute value of the normalized log2-ratio given by DASE = ABS(log2 [(DXraw/GXraw) / (DYraw/GYraw)]), which was also used by other groups . Using the absolute DASE value (log2 ratio) for the computation enables us to quantify DASE based only on magnitude of change without regard to direction of change, as direction of change cannot be incorporated due to the lack of a standard reference allele. Without using the absolute DASE value, the averaged DASE would likely be neutralized in the gene-based approach described below. The distribution of DASE was determined to be gamma distributed using maximum likelihood methods and quantile-quantile plots. For each SNP, p-value is then calculated based on the fitted gamma distribution for DASE by testing the null hypothesis that mean DASE is zero against the two-sided alternative.
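The per-SNP calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code (their computations were done in R); the function and argument names are hypothetical, and it assumes the four intensities have already been background corrected and are strictly positive.

```python
import math

# Combined ds-cDNA intensity cut-off stated in the text (Xraw + Yraw >= 500).
MIN_CDNA_INTENSITY = 500


def passes_expression_filter(dx, dy, threshold=MIN_CDNA_INTENSITY):
    """Return True when the combined cDNA signal suggests the SNP is expressed."""
    return dx + dy >= threshold


def dase_value(dx, dy, gx, gy):
    """DASE = |log2((DX/GX) / (DY/GY))| for one heterozygous SNP in one sample.

    dx, dy: ds-cDNA intensities of alleles X and Y (DXraw, DYraw in the text).
    gx, gy: gDNA intensities of the same alleles (GXraw, GYraw in the text).
    All four values are assumed to be positive after background correction.
    """
    return abs(math.log2((dx / gx) / (dy / gy)))


# Allele X is expressed four times more than allele Y relative to gDNA.
example = dase_value(4000, 1000, 2000, 2000)
# Swapping the allele labels gives the same magnitude, which is why the
# absolute value is used when no reference allele is defined.
swapped = dase_value(1000, 4000, 2000, 2000)
```

In this example both calls return 2.0, the cut-off used in the study, which corresponds to a 4-fold difference between alleles after normalization to gDNA.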
In our filtered dataset, we focused only on heterozygous individuals in assessing allelic imbalances. To assign DASE value to transcribed loci, we used two approaches in parallel. In the first approach, we extracted SNPs for which at least 3 out of the 8 HMECs were heterozygous. For each of these SNPs, their DASE values in heterozygous individuals were calculated separately and the average value was recorded. In the second approach, we extracted all the transcribed-region SNPs with heterozygous genotypes for each gene. The average DASE value was calculated for each corresponding gene, and only those genes with DASE values in at least 3 out of the 8 HMECs were included in final analysis. For each sample, DASE was determined to have a heavy right-tailed skewed distribution based on SNP-level data as well as gene-level data. The top panels of Additional file 5: Figure S3 displays the density of DASE for a typical sample determined using kernel density methods. Using maximum likelihood methods and quantile-quantile (QQ) plots, the distribution of DASE was determined to be approximately gamma distributed. The bottom panels of Additional file 5: Figure S3 display the QQ plot for a typical sample. A generalized linear model approach (based on gamma regression) was used to identify SNPs with mean DASE significantly different from zero. A p-value cut-off of 0.01 and a false discovery rate (FDR) cut-off of 0.05 were utilized to determine statistical significance of each SNP. FDR was calculated using the Benjamini-Hochberg step-up method to account for multiple testing . Biological significance of each SNP was determined based on a mean DASE value of at least 2. A plot of p-value or FDR versus mean DASE enabled visualization of the relationship between statistical and biological significance. SNPs identified based on statistical significance as well as biological significance were interrogated for molecular pathways and biological function in bioinformatics analyses. 
This analysis was repeated on gene-level data obtained as outlined above. All computations were performed using the R statistical language and environment .
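The two aggregation schemes and the significance call can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' R code: the function names are hypothetical, the gamma regression that produces the p-values is not reproduced, and averaging the per-sample gene values into one gene-level mean is our reading of the text rather than a documented step.

```python
from collections import defaultdict

MIN_HET_SAMPLES = 3   # at least 3 of the 8 HMECs, as stated in the text
DASE_CUTOFF = 2.0     # biological significance threshold
P_CUTOFF = 0.01
FDR_CUTOFF = 0.05


def snp_based_mean(dase_by_sample):
    """Mean DASE of one SNP over its heterozygous samples, or None if too few."""
    if len(dase_by_sample) < MIN_HET_SAMPLES:
        return None
    return sum(dase_by_sample.values()) / len(dase_by_sample)


def gene_based_mean(records):
    """Mean DASE of one gene from (sample, dase) pairs of its heterozygous SNPs.

    DASE is first averaged within each sample, then across samples
    (assumption); genes seen in fewer than 3 samples return None.
    """
    per_sample = defaultdict(list)
    for sample, dase in records:
        per_sample[sample].append(dase)
    if len(per_sample) < MIN_HET_SAMPLES:
        return None
    sample_means = [sum(v) / len(v) for v in per_sample.values()]
    return sum(sample_means) / len(sample_means)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_candidate(mean_dase, pvalue, fdr):
    """Apply the combined biological and statistical thresholds."""
    return mean_dase >= DASE_CUTOFF and pvalue <= P_CUTOFF and fdr <= FDR_CUTOFF
```

A locus would then be carried forward only if `is_candidate` holds under both the SNP-based and the gene-based summaries, mirroring the intersection of the two methods described in the Results.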
Biological and interaction networks of candidate DASE loci were generated using IPA (Ingenuity® Systems). IPA explores the set of input genes to identify networks by using Ingenuity Pathways Knowledge Base for interactions between identified ‘Focus Genes’. For each network, IPA computes a score according to the fit of the user's set of significant genes. The score suggests the likelihood of the Focus Genes in a network from Ingenuity’s knowledge base being found together due to random chance. A score of 3 was used as the cutoff for identifying gene networks, which predicts that there is only a 1/1000 chance that the focus genes shown in a network are due to random chance. Therefore, a score of 3 or higher indicates a 99.9% confidence level to exclude random chance. In this study, the candidate gene list was uploaded into the application for biological function enrichment analysis, and networks of Network Eligible Molecules were then algorithmically generated based on their connectivity.
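The relationship between the network score and the chance probability quoted above can be made explicit with a short Python sketch. It assumes the score is the negative base-10 logarithm of that probability, which is what the stated equivalence of a score of 3 and a 1/1000 chance implies; the function names are hypothetical and are not part of the IPA software.

```python
import math

SCORE_CUTOFF = 3  # network score cut-off used in this study


def network_score(chance_probability):
    """Score implied by a chance probability, assuming score = -log10(p)."""
    return -math.log10(chance_probability)


def chance_probability(score):
    """Chance probability implied by a score, assuming p = 10 ** (-score)."""
    return 10.0 ** (-score)


def passes_cutoff(score, cutoff=SCORE_CUTOFF):
    """True when a network meets the score cut-off."""
    return score >= cutoff
```

Under this assumption a score of 3 corresponds to a probability of 0.001, that is, a 99.9% confidence level, and each additional unit of score reduces the chance probability tenfold.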
Genomic and mRNA sequences flanking selected SNPs were retrieved from NCBI, and primers were designed accordingly using the web-based Primer3 software (http://frodo.wi.mit.edu/primer3/). The primer sequences are available upon request. PCR amplification was performed using GoTaq® Green Master Mix (Promega) and the relevant gDNA and cDNA samples on a thermal cycler (Applied Biosystems, Model 2720) using the following program: 94°C for 3 minutes for initial denaturing, followed by 10 cycles of touchdown PCR (94°C for 30 seconds; annealing from 65°C to 55°C, decreasing by 1°C per cycle, for 30 seconds; 72°C for 30 seconds) and 35 cycles of regular PCR (94°C for 30 seconds, 60°C for 30 seconds, 72°C for 30 seconds), a final extension for 5 minutes at 72°C, and then a hold at 4°C. PCR product purification and Sanger sequencing were performed by Beckman Coulter Genomic Services (Danvers, MA). Sequencing trace files were analyzed using Sequencher software (v4.1.4, Gene Codes, MI). The DASE value between two alleles X and Y in a sample at a given SNP was then calculated using the peak height of each allele in the chromatograms from cDNA samples, normalized to that from genomic DNA samples. A positive DASE event by sequencing is defined as the height of the peak representing one allele being less than half of the peak height of the other allele. Our choice of a different threshold (DASE = 1) for DASE validation by Sanger sequencing is justified by the different dynamic ranges of the two platforms. The SNP array gives numeric results with a dynamic range of 2^16. Sanger sequencing, on the other hand, gives graphic trace files, and the usable peak heights for quantification are usually within a few dozen pixels. Based on this dissimilarity, we chose a different DASE-calling threshold for each method.
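The sequencing-based call can be sketched in Python as follows. This is a hedged illustration rather than the authors' procedure: the function names are hypothetical, peak heights are assumed to be positive numbers read from the trace files, and the gDNA normalization is assumed to take the same ratio-of-ratios form as the array-based DASE formula.

```python
import math

SANGER_DASE_THRESHOLD = 1.0  # log2 scale; one peak less than half of the other


def sanger_dase(cdna_x, cdna_y, gdna_x, gdna_y):
    """Absolute log2 ratio of cDNA peak heights, normalized to gDNA peak heights."""
    return abs(math.log2((cdna_x / gdna_x) / (cdna_y / gdna_y)))


def is_positive_dase(cdna_x, cdna_y, gdna_x, gdna_y):
    """True when one normalized allele peak is less than half of the other."""
    return sanger_dase(cdna_x, cdna_y, gdna_x, gdna_y) > SANGER_DASE_THRESHOLD
```

The comparison is strict because the definition requires one peak to be less than half of the other, so a normalized ratio of exactly 2 is not called positive.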
The 5′ UTR of DMBT1 was amplified using the FirstChoice RLM-RACE kit (Life Technologies) following the manufacturer’s manual. In brief, a 5 μg RNA sample isolated from each HMEC was treated with calf intestine alkaline phosphatase (CIP) to remove 5′-phosphates from fragmented RNA, ribosomal RNA and tRNA, followed by tobacco acid pyrophosphatase (TAP) treatment to remove the cap structure of intact mRNA. A 5′ RACE RNA adapter was ligated to the CIP/TAP-treated RNA by T4 RNA ligase, and reverse transcription was then performed using random decamers. The resulting cDNAs were used as a template for PCR with a 5′ RACE Outer Primer (5′ GCTGATGGCGATGAATGAACACTG 3′, binds to the 5′ RACE adapter) and a gene-specific primer, DMBT1-5Ro (5′ CTCAGGGCCAAACCAGAA 3′), complementary to the region (+288, +308) of the DMBT1 cDNA. A nested PCR was performed with a 5′ RACE Inner Primer (5′ CGCGGATCCGAACACTGCGTTTGCTGGCTTTGATG 3′, binds to the 5′ RACE adapter) and the DMBT1-5Ri primer (5′ GGTTGACTCCAAGGAAATCG 3′), complementary to the region (+194, +213) of the DMBT1 cDNA. The PCR products were purified and sequenced using DMBT1-5Ri.
The authors declare that they have no competing interests.
CG and XC carried out the original study design, global DASE analysis and validation studies, and drafted the manuscript. KD participated in the design of the study and performed the statistical analysis, and YZ performed the IPA analysis. CS established the HMEC lines and isolated RNAs and DNAs for the array analysis. MD participated in the study design and helped to draft the manuscript. All authors read and approved the final manuscript.
This work was kindly supported by the Susan G. Komen for the Cure (KG100274 to X.C.), the Eileen Stein Jacoby Fund, and the Risk Assessment and Presentation Keystone Program at Fox Chase Cancer Center.
Table S1. Global DASE analysis.
Table S2. Functional analysis by IPA.
Figure S1. Sequencing analysis of DMBT1 3′UTRs.
Figure S2. Predictions of microRNA-targeting at DMBT1 3′UTR.
Figure S3. Distribution of DASE.
We thank Dr. Zhengyu Jiang for graphic assistance and Ms. Patricia Bateman for editing. The Biosample Repository and Biostatistics and Bioinformatics Core Facilities at FCCC were essential for our studies.