To identify genetic events that characterize cancer progression, we conducted a comprehensive genetic evaluation of 161 primary breast tumors. Similar to the “mountain-and-hill” view of mutations, gene amplification also shows high and low frequency alterations in breast cancers. The frequently amplified genes include the well-known oncogenes ERBB2, FGFR1, MYC, CCND1, and PIK3CA, whereas other known oncogenes that are amplified, though less frequently, include CCND2, EGFR, FGFR2, and NOTCH3. More importantly, by homing in on minimally amplified regions containing ≤ 3 genes, we identified six new amplified genes: POLD3, IRAK4, IRX2, TBL1XR1, ASPH, and BRD4. We found that both the IRX2 and TBL1XR1 proteins showed higher expression in the malignant cell lines, MCF10CA1h and MCF10CA1a, than in their precursor, MCF10A, a normal immortalized mammary epithelial cell line. To study oncogenic roles of TBL1XR1, we performed knockdown experiments using a shRNA approach and found that depletion of TBL1XR1 in MCF10CA1h cells resulted in reduction of cell migration and invasion as well as suppression of tumorigenesis in mouse xenografts. Intriguingly, our mutation analysis showed the presence of activating mutations in the PIK3CA gene in a subset of tumors that also had DNA copy number increases in the PIK3CA locus, suggesting an additive effect of co-existing activating amino-acid substitution and dosage increase from amplification. Our gene amplification and somatic mutation analysis of primary breast tumors provides a coherent picture of genetic events, both corroborative and novel, offering insight into the genetic underpinnings of breast cancer progression.
Human cancers are characterized by gene mutations and chromosomal aberrations (1, 2). Breast cancer is the most common cancer of women in the U.S. and other western countries, with a cumulative lifetime incidence of about 11%. Approximately 180,000 new cases were estimated to occur in the U.S. in 2008. About 10% of breast cancers are inherited, mostly caused by mutations in BRCA1 and BRCA2. The rest are sporadic breast cancers caused by somatic mutations and chromosome instability in the breast tissue. Breast cancer development is marked by multiple histopathologically discernible stages, including hyperplasia of mammary duct epithelial cells, ductal carcinoma in situ (DCIS), invasive tumor confined to the breast, lymph node involvement, and metastases to distant organs.
Several large initiatives have identified somatic mutations in breast cancers. These include screening for mutations located in protein kinase genes (3) as well as the more global approach of analyzing a nearly complete set of human genes (4, 5). The latter investigations have demonstrated a bimodal distribution of mutations in breast and colon cancers. With regard to breast cancer, these observations led to the proposal that the genomic landscape consists of “mountains” and “hills”, the mountains corresponding to the most frequently mutated genes, specifically TP53 and PIK3CA, and the hills consisting of hundreds of less frequently mutated cancer-associated genes. In addition to the identification of somatic mutations involving single base changes or small regions of DNA alteration, many studies of breast cancer have investigated genomic instability such as copy number alteration and DNA amplification and deletion affecting larger regions. Most of these studies of genomic alterations were conducted using array CGH (comparative genomic hybridization) or cDNA arrays (some examples are described in references (6-9)). Two recent studies have used high density oligo arrays (10, 11). The most commonly amplified regions include 8p11, 8q24, 11q13, 12q14, 17q11, 17q21, and 20q13, with amplification of oncogenes such as ERBB2, MYC, CCND1, and MDM2 noted in multiple studies. However, most of these global genomic studies have not revealed any additional genes that contain alterations that potentially impact breast cancer development. Therefore, in this study we decided to look for focal amplification events that affect relatively small regions of genomic DNA, spanning a few hundred kb to a couple of Mb, with the goal of identifying novel oncogenes.
Tissue samples were provided by the Cooperative Human Tissue Network which is funded by the National Cancer Institute. The study was approved by the Institutional Review Board of the US National Cancer Institute. The clinical pathological data are described in Supplementary Table S1. Genomic DNA was prepared using the QIAamp DNA mini kit (Qiagen, Inc., Valencia, CA, USA). RNA was isolated from the tissues using RNAzol B (Tel-Test, Inc., Friendswood, TX, USA). We followed the original Affymetrix 500K or SNP5 protocols to obtain genotype data and copy number values. The GEO accession number for these array data is GSE16619. All the primers used in PCR and sequencing are described in Supplementary Table S2.
MCF10A and MCF10AT cell lines were maintained in DMEM/F12 supplemented with 5% horse serum, 10 μg/ml insulin, 20 ng/ml EGF, 0.5 μg/ml hydrocortisone, and 100 ng/ml cholera toxin. The culture medium for the MCF10CA1h and MCF10CA1a cell lines was DMEM/F12 plus 5% horse serum. The tissue microarray (TMA) slides used in this paper were constructed at Toyama University according to a previously described method (12). Rabbit polyclonal anti-IRX2 antibody (cat: ARP32188_T100; Aviva System Biology, San Diego, CA, USA) and mouse monoclonal anti-TBL1XR1 antibody (cat: sc-100908; Santa Cruz Biotechnology, Santa Cruz, CA, USA) were used.
Short hairpin RNA (shRNA) against TBL1XR1 RNA (TRCN0000060743), purchased from Openbiosystems (Huntsville, AL, USA), was transfected into the 293FT producer cell line using pPACKH1 Lentivector Packaging kit (System Biosciences, Mountain View, CA, USA) and Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA). Pseudoviral particles were isolated and used to transduce MCF10CA1h cells for 24 hours. The cells were grown for 48 hours post-transduction and then selected for stably transduced cells by the addition of 4 μg/ml puromycin for 7 days, followed by culturing in medium without drug. A GFP-targeting shRNA was processed similarly and used as a control (control-shRNA). Experiments for evaluating phenotypes of TBL1XR1 knockdown were performed within 2 weeks using a pool of the transduced cells. Cell motility was analyzed with the scratch assay (13). Confluent cultures of the cells in 6-well plates were treated with 10 μg/ml mitomycin C for 2 hours to inhibit cell proliferation. One straight scratch line was produced using a p200 pipet tip. The cells were washed once with PBS and then changed to culture medium without mitomycin C. The width of the scratched area was measured at 0, 12, and 24 hours post-scratching. Cell invasion was analyzed using a tumor cell invasion system (BD Biosciences, San Jose, CA, USA). A total of 2.5×10⁴ cells were seeded in each insert well. After 60 hours, the insert wells were washed and scrubbed according to the manual. The cells were fixed in methanol, and the nuclei were stained with hematoxylin and counted under a microscope. For in vivo mouse studies, cells were suspended in serum-free DMEM/F12 medium, and 5×10⁵ cells were injected into the no. 2 and no. 7 mammary fat pads of 6-8 week-old female athymic NCr nu/nu mice. Tumors were measured weekly with calipers and the volumes were calculated using the formula: (short × short × long dimensions) × 0.52 (14).
The numbers of mice used were 5, 5, and 7 for MCF10CA1h, control-shRNA, and TBL1XR1-shRNA cell lines, respectively.
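The caliper-based volume formula given above can be written as a small helper. This is only a minimal sketch: the function name, the input validation, and the example measurements are illustrative assumptions, not part of the original protocol.

```python
def tumor_volume(short_dim: float, long_dim: float) -> float:
    """Approximate tumor volume as (short x short x long) x 0.52.

    The result is in cubic units of the inputs (for example, mm -> mm^3).
    """
    if short_dim <= 0 or long_dim <= 0:
        raise ValueError("caliper dimensions must be positive")
    # Guard against swapped arguments so that 'short' is always the smaller one.
    if short_dim > long_dim:
        short_dim, long_dim = long_dim, short_dim
    return short_dim * short_dim * long_dim * 0.52


# Hypothetical caliper readings (4 mm x 6 mm), for illustration only.
example_volume = tumor_volume(4.0, 6.0)
print(f"{example_volume:.2f}")
```

Squaring the shorter dimension treats it as both the width and the depth of the tumor, which is why only two caliper readings are needed per tumor.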
Affymetrix CNAT4.0 was used to normalize microarray data and to generate CNstate and log2ratio values. Generally, we used the default parameters: bandwidth of 100 kb, transition_decay at 1e-7, and no outlier_smoothing. For focal amplification, we used a bandwidth of 1 kb. The CNstate ranges from 0 to 4: normal CN corresponds to CNstate 2; CNstates 0 and 1 indicate copy number loss; CNstates 3 and 4 correspond to copy number gain. For the data generated with the Affymetrix SNP5.0, we formatted the log2ratio generated by quantile normalization so that the data could be visualized with the Affymetrix Genotyping Console Browser. For gene-level copy number estimation, we simply calculated the average log2ratio for all the probesets mapped within a gene, between the transcription start and termination sites of the gene. The gene-level log2ratio was used to identify gene amplification/deletion in each tumor. All the statistical analyses (clustering, survival analysis, Fisher's exact test, generalized linear models) were conducted using R.
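The gene-level averaging and the CNstate interpretation described above can be illustrated with a short sketch. The `Probe` record, the function names, the inclusive treatment of the gene boundaries, and the toy coordinates are all assumptions made for illustration; the actual analysis was run with CNAT4.0 and R, not with this code.

```python
from statistics import mean
from typing import NamedTuple, Optional, Sequence


class Probe(NamedTuple):
    chrom: str        # chromosome name, e.g. "chr3"
    pos: int          # genomic position of the probeset
    log2ratio: float  # normalized log2 copy number ratio


def gene_level_log2ratio(probes: Sequence[Probe], chrom: str,
                         tx_start: int, tx_end: int) -> Optional[float]:
    """Average log2ratio of probesets between the transcription start and end.

    Boundaries are treated as inclusive (an assumption). Returns None when
    no probeset maps within the gene.
    """
    values = [p.log2ratio for p in probes
              if p.chrom == chrom and tx_start <= p.pos <= tx_end]
    return mean(values) if values else None


def cnstate_category(cnstate: int) -> str:
    """Map a CNstate (0-4) to loss / normal / gain as defined in the text."""
    if cnstate not in (0, 1, 2, 3, 4):
        raise ValueError("CNstate must be an integer from 0 to 4")
    if cnstate < 2:
        return "loss"
    return "normal" if cnstate == 2 else "gain"


# Toy probes with made-up positions and values, for illustration only.
toy_probes = [
    Probe("chr3", 1_000, 1.0),
    Probe("chr3", 2_000, 2.0),
    Probe("chr3", 9_000, 0.0),   # falls outside the hypothetical gene
    Probe("chr5", 1_500, 3.0),   # different chromosome
]
print(gene_level_log2ratio(toy_probes, "chr3", 500, 2_500))
print(cnstate_category(3))
```

Returning `None` for genes without any mapped probeset keeps such genes distinguishable from genes whose average log2ratio happens to be zero.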
To gain a comprehensive understanding of the genetic events that delineate multiple stages of tumor progression including hyperplasia, invasion, and metastasis, we performed DNA copy number analysis using the Affymetrix 500K or SNP5 SNP arrays on primary breast tumors. DNA copy number analysis was performed on 161 tumors, including 10 DCIS and 151 invasive breast cancers, 90 of which were positive for lymph node metastases (see Supplementary Table S1 for clinical information). We were interested in identifying genomic regions that showed copy number gain or loss. Given that there have been extensive studies on DNA copy number alterations in breast cancer, we chose to target our search to focal amplification events that affect regions of a few hundred kb, anticipating that this search would facilitate identification of novel oncogenes. Table 1 lists the genes that mapped to the regions with amplification events. Our selection criteria required that an amplification event be present in a minimum of 2 tumors, with the amplification being focal in at least one of these tumors. Among the 17 loci listed in Table 1, six genes, ERBB2, FGFR1, MYC, CCND1, PIK3CA, and NCALD, had frequent amplification (amplified in at least 10 tumors). We used quantitative PCR (qPCR) to evaluate independently the level of gene amplification. A correlation was observed between DNA copy number estimated from SNP arrays and that measured by qPCR (for example, R² is 0.8 for ERBB2, Figure 1A; the example for MYC appears in Supplementary Figure S1).
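The selection criteria stated above (amplification in at least 2 tumors, focal in at least one of them, and "frequent" when amplified in at least 10 tumors) can be expressed as a simple filter. The sketch below assumes that each tumor has already been called as amplified or not, and focal or not, at the locus; how those calls are made is outside its scope, and the `Event` record and function name are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class Event(NamedTuple):
    amplified: bool  # locus is amplified in this tumor
    focal: bool      # the amplification is focal in this tumor


def classify_locus(events_by_tumor: Dict[str, Event],
                   frequent_cutoff: int = 10) -> Optional[str]:
    """Apply the locus selection criteria described in the text.

    Returns None if the locus fails the criteria, otherwise "frequent"
    (amplified in >= frequent_cutoff tumors) or "infrequent".
    """
    amplified = [e for e in events_by_tumor.values() if e.amplified]
    has_focal = any(e.focal for e in amplified)
    if len(amplified) < 2 or not has_focal:
        return None
    return "frequent" if len(amplified) >= frequent_cutoff else "infrequent"


# Made-up calls for three hypothetical tumors at one locus.
toy_locus = {
    "tumor_A": Event(amplified=True, focal=True),
    "tumor_B": Event(amplified=True, focal=False),
    "tumor_C": Event(amplified=False, focal=False),
}
print(classify_locus(toy_locus))
```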
We identified genetic alterations in the genes listed in Table 1. These included amplification of ERBB2, FGFR1, MYC, and CCND1, which have been extensively described in the literature. We also saw PIK3CA amplification, which has occasionally been noted in breast tumors; however, the primary reported alteration in this gene is somatic point mutation (15, 16). Although overall our results are consistent with previous studies, we report specific novel findings below. For example, amplification of the NCALD locus on chromosome 8q22 is very frequent in our samples (16 out of 161 tumors); importantly, the involved region is distinct from the MYC region at 8q24 (Supplementary Figure S2). The minimal amplification region of NCALD spans 1.6 Mb and still contains six other genes (Table 1). The frequent amplification of this region suggests that it potentially harbors a novel oncogene.
Next, we turned our attention to high-level (i.e. high copy number) focal amplification events, with the intention of identifying novel oncogenes, even though these regions were amplified only infrequently in our tumor samples. Among the 11 infrequent amplification regions, five loci contained single genes (Table 1 and Figure 1B and Supplementary Figure S3) and two loci had three genes in the minimal overlapping region of amplified DNA fragments (Table 1). Some of these infrequently amplified genes are well-known oncogenes, CCND2, EGFR, FGFR2, and NOTCH3. Rare amplification of EGFR and FGFR2 in breast tumors was reported in a recent publication (10). The minimal region of amplification at the NOTCH3 locus also contains BRD4 and ABHD9. A recent study suggests that genes whose expression is regulated by BRD4 activation might correlate with breast cancer survival (17). To investigate the effect of DNA copy number gain on gene expression, we measured expression of BRD4 and NOTCH3 using RT-qPCR. BRD4 gene expression was frequently elevated in tumors (Figure 1C). When BRD4 copy number gain is present, gene expression up-regulation is almost always observed (Figure 1C). The expression level of BRD4 in normal tissue was always low, which suggested that up-regulation of BRD4 gene expression is relevant to tumorigenesis (Figure 1C, the difference between normal and tumor in gene expression has p-value=0.01151 by t-test). NOTCH3 gene expression remained at a low level, comparable to the average value from the seven normal tissues, even for three tumors with high-level copy number gain. Moderate increases in gene expression were noted in a small subset of tumors but did not correlate with copy number gain (Figure 1C).
In the preceding paragraphs, we have discussed some results of high-level focal amplification events, a few of them involving well-known oncogenes. Next we concentrate on the characterization of novel oncogenes within the newly identified amplification loci.
Four high-level focal amplification regions contained single potentially novel oncogenes: IRX2, TBL1XR1, POLD3, and ASPH (Figure 1B and Supplementary Figure S3). We focused detailed molecular characterization on two of these genes, TBL1XR1 and IRX2.
IRX2 is a member of the Iroquois homeobox transcription factor family, which is involved in developmental pattern formation in multiple organs such as the brain and heart (18, 19). The expression of IRX2 in mammary gland development is particularly interesting, since the gene is expressed only in epithelial cells during development; IRX2 expression is absent from stromal cells and is reduced in differentiated ductal epithelial cells (20). In contrast, some breast cancers exhibit high-levels of IRX2 expression (20). Our gene expression analysis of IRX2 also showed that IRX2 was up-regulated in some breast tumors, in at least one case in association with gene amplification (Figure 1C). To characterize IRX2 protein expression in the MCF10A series of cell lines, we performed Western blot analysis and found that IRX2 protein was expressed at higher levels in the malignant cell lines, MCF10CA1h and MCF10CA1a, than in their precursor, MCF10A, a normal immortalized mammary epithelial cell line (Figure 2A), suggesting that up-regulation of IRX2 might be involved in cancer progression. This observation was further corroborated by IHC, with more intense nuclear staining in MCF10CA1a cells than MCF10A cells (Supplementary Figure S4).
To study oncogenic mechanisms of the IRX2 gene, we undertook RNA interference experiments using siRNA as well as shRNA. Despite numerous attempts, we were not able to generate breast cancer cell lines that could maintain a stable low-level expression of the IRX2 protein. A possible explanation is that knockdown of IRX2 inhibits proliferation or survival of the breast epithelial cells. To gain insight into potential oncogenic functions of IRX2, we performed IHC studies on TMAs to investigate IRX2 protein expression in primary breast tumors. Positive staining was observed in 66 of 85 tumors (77.6%), with 20 moderately positive and 46 strongly positive tumors, suggesting an association of high-level IRX2 expression with breast carcinogenesis (Figure 2B); 19 out of the 85 tumors showed negative staining. We did not detect a statistically significant association of IRX2 expression (presence versus absence) with any of the clinical phenotypes, including stage, tumor size, and lymph node invasion (data not shown). However, when different degrees of expression intensity were analyzed among the 66 positively staining tumors, comparison of the 20 IRX2+ to the 46 IRX2++ cases revealed a positive correlation of degree of IRX2 staining with tumor size (p=0.0288 by generalized linear model). This suggests that IRX2 may play a role in tumor cell proliferation and progression.
The second focally amplified gene that we characterized is TBL1XR1. Two recent studies showed that TBL1XR1 plays a pivotal role in releasing the repressive complex of co-repressors NcoR and SMRT following oncogenic activation of multiple pathways, including the Wnt, Notch, NF-κB, and nuclear receptor pathways (21, 22). Our RT-PCR analyses showed relatively constant, low levels of TBL1XR1 gene expression in most breast tumor and normal breast samples (Figure 1C). Similar to IRX2, TBL1XR1 protein was primarily located in nuclei (Supplementary Figure S4) and was detected in breast tumors that showed gene amplification (Supplementary Figure S5). Western blot analysis showed that TBL1XR1 expression increased progressively from MCF10A to the malignant cell lines (Figure 2A), suggesting a role for TBL1XR1 in cancer progression. Two protein bands were detected, corresponding to the α form (56 kDa) and β form (60 kDa) of TBL1XR1, which differ in their carboxyl end due to alternative splicing (23).
We further characterized TBL1XR1 in terms of its oncogenic functions using a lentiviral vector system to transduce MCF10CA1h cells with a shRNA targeting the TBL1XR1 gene. TBL1XR1 protein expression in shRNA-containing cells was examined by Western blot. Compared to parental cells or cells containing a control-shRNA, TBL1XR1-shRNA knockdown cells showed a nearly complete loss of TBL1XR1 protein expression (Figure 3A). In vitro cell growth was minimally reduced in TBL1XR1-shRNA cells (Supplementary Figure S6); however, a more prominent change was observed in cell migration, as analyzed by the scratch assay (Figure 3B). The difference in the cell migration between TBL1XR1-shRNA and control-shRNA experiments, quantified by the width of the scratched area, was highly significant (Figure 3B, p-value < 0.0001, t-test). Since cell migration is related to tumor cell invasion, we further characterized the ability of the cells to invade a basement membrane using Matrigel Matrix system (BD Biosciences). TBL1XR1-shRNA knockdown cells showed a marked reduction in cell invasion when compared to control-shRNA cells (Figure 3C). The difference in invasive cell numbers between control-shRNA and TBL1XR1-shRNA was highly significant (Figure 3C, p-value < 0.0001). Given that tumor cell invasion is a hallmark of carcinoma cells, the loss of cell invasion associated with the TBL1XR1 knockdown is consistent with an oncogenic role for this gene. A more rigorous test for tumorigenesis is examination of in vivo tumor growth. Therefore, we injected parental MCF10CA1h cells, control-shRNA cells, or TBL1XR1-shRNA cells into the mammary fat pads of nude mice. Mice injected with either the MCF10CA1h or control-shRNA cells started to develop tumors around 2 weeks (Figure 3D). In contrast, mice injected with the cells containing TBL1XR1-shRNA showed a marked reduction in tumor growth (p-value<0.001, t-test). 
Thus, our in vitro and in vivo studies of TBL1XR1 knockdown experiments provide strong supporting evidence that TBL1XR1 is a novel breast cancer oncogene.
To characterize the relationship between gene amplification and clinical pathological data, we performed two-way clustering analysis (Figure 4A). Among the noteworthy observations, one cluster of tumors was defined by ERBB2 amplification (depicted as yellow in Figure 4A). A small subset of these tumors also showed high or moderate MYC amplification (depicted as yellow or green in Figure 4A). Some of the infrequent high-level amplification events involve sets of genes, which identify a small group of samples. One such set comprises CCND2, IRX2, IRAK4, PRDM1, PIK3CA, and TBL1XR1 (6-gene), and another includes POLD3, CCND1, FGFR1, and FGFR2 (4-gene). We explored the relationship of these amplification features to survival in an indirect manner. Having shown that a positive correlation exists between some of the high-level amplification events and increased gene expression (Figure 1C), we utilized gene expression data from the public domain for patient samples for which survival data were available (24-27). We evaluated survival differences between the two groups of patients that formed at the top of the clustering tree using either the 6-gene set or the 4-gene set by Kaplan-Meier analyses. In Kaplan-Meier analysis the 6-gene set showed no difference in survival between the two groups (data not shown). In contrast, the 4-gene set containing POLD3, CCND1, FGFR1, and FGFR2 showed a significant difference in survival in 2 out of the 4 public datasets (Figure 4B).
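For readers unfamiliar with Kaplan-Meier analysis, the survival curve for one patient group can be estimated with the standard product-limit formula. The sketch below is a textbook implementation under stated assumptions, not the code used in this study (the analyses were run in R); the follow-up times and event flags are invented, and the statistical comparison between the two groups is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : True if the event (e.g. death) was observed, False if censored
    Returns (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    n_at_risk = len(times)
    curve: List[Tuple[float, float]] = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Invented follow-up data for one hypothetical patient group.
toy_times = [2, 3, 3, 5, 8]
toy_events = [True, True, False, True, False]
for time_point, prob in kaplan_meier(toy_times, toy_events):
    print(time_point, round(prob, 3))
```

Censored patients reduce the number at risk after their last follow-up time without lowering the survival estimate, which is the property that makes this estimator suitable for datasets with incomplete follow-up.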
Since we have the clinical phenotypes for the 161 tumors analyzed for copy number variation, we performed association tests between gene amplification and clinical phenotype (using the data in Supplementary Tables S1, S3, and S4; only those with a p-value less than 0.05 in the Fisher's exact test are included in Supplementary Table S5). As expected, ERBB2 amplification was positively associated with HER2+ status. We also found that FGFR1 amplification was positively associated with HER2+ status; CCND1 and POLD3 amplification was positively associated with PR+ status whereas MYC amplification was negatively associated with PR+ status; ERBB2 amplification was positively associated with tumor size.
Recent large-scale mutation analyses of the genomes of multiple breast cancers revealed a mutation landscape consisting of “mountains” and “hills” (4). TP53 and PIK3CA were the only two genes that existed as “mountains”, with high mutation frequencies, whereas hundreds of other genes making up the “hills” showed rare mutations in the breast tumors. We conducted mutation analysis for the following five genes in 161 breast tumors: TP53 (exons 4-9), PIK3CA (exons 10 and 21), BRAF (exons 11 and 15), AKT1 (exon 3), and HRAS (exons 1 and 2). Consistent with published studies, only TP53, PIK3CA, and AKT1 showed frequent mutations. We identified 44 (27.3% of the 161 tumor samples) mutations in TP53, 25 (15.5%) mutations in PIK3CA, and 11 (6.8%) mutations in AKT1. The results of the mutation analyses for TP53, PIK3CA, and AKT1 are shown in Figure 4A (red colors mark tumors with a mutation) and Supplementary Table S4. Analysis of mutations and gene amplifications revealed TP53 mutations to be positively associated with gene amplification of PIK3CA, CCND2, and NCALD (Supplementary Table S5), which is consistent with the notion that the loss of TP53 causes genomic instability. Interestingly, PIK3CA mutation is also positively associated with PIK3CA amplification (Supplementary Table S5), a point that will be further discussed in the next section.
To evaluate whether an interaction also exists between activating mutations (Figure 5B) and copy number gain of PIK3CA in primary breast tumors (Figure 5A), we sequenced exon 10 and exon 21 of PIK3CA in the 161 tumors. We detected PIK3CA mutations in 25 out of 161 tumors (Figure 5B); 19 of 25 were H1047R and 4 were E545K. This 15.5% mutation rate was comparable to that noted in previously published works. When we analyzed PIK3CA mutation in relation to copy number gain, we found that 5 out of 10 tumors with copy number gain also harbored activation mutations (Figure 5C). The simultaneous occurrence of an activating mutation and copy number gain was highly significant (p-value=0.008968, odds ratio 6.4, Fisher's test). Interestingly, those tumors with both copy number gain and mutation had moderate levels of gain and were enriched for E545K and other non-H1047R mutations (Figure 5C). We noted that 3 out of 3 tumors with non-H1047R mutations had copy number gain whereas only 2 out of 17 tumors with H1047R mutations had copy number increase (Figure 5C). The result suggests that the H1047R mutation may have oncogenic features that are distinct from other PIK3CA mutations. There are three recent published studies that characterize PIK3CA mutations extensively. Two show no difference in growth rate (28) and enzymatic activities (29) between H1047R and E545K mutations. But the third study demonstrates that the two mutations are associated with different prognoses for disease-free survival (30), suggesting different oncogenic mechanisms, which cannot be explained by the similar enzymatic activity and in vitro cell growth rate. The relevance of both qualitative and quantitative changes of PIK3CA to tumor progression was also supported by our observations that all 10 DCIS lesions, in contrast to multiple invasive breast cancers, had neither PIK3CA mutation nor copy number gains (Supplementary Tables S1, S3, and S4).
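The association between PIK3CA mutation and copy number gain can be checked from the counts given above. The 2×2 table used here is derived from the text rather than reported as such: 5 of 10 tumors with gain carried a mutation, and 25 of 161 tumors were mutated overall, which leaves 20 mutated and 131 unmutated tumors among the 151 without gain. The sketch below is a plain two-sided Fisher's exact test written from the hypergeometric distribution; the function name is hypothetical and this is not the code used in the study.

```python
from math import comb
from typing import Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_obs = prob(a)
    p_value = sum(p for p in (prob(k) for k in range(lowest, highest + 1))
                  if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Rows: copy number gain yes / no. Columns: PIK3CA mutated / not mutated.
# Counts derived from the text (an inference, see the paragraph above).
odds, p = fisher_exact_two_sided(5, 5, 20, 131)
print(f"sample odds ratio = {odds:.2f}, p = {p:.6f}")
```

With these counts the p-value comes out at approximately 0.009, in line with the reported value. The simple cross-product odds ratio is 6.55, slightly higher than the reported 6.4; one possible explanation, which the text does not confirm, is that the reported figure is a conditional maximum-likelihood estimate rather than the sample cross-product ratio.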
In conclusion, we have identified 17 loci that are focally amplified in primary breast tumors, 6 of which contain potential oncogenes and reflect novel findings in this study. Among the genes representing these 6 loci, amplification was observed only rarely in primary tumors. However, these rare amplification events provided signposts that allowed us to functionally evaluate the potential oncogenic roles of these genes. To this end, we used the experimental approach of RNA interference to characterize the effect of gene knockdowns. This strategy can be applied to the other candidate cancer-causing genes identified in our study. We have also described a finding of simultaneous gene amplification and mutation of the PIK3CA gene, suggesting that an additive effect of point mutation and copy number gain can contribute to oncogenesis.
We thank Cooperative Human Tissue Network for providing tumor tissues. We thank Drs. Kent Hunter and Daoud Meerzaman for critical reading of the manuscript. We thank Dr. Robert Clifford for providing bioinformatics support. We thank Dr. Stephen Hewitt for providing tissue microarray (TMA) slides. This research was supported by the Intramural Research Program of the NIH and the National Cancer Institute.