While genomic alterations identified in human tumors using techniques such as comparative genomic hybridisation (CGH) may be recurrent, they frequently encompass large regions, in some cases containing hundreds of genes. Here we combine high-resolution CGH analysis of 598 human cancer cell lines with insertion sites isolated from 1,005 mouse tumors induced with the Murine Leukaemia Virus (MuLV). This cross-species oncogenomic analysis revealed candidate tumor suppressor genes and oncogenes recurrently mutated in both human and mouse tumors, making them strong candidate cancer genes. A significant number of these genes contained binding sites for the transcription factors Oct4 and Nanog, and mice carrying tumors with insertions in or near stem cell module genes (genes that are thought to participate in self-renewal) died significantly faster than mice without these insertions. The profile of MuLV insertions that we identified was compared to insertions isolated from 73 tumors induced using the Sleeping Beauty (SB) transposon system, revealing significant differences in the profile of recurrently mutated genes. Collectively, this work provides a rich catalogue of candidate genes for follow-up functional analysis.
Tumors form in humans when a cell gains a selective advantage over other cells and manages to evade the checkpoints that would normally suppress its growth or result in apoptosis. The acquisition of this behaviour is thought to occur as a result of the development of somatic mutations that deregulate gene function (1). These somatic mutations sometimes interact with predisposing germline mutations to promote tumor formation, and it is the profile of somatic and germline mutations found in a tumor that ultimately dictates its presentation and clinical course (2). Somatic mutations in human tumors may result from a multitude of genetic insults generating different types of lesions in the genome (3). With the exception of point mutations, these lesions are rarely focal and often encompass many genes. Profiling allelic imbalances found in human tumors is a powerful tool for identifying cancer gene-containing loci, the most commonly used approach being array comparative genomic hybridisation (CGH) (4). Although the resolution of this technique has improved dramatically, copy number gains and losses in human tumors are usually large and rearrangements often encompass many genes that do not contribute to tumorigenesis. Therefore, differentiating ‘driver’ cancer genes from ‘passenger’ genes requires validation in other systems.
Tumors in mice can be generated using insertional mutagens such as viruses (5, 6) and transposons (7, 8), and since these elements deregulate gene function by integrating either in or near a cancer gene, they ‘tag’ cancer loci, facilitating their identification. Viruses such as MuLV and the mouse mammary tumor virus (MMTV) have been used extensively for cancer gene identification. Screens using these viruses have been proven to identify relevant cancer genes: Myb, Pim1 and Bmi1 were identified using these mutagens (5, 9) and were subsequently shown to be relevant to cancer formation in humans (9). Similarly, transposons such as Sleeping Beauty (SB) have been shown to be potent insertional mutagens in mice (7, 8, 10). Importantly, both viruses and transposons are particularly powerful tools for identifying co-operating mutations between genes, as was shown previously for Myc and Bmi1 (11) and more recently for p19 and Braf (7), and for Notch1, Rasgrp1 and Sox8 (8).
Cross-species cancer gene analysis, which integrates genome-wide cancer datasets from human and other species, represents a potentially powerful approach for identifying and validating genes involved in tumorigenesis. This approach has been used successfully in several instances, most recently to identify the cancer genes NEDD9 (12) in melanoma, and YAP1 (13) and DLC1 (14) in liver cancer. Here we present a high-resolution comparative oncogenomic analysis performed using CGH data from 598 human cancer cell lines and over 1,000 murine lymphomas. Using insertional mutagenesis datasets generated with both MuLV and the SB transposon system, we identify candidate cancer genes mutated in both mouse and human tumors and predict that some common insertion site (CIS) genes may play a role in driving a program of tumor self-renewal. This work significantly extends our previous study (6), in which we performed cross-species analysis on low-resolution CGH data against fewer than 500 MuLV-induced tumors, and provides a comprehensive genome-wide profile of candidate cancer genes at high resolution.
598 human cancer cell lines derived from 29 different tissues (Supplementary Table 1) were analysed using the Affymetrix Genome-Wide Human SNP 6.0 array. Several analytical approaches were trialled, with DNA copy and Merge levels showing optimal results and use of compute resources (Supplementary Table 2). Analysis was performed as described in the Supplementary Methods. All of the CGH data are available for download in MIAME format from: http://www.sanger.ac.uk/genetics/CGP/Archive/.
MuLV was used to induce tumors on a pure FVB background as described previously (6). SB tumors were induced on an F1 C57BL/6J-FVB background by breeding together an allele of the SB transposase knocked into the Rosa26 locus (8) and a low copy transposon line, LC76 (chromosome 1) or LC68 (chromosome 15), described previously (7). Tumors were collected when mice became moribund. The SB tumor panel we used in our analysis is described in detail in Collier et al. (in press), with the exception of 10 tumors, which were on a Bloom heterozygous background (unpublished). Immunophenotyping of lymphomas from this panel revealed that the majority are of T-cell origin (Collier et al., in press). Immunophenotyping of the MuLV tumors (6) indicated that they were of either T-cell or B-cell origin.
These methods are provided in the Supplementary Methods.
Human orthologues of the mouse candidate genes and their genomic coordinates on NCBI 36 were extracted from Ensembl v45_36f. We chose a threshold of copy number ratio ≥1.7 for amplicons because this was the lowest threshold at which we observed an over-representation of orthologues of mouse candidates, compared with orthologues of other mouse genes, in amplified regions (P=8.46×10−3 using the 2-tailed Fisher Exact Test for genes amplified in 2 or more cell lines). Likewise, for deletions we chose a threshold of copy number ≤ 0.3, which was the highest threshold at which we observed an over-representation of orthologues in deleted regions in 1 or more cell lines (P=4.67×10−3). CNV data was obtained and processed as described in the Supplementary Methods. Shared regions of deletion and amplification in paediatric acute lymphoblastic leukaemias (ALL) were obtained from Mullighan et al (15).
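The two copy number thresholds described above can be expressed as a simple classifier. The following is a minimal Python sketch, not the pipeline used in this study: the function name `classify_copy_number` and the example ratios are hypothetical, and the upstream segmentation (see Supplementary Methods) is not reproduced.

```python
AMPLIFICATION_THRESHOLD = 1.7  # copy number ratio at or above this is called an amplicon
DELETION_THRESHOLD = 0.3       # copy number ratio at or below this is called a deletion


def classify_copy_number(ratio: float) -> str:
    """Label a segment's copy number ratio using the thresholds stated in the text."""
    if ratio >= AMPLIFICATION_THRESHOLD:
        return "amplified"
    if ratio <= DELETION_THRESHOLD:
        return "deleted"
    return "neutral"


# Hypothetical segment ratios, for illustration only.
example_ratios = [2.4, 1.7, 1.0, 0.3, 0.1]
labels = [classify_copy_number(r) for r in example_ratios]
print(labels)
```

Both thresholds are treated as inclusive here, matching the "≥1.7" and "≤ 0.3" wording above.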
COSMIC (16), Cancer Gene Census (3) and exon re-sequencing datasets were analyzed as described in the Supplementary Methods. The orthologues of all mouse genes were extracted from Ensembl v45 and the number of amplicons/deletions containing each gene was calculated. Non-CIS genes were ranked according to the number of amplicons/deletions in which they resided, and a P-value was calculated for each CIS gene by counting the number of non-CIS genes with a higher number of amplicons/deletions and dividing it by the total number of non-CIS genes. P-values for the over-representation of CIS genes in COSMIC (16), Cancer Gene Census and Sjoblom et al (17) datasets were calculated using the one-tailed Fisher Exact Test. Only genes with mouse orthologues were included in the analysis.
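The rank-based P-value described above (the fraction of non-CIS genes residing in more amplicons or deletions than a given CIS gene) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name `empirical_p_value` and the example counts are hypothetical, and "higher" is taken to mean strictly greater.

```python
from typing import Sequence


def empirical_p_value(cis_count: int, non_cis_counts: Sequence[int]) -> float:
    """Fraction of non-CIS genes found in strictly more amplicons (or deletions)
    than the CIS gene of interest."""
    if not non_cis_counts:
        raise ValueError("at least one non-CIS gene is required")
    higher = sum(1 for count in non_cis_counts if count > cis_count)
    return higher / len(non_cis_counts)


# Hypothetical data: number of amplicons containing each of ten non-CIS genes.
non_cis_counts = [0, 0, 1, 1, 2, 2, 3, 5, 8, 13]

# A hypothetical CIS gene found in 6 amplicons is exceeded by two non-CIS genes.
p = empirical_p_value(cis_count=6, non_cis_counts=non_cis_counts)
print(p)
```

Because the P-value is a proportion of the background set, its smallest non-zero value is one divided by the number of non-CIS genes.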
Ensembl identifiers and human orthologues were extracted from Ensembl BioMart. P-values for the over-representation of genes with Oct4 and Nanog binding sites among CIS genes were calculated using the one-tailed Fisher Exact Test. To perform this analysis, we utilised ChIP-PET data of 3,006 Nanog binding sites, 2,408 of which were found in 1,923 Ensembl mouse genes (18). Likewise, Oct4 binding sites in 817 mouse Ensembl genes, including 797 encoding proteins or miRNAs, were derived from 1,083 Oct4 binding sites (18). The ES cell module gene list was obtained from Wong et al (19).
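A one-tailed Fisher Exact Test for over-representation can be computed directly from the hypergeometric distribution. The sketch below is a minimal Python illustration, not the code used in this study. The function name and the 2×2 layout (rows: CIS versus non-CIS genes; columns: with versus without a binding site) are assumptions, and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the hypergeometric
    null distribution with all row and column totals held fixed.
    """
    row1 = a + b          # e.g. all CIS genes
    col1 = a + c          # e.g. all genes with a binding site
    total = a + b + c + d
    max_a = min(row1, col1)
    denominator = comb(total, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, max_a + 1)
    )
    return tail / denominator


# Hypothetical counts: 3 of 4 CIS genes and 1 of 4 non-CIS genes carry a site.
p_enrichment = fisher_exact_greater(3, 1, 1, 3)
print(round(p_enrichment, 6))
```

For this small table the tail sums to 17 of 70 equally likely arrangements, so the P-value is about 0.243.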
Western blotting for Myc expression was performed using standard procedures. The antibody used for these experiments was anti-Myc (SC-42/C-33) from Santa Cruz.
Across the 598 cell lines, the average number of statistically significant copy number gains (copy number ratio ≥ 1.7) per cell line was 34.03 (±36.57). The average size of these amplicons was 299.10 (±1667.93) kb and an average of 2.99 (±14.50) genes were found in each amplicon. The average number of statistically significant losses per cell line was 204.10 (±194.36). These losses were on average 196.87 (±3058.58) kb in size, encompassing 2.61 (±32.98) genes. Figure 1 shows the global overview of the distribution of the amplifications and deletions in this collection of cell lines, and in the haematopoietic subset. In total we identified 2,424 amplifications and 14,010 deletions across the entire cell line panel.
We generated 1,005 murine lymphomas by infecting newborn mice with MuLV as described previously (6). The majority of the tumors were from mice on a wildtype, p19 knockout or p53 knockout background (Supplementary Table 3). Collectively, we generated 134,985 DNA sequencing reads from 1,734 splinkerette reactions. The insertion site sequences of a subset of these tumors have been published previously (6). We mapped 86,187 reads to the mouse genome assembly NCBI m36, identifying 22,579 insertion sites with an average of 22.47 (±11.30) insertions per tumor. These data were analysed using a kernel convolution (KC)-based algorithm (23), identifying 447 statistically significant common insertion sites (CIS) at a kernel width of 30 kb. The vast majority of these insertion sites were in genic regions of the genome. Candidate genes (Supplementary Table 4) were assigned to CIS using the criteria described in the Supplementary Methods.
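The idea behind kernel convolution can be illustrated with a short Python sketch that places a Gaussian kernel on each insertion and looks for peaks in the summed density. This is a simplified illustration rather than the published KC framework (23): the significance testing used to call CIS is omitted, treating the 30 kb kernel width as the Gaussian standard deviation is an assumption, and the function names and insertion coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def insertion_density(position: float, insertions: Sequence[int],
                      kernel_width: float = 30_000.0) -> float:
    """Sum of Gaussian kernels centred on each insertion, evaluated at `position`.

    Each kernel has peak height 1, so an isolated insertion contributes a
    density of 1 at its own position (an assumption made for readability).
    """
    return sum(
        math.exp(-0.5 * ((position - x) / kernel_width) ** 2)
        for x in insertions
    )


def density_peaks(insertions: Sequence[int], kernel_width: float = 30_000.0,
                  step: int = 1_000) -> List[Tuple[int, float]]:
    """Scan a regular grid and return (position, density) at each local maximum."""
    grid = list(range(min(insertions), max(insertions) + 1, step))
    densities = [insertion_density(g, insertions, kernel_width) for g in grid]
    peaks = []
    for i, value in enumerate(densities):
        left = densities[i - 1] if i > 0 else float("-inf")
        right = densities[i + 1] if i + 1 < len(densities) else float("-inf")
        if value > left and value >= right:
            peaks.append((grid[i], value))
    return peaks


# Hypothetical insertions on one chromosome: a tight cluster and one outlier.
insertions = [100_000, 101_000, 102_000, 103_000, 104_000, 900_000]
peaks = density_peaks(insertions)
for position, density in peaks:
    print(position, round(density, 3))
```

In this toy example the cluster of five insertions produces a peak with a density close to 5, whereas the isolated insertion produces a peak close to 1. A full analysis would compare such peak heights against a null distribution before declaring a CIS.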
We performed splinkerette reactions from both ends of 73 SB-induced tumors, generating 10,791 DNA insertion site reads. 6,281 of these reads could be mapped to the mouse genome, identifying 2,643 insertion sites, 35.72 (±18.77) per tumor. 70 of the tumors analyzed were lymphomas. 2 tumors were retrospectively classified as high-grade gliomas, and one as a skin tumor. Using the KC framework (23), we identified 21 statistically significant CIS at a kernel width of 30 kb (Supplementary Table 5). Again, the majority of these CIS were in genic regions of the genome. 18 candidate genes were identified in the vicinity of these CIS using the criteria outlined in the Supplementary Methods. These CIS were filtered as described in the Supplementary Methods to remove CIS associated with local hopping and other artifacts, which resulted in 9 CIS that were used for downstream analysis.
Having identified insertion sites and CIS from 1,005 MuLV-induced lymphomas and 70 SB-induced lymphomas, we compared their genome-wide distributions (Figure 2). The most frequently mutated genes in MuLV-induced lymphomas were Gfi1/Evi5, c-Myc/Pvt1 and Ccnd3. These genes had insertion densities of 427.28, 314.19 and 172.09, respectively, using the kernel convolution method of CIS detection (23) at a kernel width of 30 kb, which was determined to yield optimal sensitivity with this dataset. Remarkably, in the SB dataset, we found no insertions in or around these genes (P<0.0001). This may reflect the bias of retroviruses to insert themselves into particular sites in the genome. Conversely, we identified a CIS in the tumor suppressor gene Pten (6 tumors, P<0.05, Figure 2) in the SB panel. Several tumors were found to contain multiple insertions in Pten, which are presumably biallelic or derived from tumor subclones, but we did not detect a single Pten insertion in any of the 1,005 tumors from the MuLV dataset (P<0.0001). This strongly suggests that the SB transposon (T2/Onc) used for these studies and MuLV are distinct mutagens with complementary mutagenic profiles. Intriguingly, we found that despite carrying no insertions in or near the oncogene Myc, many SB tumors showed a significant upregulation in Myc protein levels (Figure 2). While there were distinct differences in the profile of genes mutated using MuLV and the SB transposon system, several genes were frequently mutated by both mutagens. These included Notch1, Myb, Ikzf1 and FliI.
Of the 439 CIS genes identified in MuLV-induced tumors, we were able to identify 384 orthologous genes within the human genome. Similarly, we were able to identify human orthologues for the 9 SB CIS genes. 69 human orthologues of mouse genes predicted to be mutated by MuLV were genes with mutations in the Catalog of Somatic Mutations in Cancer (COSMIC) (16) (P=1.36×10−9). Similarly, 36 of the human orthologues of mouse genes predicted to be mutated by MuLV were oncogenes described in the Cancer Gene Census (P=7.88×10−18). In contrast, only 3 orthologues were found to be mutated in the Sjöblom et al (17) dataset (P=0.74). This may reflect the fact that the Sjöblom dataset was an exon re-sequencing study of breast and bowel tumors exclusively, and therefore may be biased against genes mutated in tumors of the haematopoietic system and genes disrupted by large-scale rearrangements. Similarly, 5 genes from the SB dataset were also genes within the COSMIC database (P=4.04×10−4), and 6 were within the Cancer Gene Census (P=4.26×10−6). This analysis reveals that using MuLV or the SB transposon system for cancer gene discovery has significant predictive power for those genes relevant to tumor formation in humans.
There were 9,681 human genes with orthologues in the mouse genome found within amplicons of human tumors. 232 of these genes were retroviral CIS genes, which is greater than the number expected by chance (P=4.47×10−3). 27 CIS genes showed significant recurrent amplification in human compared with non-CIS genes (P<0.05) (Supplementary Table 6). 9 of these genes were designated dominant cancer genes in the Cancer Gene Census, a significantly higher number than expected by chance (P=2.85×10−4). 18 retroviral CIS genes showed recurrent deletion (P≤0.05) (Supplementary Table 6). 7 of these genes contained intragenic CIS, which is not significantly different to the number of other CIS genes with intragenic CIS (P=0.990). This probably reflects the fact that MuLV is primarily a dominantly acting mutagen. 5 genes (CCND2, ETV6, LGALS9, SDK1 and WWOX) were both significantly amplified and significantly deleted. This is a larger overlap than expected by chance (P=1.12×10−4) and may suggest that some of these genes reside in unstable regions of the genome. Indeed, several of the recurrently amplified or deleted genes overlap with regions of germline copy number variation (CNV) identified previously (21) (Supplementary Table 6). We also observed significant overlap of the copy number signatures in our survey of copy number alterations with those from a large CGH analysis of acute lymphoblastic leukemias, which provides cross platform validation (15). In addition, we performed the same analysis focusing just on the haematopoietic cell lines. The orthologues of 71 retroviral CIS genes were found within amplicons in human tumors of haematopoietic or lymphoid origin (Supplementary Table 7). 19 CIS gene orthologues were shown to be recurrently amplified across the haematopoietic and lymphoid subset of the tumor panel (P<0.05). 14 of these genes were also significantly amplified across all cell lines.
16 retroviral CIS genes were found in a significant number of deletions in tumors of haematopoietic and lymphoid origin (P<0.05), and 11 of these genes were also found to be mutated in the analysis using the entire collection of tumor cell lines (Supplementary Table 7). 6 of these genes contained intragenic CIS (Supplementary Table 7).
In an attempt to ascribe putative functions to the genes we identified in our analysis, we next set out to determine if they contained binding sites for the transcription factors Oct4 (24) and Nanog (25), which play an important role in embryonic stem (ES) cell self-renewal. Many genes implicated in the regulation of embryonic “stemness” have been shown to play a role in tumor self-renewal and aggressiveness (19, 26). Remarkably, there was a highly significant enrichment of genes containing Oct4 and Nanog binding sites among those genes linked to CIS in MuLV-induced mouse tumors (P=1.64×10−5 and P=5.86×10−4, for Oct4 and Nanog respectively). None of the genes linked to SB CIS had Nanog or Oct4 binding sites (P=1 for both tests), but this may reflect the small size of the dataset. Mutations in ES cell module genes, of which the presence of Oct4 or Nanog binding sites is a common feature, have been proposed to be predictive of tumor aggressiveness (19, 26). We found that mice that carried tumors with MuLV insertions in or near ES cell module genes (19) became moribund at a significantly accelerated rate compared to mice that carried tumors without mutations linked to ES cell module genes (Figure 3; P<0.0001). The most frequently mutated ES cell module genes were Myc, Myb and Notch1 for MuLV, while Notch1 and Myb were the only ES cell module genes mutated by SB (Supplementary Table 6 & 7).
New high-throughput genomic analysis techniques such as massively parallel sequencing and ultra high-resolution CGH are identifying remarkable heterogeneity in cancer genomes (27), implicating a multitude of genes and pathways in oncogenesis and cancer progression. Determining which of these rearrangements have actually driven tumor initiation and progression will be a significant undertaking. Ideally, validation of genetic rearrangements should involve systematic experimental evaluation. However, few of the experimental approaches that may be used for validating cancer genes are high-throughput and, with the exception of animal models, most are unable to faithfully recapitulate the genetic and cellular context in which cancers form. Forward genetic screens in mice are a powerful tool for cancer gene discovery because tumors are formed via somatic mutation and, like human tumors, undergo a process of evolution resulting in the emergence of a malignant clone (9). When used in combination as part of a comparative oncogenomics approach, high-resolution analysis of human cancer genomes by CGH and analysis of insertion sites derived from mouse tumors represent a powerful way of identifying new genes relevant to oncogenesis.
In this study, we identified 27 genes that were recurrently amplified in human tumors where the orthologous mouse gene was a site of clonal retroviral insertions in murine lymphomas induced using MuLV (Supplementary Table 6). Similarly, we identified 18 genes that were recurrently deleted in human tumors and were also CIS genes in the MuLV dataset (Supplementary Table 6). Using the same approach we identified 19 recurrently amplified and 16 recurrently deleted CIS genes by comparison to the CGH data for the haematopoietic subset of the tumor panel (Supplementary Table 7). Reassuringly, we identified known dominantly active oncogenes from the Cancer Gene Census (3) and also genes from the COSMIC (16) database that are somatically mutated in human cancers. Many of the genes that we predict to be potential cancer genes were, however, novel. Importantly, several of the genes we identified in our analysis were found in regions of the genome either recurrently amplified or deleted in a large survey of human acute lymphoblastic leukemias (15), which provides cross platform validation. Several genes, such as WWOX, were both recurrently amplified and deleted (Supplementary Table 6 & 7). This may reflect the fact that these genes are located in unstable or fragile regions of the genome (28). Indeed, many of the genes that we identified in our analysis were also found to be located in CNV regions of the human genome (29). This does not exclude them from being cancer genes but may indicate something of the underlying genomic architecture in which they reside. Intriguingly we observed several deletions that removed the entire NOTCH1 locus, and other deletions that removed internal exons of NOTCH1 and potentially result in the formation of oncogenic NOTCH-IC protein (Supplementary Figure 1). Similarly, we observed a recurrent exon specific deletion within ETS1 that potentially generates a neomorphic allele (Supplementary Figure 2). 
Importantly, there were several CIS genes identified in our analysis (Supplementary Table 6 & 7) that are designated as dominantly active in the Cancer Gene Census (3), but which we found to be deleted in our panel of human tumors. These include Etv6 (15) and Bcl11b (30). It is possible that these genes function in both gain and loss of function roles in tumorigenesis. One of the most compelling genes we identified in our analysis was the protein tyrosine phosphatase type IVA, member 3 gene (Ptp4a3), which was amplified in 10 tumors and contained multiple intragenic insertions (Supplementary Table 6). PTP4A genes encode a small class of prenylated protein tyrosine phosphatases implicated in many cellular processes including growth.
Just as a cross-species oncogenomics approach is a powerful method of identifying genes that may be of importance in human cancer formation, performing forward genetic screens in mice with multiple mutagens is potentially a powerful way of identifying functionally important cancer genes. In this study, we isolated insertion site sequences from tumors generated using both MuLV and the SB transposon system. Analysis revealed that these mutagens have remarkably different mutagenic profiles. While Myc/Pvt1, Gfi1/Evi5 and Ccnd3 were frequently mutated in MuLV tumors, this was not the case in SB tumors (Figure 2). Chi-squared analysis of the mutation profiles revealed a statistically significant difference in each case (P<0.0001). Similarly, Pten was mutated in 6 of the 73 SB tumors but not in any of the 1,005 MuLV tumors (P<0.0001). The fact that we did not detect SB insertions in or around Myc is striking, since activation of MYC is a critical event in the development of many forms of human lymphoma and because one of the transposon donors was located on chromosome 15, the same chromosome as Myc, which should have favored insertions into Myc by local hopping. To investigate this further, we took 10 SB-induced thymic lymphomas and performed Western blotting to compare the level of Myc protein expression to wild-type thymus (Figure 2). In at least 5 cases we observed elevated Myc protein levels. The fact that there are no insertions in or near Myc in these SB tumors raised the question of whether the T2/Onc transposon is capable of inserting near this gene and activating expression. It is possible that the MSCV promoter in T2/Onc is in an unfavorable context to activate Myc, that the Myc locus is in an unfavorable context for SB transposition, or that the Myc locus is amplified, which would make insertions into Myc redundant. Similarly, we observed no SB insertions in or near Gfi1, which was frequently mutated by MuLV.
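To illustrate how counts such as 6 of 73 SB tumors versus 0 of 1,005 MuLV tumors can be compared, the following Python sketch computes Pearson's chi-squared statistic for a 2×2 table without continuity correction. It is an illustration under stated assumptions and is not claimed to reproduce the exact calculation used here: the function name is hypothetical, and with expected counts this small an exact test would usually be preferred.

```python
import math
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic and P-value (1 degree of freedom)
    for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: SB tumors, MuLV tumors. Columns: with Pten insertion, without.
statistic, p_value = chi_squared_2x2(6, 67, 0, 1005)
print(round(statistic, 2), p_value < 0.0001)
```

For these counts the statistic is approximately 83, which corresponds to a P-value far below 0.0001.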
In the experiments described in this paper, the mice treated with MuLV were on a pure FVB background while the SB tumors were collected from mice that were on a hybrid C57BL/6J-FVB background. It is possible that some of the differences in the insertion profiles we describe are due to different preferences for viral or transposon integration on these different genetic backgrounds. However, insertions of MuLV into Myc and Gfi1 have been shown to occur on most genetic backgrounds, including in C57BL/6J hybrids (31, 32). The observation that SB tumors contain insertions in Pten, which were not found in MuLV-induced tumors, is in keeping with the suggestion that Pten plays an important role in T-cell lymphomagenesis (33). As we have shown previously (Collier et al., in press), immunophenotyping of the SB tumors we used in our analysis revealed that the majority were CD4/CD8 double positive T-cell tumors, or were B220+ and therefore B-cell derived. The occasional SB tumor appeared to have two malignant cell clones. MuLV-induced tumors are either CD3+ or B220+, i.e. either T- or B-cell derived (6). It remains possible that some of the differences between the insertion profiles observed between the SB and MuLV tumors may be due to different sub-types of disease.
In this study, we also illustrate that genes with Oct4 and Nanog binding sites are enriched among genes found at MuLV CIS and that MuLV insertions in or near stem cell module genes are predictive of decreased survival (Figure 3). Importantly, the most frequently mutated stem cell module genes were Myc, Myb and Notch1. Using immunophenotyping data for 349 of the MuLV tumors (6), we determined that there was not a significant difference in the CD3 (T-cell) or B220 (B-cell) marker status between tumors with or without insertions linked to stem cell module genes, although the sub-classification of these lymphomas with additional markers may be revealing. Finally, we also identified overrepresented KEGG and GO pathways, and Pfam domains, in our analysis. Not surprisingly, these pathways and genes included those implicated in haematopoiesis, development, and in important cellular processes such as cell division and transcription.
In conclusion, we have performed extensive cross-species comparative analysis, identifying a large number of candidate cancer genes that now represent worthy targets for further functional validation in model systems. We also illustrate that cross-species oncogenomics is a powerful tool for cancer gene identification.
D.J.A. is supported by Cancer Research-UK and the Wellcome Trust. JK, AGU, LW, JJ, AB and MvL are supported by the NWO Genomics program and the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research (NWO). J.d.R. was supported by the BioRange program of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI). J.K and A.U. were supported by the Cancer Genomics Centre through the Netherlands Genomics Initiative (NGI). The Cancer Genome Project is supported by the Wellcome Trust. LvdW is supported by the Kay Kendall Leukemia Fund. LSC is supported by the National Cancer Institute (K01CA122183) and an American Cancer Society pre-doctoral fellowship. We wish to acknowledge L. Bendzick, V. Maklakova and M. Derezinski for technical assistance. The University of Minnesota has a pending patent on the process of using transposons such as SB for cancer gene discovery. DAL and LSC are named amongst the inventors.
Conflict of Interest Statement
The authors declare that they have no conflict of interest.