Cancer Res. Author manuscript; available in PMC 2010 August 1.
Published in final edited form as:
PMCID: PMC2880710

Large-scale cross-species oncogenomics identifies candidate oncogenes and tumor suppressor genes


While genomic alterations identified in human tumors using techniques such as comparative genomic hybridisation (CGH) may be recurrent, they frequently encompass large regions, in some cases containing hundreds of genes. Here we combine high-resolution CGH analysis of 598 human cancer cell lines with insertion sites isolated from 1,005 mouse tumors induced with the Murine Leukaemia Virus (MuLV). This cross-species oncogenomic analysis revealed candidate tumor suppressor genes and oncogenes recurrently mutated in both human and mouse tumors, making them strong candidate cancer genes. A significant number of these genes contained binding sites for the transcription factors Oct4 and Nanog, and mice carrying tumors with insertions in or near stem cell module genes (genes that are thought to participate in self-renewal) died significantly faster than mice without these insertions. The profile of MuLV insertions that we identified was compared to insertions isolated from 73 tumors induced using the Sleeping Beauty (SB) transposon system, revealing significant differences in the profile of recurrently mutated genes. Collectively, this work provides a rich catalogue of candidate genes for follow-up functional analysis.

Keywords: Cross-species analysis, insertional mutagenesis, bioinformatics, oncogenomics, comparative genomic hybridization

Introduction


Tumors form in humans when a cell gains a selective advantage over other cells and manages to evade the checkpoints that would normally suppress its growth or result in apoptosis. The acquisition of this behaviour is thought to occur as a result of the development of somatic mutations that deregulate gene function (1). These somatic mutations sometimes interact with predisposing germline mutations to promote tumor formation, and it is the profile of somatic and germline mutations found in a tumor that ultimately dictates its presentation and clinical course (2). Somatic mutations in human tumors may result from a multitude of genetic insults generating different types of lesions in the genome (3). With the exception of point mutations, these lesions are rarely focal and often encompass many genes. Profiling allelic imbalances found in human tumors is a powerful tool for identifying cancer gene-containing loci, the most commonly used approach being array comparative genomic hybridisation (CGH) (4). Although the resolution of this technique has improved dramatically, copy number gains and losses in human tumors are usually large and rearrangements often encompass many genes that do not contribute to tumorigenesis. Therefore, differentiating ‘driver’ cancer genes from ‘passenger’ genes requires validation in other systems.

Tumors in mice can be generated using insertional mutagens such as viruses (5, 6) and transposons (7, 8), and since these elements deregulate gene function by integrating either in or near a cancer gene, they ‘tag’ cancer loci, facilitating their identification. Viruses such as MuLV and the mouse mammary tumor virus (MMTV) have been used extensively for cancer gene identification. Screens using these viruses have been proven to identify relevant cancer genes, since the genes Myb, Pim1 and Bmi1 were identified using these mutagens (5, 9) and were subsequently shown to be relevant to cancer formation in humans (9). Similarly, transposons such as Sleeping Beauty (SB) have been shown to be potent insertional mutagens in mice (7, 8, 10). Importantly, both viruses and transposons are particularly powerful tools for identifying co-operating mutations between genes, as was shown previously for Myc and Bmi1 (11) and more recently for p19 and Braf (7), and for Notch1, Rasgrp1 and Sox8 (8).

Cross-species cancer gene analysis, which integrates genome-wide cancer datasets from human and other species, represents a potentially powerful approach for identifying and validating genes involved in tumorigenesis. This approach has been used successfully in several instances, most recently to identify the cancer genes NEDD9 (12) in melanoma, and YAP1 (13) and DLC1 (14) in liver cancer. Here we present a high-resolution comparative oncogenomic analysis performed using CGH data from 598 human cancer cell lines and over 1,000 murine lymphomas. Using insertional mutagenesis datasets generated with both MuLV and the SB transposon system, we identify candidate cancer genes mutated in both mouse and human tumors and predict that some common insertion site (CIS) genes may play a role in driving a program of tumor self-renewal. This work significantly extends our previous study (6), in which we performed cross-species analysis on low-resolution CGH data against fewer than 500 MuLV-induced tumors, and provides a comprehensive genome-wide profile of candidate cancer genes at high resolution.

Materials & Methods

Comparative Genomic Hybridisation (CGH)

598 human cancer cell lines derived from 29 different tissues (Supplementary Table 1) were analysed using the Affymetrix Genome-Wide Human SNP 6.0 array. Several analytical approaches were trialled, with DNA copy and Merge levels showing optimal results and use of compute resources (Supplementary Table 2). Analysis was performed as described in the Supplementary Methods. All of the CGH data is available for download in MIAME format from:

Insertional Mutagenesis and Mouse Tumor Panels

MuLV was used to induce tumors on a pure FVB background as described previously (6). SB tumors were induced on an F1 C57BL/6J-FVB background by breeding together an allele of the SB transposase knocked into the Rosa26 locus (8) and a low copy transposon line, LC76 (chromosome 1) or LC68 (chromosome 15), described previously (7). Tumors were collected when mice became moribund. The SB tumor panel we used in our analysis is described in detail in Collier et al., in press, with the exception of 10 tumors, which were on a Bloom heterozygous background (unpublished). Immunophenotyping of lymphomas from this panel revealed that the majority are of T-cell origin (Collier et al., in press). Immunophenotyping of the MuLV tumors (6) indicated that they were of either T-cell or B-cell origin.

Insertion Site Isolation, analysis and assigning insertions to genes

These methods are provided in the Supplementary Methods.

Cross-species Analysis

Human orthologues of the mouse candidate genes and their genomic coordinates on NCBI 36 were extracted from Ensembl v45_36f. We chose a threshold of copy number ratio ≥1.7 for amplicons because this was the lowest threshold at which we observed an over-representation of orthologues of mouse candidates, compared with orthologues of other mouse genes, in amplified regions (P=8.46×10^-3 using the 2-tailed Fisher Exact Test for genes amplified in 2 or more cell lines). Likewise, for deletions we chose a threshold of copy number ≤0.3, which was the highest threshold at which we observed an over-representation of orthologues in deleted regions in 1 or more cell lines (P=4.67×10^-3). CNV data was obtained and processed as described in the Supplementary Methods. Shared regions of deletion and amplification in paediatric acute lymphoblastic leukaemias (ALL) were obtained from Mullighan et al (15).
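
As an illustration of how the thresholds above might be applied, the following Python sketch calls a gene amplified when it overlaps a segment with a copy number ratio ≥1.7 and deleted when the ratio is ≤0.3, then keeps genes seen in a minimum number of cell lines. The record layouts and the names `call_gene_events` and `recurrent` are hypothetical and are not taken from the Supplementary Methods; only the two thresholds and the "2 or more" and "1 or more" cell line cut-offs come from the text.

```python
from collections import defaultdict

AMP_THRESHOLD = 1.7  # ratio at or above this is treated as an amplicon
DEL_THRESHOLD = 0.3  # ratio at or below this is treated as a deletion


def overlaps(seg_start, seg_end, gene_start, gene_end):
    """True if two closed intervals on the same chromosome share a base."""
    return seg_start <= gene_end and gene_start <= seg_end


def call_gene_events(segments, genes):
    """Map each gene to the cell lines in which it is amplified or deleted.

    segments: iterable of (cell_line, chrom, start, end, ratio)  [assumed layout]
    genes:    dict of gene -> (chrom, start, end)                [assumed layout]
    """
    amplified = defaultdict(set)
    deleted = defaultdict(set)
    for cell_line, chrom, start, end, ratio in segments:
        if ratio >= AMP_THRESHOLD:
            target = amplified
        elif ratio <= DEL_THRESHOLD:
            target = deleted
        else:
            continue  # segment is neither an amplicon nor a deletion
        for gene, (g_chrom, g_start, g_end) in genes.items():
            if g_chrom == chrom and overlaps(start, end, g_start, g_end):
                target[gene].add(cell_line)
    return amplified, deleted


def recurrent(events, min_lines):
    """Genes whose event was seen in at least min_lines cell lines."""
    return {gene for gene, lines in events.items() if len(lines) >= min_lines}
```

With this layout, `recurrent(amplified, 2)` and `recurrent(deleted, 1)` would give the gene sets that enter the over-representation tests described above.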

Cancer mutation datasets

COSMIC (16), Cancer Gene Census (3) and exon re-sequencing datasets were analyzed as described in the Supplementary Methods. The orthologues of all mouse genes were extracted from Ensembl v45 and the number of amplicons/deletions containing each gene was calculated. Non-CIS genes were ranked according to the number of amplicons/deletions in which they resided, and a P-value was calculated for each CIS gene by counting the number of non-CIS genes with a higher number of amplicons/deletions and dividing it by the total number of non-CIS genes. P-values for the over-representation of CIS genes in COSMIC (16), Cancer Gene Census and Sjoblom et al (17) datasets were calculated using the one-tailed Fisher Exact Test. Only genes with mouse orthologues were included in the analysis.
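
The rank-based P-value described above can be sketched in a few lines of Python. This is a minimal illustration of the stated rule (the fraction of non-CIS genes residing in more amplicons or deletions than the CIS gene); the function name `empirical_p_values` and the input layout are assumptions made for the example.

```python
from bisect import bisect_right


def empirical_p_values(cis_counts, non_cis_counts):
    """Empirical P-value per CIS gene against the non-CIS background.

    cis_counts:     dict of CIS gene -> number of amplicons (or deletions)
    non_cis_counts: list with one amplicon (or deletion) count per non-CIS gene
    """
    background = sorted(non_cis_counts)
    total = len(background)
    if total == 0:
        raise ValueError("at least one non-CIS gene is required")
    p_values = {}
    for gene, count in cis_counts.items():
        # Non-CIS genes with a strictly higher count than this CIS gene.
        higher = total - bisect_right(background, count)
        p_values[gene] = higher / total
    return p_values
```

Sorting the background once and using a binary search keeps the calculation cheap even when every mouse gene with a human orthologue is included.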

Analysis of Oct4 and Nanog Transcription Factor Binding sites and Embryonic Stem Cell (ES) Module Genes

Ensembl identifiers and human orthologues were extracted from Ensembl BioMart. P-values for the over-representation of genes with Oct4 and Nanog binding sites among CIS genes were calculated using the one-tailed Fisher Exact Test. To perform this analysis, we utilised ChIP-PET data of 3,006 Nanog binding sites, 2,408 of which were found in 1,923 Ensembl mouse genes (18). Likewise, Oct4 binding sites in 817 mouse Ensembl genes, including 797 encoding proteins or miRNAs, were derived from 1,083 Oct4 binding sites (18). The ES cell module gene list was obtained from Wong et al (19).
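
For readers unfamiliar with the test, a one-tailed Fisher Exact Test for over-representation reduces to the upper tail of a hypergeometric distribution. The sketch below shows that calculation with the Python standard library; the function name and argument names are illustrative, and the gene universe used in the actual analysis is not stated in this section, so any numbers passed in are the reader's own.

```python
from math import comb


def fisher_one_tailed(overlap, cis_total, bound_total, gene_total):
    """P(X >= overlap) when cis_total genes are drawn from gene_total genes,
    of which bound_total carry a binding site (hypergeometric upper tail)."""
    max_overlap = min(cis_total, bound_total)
    denominator = comb(gene_total, cis_total)
    tail = sum(
        comb(bound_total, k) * comb(gene_total - bound_total, cis_total - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denominator
```

For example, with a toy universe of 10 genes, 4 of which have a binding site, drawing 3 CIS genes and finding 2 with a site gives `fisher_one_tailed(2, 3, 4, 10)`, which equals 40/120.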

Western Blotting for Myc expression

Western blotting for Myc expression was performed using standard procedures. The antibody used for these experiments was anti-Myc (SC-42/C-33) from Santa Cruz.

Pathways Analysis

We analyzed MuLV CIS genes to determine if there were overrepresented Pfam (20) domains, KEGG (21) or GO pathways using the DAVID Tool (22).

Results


Comparative Genomic Hybridisation

Across the 598 cell lines, the average number of statistically significant copy number gains (ratio ≥1.7) per cell line was 34.03 (±36.57). The average size of these amplicons was 299.10 (±1667.93) kb and an average of 2.99 (±14.50) genes were found in each amplicon. The average number of statistically significant losses per cell line was 204.10 (±194.36). These losses were on average 196.87 (±3058.58) kb in size, encompassing 2.61 (±32.98) genes. Figure 1 shows the global overview of the distribution of the amplifications and deletions in this collection of cell lines, and in the haematopoietic subset. In total we identified 2,424 amplifications and 14,010 deletions across the entire cell line panel.

Figure 1
Global overview of genome-wide high-resolution comparative genomic hybridization (CGH) of 598 cancer cell lines

Analysis of lymphomas induced using MuLV

We generated 1,005 murine lymphomas by infecting newborn mice with MuLV as described previously (6). The majority of the tumors were from mice on a wildtype [231], p19 knockout [228] or p53 knockout [126] background (Supplementary Table 3). Collectively, we generated 134,985 DNA sequencing reads from 1,734 splinkerette reactions. The insertion site sequences of a subset of these tumors have been published previously (6). We mapped 86,187 reads to the mouse genome assembly NCBI m36, identifying 22,579 insertion sites with an average of 22.47 (±11.30) insertions per tumor. These data were analysed using a kernel convolution (KC)-based algorithm (23), identifying 447 statistically significant common insertion sites (CIS) at a kernel width of 30 kb. The vast majority of these insertion sites were in genic regions of the genome. Candidate genes (Supplementary Table 4) were assigned to CIS using the criteria described in the Supplementary Methods.
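
To give a feel for the kernel convolution idea, the following Python sketch places a Gaussian kernel on each insertion, sums the kernels along a chromosome, and reports peaks that exceed a threshold estimated by randomly re-placing the insertions. It is a simplified toy and not the published KC framework (23): treating the 30 kb kernel width as the Gaussian standard deviation, the uniform null model, the absence of multiple-testing correction, and all function names are assumptions made for this example.

```python
import math
import random


def kernel_density(positions, grid, width):
    """Sum of Gaussian kernels (peak height 1, std dev = width) at each grid point."""
    return [
        sum(math.exp(-0.5 * ((x - p) / width) ** 2) for p in positions)
        for x in grid
    ]


def find_peaks(grid, density, threshold):
    """Local maxima of the density that exceed the threshold."""
    peaks = []
    for i in range(1, len(grid) - 1):
        is_local_max = density[i] >= density[i - 1] and density[i] > density[i + 1]
        if is_local_max and density[i] > threshold:
            peaks.append((grid[i], density[i]))
    return peaks


def permutation_threshold(n_insertions, chrom_length, grid, width,
                          n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum density under uniform random placement."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        fake = [rng.uniform(0, chrom_length) for _ in range(n_insertions)]
        maxima.append(max(kernel_density(fake, grid, width)))
    maxima.sort()
    index = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[index]
```

Because each kernel has a peak height of 1, the density at a peak is roughly the number of insertions clustered within the kernel width, which is why tightly clustered insertions stand out against the permuted background.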

Analysis of SB Tumor Panel

We performed splinkerette reactions from both ends of 73 SB-induced tumors, generating 10,791 DNA insertion site reads. 6,281 of these reads could be mapped to the mouse genome, identifying 2,643 insertion sites, 35.72 (±18.77) per tumor. 70 of the tumors analyzed were lymphomas; 2 tumors were retrospectively classified as high-grade gliomas and one as a skin tumor. Using the KC framework (23), we identified 21 statistically significant CIS at a kernel width of 30 kb (Supplementary Table 5). Again, the majority of these CIS were in genic regions of the genome. 18 candidate genes were identified in the vicinity of these CIS using the criteria outlined in the Supplementary Methods. These CIS were filtered as described in the Supplementary Methods to remove CIS associated with local hopping and other artifacts, which resulted in 9 CIS that were used for downstream analysis.

The genome-wide distribution of insertion sites in MuLV- and SB-induced lymphomas

Having identified insertion sites and CIS from 1,005 MuLV-induced lymphomas and 70 SB-induced lymphomas, we compared their genome-wide distributions (Figure 2). The most frequently mutated genes in MuLV-induced lymphomas were Gfi1/Evi5, c-Myc/Pvt1 and Ccnd3. These genes had insertion densities of 427.28, 314.19 and 172.09, respectively, using the kernel convolution method of CIS detection (23) at a kernel width of 30 kb, which was determined to yield optimal sensitivity with this dataset. Remarkably, in the SB dataset, we found no insertions in or around these genes (P<0.0001). This may reflect the bias of retroviruses to insert themselves into particular sites in the genome. Conversely, we identified a CIS in the tumor suppressor gene Pten (6 tumors, P<0.05, Figure 2) in the SB panel, and several tumors were found to contain multiple insertions in Pten, which are presumably biallelic or derived from tumor subclones, but we did not detect a single Pten insertion in any of the 1,005 tumors from the MuLV dataset (P<0.0001). This strongly suggests that the SB transposon (T2/Onc) used for these studies and MuLV are distinct mutagens with complementary mutagenic profiles. Intriguingly, we found that despite carrying no insertions in or near the oncogene Myc, many SB tumors showed a significant upregulation in Myc protein levels (Figure 2). While there were distinct differences in the profile of genes mutated using MuLV and the SB transposon system, several genes were frequently mutated by both mutagens. These included Notch1, Myb, Ikzf1 and Fli1.

Figure 2
MuLV and SB insertions across the mouse genome

Cross-species comparative analysis of human cancer datasets with the MuLV and SB datasets

Of the 439 CIS genes identified in MuLV-induced tumors, we were able to identify 384 orthologous genes within the human genome. Similarly, we were able to identify human orthologues for the 9 SB CIS genes. 69 human orthologues of mouse genes predicted to be mutated by MuLV were genes with mutations in the Catalogue of Somatic Mutations in Cancer (COSMIC) (16) (P=1.36×10^-9). Similarly, 36 of the human orthologues of mouse genes predicted to be mutated by MuLV were oncogenes described in the Cancer Gene Census (P=7.88×10^-18). In contrast, only 3 orthologues were found to be mutated in the Sjöblom et al (17) dataset (P=0.74). This may reflect the fact that the Sjöblom dataset was an exon re-sequencing study of breast and bowel tumors exclusively, and therefore it may be biased against genes mutated in tumors of the haematopoietic system and genes disrupted by large-scale rearrangements. Similarly, 5 genes from the SB dataset were also genes within the COSMIC database (P=4.04×10^-4), and 6 were within the Cancer Gene Census (P=4.26×10^-6). This analysis reveals that using MuLV or the SB transposon system for cancer gene discovery has significant predictive power for those genes relevant to tumor formation in humans.

Cross-species comparative analysis of the human CGH and mouse MuLV datasets

There were 9,681 human genes with orthologues in the mouse genome found within amplicons of human tumors. 232 of these genes were retroviral CIS genes, which is greater than the number expected by chance (P=4.47×10^-3). 27 CIS genes showed significant recurrent amplification in human compared with non-CIS genes (P<0.05) (Supplementary Table 6). 9 of these genes were designated dominant cancer genes in the Cancer Gene Census, a significantly higher number than expected by chance (P=2.85×10^-4). 18 retroviral CIS genes showed recurrent deletion (P≤0.05) (Supplementary Table 6). 7 of these genes contained intragenic CIS, which is not significantly different to the number of other CIS genes with intragenic CIS (P=0.990). This probably reflects the fact that MuLV is primarily a dominantly acting mutagen. 5 genes (CCND2, ETV6, LGALS9, SDK1 and WWOX) were both significantly amplified and significantly deleted. This is a larger overlap than expected by chance (P=1.12×10^-4) and may suggest that some of these genes reside in unstable regions of the genome. Indeed, several of the recurrently amplified or deleted genes overlap with regions of germline copy number variation (CNV) identified previously (29) (Supplementary Table 6). We also observed significant overlap of the copy number signatures in our survey of copy number alterations with those from a large CGH analysis of acute lymphoblastic leukemias, which provides cross platform validation (15). In addition, we performed the same analysis focusing just on the haematopoietic cell lines. The orthologues of 71 retroviral CIS genes were found within amplicons in human tumors of haematopoietic or lymphoid origin (Supplementary Table 7). 19 CIS gene orthologues were shown to be recurrently amplified across the haematopoietic and lymphoid subset of the tumor panel (P<0.05). 14 of these genes were also significantly amplified across all cell lines.

16 retroviral CIS genes were found in a significant number of deletions in tumors of haematopoietic and lymphoid origin (P<0.05), and 11 of these genes were also found to be mutated in the analysis using the entire collection of tumor cell lines (Supplementary Table 7). 6 of these genes contained intragenic CIS (Supplementary Table 7).

Identification of Nanog and Oct4 binding sites in MuLV CIS genes and the effect of mutations in ES cell module genes on tumor latency

In an attempt to ascribe putative functions to the genes we identified in our analysis, we next set out to determine if they contained binding sites for the transcription factors Oct4 (24) and Nanog (25), which play an important role in embryonic stem (ES) cell self-renewal. Many genes implicated in the regulation of embryonic “stemness” have been shown to play a role in tumor self-renewal and aggressiveness (19, 26). Remarkably, there was a highly significant enrichment of genes containing Oct4 and Nanog binding sites among those genes linked to CIS in MuLV-induced mouse tumors (P=1.64×10^-5 and P=5.86×10^-4 for Oct4 and Nanog, respectively). None of the genes linked to SB CIS had Nanog or Oct4 binding sites (P=1 for both tests), but this may reflect the small size of the dataset. Mutations in ES cell module genes, of which the presence of Oct4 or Nanog binding sites is a common feature, have been proposed to be predictive of tumor aggressiveness (19, 26). We found that mice that carried tumors with MuLV insertions in or near ES cell module genes (19) became moribund at a significantly accelerated rate compared to mice that carried tumors without mutations linked to ES cell module genes (Figure 3; P<0.0001). The most frequently mutated ES cell module genes were Myc, Myb and Notch1 for MuLV, while Notch1 and Myb were the only ES cell module genes mutated by SB (Supplementary Tables 6 & 7).
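
The text does not state which statistical test was used for the survival comparison. As one common possibility, a two-group log-rank test can be sketched as follows; the choice of test, the function name and the input layout are all assumptions made for illustration, with an event flag of 1 meaning the mouse became moribund and 0 meaning it was censored.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Mice still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if g == 0 and u >= t)
        n_b = sum(1 for u, _, g in data if g == 1 and u >= t)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if g == 0 and u == t and e)
        d_b = sum(1 for u, e, g in data if g == 1 and u == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("groups cannot be compared: zero variance")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

Here group A would hold the latencies of mice whose tumors carry insertions linked to ES cell module genes and group B the remaining mice.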

Figure 3
Mice carrying MuLV induced tumors with common insertion sites (CIS) linked to embryonic stem cell (ES) module genes have significantly reduced survival compared to mice without mutations affecting these genes

Pathways Analysis

KEGG, GO and DAVID analysis revealed an overrepresentation of MuLV (Supplementary Table 8) and SB (Supplementary Table 9) CIS genes in pathways known to participate in cancer formation and haematopoiesis. Kinase domains were also overrepresented in MuLV CIS genes.

Discussion


New high-throughput genomic analysis techniques such as massively parallel sequencing and ultra high-resolution CGH are identifying remarkable heterogeneity in cancer genomes (27), implicating a multitude of genes and pathways in oncogenesis and cancer progression. Determining which of these rearrangements have actually driven tumor initiation and progression will be a significant undertaking. Ideally, validation of genetic rearrangements should involve systematic experimental evaluation. However, few of the experimental approaches that may be used for validating cancer genes are high-throughput and, with the exception of animal models, most are unable to faithfully recapitulate the genetic and cellular context in which cancers form. Forward genetic screens in mice are a powerful tool for cancer gene discovery because tumors are formed via somatic mutation and, like human tumors, undergo a process of evolution resulting in the emergence of a malignant clone (9). When used in combination as part of a comparative oncogenomics approach, high-resolution analysis of human cancer genomes by CGH and insertion sites derived from mouse tumors represent a powerful way of identifying new genes relevant to oncogenesis.

In this study, we identified 27 genes that were recurrently amplified in human tumors where the orthologous mouse gene was a site of clonal retroviral insertions in murine lymphomas induced using MuLV (Supplementary Table 6). Similarly, we identified 18 genes that were recurrently deleted in human tumors and were also CIS genes in the MuLV dataset (Supplementary Table 6). Using the same approach we identified 19 recurrently amplified and 16 recurrently deleted CIS genes by comparison to the CGH data for the haematopoietic subset of the tumor panel (Supplementary Table 7). Reassuringly, we identified known dominantly active oncogenes from the Cancer Gene Census (3) and also genes from the COSMIC (16) database that are somatically mutated in human cancers. Many of the genes that we predict to be potential cancer genes were, however, novel. Importantly, several of the genes we identified in our analysis were found in regions of the genome either recurrently amplified or deleted in a large survey of human acute lymphoblastic leukemias (15), which provides cross platform validation. Several genes, such as WWOX, were both recurrently amplified and deleted (Supplementary Tables 6 & 7). This may reflect the fact that these genes are located in unstable or fragile regions of the genome (28). Indeed, many of the genes that we identified in our analysis were also found to be located in CNV regions of the human genome (29). This does not exclude them from being cancer genes but may indicate something of the underlying genomic architecture in which they reside. Intriguingly, we observed several deletions that removed the entire NOTCH1 locus, and other deletions that removed internal exons of NOTCH1 and potentially result in the formation of oncogenic NOTCH-IC protein (Supplementary Figure 1). Similarly, we observed a recurrent exon-specific deletion within ETS1 that potentially generates a neomorphic allele (Supplementary Figure 2).

Importantly, there were several CIS genes identified in our analysis (Supplementary Tables 6 & 7) that are designated as dominantly active in the Cancer Gene Census (3), but which we found to be deleted in our panel of human tumors. These include Etv6 (15) and Bcl11b (30). It is possible that these genes function in both gain and loss of function roles in tumorigenesis. One of the most compelling genes we identified in our analysis was the protein tyrosine phosphatase type IVA, member 3 gene (Ptp4a3), which was amplified in 10 tumors and contained multiple intragenic insertions (Supplementary Table 6). PTP genes are a small class of prenylated protein tyrosine phosphatases implicated in many cellular processes including growth.

Just as a cross-species oncogenomics approach is a powerful method of identifying genes that may be of importance in human cancer formation, performing forward genetic screens in mice with multiple mutagens is potentially a powerful way of identifying functionally important cancer genes. In this study, we isolated insertion site sequences from tumors generated using both MuLV and the SB transposon system. Analysis revealed that these mutagens have remarkably different mutagenic profiles. While Myc/Pvt1, Gfi1/Evi5 and Ccnd3 were frequently mutated in MuLV tumors, this was not the case in SB tumors (Figure 2). Chi-squared analysis of the mutation profiles revealed a statistically significant difference in each case (P<0.0001). Similarly, Pten was mutated in 6 of the 73 SB tumors but not in any of the 1,005 MuLV tumors (P<0.0001). The fact that we did not detect SB insertions in or around Myc is striking since activation of MYC is a critical event in the development of many forms of human lymphoma and because one of the transposon donors was located on chromosome 15, the same chromosome as Myc, which should have favored insertions into Myc by local hopping. To investigate this further, we took 10 SB-induced thymic lymphomas and performed Western blotting to compare the level of Myc protein expression to wild-type thymus (Figure 2). In at least 5 cases we observed elevated Myc protein levels. The fact that there are no insertions in or near Myc in these SB tumors raised the question of whether the T2/Onc transposon is capable of inserting near this gene and activating expression. It is possible that the MSCV promoter in T2/Onc is in an unfavorable context to activate Myc, that the Myc locus is in an unfavorable context for SB transposition, or that the Myc locus is amplified, which would make insertions into Myc redundant. Similarly, we observed no SB insertions in or near Gfi1, which was frequently mutated by MuLV.

In the experiments described in this paper, the mice treated with MuLV were on a pure FVB background while the SB tumors were collected from mice that were on a hybrid C57BL/6J-FVB background. It is possible that some of the differences in the insertion profiles we describe are due to different preferences for viral or transposon integration on these different genetic backgrounds. However, insertions of MuLV into Myc and Gfi1 have been shown to occur on most genetic backgrounds, including in C57BL/6J hybrids (31, 32). The observation that SB tumors contain insertions in Pten, which were not found in MuLV-induced tumors, is in keeping with the suggestion that Pten plays an important role in T-cell lymphomagenesis (33). As we have shown previously (Collier et al., in press), immunophenotyping of the SB tumors we used in our analysis revealed that the majority were CD4/CD8 double positive T-cell tumors, or were B220+ and therefore B-cell derived. The occasional SB tumor appeared to have two malignant cell clones. MuLV-induced tumors are either CD3+ or B220+, i.e. either T- or B-cell derived (6). It remains possible that some of the differences between the insertion profiles observed between the SB and MuLV tumors may be due to different sub-types of disease.
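
The comparison of mutation frequencies between the two mutagens discussed above can be illustrated with a Pearson chi-squared test on a 2×2 table, here using the Pten counts given in the text (6 of 73 SB tumors versus 0 of 1,005 MuLV tumors). This is a sketch only: whether a continuity correction was applied in the original analysis is not stated, so none is used, and the function name is hypothetical. With expected counts this small, an exact test is often preferred.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: SB tumors, MuLV tumors. Columns: with Pten insertion, without.
pten_statistic, pten_p = chi_squared_2x2(6, 73 - 6, 0, 1005)
```

On these counts the resulting P-value falls well below the 0.0001 level quoted in the text.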

In this study, we also illustrate that genes with Oct4 and Nanog binding sites are enriched in genes found to be at MuLV CIS and that MuLV insertions in or near stem cell module genes are predictive of decreased survival (Figure 3). Importantly, the most frequently mutated stem cell module genes were Myc, Myb and Notch1. Using immunophenotyping data for 349 of the MuLV tumors (6), we determined that there was not a significant difference in the CD3 (T-cell) or B220 (B-cell) marker status between tumors with or without insertions linked to stem cell module genes, although the sub-classification of these lymphomas with additional markers may be revealing. Finally, we also identified overrepresented KEGG and GO pathways, and Pfam domains, in our analysis. Not surprisingly, these pathways and genes included those implicated in haematopoiesis, development, and in important cellular processes such as cell division and transcription.

In conclusion, we have performed extensive cross-species comparative analysis, identifying a large number of candidate cancer genes that now represent worthy targets for further functional validation in model systems. We also illustrate that cross-species oncogenomics is a powerful tool for cancer gene identification.

Supplementary Material

Supplementary Table 1-5

Supplementary Table 6

Supplementary Table 7

Supplementary Table: pfam analysis of MuLV CIS Genes / KEGG analysis of MuLV CIS Genes / GO analysis of MuLV CIS Genes

Supplementary Table: pfam analysis of SB CIS Genes / KEGG analysis of SB CIS Genes / GO analysis of SB CIS Genes

Supplementary Figure 1

Supplementary Figure 2

Supplementary Methods

Acknowledgments


D.J.A. is supported by Cancer Research-UK and the Wellcome Trust. JK, AGU, LW, JJ, AB and MvL are supported by the NWO Genomics program and the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research (NWO). J.d.R. was supported by the BioRange program of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI). J.K. and A.U. were supported by the Cancer Genomics Centre through the Netherlands Genomics Initiative (NGI). The Cancer Genome Project is supported by the Wellcome Trust. LvdW is supported by the Kay Kendall Leukemia Fund. LSC is supported by the National Cancer Institute (K01CA122183) and an American Cancer Society pre-doctoral fellowship. We wish to acknowledge L. Bendzick, V. Maklakova and M. Derezinski for technical assistance. The University of Minnesota has a pending patent on the process of using transposons such as SB for cancer gene discovery. DAL and LSC are named amongst the inventors.


Conflict of Interest Statement

The authors declare that they have no conflict of interest.

References


1. Goymer P. Natural selection: The evolution of cancer. Nature. 2008;454:1046–8.
2. Gabriel S. Variation in the human genome and the inherited basis of common disease. Semin Oncol. 2006;33:S46–9.
3. Futreal PA, Coin L, Marshall M, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4:177–83.
4. Pinkel D, Segraves R, Sudar D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998;20:207–11.
5. Mikkers H, Berns A. Retroviral insertional mutagenesis: tagging cancer pathways. Adv Cancer Res. 2003;88:53–99.
6. Uren AG, Kool J, Matentzoglu K, et al. Large-scale mutagenesis in p19(ARF)- and p53-deficient mice identifies cancer genes and their collaborative networks. Cell. 2008;133:727–41.
7. Collier LS, Carlson CM, Ravimohan S, Dupuy AJ, Largaespada DA. Cancer gene discovery in solid tumours using transposon-based somatic mutagenesis in the mouse. Nature. 2005;436:272–6.
8. Dupuy AJ, Akagi K, Largaespada DA, Copeland NG, Jenkins NA. Mammalian mutagenesis using a highly mobile somatic Sleeping Beauty transposon system. Nature. 2005;436:221–6.
9. Uren AG, Kool J, Berns A, van Lohuizen M. Retroviral insertional mutagenesis: past, present and future. Oncogene. 2005;24:7656–72.
10. Collier LS, Largaespada DA. Transforming science: cancer gene identification. Curr Opin Genet Dev. 2006;16:23–9.
11. van Lohuizen M, Verbeek S, Scheijen B, Wientjens E, van der Gulden H, Berns A. Identification of cooperating oncogenes in E mu-myc transgenic mice by provirus tagging. Cell. 1991;65:737–52.
12. Kim M, Gans JD, Nogueira C, et al. Comparative oncogenomics identifies NEDD9 as a melanoma metastasis gene. Cell. 2006;125:1269–81.
13. Zender L, Spector MS, Xue W, et al. Identification and validation of oncogenes in liver cancer using an integrative oncogenomic approach. Cell. 2006;125:1253–67.
14. Xue W, Krasnitz A, Lucito R, et al. DLC1 is a chromosome 8p tumor suppressor whose loss promotes hepatocellular carcinoma. Genes Dev. 2008;22:1439–44.
15. Mullighan CG, Goorha S, Radtke I, et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature. 2007;446:758–64.
16. Forbes SA, Bhamra G, Bamford S, et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet. 2008 Chapter 10:Unit 10 1.
17. Sjoblom T, Jones S, Wood LD, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–74.
18. Loh YH, Wu Q, Chew JL, et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat Genet. 2006;38:431–40.
19. Wong DJ, Liu H, Ridky TW, Cassarino D, Segal E, Chang HY. Module map of stem cell genes guides creation of epithelial cancer stem cells. Cell Stem Cell. 2008;2:333–44.
20. Finn RD, Tate J, Mistry J, et al. The Pfam protein families database. Nucleic Acids Res. 2008;36:D281–8.
21. Kanehisa M, Araki M, Goto S, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008;36:D480–4.
22. Dennis G, Jr., Sherman BT, Hosack DA, et al. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003;4:P3.
23. de Ridder J, Uren A, Kool J, Reinders M, Wessels L. Detecting statistically significant common insertion sites in retroviral insertional mutagenesis screens. PLoS Comput Biol. 2006;2:e166.
24. Nichols J, Zevnik B, Anastassiadis K, et al. Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell. 1998;95:379–91.
25. Chambers I, Colby D, Robertson M, et al. Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell. 2003;113:643–55.
26. Ben-Porath I, Thomson MW, Carey VJ, et al. An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet. 2008;40:499–507.
27. Campbell PJ, Stephens PJ, Pleasance ED, et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet. 2008;40:722–9.
28. Durkin SG, Glover TW. Chromosome fragile sites. Annu Rev Genet. 2007;41:169–92.
29. Redon R, Ishikawa S, Fitch KR, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–54.
30. Kamimura K, Mishima Y, Obata M, Endo T, Aoyagi Y, Kominami R. Lack of Bcl11b tumor suppressor results in vulnerability to DNA replication stress and damages. Oncogene. 2007;26:5840–50.
31. Broussard DR, Mertz JA, Lozano M, Dudley JP. Selection for c-myc integration sites in polyclonal T-cell lymphomas. J Virol. 2002;76:2087–99.
32. Dudley JP, Mertz JA, Rajan L, Lozano M, Broussard DR. What retroviruses teach us about the involvement of c-Myc in leukemias and lymphomas. Leukemia. 2002;16:1086–98.
33. Lenz G, Wright GW, Emre NC, et al. Molecular subtypes of diffuse large B-cell lymphoma arise by distinct genetic pathways. Proc Natl Acad Sci U S A. 2008;105:13520–5.