The cyclin dependent kinase (CDK) inhibitors p15, p16, p21 and p27 are frequently deleted, silenced or downregulated in many malignancies. Inactivation of CDK inhibitors predisposes mice to tumor development, demonstrating that these genes can act as tumor suppressors. Here we describe high-throughput murine leukemia virus (MuLV) insertional mutagenesis screens in mice deficient for one or a combination of two CDK inhibitors. We retrieved 9117 retroviral insertions from 476 lymphomas and found hundreds of loci that are mutated significantly more frequently than expected by chance. Many of these are skewed toward a specific genetic context of predisposing germline and somatic mutations. We also found associations between these loci and gender, age of tumor onset and lymphocyte lineage (B or T cell). Comparison of retroviral insertion sites with SNPs associated with chronic lymphocytic leukemia (CLL) reveals significant overlap between these datasets. Together these data highlight the importance of genetic context within large-scale mutation detection studies and demonstrate a novel use for insertional mutagenesis data in prioritizing disease-associated genes identified by genome-wide association studies.
Recent mutation profiling of human tumors has implicated hundreds of genes in the pathogenesis of cancer; however, the vast majority of these genes are only rarely mutated, making it difficult to determine which events are causal “driver” mutations and which are incidental “passenger” mutations (reviewed in (1)). Retroviral insertional mutagenesis screens performed in mouse models of cancer are a useful complement to studies of human tumors, because they provide an independent validation of oncogenic mutations in another organism (2). In these screens, slow-transforming retroviruses are used to induce mutations within somatic tissues (3). Over the lifetime of the mouse these mutations accumulate, eventually promoting clonal expansion of cells bearing multiple oncogenic mutations. Because viral insertion sites can easily be identified by PCR, a high proportion of the mutations in each tumor can be identified with relatively little effort. This coverage allows genetic interactions between mutations to be identified at rates that are not yet possible from the study of human tumors. Furthermore, the use of mice allows the identification of cancer genes that collaborate with specific cancer-predisposing lesions that have been introduced into the germline.
Cyclin/cyclin dependent kinase (CDK) complexes promote the progression of cells through the cell cycle. Deregulation of CDKs is implicated in the progression of a variety of cancers, including hematological malignancies (reviewed in (4)). Based on structural and functional characteristics, two families of CDK inhibitors can be distinguished: the Ink4 family (p16Ink4a, p15Ink4b, p18Ink4c and p19Ink4d) and the Cip/Kip family (p21Cip1/Waf1/Sdi1, p27Kip1 and p57Kip2) (reviewed in (5)). The Ink4 proteins specifically inhibit cyclin D-Cdk4/6 complexes, and expression of Ink4 genes results in G1-phase cell cycle arrest. p15Ink4b and p16Ink4a, as well as p19Arf (which shares part of its coding sequence with p16Ink4a in a different reading frame), are located in the same chromosomal region, which is frequently deleted in human cancer (6). Germline knockout studies have shown that loss of p15 and p16 predisposes mice to tumor development, confirming their role as tumor suppressor genes (7-10).
The Cip/Kip proteins have a broader spectrum of targets, binding to cyclin D-, cyclin E- and cyclin A-dependent kinase complexes. The p27 and p21 genes are rarely subject to inactivating mutations in human cancer but can be inactivated by other mechanisms (reviewed in (5)). Mouse studies have clearly indicated that p27 is a tumor suppressor gene (11, 12). Germline disruption of p21 may have both anti- and pro-oncogenic effects, depending on the genetic context (13-15).
We performed retroviral insertional mutagenesis in mice deficient for one or more Ink4 and/or Cip/Kip family members in order to identify genes that collaborate with absence of these CDK inhibitors in cancer. Analyzing tumors from 476 mice we identified 9117 insertions, finding hundreds of common insertion sites (CISs) targeting known cancer genes as well as novel candidate cancer genes. Many of these CISs are specifically mutated in mice lacking one or more CDK inhibitors. Furthermore, we find CISs correlating with the cell type of origin and latency of the tumor, as well as co-occurring or mutually exclusive combinations of mutations. We also found a significant correlation between the location of SNPs associated with familial CLL and CISs, indicating that insertional mutagenesis data may be of use in identifying disease-associated genes.
Some tumors were described in prior studies (16-18). p15Ink4b-deficient mice are described in (7). Newborn pups were injected i.p. with supernatant from MuLV-producing NIH3T3 cells. Animals were monitored for tumor development; moribund mice were sacrificed and tumors were isolated.
Ligation-mediated splinkerette PCRs were performed as described (19). Two ligations were performed for each tumor, using DNA digested with Sau3AI and Tsp509I. PCR products were shotgun subcloned, and 96 colonies per PCR were picked and sequenced on a capillary sequencer. Insertions will be available online at the Mutapedia and RTCGD databases.
Sequences were mapped onto NCBI mouse build 37 using exonerate (20). Sequences from a single mouse with an identical insertion site were merged. CISs were identified and their significance estimated using the kernel convolution method described in (21). CIS labels were manually annotated based on viral position and literature analysis. P-values for associations between CISs and genotype, gender, age, FACS markers and co-occurring mutations were calculated using Fisher's exact test. Genes and loci carrying more than one insertion per tumor at a higher frequency than expected by chance were identified as described (22), except that in the random permutations inserts were shuffled only between mice of the same genotype, rather than across all genotypes, to minimize false positives.
To identify genes that collaborate in tumorigenesis with deficiency for p15, p16/p19Arf, p21 and p27, we performed retroviral insertional mutagenesis in mice deficient for one or a combination of these genes. p27−/− mice developed tumors significantly faster than wild-type mice. p21 deficiency had no effect on tumor latency, even when combined with loss of p27 (17) (Supplementary Table S1). Mice deficient for both p19Arf and p16 also show accelerated tumor formation upon MuLV infection (18), whereas p15 deficiency significantly slows tumor formation compared to wild-type controls (log-rank test, p < 0.04).
To identify the genes that are mutated by retroviral insertions we amplified the flanking sequences of the insertions using a ligation-mediated PCR protocol followed by shotgun subcloning and sequencing as described previously (19, 22). Previously 931 insertion sites were cloned from a subset of 189 of these tumors (16, 18): an average of 6.6 inserts per tumor. In the current study, using an improved PCR protocol in combination with shotgun subcloning, we identified 9117 independent insertions from 476 tumors (~19 insertions/tumor) (Supplementary Table S2).
Loci that are mutated at a frequency higher than expected by random chance may contain cancer genes. To identify these common insertion sites (CISs), we created a density distribution of insertions over the genome using smoothed Gaussian kernels, and estimated significance of each peak by comparison to 10,000 permutations of insertions randomly distributed over the genome (21). Using a kernel width of 30 kb, we identified CISs for the separate tumor panels (Figure 1A). We identified many CISs near known or candidate cancer genes, and their prevalence varies between genotypes. For example, some CISs are uniquely found in single knockouts but not in the compound knockouts and vice versa (Supplementary Table S3).
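The kernel-convolution idea can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the implementation of (21): it uses a single linear "genome", treats the kernel width as the Gaussian standard deviation, evaluates the density on a coarse grid, and scores only the highest peak against a permutation null in which the same number of insertions is placed uniformly at random. The function names and toy coordinates are hypothetical.

```python
import math
import random

def kernel_density(positions, grid, width):
    """Sum of Gaussian kernels centred on each insertion, evaluated at every
    grid coordinate. Assumption: `width` is used as the standard deviation."""
    return [sum(math.exp(-0.5 * ((x - p) / width) ** 2) for p in positions)
            for x in grid]

def highest_peak(positions, grid, width):
    return max(kernel_density(positions, grid, width))

def cis_peak_test(positions, genome_length, width=30_000, step=20_000,
                  n_perm=100, seed=0):
    """Return (observed highest peak, empirical p-value).

    The p-value is the fraction of permutations (insertions redistributed
    uniformly at random over the genome) whose highest peak is at least as
    high as the observed one, with +1 smoothing."""
    rng = random.Random(seed)
    grid = range(0, genome_length + 1, step)
    observed = highest_peak(positions, grid, width)
    exceed = 0
    for _ in range(n_perm):
        random_sites = [rng.uniform(0, genome_length) for _ in positions]
        if highest_peak(random_sites, grid, width) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical toy data on a single 5 Mb "genome": 20 scattered insertions
# plus a cluster of 10 insertions spanning 45 kb at 2.5 Mb.
background = [100_000 + 250_000 * i for i in range(20)]
cluster = [2_500_000 + 5_000 * k for k in range(10)]
peak, p_value = cis_peak_test(background + cluster, genome_length=5_000_000)
print(f"highest peak {peak:.2f}, permutation p = {p_value:.3f}")
```

Using the highest peak of each permuted genome as the null statistic is one simple way to account for scanning the whole genome; the analysis described above instead estimated the significance of each peak against 10,000 permutations.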
We next asked which CISs are more frequently mutated in one genotype than in another. For this and subsequent analyses we pooled our insertions with approximately 11,000 insertions from 510 MuLV-induced wild-type, p19Arf- and p53-deficient tumors (22), both to extend the number of genotypes that can be compared and to improve the power of statistical tests that use all tumors. After identifying CISs (596 using a 30 kb kernel and 300 using a 300 kb kernel; Figure 1A and 1B, Supplementary Table S4), we used pairwise Fisher's exact tests to determine which CISs are mutated significantly more frequently in one genotype than in the others (Supplementary Table S5 lists all significant associations). To visualize the multiple pairwise comparisons for each CIS, we represented its associations as a heat map.
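For a single CIS and a single pairwise comparison, the test reduces to a 2x2 contingency table. The sketch below is a hypothetical, dependency-free Python implementation of the two-sided Fisher's exact test; the function name and counts are invented for illustration, and in practice `scipy.stats.fisher_exact` computes the same two-sided p-value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                     CIS mutated   CIS not mutated
        genotype X        a               b
        other tumors      c               d

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # probability that x of the col1 mutated tumors fall in row 1
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are counted
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Hypothetical counts: CIS hit in 8 of 10 tumors of one genotype
# versus 1 of 6 tumors in the comparison group.
print(fisher_exact_two_sided(8, 2, 1, 5))
```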
The top 12 CISs are mutated significantly more frequently in some genotypes than in others (Supplementary Figure S1). Myc and Mycn are structurally and functionally equivalent in many respects (23). Consistent with this, both genes show a significant bias towards p27-deficient tumors (Figure 2A); however, Myc but not Mycn is selected against in p15−/− and p16−/− p19Arf−/− tumors. The genotype specificities of the cyclin D family members Ccnd3 and Ccnd1 also differ. Ccnd3 is less frequently mutated in p16−/− p19Arf−/− mice than in other genotypes, including p19Arf−/− mice, whereas the opposite is found for Ccnd1 (Figure 2B), suggesting that activation of Ccnd1 but not Ccnd3 provides a selective advantage in the absence of p16. A similar difference in genotype specificity is found for mutations near the miRNA cistron encoding mmu-mir-106-363 and its paralog mmu-mir-17-92 (Figure 2B). This differential specificity of paralogs is a recurrent phenomenon: CISs for Runx1 and Runx3, Pim1 and Pim2, and Rasgrp1 and Rasgrp2 are also selected for in different genetic backgrounds, suggesting distinct roles in tumorigenesis despite their homology (Supplementary Figure S2).
Some CISs are surprisingly specific for compound genotypes but not single knockouts (Zfp217 and Fli1/Ets1 are selected for more frequently in p15−/− p21−/− mice than p15−/− or p21−/− single knockouts) or for heterozygotes but not homozygotes (CISs near Sox4 or Cebpb/A530013C23Rik/Ptpn1 are more frequently mutated in p16+/− p19Arf+/− compared to wild types or p16−/− p19Arf−/−) (Supplementary Figure S3). Heat maps for full sets of genotype associations are available online (http://mutapedia.nki.nl).
In a similar fashion, we examined the association of all CISs with gender (Table 1). The Trim25/Dgke CIS is selectively mutated in female mice compared to males. Trim25 is an estrogen-responsive gene that promotes breast cancer growth by targeting the anti-proliferative protein 14-3-3σ for proteolysis (24). Mutation of two Irf family members, Irf4 and Irf2, is selected for in female mice (at the 30 kb and 300 kb scales, respectively; Table 1 and data not shown). These oncogenes are thought to act via inhibition of transcription of the tumor suppressor Irf1. Female mice lacking Def6 (a negative regulator of Irf4 (25)) develop a systemic autoimmune disorder more frequently and at an earlier age than males. This disorder resembles human systemic lupus erythematosus, which also occurs more frequently in females than in males (26).
Since average tumor latency differs between genotypes (Supplementary Table S1), we reasoned that the insertion profile of a tumor might also influence its latency. To separate the effect of genotype on latency from the effect of insertions, we normalized tumor latencies across genotypes by assigning each mouse a percentile lifespan within its own cohort. Using these values, two pooled cohorts were assembled: one containing the short-latency mice from each genotype (< 50th percentile) and one containing the long-latency mice from each genotype (> 50th percentile). Next, we examined which CISs were mutated significantly more frequently in the ‘short-latency’ cohort than in the ‘long-latency’ cohort, and vice versa (Table 2).
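The latency normalization can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the percentile definition (a midpoint rank convention), the tie-breaking rule, the function names and the toy cohort are all hypothetical choices.

```python
from collections import defaultdict

def latency_percentiles(mice):
    """mice: iterable of (mouse_id, genotype, latency_days).

    Returns {mouse_id: percentile}, ranking each mouse's latency only
    against mice of the same genotype. Assumed midpoint convention:
    100 * (rank + 0.5) / n, so the median mouse of an odd-sized cohort
    sits exactly at 50. Ties are broken by mouse_id."""
    cohorts = defaultdict(list)
    for mouse_id, genotype, latency in mice:
        cohorts[genotype].append((latency, mouse_id))
    percentile = {}
    for cohort in cohorts.values():
        cohort.sort()
        n = len(cohort)
        for rank, (_, mouse_id) in enumerate(cohort):
            percentile[mouse_id] = 100.0 * (rank + 0.5) / n
    return percentile

def split_by_latency(mice):
    """Pool mice below / above the 50th percentile of their own cohort."""
    pct = latency_percentiles(mice)
    short = {m for m, p in pct.items() if p < 50}
    long_ = {m for m, p in pct.items() if p > 50}
    return short, long_

mice = [("a", "wt", 100), ("b", "wt", 200), ("c", "wt", 300),
        ("d", "p27-/-", 50), ("e", "p27-/-", 80)]
short, long_ = split_by_latency(mice)
print(sorted(short), sorted(long_))  # ['a', 'd'] ['c', 'e']
```

With this convention the median mouse of an odd-sized cohort (mouse "b" here) falls exactly on the 50th percentile and joins neither pooled cohort, matching the strict < 50 and > 50 cut-offs above.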
Gfi1, Mycn and Tbxa2r, as well as the CIS upstream of Myc, are the most preferentially mutated in the ‘short-latency’ cohort. In contrast, CISs downstream of Myc that may also affect Pvt1 expression (27) are more frequently mutated in mice with a longer lifespan, suggesting distinct effects in tumorigenesis. Notch1, Lfng and Ikzf1 (Ikaros) have been found to collaborate in tumorigenesis (28) and are preferentially mutated in the ‘long-latency’ cohort, suggesting that mutation of these genes is either less potent than mutation of others or is only tolerated or selected for after other predisposing mutations have taken place.
We analyzed many of the tumors from the p19Arf−/−, p53−/− and wild-type screen for T and B cell content by FACS, using the T and B cell-specific markers CD3 and B220, respectively (22). We separated splenic and thymic tumors and ranked them on T or B cell content to investigate whether CISs were preferentially mutated in tumors with high or low percentages of T or B cells. Both Flt3 and Kdr (Vegfr2), known oncogenes in human hematopoietic malignancies, are selectively mutated in splenic tumors with a high content of B220-positive cells, suggesting that these genes might contribute to B cell lymphomagenesis (29, 30) (Supplementary Table S6). Conversely, mutations near mmu-mir-106-363 were found in splenic tumors with high levels of CD3-positive cells and low levels of B220-positive cells, suggesting that this miRNA cluster may be involved in the development of T cell tumors.
We and others have previously observed strong selection for or against co-mutation of certain CISs (22, 31). To identify co-occurring and mutually exclusive mutations we performed two analyses. First, we pooled the insertions from the current study with those from our previous study of p53−/−, p19Arf−/− and wild-type tumors in order to maximize the statistical power of our tests for co-mutation. Second, because pooling unlike genotypes might create false interactions between CISs that are enriched in one genotype versus another, we also calculated interactions for each genotype separately. The increased number of tumors compared with our previous study gives higher statistical power to many of the associations found previously (22) and identifies hundreds of new interactions, many of which are skewed towards particular genotypes (Supplementary Table S7A and S7B).
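Why pooling unlike genotypes can manufacture apparent mutual exclusivity is easiest to see with invented numbers. In the hypothetical Python sketch below, CIS A is enriched in genotype X and CIS B in genotype Y, and within each genotype the two are mutated independently; the pooled panel nevertheless shows far fewer co-mutated tumors than expected under independence. All names and counts are illustrative.

```python
def co_mutation(tumors, cis_a, cis_b):
    """tumors: list of sets of CIS names mutated in each tumor.
    Returns (observed co-mutated tumors, expected count if independent)."""
    n = len(tumors)
    n_a = sum(cis_a in t for t in tumors)
    n_b = sum(cis_b in t for t in tumors)
    observed = sum(cis_a in t and cis_b in t for t in tumors)
    return observed, n_a * n_b / n

def toy_panel(n, n_a, n_b, n_ab):
    """Hypothetical panel of n tumors: n_a hit A, n_b hit B, n_ab hit both."""
    tumors = [{"A", "B"} for _ in range(n_ab)]
    tumors += [{"A"} for _ in range(n_a - n_ab)]
    tumors += [{"B"} for _ in range(n_b - n_ab)]
    tumors += [set() for _ in range(n - len(tumors))]
    return tumors

# A is enriched in genotype X, B in genotype Y; within each genotype
# the two CISs are mutated independently (observed == expected).
panels = {"X": toy_panel(100, 40, 5, 2), "Y": toy_panel(100, 5, 40, 2)}
for genotype, tumors in panels.items():
    print(genotype, co_mutation(tumors, "A", "B"))   # (2, 2.0) for each
pooled = panels["X"] + panels["Y"]
print("pooled", co_mutation(pooled, "A", "B"))       # (4, 10.125)
```

Any co-mutation statistic computed on the pooled panel, such as a Fisher's exact test on its 2x2 table, would inherit this bias, which is why the per-genotype analysis is a useful check.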
Many of the interactions that were significant in the pooled data showed a similar trend in one or more genotypes analyzed separately. For example, using a 300 kb window we find significant co-occurrence of inserts near Myb/Ahi1 and Rras2 and significant mutual exclusivity of Myc/Pvt1 and Mycn in multiple genotypes (Figure 3A). Some recurrent interactions appear to be remarkably genotype-specific for mice lacking both alleles of p19Arf (p19Arf−/− and p16−/− p19Arf−/−) (Figure 3B) or for p16−/− p19Arf−/− and p53−/− mice (Figure 3C). It is important to note that many of the mutually exclusive interactions are not observed in individual panels (Fli1 and the mmu-mir-106a cluster; Rras2 and Map3k8) (Figure 3D), suggesting that bias of these CISs towards different genotypes is at least partially responsible for the rarity of these co-mutations. Thus, whilst pooling panels lends statistical power to the identification of co-occurring and mutually exclusive mutations, it is important to simultaneously stratify tumors into separate subpanels to determine whether these interactions are an indirect byproduct of genotype specificity.
Using a 30 kb kernel width, many associations appear to be recurrent for different CISs of the same locus. For instance, the 30 kb CIS of Gata1 significantly co-occurs with Runx1 30 kb CISs #1, #5 and #6 (Supplementary Table S7A). Gata1 mutations removing the N-terminus of the protein are frequently detected in Down syndrome patients with transient myeloproliferative disorder and acute megakaryoblastic leukemia (32), and even in some patients with no apparent hematopoietic disorder. This suggests that selection for Gata1 mutations is a direct consequence of trisomy 21. Human chromosome 21 encodes more than 250 protein-coding genes. Runx1 is one of 7 loci associated with Gata1 in our screens, but it is the only one located within regions of the mouse genome sharing a common origin with human chromosome 21. Runx1 and Gata1 are transcription factors that bind each other and cooperate to activate genes involved in hematopoietic differentiation (33). The role of Runx1 dosage in trisomy 21-induced Gata1 mutation remains controversial (34); however, concomitant mutation of these loci in mouse lymphoma supports the hypothesis that Runx1 is at least one of the genes on chromosome 21 promoting selection for Down syndrome-associated Gata1 mutation.
Another novel interaction with precedent in the literature is the co-occurrence of mutations in Lck and the Stat5a/Stat5b/Stat3 locus. The Lck kinase is required for phosphorylating the Stat5a and Stat5b transcription factors in response to T cell receptor signaling and enhances DNA binding of Stat3, Stat5a and Stat5b (35, 36), supporting a scenario in which Lck and Stats collaborate in tumorigenesis (Supplementary Table S7A).
MuLV insertions can inactivate tumor suppressor genes through disruption of transcripts. It is difficult to distinguish which insertions are likely to be activating or inactivating based solely on location and orientation of the insertions. We previously hypothesized that the presence of more than one insertion within the transcribed region of a gene in cancer suggests selection for loss of both copies (i.e. the gene is a tumor suppressor). However, the most frequently mutated oncogenes may also have more than one insertion within them through chance (either in the same cell or in different subclones). We have previously distinguished between these events by looking for genes that have more than one insertion within the transcribed region within the same tumor more frequently than expected by chance (22).
Using a similar methodology in the current analysis and pooling insertions from all panels, we find 82 genes with more than one insert per tumor (Supplementary Table S8). Of these, 44 genes have more than one insertion per tumor at frequencies higher than expected by chance, including some new candidate tumor suppressor genes such as Rere (p-value 0.007) and Anks1 (or Odin, p-value 0.01). Rere was found to be located in the minimally defined loss of heterozygosity region at 1p36.2-p36.1 in a neuroblastoma cell line (37). Embryonic fibroblasts from Anks1-deficient mice exhibited a hyperproliferative phenotype compared with wild-type fibroblasts, consistent with a role for Anks1/Odin as a negative regulator of growth factor receptor signaling (38, 39).
For some genes there appears to be selection against a second insertion within the gene in the same tumor. Insertions within the Mycn transcript are thought to stabilize the transcript (40). Of the 78 inserts within this gene, only one tumor carries more than one insertion, whereas in randomized data an average of 3.9 tumors have more than one insert in this gene. Similarly, significant selection against more than one intragenic insertion is observed within Ahi1 and Pvt1. The simplest explanation may be that mutation of the first allele of each of these genes is sufficient to remove any selection for a second mutation.
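The permutation logic behind both observations (more multiply-hit tumors than expected for candidate tumor suppressors, fewer than expected for genes such as Mycn) can be sketched in Python. As described in the Methods, insertions are shuffled only between mice of the same genotype. Everything else, including the data layout, function names, number of permutations and the toy panel, is a hypothetical illustration rather than the published pipeline.

```python
import random
from collections import Counter, defaultdict

def tumors_with_multiple_hits(inserts, gene):
    """inserts: list of (mouse_id, genotype, gene).
    Number of mice carrying more than one insertion in `gene`."""
    per_mouse = Counter(mouse for mouse, _, g in inserts if g == gene)
    return sum(1 for count in per_mouse.values() if count > 1)

def shuffle_within_genotype(inserts, rng):
    """Reassign genes to insertion slots at random, but only among mice of
    the same genotype. Each mouse keeps its number of insertions and each
    gene keeps its total number of insertions."""
    by_genotype = defaultdict(list)
    for mouse, genotype, gene in inserts:
        by_genotype[genotype].append((mouse, gene))
    shuffled = []
    for genotype, pairs in by_genotype.items():
        genes = [gene for _, gene in pairs]
        rng.shuffle(genes)
        shuffled.extend((mouse, genotype, gene)
                        for (mouse, _), gene in zip(pairs, genes))
    return shuffled

def multiple_hit_test(inserts, gene, n_perm=1000, seed=0):
    """Observed count, mean permuted count, and one-sided empirical p-values
    for more (candidate tumor suppressor) or fewer (selection against a
    second hit) multiply-hit tumors than expected."""
    rng = random.Random(seed)
    observed = tumors_with_multiple_hits(inserts, gene)
    null = [tumors_with_multiple_hits(shuffle_within_genotype(inserts, rng), gene)
            for _ in range(n_perm)]
    expected = sum(null) / n_perm
    p_more = (sum(x >= observed for x in null) + 1) / (n_perm + 1)
    p_fewer = (sum(x <= observed for x in null) + 1) / (n_perm + 1)
    return observed, expected, p_more, p_fewer

# Hypothetical panel: 10 wild-type mice with 4 insertions each; gene "T"
# is hit twice in mouse m0 and twice in mouse m1, all other hits are unique.
inserts = []
for m in range(10):
    genes = ["T", "T", f"g{m}a", f"g{m}b"] if m < 2 else [f"g{m}{k}" for k in "abcd"]
    inserts += [(f"m{m}", "wt", g) for g in genes]
print(multiple_hit_test(inserts, "T"))
```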
A number of verified tumor suppressor loci that would be expected to be downregulated during tumor development (Ikzf1, p53, Nf1) have insertions outside the transcribed region. Thus, rather than limiting our analysis to gene boundaries, we also looked for CIS windows with more than one insertion per tumor at frequencies higher than expected by chance (Supplementary Table S9). The most significant CIS in this analysis was Cbfa2t3 (core-binding factor, runt domain, alpha subunit 2, translocated to 3), which, aside from its involvement in an AML translocation (41), is suspected of being a tumor suppressor in human breast cancer (42). Smyd4, the second most significant CIS in this analysis, is implicated as a tumor suppressor in breast cancer development, since its expression is lost in a subset of human breast tumors and suppression of Smyd4 expression stimulates proliferation of mammary cells in vitro (43). Retroviral insertions may inactivate these genes by disrupting their promoters and enhancers, or methylation induced by proviral sequences might silence them (44).
A recent genome-wide association study of 511 CLL cases (155 with a family history of the disease), using arrays of 346,000 single nucleotide polymorphisms (SNPs), identified 49 SNPs distributed over 35 loci that were significantly associated with the disease. Seventeen SNPs were chosen for further validation, and 7 were found to significantly associate with disease in two independent validation cohorts (45). Observing that one of the validated loci is orthologous to one of our CISs (IRF4), we tested whether the entire set of 49 SNPs (35 loci) from the first phase of screening significantly overlaps with our insertions.
Each of the 49 SNPs was mapped to its orthologous position in the mouse genome using the Ensembl Compara database. To avoid redundancy, loci bearing more than one SNP were grouped into a single coordinate representing the average position of all SNPs, yielding 35 loci in total. Using windows ranging from 10 kb up to 300 kb surrounding each orthologous position, we observed enrichment of our insertions within these windows (Table 3). To estimate the significance of this overlap, we compared the number of insertions within windows surrounding the orthologous loci of SNPs to 1000 permutations of windows surrounding random loci (random gene start sites). For all window sizes, orthologous loci of SNPs had more insertions than expected for random loci. The degree and significance of this enrichment varied between a maximum of 2.61-fold (70 kb window, p-value 0.016) and a minimum of 1.56-fold (300 kb window, p-value 0.066). In addition to finding retroviral inserts within 150 kb of five of the six loci that were previously validated in independent CLL patient cohorts (ACOXL/BCL2L11, IRF4, CR17890/GRAMD1B, AK097902/BC029061, PRKD2/FKRP/SLC1A5) (45), we also find inserts near a number of SNPs not significantly associated with disease in the validation cohorts (which might justify further investigation of these SNPs in larger validation cohorts) and near some SNPs not chosen for rescreening in the validation cohorts (suggesting these may warrant screening in a validation cohort) (Supplementary Table S10).
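The window-overlap permutation test can be sketched in Python as below. The assumptions are deliberate simplifications, not details from the study: the window size is taken as the full width centred on each locus, random loci are drawn without replacement from a supplied list of background coordinates (standing in for random gene start sites), an insertion lying in two overlapping windows is counted once for each, and all function names and coordinates are hypothetical. Fold enrichment is reported as the observed count divided by the mean permuted count.

```python
import random
from bisect import bisect_left, bisect_right

def count_in_windows(insert_pos, loci, half_width):
    """insert_pos: {chromosome: sorted list of insertion coordinates}.
    loci: list of (chromosome, coordinate). Returns the total number of
    insertions within +/- half_width of each locus; an insertion lying in
    two overlapping windows is counted once for each."""
    total = 0
    for chrom, pos in loci:
        xs = insert_pos.get(chrom, [])
        total += bisect_right(xs, pos + half_width) - bisect_left(xs, pos - half_width)
    return total

def window_enrichment(insert_pos, query_loci, background_loci, window,
                      n_perm=1000, seed=0):
    """Fold enrichment of insertions around query loci relative to the mean
    over permutations of equally many loci drawn from background_loci,
    plus a one-sided empirical p-value."""
    rng = random.Random(seed)
    half = window / 2  # assumption: `window` is the full window width
    observed = count_in_windows(insert_pos, query_loci, half)
    null = [count_in_windows(insert_pos,
                             rng.sample(background_loci, len(query_loci)), half)
            for _ in range(n_perm)]
    mean_null = sum(null) / n_perm
    fold = observed / mean_null if mean_null else float("inf")
    p = (sum(x >= observed for x in null) + 1) / (n_perm + 1)
    return fold, p

# Hypothetical data on one chromosome: 100 "gene start" background loci
# 1 Mb apart, one insertion near every tenth of them, and three query loci
# each carrying three nearby insertions.
background = [("1", i * 1_000_000) for i in range(100)]
query = [("1", 50_500_000), ("1", 60_500_000), ("1", 70_500_000)]
positions = [i * 1_000_000 + 1_000 for i in range(0, 100, 10)]
positions += [pos + d for _, pos in query for d in (-10_000, 0, 10_000)]
insert_pos = {"1": sorted(positions)}
fold, p = window_enrichment(insert_pos, query, background, window=70_000)
print(f"{fold:.1f}-fold enrichment, p = {p:.4f}")
```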
We isolated 9117 retroviral insertions from a cohort of 476 MuLV-induced tumors derived from mice deficient for one or a combination of two CDK inhibitors to identify genes contributing to tumorigenesis in these backgrounds. Combining this dataset with insertions from a previous screen, we identified 596 CISs at a kernel width of 30 kb, more than 250 of which had not been found in our previous screen. The 596 CISs are enriched for loci in the vicinity of paralogs of human cancer genes ((22), Mattison et al., submitted). By comparing 13 genotypes, we have shown that, given sufficient statistical power, the majority of oncogenes and tumor suppressors mutated in these screens are strongly influenced by deficiency for alleles of one or more CDK inhibitors. We also observe that whilst some combinations of retroviral mutations are recurrent in many genotypes, others are clearly dependent on a specific genetic background.
MuLVs can give rise to lymphoid, myeloid and erythroid tumors, with the relative frequency of each being influenced by both host genotype and virus substrain. Ecotropic MuLVs infect cells through the mCAT-1/Slc7a1 receptor (46), which is expressed in a variety of hematopoietic lineages, making the cell type of origin of these tumors unclear. Germline mutation of CDK inhibitors may shift the balance between cell populations in the lymphoid compartment and thus affect the cellular composition of the tumors (17). Thus, whilst the prevalence of CISs in each genotype may stem from cell-intrinsic effects of tumor suppressor loss within the tumor-initiating cells, it may also be influenced by the relative size of different hematopoietic populations within each genotype.
Resources for validation of genome-wide association studies are currently limited. Integration of independently derived datasets, such as mouse retroviral insertions and human genome-wide association studies, can help prioritize loci for validation, particularly those with no previously suggested role in cancer. Such comparisons are also useful because in many cases the relationship of SNPs to disease genes is unclear. Whilst the position of our retroviral insertions in mice frequently implicates the gene nearest to the SNPs in humans (as is the case for SNPs and inserts near Ccnd2 and Runx3), this is not always the case (Supplementary Table S10). For example, one SNP on chromosome 21 is nearest to Cryaa; however, six insertions in the orthologous region on chromosome 17 in mice appear most likely to deregulate Snf1LK, an AMP-activated protein kinase-related kinase implicated in cell cycle regulation that is overexpressed in murine lymphoma (47, 48). Snf1LK is a regulator of CREB1, a transcription factor best characterized for its neuronal functions but also implicated in leukemia (49). Cross-species comparisons can also be usefully combined with genetic interactions, as illustrated by the co-mutation of Gata1 and Runx1. Similar inferences could be made for whole-chromosome gains and losses in human tumors where the context of other mutations within the same tumor is known.
We find hundreds of significant associations between CISs and deficiency for different CDK inhibitors. These associations may serve as an additional tool to direct large-scale mutation detection studies of human tumors, particularly those deficient for these CDK inhibitors. Although it is unlikely that we have identified all mutations in our tumors, approximately one third of our insertions occur within CISs, suggesting the presence of at least 6-7 driver mutations per tumor. Stratton et al. (1) estimate that defining a single cancer genome in its entirety will require more than 100,000,000,000 base pairs of DNA sequence to provide adequate coverage of tumor DNA and somatic controls. As such, until every patient's tumor genome can be sequenced in full, clinical approaches to mutation detection may initially be better directed toward screening for frequent events. On the basis of these results, a subset of the genome can then be screened for rare mutations, which are most likely to occur within that context.
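The estimate of 6-7 driver mutations per tumor follows from simple arithmetic. A worked version, assuming that the one-third fraction applies to the average number of insertions recovered per tumor in the current panel (9117 insertions from 476 tumors):

```latex
\frac{9117}{476} \approx 19.2 \ \text{insertions per tumor}, \qquad
\frac{1}{3} \times 19.2 \approx 6.4 \ \text{CIS-associated insertions per tumor}
```

Because not every insertion in each tumor is recovered, this figure is best read as a lower bound, consistent with the "at least" above.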
AB and MVL are funded by the NWO Genomics program and the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research (NWO). JDR is supported by the BioRange program of the Netherlands Bioinformatics Centre (NBIC), which is supported by a Netherlands Genomics Initiative (NGI) BSIK grant. MVU was supported by an NGI Horizon Breakthrough Project. Sequencing was carried out by the Wellcome Trust Sanger Institute sequencing facility. DJA is funded by Cancer Research-UK and the Wellcome Trust.