We have developed a new method, PRIM, to identify CISs and CCIs that play a role in tumorigenesis. PRIM estimates insertion-rate parameters with a Poisson regression model and uses a two-dimensional extension of these rate estimates to detect CCIs. The assumptions of PRIM were verified with simple Monte–Carlo simulations.
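To show how a Poisson regression rate estimate can flag candidate CISs, the minimal sketch below fits the simplest such model. It is an intercept-only Poisson regression with the log number of TA sites per genomic window as an offset, and it computes an upper-tail p-value for each window's insertion count. The window counts, TA-site numbers and function names are hypothetical. PRIM itself may use additional covariates and a different testing procedure.

```python
import math


def poisson_pmf(j, lam):
    """Poisson probability mass P(X == j), computed on the log scale (lam > 0)."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), lam > 0 whenever k > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # The upper tail is large, so subtract the short lower sum from 1.
        return max(0.0, 1.0 - sum(poisson_pmf(j, lam) for j in range(k)))
    # The upper tail is small: sum the decreasing terms directly.
    total, j, term = 0.0, k, poisson_pmf(k, lam)
    while term > 0.0 and term > total * 1e-17:
        total += term
        j += 1
        term *= lam / j
    return min(1.0, total)


def fit_rate_with_ta_offset(counts, ta_sites):
    """MLE of exp(beta0) in log E[y_i] = beta0 + log(ta_i).

    With only an intercept and an offset, the Poisson score equation
    sum(y_i) = exp(beta0) * sum(ta_i) has a closed-form solution.
    """
    if any(t <= 0 for t in ta_sites):
        raise ValueError("every window needs at least one TA site")
    return sum(counts) / sum(ta_sites)


def window_pvalues(counts, ta_sites):
    """Upper-tail p-value of each window's count under the fitted rate."""
    rate = fit_rate_with_ta_offset(counts, ta_sites)
    return [poisson_sf(y, rate * t) for y, t in zip(counts, ta_sites)]


# Hypothetical insertion counts and TA-site numbers for six windows.
counts = [2, 1, 0, 3, 14, 1]
ta_sites = [50, 40, 30, 60, 55, 45]
pvals = window_pvalues(counts, ta_sites)
best = min(range(len(pvals)), key=lambda i: pvals[i])
print(best, pvals[best])
```

The offset makes a window's expected count proportional to its number of TA sites. A window with many insertions is therefore only flagged when the count exceeds what its TA content would predict.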
Several of the genes identified by PRIM, but not by the Monte–Carlo method, are known to be involved in the pathogenesis of human colorectal cancer (CRC). For example, PIK3CA, the catalytic subunit of phosphatidylinositol-3 kinase (PI3K), is mutated in 32% of human colon cancers (23), and up to 40% of CRCs have mutations in PI3K pathway genes (24). Mutant PIK3CA promotes cell growth and tumor invasion and increases the resistance of metastatic CRC to monoclonal antibodies targeting EGFR (25,26). PRIM with the TA site offset, but not the null model, identified the ephrin receptor EPHB2. EPHB2 is upregulated in early colon lesions but downregulated as the tumor progresses, and this silencing correlates with poor patient survival (27). In rare cases, a germline EPHB2 mutation may cause hereditary CRC through loss of its tumor suppressor function (28). Several genes at CISs identified by PRIM, such as MUC5AC, also show altered expression in CRC (29). These findings support the use of PRIM to find candidate cancer genes in transposon insertion data sets, including genes that the previously employed Monte–Carlo method may miss.
We then extended PRIM to detect CCIs. The CCI method was tested on data simulated under the null hypothesis, on a set of gastrointestinal (GI) tumors, and on a set of tumors from the RTCGD. The simulations demonstrated that PRIM properly controls the FDR for CCI detection: the empirical FDR was below 0.05 in all 10 simulation scenarios. The simulated data also showed the number of CCIs expected by chance. PRIM did not select any pair in the GI tumor set with a smaller count than those observed in the simulations, indicating that the model appropriately controls false discoveries. Conversely, PRIM did select CCIs in the GI tumors with larger counts than those observed in the simulations, showing that the model is not overly conservative.
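The logic of checking FDR control by simulation can be sketched in a few lines. This toy version is not the simulation design used for PRIM. It does the following:

- draws window counts from a Poisson distribution with a known rate, so every window is null;
- computes upper-tail p-values;
- applies the Benjamini-Hochberg procedure at q = 0.05;
- records how often anything is rejected.

Under this global null every rejection is a false discovery, so that frequency is the empirical FDR. The Benjamini-Hochberg step, the rate, and the window and replicate numbers are all assumptions chosen for illustration.

```python
import math
import random


def poisson_pmf(j, lam):
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        return max(0.0, 1.0 - sum(poisson_pmf(j, lam) for j in range(k)))
    total, j, term = 0.0, k, poisson_pmf(k, lam)
    while term > 0.0 and term > total * 1e-17:
        total += term
        j += 1
        term *= lam / j
    return min(1.0, total)


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rate used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def benjamini_hochberg(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


def empirical_fdr(n_sims=200, n_windows=200, lam=2.0, q=0.05, seed=1):
    """Fraction of null simulations with at least one (necessarily false) discovery."""
    rng = random.Random(seed)
    runs_with_false_discovery = 0
    for _ in range(n_sims):
        counts = [sample_poisson(rng, lam) for _ in range(n_windows)]
        pvals = [poisson_sf(c, lam) for c in counts]
        if benjamini_hochberg(pvals, q):
            runs_with_false_discovery += 1
    return runs_with_false_discovery / n_sims


fdr = empirical_fdr()
print(f"empirical FDR under the global null: {fdr:.3f}")
```

The estimate should fall near or below q, up to Monte Carlo error. Upper-tail p-values from a discrete distribution are conservative, which pushes the realized FDR down.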
After this verification, we used PRIM to detect CCIs in a set of GI tumors from mice and found three statistically significant CCI regions. The rationale for a CCI analysis is that it can identify cooperating mutations, and our analysis of the functions of the genes in the identified CCI pairs supports this hypothesis.
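The sketch below shows one way to extend per-window rates to pairs of windows, under explicit assumptions. Each tumor's expected insertion rate in windows a and b is converted into the probability of at least one hit, 1 − exp(−λ). The two probabilities are multiplied under an independence null. The observed number of tumors hit in both windows is then compared with the resulting Poisson-binomial distribution. The per-tumor rates are hypothetical, and the test illustrates a two-dimensional extension in general rather than PRIM's exact formulation.

```python
import math


def hit_probability(rate):
    """P(at least one insertion) when a tumor's insertions follow Poisson(rate)."""
    return 1.0 - math.exp(-rate)


def poisson_binomial_sf(k, probs):
    """P(S >= k), where S is a sum of independent Bernoulli(p) variables."""
    dist = [1.0]  # dist[j] = P(S == j) over the tumors processed so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)
            nxt[j + 1] += mass * p
        dist = nxt
    return min(1.0, sum(dist[max(k, 0):]))


def cci_pvalue(rates_a, rates_b, observed_both):
    """Upper-tail p-value for the number of tumors hit in both windows.

    Assumes hits in the two windows are independent within each tumor, so a
    tumor is hit in both with probability P(hit a) * P(hit b).
    """
    probs = [hit_probability(ra) * hit_probability(rb)
             for ra, rb in zip(rates_a, rates_b)]
    return poisson_binomial_sf(observed_both, probs)


# Hypothetical: 135 tumors, each with an expected 0.05 insertions per window.
n_tumors = 135
rates_a = [0.05] * n_tumors
rates_b = [0.05] * n_tumors
expected = sum(hit_probability(a) * hit_probability(b)
               for a, b in zip(rates_a, rates_b))
p4 = cci_pvalue(rates_a, rates_b, observed_both=4)
print(f"expected co-occurrences {expected:.2f}, P(>= 4) = {p4:.2e}")
```

Using per-tumor probabilities, rather than a single pooled rate, lets tumors with heavier insertion loads contribute more to the expected number of co-occurrences.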
The first CCI identified cooperating mutations within the Apc gene. The biology of Apc mutations has been extensively studied in colon cancer. The current hypothesis is that an initial truncating mutation, which yields a hypomorphic protein product, is normally followed by loss of heterozygosity affecting the remaining wild-type allele (30). This hypothesis fits well with the Apc-Apc CCI identified in the intestinal tumors analyzed in this article. The seven tumors that constitute this CCI show multiple mutations in the first intron and the 5′ upstream region of Apc, accompanied by paired mutations in downstream introns (A). The first mutation may create a null product because it occurs in the first intron or promoter region of Apc, while the second mutation creates a truncated product from the second allele.
The second CCI identified mutations in Rspo2 and Ppm1h as interacting. One explanation is that overexpression of Rspo2 and inactivation of Ppm1h cooperate in CRC progression by fulfilling two of the functions associated with mutant Apc, namely uncontrolled proliferation and chromosomal instability. In addition to activating Wnt signaling, mutations in the C-terminus of Apc contribute to chromosomal instability (31). This second function may explain why Apc mutations are found more frequently in CRC than mutations in other genes capable of activating Wnt signaling. The four tumors constituting the Rspo2-Ppm1h CCI have no identified transposon insertions in Apc, so the Rspo2 and Ppm1h mutations could be providing the phenotypes usually caused by mutant Apc. The Rspo2 insertions in these four tumors likely cause overexpression. All of them lie immediately upstream of Rspo2, and the viral promoter within the transposon is oriented to drive Rspo2 expression (B). These insertions probably activate Wnt signaling aberrantly, because Rspo2 normally functions as a secreted activator of the Wnt signaling pathway that is important for limb, lung and craniofacial development (32–35). Furthermore, Rspo1, a close homolog of Rspo2, causes hyperproliferation of intestinal crypt cells and increases β-catenin levels when the human protein is overexpressed in mice (36). In contrast, the Ppm1h insertions likely disrupt the gene, because they are spread throughout it and the viral promoter has no consistent orientation (B). Inactivation of Ppm1h could cooperate with overexpression of Rspo2 by interfering with p53-dependent transcription, leading to chromosomal instability.
Ppm1h is a protein phosphatase that can dephosphorylate and potentially inactivate CSE1L (37). CSE1L was recently shown to associate with chromatin and to regulate transcription of p53 target genes (38). Furthermore, the intracellular localization of CSE1L is controlled by phosphorylation, and CSE1L accumulates in the nucleus when its phosphorylation is blocked (39). Based on these observations, we predict that overexpression of Rspo2 and inactivation of Ppm1h cooperate in the etiology of CRC.
The third CCI, designated Pan3-Cltc, contains three affected genes. All four tumors had transposon insertions in clathrin heavy chain (Cltc). In addition:

- two of these tumors had insertions in FMS-like tyrosine kinase 1 (Flt1, alias Vegfr1);
- one had an insertion in the neighboring gene PAN3 polyA specific ribonuclease subunit homolog (Saccharomyces cerevisiae) (Pan3);
- one had an insertion in the intergenic region between Pan3 and Flt1 (C).

From the insertion pattern, we predict that Cltc is inactivated in all four tumors, whereas the effect of the insertions on Pan3 and Flt1 is difficult to predict. The Flt1 insertions may create a truncated product because they are located toward the 5′ end of the Flt1 gene. Such a product might resemble the shortened, soluble isoform sFlt1. Although there is evidence that delivery of sFlt1 by gene therapy can block tumor development in mouse models (40), increased levels of sFlt1 are found in the sera of colorectal and breast cancer patients (41). Elevated sFlt1 levels in renal cancer are also associated with a poorer outcome (42). These findings suggest that sFlt1 may also have an oncogenic role. Inactivation of Cltc might contribute to tumor development by increasing Egfr signaling. Activated Egfr is normally ubiquitinated and then transported from the plasma membrane to lysosomes for degradation.
Cltc controls Egfr signaling by acting as a chaperone that transports activated Egfr to the lysosome (43). Loss of Cltc may therefore prolong Egfr signaling and lead to uncontrolled proliferation. This could cooperate with dysregulated Vegf signaling caused by the Flt1 mutations.
The three significant CCI regions found by PRIM may explain some steps of tumorigenesis in as many as 11 of the 135 tumors in the data set. The 88 CIS regions found by the Poisson model involve insertions from 117 of the tumors. Thus, tumorigenesis in most, though not all, of the tumors in the data set may be partly explained by one or two disruptions caused by SB insertions. A complete picture of tumorigenesis will likely require a model with more than two hits.
Compared with the existing 2DGKC method, PRIM is far more selective. The 2DGKC method found 1176 CCIs in the GI tumors. In the RTCGD insertion set, fewer than 25% of the CCI regions detected by de Ridder et al. (14) were significant under our model. Modifying the permutation strategy that 2DGKC uses to generate peaks under the null distribution greatly reduces the number of CCIs it detects (see Section 4 of the Supplementary Data for a description of the permutation strategies). This suggests that, with improved permutation strategies, inference under the 2DGKC method could become more similar to inference under PRIM.
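For comparison, the sketch below shows a generic permutation null for a single pair of windows. Shuffling which tumors carry a hit in window b preserves the number of tumors hit in each window but breaks any association between the two windows. The p-value is the fraction of shuffles whose co-occurrence count is at least the observed one, with the usual +1 correction. This is an illustrative stand-in, not the 2DGKC permutation scheme or the alternatives described in the Supplementary Data, and the hit patterns are hypothetical.

```python
import random


def cooccurrence(hits_a, hits_b):
    """Number of tumors with a hit in both windows."""
    return sum(1 for a, b in zip(hits_a, hits_b) if a and b)


def permutation_pvalue(hits_a, hits_b, n_perm=2000, seed=0):
    """Permutation p-value for co-occurrence, shuffling tumor labels of window b.

    Shuffling keeps the number of tumors hit in each window fixed.
    """
    rng = random.Random(seed)
    observed = cooccurrence(hits_a, hits_b)
    shuffled = list(hits_b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cooccurrence(hits_a, shuffled) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical hit indicators for 135 tumors in two windows.
n_tumors = 135
hits_a = [i < 10 for i in range(n_tumors)]                    # tumors 0-9
hits_b = [i < 4 or 100 <= i < 106 for i in range(n_tumors)]   # tumors 0-3, 100-105
p_overlap = permutation_pvalue(hits_a, hits_b)
p_disjoint = permutation_pvalue(hits_a, [100 <= i < 110 for i in range(n_tumors)])
print(p_overlap, p_disjoint)
```

The features a permutation preserves determine the null distribution. Here it preserves per-window hit counts, but not, for example, each tumor's total insertion load. Changing the strategy can therefore change how many pairs are called significant.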
The PRIM framework provides more flexibility in the estimation of transposon insertion rates. As transposon-based screens become better understood, new variables that affect the rate of insertion can easily be added to the model. We are currently extending the methods proposed in this article to include mouse sex and the donor concatemer site in the model. This will allow us to analyze insertions on sex chromosomes and to account for local hopping without bias. The new methods for CCI detection are also far faster to compute than previous methods. This efficiency allows us to verify our approach with simulations, whereas the previously published approaches were not verified in this way. R code (http://www.r-project.org) to calculate insertion and co-occurrence rates and to identify CISs and CCIs is available upon request. The ease of computation also provides future opportunities to extend our approach to higher-order combinations of insertions beyond a two-hit model.
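Adding a variable to a Poisson regression of insertion counts amounts to adding a column to the design matrix. The minimal sketch below fits log E[y_i] = log(TA_i) + β0 + β1·x_i by Newton-Raphson. Here x_i is a hypothetical indicator for windows on the donor chromosome, one way a local-hopping effect might enter the model; a mouse-sex term would be another column. The counts, the covariate and the function names are illustrative assumptions and do not reproduce the authors' R implementation.

```python
import math


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(X, y, offset, iters=50, tol=1e-10):
    """Newton-Raphson fit of log E[y_i] = offset_i + X_i . beta.

    Column 0 of X is assumed to be the intercept; it starts at the
    intercept-only estimate to keep the first steps well behaved.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / sum(math.exp(o) for o in offset))
    for _ in range(iters):
        mu = [math.exp(o + sum(xj * bj for xj, bj in zip(xi, beta)))
              for xi, o in zip(X, offset)]
        # Score vector and Fisher information of the Poisson log-likelihood
        # (equal to the observed information for the canonical log link).
        score = [sum((yi - mi) * xi[j] for xi, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(mi * xi[j] * xi[k] for xi, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical windows: insertion counts, TA sites, and a donor-chromosome flag.
y = [2, 1, 0, 3, 9, 7]
ta_sites = [50, 40, 30, 60, 55, 45]
on_donor_chrom = [0, 0, 0, 0, 1, 1]
X = [[1.0, float(d)] for d in on_donor_chrom]
offset = [math.log(t) for t in ta_sites]
beta = fit_poisson_glm(X, y, offset)
print(f"baseline rate {math.exp(beta[0]):.4f}, "
      f"donor-chromosome rate ratio {math.exp(beta[1]):.2f}")
```

With a single binary covariate the maximum-likelihood rates reduce to per-group ratios of total insertions to total TA sites, which makes the fit easy to check by hand. The same Newton step handles any number of columns.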
In conclusion, we have presented a new method, termed PRIM, for identifying CISs and CCIs in data sets of transposon or proviral insertions from forward genetic screens for cancer genes. PRIM identifies biologically relevant mutations in these screens. It can also be tailored to screen-specific insertion behaviors, such as the requirement of SB transposons for TA dinucleotides or the preference of proviruses for integrating near TSSs.