Conceived and designed the experiments: NS GA. Performed the experiments: NS. Analyzed the data: NS BM AHW. Wrote the paper: NS BM AHW GA.
Insertion of transposed elements within mammalian genes is thought to be an important contributor to mammalian evolution and speciation. Insertion of transposed elements into introns can lead to their activation as alternatively spliced cassette exons, an event called exonization. Elucidation of the evolutionary constraints that have shaped fixation of transposed elements within human and mouse protein coding genes and subsequent exonization is important for understanding of how the exonization process has affected transcriptome and proteome complexities. Here we show that exonization of transposed elements is biased towards the beginning of the coding sequence in both human and mouse genes. Analysis of single nucleotide polymorphisms (SNPs) revealed that exonization of transposed elements can be population-specific, implying that exonizations may enhance divergence and lead to speciation. SNP density analysis revealed differences between Alu and other transposed elements. Finally, we identified cases of primate-specific Alu elements that depend on RNA editing for their exonization. These results shed light on TE fixation and the exonization process within human and mouse genes.
The draft sequences of the human and mouse genomes confirmed that transposed elements (TEs) have played a major role in shaping mammalian genomes , . Sequences of transposed elements comprise at least 45% of the human and 37% of the mouse genomes (Lander et al., 2001; Waterston et al., 2002). A large fraction of the TEs were inserted into transcribed regions, mostly within intronic sequences . These intronic insertions contributed to the enlargement of intron size within mammalian genomes (Lander et al., 2001; Waterston et al., 2002). Sironi et al. identified constraints on insertion of TEs within introns  and showed that gene function and expression influence insertion and fixation of distinct transposon families in mammalian introns .
Exonization is the creation of a new exon as a result of mutations in intronic sequences , whereas intronization is the creation of a new intron. TEs have enriched the human transcriptome by exonizations  and intronizations . In human, most of the exons that originated from TEs are from the primate-specific transposon called Alu. Alu elements are the most abundant repetitive elements in the human genome; there are upwards of 1.1 million copies, accounting for more than 10% of the human genome , . Alu elements are derived from the 7SL RNA . The major burst of Alu retroposition took place 50–60 million years ago and has since dropped to a frequency of one new retroposition for every 20–125 births , . Alu-mediated mutagenesis, mostly through nucleotide insertions, has been estimated to be involved in close to 1% of Mendelian genetic disorders . The occurrence of single nucleotide polymorphisms (SNPs) in and around Alu sequences has been discussed , .
Makalowski and coworkers were the first to describe Alu elements within mature mRNA in human. It is now clear that transposed elements are found within a large number of mature mRNAs. The new exons generated from Alu elements are usually alternatively spliced; these exons comprise ~5% of alternatively spliced exons in the human transcriptome. Exonized TEs that are alternatively spliced are not unique to human, as most of the exonized TEs in the mouse genome are also alternatively spliced. The molecular mechanism leading to Alu exonization has been well characterized. A typical Alu is around 300 nt and contains two similar monomer segments joined by an A-rich linker and a poly(A) tail-like region. Alus insert into introns of primate genes by retrotransposition, usually in the antisense orientation. Eighty-five percent of exonizations have occurred from the right arm in the antisense orientation. The poly(A) tract of this arm in the antisense orientation creates a strong polypyrimidine tract (PPT). Downstream from this PPT a 3′ splice site is selected and further downstream from that site (approximately 120 nt) a 5′ splice site is recognized. Without the left arm, exonization of the right arm shifts from alternative to constitutive splicing. This results in elimination of the evolutionarily conserved isoform and may thus be selected against. Only one or two mutations are required within intronic Alus that reside in antisense orientation relative to the coding sequences to yield a consensus 3′ splice site or 5′ splice site. The role of splicing regulatory sequences in the exonization process has also been studied. The 3′ splice sites of exonized Alus are very similar to those of mammalian interspersed repeat (MIR) exons.
Recent studies indicate that the pattern of splicing of exonized TEs differs among human tissues. Additionally, there are variations in splicing patterns among individuals in the human population. Certain SNPs correlate with heritable changes in alternative splicing but do not cause disease, thus indicating a link between genetic variation and mode of splicing. Another study identified SNPs correlated with obesity that cause variation within alternative splicing patterns.
The exonization process is subject to many evolutionary constraints: New exons are generally alternatively spliced  and the inclusion rate is relatively low , , . This implies that novelties added to established genes (within established coding sequences, CDSs) are under lower purifying selection if they do not interfere with the original coding sequence, compared to those events that change the original CDS. Also, exonization usually occurs in untranslated regions (UTRs)  or within duplicated genes , further supporting the idea that purifying selections are more intense on exonization events that occur within CDSs. Thus, alternative splicing of Alu exons enriches the human transcriptome with new mRNAs without eliminating the original, functionally important transcripts, which are generated via exon skipping .
Here we set out to find additional characteristics of TE exonization events within human and mouse. We looked at the location of the exonizations within genes and the SNP densities, and evaluated SNPs that change canonical splice sites. We found that exonizations occur preferentially in the beginning of protein coding sequences. Moreover, we show that exonizations can be population specific. Our findings reveal a possible contribution of TE exonizations to population divergence within human and mouse.
Non-symmetrical, conserved, alternatively spliced exons are more often located at the beginning of the CDS than elsewhere in transcripts. We analyzed the TranspoGene database of exons that originated from TEs to determine whether there is a bias in their location within mRNA. We normalized the CDS length between 0 and 1 (see Materials and Methods) and compared, in increments of 0.1, the extent of TE exonization at different locations in human and mouse (Figure 1). We found that exonized TE sequences are biased to reside in the first half of the CDS sequence compared to alternatively spliced cassette exons that did not originate from TE exonizations. Most exonizations in both human and mouse are found between position 0.1 and position 0.4 within the CDS, with a median location of 0.336 in human and 0.369 in mouse. No statistically significant differences were observed between the human and mouse populations or among different TE families. Alternatively spliced cassette exons that did not originate from TEs are found at a median location of 0.513 and 0.507 in human and mouse, respectively. Statistically significant differences were observed between alternative cassette exons and TE exons (Wilcoxon Rank Sum test, p=1.2244e–027 and p=1.2322e–006 for human and mouse, respectively). These results imply that most TE exonizations tend to occur within the first introns of genes. In human non-TE alternatively spliced exons, 1,353 out of 17,642 are the second exon, whereas in TE-derived exons 233 out of 927 are found in the first intron and, if spliced, become the second exon; this difference is statistically significant (Fisher's exact test, p<10^−42). The first intron is substantially longer than the other introns in most human and mouse genes and shows a higher rate of TE insertion. The longer introns presumably provide a good environment for exonization.
Effects of TE exonization within the first intron are usually neutral with respect to the protein sequence, but can affect signal sequences.
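The second-exon comparison above can be re-derived from the four reported counts. The following is a minimal sketch, assuming a one-sided Fisher's exact test on the 2×2 table built from those counts; the text does not state whether the reported test was one- or two-sided, and the function names are illustrative only. The tail probability is summed in log space because it is far below ordinary floating-point resolution.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_upper_tail_log10(a, b, c, d):
    """log10 of P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]].

    Assumption: one-sided test with fixed margins.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    log_den = log_comb(total, row1)
    k_high = min(row1, col1)
    # Log-probability of every table at least as extreme as the observed one.
    log_terms = [
        log_comb(col1, k) + log_comb(total - col1, row1 - k) - log_den
        for k in range(a, k_high + 1)
    ]
    # Log-sum-exp keeps the sum stable for very small probabilities.
    peak = max(log_terms)
    log_p = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_p / math.log(10)

# Counts reported in the text for human exons.
te_second, te_total = 233, 927
non_te_second, non_te_total = 1353, 17642

log10_p = fisher_upper_tail_log10(
    te_second, te_total - te_second,
    non_te_second, non_te_total - non_te_second,
)
print(f"TE-derived second exons: {te_second / te_total:.3f}")
print(f"non-TE second exons:     {non_te_second / non_te_total:.3f}")
print(f"log10(one-sided p) = {log10_p:.1f}")
```

About a quarter of TE-derived exons become the second exon, against under 8% of non-TE cassette exons.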
In order to analyze whether the location bias results from potential involvement of purifying selection, we separated our data into three groups: exonizations that contain an in-frame stop codon (599 exons), exonizations that are non-symmetrical and do not contain an in-frame stop codon (216 exons), and symmetrical exons that do not contain stop codons (137 exons). The median locations within the normalized CDS of these three groups are 0.3062, 0.3795, and 0.4199, respectively. The Wilcoxon Rank Sum test showed that there is a statistically significant difference between the first and the third group (p=0.0428) but not between the second group and the third group or the first group and the second (p=0.2555 and p=0.3641, respectively). This observation strengthens the hypothesis that the 5′ position bias of TE exonization has a connection with the nonsense-mediated decay (NMD) machinery. We previously showed that non-symmetrical exons (not related to TEs) that are alternatively spliced in both human and mouse (and thus likely to be functional events) tend to be located near the 5′ end of the CDS, whereas conserved symmetrical alternative exons are located throughout the CDS. The current results show a statistically significant difference in location between symmetrical exons and those with in-frame stop codons. We hypothesize that TE-driven alternative exons are under purifying selection to be located at the beginning of the CDS, presumably to enhance identification of the TE-containing mRNA by the NMD system.
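The three-way grouping used above can be illustrated as follows. This is a sketch under stated assumptions: an exon is called symmetrical when its length is divisible by three, and `phase` (a hypothetical parameter) is the number of leading exon nucleotides that complete a codon begun in the upstream exon. Stop codons that span an exon boundary are ignored here, which is a simplification and not necessarily the authors' procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_in_frame_stop(exon_seq, phase=0):
    """True if a stop codon occurs in the reading frame that starts at offset `phase`."""
    seq = exon_seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(phase, len(seq) - 2, 3))

def classify_exon(exon_seq, phase=0):
    """Assign an exon to one of the three groups described in the text."""
    if has_in_frame_stop(exon_seq, phase):
        return "in_frame_stop"
    if len(exon_seq) % 3 == 0:
        return "symmetrical"      # inclusion keeps the downstream frame
    return "non_symmetrical"      # inclusion shifts the downstream frame

print(classify_exon("ATGTAAGGC"))   # contains TAA in frame
print(classify_exon("ATGGCCGGA"))   # 9 nt, no stop
print(classify_exon("ATGGCCGG"))    # 8 nt, no stop
```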
Identifying features shaping the architecture of sequence variations is important for understanding genome evolution and mapping of disease loci. A positive correlation was shown previously between Alu elements and SNP density. Analysis of the positive association between schizophrenia and a cluster of SNPs and haplotypes in the seventh intron of the β2 subunit of the type A γ-aminobutyric acid receptor revealed that the Alu-Y near the 5′ end of exon 8 contains as many as 11 SNPs.
Here we set out to evaluate and compare SNP densities in all TE families from human and mouse. We obtained the positions of exons and introns of all genes as annotated in the Golden Path database and the positions of intergenic regions, along with the number of SNPs in these regions; the number of SNPs was divided by the total length of the particular region. The dataset contained 39,288 human genes. For the human analysis of the SNPs, we evaluated 382,892 exons with 446,357 SNPs, 347,948 introns with 8,428,718 SNPs, and 8,899 intergenic regions with 10,395,717 SNPs. We also used 31,863 mouse genes. For the mouse analysis we evaluated 301,506 exons with 273,700 SNPs, 270,782 introns with 500,541 SNPs, and 8,602 intergenic regions with 661,474 SNPs.
Multiplying the resulting SNP densities by 100 yielded the SNP frequency per 100 bp. The average SNP density in the human genome is 0.43 in exons, 0.4 in introns, and 0.41 in intergenic regions. The similar densities of SNPs in exons, introns, and intergenic sequences were somewhat unexpected, as one might expect strong evolutionary pressure against substitutions in protein coding regions. This might be caused by a bias of the SNP data from dbSNP itself as EST data is the basis for many SNPs. In the mouse genome, the average frequency of SNPs is 0.31, 0.33, and 0.28 in exons, introns, and intergenic regions, respectively. These SNP densities are consistent with the number of SNPs observed in the baseline windows presented in Figure 2 for human TEs and in Figure 3 for mouse TEs. These results are in agreement with the SNP densities previously obtained from exons, introns, and intergenic regions in human and mouse genomic sequences .
As shown in Figure 2, the SNP density in primate-specific Alu elements is 0.53, which is higher than the baseline level. The density in Alu elements is the highest level observed among the different families of TEs. Alu elements are GC rich with 24 or more CpG dinucleotides per element. These dinucleotides are prone to mutation as a result of deamination of 5-methylcytosine. Only half of the SNPs in young Alu elements were found at CpG dinucleotides, however , , . Also, analysis of the GC-rich Alu body separately from the AT rich Alu tail showed that both parts are enriched in SNPs . Therefore, the GC content cannot be the sole determinant of this enrichment. For the L1 elements, the SNP density is similar to the baseline frequency, whereas the frequency is lower than baseline for the other families of TEs. A correlation of the age of the different Alu families with the SNP density shown by Ng et al.  suggests that the lower SNP density for L1 and the other TE elements might be related to their earlier integration into the human genome. However, we cannot rule out the option that there is not a simple correlation between the age of the TE and the number of SNPs. The primate-specific Alu element and the rodent-specific B1 element originated from the same 7SL RNA gene and share a high level of sequence identity. Nevertheless, the high SNP density detected in Alu elements was not observed in murine B1 elements (Figure 3).
We then examined the SNP density in exonized TEs (Table 1). The SNP density in exonized TEs from all TE families in the human genome is lower than the overall SNP density of all TEs, but the difference is not significant (Mann-Whitney test, p=0.382, two-tailed). An exception was observed in the CR1 (LINE-3) elements; exonized CR1 elements have a higher than average SNP density. However, only four CR1 elements were exonized so the sample size is very small. In mouse, for all transposed element families, the density of SNPs in exonized TEs was significantly higher than the overall density in all TEs (Mann-Whitney test, p=0.004, two-tailed). In mouse, exonization seems to occur preferentially in areas with higher SNP density.
In order to investigate the possibility that exonization of TEs creates transcriptomic diversity within the human population, we searched for SNPs that eliminate or create a canonical splice site in a TE. Specifically, we looked for changes either in the invariant AG dinucleotide at the 3′ splice site or in the canonical GT or GC at the 5′ splice site. Although there are other positions that might alter recognition by the splicing machinery, only these four positions must be fully conserved to ensure selection by the spliceosome. To enhance the fraction of bona fide exonization events we searched for exonized TEs that are supported by at least two ESTs. Our analysis revealed 10 SNPs in canonical splice sites of TE-derived exons in the human genome (Table 2); these SNPs change a canonical splice site into a non-canonical one (the ancestral nucleotides are also shown in Table 2). Of the ten, five are in the acceptor and five in the donor splice sites. Seven of the SNPs occur in splice sites of exonized Alu elements, two in splice sites of exonized L2 elements, and one in the splice site of an exonized LTR element. To ensure that we identified the sequence without the SNP correctly, we examined the sequences of the orthologous TEs in chimp (Table 2). Additional support for the role of SNPs in TE population-specific exonization is given by the ssSNPTarget database (http://sssnptarget.org/): the SNPs rs2377301 and rs5758111 have EST evidence for exon skipping due to the SNP modification. In the mouse genome, three splice sites of exonized TEs contain SNPs (Table 3). SNPs were found in the splice sites of an exonized B1 element, an exonized B2 element, and an exonized LTR element; all are within 5′ splice sites.
We searched the NCBI Database of Single Nucleotide Polymorphisms for population frequency data. Data were only available for two of the 10 SNPs observed in the human genome (Table 4). One of them, SNP rs1721244, is located at chr2 position 73983403 and is the first nucleotide of the 5′ splice site. The allele with G has a canonical splice site (GT) but the other allele has a non-canonical splice site (AT). Both splice sites occur at a frequency of more than 0.3 (Table 4); thus, this SNP, and associated splice variation, is common in the human population. In this analysis, we selected only cases in which SNPs clearly changed the sequence directly at the splice site. We did not take into account SNPs within other splice signals or within exonic or intronic splicing enhancers/silencers that might modulate the selection level of the exon. Thus, the effect of SNPs on splicing might be greater than observed here.
We have also built a dataset of TEs with non-canonical splice sites that appear to be active based on evidence of exonization from ESTs or cDNAs. We searched the SNP database for SNPs that might change the non-canonical splice sites into canonical ones. In the human genome, we found 45 SNPs that changed a non-canonical splice site into a canonical site (a GT/GC dinucleotide in the 5′ splice site and an AG dinucleotide in the 3′ splice site; see supplementary data Table S1). Only three such SNPs were identified in the mouse genome. As a result of these SNPs, these exons are flanked by canonical acceptor and donor splice sites, explaining their identification by the splicing machinery and their presence in the EST database.
Population frequency data were available for 11 of the 45 SNPs (see supplementary data Table S2). One interesting case is SNP rs231518 in an L1 element. There are six ESTs and cDNAs with the 5′ splice site sequence AT, but the SNP rs231518 has a canonical 5′ splice site GT. The two alleles have an intriguing evolutionary history. There is a G at the 5′ splice site in chimp and orangutan and an A in rhesus. The sequences of chimp, orangutan, and rhesus were extracted from published sequences and the multi-species alignment of the SNP location was downloaded from UCSC genome browser . We cannot exclude the possibility that A/G polymorphisms also exist within chimp, orangutan, and rhesus based on available data. The SNP rs231518 with the canonical dinucleotide 5′ splice site GT is the most frequent allele in all human populations (G allele frequency of 0.792 in the CEU population, 1 in the HCB and JPT population and 0.937 in the YRI population, see supplementary data Table S2).
How new exons are created and established is an intriguing issue. Recently, Lev-Maor et al. demonstrated that exonization of an Alu exon in the NARF gene depends on an RNA editing mechanism. In this case, editing from AA to AI activated the 3′ splice site; inosine is recognized as G by the splicing machinery. We searched for additional cases in which the 3′ splice site of the exonized Alus is AA or the 5′ splice site is AT, such that RNA editing to AG or GT, respectively, would produce a canonical splice site. We did not find any evidence for editing in 5′ splice sites of Alu-derived exons. However, we found six cases of Alu exonization in which the 3′ splice site contains an AA at the genomic level and EST sequences support exonization (Table 5). Two of these cases were found in ESTs generated from brain tissues and another two were from immune system tissues, tissues that have high levels of RNA editing. Two other cases were found in cancerous tissues and in kidney. The most convincing evidence of exonization of an Alu element resulting from RNA editing is found within a non-coding brain-specific gene, NR_024561. This exonization is supported by a validated RefSeq sequence and three additional cDNAs and ESTs (all from brain tissues). Moreover, transcripts containing this exon have three additional A-to-I editing sites within the Alu-derived exon. Several potential editing sites are usually observed within a region that contains two Alu elements located in opposite orientation, due to the formation of a long double-stranded RNA structure between the elements. Interestingly, the nearest Alu to that exonized in the NR_024561 gene is in the downstream intron (Figure 4). There is an Alu within the upstream intron but it is more than 2000 nucleotides away and is therefore unlikely to hybridize with the Alu exon. NR_024561 appears to be a non-coding gene and is expressed exclusively in the brain.
A BLAST search against the database of known non-coding RNAs NONCODE revealed 85% identity (E-value=4e−52) of the NR_024562 isoform to the MESTIT1 non-coding RNA. This isoform also had 86% identity (E-value=6e−48) to the brain-specific non-coding KLHL1 antisense RNA; this RNA is involved in the spinocerebellar ataxia type 8 (SCA8) neurodegenerative disorder.
Cassette exons that are non-symmetrical and conserved in both human and mouse are more often located in the 5′ region of the coding sequence than in other regions. Inclusion of non-symmetrical exons is likely to cause a frame shift in the coding sequence, introducing a premature stop codon and activating nonsense-mediated decay or producing an unstable protein. Most TE-derived exons are non-symmetrical and are usually exonized from the first introns of a coding gene. We previously suggested that the majority of the TE-derived exons are non-symmetrical because they are still young in evolutionary terms and thus have not yet undergone purifying selection, which eliminates deleterious exonizations. Given a sufficient period of time, some of the currently non-symmetrical exons that are only mildly deleterious will eventually become symmetrical (through small deletions/insertions) and thus will add coding capacity to already established genes. Examples of functional TE exonizations are exon 8 of the ADAR2 gene and exon 8 of the NARF gene. Nonsense codons in the 3′ halves of genes may activate the RNA degradation machinery less efficiently than those found near the start of a transcript; it may also be that longer peptides are more likely to be deleterious than shorter ones. The first intron is usually longer than the others and thus, following exonization, the two flanking introns are still relatively long. Alternatively spliced exons are generally flanked by longer introns than are constitutively spliced exons. It is also possible that the bias observed may be due to the fact that TEs are more often found near the start of genes than in other regions. These results suggest that the first intron, with its longer size, functions as a “buffer zone” for the emergence of new, potentially deleterious exons.
Alu elements were inserted into the human genome after the insertion of other families, such as MIRs, DNA transposed elements, and LTRs. Alu elements show a higher level of exonization than all other TE families. Here we show that Alu elements tend to accumulate more SNPs than other TE families. The higher mutation rate in Alu elements is not correlated with their CpG enrichment. There appears to be a correlation between the age of TE transposition and the mutation rate. A small fraction of L1 elements are still active in the human genome, and on average L1 elements contain a higher density of mutations than other analyzed families (L2, MIR, DNA, LTR). The average SNP density in TEs in the mouse genome is lower than the SNP density in the surrounding sequences. The SNP density in TEs in the human genome is at least 2-fold higher than that in mouse TEs. Artificial selection and inbreeding accompanying the generation of laboratory mouse strains presumably serve to reduce genomic differences between individual mice. Therefore SNP data from mouse probably do not reflect real population dynamics.
In our analysis, we found evidence for exonization of an Alu element that probably requires RNA editing. The NR_024561 gene is expressed exclusively in the brain. The exonized Alu element is from AluJo subfamily and it was inserted into this gene about 25 million years ago . The 5′ splice site dinucleotide GT is conserved in rhesus and gorilla but not in orangutan. The 3′ splice site dinucleotide AA and the editing sites E1 and E3 are conserved in rhesus, orangutan, and gorilla (Figure 4C). The editing site E2 is not conserved in rhesus but is found in orangutan and gorilla. The conservation of these editing sites implies a possible function for this Alu exonization in this non-coding, brain-specific gene.
In summary, exonization of regions of transposed elements is thought to be an important contributor to mammalian evolution and speciation. We found that exonization of transposed elements is biased towards the beginning of the coding sequence in both human and mouse genes. Analysis of SNPs revealed population-specific exonization events, implying that exonizations may enhance divergence. These results shed light on TE fixation and the exonization process within human and mouse genes.
The dataset of human and mouse transposed element exonization was obtained from the TranspoGene database. Based on UCSC genome browser annotations of the human genome version hg17 and mouse genome version mm6, sequences of TE exonizations within human and mouse protein coding genes were selected.
Exon location was determined by using the knownGene table downloaded from the UCSC genome browser. In this table, all genes are listed along with their CDS start and end coordinates. To normalize the exon location within the CDS, we calculated the location for the start point of the exon in the CDS without exceeding the boundaries of the CDS (N=CDS length−exon length + 1). The normalized location was the quotient of the actual location of the exon start point within the CDS divided by N.
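The normalization just described can be written out as a small sketch. It assumes that the exon start is given as a 0-based offset from the CDS start (the text does not specify the coordinate convention), and the 0.1-wide binning mirrors the increments used for Figure 1. The function names are illustrative.

```python
def normalized_exon_location(exon_start_in_cds, cds_length, exon_length):
    """Exon start divided by N, where N = CDS length - exon length + 1."""
    n = cds_length - exon_length + 1
    if n <= 0:
        raise ValueError("exon is longer than the CDS")
    if not 0 <= exon_start_in_cds <= n:
        raise ValueError("exon start lies outside the CDS")
    return exon_start_in_cds / n

def location_bin(location):
    """Index (0 to 9) of the 0.1-wide bin containing a normalized location."""
    return min(int(location * 10), 9)

# Example: a 101-nt exon starting 300 nt into a 1000-nt CDS.
loc = normalized_exon_location(300, 1000, 101)
print(round(loc, 3), location_bin(loc))
```

Dividing by N instead of by the full CDS length lets an exon that ends at the last CDS position reach a location close to 1.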
In order to create a dataset of cassette exons that had not originated from TE exonization, we downloaded the altSplice table from the UCSC genome browser. We analyzed only the cassette exons dataset. We used GALAXY and RepeatMasker in order to extract the sequences and exclude cassette exons that originated from TEs.
SNP locations (originally from dbSNP, http://www.ncbi.nlm.nih.gov/projects/SNP/) were obtained from the UCSC Genome Browser Database (versions hg17, May 2004 for human and mm6, March 2005 for mouse). For every family of TEs the average SNP density in the TE body was determined. For comparison purposes, the SNP density in sequences surrounding the TEs was extracted in 50-bp non-overlapping windows from either end of the TE up to a distance of 3 kb. This yielded 120 windows, which we call baselines. The positions of all TEs in the genome and locations of SNPs within each TE were determined using the SNP data set from the UCSC Genome Browser Database. The same was done for the surrounding 50-bp non-overlapping windows (up to a distance of 3 kb) for determination of the baseline density of SNPs. The SNP densities were averaged over all TEs and normalized to SNP frequency per 100 bp by dividing the average number of SNPs within the TE by the average length of the TEs divided by 100. Averaging the SNP frequencies in all 50-bp windows flanking the TE yielded the baseline SNP frequency, similar to a previously described calculation. The number of SNPs in each of the 50-bp windows was multiplied by 2 to obtain the frequency per 100 bp. The SNP density in exonized TEs was then determined. Exons originating from exonizations of TEs that were flanked by canonical splice sites and that had at least two ESTs confirming their exonization were used. The average SNP density in the exonized TEs was determined for human and mouse. All SNP densities are given as SNPs per 100 bp.
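The density and baseline calculations described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: SNPs are given as a sorted list of positions on one chromosome, TEs are half-open `(start, end)` intervals, and all function names are hypothetical.

```python
from bisect import bisect_left

WINDOW = 50    # bp per flanking window
FLANK = 3000   # bp examined on each side of the TE

def count_in(sorted_snps, start, end):
    """Number of SNP positions p with start <= p < end."""
    return bisect_left(sorted_snps, end) - bisect_left(sorted_snps, start)

def family_density_per_100bp(sorted_snps, tes):
    """Average SNP count divided by (average TE length / 100) for one TE family."""
    total_snps = sum(count_in(sorted_snps, s, e) for s, e in tes)
    total_len = sum(e - s for s, e in tes)
    return 100.0 * total_snps / total_len

def flank_window_frequencies(sorted_snps, te_start, te_end):
    """SNPs per 100 bp in each 50-bp window on both sides of a TE (120 values)."""
    freqs = []
    for i in range(FLANK // WINDOW):
        up_end = te_start - i * WINDOW
        down_start = te_end + i * WINDOW
        # Counts in 50-bp windows are doubled to express them per 100 bp.
        freqs.append(2 * count_in(sorted_snps, up_end - WINDOW, up_end))
        freqs.append(2 * count_in(sorted_snps, down_start, down_start + WINDOW))
    return freqs

def baseline_per_100bp(sorted_snps, te_start, te_end):
    """Mean per-100-bp SNP frequency over all flanking windows."""
    freqs = flank_window_frequencies(sorted_snps, te_start, te_end)
    return sum(freqs) / len(freqs)

# Toy data: one 300-bp TE with three internal SNPs and two flanking SNPs.
snps = [4990, 5010, 5020, 5100, 6000]
print(family_density_per_100bp(snps, [(5000, 5300)]))
print(len(flank_window_frequencies(snps, 5000, 5300)))
```

Summing counts and lengths over the family before dividing gives the same value as dividing the average count by the average length.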
Annotations of SNPs were obtained from the UCSC Genome Browser Database (versions hg17, May 2004 for human and mm6, March 2005 for mouse). A search for SNPs in splice site dinucleotides of exonized TEs was conducted. Any changes from GT or GC dinucleotides in the first two positions of the intron (5′SS) and AG dinucleotides in the last two positions of the intron (3′SS) by SNPs were considered; these mutations change a canonical splice site into a non-canonical one, thus eliminating the selection of this exon by the splicing machinery. We also considered situations in which SNPs changed a non-canonical splice site into a canonical one if at least one transcript confirmed its existence as an exon.
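The splice-site rule above reduces to a check on one dinucleotide before and after substituting the SNP allele. A minimal sketch follows; the names `is_canonical` and `snp_effect` and the labels "loss", "gain" and "unchanged" are illustrative, and sequences are assumed to be given on the strand of the transcript.

```python
CANONICAL = {
    "5ss": {"GT", "GC"},  # first two intronic positions
    "3ss": {"AG"},        # last two intronic positions
}

def is_canonical(site, dinucleotide):
    """True if the dinucleotide is canonical for the given splice site type."""
    return dinucleotide.upper() in CANONICAL[site]

def snp_effect(site, dinucleotide, offset, alt_allele):
    """Classify a SNP at position `offset` (0 or 1) of a splice-site dinucleotide."""
    ref = dinucleotide.upper()
    alt = ref[:offset] + alt_allele.upper() + ref[offset + 1:]
    before, after = is_canonical(site, ref), is_canonical(site, alt)
    if before and not after:
        return "loss"       # canonical -> non-canonical
    if not before and after:
        return "gain"       # non-canonical -> canonical
    return "unchanged"

# A G/A polymorphism at the first position of a GT donor site.
print(snp_effect("5ss", "GT", 0, "A"))
# An A/G polymorphism that turns an AT donor site into GT.
print(snp_effect("5ss", "AT", 0, "G"))
```

A T/C change at the second donor position is reported as "unchanged" because both GT and GC are treated as canonical.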
Population frequency data was obtained from the NCBI Database of Single Nucleotide Polymorphisms (dbSNP Build ID: 125) . This data was only available for a small number of SNPs in dbSNP. Many researchers do not provide genotype or frequency data in their submissions. dbSNP Build ID 125 had approximately 27 million SNPs and only 3.5 million of these had frequency data associated with them.
The dataset of Alu exonizations was searched for Alu elements with the non-canonical AA 3′ splice site or the non-canonical AT 5′ splice site. These Alus were filtered according to the following criteria: (1) no SNPs were detected within these splice sites, (2) at least one A to G transition was detected between the DNA sequence and the mRNA, and (3) another Alu sequence in reverse orientation is located within a distance of 2000 bp.
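The three filtering criteria can be expressed as a single predicate. The record layout below is hypothetical (the field names are not taken from the original data) and serves only to make the logic explicit.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AluExon:
    acceptor: str                            # last two intronic nt (3' splice site)
    donor: str                               # first two intronic nt (5' splice site)
    snp_in_splice_site: bool                 # any known SNP in either dinucleotide
    a_to_g_mismatches: int                   # A (DNA) vs G (mRNA) differences in the exon
    nearest_inverted_alu_bp: Optional[int]   # distance to nearest reverse-oriented Alu

def is_editing_candidate(exon, max_distance=2000):
    """Apply the three criteria to an Alu exon with an AA acceptor or AT donor."""
    non_canonical = exon.acceptor.upper() == "AA" or exon.donor.upper() == "AT"
    return (
        non_canonical
        and not exon.snp_in_splice_site                       # criterion 1
        and exon.a_to_g_mismatches >= 1                       # criterion 2
        and exon.nearest_inverted_alu_bp is not None
        and exon.nearest_inverted_alu_bp <= max_distance      # criterion 3
    )

candidate = AluExon("AA", "GT", False, 3, 850)
print(is_editing_candidate(candidate))
```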
SNPs in non-canonical splice sites of exonized transposed elements in the human genome as well as in the mouse genome resulting in a canonical splice site. Given are the gene id, the chromosome and strand on which the SNP is located, the start and end of the exon which derived from the transposed element, the transposed element's family, the SNP id and the alleles of the SNP and the position at which the SNP is located (always seen from the exon, that is, 1st position of acceptor indicates the base which is located nearest to the splice site).
(0.05 MB DOC)
Population frequency data for the SNPs which changed a non-canonical splice site into a canonical one while the other splice site was already canonical. Given is the SNP id along with the alleles and the position where this SNP occurred as well as the frequency data. Here, the homozygosity for the first allele, the heterozygosity, the homozygosity for the second allele, the Hardy-Weinberg proportions as well as the frequencies for each of the alleles are given. CEPH-European, HISP-Hispanic, AD-African American, CEU-European, HCB-Asian, JPT-Asian, YRI-Sub-Saharan African, HWP-Hardy-Weinberg proportions.
(0.10 MB DOC)
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was supported by a grant from the Israel Science Foundation (ISF 61/09), Joint Germany-Israeli Research Program (ca-139), Deutsche-Israel Project (DIP MI-1317), and Israel Cancer Research Foundation (ICRF). B.M. and A.H.-W. were supported by the Joint Germany-Israeli Research Program (ca-139) (DKFZ/MOST). N.S. is supported by the LMUexcellence fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.