All annotated genes of the human and mouse genomes were screened for TE-containing exons. The number of times each type of transposed element was exonized (subject to the condition of at least three EST observations of the mRNA containing the exon in tissue T) is shown in Table (for the human genome) and Table (for the mouse genome).
Exonizations in the human genome
Exonizations in the mouse genome
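The support condition used in the screen (at least three EST observations of the exon-containing mRNA in a tissue T) can be illustrated with a minimal sketch. The input layout (one `(exon, tissue)` pair per EST), the function name and the example identifiers are assumptions made for illustration; they are not taken from SERpredict.

```python
from collections import Counter

MIN_EST_SUPPORT = 3  # threshold stated in the text


def supported_exons(est_hits, min_support=MIN_EST_SUPPORT):
    """Return {exon: sorted tissues} for exons observed in at least
    `min_support` ESTs from a single tissue.

    `est_hits` is a hypothetical input: one (exon, tissue) pair per EST.
    """
    counts = Counter(est_hits)
    kept = {}
    for (exon, tissue), n in counts.items():
        if n >= min_support:
            kept.setdefault(exon, []).append(tissue)
    return {exon: sorted(tissues) for exon, tissues in kept.items()}


# Hypothetical observations, not data from the study.
example_hits = [
    ("exonA", "testis"), ("exonA", "testis"), ("exonA", "testis"),
    ("exonA", "liver"),
    ("exonB", "brain"), ("exonB", "brain"),
]
print(supported_exons(example_hits))  # {'exonA': ['testis']}
```

In this toy input only exonA in testis reaches three observations; exonB is dropped because two ESTs fall below the threshold.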
The 859 human and 260 mouse TE-containing exons were then analyzed for tissue or tumor specificity using SERpredict. In the human exon list, we identified 39 tissue-specifically spliced exons (see Table for the exons with tissue specificity (TS) score > 90). In the mouse exon list, 11 exons showed tissue-specific splicing (see Table for the exons with TS score > 90). Of the human exons, 18 were Alu exons, 5 were L1 exons, 2 were L2 exons, 1 was a CR1 exon, 5 were MIR exons, 4 were LTR exons and 4 were derived from DNA transposons. Alu elements thus account for the largest number of tissue-specific exonizations. A likely reason is that Alu is the most abundant transposed element in the human genome and contains potential splice sites, which makes it better suited to exonization than other transposed elements [7]. In mouse, 4 were B1 exons, 2 were B2 exons and 2 were LTR exons; B4, L2 and MIR contributed one exon each. The higher number of tissue-specific B1 exonizations is consistent with the fact that B1 shares its ancestral origin with Alu. Still, B1 does not reach the same number of specific exonizations as Alu, because the majority of Alu exonizations occur in the right arm of the Alu element, which is not present in B1: in contrast to the dimeric Alu element, B1 is a monomer.
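As a quick consistency check, the per-family counts given above can be tallied against the reported totals of 39 human and 11 mouse tissue-specific exons. The counts are copied from the text; the dictionary layout and the helper name `summarize` are illustrative assumptions.

```python
# Tissue-specific exon counts per TE family, as stated in the text.
human = {"Alu": 18, "L1": 5, "L2": 2, "CR1": 1, "MIR": 5, "LTR": 4, "DNA transposon": 4}
mouse = {"B1": 4, "B2": 2, "LTR": 2, "B4": 1, "L2": 1, "MIR": 1}


def summarize(counts):
    """Return (total, most frequent family, share of that family)."""
    total = sum(counts.values())
    top = max(counts, key=counts.get)
    return total, top, counts[top] / total


for species, counts in (("human", human), ("mouse", mouse)):
    total, top, share = summarize(counts)
    print(f"{species}: {total} exons, top family {top} ({share:.0%})")
# human: 39 exons, top family Alu (46%)
# mouse: 11 exons, top family B1 (36%)
```

The totals match the 39 and 11 exons reported above; Alu contributes 18 of 39 human exons and B1 contributes 4 of 11 mouse exons.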
Human tissue-specific TE-exons
Mouse tissue-specific TE-exons
In humans, we did not observe a tendency toward specificity in any particular tissue. In the mouse genome, interestingly, there is a bias toward pancreas-specific exons. This is not due to an overrepresentation of mouse pancreatic ESTs/mRNAs in the database, since there are as many pancreatic sequences as sequences from other tissues such as intestine or blood. We currently have no explanation for this result.
As MIR SINEs were active prior to the mammalian diversification [29], it was unexpected to find 5 tissue-specific MIR exonizations in the human genome and only 1 in the mouse genome. We examined the orthologous loci of the 5 relevant genes (RDH13, Elmo2, MRRF, Tri14 and NP_060401.2) in the human and mouse genomes and found that in 4 of the 5 cases there is no MIR element in the mouse genome. Only for MRRF is there a MIR in the mouse genome, but the exonization in mouse is not tissue-specific. For the specific exon of the gene ST6galrsc4 in the mouse genome, there is a MIR at the same position in the human genome, but the exon boundaries are different. Therefore, the MIR is not exonized in the human genome.
To assess the predictions of SERpredict, we checked some of the genes predicted to have a tissue-specific TE-derived exon against both the literature and the database annotations. Isoform 2 of the T-cell activation NFKB-like protein contains an Alu exon and was predicted as ovary-specific (Table ), which was confirmed by the human SwissProt [30] entry Q9BRG9. A testis-specific isoform of TPK1 (thiamine pyrophosphokinase 1) is described in the OMIM [31] entry 606370 [32]. This isoform is 100 bp longer than the broadly expressed variant, which agrees with our finding of an additional ERV1-derived exon of about 100 bp that makes this isoform testis-specific (Table ). Additionally, the 4F2 cell-surface heavy chain protein seems to be highly expressed in the early stage of new bone formation [34]. Although we found an alternative isoform expressed in bone (Table ), the specificity of the TE-derived exon is not described in the literature and could therefore not be verified.
Our second analysis identified exons that were spliced in a tumor-specific way. We found 21 such exons in human and 2 in mouse genes. In the human genome, 11 were Alu exons, 1 was an L1 exon, 1 was an L2 exon, 4 were MIR exons, 3 were LTR exons and 1 was derived from a DNA transposon (see Table ). In mouse, there was 1 L1 exon and 1 MIR exon (see Table ). The data were filtered for exons that were intronic in normal tissues and recognized as exons only in tumor tissues and that could therefore serve as potential markers for tumor diagnostics. One such exon, which contains an Alu element, was found in the human gene YY1AP1 (YY1-associated protein 1; hepatocellular carcinoma susceptibility protein). All results with TS > 85 and LOD > 2 are given in Additional file 2 for the human and the mouse genome.
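The filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SERpredict implementation: the record layout (`ExonCall`), its field names and the example values are hypothetical, and combining the score cutoffs (TS > 85, LOD > 2) with the tumor-only condition in a single filter is our assumption. How the TS and LOD scores themselves are computed is not covered here.

```python
from dataclasses import dataclass

TS_MIN = 85.0   # TS cutoff stated in the text
LOD_MIN = 2.0   # LOD cutoff stated in the text


@dataclass(frozen=True)
class ExonCall:
    """Hypothetical per-exon record; field names are illustrative."""
    exon_id: str
    te_family: str
    ts_score: float
    lod_score: float
    normal_est_count: int  # ESTs from normal tissue that include the exon
    tumor_est_count: int   # ESTs from tumor tissue that include the exon


def passes_thresholds(call):
    """True if the call exceeds both score cutoffs."""
    return call.ts_score > TS_MIN and call.lod_score > LOD_MIN


def tumor_only(call):
    """True if the exon is seen in tumor ESTs but never in normal ESTs."""
    return call.normal_est_count == 0 and call.tumor_est_count > 0


def candidate_markers(calls):
    """Exons that pass the cutoffs and are exonic only in tumor tissue."""
    return [c for c in calls if passes_thresholds(c) and tumor_only(c)]


# Hypothetical example values, not results from the study.
calls = [
    ExonCall("exon1", "Alu", 92.0, 3.1, 0, 5),
    ExonCall("exon2", "MIR", 88.0, 2.4, 2, 6),   # also seen in normal tissue
    ExonCall("exon3", "L1", 80.0, 2.9, 0, 4),    # TS below cutoff
]
print([c.exon_id for c in candidate_markers(calls)])  # ['exon1']
```

Keeping the two predicates separate makes it easy to report all calls above the cutoffs while flagging the tumor-only subset as potential diagnostic markers.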
Human tumor-specific TE-exons
Mouse tumor-specific TE-exons
We also found support for the accuracy of our predictions of genes with tumor-specific exons. For the ST6GALNAC6 gene, a 2.4 kb transcript has been described in colon carcinoma, whereas transcripts of 2.5 and 7.5 kb are found in normal colon [35]. The colon carcinoma transcript could represent the isoform that omits the first exon and contains the tumor-specific exon.
Taking these results into account, SERpredict is a useful tool for analyzing TE insertions in genes and determining their effects on the human and mouse transcriptomes. On the one hand, the incorporation of TEs into mature mRNAs and the resulting change in the protein can have effects in single tissues or even cause major illnesses such as cancer, as several examples in the literature have shown [10]. On the other hand, these new exons could be raw material for the future evolution of the organisms. The new alternative TE-exons are included in only a fraction of the transcripts of a gene, while the remaining transcripts maintain their original function. The added sequence may therefore be free to evolve without loss of the original function. If the alternative form gains a useful function, its splice sites may be strengthened, or it may become tissue-specific if the new function has only local benefits [14