So far, about 800 different chromosomal translocations have been characterized in hemato-malignant and solid tumors. Chromosomal translocations mostly result in the expression of chimeric fusion proteins associated with enhanced proliferation and/or malignant transformation. Here, we demonstrate that genes frequently involved in such genetic rearrangements exhibit a unique feature: premature transcriptional termination. These early-terminated RNA molecules have an abundance of 10-20% when compared to their cognate full-length transcripts. They exhibit an unsaturated splice donor site that gives rise to trans-splicing events, leading to RNAs displaying exon repetitions or chimeric fusion RNAs. These arbitrary fusion RNAs mimic the presence of a chromosomal translocation in genetically unaffected cells. Based on our and published data, we propose the hypothesis that these artificial “chimeric fusion transcripts” may influence DNA repair processes, resulting in the generation of de novo chromosomal translocations. This idea provides a rational explanation why different individuals suffer from nearly identical genetic rearrangements.
Chromosomal translocations (CT) - associated with different human cancer types - arise due to DNA double-strand breaks and subsequent DNA repair processes, predominantly executed by the non-homologous end joining (NHEJ) pathway [1, 2]. DNA double-strand breaks occur more frequently in specific chromosomal regions that exhibit an intrinsic genetic instability. This property has been investigated in detail for the human MLL gene and could be linked to specific chromatin features, e.g. SAR/MAR structures [3-6], DNase I hypersensitive sites, Topoisomerase II binding sites [8, 9], gene-internal transcription initiation and site-specific DNA cleavage during early apoptosis [11-13]. These features - or any combination thereof - seem to define recombination hot spots that do not appear to be randomly distributed in the human genome. Recombination events at those sites may result in genetic rearrangements with the potential to generate novel oncogenes and to trigger the onset of pre-cancerous cells. Potent oncogenes arise when proto-oncogenes are recombined with cell-type specific enhancer regions (type I CTs) or when chimeric fusion genes are created (type II CTs). In the hematopoietic system, type I rearrangements are associated with a lymphoma disease phenotype, while type II rearrangements are associated with more aggressive leukemias. Nearly 800 different chromosomal rearrangements have been characterized that are associated with the development of solid or hematological malignancies. Notably, 83 different genes have so far been identified in solid tumors, a list that is growing rapidly due to the many efforts to precisely characterize the cancer genomes of different tumor entities.
However, the question remains why different individuals suffer from nearly “identical” genetic rearrangements, a phenomenon described as “recurrence”. Here, we asked whether chromosomal translocations are generated by pure accident combined with subsequent positive selection processes, or whether there is a molecular mechanism that could explain the phenomenon of recurrence as described for different human cancers.
A first hint comes from older experimental observations in which specific chimeric fusion mRNA transcripts - known from chromosomal translocations - were identified at the RNA level in cells derived from healthy individuals. For example, the MLL-AF4 chimeric fusion transcript and MLL partial tandem duplications (PTDs) were detected at low levels in hematopoietic stem or cord blood cells of healthy individuals [15-17]. Similar observations were made for chimeric fusion transcripts of BCR-ABL [18, 19], TEL-AML1 and AML1-ETO [20, 21], PML-RAR, NPM-ALK and ATIC-ALK [23, 24]. Most importantly, the investigated individuals showed no sign of a disease phenotype and displayed no genomic rearrangement of the corresponding gene loci. These findings were controversially discussed in the scientific community; however, different laboratories provided conclusive experimental evidence for their existence. Thus, these experimental observations were classified as a biological curiosity, with no convincing explanation for their biological function or for why they occur at all.
Another important observation was the discovery of chromosome territories in interphase nuclei [25-28]. Based on the 3-dimensional architecture of interphase chromosomes, it was experimentally demonstrated that genes known to participate in illegitimate recombination events are located in close spatial proximity in the nucleus. Examples are BCR and ABL, PML and RAR [29, 30], and MYC, BCL2 and IGH. Some of these genes were even transcribed in the same transcription factory (e.g. MYC and IGH; for review see Gingeras, 2009). Similarly, MLL and at least AF4 and ENL are closely located in the 3-dimensional space of nuclei. All these findings suggest that chromosomal translocations are not created by chance but rather by specific properties of the involved genes, and in particular by their specific localization within the 3-dimensional space of the nucleus.
For the purpose of our studies we chose the well-characterized human MLL gene. This gene is frequently rearranged with a large variety (n > 60) of different translocation partner genes (TPGs). About 40% of the 64 TPGs characterized so far were recurrently diagnosed in MLL-mediated leukemias. However, the most frequently diagnosed fusion partners are AF4 (42%), AF9 (16%), ENL (11%) and ELL (4%), which together account for the large majority of all diagnosed MLL rearrangements in human acute leukemias. Therefore, our studies focused on these 5 genes (MLL and its four most frequent partner genes) to investigate potential mechanisms that could provide a rational explanation for the phenomenon of recurrence that is typically found in all human cancers.
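To make the phrase “large majority” concrete, the short Python sketch below simply adds up the four partner-gene frequencies quoted above. The dictionary name `partner_share` is chosen here for illustration only; the percentages are the ones given in the text.

```python
# Frequencies (in percent) of the four most common MLL fusion partners,
# as quoted in the text. The variable names are illustrative only.
partner_share = {"AF4": 42, "AF9": 16, "ENL": 11, "ELL": 4}

# Combined share of all diagnosed MLL rearrangements covered by these partners.
total = sum(partner_share.values())
remaining = 100 - total

print(f"Four most frequent partners: {total}% of diagnosed MLL rearrangements")
print(f"All other partner genes together: {remaining}%")
```

Under these figures, the four partners cover roughly three quarters of all diagnosed MLL rearrangements, with the remaining partner genes sharing the rest.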
PBMCs were prepared from peripheral blood of the healthy volunteers using Ficoll-Reagent (GE-Healthcare) according to the manufacturer's instructions. About 1 × 10^7 cells were used for RNA preparation using the RNeasy-Kit (Qiagen).
Five μg of total RNA was reverse transcribed (Invitrogen) with either random hexamer primers (for inverse RT-PCR experiments), an oligo-dT anchored primer (obtained from the GeneRacer-Kit from Invitrogen, for 3'-RACE) or gene-specific primers (AF4.Y32 5'-CG-AT-GA-CG-TT-CC-TT-GC-TG-AG-AA-TT-TG-AG-TG-AG-3' and MLL.CR2 5'-GT-CC-CA-GG-CA-CT-CA-GG-GT-GA-TA-GC-TG-TT-TC-GG-3' for MLL-AF4 and AF4-MLL nested PCR; ALK.E20R1 5'-GG-TT-GT-AG-TC-GG-TCA-TG-AT-GG-TC-GA-GGT-3' for NPM-ALK nested PCR).
Using the GeneRacer 3' and the GeneRacer 3' nested primers, 3'-RACE nested PCR was performed for MLL, AF4 (AF4.E3F 5'-CG-GA-GG-AC-TA-TC-GA-CA-GC-AG-AC-CT-TT-GAA-3' and AF4.X3 5'-AC-CA-AA-CT-GA-AG-AT-GC-CT-TC-TC-AG-TC-AG-TT-GAG-3'), AF9 (AF9.E3F 5'-AT-CT-GG-GT-AT-GC-TG-GT-TT-CA-TT-TT-GC-CA-AT-3' and AF9.E4F 5'-AT-CC-AC-CA-GT-GA-AT-CA-CC-TC-CG-CT-GT-GA-AA-3'), ENL (ENL.E1F1 5'-GG-CG-GC-GC-TT-GA-CA-GA-CA-AT-GA-GG-3' and ENL.E1F2 5'-GG-GC-GC-CA-GC-CA-TG-GA-CA-AT-CAG-3') and ELL (ELL.E1F 5'-AA-GG-AG-GA-TA-GG-AG-CT-AC-GG-GC-TG-TC-GT-3' and ELL.E2F 5'-CC-AT-CT-AT-CC-GA-TT-TC-AA-GG-AA-GC-CA-AGG-3'). The products of the second round of PCR were subjected to gel electrophoresis, and the resulting bands were cut out of the gel, extracted (Qiagen gel extraction kit) and cloned into the pGEM-T vector (Promega) for sequence analysis.
Quantification of early-terminated transcripts was performed on a MiniOpticon Realtime PCR Detection System (BioRad) using the following primers. For the MLL gene: MLL.E7F2 5'-AA-GC-CT-AC-CT-GC-AG-AA-GC-AA-3', MLL.I8R1 5'-TC-CT-GC-TT-TC-AA-AT-GC-TG-TTT-3', MLL.E5F 5'-TA-AG-CC-CA-AG-TT-TG-GT-GG-TC-3' and MLL.E8R 5'-CT-TG-GG-CT-CA-CT-AG-GA-GT-GG-3'; for the AF4 gene: AF4.E3F, AF4.I3R 5'-GC-CA-AA-AA-GA-AT-TC-CC-CC-TA-3' and AF4.E5R 5'-AA-GG-AA-AC-TT-GG-AT-GG-CT-CA-3'; for the AF9 gene: AF9.E4F 5'-CA-AC-AA-CC-CC-AC-AG-AG-GA-CT-3', AF9.E5bR 5'-GG-AT-TC-GA-AT-TC-TT-GC-TC-TG-TC-3' and AF9.E6R 5'-GC-AG-GA-CT-GG-GT-TG-TT-CA-GA-3'; for the ELL gene: ELL.E1F 5'-GA-GA-GA-TG-GT-CG-CA-AG-AT-GG-3', ELL.I2R 5'-CT-AT-CC-TG-GG-GG-CC-TA-GA-AC-3' and ELL.E3R 5'-AC-TG-CT-GG-AT-GC-AG-TC-GA-AG-3'; for the ENL gene: ENL.E2F 5'-CG-TC-CA-GG-TG-AG-GT-TA-GA-GC-3', ENL.E3bR 5'-TG-TT-CC-CA-GC-AG-AT-GT-CA-AG-3' and ENL.E4R 5'-GT-GG-GG-TT-GT-TG-AA-GG-TG-AG-3'.
The following primers were used for inverse PCR experiments: MLL.R1 5'-GA-CA-TT-CC-CT-TC-TT-CA-CT-CT-TT-TC-CTC-3' and MLL.F1A 5'-CC-AC-CT-AC-TA-CA-GG-AC-CG-CA-AG-AA-AA-3'; AF4.E3F and AF4.E3R 5'-CA-TG-GA-GA-CT-TG-GC-AT-TG-GT-TC-AG-TT-CT-TG-3'; AF9.E5F 5'-GA-TC-CC-AA-TG-AT-TC-AG-AT-GT-GG-AG-GA-GA-AT-3' and AF9.E5R 5'-TG-TG-AG-GC-TT-TG-AA-AA-AC-TG-GT-AC-TA-CT-GC-TG-3'; ELL.E2F and ELL.E2R 5'-TG-AA-AT-CG-GA-TA-GA-TG-GC-CT-CA-GT-GA-AA-CA-3'; ENL.E2F and ENL.E2R 5'-CA-AA-CA-CC-AT-CC-AG-TC-GT-GA-GT-GA-ACC-3'. The PCR products were subjected to gel electrophoresis, and the resulting bands were cut out of the gel, extracted (Qiagen gel extraction kit) and cloned into the pGEM-T vector (Promega) for sequencing.
MLL-AF4 fusion transcripts were detected with nested PCR using the primers MLL.F1A with AF4.Y30 5'-GG-TT-TT-GG-GT-TA-CA-GA-AC-TG-AC-AT-GC-TG-AG-AG-3’ (first PCR) and MLL.NF1 5'-CC-AA-AA-CC-AC-TC-CT-AG-TG-AG-CC-CA-AGA-3’ with AF4.Y29 5'-GT-AT-TG-CT-GT-CA-AA-GG-AG-GC-GG-CC-AT-GA-AT-GG-3’ (second PCR); MLL.CR2 with AF4.E3F and MLL.R6 5'-CA-AA-AC-TT-GT-GG-AA-GG-GC-TC-AC-AA-CA-GA-CT-TGG-3’ with AF4.X3 for the reciprocal AF4-MLL fusion transcript.
NPM-ALK fusion transcripts were detected with nested PCR using the primers NPM.E4F1 5'-GC-AA-AG-GA-TG-AG-TT-GC-AC-AT-TG-TT-GA-AGC-3' with ALK.E20R1 5'-GG-TT-GT-AG-TC-GG-TC-AT-GA-TG-GT-CG-AG-GT-3' (first PCR) and NPM.E4F2 5'-AA-GC-AG-AG-GC-AA-TG-AA-TT-AC-GA-AG-GC-AG-TC-3' with ALK.E20R2 5'-AG-GT-GC-GG-AG-CT-TG-CT-CA-GC-TT-GT-ACT-3' (second PCR).
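The primer sequences in this section are printed with hyphens between small groups of bases, and the grouping is not always the same for a given primer. The following Python sketch, which is not part of the original methods, shows one way to convert such a printed primer into a plain sequence before reusing it. The function names are illustrative; the two example strings are the two printed forms of ALK.E20R1.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize_primer(text: str) -> str:
    """Keep only the bases A, C, G, T; drop 5'/3' labels, hyphens and spaces."""
    return "".join(ch for ch in text.upper() if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a plain DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a plain DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# ALK.E20R1 as printed for reverse transcription and for the first PCR round.
alk_rt = "5'-GG-TT-GT-AG-TC-GG-TCA-TG-AT-GG-TC-GA-GGT-3'"
alk_pcr = "5'-GG-TT-GT-AG-TC-GG-TC-AT-GA-TG-GT-CG-AG-GT-3'"

seq_rt = normalize_primer(alk_rt)
seq_pcr = normalize_primer(alk_pcr)

print("Same primer:", seq_rt == seq_pcr)
print("Length:", len(seq_rt))
print("GC fraction:", round(gc_fraction(seq_rt), 2))
print("Reverse complement:", reverse_complement(seq_rt))
```

After normalization, both printed forms give the same 28-nucleotide sequence, so the differing hyphenation is purely typographic.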
All primers used for experiments testing several housekeeping genes (GAPDH, ACTB, HSPCB, CCND3 and RPL3) and MLL2 can be made available upon request.
All members of the ALF gene family (AF4, LAF4, FMR2 and the later discovered AF5q31/MCEF gene) produce “early-terminated transcripts” that have been described in the literature [36, 37]. In the case of the AF4 gene, the FelC transcript encodes only the first 3 exons of AF4 and is terminated within AF4 intron 3 at a poly-A site residing about 1,350 nucleotides downstream of the 3'-end of AF4 exon 3. For the other investigated genes (MLL, AF9, ELL and ENL), no such early-terminated RNA transcripts have been described so far. Therefore, we analyzed all these genes by 3'-RACE experiments using oligo-dT primers and an oligonucleotide that specifically binds to the first exon 5' to the breakpoint cluster region.
As shown in Figure 1A, all 5 investigated genes produce transcripts that were terminated within the corresponding breakpoint cluster regions (MLL-I8T, AF4-I3T, AF9-I5T, ELL-I2T and ENL-I3T). Similar experiments performed for several introns of housekeeping genes (GAPDH, ACTB, HSPCB, CCND3 and RPL3) or the MLL2 gene revealed no such early-terminated transcripts. A precise quantification of these early-terminated transcripts is summarized in Figure 1B, indicating that their relative abundance compared to their cognate full-length transcript is in the range of 10-20%. Thus, the presence of early-terminated transcripts seems to be a novel and specific property of genes involved in illegitimate recombination events. Since all 3'-RACE experiments were performed with oligo-dT primers, the cloned RNA species all ended with a poly-A tail. Surprisingly, many transcripts contained previously unknown cryptic exons (e.g. 2 cryptic exons within AF4 intron 3, 1 cryptic exon in AF9 intron 5, 1 cryptic exon in ELL intron 2 and 1 cryptic exon in ENL intron 3). These cryptic exons are normally not used during splicing of regular transcripts derived from these genes. The polyadenylated transcripts may represent only a minor fraction of all early-terminated transcripts, because premature termination also occurs due to other processes, such as transcriptional pausing or stalling of RNA polymerase. Such processes may result in RNA molecules without poly-A tails, which could not be identified by the strategy applied here.
Early-terminated transcripts exhibit a non-saturated splice donor site at their final exon-intron boundary (see Figure 1C). This unsaturated splice donor site is per se able to participate in splicing reactions. This may result in the use of arbitrary, cryptic exons to saturate the splice donor site, as already discussed above (see Figure 1Ca). Alternatively, a trans-splicing reaction may occur (see Figure 1Cb). Trans-splicing events may involve transcripts derived from the same gene (intragenic trans-splicing), or RNA molecules derived from other genes transcribed in the vicinity or in the same transcription factory (intergenic trans-splicing). By using nested PCR experiments in combination with cloning and sequencing, we were able to demonstrate a predominant intergenic trans-splicing product of transcripts derived from the MLL gene in all investigated healthy individuals: an MLL-AF4 chimeric fusion RNA (in-frame fusion of MLL exon 9 with AF4 exon 4 (A); see Figure 2A). A minor chimeric fusion RNA (fusing MLL exon 8 with AF4 exon 4 (B)) was identified in volunteer #8 and - after cloning and sequencing - also in the remaining 9 volunteers. An additional MLL exon 9 / AF4 exon 5 chimeric fusion RNA was infrequently detected. In none of the investigated samples could a corresponding reciprocal AF4-MLL chimeric fusion transcript be detected (see Figure 2B), thereby excluding the presence of a genomic MLL rearrangement. It is worth mentioning that all investigated samples were derived from healthy individuals who never had a leukemic disease. We also excluded that the investigated material carried chromosomal translocations of the MLL gene by using a recently established and highly sensitive method to characterize MLL rearrangements at the genomic DNA level.
Additional 3'-RACE experiments were performed to investigate the spectrum of trans-spliced RNA molecules; the results are summarized in Figure 2C. The identified trans-spliced RNA species were mostly out of frame and thus not able to encode a chimeric fusion protein. However, the ENL-ALKB7 fusion transcript was in frame, indicating that this specific fusion transcript could potentially be translated into a fusion protein. We also validated these findings by conventional RT-PCR experiments using 3 additional healthy volunteers (a-c). As shown in Figure 2C (right panels), 3 out of 4 intergenic trans-splicing events of the ENL gene and 2 out of 3 intergenic trans-splicing events of the ELL gene could be validated in independent PBMC samples, indicating that these arbitrary fusion transcripts can be readily identified in healthy volunteers. The exception was the in-frame ENL-ALKB7 fusion transcript, which was identified in only 1 out of 3 healthy volunteers.
Several chimeric fusion transcripts have been described to be present in hematopoietic cells of healthy individuals (see above). We tested our samples for these fusion transcripts, which potentially derive from trans-splicing events, but only the NPM-ALK fusion RNA could be identified (see Figure 2D). It is interesting to note that 5 out of 10 healthy volunteers displayed a high abundance of this in-frame fusion transcript, while the other 5 volunteers showed only a weak PCR band. This may indicate that there are interindividual differences in the production of these trans-spliced fusion RNAs. The chimeric NPM-ALK fusion transcript was first identified in patients suffering from anaplastic large cell lymphoma (ALCL) and derives from the chromosomal translocation t(2;5)(p23;q35). The reason why we were unable to detect the other chimeric fusion RNAs (e.g. TEL-AML1, AML1-ETO, BCR-ABL and PML-RAR) could be that all former studies used hematopoietic stem or cord blood cells [15-21], while we investigated only peripheral blood mononuclear cells (PBMCs).
Intragenic trans-splicing reactions between early-terminated and regular transcripts of the same gene result in another phenomenon termed “exon-repetition”, also known as “partial tandem duplication” (PTD). By simple standard RT-PCR experiments (not nested) using inverse oligonucleotides that specifically bind to a single exon located 5' to the breakpoint cluster region (see Figure 3A), we were able to identify exon-repetitions for all 5 investigated genes (see Figure 3B). Sequence analyses revealed the repetition of single or multiple exons. This has practical consequences, because MLL partial tandem duplications are used as a diagnostic readout to identify AML patients with genomic tandem duplications of MLL exons 3-9. In order to identify such patients, fusion transcripts between MLL exon 9 and exon 3 are detected by RT-PCR experiments. Exactly this MLL exon-exon fusion was also identified in healthy cells (see Figure 3C, RNA species a), indicating that even routine diagnostic methods may be compromised by these trans-splicing events. Analogous experiments performed on the above-mentioned housekeeping genes and MLL2 again remained negative. The high frequency of intragenic trans-splicing was surprising, because only 263 (corresponding to 178 human genes) out of ~6.25 million human ESTs display exon repetitions, indicating that PTDs are rarely identified in human transcripts. Notably, thirteen out of these 178 genes (~7.3%) have been reported to participate in chromosomal translocations. In conclusion, genes that produce early-terminated transcripts can be easily identified by exon-repetitions, using the RT-PCR method outlined above.
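The text does not state how the real-time PCR data were converted into a relative abundance. As a minimal sketch, assuming the common comparative-Ct approach with equal amplification efficiency for both amplicons (an assumption, not the authors' documented procedure), the ratio of early-terminated to full-length transcript could be estimated as follows. All Ct values and function names are hypothetical.

```python
import math


def relative_abundance(ct_early: float, ct_full: float, efficiency: float = 2.0) -> float:
    """Early-terminated amplicon relative to full-length amplicon.

    Assumes both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling). A higher Ct means less template.
    """
    return efficiency ** (ct_full - ct_early)


def delta_ct_for(fraction: float, efficiency: float = 2.0) -> float:
    """Ct difference (early minus full-length) that corresponds to a given ratio."""
    return math.log(1.0 / fraction, efficiency)


# Hypothetical Ct values, for illustration only.
ct_full_length = 24.0
ct_early_terminated = 27.0

ratio = relative_abundance(ct_early_terminated, ct_full_length)
print(f"Relative abundance: {ratio:.1%}")

# Ct differences that would correspond to the reported 10-20% range.
print(f"20% -> delta Ct of {delta_ct_for(0.20):.2f} cycles")
print(f"10% -> delta Ct of {delta_ct_for(0.10):.2f} cycles")
```

Under these assumptions, the reported 10-20% range corresponds to the early-terminated amplicon appearing roughly 2.3 to 3.3 cycles later than the full-length amplicon.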
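The logic of the inverse RT-PCR assay can be sketched in a few lines of Python: two outward-facing primers placed in the same exon can only yield a product if a second copy of that exon lies downstream in the same RNA molecule. The representation of a transcript as a list of exon numbers, the function name and the choice of exon 5 as primer site are assumptions made for illustration; only the exon 9 to exon 3 junction is taken from the text.

```python
from typing import List, Optional


def inverse_pcr_product(exons: List[int], primer_exon: int) -> Optional[List[int]]:
    """Return the exons spanned by outward-facing primers, or None if no product.

    The forward primer reads out of the first copy of `primer_exon`, and the
    reverse primer binds in the next copy of the same exon further downstream.
    """
    positions = [i for i, exon in enumerate(exons) if exon == primer_exon]
    if len(positions) < 2:
        return None  # exon occurs only once: divergent primers never meet
    first, second = positions[0], positions[1]
    return exons[first:second + 1]


# Regular transcript: every exon occurs once (exon numbers are illustrative).
regular = list(range(1, 13))

# Transcript with an exon repetition: exon 9 is joined to a second exon 3,
# mirroring the MLL exon 9 / exon 3 junction described in the text.
repeated = list(range(1, 10)) + list(range(3, 13))

print(inverse_pcr_product(regular, primer_exon=5))
print(inverse_pcr_product(repeated, primer_exon=5))
```

In this toy model the regular transcript gives no product, whereas the transcript with the repetition gives a product that runs across the exon 9 to exon 3 junction, which is why the assay works without nested amplification on otherwise unrearranged genes.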
Recurrent chromosomal translocations are a genetic hallmark of many types of human cancer. They are mostly generated by NHEJ-mediated repair of DNA double-strand breaks that occur in selected genes of our genome which seem to be prone to DNA damage (genetic hot spots). Here, we demonstrate that genes known to be involved in chromosomal translocations display a genuine and novel feature: early termination of transcription. Early termination was observed in those regions of the investigated genes that have already been defined as breakpoint cluster regions. The abundance of these early-terminated transcripts was in the range of 10-20% when compared to their corresponding full-length transcripts. These shorter transcripts seem to participate in intragenic and intergenic trans-splicing events, resulting in the creation of RNA species that exhibit exon-repetitions or resemble chimeric fusion RNAs. All our data were obtained using human peripheral blood mononuclear cells (PBMCs) derived from healthy individuals. These data confirm earlier studies by others [15-24]; however, we present for the first time a rational explanation for this unusual observation.
At first glance, the production of early-terminated transcripts seems to have negative effects on cells, because it predominantly uses transcripts of the same gene to saturate the free splice donor site and thus lowers the transcript abundance of such genes. Moreover, it also disrupts transcripts of other genes through trans-splicing events. On the other hand, trans-splicing of RNA molecules is reminiscent of exon-shuffling, which turned out to be a useful mechanism for creating novel genes from existing ones during evolution.
However, the question remains what causes early termination of transcription. Cryptic poly-A sites, transcriptional pausing/stalling due to specific chromatin features or the presence of repetitive DNA elements may provide possible explanations. Another explanation is provided by microRNAs. A recently published study identified microRNAs encoded at a genomically unstable region. Such microRNAs can specifically bind to their own gene transcripts and probably cause cleavage. The outlined mechanisms, or any combination thereof, may explain the creation of early-terminated transcripts that exhibit a non-saturated splice donor site and are therefore able to initiate cis-splicing to cryptic exons (if available) or to participate in trans-splicing reactions.
This study demonstrated that early-terminated transcripts seem to be the molecular source for the creation of trans-spliced RNA species. Most trans-spliced RNA species result in exon-repetitions (intragenic trans-splicing; identified in RT-PCR experiments), while only a few lead to the formation of chimeric fusion transcripts (intergenic trans-splicing; identified in nested RT-PCR experiments). In addition, most identified trans-spliced fusion RNAs displayed no continuous open reading frame (ENL-RPL18A, ENL-CCDC127, ENL-SFPQ, ELL-MKI67, ELL-SFRS7, and ELL-RPS6). By contrast, we identified ENL-ALKB7, MLL-AF4 and NPM-ALK with fused open reading frames in PBMCs of healthy individuals. Notably, the latter two are identical to chimeric fusion transcripts that are regularly identified in tumor cells carrying the specific chromosomal translocations t(4;11)(q21;q23) and t(2;5)(p23;q35), which are associated with acute lymphoblastic leukemia and anaplastic large cell lymphoma (ALCL), respectively. Therefore, we suggest defining these chimeric fusion RNAs produced in non-rearranged cells as “pro-neoplastic RNA molecules”.
The existence of pro-neoplastic RNA molecules in healthy individuals is not restricted to hematopoietic cells. Recent findings suggest that the same chimeric fusion RNAs can be produced in healthy cells (by trans-splicing) and in cancer cells that actually carry the corresponding genetic rearrangement. Examples are the JAZF1-JJAZ1 chimeric fusion RNA in normal endometrial cells, or the androgen-induced SLC45A3-ELK4 chimeric fusion RNA in normal prostate cells. The JAZF1-JJAZ1 fusion protein provides anti-apoptotic activity, indicating that such a trans-spliced fusion RNA may even provide a benefit to those cells. Moreover, the chimeric fusion RNAs seem to be produced in a cell-type specific manner [43, 44]. This could be explained by a cell-type specific localization of chromosome territories in the nucleus, because chromatin loops derived from different chromosomes must be attracted to a common transcription factory [33, 45-48] in order to create such trans-spliced fusion RNAs.
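Whether a fusion RNA keeps a continuous open reading frame depends only on the codon phase at the junction. The following Python sketch illustrates the rule under simplifying assumptions: it ignores stop codons created at the junction itself, and the nucleotide counts and names are hypothetical and do not come from the transcripts listed above.

```python
def is_in_frame(coding_nt_before_junction_5p: int, coding_nt_before_exon_3p: int) -> bool:
    """Check whether a fusion preserves the reading frame of the 3' partner.

    coding_nt_before_junction_5p: coding nucleotides the 5' partner
        contributes upstream of the fusion junction.
    coding_nt_before_exon_3p: coding nucleotides that precede the joined
        exon in the 3' partner's own, unrearranged transcript.
    The frame is preserved when both counts leave the same remainder mod 3.
    """
    return coding_nt_before_junction_5p % 3 == coding_nt_before_exon_3p % 3


# Hypothetical lengths, for illustration only.
examples = {
    "fusion_a": (1200, 303),  # remainders 0 and 0
    "fusion_b": (1201, 303),  # remainders 1 and 0
    "fusion_c": (1252, 154),  # remainders 1 and 1
}

for name, (n5, n3) in examples.items():
    status = "in frame" if is_in_frame(n5, n3) else "out of frame"
    print(f"{name}: {status}")
```

Because only one of the three possible phase combinations preserves the frame, most arbitrary junctions are expected to be out of frame, which fits the observation that most of the identified fusion RNAs lacked a continuous open reading frame.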
These findings are leading back to our initial question whether there exists a molecular mechanism that could explain how illegitimate recombination events are created and why the phenomenon of recurrence exists in human tumor samples. Are pro-neoplastic RNA molecules somehow involved in the creation of illegitimate recombination? If so, the trans-spliced fusion RNA molecules must be able to influence DNA repair processes. This idea is supported by three recent findings:
Based on these data and our findings, we propose a hypothesis that might explain the onset of chromosomal translocations: (a) transcription of genes in local proximity, e.g. in a common transcription factory, (b) the presence of early-terminated transcripts in certain genes of our genome, (c) the creation of trans-spliced fusion transcripts, (d) the occurrence of DNA double-strand breaks in both genes involved in genetic recombination events, and finally (e) the formation of RNA/DNA hybrid structures that guide the subsequent DNA repair process. The latter process is depicted in Figure 4. It is important to note that the trans-spliced chimeric RNA molecule, in this case the MLL-AF4 fusion RNA, does not serve as a substrate for DNA repair polymerases but rather enforces an “RNA/DNA repair structure”. Due to the enzymatic activity of DNA ligase IV, the genomic fusion may occur at any sequence downstream of MLL exon 9 (e.g. MLL intron 9) and upstream of AF4 exon 4 (e.g. AF4 intron 3). This is exactly where most t(4;11) leukemia patients display their chromosomal fusion site. Thus, the most abundant trans-spliced MLL(exon 9)-AF4(exon 4) fusion RNA also explains the breakpoint distribution that has been described for acute leukemia patients bearing t(4;11) translocations.
The existence of an “RNA-mediated proof-editing process” would not only be interesting for the onset of specific genetic lesions, but could be a novel and fundamental mechanism for maintaining genome integrity by using the existent hnRNA molecules. We are currently conducting experiments that aim to validate this novel hypothesis and to investigate these emerging data in more detail.
This study was supported by the grant III L 4-518/55.004 from LOEWE priority program (Oncogenic Signaling Frankfurt, OSF), funded by the Hessisches Ministerium für Wissenschaft und Kunst (HMWK). RM is supported by the Center of Excellence Frankfurt on Macromolecular Complexes (CEF-MC).