High-throughput sequencing techniques have allowed characterization of genome and transcriptome catalogs in unprecedented detail, revealing complex structures of genome rearrangements [1] and transcript networks of chimeric RNAs in human cells [2]. Because genome instability is a hallmark of cancer [3], most studies of genome rearrangements and RNA fusions focus on cancer cells, and some chimeric RNAs appear to result from DNA rearrangements [4]. In fact, some studies use RNA-seq data as a guide to annotate genome rearrangements [9]. Several findings, however, suggest that normal human cells also produce chimeric RNA through trans-splicing. Notably, Li et al. demonstrated that in normal endometrial cells, trans-splicing produces a chimeric RNA that is identical to a fusion transcript present in endometrial stromal tumor cells [13]. The corresponding chromosomal translocation, which may permit production of this chimeric RNA by cis-splicing, is present in endometrial tumor cells but is not detectable in normal endometrial cells. This hints at the possibility that RNA fusion may even predispose relevant genomic loci to rearrangement [13], via RNA-guided DNA recombination, which our lab previously discovered in the ciliate Oxytricha.
We therefore asked whether we could identify more occurrences of chimeric transcripts, especially those involving genes on separate chromosomes, in normal human cells. We mined high-throughput RNA-seq data from human mammary epithelial cells (HMEC) available from the ENCODE project [16]. The deFuse program [17] predicted one interchromosomal RNA fusion between the genes encoding ZC3HAV1L (zinc finger CCCH-type, antiviral 1-like) and CHMP1A (charged multivesicular body protein 1A), located on chromosomes 7 and 16, respectively (Figure ). ZC3HAV1L contains 5 exons and encodes a 300 residue protein. CHMP1A has two protein-coding transcript isoforms, according to NCBI annotation; transcript variant 1 contains 6 exons encoding a 240 residue protein, and transcript variant 2 contains 7 exons encoding a 196 residue protein, which functions as a tumor suppressor in human kidney and pancreas [18]. CHMP1A isoform 1 skips exon 2, but has a larger exon 7. Here we used CHMP1A isoform 1 as the reference for annotation purposes, because the fusion product we detect contains the larger exon 7 (see below).
Figure 1 Detection of ZC3HAV1L-CHMP1A chimeric RNA in human cells. (A) Schematic representation of the ZC3HAV1L-CHMP1A chimeric RNA. Blue and yellow boxes indicate exons from ZC3HAV1L and CHMP1A, respectively. Above the predicted fusion, colored bars indicate ...
Because we initially predicted the presence of the ZC3HAV1L-CHMP1A fusion in a breast cell line, we first verified the presence of this chimeric transcript in human mammary cells. A primer pair (Table ) that amplifies across the predicted ZC3HAV1L-CHMP1A fusion junction (from ZC3HAV1L exon 2 to CHMP1A exon 6) confirmed the presence of this fusion at the RNA level in HMLE cells, which are human mammary cells derived from HMEC, but not at the DNA level in matching genomic DNA (Figure ). Sequencing of the PCR product verified the fusion junction (Figure ). The same PCR analysis suggests that the ZC3HAV1L-CHMP1A fusion is present in MCF10A, an immortalized but otherwise normal human mammary epithelial cell line, as well as in two human breast cancer cell lines, MDA-MB-231 and MCF7. We also detected the fusion in CAPAN1, a pancreatic cancer cell line; in human embryonic kidney (HEK) 293T cells; and in CEMT, a human T cell line. In addition, we were able to amplify the fusion RNA from commercially available human universal reference RNA and from a panel of human tissue RNAs (Figure ). These results suggest that the ZC3HAV1L-CHMP1A chimeric RNA is common across multiple human tissue types, both healthy and diseased.
Oligonucleotide sequences (5′-3′)
Curiously, we also detected some minor PCR products of different sizes. Sequencing revealed that some of them reflect alternative splicing of the ZC3HAV1L-CHMP1A chimeric RNA, adding complexity to this fusion transcript. A common splicing variant present in multiple tissue types is the full-length chimeric RNA skipping ZC3HAV1L exon 3 (Figure ). Two other minor RT-PCR products suggest splicing between partial intron 2 and exon 3, and partial exon 3 and intron 3 of ZC3HAV1L, respectively (Figure ). Sequence alignments (see Additional file 1) indicate that these alternative splicing events all occur at canonical splice sites, suggesting that they likely derive from authentic alternative RNA splicing rather than an in vitro RT-PCR artifact. A fusion product detected by RT-PCR analysis of human prostate RNA fuses ZC3HAV1L exon 4 to part of CHMP1A exon 6, not at a canonical splice site but between a pair of 5 bp direct repeats at the boundary. Therefore, this product may represent either an endogenous activity or simply template switching during the reverse transcription step in our experimental procedure.
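The two sequence-level checks behind this reasoning, canonical splice-site dinucleotides and short direct repeats at a junction, can be sketched as follows. This is a minimal illustration rather than the analysis actually performed; the function names, coordinate conventions, and toy sequences are all hypothetical.

```python
def is_canonical_intron(intron_seq: str) -> bool:
    """True if the excised sequence begins with GT and ends with AG (GT-AG rule)."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def junction_repeat(donor: str, donor_break: int,
                    acceptor: str, acceptor_break: int) -> str:
    """Return the stretch of identical sequence shared by both partners at a breakpoint.

    donor_break is the index of the first donor base NOT kept in the fusion;
    acceptor_break is the index of the first acceptor base kept in the fusion.
    A repeat of a few bases at this position is compatible with template
    switching by reverse transcriptase (an assumption-level heuristic).
    """
    donor, acceptor = donor.upper(), acceptor.upper()
    left = 0   # identical bases immediately upstream of both breakpoints
    while (donor_break - left - 1 >= 0 and acceptor_break - left - 1 >= 0
           and donor[donor_break - left - 1] == acceptor[acceptor_break - left - 1]):
        left += 1
    right = 0  # identical bases immediately downstream of both breakpoints
    while (donor_break + right < len(donor) and acceptor_break + right < len(acceptor)
           and donor[donor_break + right] == acceptor[acceptor_break + right]):
        right += 1
    return donor[donor_break - left:donor_break + right]


# Toy sequences (hypothetical, not taken from ZC3HAV1L or CHMP1A)
toy_donor = "TTTTGACCTAGGG"
toy_acceptor = "CCCAGACCTCAAA"
repeat = junction_repeat(toy_donor, 9, toy_acceptor, 9)
```

In this toy case both partners carry the same 5 bp sequence immediately upstream of the breakpoint, which is the pattern that leaves the origin of such a junction ambiguous.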
The major chimeric RNA that joins exon 4 of ZC3HAV1L to exon 5 of CHMP1A preserves the open reading frame. We therefore tested whether the entire open reading frame could be detected from mRNA by nested PCR. Using primers located upstream and downstream of the predicted start and stop codons in the first round of PCR, followed by a nested primer pair between the start and stop codons, we amplified a product containing the predicted open reading frame, as well as the version of the transcript lacking ZC3HAV1L exon 3, though at much lower levels, consistent with our previous PCR result (Figure ). We infer that the major chimeric RNA that joins ZC3HAV1L and CHMP1A may encode a novel fusion protein. Predicted domains are not available for either fusion partner, however, precluding further structural and functional predictions of this putative fusion protein.
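Whether a fusion junction preserves the reading frame reduces to a modulo-3 comparison, sketched below under stated assumptions. The helper names and the short toy coding sequences are hypothetical and are not the ZC3HAV1L or CHMP1A sequences; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def junction_in_frame(upstream_cds_len: int, downstream_cds_offset: int) -> bool:
    """True if the 3' partner is read in its native codon phase after the junction.

    upstream_cds_len: coding bases contributed by the 5' partner (from its ATG).
    downstream_cds_offset: coding bases of the 3' partner that lie upstream of
    the junction in its own native transcript (and are therefore lost).
    """
    return upstream_cds_len % 3 == downstream_cds_offset % 3


def translate(cds: str) -> str:
    """Translate complete codons from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy example: 10 coding bases from the 5' partner joined to a 3' partner
# whose first 4 coding bases are lost (10 % 3 == 4 % 3, so the frame is kept).
five_prime_part = "ATGGCTGAAC"
three_prime_native = "ATGAAAGTTCCTGGATAA"
fused = five_prime_part + three_prime_native[4:]
fused_protein = translate(fused)
```

Here the fused toy product ends with the same residues as the native 3' partner, which is what frame preservation means in practice; an offset that differs modulo 3 would instead scramble the downstream residues or introduce a premature stop.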
From qPCR analysis, we estimated that the ZC3HAV1L-CHMP1A chimeric RNA is present at 0.1 copies per HMLE cell, suggesting that its expression is limited to either a small population of cells, or a transient time window. We also assayed the relative abundance of this chimeric RNA compared to beta-actin, a constitutively expressed gene, and found that the relative levels of the ZC3HAV1L-CHMP1A chimeric RNA differ in different samples (Figure ). This suggests that the production of ZC3HAV1L-CHMP1A RNA might be regulated, or it might be a stochastic event.
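A copies-per-cell estimate of this kind rests on a standard curve relating threshold cycle (Ct) to the logarithm of template copy number, as outlined in the Methods. The sketch below illustrates that calculation with plain least squares. All numbers are hypothetical placeholders, not measurements from this study, and the function names are illustrative.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, ct_values))
    ss_tot = sum((yi - mean_y) ** 2 for yi in ct_values)
    return slope, intercept, 1.0 - ss_res / ss_tot


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain an absolute copy number."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_cell(copies_in_reaction, cell_equivalents):
    """Absolute copies divided by the number of cells the template represents."""
    return copies_in_reaction / cell_equivalents


# Hypothetical five-point, ten-fold dilution series of a control plasmid
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [33.1, 29.8, 26.4, 23.1, 19.8]
slope, intercept, r_squared = fit_standard_curve(standards, standard_ct)

# Amplification efficiency implied by the slope (1.0 means perfect doubling)
efficiency = 10 ** (-1.0 / slope) - 1.0
```

Relative abundance against a reference transcript such as beta-actin would then be the ratio of two absolute copy numbers, each read from the standard curve of its own primer pair.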
In summary, we report the discovery of a chimeric RNA between ZC3HAV1L and CHMP1A, genes located on human chromosomes 7 and 16, respectively. The fusion occurs at an exon-exon boundary, and was detected both computationally and experimentally in different cell and tissue types. This suggests that it is not an artifact of reverse transcription, and is likely an authentic trans-splicing product. We also detected three minor variants that likely result from trans-splicing as well, because the fusion occurs at canonical “GT-AG” splice sites. The fusion products are present at very low levels, and thus may reflect promiscuous splicing involving ZC3HAV1L and CHMP1A.
Could such low-abundance chimeric RNAs have any function? While the major chimeric RNA that we detected preserves the open reading frame and could potentially produce a novel fusion protein or proteins, we propose that such examples of chimeric RNA may also occasionally impact somatic genome rearrangements, facilitating rogue recombination events between the two respective chromosomes [14]. The ability of RNA to influence genome remodeling has gained considerable support and interest over the past few years [20], but RNA-guided DNA rearrangement in humans still needs further investigation, emphasizing the importance of detecting more chimeric RNAs and their possible DNA rearrangements in normal or diseased tissue.
Computational detection of fusion RNAs
HMEC polyA RNA-seq data from the ENCODE project (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/) were downloaded from the UCSC Genome Browser website. deFuse 0.4.3 [17] was used to detect fusion transcripts, requiring the presence of two pairs of spanning reads and one split read at the junction. We divided the RNA-seq data into subsets of 10 and 20 million paired-end reads to accommodate memory limits, and the ZC3HAV1L-CHMP1A fusion was predicted in two independent subsets of data.
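The support thresholds and the cross-subset comparison described above can be illustrated with a short filter over fusion calls. This is a sketch under stated assumptions, not the commands that were run: the dictionary keys are modeled on column names in deFuse result tables but should be treated as assumptions to be checked against the actual output, and the read counts and the decoy call are invented.

```python
from collections import Counter


def passes_support(call, min_span=2, min_split=1):
    """Require spanning read pairs and split reads at the junction."""
    return (int(call["span_count"]) >= min_span
            and int(call["splitr_count"]) >= min_split)


def is_interchromosomal(call):
    """True if the two partner genes lie on different chromosomes."""
    return call["gene_chromosome1"] != call["gene_chromosome2"]


def recurrent_fusions(subsets, min_subsets=2):
    """Gene pairs that pass the filters in at least min_subsets read subsets."""
    seen = Counter()
    for calls in subsets:
        pairs = {
            tuple(sorted((c["gene_name1"], c["gene_name2"])))
            for c in calls
            if passes_support(c) and is_interchromosomal(c)
        }
        seen.update(pairs)  # each subset counts a pair at most once
    return sorted(pair for pair, n in seen.items() if n >= min_subsets)


# Hypothetical calls from two read subsets (counts are invented)
subset_a = [
    {"gene_name1": "ZC3HAV1L", "gene_name2": "CHMP1A",
     "gene_chromosome1": "7", "gene_chromosome2": "16",
     "span_count": "3", "splitr_count": "2"},
    {"gene_name1": "GENE_X", "gene_name2": "GENE_Y",
     "gene_chromosome1": "1", "gene_chromosome2": "2",
     "span_count": "1", "splitr_count": "1"},
]
subset_b = [
    {"gene_name1": "ZC3HAV1L", "gene_name2": "CHMP1A",
     "gene_chromosome1": "7", "gene_chromosome2": "16",
     "span_count": "2", "splitr_count": "1"},
]
shared = recurrent_fusions([subset_a, subset_b])
```

Requiring a call to recur in independent subsets is a simple guard against a prediction driven by a handful of reads in one partition of the data.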
RNA and genomic DNA extraction
Total RNA from HMLE, MDA-MB-231, MCF7, CAPAN1, MCF10A, HEK293T, and CEMT cells was extracted using RNeasy (Qiagen) and DNase-treated with the TURBO DNA-free kit (Ambion) following the manufacturer’s instructions. Human reference RNA was purchased from Stratagene, and the human tissue RNA panel was purchased from Clontech. Genomic DNA was extracted using NucleoSpin Tissue (Macherey-Nagel).
3.5 μg of RNA was reverse transcribed with SuperScript III reverse transcriptase (Invitrogen) in a 20 μl reaction following the oligo(dT) priming protocol.
The FastStart High Fidelity PCR system (Roche) was used to amplify the fusion product from 1 μl of cDNA (equivalent to 175 ng RNA) or from 200 ng of genomic DNA. To detect the predicted fusion from ZC3HAV1L exon 2 to CHMP1A exon 6, the following program was used: 95°C 2 min initial denaturing; 95°C 30 s, 58°C 30 s, 72°C 45 s for 36 cycles; 72°C 7 min. To recover the entire coding region by nested PCR, the following program was used. First round: 95°C 2 min initial denaturing; 95°C 30 s, 60°C 30 s, 72°C 90 s for 20 cycles; 72°C 7 min. The PCR reaction was diluted 100-fold and used in the second round of PCR: 95°C 2 min initial denaturing; 64°C 30 s, 72°C 80 s for 30 cycles; 72°C 7 min.
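The RNA equivalent quoted for the cDNA template follows from the reverse transcription input and reaction volume, and the nominal run time of a program follows from its hold times. The short sketch below reproduces both calculations; the function names are illustrative, and the time estimate assumes zero ramp time between steps, which real thermocyclers do not achieve.

```python
def rna_equivalent_ng(rna_input_ug, rt_volume_ul, cdna_volume_ul):
    """RNA mass (ng) represented by a given volume of undiluted cDNA."""
    return rna_input_ug * 1000.0 * cdna_volume_ul / rt_volume_ul


def total_hold_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum of programmed hold times, ignoring ramping between temperatures."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


# 3.5 ug RNA in a 20 ul reaction, 1 ul of cDNA used per PCR
template_ng = rna_equivalent_ng(3.5, 20.0, 1.0)

# Junction PCR: 2 min; 36 x (30 s, 30 s, 45 s); 7 min
junction_pcr_s = total_hold_seconds(120, (30, 30, 45), 36, 420)
```

With the values stated in the text, 1 μl of cDNA corresponds to 175 ng of input RNA, and the junction PCR has 4320 s (72 min) of programmed hold time.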
Sanger sequencing of PCR products
PCR products were either sent for direct Sanger sequencing (Genewiz) following Genewiz DNA sequencing instructions, or TOPO-cloned (Invitrogen) for colony PCR and sequencing (Genewiz).
The 7900HT Fast Real-Time PCR System and SYBR green master mix (Applied Biosystems) were used for qPCR analysis with the default cycling program. Standard curves for each primer pair were generated with five ten-fold serial dilutions of appropriate control plasmids in yeast RNA, allowing absolute quantification of DNA levels. Primer specificity was confirmed by a melt-curve analysis. Each primer pair detected the full range of standards with a correlation of R2