The approach presented here has provided data on quantitative and architectural aspects of two mammary cell-line poly(A)+ transcriptomes, and has shown the potential to fully represent transcripts. Restriction enzyme digestion showed advantages over physical methods of cDNA fragmentation, such as the prompt identification of spurious gene fusion reads produced during cDNA library construction, which are revealed by the presence of restriction enzyme sites at the fusion junction of the reads.
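The junction check described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name, the `flank` tolerance, and the example recognition site `GATC` are all assumptions introduced here, since this section does not name the enzyme or the software used.

```python
def site_spans_junction(read: str, junction: int, site: str, flank: int = 2) -> bool:
    """Return True if `site` occurs in `read` at or next to the fusion junction.

    `junction` is the 0-based index of the first base that aligns to the
    downstream partner gene. `flank` (hypothetical tolerance) allows for a
    small uncertainty in the reported junction position.
    """
    read = read.upper()
    site = site.upper()
    start = read.find(site)
    while start != -1:
        end = start + len(site)  # exclusive end of this occurrence
        # The occurrence counts if it overlaps the window around the junction.
        if start <= junction + flank and end >= junction - flank:
            return True
        start = read.find(site, start + 1)
    return False


# Placeholder recognition site; the enzyme actually used is an assumption here.
EXAMPLE_SITE = "GATC"

# Site sits right on the junction: flagged as a likely library artifact.
artifact_like = site_spans_junction("AAAAAGATCTTTTT", junction=7, site=EXAMPLE_SITE)

# Site is far from the junction: the read stays a fusion candidate.
candidate_like = site_spans_junction("GATCAAAAAAAAAATTTTTTTTTT", junction=14, site=EXAMPLE_SITE)
```

The sketch assumes a palindromic recognition site, so only one strand is scanned; for a non-palindromic site the reverse complement would also need to be searched.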
High-throughput transcriptome sequencing has been used to identify genomic rearrangements resulting in gene fusion events [23], [35], [36], [37], and superior sensitivity was achieved when paired-end sequencing was applied [36], [37], [38]. Single-end sequencing of either long or short reads has led to low validation rates, which seem to increase when long and short single-end reads are used in combination [35]. The low validation rate obtained here (21%) reinforces the difficulty of confirming these events using PCR-based approaches. Nonetheless, despite using single-end long-read sequencing, we identified three bona fide gene fusions, reported here for the first time to the best of our knowledge.
One of the validated gene fusions showed 10 nt of micro-homology sequence, a feature also detected in 82% of the gene fusion candidates, probably resulting from a replication mechanism known as the Fork Stalling and Template Switching (FoSTeS) model. Such replication errors arise from nucleotide similarity between DNA strands [39] and have been detected in breast cancer samples [40].
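One simple way to quantify micro-homology at a fusion junction is to count how many bases immediately past the breakpoint are identical in both partner sequences. The sketch below illustrates that idea only; the function name and the example sequences are hypothetical, and it measures homology in one direction from the breakpoint, which may differ from the procedure used in the study.

```python
def microhomology_length(a_past_breakpoint: str, b_from_junction: str) -> int:
    """Count leading bases shared by the two partner sequences at a junction.

    `a_past_breakpoint`: reference sequence of the upstream partner,
        starting immediately after its breakpoint.
    `b_from_junction`: sequence of the downstream partner, starting at the
        junction as observed in the read.
    """
    shared = 0
    for base_a, base_b in zip(a_past_breakpoint.upper(), b_from_junction.upper()):
        if base_a != base_b:
            break
        shared += 1
    return shared


# Made-up sequences sharing their first 10 nt, for illustration only.
example_length = microhomology_length("ACGTACGTACTTT", "ACGTACGTACGGG")
```

A shared stretch of this kind means the breakpoint cannot be placed to a single base, which is why such junctions are described as carrying micro-homology.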
To explore SNPs, we used stringent bioinformatic filtering and manual inspection, which resulted in a high confirmation rate (89%), including validated SNPs with apparent allelic dosage imbalance in C5.2 cells. Whether ERBB2 overexpression can mediate allelic dosage imbalance during transcription remains to be addressed.
The detection of alternative splicing by our method is enhanced by the longer fragments produced by the 454 platform, compared to other next-generation sequencing technologies. Here we showed a 90% validation rate for the exon inclusion class of splicing variants. Extrapolating this value to the 1,704 novel AS events in multi-exon splicing variants with conserved splice sites identified by RNAseq would yield approximately 1,533 bona fide AS events. Our approach therefore demonstrates a high capacity for the identification of novel splicing variants and, consequently, for the definition of the mammary transcriptome.
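The extrapolation is a single multiplication of the number of novel events by the validation rate. The short sketch below reproduces it; the function name is hypothetical, and truncating to a whole number of events is an assumption made here because it matches the figure reported in the text (0.90 × 1,704 = 1,533.6).

```python
def extrapolate_validated(n_events: int, validation_rate_percent: int) -> int:
    """Estimate how many events would validate, truncated to a whole event.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return n_events * validation_rate_percent // 100


# 1,704 novel AS events at a 90% validation rate.
expected_bona_fide = extrapolate_validated(1704, 90)
```

Rounding to the nearest integer instead of truncating would give 1,534, so the estimate is best read as approximate.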
Amplification of the ERBB2 oncogene is considered an important tumor driver [41] and has been reported in approximately 25% of breast cancers [11]. The quantitative transcriptional effects of overexpression of the oncogene have previously been assessed by 3′ end sequencing methodology [18], [19]. However, only whole transcriptome sequencing enables the assessment of some relevant structural aspects. The influence of ERBB2 was observed in quantitative aspects of breast cell line transcriptomes, not only on gene expression but also on specific splicing variants. The enrichment of exon skipping/inclusion and alternative splice site selection associated with ERBB2 overexpression in C5.2 cells, observed in the current study, indicates a potential influence of the oncogene on the regulation of the splicing process. In line with this, we have previously presented evidence that ERBB2 mediates the regulation of the expression level of specific AS variants [42]. Additionally, others have suggested [43] that activation of signaling pathways such as Ras/MAPK and PI3K/AKT, which are controlled in part by ERBB2 signaling, might influence the alternative splicing balance of cells through phosphorylation and activation of specific splicing factors.
The intrinsic molecular heterogeneity found between distinct human tumor samples, as well as within a single breast tumor sample, has been reported by many laboratories [7], [44]. These differences appear to be strongly dependent upon microenvironmental factors [5], [45]. Despite differences in molecular characteristics between cells in vivo and in vitro, our approach allowed us to identify four genes whose expression is likely mediated by ERBB2. We highlight LOX downregulation in C5.2 cells as well as in tumor samples overexpressing ERBB2. Furthermore, our data revealed a higher level of LOX expression in C5.2 cells after exposure to rapamycin (4-fold change), indicating that LOX is potentially regulated by the ERBB2/mTOR pathway. LOX encodes an extracellular copper-requiring enzyme that initiates collagen and elastin crosslinking and enhances tumor cell invasion and metastasis [46]. Conversely, the 18-kDa LOX propeptide was found to be an effective inhibitor of the more invasive phenotype of breast cancer cells driven by ERBB2 and has been suggested to improve treatment in this subtype of breast cancer [47].
Altogether, the results presented here demonstrate that our approach is suitable for whole transcriptome interrogation, whether sequencing a single sample or multiple samples in parallel on the Roche 454 platform, and that it can generate an accurate quantitative and qualitative portrait of complex transcriptomes.