The majority of mammalian genes generate multiple transcript variants and protein isoforms through alternative transcription and/or alternative splicing, and the dynamic changes at the transcript/isoform level between non-oncogenic and cancer cells remain largely unexplored. We hypothesized that isoform level expression profiles would be better than gene level expression profiles at discriminating between non-oncogenic and cancer cellsgene level.
We analyzed 160 Affymetrix exon-array datasets, comprising cell lines of non-oncogenic or oncogenic tissue origins. We obtained the transcript-level and gene level expression estimates, and used unsupervised and supervised clustering algorithms to study the profile similarity between the samples at both gene and isoform levels.
Hierarchical clustering, based on isoform level expressions, effectively grouped the non-oncogenic and oncogenic cell lines with a virtually perfect homogeneity-grouping rate (97.5%), regardless of the tissue origin of the cell lines. However, gene levelthis rate was much lower, being 75% at best based on the gene level expressions. Statistical analyses of the difference between cancer and non-oncogenic samples identified the existence of numerous genes with differentially expressed isoforms, which otherwise were not significant at the gene level. We also found that canonical pathways of protein ubiquitination, purine metabolism, and breast-cancer regulation by stathmin1 were significantly enriched among genes thatshow differential expression at isoform level but not at gene level.
In summary, cancer cell lines, regardless of their tissue of origin, can be effectively discriminated from non-cancer cell lines at isoform level, but not at gene level. This study suggests the existence of an isoform signature, rather than a gene signature, which could be used to distinguish cancer cells from normal cells.