In this study we have performed an extensive identification and verification of alternative splice variant gene expression in NSCLC. To our knowledge, this study is the first such genome-wide analysis of alternative splicing events in NSCLC or any other tumor type. Our results indicate that approximately 13% of the 17 800 core RefSeq genes appear to have alternative transcripts that are differentially expressed between lung adenocarcinoma and adjacent normal lung tissue. Furthermore, the largest subsets of these alternatively spliced genes appear to be cancer related and/or involved in cellular processes such as growth and proliferation. For some genes, alternative transcripts have already been identified, but we now demonstrate their differential expression in cancer. In many cases, however, our microarray data indicate the presence of differentially expressed alternative transcripts that are currently unidentified. Thus it appears that differential expression of alternative transcripts is frequent in NSCLC and that these transcripts may be a valuable resource for the development of novel diagnostic, prognostic and therapeutic tools.
While the ability to analyze alternative transcript expression on a genome-wide scale is very powerful, verification and validation of these data are labor intensive. For this reason, we chose to focus on genes that have previously been associated with cancer and for which differential expression occurred in >50% of tumor/normal tissue pairs. In total, 11 genes were examined, and we were able to validate the array data for six of them. Thus, verification and validation of data from the exon arrays are clearly required. Alternative transcript expression for four of these genes was studied in more detail.
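The >50% selection criterion above can be sketched as a simple filter over paired samples. This is a minimal illustration, not the authors' pipeline: the two-fold cutoff (`min_abs_log2fc=1.0`), the use of linear-scale expression values and the function names are all assumptions, and the separate requirement of a prior association with cancer is not modeled.

```python
from math import log2


def fraction_pairs_changed(tumor, normal, min_abs_log2fc=1.0):
    """Fraction of tumor/normal pairs whose paired log2 fold change
    reaches the (assumed) cutoff min_abs_log2fc in either direction."""
    if len(tumor) != len(normal) or not tumor:
        raise ValueError("need equal-length, non-empty paired lists")
    changed = sum(
        1 for t, n in zip(tumor, normal)
        if abs(log2(t / n)) >= min_abs_log2fc
    )
    return changed / len(tumor)


def passes_filter(tumor, normal, min_abs_log2fc=1.0):
    # Keep a transcript only if it changes in MORE than half of the pairs.
    return fraction_pairs_changed(tumor, normal, min_abs_log2fc) > 0.5
```

Using the absolute log ratio counts both up- and downregulated pairs; a direction-specific filter would simply drop the `abs`.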
The ERG (v-ets avian erythroblastosis virus E26 oncogene homolog) protein shares significant homology with both the 5′ and 3′ regions of the viral ETS1, suggesting that it belongs to the ETS oncogene family. ERG is located at chromosome band 21q22 and has been identified as the target of genomic rearrangement events in acute myeloid leukemia (46), Ewing's sarcoma (47) and prostate cancer (48–50). In acute myeloid leukemia (AML) and Ewing's sarcoma, ERG has been associated with several translocation fusion partners, including ELF4. Furthermore, high expression of ERG in the absence of karyotypic rearrangement or amplification was shown to be an adverse prognostic factor in patients with AML (53). In prostate cancer, ERG is frequently fused to a nearby gene, TMPRSS2, resulting in androgen regulation of ERG, and several reports now indicate that the presence of this fusion is a poor prognostic indicator in prostate cancer (54). Interestingly, none of the reports cited above discuss the existence of two ERG variants or how these relate to the fusion product. Our analysis of variant-specific expression clearly showed that variant 2 has much higher expression in tumors than in the paired normal lung tissue, while variant 1 has similar or lower expression in tumors. Thus the oncogenic effect of ERG may be exerted through functions encoded by variant 2, and fusion gene products should also be closely evaluated to determine which ERG variant is present. Further investigations are required to identify the functional differences between the two variants and could lead to more targeted drug discovery.
The cadherins are a family of transmembrane proteins that mediate calcium-dependent cell-cell adhesion at adherens junctions. The cytoplasmic domain of cadherins binds β- and γ-catenin and is linked to the actin cytoskeleton via α-catenin (56). These interactions are vital for stable cell-cell interactions and maintenance of normal cell physiology. In cancer, disruption of the adherens junctions, for example by downregulation or inactivating mutation of cadherins, can result in epithelial-to-mesenchymal transition, increased proliferation, invasion and metastasis (56). In part, this may be mediated by the release and accumulation of β-catenin, which, when translocated to the nucleus, induces transcription of genes such as cyclin D1 and c-myc.
While the role of the prototypic cadherin, E-cadherin (CDH1), as a classic tumor suppressor gene in cancer is well established, the role of P-cadherin remains unclear, as it behaves differently depending on the tumor type being studied. For example, in melanoma, loss of P-cadherin (and E-cadherin) allows invasion and migration of cells, and thus P-cadherin appears to act as a pro-adhesion tumor suppressor (58). In breast cancer, however, high expression of P-cadherin strongly correlated with high histologic grade, increased proliferation and poor patient survival (60). Furthermore, in pancreatic cancer cell lines, overexpression of P-cadherin resulted in increased cell motility, cytoplasmic accumulation of catenins and activation of the Rho GTPases Rac1 and Cdc42 (62).
In our study we found overexpression of P-cadherin in lung tumors compared to normal lung, but we also identified overexpression of an alternative splice variant in which exon 2 is missing. Analysis of the resulting mRNA indicates that the normal ATG initiation codon is placed out of frame with the downstream coding sequence, so translation from it would terminate after only 27 amino acids. This would clearly result in an inactive protein and would be consistent with a tumor suppressor function for P-cadherin in lung cancer, were it not for the fact that full-length P-cadherin mRNA is actually overexpressed in our tumors. However, upon further analysis we identified several alternative in-frame ATG codons downstream of the known translation start site. Furthermore, at least two of these putative alternative start sites have Kozak sequences that are believed to be active in other genes. Protein translation initiated at either of these sites would result in a P-cadherin protein lacking the signal peptide and most of the extracellular domain, while retaining the transmembrane domain, juxtamembrane domain and catenin-binding domain. If such a protein were overexpressed in tumors, one can easily envision it disrupting adherens junctions in a dominant manner, leading to catenin accumulation and tumorigenesis.
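The reading-frame reasoning in this paragraph can be made concrete with a few helper functions. This is an illustrative sketch on toy sequences, assuming plain uppercase DNA strings of the spliced mRNA; the function names are hypothetical, and nothing here uses the actual P-cadherin sequence or reproduces how the alternative start sites were identified.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def skipping_preserves_frame(exon_length):
    """Skipping an exon keeps the downstream frame only if its length is a multiple of 3."""
    return exon_length % 3 == 0


def codons_before_stop(mrna, start):
    """Count codons (including the start codon) read from `start` up to,
    but not including, the first in-frame stop codon.

    Returns None if translation runs off the end without meeting a stop.
    """
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


def atgs_in_frame(mrna, frame, search_from=0):
    """Positions of ATG codons at or after `search_from` lying in `frame` (0, 1 or 2).

    Passing the frame of the downstream coding sequence lists candidate
    alternative start sites that would translate the remaining protein.
    """
    return [i for i in range(search_from, len(mrna) - 2)
            if i % 3 == frame and mrna[i:i + 3] == "ATG"]
```

For example, `codons_before_stop("ATGAAACCCTAG", 0)` counts Met-Lys-Pro before the stop. The first two helpers express why an exon deletion whose length is not a multiple of three yields an early stop; candidates returned by the third would still need their sequence context checked.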
It is well known that multiple transcripts are transcribed from the CDKN2A (cyclin-dependent kinase inhibitor 2A) locus. CDKN2A is an extensively studied tumor suppressor locus that is frequently mutated or deleted in a wide variety of tumor types. Exploration of the RefSeq database identified four transcript variants potentially transcribed from this locus, of which three are considered verified. Variant 1 gives rise to the p16 protein, and variant 4 gives rise to the alternative reading frame p14/ARF protein. Variant 4 also gives rise to a shorter protein product (p19smARF), which results from an alternative translation start site (63). Variant 3 gives rise to a longer protein that shares the same reading frame as p16 and appears to be specifically expressed in the pancreas. In addition, another transcript variant (p16γ) was recently identified (64) but has yet to be curated in the RefSeq database. Finally, variant 2 lacks exons 1α and 1β, and its exon 2 is slightly longer due to inclusion of an additional 100 bases of intronic sequence. Variant 2 may also have a shorter 3′ UTR than p16 or ARF. Variant 2 was originally cloned from testis tissue but has been temporarily removed by RefSeq staff for further evaluation.
In cancer, inactivation of the p16INK4a/ARF tumor suppressor genes is frequently mediated through genomic deletion, promoter methylation or inactivating mutation, leading to loss of p53- and Rb-dependent cell cycle regulation (65). In NSCLC, loss of heterozygosity and/or homozygous deletion of the CDKN2A locus on chromosome 9p21 has been reported at frequencies of up to 40% (8). In our study, ~30% of tumors demonstrated reduced expression of all three measured transcripts (p16, ARF and variant 2), which is likely a result of genomic deletions. However, in the remaining tumors, expression of ARF and variant 2 (but not p16) was significantly higher than in paired normal tissue. Overexpression of ARF in cancer has now been reported several times (66–68) and has been associated with poor differentiation status in hepatocellular carcinoma (67) and worse outcome in B-cell lymphomas (66). Similarly, overexpression of p16 has also been observed and has been associated with progression and poor survival in ovarian cancer (69), prostate cancer (70) and breast cancer (71). While overexpression of p16 and ARF appears to contradict their known cellular functions as tumor suppressors, mechanisms have been proposed whereby this event may be explained through activating mutations in Rb or induction of myc. However, our data suggest an alternative explanation: the variant 2 transcript may account for the previously observed overexpression. Variant 2 was originally believed to give rise to a new isoform (isoform 2) of p16, with the first amino acid encoded by an in-frame ATG that is present in the original exon 2. However, we also identified an alternative ATG codon in the extended exon 2 that is in frame with ARF. This alternative ATG has a reasonably good Kozak sequence (CCGTCATGC) and, being upstream of the putative p16 isoform 2 start site, would presumably dominate translation initiation. This putative ARF isoform would lack the amino-terminal portion of ARF and would therefore be unable to bind TBP-1, E2F, Myc, FoxM1, CTBP1 or MDM2 (74), and it may be unable to block cell cycle progression. However, others have shown that the carboxy terminus of artificially truncated ARF still accumulates in the nucleolus (75–77), and thus this putative ARF isoform could theoretically act as a dominant negative, which would explain how overexpression of ARF may be pro-tumorigenic.
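The assessment of CCGTCATGC as a reasonably good Kozak context can be illustrated with a deliberately simplified two-position check. This sketch assumes the commonly cited rule of thumb that a purine (A or G) at position −3 and a G at position +4 (numbering the A of ATG as +1) are the most influential features of the Kozak consensus; the strong/adequate/weak labels follow common usage, but the scoring and function names here are illustrative assumptions, not a published scoring scheme.

```python
def kozak_features(context, atg_index=None):
    """Report the -3 and +4 nucleotides around an ATG in `context`.

    If atg_index is None, the first ATG in the string is used.
    """
    context = context.upper()
    if atg_index is None:
        atg_index = context.find("ATG")
    if atg_index < 3 or context[atg_index:atg_index + 3] != "ATG":
        raise ValueError("need an ATG with at least 3 nt of upstream context")
    if atg_index + 3 >= len(context):
        raise ValueError("need at least 1 nt downstream of the ATG (+4)")
    minus3 = context[atg_index - 3]  # A of ATG is +1; there is no position 0
    plus4 = context[atg_index + 3]
    return {
        "minus3": minus3,
        "plus4": plus4,
        "purine_at_minus3": minus3 in ("A", "G"),
        "g_at_plus4": plus4 == "G",
    }


def kozak_strength(context, atg_index=None):
    """Simplified label: both key positions match -> strong, one -> adequate, none -> weak."""
    f = kozak_features(context, atg_index)
    matches = int(f["purine_at_minus3"]) + int(f["g_at_plus4"])
    return ("weak", "adequate", "strong")[matches]
```

By this heuristic the variant 2 ATG has the key purine (G) at −3 but a C rather than a G at +4. Leaky scanning and the position of the ATG relative to other start codons also influence initiation, so a check like this cannot by itself confirm which start site dominates.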
CEACAM1 [carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein)] is a cell–cell adhesion molecule that also plays a role in signal transduction. Two common variants are known for CEACAM1: one with a long cytoplasmic domain (L form or variant 1) and one with a short cytoplasmic domain (S form or variant 2). The expression of CEACAM1 in cancer has been extensively studied, but early reports appeared to be contradictory. For example, reduced expression of CEACAM1 was reported in breast, colon, prostate and endometrial cancer (39), and CEACAM1 was therefore considered to be a negative regulator of tumor cell growth. However, in melanoma (79) and lung cancer (43), several reports indicated that CEACAM1 was overexpressed in tumors and that this was associated with disease progression and poor outcome. In 1997, Turbide et al. found that the L form of CEACAM1 exhibited a tumor-suppressive phenotype and that this was dominant over expression of the S form. Furthermore, using semi-quantitative RT-PCR, Wang et al. found that the L form of CEACAM1 predominated in normal lung while the S form appeared more abundant in tumors. They therefore proposed that in NSCLC, unlike other tumor types, isoform switching occurs rather than CEACAM1 downregulation. Our quantitative analysis clearly demonstrates a switch in abundance from the L form (variant 1) to the S form (variant 2) in NSCLC, and we also demonstrate that no such switch appears to occur in breast or colon cancer. Furthermore, we analyzed a publicly available GeneChip Human Exon 1.0 ST array data set from colon (33) and found no significant differential expression of CEACAM1 variants in those 10 pairs of colon tumor/normal samples (data not shown). Thus our findings support the hypothesis that the tumor-suppressive or oncogenic effects of CEACAM1 are splice variant dependent and that expression of the two variants is differentially regulated in different tissue types.
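The L-to-S switch described above can be summarized per tumor/normal pair as the change in the fraction of CEACAM1 expression contributed by the L form. This is a minimal sketch with made-up numbers; it is not the quantification method used in this study, and the function names and the L/(L + S) summary are assumptions chosen for illustration.

```python
from statistics import median


def long_form_fraction(l_expr, s_expr):
    """Fraction of total CEACAM1 signal attributable to the L form (variant 1)."""
    total = l_expr + s_expr
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    return l_expr / total


def isoform_switch(normal_pairs, tumor_pairs):
    """Per-pair change in L-form fraction (tumor minus normal) plus a summary.

    Each element of normal_pairs/tumor_pairs is an (L, S) tuple of
    linear-scale expression values for the same patient, in matching order.
    """
    if len(normal_pairs) != len(tumor_pairs) or not normal_pairs:
        raise ValueError("need equal-length, non-empty paired lists")
    deltas = [long_form_fraction(*t) - long_form_fraction(*n)
              for n, t in zip(normal_pairs, tumor_pairs)]
    toward_s = sum(1 for d in deltas if d < 0)  # pairs shifting toward the S form
    return {
        "deltas": deltas,
        "median_delta": median(deltas),
        "fraction_toward_s_form": toward_s / len(deltas),
    }


# Hypothetical example: three patients, (L, S) values for normal and tumor.
normal = [(80, 20), (60, 40), (70, 30)]
tumor = [(30, 70), (20, 80), (75, 25)]
result = isoform_switch(normal, tumor)
```

Consistently negative deltas across pairs correspond to an L-to-S switch, whereas a tissue with no switch would show deltas scattered around zero; a paired statistical test would still be needed to judge significance.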
In conclusion, our data demonstrate that differential expression of alternative splice variants is a common event in NSCLC. They also show that, in addition to the identification of novel, cancer-related splice variants, additional information can be gained even for extensively studied, cancer-related genes. Splice variant expression should be considered in future genome-wide expression studies, and doing so may lead to novel diagnostic, prognostic or therapeutic strategies in the fight against cancer.
(GeneChip Human Exon 1.0 ST array CEL files, along with GC-RMA data from core gene probesets and patient information, have been submitted to the GEO database under accession number GSE12236.)