There are many design differences between the exon array and the U133 Plus 2: feature size, the number of probes per probeset, the number of probesets per transcript, background calculation, the assay procedure, etc. The most obvious difference is that the exon array interrogates exons/subexons throughout the transcript, while the U133 Plus 2 generally targets only the 3' end. Alternative splicing and, particularly, alternative polyadenylation sites may account for substantial differences in gene signal estimation. Furthermore, our method of averaging the signal value of multiple probesets targeting the same transcript on the U133 Plus 2 is only a rough compromise. Despite the many differences, the gene-level comparison demonstrated a reasonable correlation in signals for genes that are significantly different between tissue types. The most notable difference in signal estimations is the shift of apparently low-expressing genes on the U133 Plus 2 array to higher signals on the exon array. Since the exon array generally contains 3 to 4 times as many probes per transcript as the U133 Plus 2 array and the probesets are distributed throughout the transcript cluster, the exon array may be more sensitive in detecting a subset of low-expressing genes, at least in this data set. Resolving questions such as the portability of results between the array types, or the characteristics of transcripts that are differentially detected, will require several dedicated studies, for which data are becoming available.
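The probeset averaging mentioned above for the U133 Plus 2 can be illustrated with a minimal Python sketch. It assumes a plain arithmetic mean of the probeset signals mapped to each gene; the function name, the probeset identifiers and the gene label are hypothetical and are not taken from the analysis itself.

```python
from collections import defaultdict
from statistics import mean


def average_probesets_per_gene(probeset_signals, probeset_to_gene):
    """Collapse probeset-level signals to one value per gene.

    probeset_signals: dict mapping probeset ID -> signal estimate
    probeset_to_gene: dict mapping probeset ID -> gene label
    Both inputs are hypothetical structures used only for illustration.
    """
    grouped = defaultdict(list)
    for probeset_id, signal in probeset_signals.items():
        gene = probeset_to_gene.get(probeset_id)
        if gene is not None:  # skip probesets with no gene assignment
            grouped[gene].append(signal)
    # Assumption: an unweighted mean; other summaries (e.g. median) are possible.
    return {gene: mean(values) for gene, values in grouped.items()}


# Hypothetical example: two probesets target GENE_A, one targets GENE_B.
signals = {"ps_1": 8.0, "ps_2": 10.0, "ps_3": 5.0, "ps_4": 7.0}
mapping = {"ps_1": "GENE_A", "ps_2": "GENE_A", "ps_3": "GENE_B"}
gene_signals = average_probesets_per_gene(signals, mapping)
```

An unweighted mean treats every probeset as equally informative, which is one reason such a summary is only a rough compromise when probesets differ in their distance from the 3' end.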
Among the 160 genes differentially expressed in the Core gene set, 60 have been previously identified as participating in cancers, with 21 specifically in colon or colorectal cancer. Almost one third of the up-regulated genes are part of a tightly interconnected network involved in mitosis, cell cycle control, cell proliferation, invasion, matrix remodeling and Wnt signaling. Constitutive activation within the Wnt signaling pathway has been a prevalent theme in colon cancers, in particular the role of β-catenin [
18,
34]. Eight of the over-expressed genes here (
BIRC5,
MMP7,
VEGF,
ENC1,
CCND1,
STRA6,
MET and
CLDN1) are targets of Wnt/β-catenin regulation and two other Wnt-associated genes,
SOX9 and
HIG2, are up-regulated in these colon cancer samples.
Twenty-nine of the tumor-expressed genes have unknown or weakly annotated functions, but several may be involved in cell proliferation or apoptosis (
MCTS1,
SIL,
HSPA5BP1,
WDR18 and
IFITM1) and at least eight have previously been associated with various cancers (
Additional file 1). Thirteen putative transcriptional regulators and most of the genes classified as signal transducers have unknown biological roles. Another nine genes annotated only as "hypothetical protein" or "open reading frame" (
MGC4677,
C12orf11,
FLJ10726,
C6orf167,
FLJ31153,
FLJ20272,
DKFZp762E1312,
KIAA1217 and
FLJ10719) are strongly expressed above background. The expanded analysis based primarily on
ab-initio exon predictions identified 38 more genes with significant expression even though they completely lack previous annotation (
Additional file 2). These transcripts with unknown functions may represent a novel set of targets for study in colon cancer oncology and demonstrate the exploratory power of an inclusive array design.
We have identified a number of genes that are differentially spliced between normal and cancerous tissue. Most of the tissue-specific alternative splicing events that were experimentally validated occurred in genes involved in cytoskeletal structure, the extracellular matrix or cell-cell interactions. Some of these events are reported splice variants that occur in a tissue-specific manner and may represent a loss of tissue function as colonic epithelial and smooth muscle cells dedifferentiate, rather than abetting transformation or metastasis. Determining the role of these splice events requires more detailed study, but in most cases these genes have previously been implicated as having active roles in the progression of tumors.
Five of the genes (
TPM1,
ACTN1,
VCL,
CTTN and
CALD1) that we found to be alternatively spliced in colon cancer have actin-binding domains and play a direct role in the organization or structure of the cytoskeleton. Remodeling of the cytoskeleton is fundamental in proliferation, apoptosis, cell invasion and metastasis [
35].
TPM1 appears to act as a tumor suppressor by promoting anoikis (apoptosis induced by cell detachment) [
36]. Down-regulation of
TPM1 by oncogenic transformation results in a loss of actin stress fibers [
37,
38], whereas restoration of
TPM1 inhibits cell migration in metastatic cell lines [
39]. A splice variant of one of the low molecular weight isoforms of tropomyosin has been found specifically in association with colonic polyps and adenomas, but not normal colon tissue [
40].
Actinin is a component of stress fibers and links the cytoskeleton to adherens-type junctions. It plays a role in cell migration probably by facilitating detachment of focal adhesions distal to the direction of movement [
41]. Alternative splicing of actinin-4, which has a high sequence similarity to ACTN1, apparently leads to an abnormal cytoskeleton in small cell lung cancer [
42]. ACTN1 also has a binding site for VCL, and the two proteins cooperate to organize the cytoskeleton at adhesion junctions [
43].
CALD1 binds actin and responds to calmodulin to promote stress fibers and focal adhesions, and CALD1-defective cells are highly impaired in motility [
44]. In cells transformed by Kaposi sarcoma-associated herpes virus (KSHV) or v-
erbB, hypermethylation of
CALD1 and recruitment of its product into membrane complexes are linked to the loss of cytoskeletal microfilaments [
45,
46].
CTTN, frequently overexpressed in breast cancer and squamous cell carcinomas, is highly enriched at tumor invasion fronts [
47,
48]. Two conformational forms of CTTN are known, with both forms present in normal cells, but the apparently larger one (p85) is more prevalent in colorectal cancers [
47]. Two splice variants that affect cell mobility have been previously identified: SV-1 (lacking Exon 11) and SV-2 (lacking Exons 10 and 11). SV-1 and full-length
CTTN were equally abundant in normal cells while SV-2 was barely detectable. SV-1, but not SV-2, can bind and crosslink actin, but overexpression of either form interferes with cell migration [
49]. Our results indicate that transcripts with and without Exon 11 are approximately equally abundant in normal tissue (in agreement with previous data), but transcripts carrying Exon 11 (i.e., full length) are relatively more abundant in colon cancer samples, suggesting that these cells may be more competent for motility.
Alternative splicing of
FN1,
CD44 and
COL6A3 may play some role in matrix remodeling and/or cell migration in colon cancer, though
COL6A3 has not been previously implicated in this role. Fibronectin was one of the consistently up-regulated genes in an artificial selection for highly metastatic cell lines, which also identified
ACTN1 and several collagens [
50]. Two splice variants of
FN1 have been implicated in the neo-vasculature of a variety of human tumors but not in normal adult tissues; however, the role of these species in tumor-related angiogenesis is unclear [
51]. Alternatively spliced
FN1 containing an extra domain has been found frequently in cancers [
3], whereas we find preferential skipping of Exon 25 in tumor tissues. CD44 is involved in both cell-matrix and cell-cell interactions as well as connections to the actin cytoskeleton via ankyrin. The variably spliced region of
CD44 (exons 6–15) is preferentially included in many cancer types and appears to affect cell migration, invasion and metastasis [
5].
Integrin ITGB4 interacts with the intermediate filament network, stimulates the Ras and PI3-K signaling pathways, and appears to be important for cell invasion in colon cancer [
52,
53]. SLC3A2, which functions in transmembrane transport, associates with integrins and appears to participate in integrin-mediated anchorage-independent cell growth and tumorigenesis in 3T3 fibroblasts [
23,
54]. The inclusion of several exons of
SLC3A2 in colon cancer transcripts may represent a tumor-specific alteration of its role in integrin signaling.
Of the eleven differentially spliced genes we found or confirmed, ten are involved in the organization of the cytoskeleton or interaction with the matrix or other cells. Seven of these genes (
TPM1,
CALD1,
CTTN,
FN1,
CD44,
ITGB4 and
SLC3A2) have previously been implicated in cancers, and, in five cases, specific splice variants are correlated with the cancerous state. This grouping may represent a coherent, and possibly coordinated, set of alterations which may impact cell mobility and extracellular interactions. A similar concentration of splice variants was found in mouse brain, where targets of the Nova splicing regulon are clustered into functions affecting synaptic transmission and cell morphology. In fact, the targets appear to act as a modular network that impacts not only signaling functions, but also specifically the actin cytoskeleton, extracellular matrix and cell-cell adhesion [
55]. It is possible that a similar splicing network is altered in colon cancer to produce the complex of interacting splice variants seen in this study. Such patterns may be more apparent in a highly parallel genome-wide exon analysis than in traditional methods involving gene-by-gene searches.
One assumption of this type of analysis is that biologically important splicing changes would consistently appear in samples of a particular category (e.g., colon cancer). In some cases, our samples did not reproduce previously reported events (
Rac1,
VEGF,
SIAHBP1, and
MST1R), or else the changes were sporadic (
CD44). Furthermore, we find differences even in normal samples with regard to Exon 6 of
VEGF compared to the mass spectrometry results of McCullough et al. [
30]. In fact, the most relevant splicing change in
VEGF for tumorigenesis may be a splice variant involving Exon 8 [
56]. In this situation, it may be difficult to resolve what the normal and abnormal states are for
VEGF. Inconsistent results may be due to differences in samples, experimental procedures, analyses or interpretations, suggesting that conclusions about subtle changes in splicing may need to be reinforced by multiple sources. On the other hand, many of our validated splicing events have been observed in several other instances. Nearly identical patterns of differential alternative splicing were found in colon tumors by Okumura et al. for
TPM1,
ACTN1 and
ITGB4, and these patterns were even more pronounced in tumor cell lines. Interestingly, differential splicing of
CD44 in colon cancer was not observed in that study [
57].
ITGB4 and
TPM1 were identified by an EST-based analysis, and experimentally validated as differentially spliced in several tumor types [
58]. The alternative splicing of
CTTN is consistent with protein alterations seen in association with colon cancer [
47]. Our sample set ranged from well differentiated to poorly differentiated (i.e., advanced) tumors, yet
ACTN1,
ATP2B4,
VCL and
CALD1 show strong and consistent changes across our samples (
Additional file 5). The presence of these splice variants in multiple tumor stages suggests that they are both early and persistent events. Finally, differences in splicing patterns may be due to biologically important differences in cancer etiology, and may therefore be useful indicators of tumor subtypes or stages.
With the large amount of EST and genomic data available, a great deal of useful information may be gained from
in silico prediction of transcript isoforms. Validation rates from such methods have been fairly high and the resulting patterns of predictions are mostly in line with empirical data. There are, however, a number of potential hazards in this approach: EST libraries are highly variable in quality and reliability, it is difficult to account for differential gene expression, and tissues and individual genes are unevenly represented. Almost 45% of EST libraries are from cancer samples [
33] and 70% of mRNAs in GenBank are cloned from tumor samples [
59], leading to a strong bias against identifying isoforms present in normal tissues. Also, in spite of various methods to remove or normalize differentially expressed genes, this factor may lead to overprediction [
32]. The exon array addresses many of the difficulties by providing an empirical platform that is unbiased with regard to tissue or gene representation, and allows for direct normalization of differential gene expression. While a substantial proportion (32%) of our top candidates for alternative splicing matched predictions from three EST-derived methods, only four of our validated splicing events appeared among the 2797 genes with bioinformatically predicted splice variants. This suggests that there are substantial gaps in current EST databases that must be addressed empirically.
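The idea of normalizing exon signals for differential gene expression can be sketched in a few lines of Python. This section does not give the formula, so the sketch assumes one commonly used form: each exon signal is divided by the gene-level signal from the same sample, and the two tissue types are then compared as a log2 ratio. The function names and the numeric values are hypothetical.

```python
import math


def normalized_intensity(exon_signal, gene_signal):
    """Exon signal relative to the gene-level signal in the same sample."""
    if exon_signal <= 0 or gene_signal <= 0:
        raise ValueError("signals must be positive")
    return exon_signal / gene_signal


def splicing_index(exon_tumor, gene_tumor, exon_normal, gene_normal):
    """log2 ratio of gene-normalized exon signal, tumor versus normal.

    Assumption: values near 0 mean the exon tracks overall gene expression;
    negative values suggest relative exon skipping in tumor, positive values
    relative inclusion.
    """
    ni_tumor = normalized_intensity(exon_tumor, gene_tumor)
    ni_normal = normalized_intensity(exon_normal, gene_normal)
    return math.log2(ni_tumor / ni_normal)


# Hypothetical case 1: gene and exon both double in tumor, so no splicing change.
no_change = splicing_index(200.0, 200.0, 100.0, 100.0)

# Hypothetical case 2: gene rises four-fold but the exon only two-fold,
# so the exon is relatively under-represented in tumor transcripts.
relative_skipping = splicing_index(200.0, 400.0, 100.0, 100.0)
```

Dividing by the gene-level signal is what separates a change in exon usage from a change in overall transcript abundance, which an EST count alone cannot do.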
The splicing changes seen here are not necessarily as dramatic as those seen in a previous study of 16 pure normal tissues with an exon-based microarray (Clark
et al., in prep.). This highlights the subtlety of most splicing events, which are typically not an all-or-none phenomenon, so sample size and homogeneity are important considerations. In spite of limitations which are frequently encountered in cancer and other tissue studies in humans, such as a modest sample size, heterogeneous tissues, and multiple categorical variables (tumor stage, gender, and individual patient variation), we were able to identify and validate a number of candidate colon cancer-specific splicing events. Exploration of alternative splicing will promote understanding of cancer etiology and may provide therapeutic targets and diagnostic markers [
2,
3].