Using RNA-seq, we profiled the whole transcriptomes of CRC, adjacent non-tumor and normal tissue in depth. In total, approximately 50–70 million reads were generated per sample, which enabled us to quantify gene expression abundance over a wide range. The number of expressed genes (FPKM > 0) detected in our study is approximately 67% of the total UCSC reference genes per sample, representing the majority of the transcriptome.
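The expressed-gene fraction quoted above depends on how FPKM is computed and thresholded. As a minimal sketch, assuming the standard definition of FPKM (fragments per kilobase of transcript per million mapped fragments) and a hypothetical dictionary of per-gene FPKM values, the calculation could look as follows. The function names and the toy numbers are illustrative only and are not taken from the analysis pipeline.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def fraction_expressed(fpkm_by_gene, threshold=0.0):
    """Share of reference genes whose FPKM is strictly above the threshold."""
    if not fpkm_by_gene:
        raise ValueError("no genes supplied")
    expressed = sum(1 for value in fpkm_by_gene.values() if value > threshold)
    return expressed / len(fpkm_by_gene)


# Toy values (hypothetical): 500 fragments on a 2 kb gene in a library of
# 50 million mapped fragments.
example_fpkm = fpkm(500, 2000, 50_000_000)

# Toy reference set (hypothetical gene names): two of three genes are expressed.
example_fraction = fraction_expressed({"GENE_A": 5.0, "GENE_B": 0.0, "GENE_C": 1.2})
```

With a threshold of zero, any gene with at least one assigned fragment counts as expressed, so the resulting fraction is sensitive to sequencing depth.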
Gene expression can be altered through both transcriptional and post-transcriptional regulation. The first class, dysregulation at the transcriptional level, has been well studied in CRC using microarray technology. Quantifying the second class of regulatory change remains challenging despite the invention of the exon array. RNA-seq technology enables the simultaneous study of these two mechanisms. In our study, we investigated transcriptional dysregulation by analyzing the differentially expressed genes (DEGs). We then used paired-end cDNA sequencing to identify alternative splicing more efficiently. Moreover, by employing the MISO algorithm, we were able to measure the relative expression levels of the different isoforms produced by exon-skipping events, which provides a quantitative measure of alternative splicing. Interestingly, the genes affected by these two regulatory mechanisms are largely independent, suggesting that the cancer transcriptome can be reprogrammed in versatile ways.
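To make the quantities in this paragraph concrete, the sketch below computes a naive junction-based estimate of the percent spliced in (PSI) for an exon-skipping event, the change in PSI between two tissues, and the overlap between two gene sets. This is not the MISO model, which infers PSI with a Bayesian approach; it is only a simplified point estimate, and the function names, the halving of inclusion reads (two inclusion junctions versus one skipping junction) and the toy inputs are all assumptions made for illustration.

```python
def naive_psi(inclusion_junction_reads, exclusion_junction_reads):
    """Naive PSI for an exon-skipping event from junction-spanning reads.

    Inclusion reads come from two junctions (upstream and downstream of the
    cassette exon), so they are halved to be comparable with the single
    skipping junction. Returns None when no junction reads are available.
    """
    inclusion = inclusion_junction_reads / 2.0
    total = inclusion + exclusion_junction_reads
    if total == 0:
        return None
    return inclusion / total


def delta_psi(psi_tumor, psi_control):
    """Difference in PSI between tumor and control tissue."""
    return psi_tumor - psi_control


def jaccard_overlap(genes_a, genes_b):
    """Jaccard index of two gene sets (0 = disjoint, 1 = identical)."""
    genes_a, genes_b = set(genes_a), set(genes_b)
    union = genes_a | genes_b
    if not union:
        return 0.0
    return len(genes_a & genes_b) / len(union)
```

A Jaccard index close to zero between the differentially expressed genes and the differentially spliced genes would correspond to the largely independent gene sets described above.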
The local invasion and distant metastasis of cancer are considered a multistep process involving regulatory changes in intracellular circuitry and complex interactions between cancer cells and their microenvironment. During invasion and metastasis, frequent remodeling of the extracellular matrix enables cancer cells to disseminate from primary tumors and invade normal tissue. In our study, we found that many genes related to extracellular matrix (ECM) receptor interactions are highly dysregulated in a cancer-restricted manner. The ECM is composed of several types of macromolecules, including collagen-type proteins, laminins, tenascin and other adhesion molecules. All of the collagen-type genes, including type I–IX collagen, are up-regulated 10- to 1000-fold in cancer tissue (Table S7). Although there is some concordance between our observations and previous studies on the up-regulation of collagen mRNA in colorectal cancer tissue, the pervasive induction of collagen mRNAs is unique to our study. These findings suggest that the reprogramming of the collagen protein family network during colon cancer development may be much more complex than previously thought. In addition, members of the matrix metalloproteinase (MMP) family, which degrade ECM structures, are significantly induced in cancer tissues, consistent with a previous report. The fold change in the expression of the MMPs ranged from 10-fold (MMP1, MMP3 and MMP14) to 554-fold (MMP7). Meanwhile, other cell-cell adhesion-related molecules, such as laminins (LAMA4, LAMA5, LAMB1, LAMB2 and LAMC2) and integrins (ITGA5, ITGB5, ITGA11 and ITGBL1), are elevated in cancer tissues. We also detected the up-regulation of vascular endothelial growth factor (VEGF), suggesting that the “angiogenic switch” is activated in cancer tissue. Taken together, the global up-regulation of the ECM pathway and of the angiogenic growth factor indicates that CRC progression leads to massive ECM remodeling and the expansion of new vessel networks. Moreover, previous studies have shown that genes in the ECM pathway are subject to extensive epigenetic modification and may therefore be novel prognostic biomarkers; our study thus provides further insight into the use of expression changes in ECM pathway members as candidate biomarkers.
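The fold changes reported in this paragraph are ratios of expression in cancer tissue to expression in control tissue. A minimal sketch of that calculation is given below, assuming FPKM values as input; the optional pseudocount, which avoids division by zero for genes that are silent in the control, is an assumption of this example and not a description of how the values in Table S7 were derived.

```python
import math


def fold_change(tumor_fpkm, control_fpkm, pseudocount=0.0):
    """Ratio of tumor to control expression, with an optional pseudocount."""
    numerator = tumor_fpkm + pseudocount
    denominator = control_fpkm + pseudocount
    if denominator <= 0:
        raise ValueError("control expression is zero; use a positive pseudocount")
    return numerator / denominator


def log2_fold_change(tumor_fpkm, control_fpkm, pseudocount=0.0):
    """log2 of the fold change; positive values indicate up-regulation."""
    ratio = fold_change(tumor_fpkm, control_fpkm, pseudocount)
    if ratio <= 0:
        raise ValueError("tumor expression is zero; use a positive pseudocount")
    return math.log2(ratio)
```

On the log2 scale, a 10-fold induction corresponds to roughly 3.3 and a 1000-fold induction to roughly 10, which makes such a wide range of changes easier to compare across genes.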
Gene fusion, which often results from a genomic aberration, has been shown to be a key mechanism for generating chimeric “oncogenes” that initiate tumorigenesis or contribute to tumor progression. RNA-seq can detect expressed gene fusion transcripts, which are more likely to produce a functional product. Given that common gene fusions are rare in CRC, identifying case-specific gene fusions can help to elucidate the complexity of the molecular basis of CRC development. In this study, we detected a cancer-restricted gene fusion between PTGFRN and NOTCH2 in CRC. In addition, gene fusions between the immunoglobulin lambda variable genes and IGLL5 were detected in the filtered TopHat-Fusion results (Table S5), which might represent immune rearrangements in tumor-associated B cells. Previous studies suggested that the consequence of gene fusion can be i) an alteration of gene expression; or ii) the generation of a truncated or chimeric protein with a different function. Because the PTGFRN-NOTCH2 transcript includes only a small portion of PTGFRN and the expression of PTGFRN and NOTCH2 is not down-regulated in CRC, we reason that the original functions of these two genes are not affected by this fusion event; therefore, any gain of function of this fusion construct will be of particular interest for future study. Given that the majority of the fusion gene is composed of NOTCH2, the function of the fusion product is likely to be more closely related to that of NOTCH2. NOTCH2 is a homolog of NOTCH1 and plays a role in a variety of developmental processes by controlling cell fate decisions. NOTCH2 expression has been shown to be a prognostic predictor and is related to the tumor differentiation status in CRC. In addition, the gain of function of truncated NOTCH2 with nonsense mutations causes an autosomal dominant skeletal disorder. Therefore, NOTCH2 may play an important role in CRC development, and the PTGFRN-NOTCH2 gene fusion could introduce dominant negative effects on the normal development program.
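The notion of a cancer-restricted fusion used in this paragraph can be expressed as a simple set operation: a fusion is retained if it is called in the tumor sample and in none of the control samples. The sketch below illustrates this under the assumption that fusion calls are available as (5' partner, 3' partner) pairs per tissue; the data structure, the tissue labels, the partner orientation and the placeholder gene names are hypothetical, and this does not reproduce the TopHat-Fusion output format or its filtering rules.

```python
def cancer_restricted_fusions(fusions_by_tissue, tumor_key="tumor"):
    """Return fusions called in the tumor sample but in no other tissue.

    fusions_by_tissue maps a tissue label to an iterable of
    (five_prime_gene, three_prime_gene) tuples.
    """
    if tumor_key not in fusions_by_tissue:
        raise KeyError(f"no fusion calls for tissue {tumor_key!r}")
    tumor_calls = set(fusions_by_tissue[tumor_key])
    control_calls = set()
    for tissue, calls in fusions_by_tissue.items():
        if tissue != tumor_key:
            control_calls.update(calls)
    return sorted(tumor_calls - control_calls)


# Toy input: GENE_A/GENE_B is a placeholder for a fusion seen in all tissues.
toy_calls = {
    "tumor": [("PTGFRN", "NOTCH2"), ("GENE_A", "GENE_B")],
    "adjacent": [("GENE_A", "GENE_B")],
    "normal": [("GENE_A", "GENE_B")],
}
restricted = cancer_restricted_fusions(toy_calls)
```

In this toy input only the PTGFRN-NOTCH2 pair is returned, because the placeholder fusion is also present in the adjacent and normal samples.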