Colorectal adenoma formation and further progression into carcinomas are caused by the accumulation of (epi)genetic alterations. As such, one might expect colorectal carcinogenesis to be a stochastic process in which, sooner or later, malignant progression is an inevitable event. However, biologically and clinically, colorectal adenomas form a distinct intermediate stage in CRC development from normal colon epithelium. It is estimated that only 5% of adenomas ever progress into adenocarcinomas, indicating that carcinoma formation from adenomas requires biological and, therefore, molecular alterations that differ significantly from those involved in adenoma formation from normal epithelium. Given the high prevalence of colorectal adenomas and the relatively low progression rate, one could argue that the biological processes involved in adenoma-to-carcinoma progression are the most relevant from a clinical point of view. Here we report the identification of six cancer-related biological processes whose activity is increased in CRCs compared to adenomas (Table ) and a list of key genes whose increased mRNA expression levels are associated with malignant transformation (Table ). For two of these genes, i.e., AURKA and PDGFRB, differential expression was verified at the protein level (Fig. ).
The GSEA carcinogenic pathway analysis performed in the present study was restricted to 16 carefully selected cancer-related gene sets for two reasons. First, although large databases that contain numerous predefined gene sets are available, such as the Molecular Signatures Database (www.broad.mit.edu/gsea/msigdb), none of these contains a well-defined subset of gene sets representing various biological aspects of carcinogenesis. Second, it is not recommended to perform GSEA using large groups of gene sets that are not relevant to the research question addressed, as this aggravates the multiple testing problem and leads to an unnecessary decrease in statistical power. Therefore, we set out to select gene sets representing cancer-related processes using two strategies, one based on Gene Ontology terms (seven gene sets) and one based on a PubMed literature search of in vitro and in vivo experimental data (nine gene sets). For four cancer-related processes (proliferation, differentiation, hypoxia, and angiogenesis), gene sets were obtained using both strategies, which allowed their value for GSEA carcinogenic pathway analysis to be compared. The experiment-derived “proliferation” and “differentiation” gene sets yielded a significant difference between adenomas and CRCs while their GO-derived equivalents did not. In contrast, the GO-derived “angiogenesis” gene set yielded a significant difference while the experiment-derived gene set did not. No significant differences were observed for either the experiment-derived or the GO-derived “hypoxia” gene set. These data illustrate that both strategies yielded useful gene sets for GSEA carcinogenic pathway analysis. However, they also imply that optimal gene sets may not yet be available for all (colorectal) cancer-related processes.
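The loss of power that comes with testing many irrelevant gene sets can be illustrated with a minimal sketch. It is not the correction procedure used in this study, which is not specified in this section. The functions `bonferroni_threshold` and `benjamini_hochberg`, the example p-value of 0.002, and the comparison collection of 1,600 gene sets are hypothetical and chosen only for illustration; the figure of 16 gene sets is taken from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the Benjamini-Hochberg
    step-up procedure rejects the null hypothesis at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k for which p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical gene set with a nominal p-value of 0.002, tested either
# within a focused panel of 16 sets or within a collection of 1,600 sets.
# All other sets are assumed to be uninformative (p = 0.5).
p_interesting = 0.002
small_panel = [p_interesting] + [0.5] * 15
large_panel = [p_interesting] + [0.5] * 1599

for name, panel in (("16 sets", small_panel), ("1,600 sets", large_panel)):
    threshold = bonferroni_threshold(0.05, len(panel))
    significant_bonferroni = p_interesting <= threshold
    significant_bh = benjamini_hochberg(panel)[0]
    print(name, threshold, significant_bonferroni, significant_bh)
```

Under these assumptions, the same nominal p-value survives correction in the focused panel but not in the large collection, which is the reasoning behind restricting the analysis to gene sets relevant to the research question.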
The GSEA carcinogenic pathway analysis results indicated significantly different rates of chromosomal instability, proliferation, differentiation, angiogenesis, stroma activation, and invasion between colorectal adenomas and CRCs (Table ). These results fit current knowledge about malignant transformation. Chromosomal instability increases the rate of genomic alterations, which is necessary to bypass the rate-limiting steps in carcinogenesis [19]. Analysis of chromosome copy number changes by comparative genomic hybridization has demonstrated that CRCs exhibit much more chromosomal instability than adenomas [3, 8]. The present data show that the activity of the chromosomal instability gene set changes highly significantly during adenoma-to-carcinoma progression, which further emphasizes the importance of chromosomal instability in this progression. Chromosomal instability might even be the driving force in tumor progression by initiating the changes in other cancer-related biological processes. Although the balance between proliferation and differentiation is already abnormal in adenomas, proliferation rates further increase during the adenoma–carcinoma sequence [20]. Angiogenesis is induced by growing tumors in an attempt to meet their increasing demand for oxygen and nutrients. Microvessel density, a widely used surrogate marker for angiogenesis, has been shown to be increased in CRCs compared to colorectal adenomas [21]. In comparison to adenomas, CRCs also contain much more tumor stroma, which is often composed of reactive tissue that resembles wounds that do not heal [5]. Interestingly, the amount of stroma differs widely among CRCs [22], and a high stroma percentage has been correlated with poor prognosis in CRC patients [23]. Hence, increased expression of the invasion gene set by CRCs fits the concept of adenoma-to-carcinoma progression.
For several cancer-related processes, no significant differences were revealed between adenomas and CRCs, i.e., for gene sets representing apoptosis, cell cycle, hypoxia, immune response, tumor-associated macrophages, and metastasis. One interpretation is that these biological processes are more relevant during formation of colorectal adenomas from normal colon epithelium than during adenoma-to-carcinoma progression. Alternatively, although these biological processes could play a role in malignant transformation, the selected gene sets may not adequately represent the in vivo situation analyzed here. For instance, GO-derived gene sets are composed of groups of genes known to be involved in similar biological processes, irrespective of whether they actually function in a coordinated manner. In contrast, experiment-derived gene sets are composed of groups of genes that are coordinately expressed during certain biological processes; however, assumptions have been made about conservation of these gene sets across species and across tumor types, and about the validity of extrapolation from in vitro to in vivo settings. Nevertheless, although our approach may underestimate the effects of some cancer-related processes in adenoma-to-carcinoma progression, the positively identified gene sets yield valuable information for further investigation, such as the identification of key genes for malignant transformation. Expression of individual genes within gene sets that were positively identified by GSEA yielded a list of these key genes for various carcinogenic processes that may be used for molecular characterization of series of tumor samples (Table ). Some of these genes have been reported to contribute to (colorectal) carcinogenesis. From the “chromosomal instability” gene set, AURKA and TPX2 (targeting protein for XKLP2) have been reported to interact with each other and to play a role in centrosome maturation and spindle formation [24]. Aberrant expression of TPX2 has been reported in breast, endometrial, and lung cancer and in neuroblastoma [25]. Furthermore, TPX2 overexpression at the protein level was found to be associated with poor prognosis in lung cancer [25]. AURKA, when overexpressed, induces centrosome amplification, aneuploidy, and cellular transformation in vitro [26]. In nasopharyngeal carcinoma, AURKA overexpression was correlated with clinical stage and invasiveness, and inhibition with small molecules or RNA interference reduced cell invasion in vitro [27]. From the “proliferation” gene set, polo-like kinase 1 (PLK1) is thought to play a role in spindle formation and in cell cycle progression during the G2 and M phases [28]. Interference with PLK1 expression decreases proliferation, induces apoptosis, and affects spindle assembly in vitro [29]. Moreover, down-modulation of PLK1 expression was found to inhibit growth of bladder cancer in mice [29]. Expression of PLK1 and of CCNF (cyclin F), which contributes to the G2 to M phase transition, has been related to response to radio- and chemotherapy [30, 31]. From the “invasion” gene set, secreted protein acidic and rich in cysteine (SPARC, also known as osteonectin) is overexpressed in CRCs and induces proinvasive activity [32]. PDGFRB is upregulated within CRC tumor stroma, and blocking of PDGFRB signaling has been shown to inhibit colon tumor growth and metastasis [33].
Immunohistochemical analysis of a series of colorectal adenomas and adenocarcinomas was used to verify the expression of two of the key genes, AURKA and PDGFRB, at the protein level (Fig. ). For both proteins, intense staining was observed more frequently in CRCs than in adenoma tissue. Therefore, AURKA and PDGFRB may have the potential to be used as markers indicative of the activity level of “chromosomal instability” and “invasion,” respectively. For AURKA, protein staining was restricted to epithelial cells, indicating that AURKA influences CRC progression through its effect on tumor cells. PDGFRB staining was predominantly observed within tumor stroma, suggesting a stromal effect of PDGFRB on cancer progression. In this way, information on protein expression helps to put mRNA expression data into biological context.
In summary, GSEA was applied as a tool for pathway analysis of gene expression using a restricted number of gene sets representing cancer-related biological processes. Expression of six gene sets was increased in CRCs compared to adenomas, with the chromosomal instability pathway being the most prominent. Subsequently, key genes within these gene sets that exhibited significant differential expression were identified. Further research is required to explore whether these genes can be used as tumor markers for malignant transformation and/or as drug targets for distinct carcinogenic pathways that contribute to colorectal adenoma-to-carcinoma progression.