This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The expression of carcino-embryonic antigen by colorectal cancer is an example of oncogenic activation of embryonic gene expression. Hypothesizing that oncogenesis-recapitulating-ontogenesis may represent a broad programmatic commitment, we compared gene expression patterns of human colorectal cancers (CRCs) and mouse colon tumor models to those of mouse colon development embryonic days 13.5-18.5.
We report here that 39 colon tumors from four independent mouse models and 100 human CRCs encompassing all clinical stages shared a striking recapitulation of embryonic colon gene expression. Compared to normal adult colon, all mouse and human tumors over-expressed a large cluster of genes highly enriched for functional association to the control of cell cycle progression, proliferation, and migration, including those encoding MYC, AKT2, PLK1 and SPARC. Mouse tumors positive for nuclear β-catenin shifted the shared embryonic pattern to that of early development. Human and mouse tumors differed from normal embryonic colon by their loss of expression modules enriched for tumor suppressors (EDNRB, HSPE, KIT and LSP1). Human CRC adenocarcinomas lost an additional suppressor module (IGFBP4, MAP4K1, PDGFRA, STAB1 and WNT4). Many human tumor samples also gained expression of a coordinately regulated module associated with advanced malignancy (ABCC1, FOXO3A, LIF, PIK3R1, PRNP, TNC, TIMP3 and VEGF).
Cross-species, developmental, and multi-model gene expression patterning comparisons provide an integrated and versatile framework for definition of transcriptional programs associated with oncogenesis. This approach also provides a general method for identifying pattern-specific biomarkers and therapeutic targets. This delineation and categorization of developmental and non-developmental activator and suppressor gene modules can thus facilitate the formulation of sophisticated hypotheses to evaluate potential synergistic effects of targeting within- and between-modules for next-generation combinatorial therapeutics and improved mouse models.
The colon is composed of a dynamic and self-renewing epithelium that turns over every three to five days. It is generally accepted that at the base of the crypt, variable numbers (between 1 and 16) of slowly dividing, stationary, pluripotent stem cells give rise to more rapidly proliferating, transient amplifying cells. These cells differentiate chiefly into post-mitotic columnar colonocytes, mucin-secreting goblet cells, and enteroendocrine cells as they migrate from the crypt base to the surface where they are sloughed into the lumen. Several signaling pathways, notably Wnt, Tgfβ, Bmp, Hedgehog and Notch, play pivotal roles in the control of proliferation and differentiation of the developing and adult colon. Their perturbation, via mutation or epigenetic modification, occurs in human colorectal cancer (CRC) and the instillation of these changes via genetic engineering in mice confers a correspondingly high risk for neoplasia in the mouse models. Moreover, tumor cell de-differentiation correlates with key tumor features, such as tumor progression rates, invasiveness, drug resistance and metastatic potential [3-5].
A variety of scientific and organizational obstacles make it a challenging proposition to undertake large-scale comparisons of human cancer to the wide range of genetically engineered mouse models. To evaluate the potential of this approach to provide integrated views of the molecular basis of cancer risk, tumor development and malignant progression, we have undertaken a comparative analysis of a variety of individually developed mouse colon tumor models (reviewed in [6,7]) to human CRC. The ApcMin/+ (multiple intestinal neoplasia) mouse model harbors a germline mutation in the Apc tumor suppressor gene and exhibits multiple tumors in the small intestine and colon. A major function of APC is to regulate the canonical WNT signaling pathway as part of a β-catenin degradation complex. Loss of APC results in a failure to degrade β-catenin, which instead enters the nucleus to act as a transcriptional co-activator with the lymphoid enhancer factor/T-cell factor (LEF/TCF) family of transcription factors. The localization of β-catenin within the nucleus indicates activated canonical WNT signaling. In addition to germline APC mutations that occur in persons with familial adenomatous polyposis coli (FAP) and ApcMin/+ mice, loss of functional APC and activation of canonical WNT signaling occurs in more than 80% of human sporadic CRCs. Similar to the ApcMin/+ model, tumors in the azoxymethane (AOM) carcinogen model, which occur predominantly in the colon, have signaling alterations marked by activated canonical WNT signaling.
Two other mouse models that carry different genetic alterations leading to colon tumor formation are based on the observation that transforming growth factor (TGF)β type II receptor (TGFBR2) gene mutations are present in up to 30% of sporadic CRCs and in more than 90% of tumors that occur in patients with the DNA mismatch repair deficiency associated with hereditary non-polyposis colon cancer (HNPCC). In the mouse, a deficiency of TGFβ1 combined with an absence of T-cells (Tgfb1-/-; Rag2-/-) results in a high occurrence of colon cancer. These mice develop adenomas by two months of age, and adenocarcinomas, often mucinous, by three to six months of age. Immunohistochemical analyses of these tumors are negative for nuclear β-catenin, suggesting that TGFβ1 does not suppress tumors via a canonical WNT signaling-dependent pathway. The SMAD family proteins are critical downstream transcription regulators activated by TGFβ signaling, in part through the TGFβ type II receptor. Smad3-/- mice also develop intestinal lesions that include colon adenomas and adenocarcinomas by six months of age.
To identify transcriptional programs that are significantly activated or repressed in different colon tumor models, we compared gene expression profiles of 100 human CRCs and 39 colonic tumors from the four models of colon cancer to mouse embryonic and mouse and human adult colon. The results of these analyses demonstrate that tumors from the mouse models extensively adopt embryonic gene expression patterns, irrespective of the initiating mutation. Although two of the mouse tumor subtypes were distinguishable by their relative shifts towards early or later stages of embryonic gene expression (driven principally by localization of β-catenin to the nucleus versus the plasma membrane), Myc was over-expressed in tumors from all four tumor models. Further, by mapping mouse genes to their corresponding human orthologs, we show that human CRCs share in the broad over-expression of genes characteristic of colon embryogenesis and the up-regulation of MYC, consistent with a fundamental relationship between embryogenesis and tumorigenesis. Large scale similarities could also be found at the level of developmental genes that were not activated in either mouse or human tumors. In addition, there were transcriptional modules consistently activated and repressed in human CRCs that were not found in the mouse models. Taken together, this cross-species, cross-models analytical approach, filtered through the lens of embryonic colon development, provides an integrated view of gene expression patterning that implicates the adoption of a broad program encompassing embryonic activation, developmental arrest, and failed differentiation as a fundamental feature of the biology of human CRC.
Our strategy for the characterization of mouse models of human CRC (Figure 1) relies on gene expression differences and relative patterning across a range of mouse CRC models, normal mouse colon developmental stages, and human CRCs. This comparison was facilitated by the use of whole-mouse and normal adult colon reference RNAs for the mouse and human measurements. Mouse tumor samples were profiled on cDNA microarrays using the embryonic day (E)17.5 whole mouse reference RNA identical to that used previously to examine embryonic mouse colon gene expression dynamics from E13.5 to E18.5, during which time the primitive, undifferentiated, pseudo-stratified colonic endoderm becomes a differentiated, single-layered epithelium. This strategy allowed us to construct a gene expression database of mouse colon tumors in which gene expression levels of the tumors could be referenced, ranked, and statistically compared to an average value among the tumors or to embryonic or adult colon gene expression levels on a per-gene basis. First, we compared the four models with each other, then to mouse colon development, and finally to human CRCs using gene ortholog mapping (Figure 1).
To discover gene expression programs underlying differences between etiologically distinct mouse models of CRC, the expression level of each transcript in each tumor sample was set to its ratio relative to its median across the series of tumor models. Using non-parametric statistical analyses, 1,798 cDNA transcripts were identified as differentially expressed among the four mouse models of CRC. Five major gene patterns were identified using K-means clustering (clusters C1-C5; Figure 2a, top). Genes belonging to these clusters were strongly associated with annotated gene function categories (see Table 1 for detailed biological descriptions and associations). For example, cluster C1, composed of transcripts that exhibited lower expression in Smad3-/- tumors and higher expression in AOM, ApcMin/+ and Tgfb1-/-; Rag2-/- tumors, contains 391 transcripts, including Cdk4, Ctnnb1, Myc, Ezh2, Mcm2 and Tcf3. Gene list over-representation analysis using Ingenuity Pathway Analysis applications demonstrated highly significant associations to cell cycle progression, replication, post-transcriptional control and cancer. Similarly, cluster C2, composed of 663 transcripts that exhibited high expression in AOM and ApcMin/+ tumors, but low in Smad3-/- and Tgfb1-/-; Rag2-/- tumors, included transcripts for contact growth inhibition (Metap1, Pcyox1), mitosis (Mif, Pik1), cell cycle progression and checkpoint control (Id2, Ptp4A2, Tp53).
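The between-tumor median referencing described above can be illustrated with a minimal sketch. The function name `median_reference`, the data layout (one list of positive expression values per transcript) and the toy values are assumptions made for illustration only; they are not taken from the study's pipeline.

```python
from statistics import median

def median_reference(expression):
    """Return each transcript's values as ratios to its own median.

    `expression` maps a transcript name to a list of positive expression
    values, one per tumor sample (hypothetical layout).
    """
    referenced = {}
    for transcript, values in expression.items():
        center = median(values)
        if center <= 0:
            # A ratio is undefined for a non-positive median; skip the transcript.
            continue
        referenced[transcript] = [value / center for value in values]
    return referenced

# Toy values (hypothetical, three tumor samples per transcript).
toy = {"Myc": [2.0, 4.0, 8.0], "Klf4": [1.0, 1.0, 3.0]}
ratios = median_reference(toy)
```

After this step a ratio above 1 marks a sample in which the transcript is higher than in the typical tumor of the series, which is the quantity that model-to-model comparisons of this kind operate on.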
From the 1,798 transcripts differentially expressed among the four mouse models of CRC, more than 70% (n = 1,265) distinguished ApcMin/+ and AOM tumors versus Smad3-/- and Tgfb1-/-; Rag2-/- tumors (Figure 2a, bottom). If variance were distributed randomly or equally among all four classes, far less overlap would be expected. The majority of this signature (approximately 75%, n = 904 features) derived from genes over-expressed in ApcMin/+ and AOM tumors relative to the Smad3-/- and Tgfb1-/-; Rag2-/- tumors (cluster C6). Cluster C6 was functionally enriched for genes linked to canonical WNT signaling (Table 1). These included genes previously identified to be part of this pathway (Cd44, Myc, Stra6, Tcf1, Tcf4, Id2, Lef1, Nkd1, Nlk, Twist1, Catnb, Csnk1a1, Csnk1d, Csnk1e, Plat, Wif1) as well as genes that appear to be novel canonical WNT signaling targets (for example, Cryl1, Expi, Ifitm3l, Pacsin2, Sox4, Ets2, Hnrnpg, Hnrpa1, Id3, Kpnb3, Pais, Pcna, Ranbp11, Rbbp4, Yes, Hdac2). Moreover, consistent with the over-expression of Myc in tumors from the ApcMin/+ and AOM models, we detected enrichment of Myc targets, such as Apex, Eef1d, Eif2a, Eif4e, Hsp90, Mif, Mitf, Npm1, and the repression of Nibam.
To establish a molecular basis for over-expression of canonical WNT target genes in ApcMin/+ and AOM tumors, we used immunohistochemistry to characterize the relative cellular distribution of β-catenin. Tumors from ApcMin/+ (Figure 2b, bottom left panel) and AOM (not shown) mice exhibited strong nuclear β-catenin immunoreactivity and reduced membrane staining (see inset), whereas tumors from Smad3-/- (Figure 2b, bottom right panel) and Tgfb1-/-; Rag2-/- (not shown) mice showed strong plasma membrane β-catenin staining with no nuclear accumulation (see inset). Additional tests to confirm the microarray results were also carried out using an independent set of C57BL/6 ApcMin/+ colon tumor samples analyzed by quantitative real-time PCR (qRT-PCR; Figure 3a) and immunohistochemistry (Figure 3b). All expression patterns identified via microarray analysis were consistent with the qRT-PCR results (n = 9 transcripts, chosen for their demonstration of a range of differential expression characteristics). In situ hybridization analyses using C57BL/6 ApcMin/+ colon tumor samples also validated that Wif, Tesc, Spock2 and Casp6 were strongly expressed in dysplastic cells of the tumors (data not shown). At the protein level, immunohistochemical analyses confirmed relatively greater expression of the oncoprotein stathmin 1 in ApcMin/+ mice and tyrosine phosphatase 4a2 in Smad3-/- mice (Figure 3b).
Overall, cluster C6 genes (that is, genes with greater up-regulation in tumors from ApcMin/+ and AOM models than in Smad3-/- and Tgfb1-/-; Rag2-/-) were consistent with increased tumor cell proliferation (for example, Myc, Pcna), cytokinesis (for example, Amot, Cxcl5), chromatin remodeling (for example, Ets2, Hdac2, Set) as well as cell cycle progression and mitosis (for example, Cdk1, Cdk4, Cul1, Plk1). It is important to note that Myc is up-regulated in all four mouse tumor models relative to normal colon tissue (see below). Biological processes showing increased transcription in tumors from the Smad3-/- and Tgfb1-/-; Rag2-/- models (cluster C7) included immune and defense responses (for example, Il18, Irf1, Myd88), endocytosis (for example, Lrp1, Ldlr, Rac1), transport (for example, Abca3, Slc22a5, Slc30a4), and oxidoreductase activity (for example, Gcdh, Prdx6, Xdh) (Table 1). Taken together, these transcriptional observations are both consistent with and extend our understanding of the histological features of the CRC models. For example, while ApcMin/+ and AOM tumors are characterized by cytologic atypia (that is, nuclear crowding, hyperchromasia, increased nucleus-to-cytoplasm ratios and minimal inflammation), tumors from Smad3-/- and Tgfb1-/-; Rag2-/- mice show less overt dysplastic changes but exhibit a significant inflammatory component.
We hypothesized that comparisons of genes over-expressed in both colon tumors and embryonic mouse colon could provide valuable insights into tumor programs important for fundamental aspects of tumor growth and regulation of differentiation. To identify genes and observe regulatory patterns that were shared or differed between colon tumors and embryonic development, we applied a global quantitative referencing strategy to both tumor and embryonic samples by expressing the level of each gene in any sample as a ratio relative to its mean level in adult colon. From this adult baseline reference, genes over-expressed in the four mouse tumor models appeared strikingly similar. Moreover, the vast majority of genes over-expressed in tumors were also over-expressed in embryonic colon (Figure 4a). If the fraction of fetal over-expressed genes from the entire microarray (5,796 of 20,393 features; 28.4%) was maintained at a similar occurrence frequency in the tumor over-expressed fraction (8,804 of 20,393), one would expect an overlap of 2,502 transcripts ((8,804/20,393) × 5,796). Rather, 4,693 of the 5,796 fetal over-expressed transcripts were among the 8,804 tumor over-expressed transcripts (Figure 4b). The probability calculated by Fisher's exact test is p < 1e-300, and thus represents highly significant over-representation of fetal genes among the tumor over-expressed genes. Similarly, genes under-expressed in developing colon were disproportionately under-expressed in tumors relative to normal adult colon (3,282 of 3,541; p < 1e-300). Combining these results, approximately 85% of the developmentally regulated transcripts (7,975 out of 9,337 features) were recapitulated in tumor expression patterns relative to adult colon (Figure 4a,b, green and red markers represent the corresponding 7,975 features).
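The overlap arithmetic in the preceding paragraph can be checked with a short sketch. It computes the overlap expected under independence and the one-sided hypergeometric upper tail, which is what a one-sided Fisher's exact test evaluates. Working in log space avoids floating-point underflow for very small probabilities. The function names are hypothetical, the counts are those quoted in the text, and the assumption that the full 20,393-feature array is the test universe is ours; the study's exact test configuration is not stated here.

```python
from math import exp, lgamma, log

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def expected_overlap(universe, set_a, set_b):
    """Overlap expected if membership in the two sets were independent."""
    return set_a * set_b / universe

def log10_upper_tail(universe, set_a, set_b, overlap):
    """log10 of P(X >= overlap) for X ~ Hypergeometric(universe, set_a, set_b)."""
    low = max(overlap, set_a + set_b - universe, 0)
    high = min(set_a, set_b)
    log_denominator = log_comb(universe, set_b)
    terms = [
        log_comb(set_a, x) + log_comb(universe - set_a, set_b - x) - log_denominator
        for x in range(low, high + 1)
    ]
    if not terms:
        return float("-inf")  # the requested overlap is impossible
    peak = max(terms)
    # Log-sum-exp: factor out the largest term before summing.
    total = sum(exp(term - peak) for term in terms)
    return (peak + log(total)) / log(10)

# Counts quoted in the text: array features, fetal over-expressed,
# tumor over-expressed, and observed overlap.
expected = expected_overlap(20393, 5796, 8804)
log10_p = log10_upper_tail(20393, 5796, 8804, 4693)
```

With these counts the expected overlap rounds to 2,502 transcripts, and the log10 tail probability falls below -300, in line with the reported bound.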
To explore the potential biological significance of genes over-expressed in both embryonic colon development and mouse tumors, we used K-means clustering to generate C8-C10 cluster patterns as shown in a hierarchical tree heatmap (Figure 4c; Table 2). Several sub-patterns were evident, some of which clearly separated ApcMin/+ and AOM from Smad3-/- and Tgfb1-/-; Rag2-/- tumors. One strong cluster, cluster C8, consisted of genes more strongly expressed in ApcMin/+ and AOM than Smad3-/- and Tgfb1-/-; Rag2-/- tumors. This group of genes represented a large fraction of all differences found between nuclear β-catenin-positive (ApcMin/+ and AOM) and negative (Smad3-/- and Tgfb1-/-; Rag2-/-) tumors (approximately 45%; 1,636 out of 3,592 features), as well as differences detected between early (that is, E13.5-E15.5, ED) and late (E16.5-E18.5, LD) embryonic colon developmental stages. Thus, the fraction of developmentally regulated genes that are more characteristic of the earlier stages of normal colon development (E13.5-E15.5) are clearly expressed at higher levels in nuclear β-catenin-positive tumors. This observation is illustrated by 750 transcripts selected solely for stronger expression in ED versus LD (Figure 4d). Note that most of these transcripts overlap with cluster C6 containing 230 features (Figure 2a, lower panel) and illustrate the tendency of the earlier-expressed developmental genes to be more strongly expressed in ApcMin/+ and AOM mice. In addition, transcripts associated with increased differentiation and maturation, observed at later stages of colon development E16.5-E18.5 (for example, Klf4, Crohn's disease-related Slc22a5/Octn2, Slc30a4/Znt4, Sst), were expressed at higher levels by tumors from Smad3-/- and Tgfb1-/-; Rag2-/- mice.
Since mouse tumors recapitulated developmental signatures irrespective of their etiology, we asked whether a similar commitment to embryonic gene programming was shared by sporadic human CRCs. Tumor classification by microarray profiling is usually accomplished by referencing relative gene expression levels to the median value for each gene across a series of tumor samples. Using this 'between-tumors median normalization' approach, as well as a gene filtering strategy that detects significantly regulated genes in at least 10% of the cases, led to the identification of a set of 3,285 probe sets corresponding to transcripts whose expression was highly varied between independent human tumor cases. As shown in Figure 5, there was striking heterogeneity of gene expression among 100 human CRCs. For example, cluster C15 contained a set of genes (principally metallothionein genes) recently identified to be predictive of microsatellite instability [25,26]. This analysis indicates that human CRCs have a greater level of complexity than the mouse colon tumors studied here (compare Figures 2 and 5). There was no correlation between these distinguishing clusters and the stage of the tumor (note the broad overlapping distributions of Dukes stages A-D across these different clusters). However, as shown in Table 3, gene ontology and network analysis showed that the individual gene clusters (clusters C11-C17) that were differentially active in subgroups of the tumors map to genes highly associated with a diverse set of biological functions, including lipid metabolism, digestive tract development and function, immune response and cancer.
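The gene filtering step mentioned above (retaining genes that are regulated in at least 10% of cases) can be sketched as follows. The two-fold threshold, the function name `variable_genes` and the toy probe-set names are assumptions for illustration; the text does not state which significance criterion was applied.

```python
from math import log2

def variable_genes(ratios, min_fold=2.0, min_fraction=0.10):
    """Keep genes whose median-referenced ratio departs from 1 by at least
    `min_fold` (in either direction) in at least `min_fraction` of samples.

    `ratios` maps a gene name to a list of positive ratios, one per tumor.
    Both thresholds are illustrative assumptions.
    """
    cutoff = log2(min_fold)
    kept = []
    for gene, values in ratios.items():
        hits = sum(1 for value in values if abs(log2(value)) >= cutoff)
        if hits >= min_fraction * len(values):
            kept.append(gene)
    return kept

# Toy data (hypothetical): ten tumors per gene.
toy_ratios = {
    "gene_variable": [4.0, 0.25, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "gene_stable": [1.2, 0.9, 1.0, 1.1, 1.0, 0.95, 1.0, 1.05, 1.0, 1.0],
}
selected = variable_genes(toy_ratios)
```

A filter of this kind favors genes that mark subgroups of tumors rather than genes that change uniformly across the whole series, which fits its use here for exposing heterogeneity among cases.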
To evaluate if similar sets of genes are systematically activated or repressed in human CRC, as in the mouse colon tumors, we undertook two procedures to align the data. First, gene expression values for the mouse and human tumors were separately normalized and referenced relative to their respective normal adult colon controls; second, mouse and human gene identifiers were reduced to a single ortholog gene identifier. The latter is a somewhat complex procedure that requires identifying microarray probes from each platform that can be mapped to a single gene ortholog and undertaking a procedure to aggregate redundant probes within a platform (see Materials and methods). This approach allowed the identification of 8,621 gene transcripts on the HG-U133 plus2 and Vanderbilt NIA 20 K cDNA arrays for which relative expression values could be mapped for nearly all mouse and human samples. A clustering-based assessment of expression across the whole mouse-human ortholog gene set identified a large number of transcripts behaving similarly across colon tumors, many irrespective of species, but some species-specific. Notably, the great majority of genes over-expressed in all tumors were also over-expressed during colon development (Figure 6a). To evaluate the statistical significance of this pattern, we used a Venn overlap filtering strategy and Fisher's exact test analysis. Approximately 50% of the 2,212 ortholog genes over-expressed in at least 10% of the human cancers relative to adult colon were also over-expressed in developing colon. If there was not a selection for developmental genes among those over-expressed in tumors, the expected overlap would be (2,718/8,621) × 2,212 = 697 transcripts. By Fisher's exact test, the significance of the increased overlap of 1,080 versus 697 transcripts is p < 1e-300. Similarly, genes under-expressed in mouse colon development and human CRCs also strongly overlapped (Figure 6b; 431 of 737, p < 1e-76).
This result is significantly greater than the 8-19% of genes that were estimated to be over-expressed in human colon tumors and fetal gut morphogenesis based upon a computational extrapolation of SAGE data. Thus, our findings not only confirm but also significantly expand and experimentally validate the previously suggested recapitulation of embryonic signatures by human CRCs.
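The reduction of platform-specific probes to a single ortholog identifier, described above as a two-part procedure (probe-to-ortholog mapping followed by aggregation of redundant probes), can be sketched as below. The probe names, the mapping tables and the choice of the median as the aggregation rule are all hypothetical; the actual procedure is the one given in Materials and methods.

```python
from collections import defaultdict
from statistics import median

def collapse_to_orthologs(probe_values, probe_to_ortholog):
    """Aggregate probe-level values into one value per ortholog identifier.

    Probes without a mapping are dropped; redundant probes for the same
    ortholog are summarized by their median (an illustrative choice).
    """
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        ortholog = probe_to_ortholog.get(probe)
        if ortholog is not None:
            grouped[ortholog].append(value)
    return {ortholog: median(values) for ortholog, values in grouped.items()}

def shared_orthologs(mouse, human):
    """Ortholog identifiers measured on both platforms."""
    return sorted(set(mouse) & set(human))

# Toy log2 ratios versus adult colon for one sample per species (hypothetical).
mouse_probes = {"m_probe_1": 1.0, "m_probe_2": 3.0, "m_probe_3": -1.0, "m_probe_4": 0.2}
mouse_map = {"m_probe_1": "MYC", "m_probe_2": "MYC", "m_probe_3": "KLF4"}
human_probes = {"h_probe_1": 2.5, "h_probe_2": 0.5}
human_map = {"h_probe_1": "MYC", "h_probe_2": "SOX4"}

mouse_genes = collapse_to_orthologs(mouse_probes, mouse_map)
human_genes = collapse_to_orthologs(human_probes, human_map)
common = shared_orthologs(mouse_genes, human_genes)
```

Only identifiers present in both collapsed tables enter the cross-species comparison, which is why the usable gene set is smaller than either array.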
All overlaps between tumor expression and development were pooled to form a set of 2,116 ortholog gene transcripts. This was subjected to hierarchical tree and K-means clustering to define six expression clusters, C18-C23 (Figure 6c; Table 4). These clusters provide an impressive partitioning of groups of genes associated with different biological functions critical for colon development, maturation and oncogenesis. Cluster C22 (860 transcripts of genes strongly expressed both developmentally and across all tumors) is highly enriched with genes associated with cell cycle progression, replication, cancer, tumor morphology and cellular movement. Cluster C18 (258 transcripts down-regulated in mouse and human tumors, as well as in development) is highly enriched in genes associated with digestive tract function, biochemical and lipid metabolism. This cluster is clearly composed of genes associated with the mature GI tract. Thus, as opposed to recapitulating developmental gene activation, the cluster C18 pattern indicates a corresponding arrest of differentiation in both mouse and human tumors. Cluster C23 (142 transcripts over-expressed in all mouse models and human CRC, but with low expression in development) maps to genes highly associated with the disruption of basement membranes, invasion and cell cycle progression, as well as altered transcriptional control. Cluster C21 (313 transcripts in which human tumors somewhat variably express a set of genes that are rarely expressed by the mouse tumors) is remarkable for its composition of genes associated with cell cycle proliferation, tissue disruption and angiogenesis. Thus, while categorically quite similar to cluster C23, the genes in cluster C21 represent a separately regulated module that is enriched for genes associated with invasion. Clusters C21 and C23 reveal sets of genes likely involved in tumor progression.
Cluster C22 (with genes over-expressed in all mouse and human tumors and strongly expressed in embryonic colon) represents a group of genes highly correlated with transformation. The top-ranked transcription factor present in this cluster, with regulation independent of β-catenin localization, is Myc/MYC (Figure 7b). Although Myc was lower in expression in the Smad3-/- tumors compared to tumors from the other three models, it was elevated in all four models relative to normal adult colon. Myc/MYC was over-expressed in all mouse and human tumors as well as in development. This contrasts with Sox4, which is unaltered in expression in the Smad3-/- and Tgfb1-/-; Rag2-/- tumors but is up-regulated in AOM and ApcMin/+ tumors relative to normal adult colon (Figure 7b). Myc/MYC over-expression may be independent of nuclear β-catenin status. Increased Myc/MYC expression may reflect both activation of canonical Wnt signaling, as it is a target of nuclear β-catenin/TCF, and deregulation of TGFβ signaling, as TGFβ1 is known to repress Myc/MYC [29-31]. These observations suggest a fundamental role for Myc/MYC in colonic neoplasia.
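The K-means partitioning used to define expression clusters such as C18-C23 can be illustrated with a minimal, dependency-free sketch. It is a generic Lloyd-style K-means with deterministic farthest-point seeding, not the software or settings used in this study; the function names and the toy profiles (log2 ratios across four samples) are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iterations=50):
    """Partition profiles into k clusters; returns (labels, centers)."""
    # Deterministic seeding: start from the first profile, then repeatedly
    # add the profile farthest from all centers chosen so far.
    centers = [list(profiles[0])]
    while len(centers) < k:
        farthest = max(
            profiles,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(list(farthest))

    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in profiles
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers

# Toy profiles: two genes high in the first two samples, two in the last two.
toy_profiles = [
    [2.0, 1.8, -0.1, 0.1],
    [1.9, 2.1, 0.0, -0.2],
    [-0.1, 0.2, 1.7, 2.0],
    [0.1, -0.1, 2.2, 1.9],
]
toy_labels, toy_centers = kmeans(toy_profiles, k=2)
```

In this toy case the first two profiles fall into one cluster and the last two into the other, mirroring how genes with similar patterns across tumor and developmental samples are grouped into a shared module.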
Numerous mouse models of intestinal neoplasia have been developed, each with unique characteristics. The models constructed to date, however, do not fully represent the complexity of human CRCs principally because most are unigenic in origin and produce primarily adenomas and early stage cancers. Although models like ApcMin/+ show molecular similarities to human CRCs, such as initiation of adenoma formation by inactivation of Apc, little is known about the molecular similarities of tumors from the different mouse models. It is also unknown how such common and perhaps large-scale molecular changes in mouse models relate to the molecular programming of human CRC. To shed light on the underlying molecular changes in tumors from mouse models and human CRC, we assessed the relationship at the molecular level of four widely used, but genetically distinct, mouse models that develop colon tumors. A subsequent analysis of the models in the context of embryonic mouse colon development was also undertaken. Finally, to identify consensus species-independent cancer signatures that may define gene expression changes common to all CRCs, we projected relevant mouse model signatures onto a large set of human primary CRCs of varied histopathology and stage.
Tumors from mouse models of CRC exhibit significant phenotypic diversity, and, therefore, were expected to exhibit differential gene expression patterns. Using a combination of inter-model and normal adult gene expression level referencing, our analysis of tumors from mouse models of CRC has revealed a low complexity between models and strains, and has identified common and unique transcriptional patterns associated with a variety of biological processes and pathway-associated activities. Our results demonstrate an imbalance between proliferation and differentiation, with nuclear β-catenin-positive tumors being more proliferative, less differentiated and with lower immunogenic characteristics than nuclear β-catenin-negative tumors. Mouse tumors characterized by signatures of relative up-regulation of genes associated with cell cycle progression also showed increased canonical WNT signaling activity (ApcMin/+ and AOM). Tumors from mouse models not showing canonical WNT signaling pathway activation (Smad3-/- and Tgfb1-/-; Rag2-/-) were characterized by up-regulation of genes associated with inflammatory and innate immunological responses, and intestinal epithelial cell differentiation. Recent studies have indicated that chronic inflammation caused either by infection with Helicobacter pylori or Helicobacter hepaticus is a prerequisite for intestinal tumor development in Smad3-/- and Tgfb1-/-; Rag2-/- mice, respectively.
The activation of canonical WNT signaling in AOM tumors was identified by applying a between-tumor global median normalization to the gene expression data. However, when tumor sample expression was referenced to that of normal adult intestinal tissue, many more genes were up-regulated, including developmental genes that are not dependent on nuclear β-catenin. That canonical WNT signaling-related genes are altered similarly in both AOM and ApcMin/+ tumors suggests biological similarities between the two models. In addition, the relatively consistent programming within the AOM model also emphasizes its value for examining the more complicated genetics that result in strain-specific sensitivity to environmental agents that induce cancer.
Activation of canonical WNT signaling leads to nuclear translocation of β-catenin and, through its interaction with LEF/TCF, the regulation of genes relevant to embryonic development and proliferation, as well as stem cell self-renewal. Consequently, the activated canonical WNT signaling observed in ApcMin/+ and AOM models suggests that tumors may arise as a consequence of proliferation of the stem cell or 'transient amplifying' compartment. In the colonic crypt, loss of TCF4 or DKK1 over-expression promotes loss of stem cells, suggesting that canonical WNT signaling is required for the maintenance of the intestinal stem cell compartment [34-36]. Conversely, increased nuclear β-catenin/TCF4 activity imposes a crypt progenitor phenotype on tumor cells. In this study, we identified transcriptional activation of the canonical WNT signaling pathway in tumors from ApcMin/+ and AOM mice. This was confirmed by immunohistochemistry (Figure 2b).
In colon tumors and perhaps intestinal stem cells, activation of canonical WNT signaling promotes a hyperproliferative state. Proliferation-related characteristics of nuclear β-catenin-positive tumors include increased expression of CCND1, MYC, PCNA, and Sox4. These genes were also identified as a component of our nuclear-β-catenin-positive signatures. In turn, increased MYC decreases intestinal cell differentiation by binding to and repressing the Cdkn1a (coding for p21CIP1/WAF1) promoter, the gene encoding Wnt-inhibitory factor Wif1, the gene encoding the negative regulator of WNT Naked1, and the gene encoding the Tak1/Nemo-like kinase, Nlk. Wif1 displays a graded expression in colonic tissue, with higher expression in the stem cell compartments and lower expression in the more differentiated cells at the luminal surface, suggesting that Wif1 may contribute to stem cell pool maintenance independent of WNT signaling inhibition.
Canonical WNT signaling not only governs intestinal cell proliferation, but also cell differentiation and cell positioning along the crypt-lumen axis of epithelial differentiation. Increased canonical WNT signaling activity enhances MATH1-mediated amplification of the gut secretory lineages. Canonical WNT signaling also influences cell positioning by regulating the gradient of EPHB2/EPHB3 and EPHB1 ligand expression [42,43]. Together, our data suggest a complex imbalance of crypt homeostasis due to enhanced canonical WNT activity.
Our results indicate that tumors arising in response to abnormal TGFβ1/SMAD signaling [14,44] are similar to one another in their specific gene signatures and broadly distinct from those with activated canonical WNT signaling by their absence of nuclear β-catenin. Unique to the dysregulated TGFβ1/SMAD4 signaling models is the strong signature of an immunologically altered state, with up-regulation of genes determining immune and defense responses, such as Il18, Irf1 and mucin pathway-associated genes. Again, these tumors are usually characterized by a strong inflammatory component when evaluated histopathologically, even in the absence of T- and B-cells such as in the Tgfb1-/-; Rag2-/- background.
As shown in Figure 2a, the microarray patterns of gene expression for AOM and ApcMin/+ tumors are mirror images of those for Tgfb1-/-; Rag2-/- tumors. It is perhaps not surprising that combining these two transcriptional programs results in increased number and invasiveness of colonic tumors, as recently reported for ApcMin/+ mice crossed to Smad3-/- mice. Moreover, combined activation of canonical WNT signaling and inhibition of TGFβ signaling also results in more advanced intestinal tumors in Apcdelta716/+; Smad4+/- mice, and in Apc1638N/wt mice with intestine-specific deletion of the type II TGFβ receptor.
The findings that shared over-expressed signatures are identifiable in all four mouse models of CRC, which are also representative of the majority of embryonic colonic over-expressed signatures, and that these signatures are also present in all human CRCs, suggest that colon tumors may arise independently of canonical WNT signaling status. A likely candidate to impart this oncogenic signaling is Myc, which is an embryonic up-regulated transcript that is also upregulated in all human CRCs and mouse tumor models independently of nuclear β-catenin status.
It has long been suggested that cancer represents a reversion to an embryonic state, partly based upon the observation that several oncofetal antigens are diagnostic for some tumors [48,49]. To assess the embryology-related aspects of tumorigenesis and tumor progression in CRC, we analyzed and compared the transcriptomes of normal mouse colon development and models of CRC. Our data show that developmentally regulated genes represent approximately 56% of mouse tumor signatures, and that the tumor signatures from the four mouse models recapitulate approximately 85% of developmentally regulated genes.
There are at least two regulatory programs that determine the expression of developmental genes by mouse tumors (Figures 2, 4, and 8). The simpler program is evident by the over-expression of the earliest genes of colon development by the nuclear β-catenin-positive models. The more subtle program could be detected only in reference to adult colon and is highly shared by nuclear β-catenin-negative models. This program, though modified by nuclear β-catenin status, is represented by a large scale over-expression of developmentally expressed genes in tumors that are both positive and negative for canonical WNT signaling. Genes found within this signature have a large overlap with those present in the colon at later developmental stages (E16.5-E18.5).
How do genes tightly regulated during mouse colon development become activated in colon tumors? While activated canonical WNT signaling imparts a strong influence, its absence in Tgfb1-/-; Rag2-/- and Smad3-/- tumors, as determined by the absence of nuclear β-catenin, did not prevent the large scale activation of developmental/embryonic gene expression. One mechanism may be through epigenetic alterations. In human CRCs, these types of alterations in gene expression programs suggest a link between cellular homeostasis and tumorigenesis. The recruitment of histone acetyltransferases and histone deacetylases (HDACs) is a key step in the regulation of cell proliferation and differentiation during normal development and carcinogenesis. Induction of Hdac2 expression occurs in 82% of human CRCs as well as in tumors from ApcMin/+ mice. Alternatively, common regulatory controls may operate in parallel growth and differentiation/anti-differentiation pathways such that a single regulator or a small subset of regulators, such as MYC or one or more microRNAs, may be responsible for the control of multiple pathways. Indeed, consistent with our observation of nuclear β-catenin-independent activation of Myc in all mouse models and across the board for human CRC, deletion of Myc has recently been demonstrated to completely abrogate nuclear β-catenin-driven small bowel oncogenesis in mouse models.
As shown in Figure 5, considerable and intriguing heterogeneity of human CRC is observed among genes highly relevant for differential malignant behavior. However, employing between-tumors normalization and referencing strategies prevents the detection of gene expression patterns that are shared between tumors. Using the adult normal colon as a reference, as shown in Figure 6, a large fraction of differential gene expression relative to adult colon could be demonstrated that recapitulated developmental gene expression by virtue of both activating embryonic colon gene expression and failing to express genes associated with normal colon maturation. Within these developmentally regulated gene sets, our analyses revealed little evidence of CRC subsets, including those suggestive of nuclear β-catenin-negative tumors that might approximate the Smad3-/- and Tgfb1-/-; Rag2-/- signature. Our inability to identify distinct subclasses with respect to developmental genes in the human CRCs is perhaps not surprising in that over 80% of microsatellite-unstable (MSI+) CRCs from HNPCC families exhibit nuclear β-catenin. In addition, within the developmental genes, little evidence was apparent for signatures related to MSI+ tumors, often associated with HNPCC, although some of this type of signature was perhaps apparent in the median normalized depiction of the tumors as highlighted in Figure 5.
This report constitutes a comprehensive molecular evaluation and comparison of mouse and human colon tumor gene expression profiles. We have greatly improved our ability to compare tumor gene expression profiles between mouse and human tumors by using a referencing strategy in which gene expression levels in the tumor samples are analyzed in relation to gene expression in corresponding normal colon epithelium. This approach has revealed that gene expression patterns are both shared and distinct between mouse models and human CRCs. Although several recent studies have suggested that tumors recapitulate embryonic gene expression [16,27,54,55], the present study demonstrates the magnitude of this similarity.
Finally, our results suggest that comparisons made between mouse tumor models, developing embryonic tissues, and human CRCs provide a powerful biological framework from which to observe shared and unique genetic programs associated with human cancer. While ortholog-gene based analyses have been used previously to obtain direct comparison of the molecular features of mouse and human hepatocellular carcinomas, our results provide striking support for the hypothesis that cancer represents a subversion of normal embryonic development. By inclusion of detailed mouse embryonic and developmental profile information, our results have revealed critical similarities and differences between the mouse and human tumors that are particularly revealing of oncogenic and tumor suppressor programs, some genes from which should be useful for development of diagnostic biomarkers and identification of therapeutic targets and pathways.
All tumors were isolated as spontaneously occurring lesions in ApcMin/+, Smad3-/-, and Tgfb1-/-; Rag2-/- mice, collected at three to nine months of age depending on the model (for a review, see ). The only exceptions were two ApcMin/+ tumors, UW_3_2778 and UW_6_2748, that were 13 and 14 months and the three Tgfb1-/-; Rag2-/- tumors, all five of which had histological features of locally invasive carcinoma. Three- to four-month-old mice from various AXB recombinant inbred lines were treated with AOM doses chosen for enhancement of inter-strain differences in susceptibility. Mice were given four weekly i.p. injections of 10 mg AOM per kg body weight, and tumors were collected six months after the first injection. Animals were euthanized with CO2, and colons were removed, flushed with 1× phosphate-buffered saline (PBS), and laid out on Whatman 3 MM paper. A summary of the mouse strains, mutant alleles and source laboratories is presented in Table 5. All tumors were obtained from the colon only; the particular segment is indicated in the sample information deposited in the Gene Expression Omnibus (GEO) database (GSE5261). The majority of Tgfb1-/-; Rag2-/- and Smad3-/- tumors occur in the cecum and proximal colon and all samples isolated for characterization were obtained from there. In contrast, tumors isolated from ApcMin/+ and AOM mice occurred predominantly in the mid- and distal colon. A small portion of the tumor was placed in formalin for histology, with the remainder finely dissected into RNAlater (Ambion Inc., Austin, TX, USA) and stored at -20°C. Normal adult colon RNA for reference was obtained from whole colon samples harvested from ten eight-week-old C57BL/6 male mice. The tissue was lysed in Trizol Reagent (Invitrogen Systems Inc., Carlsbad, CA, USA) and homogenized. Total RNA was purified using a Qiagen kit (USA-Qiagen Inc., Valencia, CA, USA).
Sample collection protocol and analyses at the H Lee Moffitt Cancer Center and Research Institute have been described previously. Information collected with the samples for this study includes solid tumor staging criteria for tumor, nodes, and metastases (TNM), Dukes staging/presentation criteria, pathological diagnosis, and differentiation criteria.
All RNA samples were purified using Trizol Reagent from finely dissected tumors and were subjected to quality control screening using the Agilent BioAnalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA).
Mouse tumors were analyzed on Vanderbilt University Microarray Core (VUMC)-printed 20 K mouse cDNA arrays, composed principally of PCR products derived from three sources: the 15 K National Institute of Aging mouse cDNA library; the Research Genetics mouse 5 K set; and an additional set of cDNAs mapped to RefSeq transcripts. Labeling, hybridization, scanning, and quantitative evaluation of these two-color channel arrays were performed according to VUMC protocols using a whole mouse Universal Reference standard (E17.5 whole fetal mouse RNA). Arrays were analyzed by GenePix version 3.0 (MDS Inc., Sunnyvale, CA, USA), flagged and filtered for unreliable measurements, with dye channel ratios corrected using Lowess and dye-specific correction normalization as previously described.
Human RNA samples were labeled for hybridization to Affymetrix HG-U133plus2 microarrays using the Affymetrix-recommended standard labeling protocol (Small-scale labeling protocol version 2.0 with 0.5 μg of total RNA; Affymetrix Technical Bulletin). Microarrays were scanned with MicroarraySuite version 5.0 to generate 'CEL' files that were processed using the RMA algorithm as implemented by Bioconductor.
The four different mouse models of CRC were compared for model-specific differences, then compared to mouse colon development stages, and then to human CRC samples (Figure 1). The mouse tumor sample array data are composed of Lowess-normalized Cy3:Cy5 labeling ratios of each individual tumor sample versus a universal E17.5 whole fetal mouse reference RNA (described using MIAME guidelines in the NCBI GEO database under series accession number GSE5261). The first approach to referencing was to compare normalized ratios across the tumor series. To do this, for each gene, the Lowess-corrected ratio for each probe element (sample versus E17.5 whole fetal mouse reference) was divided by the median ratio for that probe across the entire tumor sample series. This is termed the median-per-tumor expression ratio and was useful for identifying, clustering and visualizing differences that occur between the different tumor samples. Since we previously collected mouse expression data for normal E13.5-E18.5 colon samples from inbred C57BL/6J and outbred CD-1 mice using the identical E17.5 whole fetal mouse reference, this allowed us to combine the data directly. Differential expression profiles in the tumors were combined with relative developmental gene expression levels by direct comparisons of ratios determined within each experimental series. Initial comparisons were made between median normalized tumor data and gene expression levels observed in the E13.5-E18.5 and adult (eight week post-natal) colon samples, which were referenced to either E13.5 samples or to the adult colon. The latter approach subsequently allowed for the broadest comparison of mouse and human data using gene ortholog mapping. Correlated phenomena could be observed from any of the different referencing strategies.
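The per-probe median referencing described above can be illustrated with a minimal sketch. It assumes the Lowess-corrected ratios are already available as plain numbers; the function name, probe labels and values are hypothetical and are not taken from the study data or from GeneSpring.

```python
from statistics import median

def median_normalize(ratios_by_probe):
    """Divide each probe's ratios by that probe's median across the tumor series.

    ratios_by_probe maps a probe ID to a list of Lowess-corrected
    sample-versus-reference ratios, one value per tumor sample.
    """
    normalized = {}
    for probe, ratios in ratios_by_probe.items():
        probe_median = median(ratios)  # median across the whole sample series
        normalized[probe] = [r / probe_median for r in ratios]
    return normalized

# Hypothetical ratios for two probes measured in three tumors.
example = {
    "probe_A": [0.5, 1.0, 2.0],
    "probe_B": [4.0, 2.0, 8.0],
}
result = median_normalize(example)
```

After this step a value of 1.0 marks the typical tumor for that probe, so the transformed ratios emphasize differences between tumors rather than differences from the E17.5 reference.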
Pairs of human and mouse ortholog genes (12,693) were curated using the Mouse Genome Informatics (MGI; The Jackson Laboratory) and National Center for Biotechnology Information (NCBI) Homologene databases. Individual microarray elements or features were mapped to these. The concatenated human and mouse RefSeq IDs were used as the composite ID for the orthologous gene pair in the ortholog genome definition. NIA/Research Genetics mouse cDNAs were mapped to human orthologs using a variety of resources, usually via the Stanford Online Universal Reference resource. Gene transcript assignments were made unique by choosing the longest corresponding transcript. To map the Affymetrix human and mouse array data into the ortholog genome, we used a sequence matching approach. First, we obtained human and mouse transcript sequences from RefSeq and probe sequences from the manufacturer's website. Next, we computed all perfect probe-transcript pairs. We excluded probes that matched multiple gene symbols but accepted probes that matched multiple transcripts. Probe sets were assigned to represent a given transcript if at least 50% of the perfect match probes of the probe set matched to that transcript. The newly assigned transcript identifiers were then used to map probe sets to ortholog genes. Since some ortholog identifiers are represented by multiple probe sets or cDNAs on the Affymetrix and cDNA microarrays, we employed an ad hoc strategy to use the average of those probe sets or cDNAs that exhibited consistent regulation across a sample series. In such cases, the signals of the regulated probe sets that were interpreted as being in agreement were averaged and assigned to the corresponding ortholog. We excluded probe sets or cDNAs known to correspond to non-transcript genomic sequence as tested using BLAT at the UCSC Goldenpath website.
Mouse-human RefSeq gene ortholog assignments can be found at GenomeTrafac [67,68]. All ortholog assignments and cross-species mapping annotations were incorporated into annotations associated with the Affymetrix HG-U133 plus2.0 genome. Gene expression ratios obtained for the mouse samples were then represented as expression values within the human platform for all of the probe sets that mapped to the corresponding mouse gene ortholog. Data for the primary human sample series, as well as the combined mouse-human data sets, are available in the Cincinnati Children's Hospital Medical Center microarray data server in the HG-U133 genome under the KaiserEtAl_2006 folders ('guest' login; all cross-platform ortholog gene identifiers are contained as annotation fields within the HG-U133 genome table).
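The probe-set assignment rule described above can be sketched as follows. This is an illustration only, not the pipeline used in the study: the function name, data layout and identifiers are hypothetical, and the sketch assumes that the 50% threshold is computed against all perfect match probes in the probe set, including probes that were excluded or had no match.

```python
from collections import Counter

def assign_probe_set(probe_matches, transcript_to_gene, min_fraction=0.5):
    """Return the transcripts that a probe set is taken to represent.

    probe_matches: probe ID -> set of transcript IDs matched perfectly.
    transcript_to_gene: transcript ID -> gene symbol.
    """
    total_probes = len(probe_matches)  # assumed denominator: all probes in the set
    counts = Counter()
    for transcripts in probe_matches.values():
        genes = {transcript_to_gene[t] for t in transcripts}
        if len(genes) != 1:
            continue  # skip unmatched probes and probes hitting several gene symbols
        counts.update(transcripts)  # several transcripts of one gene are accepted
    return sorted(t for t, n in counts.items() if n / total_probes >= min_fraction)

# Hypothetical probe set with four perfect match probes.
probe_matches = {
    "p1": {"NM_A1"},
    "p2": {"NM_A1", "NM_A2"},  # two transcripts of the same gene: kept
    "p3": {"NM_A1", "NM_B1"},  # two gene symbols: excluded
    "p4": set(),               # no perfect match
}
transcript_to_gene = {"NM_A1": "GeneA", "NM_A2": "GeneA", "NM_B1": "GeneB"}
assigned = assign_probe_set(probe_matches, transcript_to_gene)
```

In this example NM_A1 is supported by two of the four probes and meets the threshold, whereas NM_A2 is supported by only one probe and is not assigned.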
Most normalization, expression-level referencing, statistical comparisons, and data visualization were performed using GeneSpring v7.0 (Silicon Genetics-Agilent, part of Agilent Technologies). Fisher's exact test was performed online at the MATFORSK Fisher's Exact Test server. To identify differentially expressed features between two classes or among three or more classes, we applied GeneSpring's Wilcoxon-Mann-Whitney test or Kruskal-Wallis test, respectively. For three or more classes, the initial non-parametric test was followed by the Student-Newman-Keuls post-hoc test. Results from the primary analyses were corrected for multiple testing effects by applying Benjamini and Hochberg false discovery rate (FDR) correction. In general, due to the referencing strategies, good platform technical performances, and moderately low within-group biological variation of gene expression, stringent cutoffs could be used; that is, the FDR level of significance was set between FDR < 5 × 10^-5 and FDR < 5 × 10^-4. K-means clustering was performed using the GeneSpring K-means tool and the Pearson correlation similarity measure.
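For readers unfamiliar with the Benjamini and Hochberg correction, the following minimal sketch shows the standard step-up adjustment of a list of p values. It is a generic illustration and not GeneSpring's implementation; the function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for four features.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# A feature would pass a cutoff such as FDR < 5e-5 only if its adjusted value is below it.
passing = [i for i, q in enumerate(adj) if q < 5e-5]
```

With cutoffs as stringent as those quoted above, only features with very small raw p values remain after adjustment, which is why none of the four hypothetical features pass here.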
Gene expression clusters were analyzed for the occurrence of multiple genes involved in related gene function categories by comparing each list of coordinately regulated clustered genes to categories within Gene Ontology, pathways, or literature-based gene associations using GATACA, Ontoexpress, and Ingenuity Pathway Analysis, version 3 (IPA, Ingenuity Systems, Redwood City, CA, USA). To do this, each cluster indicated in Figures 2, 4, 5, 6 and 8 was converted to a list of gene identifiers, uploaded to the application, and examined for over-representation of multiple genes from one or more molecular networks, or functional or disease associations as developed from literature mining. Networks of these focus genes were algorithmically generated based on the relationships of individual genes as derived from literature review and used to identify the biological functions and/or associated pathological processes most significant for each gene cluster. Fisher's exact test was used to calculate a p value estimating the probability that a particular functional classification or category of genes is associated with a particular pattern or cluster of gene expression more than would be expected by chance. For each cluster, only the top significant functional classes and canonical pathways are shown. Figure 7a shows a diagram of the canonical WNT signaling pathway and an associated-gene network that was a top-ranked association of the clusters that exhibited significant over-expression in AOM and ApcMin/+ versus Smad3-/- and Tgfb1-/-; Rag2-/- mouse models. Genes or gene products are represented as nodes, and biological relationships between nodes are represented as edges (lines). All edges are supported by at least one literature reference from a manuscript, or from canonical information stored in the Ingenuity Pathways Knowledge Base.
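The over-representation p value can be illustrated with a short sketch of a one-sided Fisher's exact test, computed as a hypergeometric tail probability. Whether the tools named above use a one-sided or two-sided test is not stated here, so the one-sided form is an assumption; the function name and the counts are hypothetical.

```python
from math import comb

def enrichment_p_value(cluster_hits, cluster_size, category_size, universe_size):
    """Probability of at least cluster_hits category genes in a random cluster.

    universe_size: number of genes measured on the array.
    category_size: genes in the universe annotated to the category.
    cluster_size: genes in the expression cluster.
    cluster_hits: cluster genes that belong to the category.
    """
    upper = min(cluster_size, category_size)
    favorable = sum(
        comb(category_size, k) * comb(universe_size - category_size, cluster_size - k)
        for k in range(cluster_hits, upper + 1)
    )
    return favorable / comb(universe_size, cluster_size)

# Hypothetical counts: 10 genes measured, 4 in the category,
# and a cluster of 3 genes that all fall in the category.
p = enrichment_p_value(cluster_hits=3, cluster_size=3, category_size=4, universe_size=10)
```

In this toy case p equals 4/120, or about 0.033; with array-scale gene lists the same calculation yields the much smaller p values typically reported for enriched categories.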
To confirm the validity of data normalization and referencing procedures as well as the cDNA gene assignments of the printed arrays used in the microarray analyses, we used qRT-PCR to measure relative levels of nine genes found by microarray data analysis to be differentially expressed (FDR < 5 × 10^-5) in tumors from ApcMin/+ and Smad3-/- mice. Total RNAs from C57BL/6 ApcMin/+ and 129 Smad3-/- tumor samples (20 μg) were reverse-transcribed to cDNA using the High Capacity cDNA Archive Kit (oligo-dT primed; Applied Biosystems, Foster City, CA, USA). qRT-PCR reactions (20 μl) were set up in 96-well MicroAmp Reaction Plates (Applied Biosystems) using 10 ng of cDNA template in Taqman Universal PCR Master Mix and 6-FAM-labeled Assays-on-Demand primer-probe sets (Applied Biosystems). Reactions were run on an MX3000P (Stratagene, a division of Agilent Technologies) with integrated analysis software. Threshold cycle numbers (Ct) were determined for each target gene using an algorithm that assigns a fluorescence baseline based on measurements prior to exponential amplification. Relative gene expression levels were calculated using the ΔΔCt method, with the Gusb gene as a control. Fold-change was determined relative to expression in normal adult colon from two C57BL/6J mice.
Immunohistochemical procedures were performed as described. ApcMin/+ and Smad3-/- colon tumors were rapidly dissected, fixed in 4% paraformaldehyde, and embedded in paraffin before cutting 10 μm thick sections. Antigen retrieval was performed by boiling for 20 minutes in citrate buffer, pH 6.0. Sections were treated with 0.3% hydrogen peroxide in PBS for 30 minutes, washed in PBS, blocked in PBS plus 3% goat serum and 0.1% Triton X-100, and then incubated with primary antibodies and HRP-conjugated goat anti-rabbit secondary antibody (Sigma, St Louis, MO, USA). Antigen-antibody complexes were detected with a DAB peroxidase substrate kit (Vector Laboratories, Burlingame, CA, USA) according to the manufacturer's protocol.
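The ΔΔCt calculation can be summarized in a brief sketch. It assumes the common form of the method, in which amplification efficiency is taken to be close to 100% so that each cycle corresponds to a two-fold change; the function name and the Ct values are hypothetical and are not measurements from this study.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the delta-delta-Ct method.

    The reference gene (for example Gusb) normalizes for input amount;
    the calibrator is the comparison tissue (for example normal adult colon).
    """
    delta_sample = ct_target_sample - ct_ref_sample              # delta Ct in tumor
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator  # delta Ct in normal
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)  # assumes a doubling of product per cycle

# Hypothetical Ct values: the target crosses threshold three cycles earlier
# in the tumor than in normal colon, with identical reference gene Ct values.
fold = fold_change_ddct(22.0, 20.0, 25.0, 20.0)
```

Here the tumor shows an eight-fold increase relative to the calibrator, since a three-cycle difference corresponds to 2 raised to the third power.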
This study was supported by grants from GI SPOREs P50 CA95103 (RJC) and P50 CA106991 (DWT), R01 CA079869 (DWT), R01 CA046413 (RJC), R01 CA063507 (JG), R37 CA63677 (WFD), and the NCI-sponsored Mouse Models of Human Cancer Consortium (MMHCC) with U01 CA84239 (RJC), U01 CA84227 (WFD), U01 CA98013 (JG), U01 CA105417 (DWT), R24 DK 064403 (BJA), and T32 HL07382-28 (WJ). The authors thank Susan Kasper and Ritwick Ghosh for performing stathmin immunostaining. The authors also acknowledge The State of Ohio Biotechnology Research and Technology Transfer Partnership award for sponsorship of the Gastrointestinal Cancer Consortium (TD, JG) and the Center for Computational Medicine (BJA). Additional sponsorship for important collaborative interactions and strategy development among the investigators from each of the participating research groups was provided under the auspices of 'Colon Cancer 2004', a conference sponsored by the AACR and the MMHCC and hosted by The Jackson Laboratory.