The analyses performed here demonstrate that the nine gene expression signatures have similar classification performance in terms of sensitivity, specificity, negative predictive value, positive predictive value and predictive accuracy. All gene expression signatures added independent information to a multivariate model including standard pathological and clinical criteria. Although the gene expression classifiers were mostly defined to determine the risk of distant metastasis events and not the risk of death from breast cancer, we found similar results for each gene classifier when using either DMFS or BCSS as the endpoint.
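For reference, the five performance measures named above can be computed from binary calls as in the following minimal sketch. It assumes that each signature call has been reduced to poor (1) or good (0) and that the endpoint has been dichotomized into event (1) or no event (0) at a fixed follow-up time, which ignores censoring; the function and argument names are illustrative and are not taken from the analyses reported here.

```python
def classification_metrics(predicted_poor, observed_event):
    """Sensitivity, specificity, PPV, NPV and accuracy for binary calls.

    predicted_poor: iterable of 0/1, 1 = signature assigns poor outcome
    observed_event: iterable of 0/1, 1 = event observed (assumed dichotomized)
    """
    pairs = list(zip(predicted_poor, observed_event))
    tp = sum(1 for p, o in pairs if p and o)          # poor call, event
    tn = sum(1 for p, o in pairs if not p and not o)  # good call, no event
    fp = sum(1 for p, o in pairs if p and not o)      # poor call, no event
    fn = sum(1 for p, o in pairs if not p and o)      # good call, event

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy example with made-up calls for eight samples.
example = classification_metrics(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 0],
)
```

A survival-aware evaluation would need to account for censored follow-up, which this sketch deliberately leaves out.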
The Gene expression Grade Index [25] and the molecular prognostic index signatures (T17 and T52) [20] were developed for ER-positive breast cancer, while the 76-gene expression classifier signature [8] was defined for LN-negative tumors. Despite these prerequisites, we applied the nine gene expression classifiers to the same dataset without regard to the heterogeneity of the samples, as our first goal was to compare the classifiers on a common dataset. More importantly, the generalization of subgroup-specific classifiers (for example, LN0 classifiers) across a complete cohort of breast cancer samples (both LN0 and LN1) hints at the existence of common biological processes determining the outcome. We were interested in revealing these processes. Furthermore, when evaluating the signatures in specific subgroups of patients, we showed similar behavior for each of these nine gene signatures. No signature showed strong association with survival when applied to LN-positive, ER-negative or high-grade subgroups. These results are potentially explained by the fact that these factors identify a set of intrinsically poor outcome cases; that is, the subgroups they define contain no good outcome cases. This emphasizes that gene expression classifiers should, in our opinion, not be regarded as a tool to replace standard pathological and clinical criteria, but should instead be integrated with clinical parameters. Gene expression classifiers can be employed to improve stratification in subgroups of breast cancer patients with good prognosis, where the groups are defined based on standard pathological and clinical criteria.
Fan and colleagues [24] showed similar performance and significant concordance between the 70-gene signature [6,7], the Core Serum Response signature [14], the Genomic Health signature [10] and the Intrinsic/UNC gene set [15] when applied to the dataset of van de Vijver and colleagues [7]. In contrast, we show here that agreement between gene expression signatures is low, with >50% of the samples having at least one discordant class assignment. We showed that the predictive accuracy decreases dramatically with the number of poor prognostic assignments a sample receives. The best classification performance was obtained for the group of patients with only good outcome assignments. These results reveal the dilemma faced by a patient diagnosed with breast cancer who consults a collection of signatures to predict disease outcome: the result obtained is uncertain in almost 50% of the cases. As our results were less optimistic than those of Fan and colleagues [24], we repeated our analyses as described above but this time used the dataset of van de Vijver and colleagues [7] and the following signatures: the 70-gene signature [6,7] (employing Fan and colleagues' labeling [24]), the Core Serum Response signature [14], the Genomic Health signature [10], the Intrinsic/UNC gene set [15], and the Gene expression Grade Index [25]. The results recapitulated our earlier results. In particular, only 42% of the samples received a concordant class assignment, while the ER status, HER2 status, pathological grading and tumor size were all correlated with the number of times a sample was classified as poor outcome by the signatures. As was demonstrated earlier, the predictive accuracy decreased with the number of poor outcome assignments. Larger datasets (such as those acquired in the TAILORx and MINDACT trials [11,12]) are required to shed more light on the cases where the signatures give discordant class assignments.
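The concordance figures discussed above rest on a simple tabulation, which can be sketched as follows. The sketch assumes each sample carries one binary call per signature (1 = poor outcome) and a dichotomized outcome; it reports the fraction of samples on which all signatures agree and, as one simple readout, the observed event rate for each number of poor assignments. This readout is an illustrative stand-in and is not necessarily the predictive accuracy measure used in our analyses; all names and sample identifiers are hypothetical.

```python
from collections import defaultdict


def concordance_summary(calls, observed_event):
    """Summarize agreement between signatures.

    calls: dict mapping sample id -> list of 0/1 calls, one per signature
           (1 = poor outcome); assumed to have equal length for all samples
    observed_event: dict mapping sample id -> 0/1 (1 = event observed)
    """
    n_concordant = 0
    outcomes_by_poor_count = defaultdict(list)
    for sample, sample_calls in calls.items():
        n_poor = sum(sample_calls)
        # Concordant means every signature gave the same call.
        if n_poor in (0, len(sample_calls)):
            n_concordant += 1
        outcomes_by_poor_count[n_poor].append(observed_event[sample])

    event_rate = {
        k: sum(v) / len(v) for k, v in sorted(outcomes_by_poor_count.items())
    }
    return {
        "concordant_fraction": n_concordant / len(calls),
        "event_rate_by_poor_count": event_rate,
    }


# Toy example: four made-up samples scored by three signatures.
toy_calls = {"s1": [0, 0, 0], "s2": [1, 1, 1], "s3": [1, 0, 0], "s4": [1, 1, 0]}
toy_events = {"s1": 0, "s2": 1, "s3": 0, "s4": 1}
summary = concordance_summary(toy_calls, toy_events)
```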
To gain more insight into the small degree of overlap between the genes comprising the different classifiers, we generated an enlarged signature for each signature. The intersection of the enlarged signatures identified a core of 72 genes significantly enriched in DNA replication, cell cycle and mitosis ontology annotations, which we consider the common background of these gene signatures. Proliferation genes are a major component of many prognostic signatures in breast carcinoma and other tumor types [44]. Among the 72 genes we found AURKA, BIRC5, CCNB1, MKI67 and MYBL2, which define the complete set of proliferation genes from the Genomic Health signature [10]. The proliferation modules also contain genes frequently described as markers of proliferation in different types of cancer [45]: PLK1, BUB1, CCNA2, CCNB1, CCNB2, CCNE2, FOXM1, and TOP2A. These genes are derived from the functional intersection of the enlarged gene expression signatures, indicating that proliferation is a major driver of the prognosis gene signatures.
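The two steps described above, intersecting the enlarged signatures and testing the resulting core for annotation enrichment, can be illustrated with a minimal sketch. The enrichment test is not specified in this section; a one-sided hypergeometric test is assumed here because it is a common choice for gene ontology over-representation, and the function names, gene labels and counts are hypothetical.

```python
from math import comb


def core_genes(enlarged_signatures):
    """Genes shared by every enlarged signature (plain set intersection)."""
    sets = [set(signature) for signature in enlarged_signatures]
    return set.intersection(*sets)


def hypergeom_enrichment_p(n_universe, n_annotated, n_core, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe:  number of genes in the background
    n_annotated: background genes carrying the annotation
    n_core:      size of the core gene set
    n_overlap:   core genes carrying the annotation
    """
    upper = min(n_annotated, n_core)
    total = comb(n_universe, n_core)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_core - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example with placeholder gene labels, not real signature contents.
shared = core_genes([["A", "B", "C"], ["B", "C", "D"], ["C", "B", "E"]])
p_value = hypergeom_enrichment_p(n_universe=10, n_annotated=4, n_core=3, n_overlap=2)
```

In practice the p-values for many ontology terms would also need correction for multiple testing, which is omitted here.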
The enrichment analysis of the enlarged signatures revealed 11 gene ontology modules. Identification of distinct biological processes correlated with survival or other clinicopathological features is a major step towards improving our understanding of tumor development and providing the information needed to develop new targeted therapies. Yu and colleagues generated 500 gene signatures of ER-positive and ER-negative tumors [46], and found the following pathways to be overrepresented in the signatures: apoptosis, proliferation, focal adhesion, RNA splicing and immunity. They emphasized that similar pathways are common to different gene signatures, whereas the individual genes defining these pathways can still have varying degrees of association with outcome.
We showed that the combination of the Immune and RNA splicing modules defines a classifier that is highly accurate in predicting both DMFS and BCSS. In addition, the classifier showed an improvement in predictive accuracy when combined with commonly used clinical staging systems. This indicates that not only proliferation but also other functional processes have prognostic power. Teschendorff and colleagues recently showed that the overexpression of a seven-gene immunity module is associated with good outcome in 186 ER-negative breast cancers [21]. No significant correlation between lymphocyte infiltration and this immunity module was found. Two of these seven genes (XCL2 and HLA-F) are also in our classifier. Recent clinical and experimental studies have revealed that not only cancer cell intrinsic processes, but also cancer cell extrinsic processes (including angiogenesis, remodeling of the extracellular matrix, and inflammation) are critical in determining malignant outcome. The role of the immune system during cancer progression has recently gained much attention [47]. The reciprocal interaction between the immune system and cancer can be regarded as a double-edged sword: whereas certain interactions inhibit or prevent cancer growth, other interactions actually contribute to tumor progression. For example, in situ analysis of tumor-infiltrating lymphocytes in human colorectal cancer samples revealed that the influx of T lymphocytes is associated with improved survival, and the immunological data were found to be a better predictor of patient survival than the histopathological methods currently used to stage colorectal cancer [48].
The RNA splicing process is a key molecular event in the generation of protein biodiversity. Alteration of the normal process results in the production of altered mRNA or in an off-balance production of tissue-specific mRNA isoforms [49-51]. The main consequences of this abnormal RNA splicing process are a reduction of the normal protein level or the production of abnormal proteins. Aberrant mRNA splicing variants are found in many cancers and can interfere with major biological events such as apoptosis, cell-cycle control, adhesion, differentiation or angiogenesis. Mutations in splicing cis-acting sequences have been associated with the BRCA1 gene in breast cancer [52] and the KIT oncogene in gastrointestinal stromal tumor [53]. The RNA splicing module we identified contains several genes that are individually strongly associated with survival. More specifically, SFRS10 is significantly overexpressed in breast cancer and might be responsible for splicing of CD44 isoforms associated with tumor progression and metastasis [54]. SRPK1 is upregulated in breast cancer and its expression level is proportional to the tumor grade. Inhibition of SRPK1 results in reduced phosphorylation of MAPK3, MAPK1 and AKT [55]. Targeted SRPK1 treatment seems to be a promising way to increase apoptosis, to decrease proliferation and to enhance the sensitivity to chemotherapeutic drugs [56]. LSM1 is located at the 8p11-12 locus and is amplified in almost 20% of breast cancer cases [57]. Streicher and colleagues showed that overexpression of hLsm1 transforms mammary epithelial cells, and inhibition of its expression in breast cancer cells reduces anchorage-independent proliferation [58]. Yang and colleagues likewise demonstrated the ability of LSM1 to transform human mammary epithelial cells in vitro [57].