It has been suggested that TNBC represent a group of several molecularly [
3] and clinically [
41,
42] distinct disease subtypes. We used gene expression data from a cohort of 394 TNBC to identify molecular subsets within this tumor type. TNBC was defined on the basis of gene expression data, which is not the standard definition used in the clinic. This may be a caveat, but it also means that samples erroneously characterized as receptor-negative by immunohistochemistry do not introduce noise into our analysis. We identified 16 metagenes, associated with several distinct biological processes, that showed variable expression across TNBC (Table ). Some of the metagenes seem to point to the distinct origins of these cancers [
43,
44]. These include the basal-like [
4], the apocrine [
18,
19], and the claudin-low [
28,
29] subtypes of TNBC. Other metagenes were related to non-neoplastic cellular constituents of the tumor microenvironment including stroma [
26,
27], blood cells [
30] and adipocytes [
4], as well as signatures for angiogenesis [
23,
34] and inflammation [
31-
33]. Five metagenes appear to reflect the variable presence of immune cells and may contribute to the clinical behavior of the cancer [
4,
20-
25,
27,
45] (Table ).
Kreike
et al. [
9] detected similar metagenes among 97 TNBC analysed with a different microarray platform. That study suggested that the TNBC clinical phenotype can be equated to the BLBC molecular class determined by the centroid method [
46], since 95% of the TNBCs were assigned to the
basal-like molecular class [
47]. However, the centroid method is highly susceptible to the composition of the dataset that is used to define the reference centroids [
48] and variants of the method can lead to different results [
49]. Bertucci
et al. [
50] identified only 71% of their 172 TNBC cases as
basal-like when using a slightly different version of the centroid method for molecular classification. When we applied different versions of the centroid method to 1,364 breast cancers, 65% to 90% of the TNBC samples (
n = 172) were assigned to the basal-like class depending on the method used (Additional file
2, Supplementary Table S6). In this paper we took a different approach: we first identified metagenes and then used them to define molecular subsets among TNBC. One of our metagenes corresponded closely to the gene signatures that are used to define BLBC in the centroid-based methods. Our results indicate that BLBC, when defined by
basal-like metagene expression, represents around 73% of TNBC (Table and Additional file
2, Supplementary Table S2).
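To illustrate why centroid-based assignment depends on the reference dataset, the following is a minimal sketch of a generic nearest-centroid classifier. It is not the implementation used in any of the cited studies: the function names, the use of Pearson correlation as the similarity measure, and all expression values are assumptions made for illustration only.

```python
import numpy as np

def build_centroids(expr, labels):
    """Mean expression profile per class; expr is samples x genes."""
    return {str(c): expr[labels == c].mean(axis=0) for c in np.unique(labels)}

def assign_nearest_centroid(sample, centroids):
    """Return the class whose centroid correlates best with the sample."""
    best_label, best_r = None, -np.inf
    for label, centroid in centroids.items():
        r = np.corrcoef(sample, centroid)[0, 1]  # Pearson correlation
        if r > best_r:
            best_label, best_r = label, r
    return best_label, best_r

# Toy reference set: 4 samples x 4 genes (made-up values, not study data).
reference = np.array([
    [2.0, 1.5, -1.0, -2.0],
    [1.8, 1.2, -0.8, -1.6],
    [-1.5, -1.0, 2.0, 1.5],
    [-1.2, -0.8, 1.6, 1.4],
])
ref_labels = np.array(["basal-like", "basal-like", "luminal", "luminal"])

# The centroids are averages over the reference samples, so a different
# reference composition yields different centroids.
centroids = build_centroids(reference, ref_labels)

sample = np.array([1.5, 0.9, -0.4, -1.2])
label, r = assign_nearest_centroid(sample, centroids)
```

Because each centroid is an average over whichever reference samples carry that label, changing the mix of tumors in the reference set moves the centroids and can change the class assigned to borderline samples.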
The proportion of BLBC among TNBC in our study is similar to results from an immunohistochemical study by Rakha
et al. [
7] that defined BLBC by the expression of CK5/6, CK14, CK17 or EGFR. These authors observed a worse survival of the 165 patients with BLBC compared to the remaining 67 TNBC cases, which expressed none of these markers. However, we did not detect differences in the prognosis of BLBC and non-BLBC type triple negative cancers (Additional file
1, Supplementary Figure S7). In the study by Rakha
et al. the prognostic effect was mainly confined to 103 untreated patients. Still, even when we analyzed untreated patients (
n = 186) separately, we detected no prognostic value of the BLBC phenotype (not shown). Our results are also contrary to the immunohistochemical study of Cheang
et al. [
51], which used CK5/6 and EGFR antibodies for TNBC stratification. They also observed a worse prognosis of 336 BLBC TNBC compared to 303 non-BLBC TNBC. However, our study is not directly comparable to these prior reports because our definition of BLBC is fundamentally different from the IHC-based methods. Our results are in line with several other genomic profiling studies that reported limited prognostic value for the BLBC molecular class among clinically triple negative cancers [
18,
19,
50].
We observed strong prognostic value for several of the other metagenes (Additional file
2, Supplementary Table S4). An improved prognosis was observed for patients with tumors displaying high expression of immune system-related metagenes, which supports recent reports [
20,
23-
25,
27,
39,
40,
52,
53]. An association with decreased survival was observed for high expression of the inflammation (IL-8) metagene, the angiogenesis/hypoxia (VEGF) metagene [
34], and histone-related metagenes (Additional file
2, Supplementary Table S4 and Figure ). A simple combination of high B-cell and low IL-8 metagene expression identifies a subset of TNBC patients (32% of all) with a favorable prognosis and a five-year event-free survival of 84%. In multivariate analysis, only this metagene ratio and lymph node status were significant predictors of outcome in our cohort of TNBC patients (Table and Figure ). Other known prognostic factors in breast cancer, such as age, tumor size and histological grade, were not significant in our cohorts, even in univariate analysis. Most TNBC are high grade and, therefore, grade is not as important for prognosis in this subtype as it is in ER-positive disease. TNBC are also often associated with younger age, but the impact of age and tumor size on prognosis within this subtype is not yet fully clear. Still, it cannot be excluded that a bias in our cohort accounts for the lack of significance of these factors. Our analyses of neoadjuvant-treated TNBC samples suggest modest predictive value of the B-cell/IL-8 metagene ratio for currently used chemotherapies [
22,
54] (Additional file
1, Supplementary Figure S10). We also observed a purely prognostic value in untreated patients of the finding cohort, in line with other reports on the B-cell metagene [
24,
27]. Treatment information on the samples from the validation cohort was not available.
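The combination of high B-cell and low IL-8 metagene expression can be sketched as follows. This is an illustrative outline only. The metagene score is taken here as the mean expression of the member genes, and both cutoffs default to the cohort median; these choices, the function names, the gene column indices and the toy values are all assumptions and are not taken from the study.

```python
import numpy as np

def metagene_score(expr, gene_idx):
    """Average expression of a metagene's member genes; expr is samples x genes."""
    return expr[:, gene_idx].mean(axis=1)

def favorable_subset(bcell, il8, bcell_cut=None, il8_cut=None):
    """Flag samples with high B-cell and low IL-8 metagene expression.

    Cutoffs default to the cohort median (an assumption for this sketch).
    """
    bcell_cut = np.median(bcell) if bcell_cut is None else bcell_cut
    il8_cut = np.median(il8) if il8_cut is None else il8_cut
    return (bcell > bcell_cut) & (il8 <= il8_cut)

# Toy matrix: 6 samples x 4 genes (made-up values, not study data).
# Columns 0-1 stand in for B-cell metagene genes, columns 2-3 for IL-8 metagene genes.
expr = np.array([
    [3.0, 3.0, 1.0, 1.0],
    [3.0, 2.0, 3.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 3.0, 3.0],
    [2.5, 2.5, 0.5, 1.5],
    [0.0, 1.0, 2.0, 4.0],
])

bcell = metagene_score(expr, [0, 1])
il8 = metagene_score(expr, [2, 3])
favorable = favorable_subset(bcell, il8)
```

In this toy example only the first and fifth samples combine above-median B-cell expression with IL-8 expression at or below the median, so only they are flagged as the favorable group.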
Our observation is important since every currently available genomic prognostic signature (for example, the 70-gene profile [
55], Recurrence Score [
36], Genomic Grading Index [
37]) assigns poor prognostic risk status to all TNBC samples despite their variable outcome [
56-
58]. One of these signatures, the Rotterdam-76-gene prognostic signature [
59], was developed to allow prognostic stratification of ER-negative cancers. However, similar to other reports [
9] we were not able to demonstrate a prognostic value for this signature (Additional file
1, Supplementary Figure S12).
We used an unsupervised class discovery approach to first identify the main molecular subtypes within the data and then assess the prognostic differences between the molecular subsets. Interestingly, when we performed an independent supervised analysis that compared TNBC cases with or without recurrence, we also identified IL-8 as the top-ranked gene associated with poor prognosis (Additional file
1, Supplementary Figure S13 and Additional file
2, Supplementary Table S8). However, gene signatures obtained through supervised analysis were not superior to the molecular structure-based prognostic predictions in validation (Additional file
1, Supplementary Figure S14). In addition, the biological interpretation of the empirically derived prognostic signature is more difficult than the interpretation of metagenes. In summary, we performed the largest unsupervised analysis of pooled gene expression data from TNBC. We describe a new prognostic signature for these cancers that identifies about one-third of TNBC as relatively low risk for recurrence. These cancers are characterized by high B-cell and low IL-8 metagene expression and have about 84% recurrence-free survival at five years. Although this may not be sufficiently high to forgo adjuvant chemotherapy, these observations pave the way for developing a clinically useful multivariate prognostic model for TNBC. A combined prognostic score that includes clinical variables, such as nodal status and perhaps tumor size, and molecular variables, such as optimized B-cell and IL-8 metagenes (measured by an RT-PCR or array-based method), may identify patients with very low risk of recurrence even with ER-, PgR- and HER2-negative breast cancer. Equally important, the prognostic importance of B-cells and the negative impact of IL-8 suggest potential novel therapeutic strategies for TNBC that can be tested in the clinic [
31,
32]. These findings could allow the selection of patients who would benefit most from novel immune-stimulating drugs such as anti-CTLA-4 antibodies, which have shown promise in melanoma [
60,
61]. IL-8 could also directly increase the survival of breast cancer stem cells after chemotherapy [
62], an effect that can be blocked with IL-8-directed drugs [
63]. Such an effect might explain the triple-negative paradox of high relapse rates despite a good initial response to chemotherapy.