Despite the widespread recognition of the value of molecular subtyping, the complexity of the classification models, which use dozens to hundreds of genes, and uncertainty about their robustness and clinical relevance have been impediments to their general clinical use (18). Furthermore, quality assessment of molecular subtyping is complex because the true subtype of each tumor is unknown. Using a collection of data from 5715 breast tumors, we analyzed five previously described classifiers (three SSPs and two SCMs) and compared these to SCMGENE, a simplified SCM-based classifier that uses only three genes that capture key biological processes in breast cancer, namely ER signaling, HER2 signaling, and proliferation. We used the prediction strength statistic (64) to quantify robustness of subtype classifications, defined as the capacity of an algorithm to assign the same tumors to the same subtypes regardless of the gene expression data used to build the classifier. We found SCMs to be statistically significantly more robust than SSPs. Moreover, among the SCMs, SCMGENE, our simple three-gene model, was statistically as robust as the published SCMs, which use hundreds of genes.
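As an illustration of the robustness measure described above, the following minimal sketch computes a prediction strength in the pair co-membership form: for each subtype found when the classifier is built on the test data, it takes the fraction of tumor pairs that a classifier built on other data also places together, and reports the minimum over subtypes. The function name and the toy labels are hypothetical, and the implementation used in this study may differ in detail.

```python
from itertools import combinations


def prediction_strength(test_labels, train_based_labels):
    """Minimum, over subtypes found in the test data, of the fraction of
    tumor pairs in that subtype that the training-derived classifier
    also assigns to a common subtype."""
    clusters = {}
    for idx, label in enumerate(test_labels):
        clusters.setdefault(label, []).append(idx)

    strengths = []
    for members in clusters.values():
        pairs = list(combinations(members, 2))
        if not pairs:  # a singleton subtype carries no pair information
            continue
        together = sum(
            train_based_labels[i] == train_based_labels[j] for i, j in pairs
        )
        strengths.append(together / len(pairs))
    return min(strengths) if strengths else float("nan")


# Toy labels (hypothetical): subtypes from a classifier built on the test
# data, and subtypes assigned to the same tumors by a classifier built elsewhere.
test_fit = ["basal", "basal", "basal", "lumA", "lumA", "lumA", "her2", "her2"]
train_fit = ["basal", "basal", "basal", "lumA", "lumA", "lumB", "her2", "her2"]
print(prediction_strength(test_fit, train_fit))
```

In this toy case the basal-like and HER2-enriched groups are fully reproduced, and the overall value is driven down by the luminal group, in which only one of three tumor pairs stays together.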
The classifiers demonstrated fair to substantial concordance with one another, underscoring the validity of the subtypes. Among the molecular subtypes, the basal-like subtype was consistently identified independently of the classifier used. In contrast, the luminal A, luminal B, and normal-like tumors were more difficult to classify, consistent with the recent study of Mackay et al. (21); the separation of the luminal group into A and B was not well supported by our analysis, probably because these subtypes are defined by expression of proliferation-related genes, which exhibit a continuum of expression levels (1). Like others (20), we did not find support for the normal-like subtype. It may be that this subtype is an artifact resulting from stromal contamination (22).
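Pairwise concordance between two classifiers can be summarized with a chance-corrected agreement coefficient. The sketch below assumes Cohen's kappa, for which "fair" and "substantial" are conventional descriptive bands; the function name and the toy subtype calls are hypothetical and are not taken from our data.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Chance-corrected agreement between two sets of subtype calls made
    on the same tumors. Undefined (division by zero) when both classifiers
    always emit the same single label."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    # Agreement expected if the two classifiers were independent.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy subtype calls (hypothetical) from two classifiers on six tumors.
classifier_a = ["basal", "basal", "her2", "lum", "lum", "lum"]
classifier_b = ["basal", "basal", "her2", "lum", "lum", "her2"]
print(round(cohen_kappa(classifier_a, classifier_b), 2))
```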
In the survival analysis of a large set of untreated node-negative breast cancer patients, we confirmed that all six classifiers had a statistically significant prognostic value (9). When assessing concordance with published prognostic gene signatures, we found that the vast majority of basal-like, HER2-enriched, and luminal B tumors were classified as high risk (8). Again, all the subtype classifiers and gene signatures yielded statistically similar prognostic value. Notably, we also showed that, in a cohort of patients with ER+ tumors defined initially by IHC who were treated with adjuvant tamoxifen monotherapy, patients whose tumors were identified by SCMGENE and the other subtype classifiers as basal-like or HER2-enriched had poorer survival, suggesting that these patients may not benefit from tamoxifen therapy. However, the clinical relevance, in terms of response to therapy (for example, endocrine or anti-HER2), of tumors classified differently by IHC and by gene expression remains unknown.
All subtype classifiers were statistically significantly associated with clinical variables widely used in management of breast cancer patients; the ER+ (IHC) tumors were particularly well identified by SCMOD2 and PAM50, whereas the HER2 amplified/overexpressed (FISH/IHC) tumors were highly concordant with the SCMGENE classification. However, we found no association between the subtype classifiers and tumor size, nodal status, or age at diagnosis. A large study involving central pathology measurement of traditional clinical parameters and gene expression profiling is needed to draw definitive conclusions about the complementarity or superiority of one technology over another; in addition, this would help determine the clinical relevance of the above concordance issues, that is, whether a given method of subtype classification or central pathology using IHC would yield better predictive value for prescription of anti-HER2 or endocrine therapies. Ongoing prospective trials such as MINDACT may facilitate such comparisons (75). Our data also suggest that accurate and reproducible measurements of ER, HER2, and proliferation can be used for molecular subtyping in breast cancer. This holds true for currently used methods of centrally reviewed IHC for ER, HER2, and Ki67, particularly for large clinical studies. Although IHC has well-known limitations in terms of interlaboratory reproducibility and subjective and semiquantitative assessment of protein expression, IHC performed in a central laboratory can provide significant additional prognostic value compared with local pathology. However, the good technical reproducibility and the quantitative nature of gene expression profiling (58) make expression-based classification models promising candidates to complement the current IHC markers widely used in breast cancer. Our results also support the use of SCMGENE to provide molecular subtype classification for samples in large meta-analysis studies of gene expression profiling that involve data generated by different laboratories using diverse microarray technologies.
This study has several potential limitations. First, because our collection of breast cancer microarray data is composed of datasets that were retrospectively accrued, the selection of these patients may result in an unbalanced distribution of the different molecular subtypes. Second, we used the normalized gene expression data as provided in public databases and authors' websites; no attempts to renormalize the microarray data were made, although a robust scaling procedure ensured that the gene expression values were similarly distributed across datasets. Third, depending on the dataset, some probes used in the subtype classifiers were not annotated and mapped because of the diversity of microarray platforms used in our compendium of datasets (Supplementary Table 3, available online). Fourth, the current implementation of the CVPL does not allow checking and correction for departure from the proportional hazards assumption. Finally, in contrast to SCMs, SSPs rely on hierarchical clustering, which makes automated identification of the main subtypes present in a specific dataset challenging (21); this may have affected their robustness estimations but also highlights the difficulties of using this type of classification method.
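The robust scaling mentioned among the limitations is not detailed in this section. As one plausible reading, the sketch below rescales a gene's expression values within a dataset so that a lower and an upper quantile map to -1 and +1, which makes the result insensitive to outliers and to the platform-specific measurement scale. The quantile cut-offs (2.5% and 97.5%) and both function names are assumptions made for illustration.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def robust_rescale(values, low_q=0.025, high_q=0.975):
    """Map the low_q and high_q quantiles of one gene to -1 and +1.
    The cut-offs are hypothetical defaults, not values from the study."""
    ordered = sorted(values)
    low, high = quantile(ordered, low_q), quantile(ordered, high_q)
    if high == low:
        raise ValueError("expression is constant between the chosen quantiles")
    return [2 * (v - low) / (high - low) - 1 for v in values]


# The same gene measured on two platforms with different scales (toy data).
platform_a = [float(v) for v in range(41)]
platform_b = [10 * v + 5 for v in platform_a]
scaled_a = robust_rescale(platform_a)
scaled_b = robust_rescale(platform_b)
print(max(abs(a - b) for a, b in zip(scaled_a, scaled_b)) < 1e-9)
```

After rescaling, the two toy platforms yield the same values, which is the property needed before a single classifier is applied across datasets.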
In conclusion, our study demonstrated that for breast cancer molecular subtyping, the simplest classification model, SCMGENE, which is based on the expression levels of three key genes and a simple Gaussian probabilistic model, was surprisingly concordant with the more complex published classifiers and yielded similar prognostic value. It also proved to be one of the most robust classifiers even though it uses only ER, HER2, and AURKA gene expression, whereas the other classifiers rely on many more genes. The simplicity and robustness of the SCMGENE model provide an opportunity for wide application using a variety of expression data types. Moreover, our results suggest that, at present, for molecular subtyping of breast cancer, three genes provide adequate discrimination for clinical implementation; the clinical and biological value of adding more genes remains to be demonstrated.
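To make concrete how a three-gene Gaussian probabilistic model can assign subtypes, the following simplified sketch computes posterior probabilities for three groups from ER and HER2 expression using independent Gaussian components, and then splits the ER+/HER2- group by AURKA expression. All component parameters, the equal priors, the AURKA cut-off, and the function names are hypothetical values chosen for illustration; they are not the fitted SCMGENE parameters, and the actual model may treat proliferation differently.

```python
import math

# Hypothetical (mean, standard deviation) per gene on a rescaled expression
# axis. These are NOT the fitted SCMGENE parameters.
COMPONENTS = {
    "ER-/HER2-": {"ER": (-0.8, 0.3), "HER2": (-0.3, 0.3)},
    "HER2+": {"ER": (-0.2, 0.5), "HER2": (0.8, 0.3)},
    "ER+/HER2-": {"ER": (0.6, 0.3), "HER2": (-0.3, 0.3)},
}
PRIORS = {name: 1 / len(COMPONENTS) for name in COMPONENTS}  # assumed equal


def log_gauss(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def subtype_posteriors(sample):
    """Posterior probability of each component given ER and HER2 expression."""
    log_joint = {
        name: math.log(PRIORS[name])
        + sum(log_gauss(sample[gene], *params[gene]) for gene in params)
        for name, params in COMPONENTS.items()
    }
    peak = max(log_joint.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(v - peak) for name, v in log_joint.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def call_subtype(sample, aurka_cutoff=0.0):
    """Most probable group; ER+/HER2- tumors are split by AURKA expression
    using a hypothetical cut-off."""
    posteriors = subtype_posteriors(sample)
    best = max(posteriors, key=posteriors.get)
    if best == "ER+/HER2-":
        level = "high" if sample["AURKA"] > aurka_cutoff else "low"
        best = f"{best} {level} proliferation"
    return best, posteriors


tumor = {"ER": 0.6, "HER2": -0.3, "AURKA": 0.5}
label, posteriors = call_subtype(tumor)
print(label)
```

Because each tumor receives a probability for every group rather than a hard nearest-centroid assignment, ambiguous cases (for example, along the luminal proliferation continuum) can be flagged by inspecting the posterior values.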