Clinical characteristics of the patients (n = 159) in this study (Table ) showed that those who died or who had distant metastases (n = 38) more often had tumors ≥ 21 mm in size (P = 0.06), had a higher mean diameter (P = 0.05), were more often progesterone-receptor-negative (P = 0.01) and less often received endocrine therapy (P = 0.03). No significant difference was detected in the proportion of patients receiving chemotherapy or radiotherapy. A similar pattern was observed when the analyses were limited to breast-cancer-specific deaths (Additional file 1).
Univariate comparison of clinical variables among patients with good prognosis and poor prognosis
Of the 159 patients in the training set, 38 patients died or relapsed by 5 years and were thus defined as the poor-prognosis group. Twenty-six of these patients had distant metastases by 5 years, and 12 patients died within 5 years without diagnosis of distant relapse; six of the 12 deaths were due to breast cancer. The remaining 121 patients were defined as the good-prognosis group. Of these patients, after more than 5 years of follow-up, four patients died without recurrence of breast cancer and four patients had distant relapse.
The leave-one-out procedure (Additional file 1) suggested k = 64 as the optimal number of genes for separating the patients with good prognosis and poor prognosis, giving an overall error rate of 33%. The list of these genes is presented in Additional file 1. Among the genes with higher expression in tumors with good prognosis, we found cyclin-dependent kinase inhibitor 1C (CDKN1C), spinal-cord-derived growth factor B, homeobox A5 (HOXA5) and insulin-like growth factor 1 (IGF1). The genes highly expressed in the poor-prognosis group were primarily involved in cell-cycle regulation.
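The details of the leave-one-out procedure are given in Additional file 1 and are not reproduced here. As a minimal sketch of the general idea only, the following Python code selects the number of genes k by leave-one-out cross-validation. The gene-ranking statistic (absolute difference in class means), the signed-sum risk score, the midpoint threshold and all function names are assumptions made for illustration, not the method used in this study. Gene selection is repeated inside each leave-one-out fold so that the held-out patient never influences the gene list.

```python
from statistics import mean

def top_k_genes(X, y, k):
    """Rank genes by |mean(poor) - mean(good)|; return the top k as (gene, sign)."""
    ranked = []
    for g in range(len(X[0])):
        poor = mean(row[g] for row, label in zip(X, y) if label == 1)
        good = mean(row[g] for row, label in zip(X, y) if label == 0)
        ranked.append((abs(poor - good), g, 1 if poor > good else -1))
    ranked.sort(reverse=True)
    return [(g, sign) for _, g, sign in ranked[:k]]

def risk_score(row, genes):
    """Signed sum of expression values over the selected genes (assumed score)."""
    return sum(sign * row[g] for g, sign in genes)

def loo_error(X, y, k):
    """Leave-one-out error rate of a k-gene classifier."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_k_genes(X_train, y_train, k)  # selection inside the fold
        poor = mean(risk_score(r, genes) for r, l in zip(X_train, y_train) if l == 1)
        good = mean(risk_score(r, genes) for r, l in zip(X_train, y_train) if l == 0)
        threshold = (poor + good) / 2
        predicted = 1 if risk_score(X[i], genes) > threshold else 0
        if predicted != y[i]:
            errors += 1
    return errors / len(X)

def choose_k(X, y, candidates):
    """Return the candidate k with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_error(X, y, k))

# Toy data (hypothetical): 6 patients x 3 genes; 1 = poor prognosis, 0 = good.
X = [
    [0.0, 1.0, 0.5],
    [0.2, 0.9, 0.3],
    [0.1, 1.1, 0.4],
    [1.0, 1.0, 0.4],
    [1.2, 0.9, 0.5],
    [1.1, 1.1, 0.3],
]
y = [0, 0, 0, 1, 1, 1]
best_k = choose_k(X, y, [1, 2, 3])
```

In this toy example only the first gene separates the two groups, so a single-gene classifier already makes no leave-one-out errors.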
To check whether the expression profile has predictive value independent of standard clinical factors, we performed a multivariate logistic regression analysis of the 5-year status. The results (Table ) showed high risk associated with the poor-prognosis score (odds ratio, 4.19; 95% confidence interval, 1.49–11.77) after adjusting for age, stage, grade, estrogen receptor status and progesterone receptor status. Of these clinical variables, only progesterone-receptor-positive status was associated with better prognosis (odds ratio, 0.35; 95% confidence interval, 0.12–0.99). When we considered breast cancer endpoints (Additional data 1), the association with the microarray-based prognostic score was stronger than for overall endpoints (odds ratio, 10.64; 95% confidence interval, 2.91–38.87). The multivariate Cox regression analysis of the overall and breast cancer endpoints (Additional data 1) produced results similar to those of the logistic regression analysis.
Multivariate logistic regression of the 5-year disease-free status in relation to the poor-prognosis score and other clinical variables
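To illustrate how an odds ratio and its 95% confidence interval relate to the underlying logistic regression coefficient, the sketch below back-calculates the coefficient, its standard error and an approximate two-sided P value from the reported figures. It assumes that the intervals are Wald intervals, exp(β ± 1.96·SE); the text does not state this, and the function name is hypothetical. Because the inputs are rounded, the outputs are approximate.

```python
import math

def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover beta, SE, z and a two-sided P value from an odds ratio and its CI.

    Assumes a Wald interval: CI = exp(beta +/- z_crit * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p

# Poor-prognosis score, overall endpoint (values reported in the text)
overall = wald_summary(4.19, 1.49, 11.77)
# Poor-prognosis score, breast cancer endpoint (values reported in the text)
breast = wald_summary(10.64, 2.91, 38.87)

for name, (beta, se, z, p) in [("overall", overall), ("breast cancer", breast)]:
    print(f"{name}: beta={beta:.2f}, SE={se:.2f}, z={z:.2f}, P~{p:.4f}")
```

On the log scale, the reported odds ratios lie close to the midpoints of their intervals, which is what a Wald interval would give.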
The use of the risk score as a classifier offered only a rigid classification of the patients into good-prognosis and poor-prognosis groups. To overcome this rigidity, we performed a more flexible classification by hierarchical clustering of the 159 patients using the 64-gene set; here the risk score was used only to describe the resulting clusters. The clustering procedure identified three expression-based subgroups with significantly distinct prognoses (Fig. ), arranged from left to right in increasing risk level. There were 59 patients in the high-risk cluster, of whom 29 (49%) had distant metastases or died within 5 years (Table ). In the subset of patients treated with tamoxifen and its combinations (n = 104), the high-risk signature was found in 33 patients, of whom 16 (48%) had distant metastases or died within 5 years (Table ). The high-risk profile was validated by observations from an independent group of adjuvant-treated patients from Uppsala (n = 76) (Fig. ), where 21 out of 35 patients (60%) from the high-risk cluster had distant metastases or died within 5 years (Table ). As seen in Fig. , the clusters were correlated with tumor grade but not with nodal status.
Figure 2 Unsupervised hierarchical clustering of the Stockholm cohort (n = 159) using the 64-gene set. Each column refers to a patient and each row to a gene. Red indicates a high value of gene expression, and green indicates a low value.
Prognosis of the clusters identified in the training and validation sets
Supervised clustering of the node-positive treated cohort in Uppsala (n = 76) using the 64-gene set. The accompanying variables have the same meaning as in Fig. 2.
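The clustering step described above can be sketched as follows. This is an illustration only. The distance measure (1 minus the Pearson correlation), the average-linkage rule and the function names are assumptions, because the text does not specify them. The code groups patient expression profiles into a fixed number of clusters and then reports the 5-year event rate of each cluster in order of increasing risk.

```python
from math import sqrt
from statistics import mean

def correlation_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return 1 - cov / sqrt(var_a * var_b)

def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering; merge the closest pair until n_clusters remain."""
    n = len(profiles)
    dist = {(i, j): correlation_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters
        return mean(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def event_rates(clusters, events):
    """Return (size, n_events, rate) per cluster, sorted by increasing rate."""
    rows = []
    for members in clusters:
        n_events = sum(events[i] for i in members)
        rows.append((len(members), n_events, n_events / len(members)))
    return sorted(rows, key=lambda row: row[2])

# Toy data (hypothetical): 6 patients x 4 genes, and 5-year event indicators.
profiles = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0], [4.2, 2.9, 2.1, 1.0],
    [1.0, 4.0, 1.0, 4.0], [1.2, 3.9, 0.9, 4.1],
]
events = [0, 0, 1, 1, 0, 1]
clusters = average_linkage_clusters(profiles, 3)
rates = event_rates(clusters, events)
```

The event rates reported in the text are this kind of within-cluster proportion, for example 29 of 59 patients (49%) in the high-risk cluster of the Stockholm cohort.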
Among the untreated subgroup from Stockholm (n = 33), 11 out of 16 patients (69%) of the high-risk subgroup reached the primary endpoint by 5 years (Table ). Examinations of the clustering of the untreated patients from Uppsala (n = 135) (Fig. ) and from the van't Veer cohort (n = 78) (Fig. ) indicated that the high-risk cluster had a consistently higher 5-year event rate than the other clusters in the same cohort (Table ). A similar result was obtained for the van't Veer cohort when the additional 19 patients used for validation in the original publication [16] were added: 57% of the high-risk group had an event within 5 years (data not shown).
Supervised clustering of the node-negative untreated cohort in Uppsala (n = 135) using the 64-gene set. The accompanying variables have the same meaning as in Fig. 2.
Supervised clustering of the van't Veer cohort (n = 78) using 42 genes of the 64-gene set. Meta.5 yr, black if the patient had distant metastasis within 5 years.
To identify women who will do well with or without adjuvant treatment, we examined the clustering of the untreated patients in Figs and . The rates of death or distant metastases within 5 years were three out of 53 patients (5.7%) and four out of 25 patients (16%), respectively. Among the treated groups (Figs and ), the same expression profile was associated with the lowest event rates, two out of 49 patients (4.2%) and two out of 14 patients (14%), respectively, compared with the other clusters (Table ). In the tamoxifen-treated subgroup in Stockholm, none of the 38 patients with a low-risk profile had any event by 5 years (Table ).
To summarize, the gene profiling revealed statistically significant differences in 5-year outcome for treated patients in the Stockholm (n = 104, P < 10⁻⁶) and the Uppsala (n = 76, P = 0.002) cohorts (Table ). The expression profile also provided similar 5-year outcome data for patients not receiving adjuvant therapy (Stockholm cohort, n = 33, P = 0.002; van't Veer cohort, n = 78, P = 0.01; Uppsala cohort, n = 135, P = 0.02) (Table ).
To gain a better description of the results throughout the follow-up period and across studies, we computed the Kaplan–Meier survival curves of the risk clusters we found in all datasets (Fig. ). For the high-risk group in all studies, survival tended to drop fastest in the first 5 years after surgery and to level off after 5 years. This means that the 5-year survival rate provided the best comparison between risk clusters. The results were mainly consistent across studies and confirmed the expected survival patterns of risk groups (Fig. ). For the node-negative untreated Uppsala patients (Fig. ), the lack of significance is due to the convergence of the survival curves at around 8 years after surgery. If we limit the comparison to 5-year survival, the survival curves are significantly different (i.e. consistent with the result in Table ).
Figure 6 Kaplan–Meier survival curves of the risk clusters found in (a) the Stockholm cohort, (b) the Uppsala treated cohort, (c) the Uppsala untreated cohort and (d) the van't Veer cohort. L, low-risk group; M, medium-risk group; H, high-risk group.
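For readers unfamiliar with the Kaplan–Meier estimator, the following minimal sketch shows how a survival curve of this kind is computed from follow-up times and event indicators, and how the estimate at a fixed time point such as 5 years can be read from it. The function names and the toy data are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each distinct event time.

    events[i] is 1 if patient i had the event at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censored patients leave the risk set
    return curve

def survival_at(curve, t):
    """Estimated survival probability at time t (step function, 1.0 before first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

# Toy follow-up data in years (hypothetical): events at 1, 3 and 5; censoring at 2 and 4.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Comparing `survival_at(curve, 5)` between risk clusters corresponds to the 5-year comparison discussed above, whereas a test over the whole curve also reflects later follow-up, where the curves may converge.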