A data-mining model generated using the ADTree ensemble technique improved the prediction of AxLN metastasis in patients with primary breast cancer, compared with older models such as the MSKCC nomogram. Evaluation using an external validation dataset and bootstrap analysis revealed high AUC values of 0.772 and 0.768, respectively. However, the prediction was not perfect and there are several issues that may affect the prediction performance.
Differences in patient variables between the training and validation datasets possibly lowered the AUC values for the external validation. There were fewer patients with AxLN metastasis in the Seoul dataset (23.6%) than in the Tokyo (29.7%) and Kyoto (30.8%) datasets, although this difference was not statistically significant (P = 0.29) (Table ). One reason for this difference is that patients who underwent ALND were included in the Tokyo and Kyoto datasets (14.8%) but not in the Seoul dataset. Interestingly, the proportion of node-positive patients in the Tokyo and Kyoto datasets was slightly higher among patients who underwent ALND than among those who underwent SLN (39% vs. 29%), although this was not significant (P = 0.15). Despite these differences, the AUC values for the Kyoto and Seoul datasets were similar (0.770 and 0.772, respectively).
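As an illustration of how AUC values and a bootstrap interval of the kind discussed above can be computed, the following is a minimal sketch. It assumes the AUC is calculated as the Mann-Whitney statistic and that the bootstrap resamples patients with replacement and reports a percentile interval; the exact bootstrap procedure used in this study may differ. The function names and the toy labels and scores are hypothetical and are not study data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=200, seed=0):
    """Mean AUC and percentile 95% interval over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class (AUC undefined).
        if 0 < sum(ys) < n:
            values.append(auc(ys, [scores[i] for i in idx]))
    values.sort()
    lower = values[int(0.025 * n_boot)]
    upper = values[int(0.975 * n_boot) - 1]
    return sum(values) / n_boot, (lower, upper)


# Hypothetical toy data: 1 = node-positive, 0 = node-negative.
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.35, 0.3, 0.4, 0.6, 0.7, 0.9]

toy_auc = auc(toy_labels, toy_scores)
toy_mean, (toy_lower, toy_upper) = bootstrap_auc(toy_labels, toy_scores)
```

With realistic sample sizes, the pairwise loop in `auc` would normally be replaced by a rank-based calculation, but the quadratic version keeps the definition explicit.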
The calibration plot (Appendix D (Additional file 1)) revealed that the predictive probability for the AxLN metastasis high-risk group was overestimated in both the Kyoto and Seoul datasets. The controlled bias in the training dataset, which consisted of approximately 50% AxLN-positive patients (Appendix A (Additional file 1)), likely introduced this overestimation. As demonstrated by Rouzier et al., the calibration curves for the Seoul dataset were improved (corrected) by fitting the data to the Kyoto dataset using a polynomial function, which resulted in near-ideal lines (i.e., y = x). Meanwhile, the calibration plots for the lower-risk groups were relatively good, even without correction, for both the Kyoto and Seoul datasets.
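The polynomial correction described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes that the correction maps predicted probabilities to observed frequencies with a quadratic least-squares fit on a reference dataset and then applies that mapping to new predictions. The degree of the polynomial, the function names, and the reference points are hypothetical; they are not taken from the study.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients c[0] + c[1]*x + ..."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def recalibrate(probability, coef):
    """Apply the fitted polynomial and clip the result to [0, 1]."""
    value = sum(c * probability ** i for i, c in enumerate(coef))
    return min(1.0, max(0.0, value))


# Hypothetical reference points: (mean predicted probability, observed
# fraction of node-positive patients) per risk group.
reference_points = [(0.1, 0.018), (0.3, 0.102), (0.5, 0.25),
                    (0.7, 0.462), (0.9, 0.738)]
predicted = [p for p, _ in reference_points]
observed = [o for _, o in reference_points]

coef = polyfit(predicted, observed, degree=2)
corrected_high_risk = recalibrate(0.9, coef)
```

After correction, a well-calibrated model would place each risk group close to the line y = x when corrected probabilities are plotted against observed frequencies.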
Sensitivity analysis revealed the degree of influence of the variables in the developed model (Figure and Appendix B (Additional file 1)). In this analysis, the values of each variable were randomized (Figure ). Of the variables causing a greater decrease in AUC values, AxLN size is directly associated with lymph node metastasis. Tumor size is used as a predictive factor in the MSKCC nomogram [6]. Echogenic halo, interruption of the anterior border of the mammary gland on ultrasonography, and skin dimpling are features that reflect tumor infiltration into the surrounding tissue [31]. Therefore, these variables might represent tumor characteristics in the prediction models.
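The randomization-based sensitivity analysis described above resembles what is often called permutation importance. The sketch below assumes that the values of one variable are shuffled across patients while all other variables are left unchanged, and that the influence of the variable is measured as the mean decrease in AUC. The toy model, variable names, and data are hypothetical and stand in for the developed ADTree model.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_auc_drop(model, rows, labels, variable, n_repeats=20, seed=0):
    """Mean decrease in AUC after shuffling one variable across patients."""
    rng = random.Random(seed)
    baseline = auc(labels, [model(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[variable] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the randomized variable.
        permuted = [dict(r, **{variable: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - auc(labels, [model(r) for r in permuted]))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical score that depends on AxLN size only."""
    return row["axln_size_mm"]


# Hypothetical patients: 1 = node-positive, 0 = node-negative.
rows = [
    {"axln_size_mm": 4, "unused_variable": 7},
    {"axln_size_mm": 6, "unused_variable": 2},
    {"axln_size_mm": 7, "unused_variable": 9},
    {"axln_size_mm": 11, "unused_variable": 1},
    {"axln_size_mm": 13, "unused_variable": 5},
    {"axln_size_mm": 15, "unused_variable": 3},
]
labels = [0, 0, 0, 1, 1, 1]

drop_size = permutation_auc_drop(toy_model, rows, labels, "axln_size_mm")
drop_unused = permutation_auc_drop(toy_model, rows, labels, "unused_variable")
```

A variable that the model does not use produces no change in AUC when it is randomized, whereas an influential variable produces a larger decrease.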
The mean AUC values obtained for the missing value analysis (0.884 for Kyoto and 0.688 for Seoul) were very different from those obtained for all individuals (0.770 for Kyoto and 0.772 for Seoul) because of the small number of individuals with missing values. However, the differences between the upper and lower CIs were small (0.0047 for Kyoto and 0.0081 for Seoul), which indicates that the developed model has low sensitivity to missing values. One possible reason for this feature is that ADTree can calculate a range of predictive probabilities, even for cases with missing values (see the legend of Appendix C (Additional file 1)). By contrast, standard ‘if–then’ decision trees and CART models cannot calculate this probability. In addition to the simple structure and high accuracy of ADTree analysis, this tolerance to missing values is also valuable when applying machine learning to clinical data with missing values.
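One way in which an alternating decision tree can yield a range of predictive probabilities for a case with missing values is sketched below. The procedure used in this study is described in the legend of Appendix C and is not reproduced here, so the following rests on several assumptions: the tree is flat (all splitter nodes are attached to the root), a missing variable contributes both of its possible prediction values to a lower and an upper bound, and the summed score is converted to a probability with a logistic function. The thresholds, prediction values, and variable names are hypothetical.

```python
import math

# Hypothetical flat alternating decision tree: a root prediction value plus
# splitter nodes attached directly to the root.
ROOT_VALUE = -0.4
SPLITTERS = [
    # (variable, threshold, value if x < threshold, value if x >= threshold)
    ("axln_size_mm", 10.0, -0.6, 0.9),
    ("tumor_size_mm", 20.0, -0.3, 0.5),
]


def score_range(patient):
    """Lower and upper bounds of the summed prediction values.

    A missing variable (None or absent) widens the range by taking the
    smaller branch value for the lower bound and the larger for the upper.
    """
    low = high = ROOT_VALUE
    for variable, threshold, below, above in SPLITTERS:
        x = patient.get(variable)
        if x is None:
            low += min(below, above)
            high += max(below, above)
        else:
            value = below if x < threshold else above
            low += value
            high += value
    return low, high


def to_probability(score):
    """Assumed logistic link between the boosting score and a probability."""
    return 1.0 / (1.0 + math.exp(-2.0 * score))


def probability_range(patient):
    low, high = score_range(patient)
    return to_probability(low), to_probability(high)


complete = probability_range({"axln_size_mm": 12.0, "tumor_size_mm": 25.0})
missing = probability_range({"axln_size_mm": 12.0, "tumor_size_mm": None})
```

For a complete case the two bounds coincide, while for a case with a missing variable the interval brackets every probability that the complete case could have taken.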
In the pruning analysis, the AUC values for the datasets from all three institutes generally improved as the number of ADTrees in the prediction model increased (Appendix E (Additional file 1)). Although increasing the number of trees resulted in a more complex model that requires more calculation time for prediction, the model developed using the ensemble procedure showed improved accuracy and generalizability.
The AUC value of the MSKCC nomogram for the authors’ own external validation sets was 0.754 [6], which is similar to our value for the Seoul dataset (0.772). Therefore, the AUC values of the developed model, the MSKCC nomogram, and the Russells Hall Hospital scoring system were compared using an external validation dataset (Seoul), which yielded values of 0.777 (95% CI: 0.689–0.864, P < 0.001), 0.664 (95% CI: 0.560–0.768, P = 0.0033) and 0.620 (95% CI: 0.509–0.731, P = 0.0032), respectively (Appendix F (Additional file 1)). The higher AUC value for our ADTree method might be attributed to the flexible model structure and the greater number of variables incorporated into the model. By comparison, the main advantage of both the MSKCC nomogram and the Russells Hall Hospital scoring system is that they require a small number of variables, which can facilitate data collection and interpretation of the model. Thus, these features of each modeling method represent trade-offs that should be considered when applying the models.
In addition to AUC value-based prediction performance, the false-negative rate (FNR) of the prediction model is also important when applying these models in clinical settings. For example, when a predictive value threshold of 20% is used to define low risk for AxLN metastasis, the FNR of both the ADTree model and the MSKCC nomogram using the Seoul dataset was relatively good (5.3% and 2.6%, respectively). However, the nomogram predicted that only 6.9% of the patients were AxLN negative, compared with 23.7% using the developed model.
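The relationship between the low-risk threshold, the FNR, and the fraction of patients classified as low risk can be illustrated with a short sketch. It assumes the conventional definition FNR = FN / (FN + TP), where a false negative is a node-positive patient classified as low risk, and it assumes that low risk means a predicted probability strictly below the threshold; the definition used in this study is not restated in this section and may differ. The function name and the toy data are hypothetical.

```python
def low_risk_summary(labels, probabilities, threshold=0.20):
    """FNR and fraction of patients classified as low risk.

    labels: 1 = node-positive, 0 = node-negative.
    A patient is treated as low risk when the predicted probability is
    strictly below the threshold (an assumption of this sketch).
    """
    low_risk = [p < threshold for p in probabilities]
    fn = sum(1 for y, lr in zip(labels, low_risk) if y == 1 and lr)
    tp = sum(1 for y, lr in zip(labels, low_risk) if y == 1 and not lr)
    if fn + tp == 0:
        raise ValueError("no node-positive patients in the dataset")
    return {
        "fnr": fn / (fn + tp),
        "low_risk_fraction": sum(low_risk) / len(labels),
    }


# Hypothetical toy data, not study results.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_probabilities = [0.10, 0.50, 0.60, 0.90, 0.05, 0.15, 0.30, 0.70]
summary = low_risk_summary(toy_labels, toy_probabilities)
```

Reporting both quantities together matters because a model can achieve a low FNR simply by classifying very few patients as low risk, which limits its usefulness for sparing patients from axillary surgery.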
Unlike the MSKCC nomogram and our ADTree model, Reyal et al. developed MLR-based nomograms using the molecular subtype classification defined by a combination of ER and HER2 status with clinical parameters that included tumor size, LVI and age [33]. The decision to use ER/HER2 subtype might be attributed to the expected relationship between intrinsic breast cancer subtype and lymph node metastasis. By contrast, we treated ER status and HER2 status as independent candidate predictive factors; during model development, ADTree selected HER2 status but not ER status. Interestingly, HER2 status showed the lowest sensitivity in our model and the contribution of this subtype-related variable to AxLN metastasis was not significant in our study.
There are several limitations and perspectives to be discussed. First, to eliminate inter-institute or inter-interpreter variations, a standardized ultrasonography/mammography scoring system is vital because these variables are key factors for the accurate prediction of AxLN metastasis. Because our model requires a larger number of variables than conventional prediction models or scoring systems to achieve accurate prediction, a web-based user interface, such as the one used for the MSKCC nomogram [6], will help to encourage its use and to ensure that it is used correctly. In addition to calculating the probability of AxLN metastasis, a web-based platform can also assist with data collection and ensure that the prediction model is kept up to date. Alternatively, machine learning-based medical classification systems have been developed following the introduction of electronic medical record systems [34]. Integrating prediction tools with electronic record systems will enable researchers not only to improve classification algorithms using high-dimensional datasets, but also to avoid the time and effort of transferring data into the classification system. Although the variables used in our developed model are frequently assessed in preoperative examinations, our proposed model is very flexible because it can incorporate new diagnostic methods or criteria. We are now developing a web-based platform to allow wider use of our model. Finally, further validation using prospective and larger datasets is indispensable before the model can be used clinically.