In the past several years, there has been growing interest in molecular biomarkers as tools for the early detection of cancer. Plasma proteomics profiling based on liquid chromatography tandem mass spectrometry (LC/MS/MS) is a promising technology platform for studying candidate protein biomarkers for early cancer detection. However, factors such as inherent variability, limited protein detectability, and peptide discovery biases among LC/MS/MS platforms make the classification and prediction of proteomics profiles challenging. Neural network-based proteomics data analysis methods that identify multi-protein biomarker panels for breast cancer diagnosis offer hope of improving both the sensitivity and the specificity of candidate biomarkers for early detection.
In our previous work, we developed a Feed Forward Neural Network (FFNN)-based method to build a classifier for plasma samples of breast cancer and then applied the classifier to predict a blind breast cancer dataset. However, the optimal marker combination C* in that method was determined by applying the trained FFNN to the testing set with each combination, so the testing set also took part in selecting the panel. Therefore, in this paper, we apply a three-way data split to the FFNN, with separate sets for training, validation, and testing. We found that the FFNN model based on the three-way data split outperforms our previous method: on the testing set, prediction performance improved from AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, and specificity = 82.5% to AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, and specificity = 87.5%.
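As an illustration only, the three-way split can be sketched as follows: candidate marker combinations are fitted on the training set, the combination C* is chosen on the validation set, and the testing set is scored once at the end. This is a minimal sketch, not the implementation used in this work. The nearest-centroid classifier is a deliberately simple stand-in for the FFNN, and the function names, the 60/20/20 split fractions, and the use of accuracy as the selection criterion are all assumptions made for the example.

```python
import random
from itertools import combinations


def three_way_split(n_samples, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle sample indices and cut them into train / validation / test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(frac_train * n_samples)
    n_val = int(frac_val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_centroids(X, y, rows, cols):
    """Stand-in for FFNN training: per-class mean of the selected markers.

    Assumes both classes (0 and 1) occur among `rows`.
    """
    centroids = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        centroids[label] = [
            sum(X[i][c] for i in members) / len(members) for c in cols
        ]
    return centroids


def predict(centroids, x, cols):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def dist(label):
        return sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[label]))
    return min((0, 1), key=dist)


def accuracy(centroids, X, y, rows, cols):
    hits = sum(predict(centroids, X[i], cols) == y[i] for i in rows)
    return hits / len(rows)


def select_panel(X, y, panel_size, seed=0):
    """Pick the marker combination C* on the validation set only."""
    train, val, test = three_way_split(len(X), seed=seed)
    best = None
    for cols in combinations(range(len(X[0])), panel_size):
        model = fit_centroids(X, y, train, cols)
        val_acc = accuracy(model, X, y, val, cols)
        if best is None or val_acc > best[0]:
            best = (val_acc, cols, model)
    val_acc, cols, model = best
    # The testing set is touched exactly once, after C* is fixed.
    test_acc = accuracy(model, X, y, test, cols)
    return cols, val_acc, test_acc
```

Because the testing set plays no part in choosing C*, the test score returned by `select_panel` is an estimate of performance on unseen samples rather than a by-product of the selection step.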
Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, which is consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and that it can be applied to other clinical proteomics studies.