In this paper, data correction and data stratification were tested for the classification of Alzheimer's disease data. Eight features were used as nuisance features, and the classifications were performed using data obtained from the ADNI database. The classification data included neuropsychological tests, MRI analyses, and APOE genotype values.
The results show that, in the best case, data correction and stratification can improve the classification accuracy by up to 6 percentage points. The biggest improvements were obtained in the classification of stable and progressive MCI subjects, which is the most challenging and interesting classification problem in the analysis of Alzheimer's disease. The classification accuracy improved in all imaging and neuropsychological feature groups studied. In the classification of controls from AD patients, the largest improvements were obtained for the MRI-based feature groups. However, neuropsychological tests are biased in this comparison because they are already used in the clinical diagnosis, so their results are of limited interest. The data correction method gave better results for imaging biomarkers, whereas data stratification worked well with the neuropsychological tests. Combining data correction and stratification did not further improve the results. Guidelines for future studies were presented based on the results obtained in this study.
The weakness of data stratification, especially when numerous nuisance features are used, is that the number of training samples decreases which affects the performance of the classifier. This can be seen in , where data stratification gives worse results than the original data. On the other hand, data correction always utilizes the full data set, and, therefore, it is guaranteed that the maximum amount of data is available for the training of a classifier. In addition, data stratification requires one parameter to be chosen. The threshold th for the inclusion criteria was selected in this study as the standard deviation of the feature values. However, the threshold used here may not be optimal, and further studies are needed to find out how this value should be defined.
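As a rough illustration of this trade-off, the following Python sketch selects, for one patient, the training samples whose nuisance values all lie within a threshold th of the patient's values, with th set to the standard deviation of each nuisance feature. The function name `stratify`, the dictionary layout, and the toy ages and education years are hypothetical, and the inclusion rule is an assumption about how the criterion is applied, not the implementation used in this study.

```python
from statistics import stdev

def stratify(samples, patient, nuisance_keys):
    """Keep samples whose nuisance values are all within th of the patient's.

    th is assumed to be the sample standard deviation of each nuisance feature.
    """
    th = {k: stdev([s[k] for s in samples]) for k in nuisance_keys}
    return [
        s for s in samples
        if all(abs(s[k] - patient[k]) <= th[k] for k in nuisance_keys)
    ]

# Hypothetical toy training set: age in years and education in years.
samples = [
    {"age": 60, "edu": 8},
    {"age": 65, "edu": 16},
    {"age": 70, "edu": 12},
    {"age": 75, "edu": 18},
    {"age": 80, "edu": 10},
]
patient = {"age": 72, "edu": 12}

by_age = stratify(samples, patient, ["age"])
by_age_and_edu = stratify(samples, patient, ["age", "edu"])
print(len(samples), len(by_age), len(by_age_and_edu))
```

In this toy example, adding a second nuisance feature reduces the stratified training set further, which mirrors the loss of training data described above.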
The data correction method used in this paper was based on a linear regression model. We also tested a regression model with cross-terms, but the results were worse than those presented in this paper. The method can easily be extended to higher-order polynomials or other basis functions. However, the more complex the model, the larger the training set required to reliably estimate the parameter values.
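The idea of regression-based correction can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions: a single nuisance feature (age), a least-squares line fitted on all training samples, and correction to a reference age equal to the sample mean. The function names `fit_line` and `correct_feature` and the toy values are hypothetical; the study used eight nuisance features, which would call for multiple regression instead.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y ~ intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("nuisance feature is constant; slope is undefined")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def correct_feature(values, nuisance, slope, reference):
    """Remove the fitted linear nuisance effect, re-centred at `reference`."""
    return [v - slope * (n - reference) for v, n in zip(values, nuisance)]

# Hypothetical toy data: a volume-like feature that shrinks with age.
age = [60.0, 65.0, 70.0, 75.0, 80.0]
volume = [7.0, 6.5, 6.0, 5.5, 5.0]

intercept, slope = fit_line(age, volume)
corrected = correct_feature(volume, age, slope, reference=mean(age))
print(slope, corrected)
```

Because every sample is corrected rather than discarded, the full training set remains available to the classifier. Extending the model to polynomial or cross-terms adds parameters to estimate, which is why a larger training set is then needed.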
In this paper, we identified four types of interactions between nuisance features and classification features. Different interactions and their combinations need different correction and stratification methods. In this study, the interactions were not detected from the data. Ideally, however, the type of interaction would be detected for each classification-nuisance feature pair, and the method would be chosen based on the type detected. The types could be detected, for example, by studying the results of line fitting and statistical tests, performed either for the whole space of nuisance feature values or for a specific range of values.
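One possible way to carry out such a check is sketched below in Python. It fits a line to a classification feature over a chosen range of nuisance values and reports the slope together with its t statistic (slope divided by its standard error, with n - 2 degrees of freedom). The function names, the age ranges, and the toy data are hypothetical, and this is only one of many tests that could be used. It is not a procedure taken from this study.

```python
from math import sqrt
from statistics import mean

def select_range(x, y, lo, hi):
    """Keep the (x, y) pairs with lo <= x < hi."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if lo <= xi < hi]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def slope_t_statistic(x, y):
    """Return (slope, t) for a least-squares line; t tests slope == 0."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("nuisance feature is constant in this range")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        return slope, (0.0 if slope == 0 else float("inf"))
    return slope, slope / se

# Hypothetical toy data: feature is flat below age 70 and declines above it.
age = [60, 62, 64, 66, 68, 70, 72, 74, 76, 78]
feature = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 4.6, 4.4, 4.1, 3.9]

slope_low, t_low = slope_t_statistic(*select_range(age, feature, 60, 70))
slope_high, t_high = slope_t_statistic(*select_range(age, feature, 70, 80))
print(slope_low, t_low, slope_high, t_high)
```

In this toy example the slope is close to zero in the lower range and clearly negative in the upper range, which would suggest a dependence on the nuisance feature that is confined to part of its value range. In practice the t statistic would be compared against a Student t distribution, and the ranges themselves would have to be chosen or searched for.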
The results presented in this study were obtained using a linear regression classifier. All the experiments were also performed using a naïve Bayes classifier, linear discriminant analysis, and support vector machines (SVM). These methods produced results similar to the ones presented in this paper. The regression classifier was selected because it is the simplest of the classifiers studied and does not require any parameters to be optimized. SVM, for example, could give slightly better classification results, but the kernel type and parameter values would have to be optimized separately for each feature group used. Failing to use optimal kernels and parameters could degrade the results dramatically.
One weakness of the study may be the feature selection used. Some feature groups contained tens of features, and therefore efficient feature selection is required. Only one standard feature selection method was tested in this study. A state-of-the-art feature selection method might improve the results somewhat further.
There are many kinds of features in the ADNI database (continuous, binary, ordinal, nominal, etc.). The methods presented here can be used for all features that are ordered. Data stratification can also be used for non-ordered data if the thresholds are reasonably selected. The data correction method is best suited to continuous classification features. For binary or ordinal features, large corrections are required to change the classification feature value enough to affect the classification result. In this study, it was shown that data correction is well suited to continuous imaging biomarkers, whereas data stratification works well for neuropsychological tests, which include many binary variables. Consequently, different methods for different feature types should be studied further.
The benefit of the methods studied here is that in the clinical decision making, for example using the tool presented in 
, a clinician can visually compare the patient's values either to the corrected values in a dataset or to the values of a stratified dataset. In the stratified or corrected feature values, the group differences are more clearly visible and, consequently, the diagnosis can be made more reliably. Most of the methods presented in the literature, such as ANCOVA, handle the patient values in a black box and only output a classification result, which is not very informative in clinical decision making.
In this paper, the methods were evaluated using automatic classification so that a large dataset could be used. An alternative approach would have been to give the original and the corrected or stratified data to a clinician and ask him or her to make the diagnosis using the data available. Only a small dataset could have been evaluated using this approach, and not all combinations of nuisance features could have been analyzed. Nevertheless, this study gave valuable information on how personalized diagnostics in AD could be performed. As the next step, this knowledge should be validated in a clinical environment.
The methods proposed here can be used as a pre-processing step to improve the classification accuracy of any combination of feature selection and classification methods. In addition, the methods studied in this paper are not specific to Alzheimer's disease, but can be applied to any medical application, and also outside the medical field.