We have evaluated the use of penalized logistic regression for the automatic voxel-wise classification of sMRI images from a subset of CN and AD ADNI participants. Our analyses build on recent and powerful methodological developments in optimization and regularization theory. The GLMNET library employed in this work solves the problem described by Eq. 2 using coordinate-wise descent techniques (Friedman et al., 2007, 2010), which provide an efficient mechanism for solving high-dimensional problems.
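As an illustration of the coordinate-wise descent idea, the following is a minimal sketch, not the GLMNET implementation. It assumes that Eq. 2 has an elastic-net form (negative mean log-likelihood plus a weighted L1 and squared L2 penalty); the function names, the toy data, and the values of `lam` and `alpha` are hypothetical.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Soft-thresholding operator that produces exact zeros under an L1 penalty."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def fit_plr(X, y, lam, alpha=1.0, n_outer=20, n_inner=50):
    """Elastic-net penalized logistic regression by coordinate-wise descent.

    Assumed objective (an assumption, not copied from Eq. 2):
    -mean log-likelihood + lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2)
    """
    n_samples, n_features = X.shape
    intercept, beta = 0.0, np.zeros(n_features)
    for _ in range(n_outer):
        # Quadratic (IRLS) approximation of the log-likelihood at the current fit.
        eta = intercept + X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        weights = np.clip(prob * (1.0 - prob), 1e-5, None)
        working = eta + (y - prob) / weights
        for _ in range(n_inner):
            # The intercept is not penalized: weighted mean of the residual.
            intercept = np.sum(weights * (working - X @ beta)) / np.sum(weights)
            for j in range(n_features):
                # Residual with the contribution of feature j removed.
                partial = working - intercept - X @ beta + X[:, j] * beta[j]
                rho = np.mean(weights * X[:, j] * partial)
                curvature = np.mean(weights * X[:, j] ** 2) + lam * (1.0 - alpha)
                beta[j] = soft_threshold(rho, lam * alpha) / curvature
    return intercept, beta


# Hypothetical toy data: the first feature is informative, the second is not.
X = np.array([[-2.0, 1.0], [-1.0, -1.0], [-1.0, 1.0], [0.5, -1.0],
              [-0.5, 1.0], [1.0, -1.0], [1.0, 1.0], [2.0, -1.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

_, beta_sparse = fit_plr(X, y, lam=1.0)   # heavy penalty
_, beta_fit = fit_plr(X, y, lam=0.1)      # lighter penalty
print(beta_sparse, beta_fit)
```

With the heavier penalty every coefficient is thresholded to zero, while the lighter penalty lets the informative feature enter the model. For clarity the sketch recomputes `X @ beta` at every coordinate; an implementation aimed at hundreds of thousands of voxels would instead keep a running residual.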
The approach applied here is one of the few (Kloppel et al., 2008; Cuingnet et al., 2010b; Hinrichs et al., 2011) reported in the AD sMRI classification literature that operate directly in the voxel space. Some previous approaches (Fan et al., 2007; Vemuri et al., 2008; Davatzikos et al., 2009) developed complex and time-consuming image processing steps, driven by the need to deal with the curse of dimensionality (Bellman, 1961; Donoho, 2000). While the curse of dimensionality is a real problem (and one that is still poorly understood), its effects on machine learning algorithms vary. One of the main merits of our work is to show that, by using PLR and coordinate-wise descent techniques, it is possible to achieve excellent prediction performance when solving very large classification problems. The number of voxels in our analyses varied across tissues: 5.7 × 10^5 (WM analyses), 7.4 × 10^5 (GM analyses), and 2 × 10^6 (Jacobian-based whole brain analyses), while operating with 98 samples. Our results, taken together with those previously reported for SVMs and kernel approaches (Kloppel et al., 2008; Chu, 2009b), suggest that the regularization mechanisms associated with these linear classifiers deal effectively with classification problems of very large dimension. The difference is that the approach presented here operates directly in the voxel space via coordinate-wise descent optimization, whereas previous SVM work (Kloppel et al., 2008), by making use of the kernel approach (representer theorem; Kimeldorf and Wahba, 1971; Scholkopf and Smola, 2002), solves an optimization problem of much lower dimension. This work provides evidence that it is not the dimension reduction implicit in linear kernel-based SVM methods that allows them to deal effectively with large problems, but the associated regularization penalty.
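The contrast between working in the voxel space and working in the kernel space can be illustrated with a small numerical sketch. It uses ridge regression rather than an SVM or PLR, because both its feature-space and kernel forms have closed-form solutions; the random data, the sizes, and the penalty value are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10, 500   # small stand-ins for 98 subjects and ~10^6 voxels
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
lam = 1.0                         # illustrative ridge penalty

# Feature-space (primal) solution: an n_features x n_features linear system.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Kernel (dual) solution: an n_samples x n_samples system on the linear-kernel
# Gram matrix, followed by a map back to feature space.
gram = X @ X.T
dual_coef = np.linalg.solve(gram + lam * np.eye(n_samples), y)
w_dual = X.T @ dual_coef          # representer theorem: w is a combination of the samples

print(gram.shape, np.allclose(w_primal, w_dual))
```

Both routes give the same weight vector up to numerical precision, because the penalty is identical in the two formulations. The kernel route only changes the size of the system being solved, which is consistent with the argument that the regularization penalty, not the reduced dimension, is what controls the solution.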
The results obtained with PLR in predicting cognitive status also appear to be very competitive with those previously reported by other researchers. The sensitivities and specificities of 10 of the most successful sMRI classification methods have recently been compared using ADNI data (Cuingnet et al., 2010c). The best performer in this group achieved a sensitivity of 81% and a specificity of 95% using a voxel-wise approach with an SVM and the high-dimensional DARTEL normalization procedure. Although these results cannot be directly compared to ours for several reasons (differing ADNI samples, sample sizes, CV procedures, etc.), they serve as a reference, suggesting that our approach reaches levels of sensitivity and specificity similar to those of the best performers in the comparison.
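For reference, sensitivity and specificity as used in these comparisons can be computed from true and predicted labels as in the short sketch below. The labels are made up for illustration (1 = AD, 0 = CN) and are not results from this study or from the cited comparison.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity), treating `positive` as the AD class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 4 AD patients and 5 CN subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(sensitivity, specificity)  # 0.75 0.8
```

Here three of the four AD patients are detected (sensitivity 0.75) and four of the five CN subjects are correctly labeled (specificity 0.8).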
One advantage of penalized logistic regression over SVMs, which have dominated the field so far, is that logistic regression directly models the class-conditional probabilities, providing a decision probability rather than just a binary classification. This is a very desirable property in a classification algorithm and can be very useful in a clinical setting. These probabilities could be used as an alternative to existing diagnostic metrics such as STAND-scores or the SPARE-AD index (Vemuri et al., 2008; Davatzikos et al., 2009). There are several potential ways to improve the approach presented here, for example: (1) introducing spatial constraints via regularization operators (Pascual-Marqui et al., 1994; Casanova et al., 2009; Cuingnet et al., 2010b); (2) incorporating feature selection; and (3) using more sophisticated penalties.
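The difference between a decision probability and a binary label, noted at the start of this paragraph, can be made concrete with a minimal sketch. The intercept, coefficients, and feature values are hypothetical and are not taken from the fitted models in this study.

```python
import math


def class_probability(intercept, coefficients, features):
    """Logistic model: probability of the positive (AD) class for one subject."""
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical fitted model and two hypothetical subjects.
intercept, coefficients = -0.2, [0.8, -0.5]
p_a = class_probability(intercept, coefficients, [3.0, 0.5])   # linear score 1.95
p_b = class_probability(intercept, coefficients, [0.5, 0.2])   # linear score 0.10

# Both subjects receive the same binary label at a 0.5 threshold,
# but the probabilities convey very different degrees of confidence.
label_a, label_b = p_a > 0.5, p_b > 0.5
print(round(p_a, 3), round(p_b, 3), label_a, label_b)
```

A classifier that returns only the sign of the score would report these two subjects identically, whereas the probabilities (roughly 0.88 and 0.52) distinguish a confident prediction from a borderline one.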
We found that both GM and WM carry useful information for the classification of CN and AD sMRI images, producing high levels of accuracy, sensitivity, and specificity. The large-scale regularization approach used here provides discriminative maps that localize the changes to GM structures known to be involved in AD. For example, GM changes associated with AD have been described as affecting the entorhinal cortex and hippocampus before spreading to other temporal, frontal, and parietal areas, many of which were useful for discriminating AD patients from CN subjects in the present study (Braak and Braak, 1991, 1997; Gomez-Isla et al., 1996; Laakso et al., 1996, 1998; Insausti et al., 1998; Frisoni et al., 1999, 2007; Van Hoesen et al., 2000; Dickerson et al., 2001; Thompson et al., 2003, 2007; Apostolova and Thompson, 2008). The white matter discriminative maps add to a growing body of literature on white matter volume loss in AD (Black et al., 2000; Moon et al., 2008; Di Paola et al., 2010). Several studies have identified volume loss in various portions of the corpus callosum (Di Paola et al., 2010). This callosal white matter loss has been related to Wallerian degeneration, since the corpus callosum receives axons from the temporo-parietal regions involved in AD. Other regions of white matter loss in AD have been less well studied.
Several methodological aspects of this study are worth noting. We used a high-dimensional warping algorithm to bring the individual structural images into alignment. In particular, we used the SyN methodology, which has been shown to be a top-performing method for image normalization. We applied SyN in a two-step normalization procedure, in which the sole purpose of the first step was to perform skull-stripping. While a variety of skull-stripping algorithms are available, in our own testing we have found that the SyN full brain normalization provides consistently excellent results, allowing direct masking on the basis of the template brain image without the need for additional manual editing. This enables a second high-dimensional normalization of the skull-stripped brain to a skull-stripped template, allowing for a more accurate registration without the confound of extraneous tissues affecting the normalization. We combined the SyN methodology with the SPM8 New Segment tool for primary tissue type segmentation. While a variety of image segmentation methods are available, we have found that the SPM8 multi-class segmentation algorithm performs especially well with elderly brain images such as those in the ADNI cohort. Proper segmentation in this age group can be very problematic because of the high white matter lesion load, which intensity-based segmentation procedures can erroneously classify as GM, adversely affecting classification accuracy. In comparing classification accuracy for modulated GM, modulated WM, and direct use of the Jacobian, we found the highest accuracy for the modulated GM maps. Interestingly, although classification accuracies were also high for the other input image types, the use of the full Jacobian map (which includes deformation information on GM, WM, and CSF) did not improve the classification accuracy (results not presented).
A limitation of this study is that we did not evaluate the performance of this approach in detecting patients with prodromal AD, which will be pursued in future work.