We investigated whether scalar-valued FA and MD images derived from DTI scans of the brain can reliably distinguish patients with schizophrenia from healthy volunteers. A supervised pattern classifier was developed using Fisher’s linear discriminant analysis based on a training set of DTI scans to prospectively categorize an independent sample of patients with schizophrenia and healthy volunteers. Classification accuracy in this independent cohort was high using either the FA (94%) or the MD images (98%); the difference in accuracy was not statistically significant, however, as evidenced by the overlap in their 95% confidence intervals. Additional analyses indicated that the observed findings were not significantly influenced by illness duration, diagnosis, or substance abuse. To our knowledge, neither the utility of MD to differentiate patients with schizophrenia from healthy volunteers nor whether the combination of MD and FA yields improved classification compared to either modality alone has previously been investigated. It is noteworthy that the algorithm performed at chance level when structural magnetic resonance images were used for classification, suggesting that the inter-subject registration method had successfully removed major shape differences between the subjects’ brains.
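The comparison of the two accuracies through their 95% confidence intervals can be illustrated with a minimal sketch. The test-set size used here (`n_test = 50`) is purely hypothetical, chosen so that 94% and 98% correspond to whole numbers of subjects, and the Wilson score interval is an assumption, since the interval method is not specified in this section.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


n_test = 50                           # hypothetical test-set size, not from the study
fa_ci = wilson_interval(47, n_test)   # 47/50 = 94% correct
md_ci = wilson_interval(49, n_test)   # 49/50 = 98% correct

print("FA 95% CI:", fa_ci)
print("MD 95% CI:", md_ci)
print("Intervals overlap:", intervals_overlap(fa_ci, md_ci))
```

Under these assumed counts the two intervals share a common range, which is the pattern described above for the FA and MD classifiers.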
Few studies have attempted to discriminate between patients with schizophrenia and healthy volunteers using DTI and automatic pattern recognition methods. Two recent studies, however, examined the utility of FA to distinguish patients with schizophrenia from healthy volunteers (Caan et al. 2006, Caprihan et al. 2008), with reported accuracies ranging from 75% using five-fold cross validation (Caan et al. 2006) to 80% using leave-one-out cross validation (Caprihan et al. 2008). Our data suggest that higher classification accuracy using FA and, additionally, MD may be possible compared to prior investigations. The improved accuracy in the current study may be related to our use of a relatively large cohort, the use of well-standardized diagnostic instruments for assessing both patients and healthy volunteers, and particularly the inter-subject registration algorithm used in the present investigation, which has been demonstrated empirically to be more accurate than many other currently available methods (Ardekani and Guckemus et al. 2005; Klein et al. 2009). In particular, the accuracy and quality of the registration algorithm (an illustration of the average of 100 registered SPGR images in our study is provided in ) is a critical component in image-based classification and is likely to contribute significantly to study results.
It should be emphasized that the current study was carefully designed to avoid the typical pitfalls in prediction studies. As discussed by Demirci et al. (2008), four common problems may bias the results of prediction studies: a small cohort that may not be representative of the populations of interest; presenting only the overall prediction accuracy, potentially concealing low classification accuracies for classes having fewer subjects; using the full set of subjects for any stage of feature/classifier selection; and reporting the cross validation accuracy for the optimized classifier obtained with the same cross validation. To the best of our knowledge, the sample size in our study is among the largest reported in the literature on the classification of patients with schizophrenia and healthy controls. We have reported both the sensitivity and specificity to characterize the classification performance on both patient and control groups. We have also carefully separated the steps in training and testing the classifiers to avoid the last two sources of bias described above. Standardization, PCA, principal component selection, and LDA training were all carried out using the training set only, and the final optimized classifier was then applied to classify the previously unseen test subjects. As noted in the Methods section, PCA projects the p-dimensional (FA/MD) maps into an (n–1)-dimensional subspace. Following PCA, the number of variables may be further reduced by removing those elements of the (n–1)-dimensional feature vectors with small discriminative power between the patients and controls. In the present paper, we assess discriminative power by a two-sample t-test; that is, we eliminate all variables with P-values greater than some threshold. Caprihan et al. (2008) utilized a similar approach in which the “discriminative power” of variables is assessed based on the Mahalanobis distance between groups.
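The separation of training and testing described above can be sketched as follows. This is a minimal illustration on synthetic data, not the implementation used in the study: the function names (`fit_pca_lda`, `predict`, `make_synthetic`), the data dimensions, and the cutoff value are all hypothetical. For simplicity the sketch thresholds the absolute t statistic rather than the P-value; for a fixed number of degrees of freedom the two rules are equivalent, because the P-value decreases monotonically as |t| increases.

```python
import numpy as np


def fit_pca_lda(X, y, t_cutoff=2.0):
    """Fit every step on the training set only; returns a dict of parameters."""
    y = np.asarray(y)
    n = X.shape[0]
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                        # guard against constant voxels
    Z = (X - mean) / std                       # voxel-wise standardization

    # PCA via SVD: centred data from n subjects span at most n - 1 dimensions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[: n - 1]
    scores = Z @ components.T

    # Pooled-variance two-sample t statistic for each principal component.
    a, b = scores[y == 1], scores[y == 0]
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))
    keep = np.abs(t) > t_cutoff                # assumed cutoff, stands in for a P threshold
    if not keep.any():
        keep[np.argmax(np.abs(t))] = True      # always retain at least one component

    # Fisher's linear discriminant on the retained components.
    a, b = a[:, keep], b[:, keep]
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    Sw = (a - ma).T @ (a - ma) + (b - mb).T @ (b - mb)   # within-class scatter
    w = np.linalg.pinv(Sw) @ (ma - mb)
    bias = -0.5 * w @ (ma + mb)                # boundary midway between class means
    return {"mean": mean, "std": std, "components": components[keep],
            "w": w, "bias": bias}


def predict(model, X):
    """Apply the frozen training-set transform and classifier to unseen data."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"].T
    return (scores @ model["w"] + model["bias"] > 0).astype(int)


def make_synthetic(rng, n_per_group, p=500, n_signal=100, shift=2.0):
    """Synthetic 'maps': label 1 has a mean shift in the first n_signal voxels."""
    X = rng.standard_normal((2 * n_per_group, p))
    y = np.repeat([1, 0], n_per_group)
    X[y == 1, :n_signal] += shift
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_synthetic(rng, 20)
X_test, y_test = make_synthetic(rng, 10)       # never seen during fitting

model = fit_pca_lda(X_train, y_train)
pred = predict(model, X_test)

accuracy = np.mean(pred == y_test)
sensitivity = np.mean(pred[y_test == 1] == 1)  # label 1 plays the role of patients
specificity = np.mean(pred[y_test == 0] == 0)  # label 0 plays the role of controls
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f}")
```

The point of the design is that `predict` reuses the training-set mean, standard deviation, principal components, and discriminant weights unchanged, so no information from the test subjects enters any stage of feature or classifier selection. Reporting sensitivity and specificity alongside overall accuracy keeps a weak result in the smaller class from being hidden.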
When the normalized T1-weighted images were used as inputs to the classification algorithm, the classifier performed at chance level. This indicates that the spatial normalization technique (ART) used in this study was effective in removing systematic anatomical differences between patients and controls. It should be mentioned that higher classification accuracies (81% to 91%) have been reported in the literature when using purely structural scans as the basis for classification (Davatzikos et al. 2005; Fan et al. 2005; Kawasaki et al. 2007). In those studies, however, the feature vectors used for classification were gray matter density maps computed using methods similar to the voxel-based morphometry technique (Good et al. 2001), and thus our structural imaging results may not be directly comparable to prior investigations.
Brain regions that distinguished between patients and healthy volunteers are provided in the Fisher brains in . The Fisher brains were thresholded so that only voxels with an absolute value greater than 30% of the maximum absolute voxel value are shown, which avoids clutter when visualizing the brain regions that contributed to the classification. Brain regions that contributed to the group classification were mostly symmetric for both FA and MD and were evident in cortical, white matter, and ventricular regions. While MD effects were most evident in cortical and ventricular regions, FA effects included these regions and, in addition, deep white matter regions such as the external capsule. Group differences were also observed at the border between the brain and cerebrospinal fluid near the edge of the ventricles; thus, our approach may be sensitive to group differences in water diffusion properties in these regions. The brain regions that contributed to the high classification rate using the DTI scans were not simply a function of possible ventricular enlargement in patients, however, given that group differences in white matter also contributed to the classification and, most importantly, that group classification based on the structural magnetic resonance images performed only at chance level.
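The thresholding rule applied to the Fisher brains can be sketched in a few lines. The function name `threshold_map` and the small demonstration array are hypothetical; only the 30% fraction comes from the text above.

```python
import numpy as np


def threshold_map(weights, fraction=0.30):
    """Keep voxels whose absolute value exceeds `fraction` of the maximum
    absolute value in the map; set all other voxels to zero."""
    weights = np.asarray(weights, dtype=float)
    cutoff = fraction * np.abs(weights).max()
    return np.where(np.abs(weights) > cutoff, weights, 0.0)


# Toy 2 x 3 "map" standing in for a 3-D image of discriminant weights.
demo = np.array([[-1.0, 0.2, 0.31],
                 [0.05, -0.29, 0.8]])
thresholded = threshold_map(demo)
print(thresholded)
```

Because the comparison uses absolute values, strongly negative and strongly positive voxels are both retained with their signs, so the displayed map still shows the direction of each region's contribution.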
While the classical PCA/LDA algorithm produces robust classifiers as evidenced by the high classification accuracy observed in this study, recent advances in machine learning have led to the development of additional algorithms for pattern representation and classification. Thus, it would be important to evaluate other feature extraction/selection methods in conjunction with state-of-the-art classifiers, including support vector machines (De Martino et al. 2008; Gerardin et al. 2009; Sato et al. 2008). Moreover, it would be worthwhile to determine whether a classifier trained on DTI data from one center could be used to reliably classify scans obtained from another center. In this regard, FA may be a more suitable scalar measure as it has a natural scaling that may be useful for pooling data across centers.
Although the observed effects were identified in a cohort of patients in whom the diagnosis of schizophrenia was already established, an important extension of this work is the identification of individuals at high risk for developing psychosis, as anatomical changes in such patients may be observed at a young age (Gogtay et al. 2008; Thompson et al. 2001; Vidal et al. 2006). In this regard, Koutsouleris and colleagues (Koutsouleris et al. 2009) used support vector machines to correctly identify individuals who were in “at risk mental states of psychosis” as well as those individuals who transitioned into psychosis. Another valuable extension of this work would be to determine whether such classification methods distinguish among individuals with different psychiatric disorders, although this may be inherently more complex, as some research suggests that disorders such as schizophrenia and bipolar disorder share common structural alterations (Sussmann et al. 2009; McIntosh et al. 2004).
There were several study limitations that should be acknowledged. One limitation of the classification approach is that it does not readily lend itself to localizing group differences in specific regions. The methodology presented here, however, may be used as a starting point for this purpose using randomization methods (Ardekani et al. 1998). An additional study limitation is the anisotropic voxel size, which could conceivably bias FA measurements, although it should be noted that these were comparable in both groups. In addition, the DTI images were acquired with NEX=2, which does not always allow for identification of artifacts, especially those from cardiac pulsation. Additionally, patients tend to have lower IQs compared to healthy volunteers, and given evidence that IQ and cognitive abilities correlate with FA and MD (Fryer et al. 2008, Schmithorst et al. 2005), an additional possible study limitation is that the discriminant function may be sensitive to generalized brain abnormalities, such as those that may be indexed by IQ, and thus may not actually be specific to schizophrenia per se. Moreover, we were unable to examine other possible factors that could influence brain measurements, such as nutrition or antipsychotic medication, and the functional significance of the classification methodology was not investigated.