Our results indicate that supervised machine learning techniques can aid the clinical diagnosis of AD. The analytical technique presented here promises to distinguish disease-specific atrophy from that of normal aging in a standard T1-weighted structural MRI scan. Furthermore, the study provides evidence that the method can be developed to differentiate correctly between different forms of dementia.
Before comparing the method with other approaches and discussing its translation into clinical practice and the prospects for future studies, we consider some methodological aspects.
We used linear SVMs because they allow localization of the voxels most relevant to separating scans into two groups. The voxels on which whole-brain SVM classification of AD from controls depended most were clustered around the parahippocampal gyrus and parietal cortex (figure, upper panel). A similar distribution was found for all the classifications of AD from normal scans that we report here. Classification of FTLD from AD depended on voxels in frontal as well as parietal areas (figure, lower panel). A recent study using cortical thickness also found parietal areas important in differentiating these two dementia types (Du et al., 2007). The figure also shows cortical voxels scattered throughout the brain without any regionally specific pattern. They are nevertheless specific in the sense that they too contributed to the differentiation between the two groups. It can therefore be argued that they also reflect differences in overall brain shape resulting from degeneration of specific structures. We tested non-linear kernels (such as radial basis functions), but these failed to improve performance, suggesting that a linear approach is both valid and adequate. The excellent results obtained when training and testing on scans from different centres with different scanners show that linear SVMs generalize well.
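The localization property of a linear SVM follows from its decision function being a weighted sum over voxels, with the weight vector given by w = sum_i alpha_i y_i x_i over the support vectors. The following is a minimal sketch of that relationship, not the pipeline used in this study. It assumes the dual coefficients and support-vector scans are already available from a trained classifier; the function names, the label coding and the toy three-voxel scans are hypothetical.

```python
def weight_map(alphas, labels, support_scans):
    """Return w = sum_i alpha_i * y_i * x_i, one weight per voxel."""
    n_voxels = len(support_scans[0])
    w = [0.0] * n_voxels
    for alpha, y, scan in zip(alphas, labels, support_scans):
        for v in range(n_voxels):
            w[v] += alpha * y * scan[v]
    return w


def decision_value(w, bias, scan):
    """Linear decision score w.x + b; its sign gives the predicted group."""
    return sum(wv * xv for wv, xv in zip(w, scan)) + bias


def most_relevant_voxels(w, k):
    """Indices of the k voxels with the largest absolute weight."""
    return sorted(range(len(w)), key=lambda v: abs(w[v]), reverse=True)[:k]


# Toy data (hypothetical): two support vectors, three voxels each.
alphas = [0.5, 0.5]
labels = [+1, -1]  # assumed coding: +1 = patient, -1 = control
support_scans = [[1.0, 0.0, 3.0], [0.0, 0.0, 1.0]]

w = weight_map(alphas, labels, support_scans)
top = most_relevant_voxels(w, 2)
score = decision_value(w, 0.0, [1.0, 1.0, 1.0])
```

Voxels with large absolute weights are those on which the classification depends most. A non-linear kernel has no such single weight per voxel, which is one practical reason to prefer the linear case when localization matters.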
The results we obtained are comparable to or better than those of other MR image-based classification methods described in the literature (Gosche et al., 2002; Jack et al., 2002; Barnes et al., 2004; Csernansky et al., 2004; Wahlund et al., 2005), most of which restrict analysis to temporal lobe structures. Our method also performs as well as or better than the average reported diagnostic accuracy of clinicians using clinical examination, history, neuropsychological testing and classical image reporting as outlined in NINCDS-ADRDA or DSM-III-R (Knopman et al., 2001). To draw this conclusion firmly, however, a formal comparison with modern conventional clinical assessment is required. The construction of a ‘library’ of SVMs for all dementia types can be envisaged to help differentiate conditions that can be confused with AD, e.g. vascular dementia, FTLD and Lewy body disease. The preliminary result from our attempt to separate FTLD and AD is very promising because FTLD is a group of degenerative diseases that also affect frontal and temporal lobes but differ in extent and neuropathological characteristics. AD and FTLD can be difficult to separate clinically, and patients with confirmed AD pathology have been shown to present with a focal clinical syndrome. A recent postmortem study found that up to 30% of patients diagnosed with the language subtype of FTLD (progressive non-fluent aphasia or semantic dementia) had AD pathology (Knibb et al., 2006). One limitation of our FTLD versus AD classification is that we did not test pure pathological subtypes of FTLD separately. A suitably trained SVM classifier may distinguish some subtypes of FTLD from AD better than others. Our sample size is too small to explore this question further.
Unlike methods that include an expert-dependent hippocampal tracing step (Jack et al., 1992), the SVM technique is fully automated and can use all the information in a brain scan. Automation removes observer and experimenter bias from the classification step, gives reproducible results when rerun on the same image set, and makes the method much less labour-intensive. These are important characteristics for a method proposed for clinical use.
Our findings warrant application of the proposed methods to larger image sets such as those being collected for the Alzheimer’s Disease Neuroimaging Initiative (ADNI; Mueller et al., 2005) for several reasons. The cases from group I are more typical of community-based samples, with a later age of onset, whereas cases from group II are more typical of referral centres, with greater numbers of early onset cases. That we obtained comparable results, and could even train on images from one scanner and test on images from another, suggests that the technique will generalize for use in clinical settings. However, when the relatively younger subjects from group II are used for training, specificity goes down. This result is attributable to the fact that group II included more early onset AD patients, who may show a somewhat different pattern of degeneration, i.e. relatively more parietal involvement (Schott et al., 2006; Frisoni et al., 2007). Because of their younger age, subjects from group II are also less likely to have co-morbidity (e.g. subtle vascular changes) but possibly have more AD-related atrophy for the same MMSE. A limitation of the population used in group III is that the clinical diagnoses were probably not 100% accurate, as no pathological verification was available for this group. Previous studies have shown that clinical diagnosis is inaccurate, compared with pathological diagnosis, in about 11% of mild cases in which diagnostic criteria similar to those in our sample were used (Salmon et al., 2002). We therefore speculate that some of the misclassification is in fact due to misdiagnosis in mildly affected AD patients. The ability to generalize across image sets from different centres is very important in this respect, as it could facilitate the generation of SVMs for rarer forms of dementia based on reliable diagnoses.
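The effect of imperfect reference diagnoses on measured accuracy can be illustrated with simple arithmetic. The sketch below is an illustration only, not an analysis from this study. It assumes a two-class problem in which clinical labelling errors occur independently of classifier errors, and it uses the 11% figure purely as an example error rate; the function name is hypothetical.

```python
def apparent_accuracy(true_accuracy, label_error_rate):
    """Expected agreement between classifier output and noisy clinical labels.

    In the binary case the classifier agrees with the clinical label when
    both are right or when both are wrong (independent errors assumed).
    """
    a, e = true_accuracy, label_error_rate
    return a * (1.0 - e) + (1.0 - a) * e


# A classifier that always matches pathology, scored against labels
# that are wrong 11% of the time (example rate).
ceiling = apparent_accuracy(1.0, 0.11)

# A classifier that matches pathology 95% of the time (hypothetical value).
mixed = apparent_accuracy(0.95, 0.11)
```

Under these assumptions, a classifier that always agreed with pathology would still appear to be only about 89% accurate when scored against such clinical labels, so part of the measured error may reflect the reference standard rather than the classifier.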
It will be a matter of judgement and empirical verification whether to use whole brain data, partial data or a combination of the two for diseases other than AD. For the time being, we recommend an exploratory whole brain approach as a necessary initial step. We expect that the earlier the stage or the more localized a disease, the more a well-placed VOI will improve categorization. In group III, classification by SVM improved substantially when analysis was restricted to the medial temporal lobes, because non-contributory, noisy brain areas were excluded. However, reducing the brain volume analysed risks excluding potentially important differential image features. Combined kernels from whole brain and VOI therefore serve to retain information obtained from the whole brain while weighting the classification towards the area of brain most relevant at early stages of disease. The implication for more generalized diseases is that the opposite will be true. To examine this, we tested groups I and II using a medial temporal area analysis and found that classification was slightly worse for leave-one-out and much worse when using one data set to train and another to test. As many forms of dementia involve hippocampal atrophy, accurate differential classification of the dementias may well need whole brain analysis.
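One simple way to combine whole brain and VOI information is a weighted sum of two linear kernels, which is itself a valid kernel when the weights are non-negative. The sketch below illustrates that idea under stated assumptions and is not necessarily the combination used in this study: the convex weighting, the parameter name `whole_weight`, the function names and the toy scans are all hypothetical.

```python
def linear_kernel(scans, voxel_indices=None):
    """Gram matrix of dot products, optionally restricted to a voxel subset."""
    if voxel_indices is not None:
        scans = [[scan[v] for v in voxel_indices] for scan in scans]
    return [[sum(a * b for a, b in zip(s, t)) for t in scans] for s in scans]


def combine_kernels(k_whole, k_voi, whole_weight):
    """Convex combination of two Gram matrices of the same size.

    whole_weight in [0, 1] is a hypothetical tuning parameter:
    1.0 gives the whole brain kernel, 0.0 gives the VOI kernel.
    """
    if not 0.0 <= whole_weight <= 1.0:
        raise ValueError("whole_weight must lie in [0, 1]")
    n = len(k_whole)
    return [
        [whole_weight * k_whole[i][j] + (1.0 - whole_weight) * k_voi[i][j]
         for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): two scans, three voxels; voxel 1 plays the VOI.
scans = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
voi = [1]

k_whole = linear_kernel(scans)
k_voi = linear_kernel(scans, voi)
k_combined = combine_kernels(k_whole, k_voi, 0.5)
```

For linear kernels this combination is equivalent to a single linear kernel in which voxels outside the VOI are scaled by the square root of `whole_weight` while VOI voxels keep full weight, which matches the intent of retaining whole brain information while emphasizing the region of interest. The combined matrix could then be passed to any SVM implementation that accepts a precomputed kernel.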
A goal of automated MR image analysis based on machine learning, and one that we believe is achievable, is better sensitivity and specificity of ante-mortem diagnosis than is currently possible. The method we have described clearly has potential for achieving more accurate dementia diagnosis in clinical practice. Although the processing and preparation of a training data set is relatively time-consuming (around a week for all data sets in this study on a standard PC at 2.4 GHz), this is unlikely to be a limiting factor. Firstly, this is computer processing time without user interaction. Secondly, once a training data set is prepared, spatial normalization and classification of any new scan can be done in a matter of minutes. The time required is likely to shorten further with the advent of faster computers. The current implementation still requires a user to check intermediate results for misregistration. This step can be further automated by introducing thresholds that alert an operator to check image quality. Although it has been suggested that MRI is likely to help diagnosis in specialty clinics only (Wahlund et al., 2005), we see no reason why the method cannot be translated to a more general setting, since a training set of pathologically confirmed cases can come from a specialist centre and the method is computer-based, automated and does not require expert anatomical knowledge.
Future studies will focus on the application of SVMs to aid differential diagnosis in situations where more than two diagnoses are possible. Stratification of patients by anatomical severity can be envisaged. The limits of sensitivity, for example in predicting which MCI patients will convert to AD, also need to be defined. Encouraging results with other multivariate classification methods have recently been reported (Teipel et al., 2007).