We have demonstrated in this paper that the fully automatic hippocampus segmentation method presented here is accurate for data from patients and normal subjects acquired on a variety of MRI platforms, as assessed by a systematic qualitative evaluation process (the segmentation proved correct in 63%, acceptable in 31% and not satisfactory in 6% of the cases). It has also proven useful in discriminating between cognitively normal subjects, patients with MCI and patients with AD in a setting that corresponds better to clinical routine. The present study confirms the results reported in Chupin et al (2008), while being applied to more realistic datasets. Furthermore, the segmentation process is fast (15 minutes, including 10 for the registration and 5 for bilateral segmentation) and is implemented as the SACHA module in a user-friendly environment (http://brainvisa.info).
Most importantly, no parameter tuning or atlas modification was necessary relative to Chupin et al (2008). The hybrid anatomical and probabilistic priors make the segmentation more robust to pathology and acquisition parameters than the semi-automatic method (Chupin et al, 2007). Furthermore, the partial integration of probabilistic maps as a constraint in the deformation process makes it more robust to pathology than methods that rely more strongly on a single atlas. In fact, it was previously demonstrated that segmentation based on the registration of a single-subject atlas does not perform satisfactorily when the atlas does not belong to the same disease category as the subject (Carmichael et al, 2005).
Validation studies on the segmentation of the hippocampus in patients with AD are limited and difficult to compare because of different patient samples and evaluation strategies (Hsu et al, 2002; Crum et al, 2001). Recently, a method based on the registration and segmentation module of SPM5 (Firbank et al, 2008) was evaluated on 9 elderly controls, with an RV of 5% and a DO of 74%, and on 9 patients with AD, with an RV of 15% and a DO of 67%. Another method, based on finding the best match amongst a library of templates (Barnes et al, 2008), with a refinement step based on intensity, was evaluated on 19 elderly controls, with a DO of 82%, and on 36 patients with AD, with a DO of 84%. Finally, a method based on statistically learned image features was evaluated on 21 subjects (7 controls, 7 patients with MCI and 7 patients with AD) from the ADNI database (Morra et al, 2008), and a DO value of 83% was reported.
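The two indices quoted above can be illustrated with a minimal sketch. It assumes that DO denotes the Dice overlap, 2|A ∩ M| / (|A| + |M|), and that RV denotes the relative volume error, 2 |V_A - V_M| / (V_A + V_M), between an automatic segmentation A and a manual reference M; these definitions are not restated in this section, and the cited studies may use slightly different formulas. The function names and the toy voxel sets are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice_overlap(auto: Set[Voxel], manual: Set[Voxel]) -> float:
    """Dice overlap 2|A ∩ M| / (|A| + |M|), assumed to correspond to DO."""
    total = len(auto) + len(manual)
    if total == 0:
        raise ValueError("both segmentations are empty")
    return 2.0 * len(auto & manual) / total


def relative_volume_error(auto: Set[Voxel], manual: Set[Voxel]) -> float:
    """Relative volume error 2|V_A - V_M| / (V_A + V_M), assumed to correspond to RV.

    Volumes are counted in voxels, so both masks must share the same voxel grid.
    """
    total = len(auto) + len(manual)
    if total == 0:
        raise ValueError("both segmentations are empty")
    return 2.0 * abs(len(auto) - len(manual)) / total


# Toy data (hypothetical): a 4 x 5 manual mask and a 3 x 5 automatic mask
# that lies entirely inside it.
manual_mask = {(x, y, 0) for x in range(4) for y in range(5)}   # 20 voxels
auto_mask = {(x, y, 0) for x in range(1, 4) for y in range(5)}  # 15 voxels

do_value = dice_overlap(auto_mask, manual_mask)           # 30 / 35
rv_value = relative_volume_error(auto_mask, manual_mask)  # 10 / 35
print(f"DO = {do_value:.1%}, RV = {rv_value:.1%}")
```

Note that a low RV does not imply a high DO: two masks of equal volume that are shifted relative to each other have an RV of zero but an imperfect overlap, which is why the two indices are reported together.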
Using fully automatic volumetry of the hippocampus, we were able to discriminate patients with AD from controls with 76 to 80% accuracy in the present study. This remains in line with previous results based on manual segmentation, which report accuracies between 82% and 90% for AD, e.g. Frisoni et al (1999), Xu et al (2000). As for automatic methods, very few studies have investigated the classification of individual patients. Fischl et al (2002) detected significant group differences in hippocampal volume but did not investigate the classification of individual participants. Using both volume and shape features, Csernansky et al (2000) reported a sensitivity of 83% and a specificity of 78%. The accuracy that we report for MCI (61 to 63% for the whole group and 71 to 74% for the patients converting to AD before 18 months) is also comparable to that obtained using manual segmentation (between 60% and 74%, e.g. Xu et al (2000) and Pennanen et al (2004)). Compared to our results reported in Chupin et al (2008) and Colliot et al (2008) (87% for AD vs controls, 74% for MCI vs controls), the classification accuracies obtained here on the ADNI database are slightly lower. This can be explained by several factors. First, ADNI is a multi-centre database (41 centres, different voxel sizes and acquisition parameters) whereas the data in our previous study came from a single scanner. Moreover, the population includes a large number of subjects with vascular lesions, and is thus closer to real-life datasets. The systematic quality control procedure allowed us to establish that the cases that were not classified consistently did not always correspond to cases with an unsatisfactory segmentation. Excluding the unsatisfactory segmentations did not improve the classification results, which is likely due to the intrinsic variability of hippocampal volume within the study population.
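The relationship between the sensitivity, specificity and accuracy figures quoted above can be made concrete with a minimal sketch. It is not the classifier used in this study: a single volume threshold stands in for it, and the function name, the volumes and the threshold are all hypothetical values chosen for illustration.

```python
from typing import Dict, Sequence


def threshold_classification(
    patient_volumes: Sequence[float],
    control_volumes: Sequence[float],
    threshold: float,
) -> Dict[str, float]:
    """Label a subject as 'patient' when the hippocampal volume is below threshold.

    Returns sensitivity (patients correctly labelled), specificity (controls
    correctly labelled) and overall accuracy.
    """
    if not patient_volumes or not control_volumes:
        raise ValueError("both groups must contain at least one subject")
    true_pos = sum(1 for v in patient_volumes if v < threshold)
    true_neg = sum(1 for v in control_volumes if v >= threshold)
    n_patients = len(patient_volumes)
    n_controls = len(control_volumes)
    return {
        "sensitivity": true_pos / n_patients,
        "specificity": true_neg / n_controls,
        "accuracy": (true_pos + true_neg) / (n_patients + n_controls),
    }


# Hypothetical hippocampal volumes in cm^3, not taken from the study.
ad_volumes = [1.6, 1.8, 2.0, 2.3, 2.6]
cn_volumes = [2.1, 2.5, 2.7, 2.9, 3.0, 3.1]

scores = threshold_classification(ad_volumes, cn_volumes, threshold=2.4)
for name, value in scores.items():
    print(f"{name}: {value:.1%}")
```

Because the groups overlap in volume, no threshold separates them perfectly; moving the threshold trades sensitivity against specificity, and accuracy additionally depends on the relative sizes of the two groups.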
We compared the classification accuracy obtained with volumes from our method to that obtained with SNT volumes. Accuracy was similar for converter vs non-converter MCI, while SNT volumes were slightly more discriminative for AD vs CN (80% vs 76%). Note that our approach is fully automatic and fast, while the SNT approach requires the placement of more than 22 landmarks per hippocampus, and unsatisfactory SNT segmentations were manually edited.
In the second experiment, we have also shown that the pre-processing steps that we kept from those available in ADNI have an effect on the segmentation and/or the classification results. These correction steps appear useful in the present study. Furthermore, we have shown that normalisation by total intracranial volume (TIV) does not improve classification results; this may be due to errors in the TIV values (arising from CSF segmentation), or to the absence of a linear relationship between the volume of the hippocampus and TIV.
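The sensitivity of a normalised volume to errors in TIV can be sketched as follows. The sketch assumes that normalisation is a simple ratio of hippocampal volume to TIV, which this section does not specify; the function name and the numerical values are hypothetical.

```python
def normalise_by_tiv(hippocampal_volume_cm3: float, tiv_cm3: float) -> float:
    """Ratio normalisation (assumed form): hippocampal volume divided by TIV."""
    if tiv_cm3 <= 0:
        raise ValueError("TIV must be positive")
    return hippocampal_volume_cm3 / tiv_cm3


# Hypothetical subject: 2.5 cm^3 hippocampus, 1500 cm^3 intracranial volume.
volume_cm3 = 2.5
true_tiv_cm3 = 1500.0

reference = normalise_by_tiv(volume_cm3, true_tiv_cm3)

# Suppose the CSF segmentation overestimates TIV by 5% (hypothetical error).
biased = normalise_by_tiv(volume_cm3, true_tiv_cm3 * 1.05)

# The error is passed on to the normalised volume even though the
# hippocampal segmentation itself is unchanged.
relative_change = (biased - reference) / reference
print(f"normalised volume changes by {relative_change:.1%}")
```

Under this ratio form, any error in TIV is transferred to the normalised volume, so the normalisation can add noise rather than remove it when TIV is poorly estimated.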
Groups are not matched for age and gender, and the database includes more controls than patients older than 80 years; we have shown that age affects the classification results. Furthermore, controls over 80 correspond to a far more variable population than younger controls, as most of them will have cerebral atrophy and ventricular enlargement and are likely to have incipient dementia. Age groups should be taken into account when devising a diagnostic tool, and careful consideration should be given to the age range.
Finally, the results comparing MCI converters and non-converters show that the hippocampus conveys useful information for designing prognostic tools. Nevertheless, hippocampal volume alone is not yet sufficient to discriminate the two populations completely. Shape analysis and/or classification methods using both local and global information may provide complementary information and improve classification reliability.