In current guidelines (Dubois et al., 2007), the diagnostic criteria for probable Alzheimer's disease (AD) require the presence of both episodic memory impairment and at least one supportive feature: medial temporal lobe atrophy, an abnormal cerebrospinal fluid (CSF) biomarker, a specific pattern on PET, or a proven AD autosomal dominant mutation. In addition, the guidelines specify a list of exclusion criteria. Similar components can also be found in the recent EFNS guideline (Waldemar et al., 2007). A revision of the criteria for AD, mild cognitive impairment (MCI) and preclinical AD is also under way and will place further emphasis on biomarkers and imaging.
In the medial temporal lobe (MTL), volume loss in the hippocampi, entorhinal cortex and amygdala is a hallmark of AD. The guidelines (Dubois et al., 2007) suggest that the volume loss is “evidenced on MRI with qualitative ratings using visual scoring”. Qualitative, subjective ratings may, however, produce different results between interpreters, and even a single interpreter's diagnosis may vary when the images are re-examined. There is therefore a clear need for objective methods for assessing hippocampal volume. Although automated tools are being actively developed by many research groups, building robust, accurate and fast automatic methods is a highly challenging problem, and such methods are still largely absent from clinical practice.
Several methods have been published for segmenting the hippocampus (Chupin et al., 2009a; Fischl et al., 2002; Lötjönen et al., 2010; Morra et al., 2008; van der Lijn et al., 2008; Wolz et al., 2010a). All these methods segment the hippocampus as a whole, although in reality it contains sub-structures. However, these sub-structures are difficult to segment accurately from most images currently available in clinical practice. Because one of the main objectives of this work is to develop tools for clinical decision making, we concentrate on segmenting the hippocampus as a single structure.
Although many published methods are promising, their evaluation still leaves room for interpretation in terms of accuracy, robustness and computational speed. First, there is no true gold standard for defining segmentation accuracy. Currently, manual segmentations by clinical experts serve as the clinical gold standard for hippocampal segmentation. Therefore, if the difference between automatically and manually generated segmentations equals the difference between two manual segmentations, the automatic segmentation is typically considered as accurate as manual segmentation. There are numerous ways to characterize segmentation accuracy: overlap measures between manually and automatically generated segmentations, such as the Dice similarity index, recall and precision; distances between the surfaces of the objects; differences in object volumes; or differences in the ability to assign a subject to the correct class or group. Classification accuracy is an important measure if the ultimate goal is to use a given biomarker in diagnostics. Classification accuracy, however, reflects the robustness of the segmentation rather than its accuracy as such. For example, if an automatic method is consistent but systematically overestimates the volume, i.e., the measure is biased, the segmentation accuracy is obviously decreased. As long as the decision threshold is derived from volumes produced by the same method, this systematic and consistent error does not affect classification accuracy or the ability to detect a statistical difference between two populations. A less robust or less consistent algorithm, in contrast, introduces noise into the measurements and thus makes the classification less accurate. In diagnostics, the consistency of segmentation is therefore even more important than ensuring that the segmentation is unbiased.
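The overlap measures named above and the argument about bias can both be made concrete. The sketch below computes Dice, recall and precision for two binary masks, and then shows that adding the same constant to every volume leaves the accuracy of a threshold classifier unchanged when the threshold is learned from those same volumes. The function names and the simple "patient if volume < t" rule are illustrative assumptions, not the evaluation protocol used in this work.

```python
import numpy as np

def overlap_measures(auto, manual):
    """Dice, recall and precision between an automatic and a manual binary mask."""
    auto = np.asarray(auto, dtype=bool)
    manual = np.asarray(manual, dtype=bool)
    tp = np.logical_and(auto, manual).sum()       # voxels labelled by both
    dice = 2.0 * tp / (auto.sum() + manual.sum())
    recall = tp / manual.sum()                    # share of the manual mask recovered
    precision = tp / auto.sum()                   # share of the automatic mask that is correct
    return float(dice), float(recall), float(precision)

def best_threshold_accuracy(volumes, is_patient):
    """Accuracy of the best rule 'patient if volume < t', with t chosen from the data.

    Hypothetical helper: every split of the sorted volumes is tried as a threshold.
    """
    volumes = np.asarray(volumes, dtype=float)
    is_patient = np.asarray(is_patient, dtype=bool)
    candidates = np.concatenate([np.sort(volumes), [np.inf]])
    accuracies = [np.mean((volumes < t) == is_patient) for t in candidates]
    return float(max(accuracies))
```

A constant bias shifts every volume and every candidate threshold by the same amount, so `best_threshold_accuracy(v, y)` and `best_threshold_accuracy([x + 0.5 for x in v], y)` agree. Subject-specific noise, by contrast, can reorder the volumes and change which subjects fall below the threshold.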
Because there are different protocols for manual segmentation of the hippocampus, even the clinical gold standards are biased relative to each other; efforts to harmonize these protocols are ongoing (Boccardi et al., 2010). All these measures may lead to conflicting interpretations, which sometimes makes evaluating results cumbersome. Second, methods are often validated using a relatively small database or otherwise constrained data, e.g., from a single site or acquired with devices from only one manufacturer. A clear problem in evaluating accuracy is the limited number of manually segmented cases available, because producing a representative set of manual segmentations is highly laborious. These issues make it difficult to evaluate robustness extensively under real clinical conditions. Third, many scientific publications do not consider the computation time of a segmentation method, although it is a relevant issue in clinical practice. Computation times of hours, a requirement for special computer facilities, or a need for careful and laborious parameter tuning all reduce the feasibility of a method in the clinical setting. In summary, demonstrating the usefulness of a method for clinical practice is laborious and often still leaves room for interpretation.
Atlas-based segmentation is a commonly used technique for segmenting image data. In atlas-based segmentation, an intensity template is registered non-rigidly to an unseen image, and the resulting transformation is used to propagate the tissue-class or anatomical-structure labels of the template into the space of the unseen image. Segmentation accuracy can be improved considerably by combining basic atlas-based segmentation with techniques from machine learning, e.g., classifier fusion (Heckemann et al., 2006; Klein et al., 2005; Rohlfing et al., 2004; Warfield et al., 2004). In this approach, several atlases from different subjects are registered to the unseen data, and for each voxel the label predicted by the majority of the warped atlases is used in the final segmentation of the unseen image. In a comparison study, this multi-atlas segmentation was shown to produce the best segmentation accuracy for subcortical structures (Babalola et al., 2008). However, the major drawback of multi-atlas segmentation is its computational cost. For example, van der Lijn et al. (2008) reported computation times of several hours for multi-atlas segmentation.
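The majority-voting fusion step described above can be sketched in a few lines. This is a minimal illustration that assumes the atlas label maps have already been warped into the space of the unseen image and that labels are non-negative integers; `majority_vote` is a hypothetical name, and ties are broken in favour of the smallest label, which is an arbitrary choice.

```python
import numpy as np

def majority_vote(label_maps):
    """Fuse warped atlas label maps by per-voxel majority voting.

    label_maps: sequence of integer arrays of identical shape, one per atlas,
    already propagated into the space of the unseen image.
    Ties are resolved towards the smallest label (np.argmax picks the first maximum).
    """
    stack = np.stack([np.asarray(m, dtype=np.int64) for m in label_maps])
    n_labels = int(stack.max()) + 1
    # votes[l, ...] counts how many atlases assign label l to each voxel
    votes = np.stack([(stack == label).sum(axis=0) for label in range(n_labels)])
    return votes.argmax(axis=0)
```

The cost highlighted in the text lies almost entirely in the registrations that produce `label_maps`; the fusion itself is cheap.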
In Lötjönen et al. (2010), we recently presented a method for fast and robust multi-atlas segmentation of volumetric image data. The tool was based on a fast non-rigid registration algorithm, atlas selection, and the use of intensity information via graph-cut or expectation maximisation (EM) algorithms. Both atlas selection and intensity modeling improved the segmentation accuracy significantly. The computation time for segmenting the hippocampus was 3–4 minutes on an 8-core workstation, which is clearly shorter than in many published methods and no longer a limiting factor in many applications. However, an even shorter computation time would make online segmentation more attractive in clinical practice and allow more freedom in planning clinical workflows. Other requirements for clinical use are that no manual tuning of segmentation parameters should be needed, and that complex, expensive computer facilities and maintenance should not be required. In this work, we propose two major methodological contributions to our previously published method: 1) the use of an intermediate template space between the unseen data and the atlas spaces to shorten the computation time, and 2) the use of partial volume modeling in hippocampus segmentation to improve classification accuracy.
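To illustrate how intensity information can refine an atlas-based result with EM, the sketch below runs a generic two-class EM in which each voxel's prior probability of belonging to the structure comes from the atlases (for example, the fraction of warped atlases labelling it as hippocampus) and each class has a Gaussian intensity model. This is a textbook atlas-guided EM written under those assumptions; it is not the exact formulation of Lötjönen et al. (2010), and the function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def atlas_guided_em(intensities, prior_fg, n_iter=20):
    """Two-class EM with a voxel-wise spatial prior from the atlases.

    intensities: 1-D array of voxel intensities in a region of interest.
    prior_fg: prior probability of the structure at each voxel (assumed to be
              derived from the warped atlas labels).
    Returns (posterior probability of the structure, [(mu, sigma) per class]).
    """
    x = np.asarray(intensities, dtype=float)
    prior = np.clip(np.asarray(prior_fg, dtype=float), 1e-6, 1.0 - 1e-6)
    post = prior.copy()  # initialise the posteriors from the atlas prior
    for _ in range(n_iter):
        # M-step: weighted mean and standard deviation of structure and background
        params = []
        for w in (post, 1.0 - post):
            mu = np.sum(w * x) / np.sum(w)
            sigma = np.sqrt(np.sum(w * (x - mu) ** 2) / np.sum(w)) + 1e-6
            params.append((mu, sigma))
        # E-step: combine the spatial prior with the intensity likelihoods
        fg = prior * gaussian_pdf(x, *params[0])
        bg = (1.0 - prior) * gaussian_pdf(x, *params[1])
        post = fg / (fg + bg)
    return post, params
```

Thresholding the returned posterior at 0.5 gives a hard segmentation; the spatial prior keeps voxels far from the atlas-predicted structure from being pulled in purely on intensity.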
In Lötjönen et al. (2010), atlas selection was performed first: the unseen data and all atlases were registered non-rigidly to a template, and the atlases most similar to the unseen data were selected. Then, multi-atlas segmentation was applied: each selected atlas was registered separately and non-rigidly to the unseen data, and classifier fusion was performed. The innovation of our current work is that the transformations computed in the atlas-selection step are used to initialize the transformations when registering the atlases to the unseen data. The process becomes much faster because only small refinements of the transformations from the atlases to the unseen data are needed. An intermediate template space, such as the one used in our atlas-selection step, has previously been used to speed up and improve the accuracy of non-rigid registration by Tang et al. (2010), using initialization based on principal component analysis, and by Rohlfing et al. (2009), using subject-specific templates generated by a regression model.
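The two uses of the template space can be sketched as follows. Atlas selection ranks atlases by their similarity to the unseen image once both are in template space; here normalized cross-correlation is assumed as the similarity measure, since the text does not specify one. The initialization then routes the atlas-to-unseen mapping through the template. For brevity the sketch uses 4×4 homogeneous affine matrices; the method itself uses non-rigid transformations, whose composition is analogous but requires resampling deformation fields. All function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two images sampled on the same grid."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_atlases(unseen_in_template, atlases_in_template, k):
    """Indices of the k atlases most similar to the unseen image in template space."""
    scores = np.array([ncc(unseen_in_template, a) for a in atlases_in_template])
    return np.argsort(-scores)[:k].tolist()

def initial_atlas_to_unseen(atlas_to_template, unseen_to_template):
    """Initial atlas -> unseen mapping composed through the template space.

    Both inputs are 4x4 homogeneous matrices mapping points into template space.
    If T_u q = T_a p, then q = inv(T_u) @ T_a @ p.
    """
    return np.linalg.inv(unseen_to_template) @ atlas_to_template
```

Because the composed transform already aligns each selected atlas approximately with the unseen image, the subsequent registration only has to correct the residual misalignment, which is the source of the speed-up described above.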
The volume of the hippocampus is typically 1–3 ml in elderly subjects, including Alzheimer's disease cases. In a typical clinical setting, the voxel size of MR images is around 1×1×1 mm³, which means that the hippocampus is represented by only 1000–3000 voxels. Up to 80–90% of these voxels lie on the surface of the object, so the partial volume effect may dramatically affect the volume estimate. Multiple approaches have been published for estimating partial volume effects in the EM framework (Acosta et al., 2009; Shattuck et al., 2001; Tohka et al., 2004). In this work, we used the method proposed by Tohka et al. (2004).
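The arithmetic above and the reason partial volume matters can be illustrated with a deliberately simplified model. Since 1 ml = 1000 mm³, a 1–3 ml structure at 1 mm³ per voxel covers 1000–3000 voxels. The sketch then assumes that each voxel's intensity is a linear mixture of two pure-tissue means and compares a hard-label volume with a fraction-weighted volume. This linear mixing model and the function names are illustrative assumptions only; the method of Tohka et al. (2004) instead models partial volume classes within an EM framework.

```python
import numpy as np

VOXEL_VOLUME_MM3 = 1.0 * 1.0 * 1.0  # 1x1x1 mm^3 voxels
MM3_PER_ML = 1000.0                 # 1 ml = 1000 mm^3

def voxels_for_volume(volume_ml, voxel_volume_mm3=VOXEL_VOLUME_MM3):
    """Number of voxels covered by a structure of the given volume."""
    return volume_ml * MM3_PER_ML / voxel_volume_mm3

def pv_fractions(intensities, mu_structure, mu_background):
    """Structure fraction per voxel, assuming x = a*mu_structure + (1 - a)*mu_background."""
    x = np.asarray(intensities, dtype=float)
    alpha = (x - mu_background) / (mu_structure - mu_background)
    return np.clip(alpha, 0.0, 1.0)

def volume_estimates_ml(intensities, mu_structure, mu_background,
                        voxel_volume_mm3=VOXEL_VOLUME_MM3):
    """Return (hard-label volume, partial-volume-weighted volume) in ml."""
    alpha = pv_fractions(intensities, mu_structure, mu_background)
    hard = np.count_nonzero(alpha >= 0.5) * voxel_volume_mm3 / MM3_PER_ML
    soft = alpha.sum() * voxel_volume_mm3 / MM3_PER_ML
    return float(hard), float(soft)
```

With 1000 pure voxels and 1000 boundary voxels whose true structure fraction is 0.4, hard labelling reports 1.0 ml whereas the fraction-weighted estimate gives 1.4 ml. When most voxels are on the boundary, the choice between the two can change the volume by tens of percent.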
In addition to the methodological contributions, we use large data cohorts to demonstrate the performance of automatically computed hippocampal volumes 1) in the diagnosis of Alzheimer's disease and 2) in comparison with semi-automatically generated volumes. Data from almost 1000 cases originating from three different patient cohorts are used; for comparison, only 60 cases were used in our previous paper (Lötjönen et al., 2010).
In this article, we first introduce a method that uses the template space to speed up computation, and an approach for modeling the partial volume effect. We then describe the data used and the experiments performed. Finally, the results are presented and discussed.