Hippocampal segmentation is a key step in many medical imaging studies for statistical comparison of anatomy across populations, and for tracking group differences or changes over time. Specifically in Alzheimer’s disease, hippocampal volume and shape measures are commonly used to examine the 3D profile of early degeneration, and detect factors that predict imminent conversion to dementia [2
]. Early detection of AD has grown in importance over the last decade because of the acknowledged benefits of treating patients before severe degeneration has occurred [11
]. In epilepsy, hippocampal shape measures computed from a pre-operative scan can also predict whether patients will be seizure-free following surgical treatment [30
]. A broad range of ongoing neuroscientific studies have used hippocampal surface models to examine the trajectory of childhood development [19
], childhood-onset schizophrenia [40
], autism [39
], Alzheimer’s disease and mild cognitive impairment [6
], drug-related degeneration in methamphetamine users [57
], and hypertrophic effects of lithium treatment in bipolar illness [5
]. Hippocampal models are also used in genetic studies that seek anatomical shape signatures associated with increased liability for illness, providing measures to assist in the search for genes influencing hippocampal morphology [37
]. There has also been work developing algorithms for 3D nonlinear registration or computational matching of hippocampal surfaces, based on elastic flows in the surface parameter space [68
], direct surface matching using exterior calculus approaches [63
], spherical harmonic approaches [21
], or level-set approaches and intrinsic shape context measures to constrain 3D harmonic mappings [51].
One of the first steps in all of these methods is segmenting the hippocampus from a 3D brain MRI scan. Despite much active work on the computational anatomy of the hippocampus, segmentation is still commonly performed manually by human experts. Manual tracing is difficult and time-consuming, so automating this process is highly desirable. As a result, several partially or fully automated approaches have been proposed to segment the hippocampus, but none is currently in wide use.
Semi-automatic methods still require some user input and therefore some amount of expert knowledge. Hogan et al. [23
] used a deformable template approach to elastically deform a hippocampal model to match its counterpart in a target scan. This method was successful, but required 10–15 minutes of user interaction to define both global and hippocampus-specific landmarks. Another approach by Yushkevich et al. (ITK-SNAP) [67
] used active surface methods implemented in a level-set framework. In ITK-SNAP, the user must first determine an approximate boundary for the structure of interest, and the final segmentation depends to some extent on the starting position of the active surface. Moreover, the deforming surface is driven by an intensity-based energy minimization functional, which makes it very difficult to segment a structure like the hippocampus, as local intensity information is not sufficient to determine the hippocampal boundary, particularly at its junction with the amygdala. Shen et al. [49
] also used an active contour method augmented by a priori
shape information. Nevertheless, such methods are still subject to some of the same limitations as ITK-SNAP, in that they require some user initialization.
Fully automatic methods do not require any user input, and are usually based on extracting and combining some set of image features to determine the structure boundary. Commonly used features include image intensity, gradients, curvatures, tissue classifications, local filters, and spectral decompositions (e.g., wavelet analysis). However, determining which features are informative for segmentation, and how to combine them, is difficult without expert knowledge of the problem domain; without well-chosen features for each problem, segmentation becomes very difficult. Lao et al. [28
] used a multispectral approach to segment white matter lesions based on co-registered MRI scans with different T1- and T2-dependent contrasts. They used SVMs to combine the intensity profiles of these different scans, and performed multivariate classification in the joint signal space. This will only work if segmentation is possible using these specific MRI signals alone, which in general is not the case. Powell et al. [43
] also used SVMs and artificial neural networks to segment the hippocampus. Although they report very good segmentation performance for their data, their test set is small (5 brains) and they use 25 manually selected features, so generalization to other datasets is not guaranteed. Golland et al. [20
] proposed using a large feature pool, with Principal Component Analysis (PCA) to reduce its dimensionality, followed by an SVM for classification. PCA does not necessarily choose features that are well-suited for segmentation; it only retains directions of large variance. Therefore, the features chosen by PCA are not guaranteed to give good classification results. Another common approach for fully automated segmentation is to nonlinearly transform an atlas, in which the hippocampus is already segmented, onto a new brain scan using deformable registration. Such an approach was proposed by Hammers et al. [22
], but its accuracy depends on the image data used to construct the atlas, as well as the registration model (e.g., octree- or spline-based, elastic, or fluid) and may have difficulty in labeling new scans with image intensities or anatomical shapes that differ substantially from the atlas. A fully automatic extension of the level-set approach was suggested by Pohl et al. [41
]. In this approach the traditional signed distance function applied in most level-set implementations is transformed into a probability using the LogOdds space. This can lead to a more natural formulation of the multi-class segmentation problem by incorporating statistical information into the level-set approach.
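The LogOdds idea can be made concrete: a signed distance value d (here taken positive inside the structure) is interpreted as the log-odds of a voxel belonging to the structure, yielding a probability via the logistic function, with the logit as the inverse mapping. The sketch below is our own illustration of this interpretation, not the cited implementation; the `scale` parameter is an illustrative addition controlling how sharply probability falls off with distance.

```python
import numpy as np

def distance_to_probability(d, scale=1.0):
    """Map a signed distance value/array (positive inside the structure),
    interpreted as log-odds, to a probability via the logistic function."""
    return 1.0 / (1.0 + np.exp(-d / scale))

def probability_to_distance(p, scale=1.0, eps=1e-12):
    """Inverse mapping (logit): recover the signed-distance / log-odds map."""
    p = np.clip(p, eps, 1.0 - eps)
    return scale * np.log(p / (1.0 - p))
```

Note that the zero level set of the distance function corresponds exactly to probability 0.5, so the boundary is preserved by the transformation.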
Another fully automated approach for subcortical segmentation is FreeSurfer by Fischl et al. [14
]. FreeSurfer uses a Markov Random Field to approximate the posterior distribution of anatomical labels at each voxel in the brain. In addition, it uses a very strong prior based on knowledge of where structures lie in relation to each other. For instance, the amygdala is difficult to distinguish from the hippocampus based on intensity alone, but the two always have the same spatial relationship, with the amygdala immediately anterior to the hippocampus; this is encoded by FreeSurfer's statistical prior to separate them correctly. FreeSurfer also uses additional statistical priors on the likely locations of structures after scans are aligned to a standard stereotaxic space, and on their expected intensities, based on spatially-adaptive fitting of Gaussian mixture models to classify tissues in a training dataset. As FreeSurfer is freely available, we compared its segmentation results to ours throughout this paper. This required us to extend FreeSurfer's standard capabilities, converting its usual outputs – multi-class segmented volumes – into parametric surfaces, allowing us to compare surface-based statistical maps of disease effects across all segmentation methods.
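To illustrate the general principle (a schematic sketch only, not FreeSurfer's actual model), consider per-voxel maximum a posteriori labeling that combines a Gaussian intensity likelihood for each class with a spatially varying prior, such as an atlas-derived probability of each label at the voxel's location:

```python
import numpy as np

def map_label(intensity, means, stds, spatial_prior):
    """Assign the label maximizing (likelihood x prior) at one voxel.
    means, stds: per-class Gaussian intensity parameters;
    spatial_prior: per-class probability of each label at this location
    (e.g., from an atlas aligned to the scan)."""
    likelihood = np.exp(-0.5 * ((intensity - means) / stds) ** 2) / stds
    posterior = likelihood * spatial_prior   # unnormalized posterior
    return int(np.argmax(posterior))
```

When two classes share the same intensity distribution, as with hippocampus and amygdala, the spatial prior is what disambiguates them; a full MRF model additionally couples neighboring voxels' labels.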
Recent developments in machine learning, such as AdaBoost [15
], have automated the feature selection process for several imaging applications. Support Vector Machines (SVM) [64
] can effectively combine features for classification. AdaBoost and SVM may be used to classify vector-valued examples, and both have been separately applied to medical image analysis before, but this paper evaluates the benefits of combining them sequentially.
Statistical classification is an active area of pattern recognition and computer vision research in which scalar- or vector-valued observations are automatically assigned to specific groups, often based on a training set of previously labeled examples. In medical imaging, different types of classification tasks are performed, e.g., classifying image voxels as belonging to a certain anatomical structure, or classifying a scanned individual into one of several diagnostic groups (disease versus normal, or semantic dementia versus Alzheimer’s disease, for example). For clarification, we note that this paper classifies voxels in a brain MRI scan as belonging to the hippocampus or not, but in a second step we use these classified structures to create statistical maps of systematic differences in anatomy between Alzheimer’s patients and controls. As such, although the main goal of the paper is to achieve segmentations of the hippocampus, we illustrate the use of these segmentations in an application where differences between disease and normality are detected and mapped.
Among the several algorithms proposed for statistical classification, AdaBoost is a meta-algorithm that sequentially selects weak classifiers (i.e., ones that do not perform perfectly when used on their own) from a candidate pool and weights each of them based on its error. A weak learner is any statistical classifier that performs better than pure chance. Each iteration of AdaBoost assigns an “importance weight” to each example; examples with a higher weight, classified incorrectly on previous iterations, receive more attention on subsequent iterations, tuning the weak learners to the difficult examples. Classifying a new example with AdaBoost is then simply a weighted vote of the weak learners.
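The steps above can be sketched in a few lines. The decision stumps below (a threshold on a single feature) stand in for generic weak learners, and labels are taken in {−1, +1}; this is a minimal illustration, not our production implementation:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=20):
    """Minimal AdaBoost with decision stumps as weak learners.
    X: (n_samples, n_features); y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                       # importance weights over examples
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the stump (feature j, threshold, polarity)
        # with the lowest weighted training error.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.sign(X[:, j] - thr + 1e-12)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, s)
        err, j, thr, s = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # weight of this weak learner
        pred = s * np.sign(X[:, j] - thr + 1e-12)
        w *= np.exp(-alpha * y * pred)            # up-weight misclassified examples
        w /= w.sum()
        stumps.append((alpha, j, thr, s))
    return stumps

def predict_adaboost(stumps, X):
    """A weighted vote of the selected weak learners."""
    votes = sum(a * s * np.sign(X[:, j] - thr + 1e-12) for a, j, thr, s in stumps)
    return np.sign(votes)
```

Because each round records which feature its stump thresholds, the trained ensemble doubles as a feature selector, a property we exploit later.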
SVMs, on the other hand, seek a hypersurface in the space of all features that both minimizes the error on training examples and maximizes the margin, defined as the distance between the hypersurface and the closest training value in feature space. SVMs can accommodate many types of hypersurface by making use of the “kernel trick” [9].
SVMs have been used widely in medical imaging for brain tumor recognition and malignancy prediction [33
], white matter lesion segmentation [44
], for discriminating schizophrenia patients from controls based on morphological characteristics [67
] and for analyzing functional MRI time-series [26].
Although SVMs have been widely used in medical imaging, AdaBoost has not. However, as AdaBoost can select informative features from a potentially very large feature pool, it is likely to offer advantages in automatically finding good features for classification. This can greatly reduce, or even eliminate, the need for experts to choose informative features based on detailed knowledge of each classification problem. Instead, one need only define a list of possibly informative features, and AdaBoost will choose those that are actually informative.
For our classification problem, we compared four different classification techniques: (1) FreeSurfer [14
], (2) SVM with manually selected features (manual SVM), (3) AdaBoost, and (4) SVM with features automatically selected by AdaBoost (Ada-SVM). Because AdaBoost can select features automatically, we were able to further improve the classification ability of AdaBoost and Ada-SVM by implementing them in a hierarchical decision-tree framework.
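The two-stage idea behind Ada-SVM can be sketched as follows, using scikit-learn components as stand-ins for our implementation (function and variable names here are illustrative only): AdaBoost over decision stumps is trained first, the features its stumps actually used (those with nonzero importance) are retained, and an SVM is then trained on that reduced feature set.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

def ada_svm(X_train, y_train, X_test, n_rounds=50):
    # Stage 1: AdaBoost; scikit-learn's default weak learner is a depth-1
    # decision tree, i.e., a decision stump thresholding a single feature.
    ada = AdaBoostClassifier(n_estimators=n_rounds).fit(X_train, y_train)
    # Features with nonzero importance are the ones the stumps actually used.
    selected = np.flatnonzero(ada.feature_importances_ > 0)
    # Stage 2: an SVM trained on the reduced feature set.
    svm = SVC(kernel="rbf").fit(X_train[:, selected], y_train)
    return svm.predict(X_test[:, selected]), selected
```

The division of labor is deliberate: AdaBoost handles feature selection, for which it is well suited, while the SVM handles the final decision boundary, where its margin maximization tends to generalize well.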
As a testbed to examine segmentation performance, we trained and tested our methods on a dataset of 70 3D volumetric T1-weighted brain MRI scans; 30 of these subjects were reserved for training, and 40 for testing. The training set comprised 10 subjects with Alzheimer’s disease (AD), 10 with mild cognitive impairment (MCI), a state that carries an increased risk of conversion to AD, and 10 age-matched controls. The 40 testing subjects comprised 20 AD patients and 20 controls. Due to the small number of MCI subjects available for this study, we chose to add them to the training group because they increased the variability on which to train. All subjects were scanned on a 1.5 Tesla Siemens scanner, with a standard high-resolution spoiled gradient echo (SPGR) pulse sequence with a TR (repetition time) of 28 ms, TE (echo time) of 6 ms, a field of view of 220 mm, a 256×192 matrix, and a slice thickness of 1.5 mm. For application to drug trials and neuroscientific studies of disease, our algorithm would need to segment accurately both normal subjects and those affected by degenerative disease, which alters hippocampal shape and image contrast; we therefore trained our classifier on manually segmented scans from both normal and diseased subjects.