Neuroimage. Author manuscript; available in PMC 2010 May 3.
Published in final edited form as:
PMCID: PMC2862692

Atlas-Based Hippocampus Segmentation In Alzheimer’s Disease and Mild Cognitive Impairment


This study assesses the performance of public-domain automated methodologies for MRI-based segmentation of the hippocampus in elderly subjects with Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Structural MR images of 54 age- and gender-matched healthy elderly individuals, subjects with probable AD, and subjects with MCI were collected at the University of Pittsburgh Alzheimer’s Disease Research Center. Hippocampi in subject images were automatically segmented by using AIR, SPM, FLIRT, and the fully-deformable method of Chen to align the images to the Harvard atlas, MNI atlas, and randomly-selected, manually-labeled subject images (“cohort atlases”). Mixed-effects statistical models analyzed the effects of side of the brain, disease state, registration method, choice of atlas, and manual tracing protocol on the spatial overlap between automated segmentations and expert manual segmentations. Registration methods that produced higher degrees of geometric deformation produced automated segmentations with higher agreement with manual segmentations. Side of the brain, presence of AD, choice of reference image, and manual tracing protocol were also significant factors contributing to automated segmentation performance. Fully-automated techniques can be competitive with human raters on this difficult segmentation task, but a rigorous statistical analysis shows that a variety of methodological factors must be carefully considered to ensure that automated methods perform well in practice. The use of fully-deformable registration methods, cohort atlases, and user-defined manual tracings is recommended for highest performance in fully-automated hippocampus segmentation.


Hippocampal atrophy has been proposed as a clinical marker for early AD because it is known to occur early in the course of the disease on a spatial scale large enough to be detectable with structural MR images [2] [25]. Visual, qualitative atrophy assessment (e.g. [11]) has been hindered by the relative subtlety of atrophy early in the course of AD [16]. However, the development of reliable, repeatable protocols for human raters to trace the hippocampus (e.g. [21]) has led to the possibility of precise quantitation of AD-related atrophy [6], hippocampus-level quantification of activation in co-registered structural-functional images [12], and quantification of other hippocampal characteristics such as bilateral symmetry [1]. Furthermore, tracing protocols have enabled the study of hippocampal morphometrics in subjects with mild cognitive impairment (MCI), a high-AD-risk clinical condition marked by minor deficits in one or more cognitive domains [30] [47].

However, large-scale studies of AD-related hippocampal atrophy are often impractical because manual segmentations are labor-intensive and require training to ensure high repeatability between raters. Typical hippocampi take between 30 minutes and 2 hours to trace by hand; the tedious labor quickly causes fatigue. Semi-automated segmentation methods reduce manual labor by having the user identify a sparse set of image landmarks that constrain a subsequent automated segmentation process [37] [15] [7]. However, we focus on fully-automated atlas-based techniques to eliminate the need for a user to manually process each image under study, and to eliminate the landmark-identification process as a source of variability between segmentations of the same image.

Atlas-based segmentation coregisters a subject image and a special reference image called the atlas image on which structures of interest have been manually traced (See Figure 1). The resulting spatial transformation maps the coordinates of the structures from the coordinate space of the atlas image to that of the subject image. Since this approach is posed in terms of image-to-image registration, atlas-based techniques take advantage of methodological advances in registration that are driven by a wide range of application areas such as visualization, image-guided surgery, and voxel-based morphometry. Furthermore, atlas-based approaches are among the easiest to implement since they only require the user to align the atlas and subject images.

Figure 1
Schematic view of atlas-based segmentation. An intensity transformation and geometric transformation are estimated to register the atlas image to the subject image; the geometric transformation is applied to the atlas mask in order to estimate the subject ...
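
The label-transfer step can be illustrated with a minimal Python sketch. It assumes masks are stored as sets of integer voxel coordinates and that the registration result is available as a function from atlas coordinates to subject coordinates; the names `transfer_mask` and `shift`, and the toy translation itself, are hypothetical and are not taken from any of the registration packages evaluated in this study.

```python
from typing import Callable, Set, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]


def transfer_mask(atlas_mask: Set[Voxel],
                  atlas_to_subject: Callable[[Point], Point]) -> Set[Voxel]:
    """Map every labeled atlas voxel into subject space (nearest voxel)."""
    subject_mask = set()
    for voxel in atlas_mask:
        x, y, z = atlas_to_subject(voxel)
        subject_mask.add((round(x), round(y), round(z)))
    return subject_mask


# Hypothetical registration result: a pure translation by (2, 0, -1) voxels.
def shift(p: Point) -> Point:
    return (p[0] + 2.0, p[1], p[2] - 1.0)


atlas_hippocampus = {(10, 20, 30), (11, 20, 30), (10, 21, 30)}
subject_estimate = transfer_mask(atlas_hippocampus, shift)
print(sorted(subject_estimate))
```

Pushing voxels forward in this way can leave gaps when the transformation expands a region; practical tools commonly resample the mask through the inverse mapping instead, which this sketch does not attempt.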

The purpose of this study was to systematically compare the performance of several competing public-domain methodologies for atlas-based segmentation of AD-atrophied hippocampi. We validated several widely-disseminated automated image registration methods [45] [17] [20]; in contrast, previous studies on atlas-based elderly hippocampus segmentation used a single, recently-developed, cutting-edge registration algorithm that lacked a widely-disseminated, standard software implementation (e.g., [9]). Furthermore, we examined the use of two widely-disseminated atlas images [23] [40], as well as individual, manually-traced subject images (as in, e.g., [44]), to serve as the reference, or cohort atlas image. Finally, we examined the impact of varying manual tracing protocols on atlas-based segmentation performance.


Subject data

We gathered MR images of 20, 19, and 15 subjects in the AD, MCI, and control populations, respectively. All subjects were enrolled in the University of Pittsburgh Alzheimer’s Disease Research Center between 1999 and 2004 and given a structural MR scan at time of enrollment. The spoiled gradient-recalled (SPGR) volumetric T1-weighted pulse sequence, acquired in the coronal plane, had the following parameters optimized for maximal contrast among gray matter, white matter, and CSF (TE = 5, TR = 25, flip angle = 40 degrees, NEX = 1, slice thickness = 1.5 mm/0 mm interslice). Along with the MR scan, subjects received a comprehensive battery of neuropsychological and clinical tests at time of enrollment and at yearly follow-up visits (see [27] [28] for evaluation procedure). A consensus meeting of neuroradiologists, psychiatrists, neurologists, and psychologists diagnosed each subject into MCI [30], AD, or control categories.

Skulls were stripped from all images using the Brain Extraction Tool (BET) [38], and the images were cropped to remove all-zero slices using the crop tool provided with AIR 2.0 [45].

Registration Methods

We compared the performance of software modules in AIR, SPM, FLIRT, and Chen’s method [45] [17] [20] [5] as registration substrates for atlas-based segmentation of elderly hippocampi. While several algorithmic details vary between these registration techniques, they are chiefly distinguished from each other in terms of their geometric transformation model, that is, the mathematical equation that maps image coordinates between the atlas image and subject image. We partitioned the geometric transformation models into three categories in terms of the degree to which they allow the atlas image to spatially deform when it is aligned to the subject image. Affine methods apply the same linear transformation to all voxels in the entire atlas image; semi-deformable mappings deform the atlas image in a spatially smooth, gradual way to align it to the subject image; and fully-deformable methods produce image-to-image mappings that are essentially unconstrained spatially (see Figure 2 for an illustration and [3] for a detailed mathematical explanation). We considered one fully-deformable, three semi-deformable, and three affine registration methods. The AIR affine [45], SPM affine [17], and FLIRT affine [20] methods estimate an affine transformation between images. The AIR semi-deformable method uses the transformation output by the AIR affine method as a starting point for estimation of a spatially-smooth deformation based on a polynomial transformation model; the SPM semi-deformable method uses the transformation output by the SPM affine method as the starting point for estimation of a smooth deformation based on a discrete cosine transform (DCT) transformation model; the Chen semi-deformable method estimates a piecewise-linear transformation [5]. Finally, the Chen fully-deformable method takes the output of the Chen semi-deformable method as a starting point for estimation of an unconstrained, voxel-by-voxel deformation.

Figure 2
Example image deformations produced by fully-deformable, semi-deformable, and affine registration techniques. The moving image is registered to the stationary image using each of the 7 algorithms we analyze. The colored dots show the geometric positions ...
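
To make the two extremes of this categorization concrete, the following Python sketch contrasts an affine mapping, where one matrix and offset are shared by all voxels, with a voxel-by-voxel displacement field, where each voxel may move independently. The function names and toy numbers are illustrative assumptions; none of this reproduces the AIR, SPM, FLIRT, or Chen implementations, and the intermediate semi-deformable models (polynomial, DCT, piecewise-linear) are not shown.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def apply_affine(p: Point, matrix: List[List[float]], offset: Point) -> Point:
    """Affine model: one 3x3 matrix and one offset shared by every voxel."""
    return tuple(
        sum(matrix[r][c] * p[c] for c in range(3)) + offset[r]
        for r in range(3)
    )


def apply_displacement(p: Point, field: Dict[Point, Point]) -> Point:
    """Fully-deformable model: each voxel carries its own displacement."""
    dx, dy, dz = field.get(p, (0.0, 0.0, 0.0))
    return (p[0] + dx, p[1] + dy, p[2] + dz)


# Hypothetical affine parameters: stretch x by 2, then shift x by 1.
matrix = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
offset = (1.0, 0.0, 0.0)
print(apply_affine((1.0, 2.0, 3.0), matrix, offset))

# Hypothetical displacement field: only one voxel moves; its neighbor stays.
field = {(1.0, 2.0, 3.0): (0.5, -0.5, 0.0)}
print(apply_displacement((1.0, 2.0, 3.0), field))
print(apply_displacement((2.0, 2.0, 3.0), field))
```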

Manual segmentations

We evaluated automated segmentations by comparing them to manual segmentations performed by a single expert rater, R1, who was blind to diagnosis, gender, age, and other clinical data at the time of tracing. Hippocampi were traced on contiguous coronal slices following the guidelines of Watson et al. [43], Schuff et al. [36], and Pantel et al. [29]. The traced structure included the hippocampus proper, the subiculum, and the dentate gyrus. The image and tracing were viewed in all three orthogonal viewing planes during manual segmentation. Rater R1 traced hippocampi on all 54 subject images; additionally, we selected 2 AD, 2 MCI, and 2 control images from the pool of 54 subjects for tracing by two additional trained raters, R2 and R3, using the same protocol. These additional manual segmentations were used to compare automated-manual segmentation agreement to inter-rater agreement. All manual segmentations were digitized into binary volumes for analysis.

Cohort atlases

In the cohort atlas scenario, we selected an image from a subject population (AD, MCI, or control), manually traced left and right hippocampi on it, and treated it as a reference image that all other images in the subject population were registered to during atlas-based segmentation. We refer to the selected subject image as a cohort “atlas” image to emphasize its role as a reference image. Cohort atlas images were selected at random from the subject population; however, we note that a variety of more complex strategies for cohort atlas image selection are possible [34]. For each image in each subject population, we considered a hypothetical situation in which that image is selected as the cohort atlas; all other images in the population were registered to the cohort atlas image and hippocampus segmentation results were evaluated. In other words, for a population of k images, we considered k different possible cohort atlases, each of which we registered to all k − 1 other images in the population, for a total of k(k − 1) trials per registration method.
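
As a quick check on the size of this design, the Python sketch below enumerates every ordered (cohort atlas, subject) pair within each population, using the group sizes reported under Subject data. Counting ordered pairs of distinct images gives k(k − 1) registrations per method for a population of k images. The helper name `cohort_trials` is hypothetical.

```python
from itertools import permutations

# Group sizes as reported in the Subject data section.
group_sizes = {"AD": 20, "MCI": 19, "control": 15}


def cohort_trials(k):
    """All ordered (atlas, subject) pairs of distinct images among k images."""
    return list(permutations(range(k), 2))


counts = {group: len(cohort_trials(k)) for group, k in group_sizes.items()}
print(counts)                 # pairs per registration method, by group
print(sum(counts.values()))   # pairs per registration method, all groups
```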

Standard atlases and atlas tracers

In the standard atlas scenario, we began with an atlas image and manual hippocampus tracings, or manual atlas tracings, provided by an atlas institution (Harvard or MNI). We registered the atlas image to the subject image and used the resulting transformation to transfer a manual tracing of the hippocampus from the atlas image to the subject image. This automated segmentation was evaluated by comparing it to an independent manual subject tracing, that is, a manual tracing of the hippocampi on the subject image performed by rater R1. However, we recognized that the manual tracing protocol used by R1 may differ from that used by human tracers at MNI and Harvard, and that our evaluation risked confounding two factors that could have caused discrepancies between the automated segmentation and manual subject tracing: differences in hippocampus delineation between automated and manual techniques, and discrepancies in hippocampus boundary conventions between manual atlas and subject tracings. For this reason, rater R1 generated manual atlas tracings by tracing left and right hippocampi on the Harvard and MNI atlas images using the same manual tracing protocol used for tracing on the subject images. Experiments analyzed the effects of choice of atlas (MNI vs. Harvard) and manual atlas tracings (performed by R1 vs. performed by the atlas institution) on manual-automated segmentation agreement.

Cohort atlas images reflect possibly anomalous characteristics of a particular scan and subject, and their use is inherently more labor-intensive than standard atlases since they require the user to hand-label the structure of interest on the cohort atlas image. However, cohort-atlas-based segmentation has potential advantages over the more conventional standard-atlas-based approach. If the population of subject images is homogeneous with respect to factors such as sensor acquisition parameters, subject age, and subject disease state, then drawing a cohort atlas image from the population guarantees that these factors will not confound the registration process. Furthermore, hand-labeling the structure of interest on the cohort atlas image assures the investigator that anatomical boundaries reflect his or her conventions.

Manual-automated and manual-manual agreement

Performance of automated segmentation algorithms was measured in terms of manual-automated agreement, that is, agreement between automated segmentations and manual tracings performed by an expert rater. We compared manual-automated agreement to manual-manual agreement, or the agreement between manual tracings performed by pairs of expert human raters. In so doing, we assessed whether switching from manual to automated segmentation significantly increases the variability between the produced segmentation and one produced by an independent human rater. We selected 2 AD, 2 MCI, and 2 control images from our pool of subjects and had the hippocampi segmented manually by human raters R1, R2 and R3. Since R1 traced hippocampi on the full set of 54 subject images, we measured manual-automated agreement in terms of agreement between R1-rated manual tracings and the Chen fully-deformable automated technique. Manual-manual agreement was measured in terms of pairwise agreement between manual tracings by R1 and R2, R1 and R3, and R2 and R3. Manual-automated agreement for each subject image was summarized in terms of the average manual-automated agreement between its R1 segmentation and the automated segmentations from all cohort atlas images in its disease category. Experiments analyzed differences between manual-manual agreement and manual-automated agreement on the 6 multiply-manually-traced hippocampi. Note that this approach differs from the more common approach of measuring agreement between pairs of manual and/or automated segmentations in terms of hippocampal volumes; the key difference is that our approach quantifies agreement in terms of how well the segmentations overlap in the brain. We note that other approaches, based on estimating automated segmentation performance and a single estimate of the true, underlying structure mask, are also available [42].

Performance measure: overlap ratio

We evaluated the agreement between an automated hippocampus segmentation estimate and a manual segmentation using a numerical criterion that measured the degree to which they overlap. We represented the automated segmentation $\hat{B}$ and its corresponding manual segmentation $B$ as binary 3D volumes in which voxels labeled as hippocampus had a value of 1. Let $V_{BOTH}$ be the set of voxels labeled as hippocampus by both $\hat{B}$ and $B$; let $V_{\hat{B}}$ be the set of voxels labeled as hippocampus by $\hat{B}$ but not $B$; and let $V_B$ be the set of voxels labeled as hippocampus by $B$ but not $\hat{B}$ (sets $V_{BOTH}$, $V_{\hat{B}}$, and $V_B$ are labeled in red, dark gray, and light gray in Figure 3d). The overlap ratio measures the degree of overlap between the automated and manual segmentations, specifically:

$$\mathrm{or}(B, \hat{B}) = \frac{|V_{BOTH}|}{|V_{BOTH}| + |V_{\hat{B}}| + |V_B|}$$

Figure 3
Evaluating consistency between masks using overall and sectional overlap. A ground-truth subject mask and estimated subject mask are shown in light and dark gray. Figure 3d: Voxels in red overlap between the ground-truth and the estimate. Overlap ratio ...

In other words, the overlap ratio measures the percentage of the combined volumes of $\hat{B}$ and $B$ that are both labeled as hippocampus. When $\hat{B}$ and $B$ overlap perfectly, $\mathrm{or}(B, \hat{B}) = 1$; when the masks do not overlap at all, $\mathrm{or}(B, \hat{B}) = 0$. We note that several authors have quantified manual-automated segmentation agreement using criteria similar to the overlap ratio [10] [22] [37] [24].
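
A minimal Python sketch of this measure follows, assuming each segmentation is stored as a set of voxel coordinates labeled as hippocampus (a representation chosen here for brevity; the study itself used binary 3D volumes). The function name `overlap_ratio` is our own. Computed this way, the measure is the ratio of the intersection to the union of the two masks, which is also known as the Jaccard index.

```python
def overlap_ratio(manual, automated):
    """Voxels labeled by both masks divided by voxels labeled by either mask."""
    v_both = manual & automated            # labeled by both segmentations
    v_auto_only = automated - manual       # labeled by the automated mask only
    v_manual_only = manual - automated     # labeled by the manual mask only
    total = len(v_both) + len(v_auto_only) + len(v_manual_only)
    if total == 0:
        raise ValueError("both masks are empty; overlap ratio is undefined")
    return len(v_both) / total


# Toy masks: two of the four labeled voxels are shared.
manual = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
automated = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
print(overlap_ratio(manual, automated))
```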

Overlap ratio was computed over the entire hippocampus. Furthermore, to characterize automated segmentations in terms of hippocampal sub-regions, we divided the hippocampus into sections and computed performance measures over the voxels in each section. Consider a bounding box $(x_{min}, x_{max}, y_{min}, y_{max}, z_{min}, z_{max})$ around all the hippocampus voxels in $\hat{B}$ and $B$ (i.e., the x coordinates of all voxels in $V_{BOTH} \cup V_{\hat{B}} \cup V_B$ are between $x_{min}$ and $x_{max}$, etc.). For each of the three cardinal directions, we partitioned the estimated and ground-truth hippocampi into k sections along that direction and computed overlap ratios in each of the sections. That is, for all i from 1 to k we computed $\mathrm{or}(B_i^x, \hat{B}_i^x)$, where $B_i^x(x, y, z) = B(x, y, z)$ for voxels whose x coordinate falls in the i-th of the k sections between $x_{min}$ and $x_{max}$, and $B_i^x(x, y, z) = 0$ for all other voxels; $\hat{B}_i^x$ is defined analogously from $\hat{B}$. Similarly, we computed $\mathrm{or}(B_i^y, \hat{B}_i^y)$ and $\mathrm{or}(B_i^z, \hat{B}_i^z)$ for all i from 1 to k. See Figure 3e for an illustration. Figure 4 suggests that since the hippocampi all have similar gross orientations in the image, the sections can be interpreted as corresponding to rough anatomical regions on the hippocampus. For example, if we cut the shown hippocampi into sections using vertical lines, the sections to the left correspond to posterior regions, and sections to the right correspond to anterior regions.

Figure 4
Points on the left hippocampus in all 19 MCI subjects are shown projected onto the XZ plane of the image. Note that all the hippocampi share the same rough initial orientation in this plane.
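
The sectional measure can be sketched in Python as follows. The sketch assumes masks stored as sets of voxel coordinates and, as a labeled assumption, sections of equal width along the chosen axis; the text does not state how section boundaries were placed. The names `sectional_overlap` and `section_index` are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection over union of two voxel sets (NaN if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else float("nan")


def sectional_overlap(manual, automated, axis, k):
    """Overlap ratio in each of k equal-width sections along one axis."""
    union = manual | automated
    lo = min(v[axis] for v in union)   # bounding-box minimum on this axis
    hi = max(v[axis] for v in union)   # bounding-box maximum on this axis
    width = (hi - lo + 1) / k          # assumed: equal-width sections

    def section_index(v):
        return min(int((v[axis] - lo) / width), k - 1)

    ratios = []
    for i in range(k):
        manual_i = {v for v in manual if section_index(v) == i}
        automated_i = {v for v in automated if section_index(v) == i}
        ratios.append(overlap_ratio(manual_i, automated_i))
    return ratios


# Toy masks offset along x: they agree in the middle and differ at the ends.
manual = {(x, 0, 0) for x in range(0, 6)}
automated = {(x, 0, 0) for x in range(2, 8)}
print(sectional_overlap(manual, automated, axis=0, k=4))
```

In this toy case the two end sections contain voxels from only one mask, so their overlap ratio is zero while the central sections agree fully.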

Statistical analysis: mixed-effects models

We analyzed the effects of registration method, side of the brain, disease state, manual tracing protocol, and choice of atlas on overlap ratio through mixed-effects statistical models [31] that properly accounted for fixed effects, random effects, and grouping in our data. The fixed effects, including disease state, side of the brain, and registration method, were modeled as additive offsets from a baseline value of the performance measure. Random effects, such as the random sampling of subjects from an overall patient population, were modeled as variance components. Each level of each fixed effect was assigned a coefficient representing the offset it produced from the baseline value. The overall significance of each fixed effect was evaluated through omnibus F tests. Furthermore, we analyzed differences between factor levels (for example, between control, MCI, and AD subjects) by using focused F tests to check for significant differences between their coefficients. Effect size for focused F tests was quantified by the contrast correlation $r_{\mathrm{contrast}}$ [35], which generalizes standard 2-group correlational effect size measures while properly accounting for degrees of freedom. In our analysis, between-group differences refer to differences in model coefficients between two factor levels. Mixed-effects models properly account for the random sampling of subject images and cohort atlas images from overall AD, MCI, and control populations, and properly account for repeated measures. All statistics were performed using R version 1.9.1. Mixed-effects models were fit using maximum likelihood estimation in the nlme package. In order to give multiple views of the complex ways in which overlap ratio varied with respect to fixed effects, we report significance values and effect sizes for between-group tests, as well as box-and-whisker plots that show the median, quartiles, and extreme values of overlap ratio within groups.
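
For readers who prefer an explicit formula, one way to write an additive model of this kind is sketched below. The notation is ours and is only an illustration: $s$ indexes the subject, $d(s)$ its disease state, $h$ the side of the brain, and $m$ the registration method; $\mu$ is the baseline value, $\alpha$, $\beta$, and $\gamma$ are the fixed-effect offsets, and $u_s$ is a random subject effect. The Gaussian distributions are the usual assumption for linear mixed-effects models and are not stated explicitly in the text.

```latex
\mathrm{or}_{s,h,m} = \mu + \alpha_{d(s)} + \beta_{h} + \gamma_{m} + u_{s} + \varepsilon_{s,h,m},
\qquad
u_{s} \sim \mathcal{N}(0, \sigma_{u}^{2}), \quad
\varepsilon_{s,h,m} \sim \mathcal{N}(0, \sigma^{2})
```

Under this reading, a focused test between two factor levels, for example AD versus MCI, compares the corresponding coefficients $\alpha_{\mathrm{AD}}$ and $\alpha_{\mathrm{MCI}}$.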

Experiments evaluate the degree to which segmentation results varied with respect to disease state, registration algorithm, atlases, manual tracings, and side of the brain. At the core of the experiments is the following sequence of actions:

  1. Registering an atlas image to a subject image
  2. Using the resulting geometric transformation to transfer manually-labeled left and right hippocampus masks from the atlas image to the subject image
  3. Evaluating the consistency between the resulting subject mask estimates and ground-truth manual tracings

We refer to the execution of these actions for a particular choice of atlas image, subject image, registration algorithm, and manual tracings as a segmentation trial. Our experimental results were obtained by performing a series of trials through which each of these 4 factors is varied systematically. In particular, for both of our standard atlases, we ran one trial for each possible combination of the 7 registration algorithms, 54 subject images, and 2 sets of manual tracings supplied with the atlas. For each disease state, and for each registration algorithm, we ran one trial for each possible cohort atlas image and subject image within the disease group.
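
The standard-atlas part of this design can be enumerated with a short Python sketch. The factor levels are taken from the text; the variable names and the use of integer subject indices are illustrative assumptions.

```python
from itertools import product

atlases = ["Harvard", "MNI"]
methods = [
    "AIR affine", "SPM affine", "FLIRT affine",
    "AIR semi-deformable", "SPM semi-deformable", "Chen semi-deformable",
    "Chen fully-deformable",
]
subjects = range(54)                          # 54 subject images
tracings = ["R1", "atlas institution"]        # 2 sets of manual atlas tracings

# One segmentation trial per combination of the four factors.
standard_trials = list(product(atlases, methods, subjects, tracings))
print(len(standard_trials))
print(standard_trials[0])
```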


Cohort atlases

For cohort-atlas-based segmentation, we fit a mixed-effects model in which disease state, side of the brain, and registration method were fixed effects; the subject and cohort atlas identity were random effects; and the performance measures were the dependent variables. The overall effects of side, disease, and method on overlap ratio were statistically significant (p < .0001, p = .0192, and p < .0001, respectively).

Box plots showing how overlap ratio varies with disease state, side of the brain, registration method, and registration method category are shown in Figure 5. Effect sizes and p values are shown in Table 1. Overlap ratio was significantly lower in AD compared to MCI and control groups, although no significant difference in overlap ratio was seen between MCI and control groups. No significant difference existed between the FLIRT affine and AIR affine methods. For all other pairs of methods, significant (but in many cases slight) differences in overlap ratio existed. The methods, ranked in decreasing order of overlap ratio, were as follows: Chen fully-deformable, AIR semi-deformable, Chen semi-deformable, SPM affine, SPM semi-deformable, FLIRT affine, AIR affine.

Figure 5
Overlap ratio as a function of disease state, registration method category, and side of the brain for the 54 images using cohort atlases.

Table 1
p Values And Effect Sizes For Differences in Manual-Automated Overlap Using Cohort Atlases

Comparing fully-deformable, semi-deformable and affine methods

We grouped the registration methods into fully-deformable, semi-deformable, and affine categories and fit a mixed-effects model in which the fixed effects were the method category, disease state, and side of the brain; subject and atlas identity were random effects (see Figure 5 and Table 1). Fully-deformable methods had significantly higher overlap ratio than semi-deformable and affine methods. In turn, semi-deformable methods had significantly higher overlap ratio than affine methods, although the effect size was not as pronounced as in the comparison between fully-deformable and semi-deformable categories.

Standard atlases and manual atlas tracings

For standard-atlas-based segmentation, we fit a mixed-effects model in which the fixed effects were the standard atlas (Harvard or MNI), the source of the manual atlas tracings (R1 or Harvard/MNI), side of the brain, disease state, and registration method; subject identity was a random effect; and the performance measures were dependent variables. Figure 6 plots the overlap ratio as a function of atlas image and manual atlas tracing, registration method, side of the brain, and disease state. Results based on R1-traced atlas tracings are referred to as “Harvard By R1” and “MNI By R1”; results based on manual atlas tracings provided by the atlas institution are referred to as “Harvard By Harvard” and “MNI By MNI” respectively.

Figure 6
Overlap ratio as a function of disease state, registration method, manual tracing, and side of the brain for the 54 images using standard atlases.

Overlap ratio was significantly higher for R1-traced manual atlas tracings than hippocampi traced by the atlas institution and was significantly higher for right sides of the brain compared to left. No significant difference in overlap ratio was seen between the MNI and Harvard atlases. Overlap ratio was significantly lower for AD subjects than MCI subjects and controls, but no significant difference was seen between the MCI and control groups. The registration methods, ranked in decreasing order of overlap ratio, were: Chen fully-deformable, AIR semi-deformable, Chen semi-deformable, SPM affine, AIR affine, SPM semi-deformable, FLIRT affine. The difference in overlap ratio between the SPM semi-deformable and FLIRT affine methods was not statistically significant, nor was the difference in overlap ratio between the Chen semi-deformable method and AIR semi-deformable method. Differences in overlap ratio between all other pairs of methods had statistically significant p values, although in some cases the effect sizes were not large.

Cohort atlases vs. standard atlases

We directly compared cohort-atlas-based segmentation to standard-atlas-based segmentation using the Chen fully-deformable registration method, which had shown the highest segmentation performance in experiments described above. We fit a mixed-effects model in which the atlas (MNI, Harvard, or cohort atlas), human tracer (R1 or the atlas institution), side of the brain, and disease state were fixed effects, subject identity was a random effect, and the dependent variable was the overlap ratio. Figure 7 plots the overlap ratio for the standard atlases and cohort atlases in this model, along with p values and effect sizes. The mean overlap ratio was significantly higher for cohort-atlas-based segmentation than standard-atlas-based segmentation using manual atlas tracings by R1 along with the MNI or Harvard atlas images. Performance measures for standard atlases using manual atlas tracings from the atlas institution were significantly worse in each case.

Figure 7
Left: Overlap ratio for cohort-atlas-based and standard-atlas-based segmentation using Chen’s fully-deformable registration method. Right: p values and the contrast correlation $r_{\mathrm{contrast}}$ for F tests between factor levels in the mixed-effects model. ...

Manual-automated agreement and manual-manual agreement

For the six multiply-manually-traced subjects, we fit a mixed-effects model with overlap ratio as the dependent variable, the type of agreement (manual-manual or manual-automated) and side of the brain as fixed effects, and subject identity as a random effect. Manual-manual agreement was not significantly higher than manual-automated agreement in terms of overlap ratio, although we saw a trend toward slightly higher manual-manual agreement (p = .0916, $r_{\mathrm{contrast}}$ = .264). Box plots comparing the distribution of overlap ratio for manual-manual and manual-automated agreement are shown in Figure 8.

Figure 8
Overlap ratio between manual and automated segmentations (automatic vs. manual) and between pairs of manual segmentations (manual vs. manual).

Sectional results

Figure 9 shows a representative plot of automated-manual mean overlap ratios and manual-manual mean overlap ratios for hippocampal sections taken along the three cardinal directions of our data set. Results are shown for cohort-atlas-based segmentation using the Chen fully-deformable registration method on the right hippocampus in MCI images; however, similar distributions of overlap are seen for both sides of the brain, all registration methods, and all disease states (see [3] for detailed plots). The three cardinal directions correspond roughly to the posterior-anterior, medial-lateral, and superior-inferior hippocampal axes, respectively (see Figure 4). The hippocampal sections most responsible for manual-automated disagreement were located at the extremities of the hippocampus, especially at the superior, inferior, medial, and lateral ends. With the exception of the most extreme sections, mean overlap ratio was generally higher toward the lateral extent of the hippocampus and lower toward the medial extent. Furthermore, with the exception of the most extreme sections, mean overlap ratio was relatively constant with respect to anterior-posterior position. Finally, moving from the superior to inferior extent, mean overlap ratio increased steadily, reached a peak at the central sections, and decreased toward the inferior end. These patterns of manual-automated overlap across sub-regions were similar to patterns of manual-manual overlap on the 6 selected images, although the human raters were relatively more consistent at the lateral extent.

Figure 9
Overlap ratio delineated along posterior-anterior line (left), inferior-superior line (middle), and medial-lateral line (right), for the Chen fully-deformable registration method on the right hippocampus in MCI images. Similar patterns of overlap ratio ...


This section summarizes our results in terms of which factors led to higher or lower performance measures in the atlas-based segmentation experiments. A “>” between two factor levels indicates that the overlap was higher for the first factor level compared to the second.

Fully-deformable > semi-deformable ≥ affine

Our results confirm the intuition that methods making use of more highly-deformable geometric transformation models tended to fit the complex shape of the hippocampus more accurately than less-deformable geometric models. This agrees with earlier results that demonstrated that atlas-based hippocampus segmentation based on other highly-deformable registration methods can produce hippocampal volumes consistent with expected disease-related atrophy effects (see, e.g., [9] [14] [19]). We believe that the AIR semi-deformable technique performed better than competing semi-deformable methods because the “deformability” of its geometric transformation (i.e., the degree of its polynomial basis) was allowed to gradually increase over the course of optimization, while the geometric transformations for the Chen and SPM semi-deformable techniques were fixed in their spatial structure. Furthermore, as mentioned above, SPM is explicitly biased toward minimally-deforming transformations, which may steer its geometric transformation away from highly-accurate fit of the hippocampal surface. In a related technical report, a statistical analysis of the severity of segmentation errors shows a similar relationship between the performance characteristics of the three registration categories [3].

Human-human agreement ≥ automated-human agreement for fully-deformable registration

Results suggest a general trend toward higher manual-manual agreement compared to manual-automated agreement (see Figures 8 and 9), but the differences are not statistically significant. Thus, while there may be room for improvement of the automated methods, Chen’s fully-deformable method can be competitive with the human raters in terms of overlap ratio. These results, together with results from a related study of the severity of automated segmentation errors [3], suggest that automated methods may be competitive for elderly hippocampus segmentation applications, especially those that can tolerate minor errors in spatial localization. These results extend previous findings that atlas-based techniques can be competitive with manual tracing for other subject groups and brain structures (see, e.g., [41] [8] [4] [26]). Furthermore, the automated results present a very promising starting point for further automated refinement by more complex shape-model-based segmentation techniques (for example, [22] [32] [33]).

MCI ≈ controls > AD

Overall performance measures were significantly lower among AD subjects than MCI or control subjects. One possible explanation for these results is that the degenerative processes of AD made image registration inherently more difficult and ambiguous by reducing tissue contrast and/or inducing a high degree of variability in the geometric characteristics of brain structures such as the hippocampus. Another possible explanation is that registering pairs of AD images was no more or less difficult than registering MCI or elderly control brains, but that standard software packages are not optimized for the task. Similarly, the fact that overlap ratios for MCI and control cases were similar could suggest that their image characteristics do not differ so significantly that they affected registration.

To further investigate how performance differences between disease groups were modulated by other algorithmic factors, we fit mixed-effects models similar to those described above, but with additional terms to model interactions between disease states and concurrent algorithmic factors. Specifically, for cohort-atlas-based segmentation, there were fixed effect terms in the model for disease state, side of the brain, registration method, and the interaction between disease state and registration method. For standard-atlas-based segmentation, we included fixed effect terms for the standard atlas, source of the manual atlas tracing, side of the brain, disease state, registration method, interaction between disease state and registration method, and interaction between disease state and standard atlas. For comparing standard atlases to cohort atlases using fully-deformable registration, fixed effects were the atlas, human tracer, side of the brain, disease state, and interaction between disease state and atlas. Cohort-atlas-based segmentation performance was significantly higher for control subjects with the SPM affine (p = .0232, rcontrast = .0206), AIR semi-deformable (p = .0035, rcontrast = .0265), and SPM semi-deformable methods (p < .0001, rcontrast = .0524); standard-atlas-based performance was higher in control subjects with the AIR semi-deformable (p = .0207, rcontrast = .0426) and SPM semi-deformable methods (p = .0128, rcontrast = .0458), and lower in MCI subjects with the Chen semi-deformable method (p = .0463, rcontrast = .0367); and no interaction terms were significant in the model comparing cohort-atlas-based to standard-atlas-based segmentation using the Chen fully-deformable method. While the effect size is relatively low for each interaction term, these results suggest that performance differences between AD, MCI and control groups are attributable more to AIR and SPM registration methods than other registration methods or atlas choices.
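This section does not state how the rcontrast effect sizes were obtained. One common form in the contrast-analysis framework of [35] derives the effect-size correlation from the t statistic of a model term and its degrees of freedom; the sketch below assumes that form, and the input values are hypothetical rather than taken from the fitted models.

```python
import math


def r_contrast(t_stat, df):
    """Effect-size correlation for a contrast: sqrt(t^2 / (t^2 + df)).

    Assumption: this is the formula behind the reported values; the
    section itself does not confirm it.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    t_sq = t_stat * t_stat
    return math.sqrt(t_sq / (t_sq + df))


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
print(round(r_contrast(2.0, 96), 4))
```

Under this form, a term can have a small p value and a small effect size at the same time when the degrees of freedom are large, which is consistent with the pattern of significant but low-magnitude interaction terms reported above.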

Right > left

In terms of automated segmentation performance, a striking bilateral asymmetry was seen in all experiments, across all three disease groups. These results echo the slight bilateral asymmetry in atlas-based hippocampus segmentation results shown by Duchesne et al. [13]. However, a mixed-effects model fit to solely manual-manual agreement data did not show a statistically significant bilateral asymmetry in manual-manual overlap ratio (p = .12). Our initial calculations of hippocampal volumes did not show a significant volume asymmetry, echoing the findings of Bigler et al. [1]. Further investigation is needed to explain this bilateral effect.

Cohort-atlas-based ≥ standard-atlas-based

Results from our mixed-effects models suggest that randomly selecting cohort atlas images from a population leads to higher automated segmentation performance than standard-atlas-based segmentation, independent of differences in manual segmentation protocols between institutions. This confirms our intuition that differences in brain morphology and image acquisition characteristics between young, healthy atlas-image brains and elderly, diseased subject-image brains can negatively impact performance of standard-atlas-based segmentation. In particular, differences in brain structure between the young, healthy individuals scanned for standard atlas images and the elderly subjects in our study could pose additional challenges to accurate image registration and segmentation. Future work should investigate the ways in which discrepancies in morphology, image acquisition parameters, and scanning equipment impact atlas-based segmentation results.

Posterior ≈ anterior, lateral > medial, center > periphery

Segmentation errors were evenly distributed between posterior and anterior regions of the hippocampus, were more concentrated in the medial regions than the lateral regions, and were generally more highly concentrated toward the periphery than the center. One possible reason for the medial skew in errors is that CSF forms part of the lateral boundary of the structure over its entire anterior-posterior extent, while in some regions, the medial boundary consists entirely of subtle, ambiguous interfaces with other gray-matter compartments. We suggest that the sharp contrast between gray matter and CSF forms a strong visual cue that the automated methods take advantage of to more accurately localize the lateral boundary. Interestingly, our finding that agreement between pairs of human raters did not vary significantly along the anterior-posterior direction except at the extreme periphery contrasts with the inter-rater consistency maps shown by Thompson et al. [39], which suggest that manual tracings are relatively more consistent in the anterior sections. A possible explanation for this discrepancy is that the consistency measure of Thompson et al. is based on agreement between raters in radial distances from the medial axis of the hippocampus to its surface, and therefore could be less sensitive in anterior sub-regions where radial distances are relatively large.
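As a rough illustration of how a medial-versus-lateral error tally could be formed, the sketch below counts disagreement voxels on either side of the manual tracing's centroid along one image axis. It is not the procedure used in this study, which is not described in this section. The function name, the choice of the x axis, and the `medial_is_lower_x` flag are all assumptions, since which direction is medial depends on the hemisphere and on image orientation.

```python
def error_split(automated, manual, medial_is_lower_x=True):
    """Count disagreement voxels medial and lateral to the manual centroid.

    Voxels are (x, y, z) tuples. Voxels whose x equals the centroid are
    counted with the higher-x side (an arbitrary convention).
    """
    automated, manual = set(automated), set(manual)
    if not manual:
        raise ValueError("manual tracing is empty")
    centroid_x = sum(x for x, _, _ in manual) / len(manual)
    errors = automated ^ manual  # voxels labeled by exactly one of the two
    lower = sum(1 for x, _, _ in errors if x < centroid_x)
    upper = len(errors) - lower
    medial, lateral = (lower, upper) if medial_is_lower_x else (upper, lower)
    return {"medial": medial, "lateral": lateral}


# Hypothetical toy data: a 4x2x1 manual tracing; the automated labeling
# misses two voxels at x = 0 and adds one extra voxel at x = 4.
manual = {(x, y, 0) for x in range(4) for y in range(2)}
automated = (manual - {(0, 0, 0), (0, 1, 0)}) | {(4, 0, 0)}

print(error_split(automated, manual))
```

The same tally along the anterior-posterior axis, or by distance from the centroid, would give the other two comparisons named in the heading above.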

Manual tracing protocols add significant variability

Geuze et al. recently described a dizzying array of existing methods for manually tracing the hippocampus in MR [18]. Our results (see Figure 6) indicate that discrepancies between these manual protocols can add a highly significant source of variation to what portion of the brain can be expected to be labeled as hippocampus, both in manual tracings and atlas-based automated methods. We emphasize that we are not suggesting that the manual segmentation protocol used by R1 is superior or inferior to those employed for the Harvard or MNI atlases; rather, we have shown that variations in the resulting hippocampi can be significant. Therefore, we recommend that researchers using standard atlas images for atlas-based segmentation examine the manual tracings and tracing protocols closely to be sure the delineation conventions employed match those of their own laboratory. If they do not, our results have shown that tracing the structure on the standard atlas image or a randomly-selected subject image leads to automated segmentations whose agreement with expert manual segmentations is competitive with manual-manual agreement.


Atlas-based segmentation is a simple, automated method for structure segmentation that can use public-domain tools to produce reasonable structure delineations in images of elderly controls and subjects with MCI and AD. While additional work may be needed to make these automated techniques truly competitive with expert human raters, their performance may be acceptable for image processing applications that can tolerate a small amount of hippocampus localization error. While standard digital atlases from MNI, Harvard, and other institutions allow investigators to apply atlas-based segmentation to their subject images with no need for manual labeling, care must be taken to ensure that hippocampus tracing protocols from the atlas institution coincide with those of the investigator.

Table 2
p Values And Effect Sizes For Differences in Manual-Automated Overlap Using Standard Atlases


This work was supported by NIH grants NS07391, MH064625, AG05133, DA01590001, MH01077, EB001561, RR019771, RR021813, AG016570,


1 While we use the original implementation of Chen’s method, its registration modules are highly similar to the FEM-based and Demons registration modules of the freely-available ITK software package [46].


1. Bigler Erin D, Tate David F, Miller Michael J, Rice Sara A, Hessel Cory D, Earl Heath D, Tschanz Joann T, Plassman Brenda, Welsh-Bohmer Kathleen A. Dementia, asymmetry of temporal lobe structures, and apolipoprotein e genotype: Relationships to cerebral atrophy and neuropsychological impairment. Journal of the International Neuropsychological Society. 2002;8:925–933. [PubMed]
2. Bobinski M, Wegiel J, Wisniewski HM, Tarnawski M, Bobinski M, Reisberg B, De Leon MJ, Miller DC. Neurofibrillary pathology-correlation with hippocampal formation atrophy in alzheimer disease. Neurobiology of Aging. 1996;17(6):909–919. [PubMed]
3. Carmichael Owen T, Aizenstein Howard A, Davis Simon W, Becker James T, Thompson Paul M, Meltzer Carolyn Cidis, Liu Yanxi. Technical report. Carnegie Mellon University Robotics Institute; 2004. Atlas-based hippocampus segmentation in alzheimer’s disease and mild cognitive impairment.
4. Chard DT, Parker GJ, Griffin CM, Thompson AJ, Miller DH. The reproducibility and sensitivity of brain tissue volume measurements derived from an spm-based segmentation methodology. Journal of Magnetic Resonance Imaging. 2002 March;15(3):259–267. [PubMed]
5. Chen Mei. PhD thesis. Robotics Institute, Carnegie Mellon University; Pittsburgh, PA: Oct, 1999. 3-D Deformable Registration Using a Statistical Atlas with Applications in Medicine.
6. Chetelat Gael, Baron Jean-Claude. Early diagnosis of alzheimer’s disease: contribution of structural neuroimaging. Neuroimage. 2003;18:525–541. [PubMed]
7. Christensen GE, Joshi SC, Miller MI. Volumetric transformation of brain anatomy. IEEE Transactions on Medical Imaging. 1997 December;16(6):864–877. [PubMed]
8. Collins D, Holmes C, Peters T, Evans A. Automatic 3d model-based neuroanatomical segmentation. Human Brain Mapping. 1995;3(3):190–208.
9. Crum WR, Scahill RI, Fox NC. Automated hippocampal segmentation by regional fluid registration of serial mri: validation and application in alzheimer’s disease. NeuroImage. 2001;13(5):847–855. [PubMed]
10. Dawant BM, Hartmann SL, Thirion JP, Maes F, Vandermeulen D, Demaerel P. Automatic 3-d segmentation of internal structures of the head in mr images using a combination of similarity and free-form transformations: Part i, methodology and validation on normal subjects. IEEE Transactions on Medical Imaging. 1999 October;18(10):909–916. [PubMed]
11. de Leon MJ, Golomb J, George AE, Convit A, Tarshish CY, McRae T, De Santi S, Smith G, Ferris SH, Noz M, et al. The radiologic prediction of alzheimer disease: the atrophic hippocampal formation. American Journal of Neuroradiology. 1993 July–August;14(4):897–906. [PubMed]
12. Dickerson BC, Salat DH, Bates JF, Atiya M, Killiany RJ, Greve DN, Dale AM, Stern CE, Blacker D, Albert MS, Sperling RA. Medial temporal lobe function and structure in mild cognitive impairment. Annals of Neurology. 2004;56(1):27–35. [PubMed]
13. Duchesne S, Pruessner JC, Collins DL. An appearance-based method for the segmentation of medial temporal lobe structures. NeuroImage. 2002;17(2):515–531. [PubMed]
14. Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A, Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, Dale AM. Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron. 2002;33:341–355. [PubMed]
15. Freeborough PA, Fox NC, Kitney RI. Interactive algorithms for the segmentation and quantitation of 3-d mri brain scans. Computer Methods and Programs in Biomedicine. 1997 May;53(1):15–25. [PubMed]
16. Frisoni GB. Structural imaging in the clinical diagnosis of alzheimer’s disease: problems and tools. Journal of Neurology, Neurosurgery, and Psychiatry. 2001 September;70(6):711–718. [PMC free article] [PubMed]
17. Friston KJ, Ashburner J, Frith CD, Poline JB, Heather JD, Frackowiak RSJ. Spatial registration and normalization of images. Annual Meeting of the Organization for Human Brain Mapping; 1995. pp. 165–189.
18. Geuze E, Vermetten E, Bremner JD. Mr-based in vivo hippocampal volumetrics: 1. review of methodologies currently employed. Molecular Psychiatry. 2004 August 31;:1–13. [PubMed]
19. Hogan RE, Wang L, Bertrand ME, Willmore LJ, Bucholz RD, Nassif AS, Csernansky JG. Mri-based high-dimensional hippocampal mapping in mesial temporal lobe epilepsy. Brain. 2004 August;127(8):1731–1740. [PubMed]
20. Jenkinson Mark, Bannister Peter, Brady Michael, Smith Stephen. Technical report. Oxford Centre for Functional Magnetic Resonance Imaging of the Brain; 2002. Improved methods for the registration and motion correction of brain images.
21. Jack CR, Jr, Theodore WH, Cook M, McCarthy G. Mri based hippocampal volumetrics: data acquisition, normal ranges, and optimal protocol. Magnetic Resonance Imaging. 1995;13:1057–1064. [PubMed]
22. Kelemen A, Szekely G, Gerig G. Elastic model-based segmentation of 3-d neuroradiological data sets. IEEE Transactions on Medical Imaging. 1999 October;18(10):828–839. [PubMed]
23. Kikinis Ron, Portas Chiara M, Donnino Robert M, Jolesz Ferenc A, Shenton Martha E, Iosifescu Dan V, McCarley Robert W, Saiviroonporn Pairash, Hokama Hiroto H, Robatino Andre, Metcalf David, Wible Cynthia G. A digital brain atlas for surgical planning, model-driven segmentation, and teaching. IEEE Transactions on Visualization and Computer Graphics. 1996 September;2(3):232–241.
24. Klemencic Jan, Valencic Vojko, Pecaric Nuska. Deformable contour based algorithm for segmentation of the hippocampus from mri. Proceedings of the International Conference on Computer Analysis of Images and Patterns; 2001. pp. 298–308.
25. Kordower JH, Chu Y, Stebbins GT, DeKosky ST, Cochran EJ, Bennett D, Mufson EJ. Loss and atrophy of layer ii entorhinal cortex neurons in elderly people with mild cognitive impairment. Annals of Neurology. 2001 February;49(2):202–213. [PubMed]
26. Leemput KV, Maes F, Vandermeulen D, Suetens P. Automated model-based tissue classification of mr images of the brain. IEEE Transactions on Medical Imaging. 1999;18:897–908. [PubMed]
27. Lopez OL, Becker JT, Klunk W, Saxton J, Hamilton RL, Kaufer DI, Sweet R, Cidis Meltzer C, Wisniewski S, Kamboh MI, DeKosky ST. Research evaluation and diagnosis of probable alzheimer’s disease over the last two decades: I. Neurology. 2000;55:1854–1862. [PubMed]
28. Lopez OL, Becker JT, Klunk W, Saxton J, Hamilton RL, Kaufer DI, Sweet R, Cidis Meltzer C, Wisniewski S, Kamboh MI, DeKosky ST. Research evaluation and diagnosis of probable alzheimer’s disease over the last two decades: Ii. Neurology. 2000;55:1863–1869. [PubMed]
29. Pantel J, Cretsinger K, Keefe H. Hippocampus tracing guidelines. 1998. Available at:
30. Petersen RC, Smith GE, Waring SC, Ivnik RJ, Kokmen E, Tangelos EG. Aging, memory, and mild cognitive impairment. Int Psychogeriatr. 1997;9(suppl 1):65–69. [PubMed]
31. Pinheiro JC, Bates DM. Statistics and Computing. Springer; 2000. Mixed-Effects Models in S and S-PLUS.
32. Pitiot A, Toga AW, Thompson PM. Adaptive elastic segmentation of brain mri via shape-model-guided evolutionary programming. IEEE Transactions on Medical Imaging. 2002 August;21(8):910–923. [PubMed]
33. Pizer SM, Fritsch DS, Yushkevich P, Johnson V, Chaney E, Gerig G. Segmentation, registration, and measurement of shape variation via image object shape. IEEE Transactions on Medical Imaging. 1999 October; [PubMed]
34. Rohlfing T, Brandt R, Menzel R, Maurer CR Jr. Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains. NeuroImage. 2004 April;21(4):1428–1442. [PubMed]
35. Rosenthal Robert, Rosnow Ralph L, Rubin Donald B. Contrasts and effect sizes in behavioral research : a correlational approach. Cambridge University Press; Cambridge, U.K: 2000.
36. Schuff N, Amend D, Ezekiel F, Steinman SK, Tanabe J, Norman D, Jagust W, Kramer JH, Mastrianni JA, Fein G, Weiner MW. Changes of hippocampal n-acetyl aspartate and volume in alzheimer’s disease. a proton mr spectroscopic imaging and mri study. Neurology. 1997;49:1513–1521. [PubMed]
37. Shen Dinggang, Moffat Scott, Resnick Susan M, Davatzikos Christos. Measuring size and shape of the hippocampus in mr images using a deformable shape model. NeuroImage. 2002 February;15(2):422–434. [PubMed]
38. Smith S. Fast robust automated brain extraction. Human Brain Mapping. 2002;17(3):143–155. [PubMed]
39. Thompson PM, Hayashi KM, de Zubicaray G, Janke AL, Rose SE, Semple J, Hong MS, Herman D, Gravano D, Doddrell DM, Toga AW. Mapping hippocampal and ventricular change in alzheimer’s disease. NeuroImage. 2004 June; [PubMed]
40. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N. Automated anatomical labelling of activations in spm using a macroscopic anatomical parcellation of the mni mri single subject brain. NeuroImage. 2002;15:273–289. [PubMed]
41. Warfield Simon K, Robatino Andre, Dengler Joachim, Jolesz Ferenc A, Kikinis Ron. Nonlinear Registration and Template Driven Segmentation. Progressive Publishing Alternatives. 1998;chapter 4
42. Warfield Simon K, Zou Kelly H, Wells William M. Simultaneous truth and performance level estimation (staple): An algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging. 2004 July;23(7) [PMC free article] [PubMed]
43. Watson C, Andermann F, Gloor P, Jones-Gotman M, Peters T, Evans A, Olivier A, Melanson D, Leroux G. Anatomic basis of amygdaloid and hippocampal volume measurement by magnetic resonance imaging. Neurology. 1992;42(9):1743–1750. [PubMed]
44. Webb J, Guimond A, Eldridge P, Chadwick D, Meunier J, Thirion JP, Roberts N. Automatic detection of hippocampal atrophy on magnetic resonance images. Magnetic Resonance Imaging. 1999 October;17(8):1149–1161. [PubMed]
45. Woods RP, Grafton ST, Holmes CJ, Cherry SR, Mazziotta JC. Automated image registration: I. general methods and intrasubject, intramodality validation. Journal of Computer Assisted Tomography. 1998;22:139–152. [PubMed]
46. Yoo Terry S., editor. Insight into Images. Insight Software Consortium; 2004.
47. Convit A, De Leon MJ, Tarshish C, De Santi S, Tsui W, Rusinek H, George A. Specific hippocampal volume reductions in individuals at risk for Alzheimer’s disease. Neurobiology of Aging. 1997 March–April;18(2):131–138. [PubMed]