We used a new method we developed for automated hippocampal segmentation, called the auto context model (ACM), to analyze brain MRI scans of 400 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). After training the classifier on 21 hand-labeled expert segmentations, we created binary maps of the hippocampus for three age- and sex-matched groups: 100 subjects with Alzheimer’s disease (AD), 200 with mild cognitive impairment (MCI) and 100 elderly controls (mean age: 75.84; SD: 6.64). Hippocampal traces were converted to parametric surface meshes and a radial atrophy mapping technique was used to compute average surface models and local statistics of atrophy. Surface-based statistical maps visualized links between regional atrophy and diagnosis (MCI versus controls: p = 0.008; MCI versus AD: p = 0.001), mini-mental state exam (MMSE) scores, and global and sum-of-boxes clinical dementia rating scores (CDR; all p < 0.0001, corrected). Right but not left hippocampal atrophy was associated with geriatric depression scores (p = 0.004, corrected); hippocampal atrophy was not associated with subsequent decline in MMSE and CDR scores, educational level, ApoE genotype, systolic or diastolic blood pressure measures, or homocysteine. We gradually reduced sample sizes and used false discovery rate curves to examine the method’s power to detect associations with diagnosis and cognition in smaller samples. 40 subjects were sufficient to discriminate AD from normal and correlate atrophy with CDR scores; 104, 200 and 304 subjects, respectively, were required to correlate MMSE with atrophy, to distinguish MCI from normal, and MCI from AD.
Alzheimer’s disease (AD) is the commonest form of dementia, affecting over 24 million people worldwide and over 5 million in the U.S. alone (Jorm, et al. 1987). As the disease progresses, there is a relentless decline in memory function, which progresses to impair language, attention, orientation, visuospatial skills, and executive function. Imaging research has focused on detecting the earliest signs of dementia, either by visualizing the characteristic pathology of plaques and neurofibrillary tangles as they accumulate in the living brain (Braskie, et al. 2008), or by measuring the resulting neuronal loss observable on brain MRI as cortical and hippocampal atrophy (Hampel, et al. 2008; Thompson and Apostolova 2008).
As drug candidates that might slow the progression of Alzheimer’s pathology have begun to be developed, robust and sensitive imaging methods to quantify progression of Alzheimer’s disease have become increasingly important. To this end, the National Institute on Aging and the pharmaceutical industry funded the Alzheimer’s Disease Neuroimaging Initiative, with the goal of developing improved methods, based on imaging and other biomarkers, for AD treatment trials. A variety of methods have been used to quantify structural deficits in brain MRI, including: region-of-interest measurements (especially of the hippocampus (Frisoni, et al. 1999)), the “boundary shift integral” technique, which quantifies differences between two successive co-registered 3D MRIs (Fox, et al. 2000), cortical thickness mapping (Burggren, et al. 2008; Lerch, et al. 2008; Salat, et al. 2004; Thompson, et al. 2004a), voxel-based morphometry (Good, et al. 2001; Whitwell, et al. 2007), and tensor-based morphometry (Hua, et al. 2008; Studholme, et al. 2004; Studholme, et al. 2006; Thompson and Apostolova 2008).
Due to the need to combat the disease before neuronal loss is substantial, scientific interest has also focused on mild cognitive impairment (MCI), a pre-dementia stage that carries a 4-6-fold increased risk of future diagnosis of dementia, relative to the general population (Petersen 2000; Petersen, et al. 2001; Petersen, et al. 1999). At this stage of the illness, molecular pathology and structural atrophy are most prevalent in the hippocampus and adjacent entorhinal cortex. A common biological marker of disease progression in MCI is morphological change in the hippocampus, assessed using volumetric measures (Jack, et al. 1999; Kantarci and Jack 2003) or by mapping the spatial distribution of atrophy in 3D (Apostolova, et al. 2006a; Apostolova, et al. 2006b; Csernansky, et al. 1998; Frisoni, et al. 2006; Thompson, et al. 2004a).
Parametric surface reconstruction and modelling (hippocampal radial mapping) (Scher, et al. 2007b; Thompson, et al. 2004a; Xu, et al. 2008) visualizes the profile of hippocampal tissue loss in 3D, showing that subregional hippocampal atrophy spreads in a pattern that follows the known trajectory of neurofibrillary tangle dissemination (Becker, et al. 2006; Frisoni, et al. 2006). Distinct atrophic profiles have also been associated with those MCI subjects who imminently convert to AD versus those who do not (Apostolova, et al. 2006b), and with other dementias such as Lewy Body and fronto-temporal dementia (Sabattoli, et al. 2008).
For all such studies, isolating the hippocampus in a large number of MRI scans is time-consuming, and most studies still rely on manual outlining guided by expert knowledge of the location and shape of each region of interest (ROI) (Apostolova, et al. 2006a; Du, et al. 2001). To accelerate epidemiological studies and clinical trials, this process should be automated. Some automated systems have been proposed for hippocampal segmentation (Barnes, et al. 2004; Crum, et al. 2001; Fischl, et al. 2002; Hogan, et al. 2000; Powell, et al. 2008; Wang, et al. 2007; Yushkevich, et al. 2006), but none is yet widely used.
Here, we use a fully automated segmentation approach that we recently developed (Morra, et al. 2008a; Morra, et al. 2008b; Morra, et al. 2008c), called the auto context model (ACM), which is based on a well-known machine learning approach, AdaBoost (Schapire, et al. 1998). For an overview of the methodology and validation, please refer to an in-press paper located at http://www.loni.ucla.edu/~thompson/PDF/JMorra-MICCAI08.pdf (Morra, et al. 2008c). As ACM can accurately segment the hippocampus, we were interested in using it to assess factors that affect hippocampal morphology in a large database. ACM is a pattern recognition based approach (Duda, et al. 2001), and as such it requires a training set and a testing set. For training, we used 21 hand-labeled hippocampi, and we tested the approach on 400 other subjects. All results in this paper are for the 400 testing subjects only. After using ACM to segment the hippocampus in our 400 test subjects, we applied a statistical mapping approach based on parametric surface meshes (Bansal, et al. 2007; Csernansky, et al. 1998; Styner, et al. 2000; Thompson, et al. 2004a; Wang, et al. 2007). We used these statistical maps to answer several biological questions regarding AD. We tested the following hypotheses: (1) that spatial maps would reveal systematic patterns of hippocampal differences between large groups of AD, MCI, and healthy elderly subjects, and factors that affect atrophy; (2) that MCI subjects who deteriorated (and developed AD) during a 1-year follow-up period would show greater atrophy than those who did not; and (3) that there would be greater hippocampal atrophy in carriers of the ApoE4 (apolipoprotein E4) allele, which confers heightened risk for AD (Roses, et al. 1995), than in non-carriers.
To confirm the clinical relevance of these anatomical measures, we hypothesized that there would be correlations between hippocampal atrophy and several widely used measures of brain function (Mini-Mental State Exam, Clinical Dementia Rating global scores, and sum-of-boxes scores). We correlated baseline hippocampal atrophy with subsequent clinical decline over a 1-year follow-up period. To examine factors influencing the disease process, we also correlated hippocampal atrophy with depression severity, educational level, cardiovascular measures (systolic and diastolic blood pressure), and serum measures. Finally, to evaluate the statistical power of our mapping methods and provide practical information for users of this technique, we gradually reduced the sample size to determine how many subjects would be required, in future studies, to detect associations between hippocampal atrophy and each of our clinical, behavioral and genetic measures.
The Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Mueller, et al. 2005a; Mueller, et al. 2005b) is a large multi-site longitudinal MRI and FDG-PET (fluorodeoxyglucose positron emission tomography) study of 800 adults, ages 55 to 90, including 200 elderly controls, 400 subjects with mild cognitive impairment, and 200 patients with AD. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, M.D., VA Medical Center and University of California — San Francisco.
All subjects underwent thorough clinical/cognitive assessment at the time of scan acquisition. As part of each subject’s cognitive evaluation, the Mini-Mental State Examination (MMSE) was administered to provide a global measure of cognitive status based on evaluation of five cognitive domains (Cockrell and Folstein 1988; Folstein, et al. 1975); scores of 24 or less (out of a maximum of 30) are generally consistent with dementia. Two versions of the Clinical Dementia Rating (CDR) were also used as a measure of dementia severity (Hughes, et al. 1982; Morris 1993). The global CDR represents the overall level of dementia; global CDR scores of 0, 0.5, 1, 2 and 3 indicate, respectively, no dementia, very mild, mild, moderate, or severe dementia. The “sum-of-boxes” CDR score is the sum of 6 scores assessing different areas of cognitive function: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. The sum of these scores ranges from 0 (no dementia) to 18 (very severe dementia). Table 1 shows the clinical scores and demographic measures for our sample. The elderly normal subjects in our sample had MMSE scores between 26 and 30, a global CDR of 0, a sum-of-boxes CDR between 0 and 0.5, and no other signs of MCI or other forms of dementia. The MCI subjects had MMSE scores ranging from 24 to 30, a global CDR of 0.5, a sum-of-boxes CDR score between 0.5 and 5, and mild memory complaints. Memory impairment was assessed via education-adjusted scores on the Wechsler Memory Scale - Logical Memory II (Wechsler 1987). All AD patients met NINCDS/ADRDA criteria for probable AD (McKhann, et al. 1984) with an MMSE score between 20 and 26, a global CDR between 0.5 and 1, and a sum-of-boxes CDR between 1.0 and 9.0. As such, these subjects would be considered as having mild, but not severe, AD.
Detailed exclusion criteria, e.g., regarding concurrent use of psychoactive medications, may be found in the ADNI protocol (page 29, http://www.adniinfo.org/images/stories/Documentation/adni_protocol_03.02.2005_ss.pdf). Briefly, subjects were excluded if they had any serious neurological disease other than incipient AD, any history of brain lesions or head trauma, or psychoactive medication use (including antidepressants, neuroleptics, chronic anxiolytics or sedative hypnotics, etc.).
Throughout this paper we show whether or not a linkage exists between hippocampal morphology and different covariates of interest, including diagnosis (normal, MCI, AD), MMSE, global CDR, sum-of-boxes CDR, change (over one year) in diagnosis, change in MMSE, change in global CDR, change in sum of boxes CDR, the ApoE genotype, depression severity assessed using the GD (geriatric depression) scale (Yesavage, et al. 1982), systolic and diastolic blood pressure, plasma homocysteine level, and educational level. Based on the available data at the time of writing, only baseline scans were analyzed, to maximize the sample size, but predictions of changes in clinical scores during the year after the baseline scan were also assessed, where available.
The study was conducted according to Good Clinical Practice, the Declaration of Helsinki and U.S. 21 CFR Part 50-Protection of Human Subjects, and Part 56-Institutional Review Boards. Written informed consent for the study was obtained from all participants before protocol-specific procedures, including cognitive testing, were performed.
As noted earlier, when using a pattern recognition approach to identify structures in images, two non-overlapping sets of images must be defined, for training and testing (Morra, et al. 2007; Powell, et al. 2008). The training set consists of a small sample of brain images, representative of the entire dataset, which are manually traced by experts. The testing set is a group of brain images that are to be segmented by the algorithm, but have not been used for training the algorithm. Our training set consisted of 21 brain images, from 7 healthy elderly individuals, 7 individuals with MCI, and 7 individuals with AD, and our testing set consisted of 100 healthy individuals, 200 individuals with MCI, and 100 individuals with AD. We chose to train on 21 subjects because this number was sufficient in previous studies that varied the training sample size (Morra, et al. 2007); smaller training sets degraded segmentation performance. We used 400 testing brains, in 3 diagnostic groups whose sizes matched the expected proportions that the final ADNI sample will have, once all the data have been acquired. The three groups (AD, MCI, and controls) were age- and gender-matched as closely as possible, as shown in Table 1.
To visualize the demographics of the dataset used in our experiments, Figure 1 gives an illustration of all subjects in the testing group broken down by age, sex, and diagnosis. The sex distribution with respect to age is comparable, but there is slightly higher variance in age in the MCI and AD groups.
All subjects were scanned with a standardized MRI protocol, developed after a major effort evaluating and comparing 3D T1-weighted sequences for morphometric analyses (Jack, et al. 2007; Leow, et al. 2006).
High-resolution structural brain MRI scans were acquired at multiple ADNI sites using 1.5 Tesla MRI scanners manufactured by General Electric Healthcare, Siemens Medical Solutions, and Philips Medical Systems. ADNI also collects data at 3.0 T from a subset of subjects, but to avoid having to model field strength effects in this initial study, only 1.5 T images were used. All scans were collected according to the standard ADNI MRI protocol (http://www.loni.ucla.edu/ADNI/Research/Cores/index.shtml). For each subject, two T1-weighted MRI scans were collected using a sagittal 3D MP-RAGE sequence. Typical 1.5 T acquisition parameters are repetition time (TR) of 2400 ms, minimum full excitation time (TE), inversion time (TI) of 1000 ms, flip angle of 8°, 24 cm field of view, acquisition matrix was 192×192×166 in the x-, y-, and z-dimensions yielding a voxel size of 1.25×1.25×1.2 mm3 (Jack, et al. 2007). In plane, zero-filled reconstruction (i.e., sinc interpolation) yielded a 256×256 matrix for a reconstructed voxel size of 0.9375×0.9375×1.2 mm3. The ADNI MRI quality control center at the Mayo Clinic (in Rochester, MN, USA) selected the MP-RAGE image with higher quality based on standardized criteria (Jack, et al. 2007). Additional phantom-based geometric corrections were applied to ensure spatial calibration was kept within a specific tolerance level for each scanner involved in the ADNI study (Gunter, et al. 2006).
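The in-plane voxel sizes quoted above follow directly from the field of view and the matrix sizes. The short calculation below is only a worked check of that arithmetic; the variable names are illustrative and are not part of the ADNI protocol.

```python
# Worked arithmetic for the in-plane voxel sizes quoted in the text.
FOV_MM = 240.0        # 24 cm field of view, expressed in millimetres
ACQ_MATRIX = 192      # acquired in-plane matrix size
RECON_MATRIX = 256    # in-plane matrix size after zero-filled reconstruction

acquired_mm = FOV_MM / ACQ_MATRIX          # in-plane size at acquisition
reconstructed_mm = FOV_MM / RECON_MATRIX   # in-plane size after reconstruction

print(acquired_mm, reconstructed_mm)
```

The slice thickness (1.2 mm) is unchanged by the in-plane reconstruction, which is why only the first two voxel dimensions differ between the acquired and reconstructed sizes.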
Additional image corrections were also applied, using a processing pipeline at the Mayo Clinic, consisting of: (1) a procedure termed GradWarp for correction of geometric distortion due to gradient non-linearity (Jovicich, et al. 2006), (2) a “B1-correction”, to adjust for image intensity non-uniformity using B1 calibration scans (Jack, et al. 2007), (3) “N3” bias field correction, for reducing intensity inhomogeneity (Sled, et al. 1998), and (4) geometrical scaling, according to a phantom scan acquired for each subject (Jack, et al. 2007), to adjust for scanner- and session-specific calibration errors. In addition to the original uncorrected image files, images with all of these corrections already applied (GradWarp, B1, phantom scaling, and N3) are available to the general scientific community, as described at http://www.loni.ucla.edu/ADNI. Ongoing studies are examining the influence of N3 parameter settings on measures obtained from ADNI scans (Boyes, et al. 2007).
To adjust for global differences in brain positioning and scale across individuals, all scans were linearly registered to the stereotactic space defined by the International Consortium for Brain Mapping (ICBM-53) (Mazziotta, et al. 2001) with a 9-parameter (9P) transformation (3 translations, 3 rotations, 3 scales) using the Minctracc algorithm (Collins, et al. 1994). Globally aligned images were resampled in an isotropic space of 220 voxels along each axis (x, y, and z) with a final voxel size of 1 mm3.
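As a rough illustration of what a 9-parameter transformation encodes, the sketch below builds a 4×4 homogeneous matrix from 3 translations, 3 rotations and 3 scale factors. It is not the Minctracc implementation: the function names and the order in which the components are composed are assumptions made for this example, and registration packages differ in their conventions.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def nine_param_matrix(tx, ty, tz, rx, ry, rz, sx, sy, sz):
    """4x4 homogeneous matrix from 3 translations (mm), 3 rotations
    (radians) and 3 scale factors.

    Assumed composition order: scale, then rotate about x, y, z,
    then translate. This ordering is hypothetical, for illustration.
    """
    cx, sn_x = math.cos(rx), math.sin(rx)
    cy, sn_y = math.cos(ry), math.sin(ry)
    cz, sn_z = math.cos(rz), math.sin(rz)
    scale = [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]
    rot_x = [[1, 0, 0, 0], [0, cx, -sn_x, 0], [0, sn_x, cx, 0], [0, 0, 0, 1]]
    rot_y = [[cy, 0, sn_y, 0], [0, 1, 0, 0], [-sn_y, 0, cy, 0], [0, 0, 0, 1]]
    rot_z = [[cz, -sn_z, 0, 0], [sn_z, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    trans = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
    m = scale
    for step in (rot_x, rot_y, rot_z, trans):
        m = matmul(step, m)  # left-multiply so each step acts after the previous ones
    return m

def apply(m, point):
    """Apply a 4x4 homogeneous matrix to a 3D point."""
    x, y, z = point
    return [m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3)]

# Example: scale by 2 along every axis, then shift 1 mm along x.
m = nine_param_matrix(1, 0, 0, 0, 0, 0, 2, 2, 2)
print(apply(m, (1, 1, 1)))
```

With three independent scale factors, such a transform can correct for overall brain size along each axis but cannot introduce shear, which distinguishes it from a full 12-parameter affine transformation.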
Hippocampi were traced using anatomical boundaries and landmarks as detailed in Mega et al. (Mega, et al. 2002) and Narr et al. (Narr, et al. 2002) with a validated protocol used in several prior studies (Lin, et al. 2005; Thompson, et al. 2004b). Beginning rostrally at the posterior uncus, the hippocampus was outlined with the superior border defined by the alveus, demarcating the amygdala superiorly from the underlying hippocampus, excluding the parahippocampal gyrus medially and its white matter inferiorly. Progressing posteriorly, the head of the hippocampus exhibits its characteristic digitations with its superior and medial border defined by the temporal horn of the lateral ventricle, the lateral border by the transverse (choroidal) fissure, and the inferior border by the hippocampal sulcus. When the hippocampal sulcus is not open, a line was drawn from its indentation to the temporal horn. Ammon’s horn takes on its typical appearance at the level of the lateral geniculate through the body of the hippocampus. Measurements at this level include the dentate gyrus, cornu ammonis fields 4 through 1, the subiculum, and the alveus and fimbria. The limit between the subiculum and the transentorhinal cortex of the parahippocampal gyrus was defined by a line from the inferior border of the subiculum to the medial edge of the dentate gyrus as it curves into the hippocampal sulcus. The first section that contains a complete view of the cerebral peduncles is defined as the posterior limit of the hippocampal head. The hippocampal body’s posterior limit is defined as the first slice in which all four colliculi are visualized. This process takes about half an hour per scan.
For an in-depth overview of ACM with AdaBoost, including mathematical details and validation experiments, please refer to our previous work (Morra, et al. 2008a; Morra, et al. 2008b). Briefly, our algorithm uses a set of expertly hand-labeled segmentations to learn a classification rule for hippocampal versus non-hippocampal regions using a modified AdaBoost method, based on ~13,000 features (image intensity, position, image curvatures, image gradients, tissue classification maps of gray/white matter and CSF, and mean, standard deviation, and Haar filters of size 1×1×1 to 7×7×7). Here the training set consisted of 7 age- and sex-matched subjects from each of 3 diagnostic groups (AD, MCI and normal), yielding a total of 21 hand-segmented training images. As all brains have been linearly registered to a standard template, the pointwise summation of all the training masks is used to devise a basic shape prior to capture the global shape of the hippocampus. We also included curvature, gradient, mean, standard deviation, and Haar filters of the shape prior as features. During each iteration of ACM, our extension to AdaBoost, the Bayesian posterior distribution of the labeling is fed back in as an input, along with its neighborhood features, as a new feature for AdaBoost to use as a weak learner. In validation studies using a leave-one-out approach and standard overlap and distance error metrics (Morra, et al. 2008a), these automated segmentations agreed well with human raters; any differences were comparable to differences between trained human raters; and our error metrics compared favorably with those previously reported for other automated hippocampal segmentations. For this paper, we only changed 2 parameters, relative to the implementation described in Morra, et al. (2008a). We used 70 iterations of AdaBoost, an AdaBoost cascade depth of three nodes, and four ACM iterations. 
These changes were mainly made for speed purposes, as training takes less than 12 hours with this formulation (we note that although the training phase is very computer intensive, the segmentation of new hippocampi takes less than a minute per brain).
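The shape prior described above can be pictured with a toy example. The sketch below sums a few small binary masks voxel by voxel; dividing by the number of masks, so that the prior reads as the fraction of training subjects labeled at each voxel, is an assumption added here for readability (the text specifies only a pointwise summation). The 2D masks and the function name are hypothetical.

```python
def shape_prior(masks):
    """Pointwise sum of co-registered binary masks, divided by the
    number of masks (normalization is an assumption of this sketch)."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) / n for c in range(cols)]
            for r in range(rows)]

# Three tiny hypothetical 2D training masks (1 = hippocampus, 0 = background).
training_masks = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
    [[0, 1, 1], [0, 0, 0]],
]

prior = shape_prior(training_masks)
for row in prior:
    print([round(v, 2) for v in row])
```

Voxels labeled in every training mask receive a prior of 1, voxels never labeled receive 0, and boundary voxels fall in between. This only makes sense because all masks share one coordinate space after linear registration.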
One problem inherent in any automated segmentation algorithm is how to deal with partial voluming effects. Because the resolution of MRI scans is not sufficient to resolve the exact boundary of the hippocampus, some voxels will contain both hippocampus and a neighboring structure (such as the amygdala or CSF). However, we were not able to cope with these effects because we were focusing on assigning each voxel to a specific tissue class (in the hippocampus or not). If our training set contained probabilities (how much of a given voxel is hippocampus), we could in theory learn a model that incorporates these probabilities, but our training set only consisted of binary labels. One could also derive a post hoc method estimating the volume fraction of hippocampal tissue in partial volumed voxels by removing the thresholding step at the end of our algorithm. Our automated classifier outputs a probability at each voxel (which may be considered as a maximum a posteriori or MAP probability in a Bayesian framework) representing the likelihood that a particular voxel belongs to the hippocampus, and this floating-point value is thresholded. This could be considered as an approximation of the fraction of that voxel containing hippocampal tissue (this obviously only makes sense at the region boundary). If posterior probabilities were retained in the analysis and used for inference, it would require a more in depth study to prove their validity. In other ongoing work (Aganj, et al. 2008), we have examined the effects of using a partial volume classification model on cortical thickness segmentation, rather than a binary classification at the voxel level. Even so, in this study, given that we have only binarized data to train with, we did not examine partial voluming effects.
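To make the distinction between thresholded labels and retained probabilities concrete, the sketch below compares the two volume estimates on a handful of hypothetical boundary voxels. The probability values and the 0.5 cut-off are assumptions for illustration; the threshold actually used is not stated here, and, as noted above, treating posterior probabilities as volume fractions would need separate validation.

```python
VOXEL_VOLUME_MM3 = 1.0  # 1 mm isotropic voxels after resampling
THRESHOLD = 0.5         # assumed cut-off, for illustration only

# Hypothetical classifier outputs for six voxels near the hippocampal boundary.
probabilities = [0.05, 0.30, 0.45, 0.60, 0.90, 1.00]

# Binary labeling: each voxel counts fully or not at all.
binary_volume = sum(p >= THRESHOLD for p in probabilities) * VOXEL_VOLUME_MM3

# Soft estimate: each voxel contributes its probability as a volume fraction.
soft_volume = sum(probabilities) * VOXEL_VOLUME_MM3

print(binary_volume, round(soft_volume, 2))
```

The two estimates differ only through voxels whose probability is far from 0 or 1, which is why the choice matters mainly at the region boundary.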
After all hippocampal segmentations had been performed, we correlated hippocampal shape with different disease-related factors using a map-based statistical analysis. To accomplish this, 3D parametric surface models were constructed from each segmentation, and these models were geometrically averaged across subjects within each diagnostic group. This results in 3D average surface maps for each diagnostic group, and statistical maps relating morphology to different covariates, such as diagnosis, genotype, clinical scores, future decline, etc. We employed a surface averaging approach used in many prior studies (Thompson, et al. 2004a) but we also note that there are many other systems under active development by our group and others (Shi, et al. 2007; Terriberry, et al. 2007; Vaillant and Glaunes 2005; Wang, et al. 2007) to establish pointwise correspondence for subcortical surfaces. Some use automatically defined intrinsic geometric landmarks on the subcortical surface to enforce higher-order correspondences across subjects when averaging anatomy across a group.
To create a measure of ‘radial size’ for each subject’s hippocampus, first a medial curve was computed threading through the hippocampus, and the distance from each surface point to this curve was calculated, providing a measure that is sensitive to local atrophy. As in prior work, regressions were performed to assign a p-value to each point on the surface in order to link radial size to different covariates of interest. Finally the p-maps are presented as color coded average subcortical shapes.
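A minimal sketch of the radial size measure follows, assuming the medial curve is available as a list of sampled 3D points and that the distance is taken to the nearest sample. Both choices, and the function name, are simplifications for illustration; the published method pairs each surface point with the medial curve through its surface parameterization.

```python
import math

def radial_distances(surface_points, medial_curve):
    """Distance from each surface point to the nearest sampled point
    on the medial curve (nearest-point matching is an assumption)."""
    return [min(math.dist(p, c) for c in medial_curve) for p in surface_points]

# Hypothetical medial curve running along the z-axis.
medial_curve = [(0.0, 0.0, float(z)) for z in range(5)]

# Two points on a surface of radius 2 and one locally "atrophied" point at radius 1.
surface_points = [(2.0, 0.0, 1.0), (0.0, 2.0, 3.0), (1.0, 0.0, 2.0)]

print(radial_distances(surface_points, medial_curve))
```

Because the measure is computed per surface point, a local inward deformation lowers the value only where it occurs, which is what makes the measure sensitive to regional rather than global atrophy.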
All our permutation tests are based on measuring the total area of the hippocampus with suprathreshold statistics, after setting the threshold at p < 0.01. To correct for multiple comparisons and assign an overall p-value to each p-map (Nichols and Holmes 2002; Thompson, et al. 2003), permutation tests were used to determine how likely the observed level of significant atrophy (proportion of suprathreshold statistics, with the threshold set at p < 0.01) within each p-map would occur by chance (Thompson, et al. 2003; Thompson, et al. 2004a). The number of permutations N was chosen to be 100,000, to control the standard error SEp of omnibus probability p, which follows a binomial distribution B(N, p) with known standard error (Edgington and Onghena 2007). When N = 8,000, the approximate margin of error (95% confidence interval) for p is around 5% of p; to further improve upon this, we ran 100,000 permutations, with 0.05 chosen as the significance level. We prefer to use the overall extent of the suprathreshold region as we know that atrophy is relatively distributed over the hippocampus, and a set-level inference is more appropriate for detecting diffuse effects with moderate effect sizes at many voxels, rather than focal effects with very high effect size (which would be better detected using a test for peak height in a statistical map).
When reporting permutation test results, one-sided hypothesis testing was used, i.e. we only considered statistics in which the AD or MCI group showed greater atrophy than the controls, in line with prior findings. Likewise, the correlations are reported as one-sided hypotheses, i.e. statistics are shown in the map where the correlations are in the expected direction, e.g. greater atrophy associated with lower MMSE scores, and with higher CDR scores. As a post hoc verification, we also confirmed that the maps in the other direction were indeed null (i.e., they do indeed have null CDFs when evaluated using the FDR method).
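The set-level permutation test can be sketched as follows on simulated data. Several simplifications are assumptions of this example: a pooled-variance t statistic is thresholded at a fixed critical value (about 2.55, roughly the one-sided p < 0.01 cut-off for 18 degrees of freedom) instead of computing p-values at each point; only 200 permutations are run, far fewer than the 100,000 used in the study; and the corrected p-value uses the common (count + 1)/(N + 1) convention. All function names and the simulated radial sizes are hypothetical.

```python
import math
import random
import statistics

def t_stat(a, b):
    """Pooled-variance two-sample t statistic (positive when mean(a) > mean(b))."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def suprathreshold_fraction(controls, patients, t_crit):
    """Fraction of surface points where controls exceed patients with
    t > t_crit (one-sided: smaller radial size in patients)."""
    n_points = len(controls[0])
    hits = 0
    for v in range(n_points):
        a = [subject[v] for subject in controls]
        b = [subject[v] for subject in patients]
        if t_stat(a, b) > t_crit:
            hits += 1
    return hits / n_points

def permutation_p(controls, patients, t_crit, n_perm, rng):
    """Corrected p-value for the observed suprathreshold fraction,
    obtained by shuffling group labels across subjects."""
    observed = suprathreshold_fraction(controls, patients, t_crit)
    pooled = controls + patients
    n_c = len(controls)
    count = 0
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # reassign whole subjects, keeping each map intact
        if suprathreshold_fraction(shuffled[:n_c], shuffled[n_c:], t_crit) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Simulated radial sizes: 10 subjects per group, 20 surface points each.
rng = random.Random(0)
controls = [[rng.gauss(5.0, 0.5) for _ in range(20)] for _ in range(10)]
patients = [[rng.gauss(3.5, 0.5) for _ in range(20)] for _ in range(10)]

observed, p_value = permutation_p(controls, patients, t_crit=2.55, n_perm=200, rng=rng)
print(observed, p_value)
```

Shuffling whole subjects rather than individual surface points preserves the spatial correlation within each map, which is what allows a single corrected p-value to be assigned to the entire map.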
To assess our method’s power to establish linkages between morphology and disease, we created cumulative distribution function (CDF) plots of the p-values in our subcortical maps. These CDF plots are commonly created as an intermediate step, when using the false discovery rate (FDR) method to assign overall significance values to statistical maps (Benjamini and Hochberg 1995; Genovese, et al. 2002; Storey 2002; Zhu, et al. 2007). As they show the proportion of supra-threshold voxels in a statistical map, for a range of thresholds, these CDF plots offer a measure of the effect size in a statistical map. They also may be used to demonstrate which methodological choices influence the effect size (Brun, et al. 2007; Chiang, et al. 2007; Lepore, et al. 2007).
A CDF of the p-values may be used to assess how well a method can capture a known relationship between anatomy and disease, or to discover new relationships. In a plot of the observed p-values versus those that would be expected under the null hypothesis (of no correlation), the line y = x represents the null distribution, where on average, k% of the map falls below p = k/100, for any k, 0 ≤ k ≤ 100. Large upward departures from this line typically represent significant relationships, as reflected in the p-maps. In choosing to plot the empirical CDF of p-values, we note that this chosen presentation is the flip of the more common FDR PP plot.
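The relationship between the empirical CDF and the FDR threshold can be sketched in a few lines. The example below computes the CDF at chosen thresholds and the Benjamini-Hochberg cut-off, i.e. the largest sorted p-value p(i) satisfying p(i) ≤ q·i/n. With q = 0.05 this is equivalent to finding where the empirical CDF lies on or above the line y = 20x. The p-values and function names are hypothetical.

```python
def empirical_cdf(p_values, thresholds):
    """Proportion of p-values at or below each threshold."""
    n = len(p_values)
    return [sum(p <= t for p in p_values) / n for t in thresholds]

def bh_threshold(p_values, q=0.05):
    """Benjamini-Hochberg cut-off: the largest sorted p-value p(i)
    with p(i) <= q * i / n, or 0.0 if no p-value qualifies."""
    n = len(p_values)
    best = 0.0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= q * i / n:
            best = p
    return best

# Hypothetical p-values from a small statistical map.
p_values = [0.001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]

print(empirical_cdf(p_values, [0.01, 0.05, 1.0]))
print(bh_threshold(p_values))
```

A map with no real effect would give a CDF close to y = x and a cut-off of 0.0; the further the CDF rises above the null line at small thresholds, the larger the cut-off and the greater the share of the map declared significant.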
Since it is widely known that reductions in hippocampal volume are associated with declining cognitive function (Jack, et al. 1999), we first examined overall differences in hippocampal volume for each of our three diagnostic groups.
Figure 2 and Table 2 show that there is a sequential reduction in volume between AD, MCI, and healthy controls, consistent with many prior studies (Convit, et al. 1997). Figure 2 also illustrates that the right hippocampus is slightly larger than the left, reaching significance in the MCI group, which is also in line with previous studies (Jack, et al. 1989; Jessen, et al. 2006; Laakso, et al. 2000a; Laakso, et al. 2000b; Pennanen, et al. 2004; Ridha, et al. 2006; Wang, et al. 2003; Wang, et al. 2006). This indicates that the hippocampal segmentations show the expected profile of volumetric effects with disease progression, and that the segmentation approach measures hippocampal volumes accurately enough to differentiate the 3 diagnostic groups, at least at the group level.
Table 3 shows strong correlations between hippocampal volume and (1) MMSE score and (2) sum-of-boxes CDR. Correlations are high (0.30-0.38) when all subjects are pooled, considering the low p-values and the relatively large sample size. When the 3 diagnostic groups are split, MMSE correlations are highest in MCI, and sum-of-boxes CDR correlations are highest in AD. Table 4 shows the same correlations when controlling for diagnosis, age, and sex. Although the partial correlation values are generally closer to 0, as might be expected when controlling for many variables, the pattern of significance is generally the same. This shows that the detected associations between hippocampal atrophy and MMSE (and sum-of-boxes CDR) are not mediated by diagnostic grouping, age, sex, and ApoE status; the associations persist even after adjustment for these other factors.
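To illustrate what controlling for a covariate does to a correlation, the sketch below uses the standard first-order partial correlation formula, r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). It adjusts for a single covariate only, whereas the analysis above adjusts for several at once, and the toy vectors are constructed for illustration and are not study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data: x and y each equal the covariate z plus an independent component,
# so their raw correlation is driven entirely by z.
z = [-1, -1, 1, 1]   # hypothetical covariate (e.g. a centered age score)
x = [-2, 0, 0, 2]    # hypothetical volume measure
y = [-2, 0, 2, 0]    # hypothetical clinical score

print(round(pearson(x, y), 3), round(partial_corr(x, y, z), 3))
```

In this constructed case the raw correlation vanishes once z is controlled for. An association that persists after adjustment, as reported for the MMSE and CDR correlations, therefore cannot be attributed to the adjusted covariates alone.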
The next test we performed to validate our segmentations is to make significance maps (p-maps) for covariates with known linkages to hippocampal morphology. For example, it is known that regional brain atrophy in AD is linked with diagnosis, MMSE, global CDR and sum of boxes CDR scores (Hua, et al. 2008; Thompson, et al. 2004a). As this is one of the largest studies of AD to date (400 subjects), effects of potentially subtle covariates should be detectable.
Figure 3 shows the p-maps for each pairwise diagnostic comparison (AD/MCI/Normal), and for correlations between atrophy and MMSE scores, global CDR, and sum-of-boxes CDR scores, as the covariates. These results would be anticipated based on previous studies of hippocampal volume in AD; the level of atrophy is strongly associated with diagnosis (with greatest effects for the AD v. normal comparison), MMSE scores, and CDR scores. The overall significance of these mapping results was confirmed by both the permutation tests of Table 7 and the FDR analysis of Figure 4. The MMSE and CDR maps resemble each other in part because, in our sample, MMSE was relatively highly correlated with global CDR scores (Pearson’s r = -0.694) and sum-of-boxes CDR (r = -0.594); and, partly by definition, global and sum-of-boxes CDR scores are highly correlated (r = 0.656).
As diagnosis, MMSE scores, and CDR scores correlated with hippocampal atrophy as predicted (Figure 3, Figure 4, and Figure 5), and because the accuracy of the hippocampal segmentations was satisfactory on independent test data, we went on to explore effects of other more subtle factors that might be associated with atrophy.
Clinical assessments of disease severity (MMSE, CDR) also show associations with morphology that generally have higher effect sizes than categorization based on diagnosis alone. Even so, these maps must be interpreted cautiously, as MMSE and CDR are used to help determine who is control versus MCI versus AD, so significant correlations may simply reflect the diagnostic classification. As such, maps of MMSE effects, for example, are not necessarily revealing regions in which atrophy is selectively associated with MMSE independent of diagnosis, as the possible range of MMSE scores is almost completely determined by diagnosis. We therefore also computed maps of within-group correlations, to assess effects of MMSE, and the two CDR measures (sum-of-boxes and global) within each of the 3 diagnostic groups as shown by Figure 6 and Table 5. This is a much more challenging effect to detect, as it relies on detecting effects of MMSE within a restricted range.
In addition to testing for group differences using univariate statistics, it is also important to adjust for confounding effects of other covariates that might influence the diagnostic and clinical correlations. The best way to do this is to create p-maps while controlling for various other covariates using the general linear model (multiple regression). Figure 7 shows the three diagnostic p-maps and the three clinical score p-maps computed after controlling for effects of age, sex, and ApoE status. The clinical score correlations are also controlled for diagnosis. As shown by the permutation test results from Table 6, the effects of the primary covariate of interest, in each case, remained significant after controlling for the other covariates, showing the robustness of the associations.
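The covariate adjustment described above can be illustrated with a minimal sketch. It assumes that radial distances are stored as a subjects-by-vertices array and fits an ordinary least-squares model at every surface point. The function name `vertexwise_t`, the array layout, and the synthetic data are illustrative assumptions, not the pipeline used in this study.

```python
import numpy as np

def vertexwise_t(radial, interest, nuisance):
    """t-statistic for `interest` at every surface point, adjusting for `nuisance`.

    radial:   (n_subjects, n_vertices) radial distances (hypothetical layout)
    interest: (n_subjects,) covariate of interest, e.g. a clinical score
    nuisance: (n_subjects, k) covariates to control for, e.g. age, sex, ApoE
    """
    n = radial.shape[0]
    # Design matrix: intercept, covariate of interest, nuisance covariates.
    X = np.column_stack([np.ones(n), interest, nuisance])
    beta, _, _, _ = np.linalg.lstsq(X, radial, rcond=None)
    resid = radial - X @ beta
    dof = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof      # residual variance per vertex
    # Variance multiplier for the coefficient of interest (column 1 of X).
    c = np.linalg.inv(X.T @ X)[1, 1]
    return beta[1] / np.sqrt(sigma2 * c), dof

# Synthetic data (illustration only): 200 subjects, 50 surface points.
rng = np.random.default_rng(0)
n, v = 200, 50
age = rng.normal(75, 6, n)
sex = rng.integers(0, 2, n).astype(float)
score = rng.normal(26, 3, n)
radial = rng.normal(0, 1, (n, v)) - 0.05 * age[:, None]
radial[:, :10] += 0.5 * score[:, None]           # only the first 10 points track the score
t_map, dof = vertexwise_t(radial, score, np.column_stack([age, sex]))
```

Each t-statistic can be converted to a pointwise p-value with a Student t distribution on `dof` degrees of freedom, which yields the p-map for the covariate of interest after the nuisance covariates have been accounted for.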
One goal of ADNI is to find measures in brain images that might predict future clinical decline, either for therapeutic planning, or to adjust for factors that influence morphology in future clinical trials. To study this, we correlated hippocampal morphology at baseline with subsequent change, over the following 1 year, in MMSE, global CDR and sum of boxes CDR scores. Correlation maps were created to identify any regionally specific hippocampal deformations that might predict future clinical decline. Such associations have already been found for volumetric measures (Devanand, et al. 2007) and using maps to predict conversion, stability, or recovery from MCI (Apostolova and Thompson 2007). In our MCI group, 26 out of 200 subjects converted to AD by one year after baseline. Fewer than 3 subjects had other types of conversions (e.g., control to AD), so we only correlated atrophy with a binary variable denoting conversion from MCI to AD versus remaining diagnosed as MCI.
Figure 8 shows maps revealing some regions in which atrophy appears to correlate with future outcomes, but these maps were not significant overall after multiple comparisons correction via permutation testing (see Table 7). We do not present our FDR analysis of these results, because none crossed the y = 20x line except at the origin. Right hippocampal atrophy at baseline was associated at trend level (p = 0.056) with future changes in the sum-of-boxes CDR score, but a trend-level finding of this magnitude is likely to occur by chance when assessing several covariates, and would therefore not survive a Bonferroni correction. Even so, if a more liberal threshold is applied to define the suprathreshold region (p = 0.05 at each voxel), right hippocampal atrophy was associated with future decline in the sum-of-boxes CDR score (p = 0.036, corrected). This link requires verification in future samples. No significant correlations were found between baseline atrophy and future conversion from MCI to AD, but this assessment may be underpowered, because only 26 of the 200 MCI subjects had converted one year after their baseline scan. In addition, one year may not be enough time to observe drastic effects, and the therapeutic goal is to start AD treatment many years before onset (up to ten years beforehand). More predictive effects may be observed in a longer-term study.
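The permutation correction for a whole map can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact procedure used here: the map is a pointwise Pearson correlation, the suprathreshold region is defined by a hypothetical cutoff `r_crit` on |r| (for a fixed sample size this is equivalent to a pointwise p-value cutoff), and the corrected p-value is the fraction of random relabelings whose suprathreshold area is at least as large as the observed one. The function names and the synthetic data are illustrative.

```python
import numpy as np

def corr_map(radial, covariate):
    """Pearson r between a covariate and radial distance at each surface point."""
    x = covariate - covariate.mean()
    y = radial - radial.mean(axis=0)
    return (x @ y) / np.sqrt((x @ x) * (y ** 2).sum(axis=0))

def permutation_p(radial, covariate, r_crit, n_perm=1000, seed=0):
    """Corrected p-value for the suprathreshold area of a correlation map."""
    rng = np.random.default_rng(seed)
    observed = np.sum(np.abs(corr_map(radial, covariate)) > r_crit)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(covariate)    # break the subject-covariate link
        null_area = np.sum(np.abs(corr_map(radial, shuffled)) > r_crit)
        exceed += null_area >= observed
    # Count the observed labeling as one of the permutations (a common convention).
    return (exceed + 1) / (n_perm + 1)

# Synthetic data (illustration only): half of the surface tracks the score.
rng = np.random.default_rng(1)
n_subj, n_vert = 100, 200
score = rng.normal(size=n_subj)
radial = rng.normal(size=(n_subj, n_vert))
radial[:, :100] += score[:, None]
p_effect = permutation_p(radial, score, r_crit=0.2, n_perm=500)
```

Raising or lowering `r_crit` changes which points count toward the suprathreshold area, which is why a corrected p-value obtained with a more liberal pointwise threshold is reported separately from the primary result.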
In healthy elderly subjects, presence of the ApoE4 allele is correlated with future development of AD (Beffert, et al. 1999). In fact, in our sample, while only 32 (32%) of the controls carried at least one copy of the ApoE4 allele, 107 (54%) of the MCI subjects and 66 (66%) of the AD subjects did, meaning that the genotype is less common in those who are healthy. ApoE4 is also associated with differences in cortical morphology on MRI in healthy children (Shaw, et al. 2007), and with ventricular and hippocampal morphology in elderly subjects (Burggren, et al. 2007; Chou, et al. 2007; Pievani, et al. 2007).
Whether there is any direct association between ApoE4 and hippocampal morphology in aging is not yet clear, so we correlated presence of the ApoE4 allele with the level of hippocampal degeneration, both (1) in the entire sample and (2) in subjects who had not yet developed AD (i.e., MCI and control groups).
As shown in Figure 9, there appear to be regional correlations between ApoE4 and morphology locally in the hippocampal head, but these are not significant after permutation testing (see Table 7), either in the non-AD subjects or in the entire sample. FDR analysis also showed no correlation. Negative findings have little value in small samples, but our finding of no ApoE4 effect in a sample of 370 subjects suggests that any direct effect of ApoE4 on morphology must be relatively subtle. This lack of an effect could also arise because the effects of ApoE4 have been shown to diminish with age (Juva, et al. 2000). Since all of our subjects are relatively old, the ApoE4 effects may no longer be present. Also, effects of ApoE4 may be more readily observed in a longitudinal study. Finally, manual tracing might pick up an ApoE4 effect that automated tracing misses; even though the automated method is capable of handling 400 scans efficiently, it may be less sensitive than a manual method applied to the same sample, if it were feasible to trace all of them.
It has been hypothesized that ApoE2 may be a protective allele, guarding against the onset of MCI or AD (Roses, et al. 1995). Even so, we found no appreciable correlation between HP morphology and presence of the ApoE2 allele. In our dataset, only 22 individuals carried the ApoE2 genotype, so its rarity means a statistically significant correspondence would be difficult to observe even in a sample this large.
We also explored some other covariates previously associated with AD. Higher blood pressure (Skoog, et al. 1996), elevated homocysteine levels (Morris 2003), poorer educational level (Stern, et al. 1994), and measures of clinical depression (Ballmaier, et al. 2007; Ballmaier, et al. 2004) have each been shown to be associated with AD. Therefore, we attempted to detect linkages between these factors and regional hippocampal morphology.
Figure 10 shows that there is little evidence of any association between regional HP morphology and blood pressure measures, years of education, and homocysteine levels; none of these maps were significant overall after correction for multiple spatial comparisons. Table 7 shows the overall significance of these maps. The only measure that was significantly associated with atrophy was depression scores; an FDR analysis is shown in Figure 11. Greater atrophy of the right hippocampal head was found in those subjects with higher geriatric depression scores (p = 0.004, corrected), an effect clearly seen in the maps (white colors). The underlying biological explanation for this is not clear, but genotype may be important, since smaller hippocampal volumes in late-onset depression have been linked to the long variant of the promoter region of the serotonin transporter gene (Taylor, et al. 2005). One postmortem study showed that the presence of a lifetime history of depression was associated with increases in AD-related neuropathological changes in the hippocampus (Rapp, et al. 2006). Some studies of hippocampal atrophy in elderly depressed subjects have hypothesized that ischemic small-vessel disease in the hippocampus may be implicated in the pathogenesis of elderly depression (Ballmaier, et al. 2007). Even so, the correlation between hippocampal atrophy and depression may be mediated by concomitant atrophy in other limbic regions, such as the anterior cingulate gyrus, where cortical atrophy has been associated with apathy in AD patients (Apostolova, et al. 2007).
One final avenue that we explored was how many subjects were necessary to detect a statistically significant linkage between diagnosis or clinical scores and morphology. To investigate this, we randomly threw out subjects from our initial sample of 400, yielding groups of size N = 304, N = 200, N = 104, N = 40, and N = 24. These numbers were chosen to preserve the 1:2:1 relationship between normal, MCI, and AD sample sizes, while maintaining the gender balance in all groups. All final sample sizes were therefore multiples of 8, as sets of 1 normal, 2 MCI, and 1 AD subject of each sex (i.e., 8 subjects overall) were successively eliminated, to create smaller samples. For each different N, a random number was used to throw out samples; therefore, smaller samples are not necessarily subsets of the larger ones.
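The subsampling scheme can be sketched in a few lines. The sketch below draws each reduced sample independently from the full roster, keeping the 1:2:1 ratio of normal, MCI, and AD subjects and equal numbers of each sex. The roster is synthetic, and an exactly even sex split within each group is an assumption made for illustration.

```python
import random
from collections import Counter

def stratified_subsample(subjects, n_target, seed=None):
    """Draw n_target subjects with a 1:2:1 normal:MCI:AD ratio, balanced by sex."""
    if n_target % 8:
        raise ValueError("n_target must be a multiple of 8")
    rng = random.Random(seed)
    unit = n_target // 8                         # one unit = 1 normal, 2 MCI, 1 AD per sex
    quota = {"normal": unit, "MCI": 2 * unit, "AD": unit}
    chosen = []
    for sex in ("F", "M"):
        for group, k in quota.items():
            pool = [s for s in subjects if s["group"] == group and s["sex"] == sex]
            chosen.extend(rng.sample(pool, k))
    return chosen

# Synthetic roster (assumption): 100 normal, 200 MCI, 100 AD, split evenly by sex.
roster = []
for group, size in (("normal", 100), ("MCI", 200), ("AD", 100)):
    for i in range(size):
        roster.append({"id": f"{group}-{i}", "group": group, "sex": "FM"[i % 2]})

# Each N is drawn afresh, so smaller samples are not nested within larger ones.
samples = {n: stratified_subsample(roster, n, seed=n) for n in (304, 200, 104, 40, 24)}
group_counts = {n: Counter(s["group"] for s in sample) for n, sample in samples.items()}
```

Because every size is drawn independently from the full roster, the N = 40 sample is generally not a subset of the N = 104 sample, matching the description above.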
As shown in Figure 12, reducing the sample size, N, generally decreases the detected effect size, as expected. Clinical assessments of disease severity (MMSE, CDR) also show associations with morphology that generally have higher effect sizes than categorization based on diagnosis alone. This is natural as the clinical assessments are continuous rather than categorical measures. Figure 12 also shows how many subjects are necessary to detect a specific effect reliably (i.e., reject the null hypothesis, if FDR is used to control for multiple comparisons). 40 subjects sufficed to discriminate AD from normal groups and to correlate atrophy with CDR clinical scores (global CDR and sum of boxes CDR). 104 subjects were sufficient to correlate atrophy with MMSE, and 200 and 304 subjects, respectively, were required to distinguish MCI from normal and MCI from AD. In general, more extensive regions of statistical association were detected with larger samples.
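The FDR criterion behind these sample-size estimates can be sketched as follows, assuming it corresponds to the standard Benjamini-Hochberg procedure at q = 0.05. Under that assumption, asking whether the cumulative distribution of p-values in a map crosses the line y = 20x is the same as asking whether any sorted p-value p(i) satisfies p(i) ≤ 0.05·i/m, where m is the number of surface points. The function name and the example p-values are illustrative.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest p-value cutoff that controls the FDR at level q, or 0.0 if none exists."""
    ordered = sorted(p_values)
    m = len(ordered)
    threshold = 0.0
    for i, p in enumerate(ordered, start=1):
        # Empirical CDF at p is i/m; the test is whether it reaches p/q (y = 20x for q = 0.05).
        if p <= q * i / m:
            threshold = p
    return threshold

# Illustrative p-values only (not taken from the maps in this study).
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cutoff = fdr_threshold(example)                  # 0.008: two points survive
flat = fdr_threshold([0.1 * k for k in range(1, 11)])   # 0.0: no crossing away from the origin
```

A returned cutoff of 0.0 corresponds to a map whose cumulative distribution never crosses the y = 20x line except at the origin, in which case the null hypothesis is not rejected for that sample size.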
This study is representative of several current research efforts that use automated methods to measure hippocampal atrophy in AD, including large diffeomorphic metric mapping (Csernansky, et al. 2004; Wang, et al. 2007), volumetric analysis (Geuze, et al. 2005), and fluid registration (van de Pol, et al. 2007). Clinical measures of disease burden have also been correlated with regional hippocampal atrophy in several surface-based mapping studies of AD and MCI (Apostolova, et al. 2006a; Becker, et al. 2006; Frisoni, et al. 2006), Lewy body dementia (Sabattoli, et al. 2007), and vascular dementia (Scher, et al. 2007a), and of conversion between MCI and AD (Apostolova, et al. 2006b; Apostolova and Thompson 2007).
For each surface model, a medial curve was defined as the line traced out by the centroid of the hippocampal boundary (Styner, et al. 2005; Thompson, et al. 2004a). The medial curve was defined separately in each individual, before averaging the surfaces. The operations of averaging surfaces and defining the medial curve from a surface are not commutative, because a medial curve derived from an average surface would not be the same as the average of the medial curves derived from each individual. Because we were interested in measuring radial atrophy in each individual, we computed these measures in each subject with reference to their own medial curve, but plotted the resulting statistics on the average surface for the groups being compared.
One advantage of using the radial distance measure - and computing it in each individual separately - is that it is invariant to overall 3D shifts or translations of the structure. Other methods, such as voxel based morphometry for example, may incorrectly pick up global shifts of the hippocampus as compressions or expansions inside the hippocampus. This is because the nonlinear deformations that register structures across subjects in VBM are spatially regularized (smooth) and may not be precise enough to register the hippocampal boundaries exactly. By contrast, if AD is associated with some shifting of the mean stereotaxic position of the hippocampus, the radial distance measure will not be affected by it, and will only detect localized reductions in volume intrinsic to the hippocampus. This also makes the radial atrophy mapping technique somewhat robust to small rotational or translational errors in registering the images across subjects, as the radial distances are always measured with respect to a central line threading down the center of the structure. The radial distance is a reasonable proxy for volumetric loss, mapping its distribution in 3D, but will not be sensitive to some types of volumetric change. As radial atrophy is measured with respect to the medial curve, the distance measure reflects the thickness of the hippocampus in a given section, which may not reflect an overall volume difference. For example, any anterior-to-posterior shortening of the hippocampus would be detected by a volume measure, but the radial distance measure is not sensitive to this type of change — it only measures the radial thickness of the structure relative to a centerline. This may be considered a benefit rather than a limitation, as there are some variations across normal subjects in the anterior-posterior extent of the hippocampus, and these variations will be discounted by the radial mapping approach and will not be a source of confounding variance.
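A minimal sketch of the radial distance measure, and of its invariance to translation, is given below. It assumes that each surface is stored as a list of sections, each section being a list of 3D boundary points, and that the medial curve passes through the centroid of each section. This data layout, the function names, and the synthetic tube are illustrative assumptions.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def radial_distances(sections):
    """Distance from each boundary point to the centroid of its own section."""
    result = []
    for pts in sections:
        c = centroid(pts)                        # one point of the medial curve
        result.append([math.dist(p, c) for p in pts])
    return result

def ring(center, radius, n_points=16):
    """Synthetic circular cross-section in the x-z plane (illustrative data only)."""
    cx, cy, cz = center
    return [(cx + radius * math.cos(2 * math.pi * k / n_points), cy,
             cz + radius * math.sin(2 * math.pi * k / n_points))
            for k in range(n_points)]

tube = [ring((0.0, float(y), 0.0), 5.0) for y in range(10)]
shifted = [[(x + 10.0, y - 3.0, z + 7.0) for x, y, z in sec] for sec in tube]
thinner = [ring((0.0, float(y), 0.0), 4.0) for y in range(10)]

d_tube = radial_distances(tube)
d_shifted = radial_distances(shifted)            # same values as d_tube
d_thinner = radial_distances(thinner)            # uniformly smaller
```

Translating the whole structure leaves every radial distance unchanged, whereas thinning it reduces them. Removing sections from either end of the tube, which mimics anterior-posterior shortening, would not change the distances in the remaining sections.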
Although an FDR correction for the number of elements in each p-map is performed, no correction for the number of clinical markers is done. Ideally, we should lower our threshold according to Bonferroni principles. However, none of the permutation tests were close to the 0.05 level, so this is not necessary. In a recent study (regarding brain regions other than the hippocampus), we performed a 5-fold cross-validation to assess the predictive power of morphometric measures, for assessing conversion to AD from MCI over a 6 month period, using truly independent subsamples (Zhang, et al. 2008).
Several prior studies used 3D surface-based maps to visualize the profile of hippocampal atrophy in AD and MCI, but they relied almost solely on manual tracing, which is extremely time-intensive. Becker et al. (Becker, et al. 2006) found greater hippocampal atrophy in AD versus MCI (in a total of N = 66 subjects) with greatest differences in the CA1 and subiculum regions. Frisoni et al. (Frisoni, et al. 2006) (N = 68), found greater atrophy in AD versus controls, with the main differences in the CA1 field and parts of the subiculum, with CA2 and CA3 regions relatively spared. Finally, Apostolova et al. (Apostolova, et al. 2006a) (N = 65) found MCI v. AD differences in CA1 for both hippocampi, and in CA2 and CA3 only on the right.
Another approach to automatically relate Alzheimer’s disease diagnosis and clinical scores with systematic differences in brain structure on MRI is voxel-based morphometry (VBM). VBM has shown promise in several previous studies: Chetelat et al. (Chetelat, et al. 2005) tracked gray matter loss in a longitudinal study of 18 MCI patients; Whitwell et al. (Whitwell, et al. 2007) also showed gray matter loss over three years in 63 MCI subjects; and Good et al. (Good, et al. 2002) compared VBM to ROI analysis and showed that the two approaches compare favorably in detecting structural differences in Alzheimer’s disease.
In our maps, the right hippocampal head shows greater atrophy in MCI versus normal groups; the right hippocampal tail shows atrophy only in the AD-MCI and AD-control comparisons. The finding of right anterior hippocampal atrophy in MCI is consistent with findings by de Toledo-Morrell et al. (De Toledo-Morrell, et al. 2000), who observed that the right entorhinal cortex had greatest atrophy in elderly patients. Some investigators have argued that both the EC and hippocampal formation degenerate before the onset of overt dementia, and that EC volume is a better predictor of conversion (Dickerson, et al. 2007).
Correlations with symptoms have also been examined using hippocampal maps. Ballmaier et al. (Ballmaier, et al. 2007) mapped atrophy of the hippocampal head in depressed versus non-depressed elderly controls. Statistical mapping results, confirmed by permutation testing, showed that regional surface contractions were greater in elderly subjects with late-versus early-onset depression in the anterior aspects of the subiculum, and lateral-posterior aspects of the CA1 subfield in the left hemisphere. Similar studies of patients with Lewy body dementia showed a more restricted pattern of hippocampal atrophy than AD patients at a comparable level of cognitive impairment (Sabattoli, et al. 2008). Hippocampal maps may therefore complement cortical maps in revealing the selective atrophic patterns that characterize different types of dementia.
We also aimed to identify regions of hippocampal degeneration that predicted either a change in diagnostic classification or a decline in standard clinical measures of functional decline. Although we correlated atrophy with several measures of subsequent decline, only the 1-year decline in sum-of-boxes CDR scores was close to being linked with baseline HP atrophy (p = 0.056; a post hoc test gave p = 0.036, corrected, but used a less stringent re-thresholding of the data to define suprathreshold statistics). This difficulty in predicting future decline may be due to the shortness of the follow-up interval; other studies also reported difficulties in predicting cognitive decline based on hippocampal volumes alone, especially when concomitant subcortical brain injuries, such as lacunar infarcts were present (Mungas, et al. 2002). By contrast, Apostolova et al. (Apostolova, et al. 2006b) recently used the same surface mapping approach (but based on manual HP segmentation) and predicted subsequent clinical decline in 20 MCI subjects. Over a 2-year follow-up period, 6 patients developed AD, 7 remained stable, and 7 improved. Smaller hippocampi and specifically CA1 and subicular involvement were associated with future conversion from MCI to AD, while MCI patients who improved and no longer met MCI criteria at follow-up tended to have larger hippocampal volumes and their subiculum and CA1 regions were relatively preserved.
In our study, right hippocampal atrophy was associated with depression severity, consistent with several prior studies of elderly depression. In elderly subjects with late-versus early-onset depression, Ballmaier et al. (Ballmaier, et al. 2007) found greater HP atrophy in the anterior aspects of the subiculum, and lateral-posterior aspects of the CA1 subfield in the left hemisphere. In that study, hippocampal surface contractions correlated with memory measures in late-onset depressed patients. Our maps and those of Ballmaier et al. (Ballmaier, et al. 2007) are consistent with most earlier studies (Bell-McGinty, et al. 2002; Hickie, et al. 2005; Lloyd, et al. 2004; O’Brien, et al. 2004; Steffens, et al. 2000), showing smaller hippocampal volumes in elderly depressed patients compared to controls. We detected correlations for only the right HP in this study, consistent with several other reports that have shown differences as being more pronounced for the right than for the left hemisphere (Bell-McGinty, et al. 2002; Steffens, et al. 2000).
Next, we examined other hypothesized correlations between hippocampal morphology and educational level, blood pressure, homocysteine levels, and depression severity. Each of these factors has been associated with AD, but they have not yet been directly linked with hippocampal morphology. Long-term high blood pressure has been associated with earlier AD onset (Skoog, et al. 1996), elevated homocysteine levels have been linked with stroke, and to some extent with AD (Morris 2003), a lower level of education has been shown to be associated with dementia (Stern, et al. 1994), and depression commonly accompanies AD (Ballmaier, et al. 2007; Ballmaier, et al. 2004). We showed that a strong linkage is absent in all of these cases except for depression, a link that has been supported by other studies (Sheline, et al. 1996). Neither ApoE4 nor ApoE2 was linked with morphology, either in those without AD or in the full patient sample, suggesting that ApoE genotype is not a powerful modulator of hippocampal morphology, despite the role of ApoE4 as a risk gene for AD, which was confirmed to be over-represented in our MCI and AD groups versus the normal controls. The lack of detectable hippocampal differences between ApoE4 groups is also consistent with other cross-sectional MRI studies that employed manual tracing of the hippocampus (Jak, et al. 2007).
Finally, we computed empirically-based estimates of the minimal sample sizes necessary to detect the well-known correlations of hippocampal atrophy with diagnosis, and with clinical test scores. When conventional volumetric measures were used, Jack et al. (Jack, et al. 2003) estimated that in each arm of a therapeutic trial, only 21 subjects would be required to detect a 50% reduction in the rate of decline if hippocampal volume were used as the outcome measure. This compared with 241 subjects if MMSE scores were used and 320 if the AD Assessment Scale Cognitive Subscale (ADAS-Cog) were used.
Here we found that the more subtle MCI state was difficult to distinguish from either AD or normal aging with fewer than 200-300 subjects overall, but that the other associations were detectable in much smaller samples: 40 subjects sufficed to distinguish normal subjects from AD and to correlate atrophy with global and sum-of-boxes CDR scores (24 was not sufficient), and 104 subjects sufficed to correlate atrophy with MMSE scores. This finding was unexpected; prior studies, based on manual tracings, required far fewer subjects to differentiate MCI from AD and from controls (Apostolova, et al. 2006a). The manual segmentations previously employed may have produced more accurate hippocampal models, but they were time-consuming to create (often taking several weeks or months, making large-scale analyses difficult or prohibitive). In our current study, our automated approach greatly reduced our segmentation time for a large sample (requiring less than one minute of CPU time on a desktop computer). Whether or not this approach would be optimal for very small studies remains to be proven, as the benefit of automation may be offset by the need to trace a small training set of around 20 images, which may become limiting in very small samples. Another factor that may have led to our higher sample size requirements for group differentiation is that ADNI collects scans at many acquisition sites. Even so, major efforts have been devoted to protocol design and calibration across scanners and sites.
In future, we plan to map the progression of hippocampal atrophy over time, once the follow-up (longitudinal) scans are available for the full ADNI sample. Morphometric correlates of disease progression may be easier to examine with longitudinal imaging data. A further avenue of work will apply this method to help distinguish neurodegenerative patterns between AD and other types of dementia. In our recent studies of Lewy body dementia (Sabattoli, et al. 2007) and vascular dementia (Scher, et al. 2007a), we found a more restricted pattern of hippocampal atrophy than in AD patients at a comparable level of cognitive impairment.
Data used in preparing this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative database (www.loni.ucla.edu/ADNI). As such, many ADNI investigators contributed to the design and implementation of ADNI or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators is available at www.loni.ucla.edu/ADNI/Collaboration/ADNI_Citation.shtml. This work was primarily funded by the ADNI (Principal Investigator: Michael Weiner; NIH grant number U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the Foundation for the National Institutes of Health, through generous contributions from the following companies and organizations: Pfizer Inc., Wyeth Research, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, Merck & Co. Inc., AstraZeneca AB, Novartis Pharmaceuticals Corporation, the Alzheimer’s Association, Eisai Global Clinical Development, Elan Corporation plc, Forest Laboratories, and the Institute for the Study of Aging (ISOA), with participation from the U.S. Food and Drug Administration. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. Algorithm development for this study was also funded by the NIA, NIBIB, the National Library of Medicine, and the National Center for Research Resources (AG016570, EB01651, LM05639, RR019771 to PT). Author contributions were as follows: JM, ZT, LA, AG, CA, SM, NP, XH, AT, and PT performed the image analyses; CJ, NS, and MW contributed substantially to the image acquisition, study design, quality control, calibration and pre-processing, databasing and image analysis. We thank the members of the ADNI Imaging Core for their contributions to the image pre-processing and the ADNI project.