As one of the earliest structures to degenerate in Alzheimer’s disease (AD), the hippocampus is the target of many studies of factors that influence rates of brain degeneration in the elderly. In one of the largest brain mapping studies to date, we mapped the 3D profile of hippocampal degeneration over time in 490 subjects scanned twice with brain MRI over a 1-year interval (980 scans). We examined baseline and 1-year follow-up scans of 97 AD subjects (49 males/48 females), 148 healthy control subjects (75 males/73 females), and 245 subjects with mild cognitive impairment (MCI; 160 males/85 females). We used our previously validated automated segmentation method, based on AdaBoost, to create 3D hippocampal surface models in all 980 scans. Hippocampal volume loss rates increased with worsening diagnosis (normal=0.66%/year; MCI=3.12%/year; AD=5.59%/year), and correlated with both baseline and interval changes in Mini-Mental State Examination (MMSE) scores and global and sum-of-boxes Clinical Dementia Rating scale (CDR) scores. Surface-based statistical maps visualized a selective profile of ongoing atrophy in all three diagnostic groups. Healthy controls carrying the ApoE4 gene atrophied faster than non-carriers, while more educated controls atrophied more slowly; converters from MCI to AD showed faster atrophy than non-converters. Hippocampal loss rates can be rapidly mapped, and they track cognitive decline closely enough to be used as surrogate markers of Alzheimer’s disease in drug trials. They also reveal genetically greater atrophy in cognitively intact subjects.
There is great interest in developing powerful methods to track the progression of Alzheimer’s disease (AD) in the living brain. Clinically, patients develop worsening symptoms over a period of several months to years, with mild memory impairments initially, progressing to debilitating decline in all cognitive domains. The medial temporal lobe structures, including the hippocampus and entorhinal cortex, are among the first to degenerate, and atrophy spreads in a characteristic sequence mirroring the spread of neurofibrillary tangles, the cellular hallmarks of AD pathology (Apostolova et al., 2006a,b, submitted for publication). Atrophy typically spreads from the medial temporal and limbic areas, to parietal association areas, and finally to frontal and primary cortices only late in AD (Braak and Braak, 1991; Thompson and Apostolova, 2007; Thompson et al., 2004).
The hippocampus is the target of many neuroimaging studies of AD (Barnes et al., in press), as it is one of the earliest regions to degenerate structurally. Many studies have used sequential MRI scanning to estimate rates of hippocampal volume loss, either by manually tracing the hippocampus in every scan, or with automated segmentation methods (Khan et al., 2008; Morra et al., 2008b,c, submitted for publication), or by using some combination of manual landmark identification and automated extraction (Chupin et al., 2007; Hogan et al., 2000; Mueller et al., 2007). Several computational anatomy studies have also modeled the hippocampus as a geometrical surface in 3D, plotting statistics at each surface location regarding the rate or significance of atrophy, differences in atrophic rates between groups, or factors that resist or promote degeneration (Bansal et al., 2007; Csernansky et al., 2004; Thompson et al., 2004). Such surface-based approaches can visualize 3D profiles of atrophy in AD populations. As the hippocampus is highly differentiated functionally, selective hippocampal subregions can be identified, such as the CA1 and CA2 subfields, where atrophy occurs first and correlates with early cognitive decline (Csernansky et al., 1998; Thompson et al., 2004; Wang et al., 2003). Surface-based modeling studies have even identified local atrophic patterns that predict future decline (Apostolova et al., 2006b), and have charted the dynamic trajectory of atrophy as it spreads in the hippocampus (Apostolova et al., 2006b, in press, submitted for publication).
A key goal of all AD morphometry methods is to track longitudinal brain changes with high accuracy, providing numerical measures of disease burden that correlate with, or predict, cognitive decline. Many morphometric methods employ maps to track the disease, including tensor-based morphometry (Ashburner, 2007; Hua et al., 2008a,b; Leow et al., 2006; Studholme et al., 2004), cortical thickness mapping (Dickerson et al., in press; Fischl and Dale, 2000; Lerch et al., 2008; Salat et al., 2004; Thompson et al., 2004), ventricular mapping (Carmichael et al., 2007; Chou et al., 2008), voxel-based morphometry (Alexander et al., 2006; Ashburner and Friston, 2000; Baron et al., 2001; Chetelat et al., 2005; Karas et al., 2008; Teipel et al., 2005; Whitwell and Jack, 2005), and large deformation metric mapping (Miller et al., in press). Head-to-head comparisons and meta-analyses of these methods are an objective of several initiatives (Jack et al., 2008a). Optimal MRI-based measures for clinical trials are urgently required, and each method reflects different aspects of the disease process (Barnes et al., in press).
MRI can resolve the hippocampus at millimeter resolution. Even so, extracting the hippocampus from a whole-brain MRI scan is time-consuming, and usually requires the work of an expert. This is a rate-limiting step when one is attempting to associate clinical measures with rates of hippocampal atrophy in hundreds of subjects longitudinally. Some automated hippocampal segmentation methods have already been described (Barnes et al., 2004; Crum et al., 2001; Fischl et al., 2002; Hogan et al., 2000; Powell et al., 2008; Wang et al., 2007; Yushkevich et al., 2006), but it is fair to say that none is yet in wide use.
A promising approach for automated hippocampal segmentation is pattern recognition (Duda et al., 2001). In a prior paper, we developed a pattern recognition approach for hippocampal segmentation on MRI, which was shown to produce accurate, high-quality automated segmentations in 400 subjects from the Alzheimer’s Disease Neuroimaging Initiative (Morra et al., 2008a,b,c). The approach has a high throughput and is efficient to apply, and can be distinguished conceptually from other automated labeling approaches as it uses a powerful variant of AdaBoost, a ‘machine learning’ method that learns features to guide segmentation (Schapire et al., 1998). Our AdaBoost-based methods classify voxels as belonging to the hippocampus based on thousands of features, which are learned from a training set of manually delineated data (typically 20 hand-traced scans; see Morra et al. (2008c) for details). More specifically, we used a cascaded version of the AdaBoost algorithm (Freund and Schapire, 1997) wrapped inside a different algorithm that we call the auto context model (ACM), which updates the statistical prior distributions during the segmentation. Using that approach, our recent cross-sectional study found that 40 scans were sufficient to distinguish AD patients from controls (20 per diagnosis), using hippocampal maps, and we created several spatially detailed maps of associations between diagnosis, genotype, and depression, on baseline hippocampal structure (Morra et al., submitted for publication).
Here we used the same automated segmentation approach to analyze structural changes in the hippocampus over time, using follow-up scans of 490 subjects acquired 1 year apart. These scans were not available at the time of our first study (Morra et al., in press), but now allow us to assess factors that influence rates of degeneration as it progresses over time.
In this paper, we first automatically segment the left and right hippocampus from 490 individuals at both baseline and 1-year follow-up using our automated approach. We then use a statistical mapping approach to create parametric surface maps (Bansal et al., 2007; Csernansky et al., 1998; Styner et al., 2000; Thompson et al., 2004; Wang et al., 2007) that link the 3D profile of atrophy with various covariates of interest.
We had 3 main goals: (1) to map the 3D profile of tissue loss rates (as a percentage per year) in AD, healthy comparison subjects, and in subjects with mild cognitive impairment (MCI), an at-risk group of subjects with five-fold increased risk of conversion to AD in any given year (Petersen, 2000); (2) to plot regions where progressive atrophy was statistically associated with clinical decline on standardized tests, such as the mini-mental state exam (MMSE) (Folstein et al., 1975), and clinical dementia rating (CDR) (Morris, 1993), and (3) to determine whether loss rates in our subjects were associated with factors such as blood pressure measurements, depression scores, educational level, and the ApoE genotype (2, 3, or 4). As it requires more processing to create 3D maps than analyze hippocampal volumes, we also sought to identify what added value maps might provide over simple volumetric measures. For that reason, we present results based on both volumetric analysis and statistical mapping.
The Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Mueller et al., 2005a,b) is a large multi-site longitudinal MRI and FDG-PET (fluorodeoxyglucose positron emission tomography) study of 800 adults, ages 55 to 90, including 200 elderly controls, 400 subjects with mild cognitive impairment, and 200 patients with AD. The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public–private partnership. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, M.D., VA Medical Center and University of California – San Francisco.
The maps in this paper are based on baseline and 1-year follow-up scans of 97 AD subjects (49 males/48 females), 245 MCI subjects (160 males/85 females), and 148 healthy control subjects (75 males/73 females). Tables 1 and 2 show the clinical scores and demographic measures for our sample. Based on the available data in the ADNI database at the time of writing (September 2008), we used the largest possible sample that we could assemble who had both baseline and 1-year follow-up scans. To avoid possible issues with combining data across field strengths, we analyzed 1.5 T scan data only. Only a quarter of the subjects enrolled in ADNI are randomized to 3 T scans, so we will compare 3 T and 1.5 T in the future when sufficiently many 3 T follow-up scans become available. We also note that 6-month follow-up scans are available in ADNI, but here we focused on the 1-year follow-ups as we knew from prior work (Thompson et al., 2004) that they would allow sufficient time for deterioration in both cognition and structure for structure–function correlations to be readily established.
All subjects underwent thorough clinical/cognitive assessment at the time of baseline scan acquisition, and again at follow-up 1 year later. As part of each subject’s cognitive evaluation, MMSE was administered to provide a global measure of cognitive status (Cockrell and Folstein, 1988; Folstein et al., 1975) where scores of 24 or less (out of a maximum of 30) are generally consistent with dementia. Two versions of the CDR were also used as a measure of dementia severity (Hughes et al., 1982; Morris, 1993). The global CDR represents the overall level of dementia; global CDR scores of 0, 0.5, 1, 2, and 3 indicate no dementia, questionable, mild, moderate, or severe dementia, respectively. The “sum-of-boxes” CDR score is the sum of 6 scores assessing different areas: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care. The sum of these scores ranges from 0 (no dementia) to 18 (very severe dementia). Memory impairment was assessed via education-adjusted scores on the Wechsler Memory Scale-Logical Memory II (WMS-R LM II; Wechsler, 1987). Subjects with MCI had objective memory loss measured by education-adjusted scores on the WMS-R LM II. AD diagnosis was made according to the National Institute of Neurologic and Communicative Disorders and Stroke and the AD and Related Disorders Association (NINCDS-ADRDA) criteria (McKhann et al., 1984). The normal subjects in our sample had MMSE scores between 25 and 30, a global CDR of 0, a sum-of-boxes CDR between 0 and 0.5 and failed to meet criteria for MCI or AD. The MCI subjects had MMSE scores ranging from 23 to 30, a global CDR of 0.5, a sum-of-boxes CDR score between 0.5 and 5, and mild memory complaints. The AD subjects in our sample had MMSE scores between 20 and 26, a global CDR between 0.5 and 1, and a sum-of-boxes CDR between 1.5 and 9.0. As such, these subjects would be considered as having mild AD.
Detailed exclusion criteria, e.g., regarding concurrent use of psychoactive medications, may be found in the ADNI protocol (page 29, http://www.adni-info.org/images/stories/Documentation/adni_protocol_03.02.2005_ss.pdf). Briefly, subjects were excluded if they had any serious neurological disease other than incipient AD, any history of brain lesions or head trauma, or psychoactive medication use (including antidepressants, neuroleptics, chronic anxiolytics or sedative hypnotics, etc.).
The study was conducted according to Good Clinical Practice, the Declaration of Helsinki and U.S. 21 CFR Part 50-Protection of Human Subjects, and Part 56-Institutional Review Boards. Written informed consent for the study was obtained from all participants before protocol-specific procedures, including cognitive testing, were performed.
As previously stated, when using a pattern recognition algorithm, a set of ground truth data must be supplied in order to learn the patterns. In medical image segmentation, we typically use a small set of brains, manually segmented by an expert, that represent a variety of hippocampi with features that are likely to be encountered when segmenting other brains. In prior studies of the asymptotic behavior of the segmentation accuracy versus the size of the training sample, we found that a training sample of ~20 subjects’ scans was sufficient to give good segmentation accuracy, if scans were delineated by a trained rater following a standardized protocol with established inter-rater reliability as set out in Morra et al. (submitted for publication). Since we are performing a longitudinal study of patients of various ages, different sexes, and stages of disease, our training set should encompass as much variation as possible. Therefore, we trained our segmentation algorithm on 21 subjects using scans at both time points, baseline and 1 year (i.e., a total of 42 scans). The subjects were 7 AD, 7 MCI, and 7 healthy controls with age ranges and sex ratios that matched those of all other ADNI subjects. Using inter-rater reliability tests, we also found, in our validation studies (Morra et al., 2008c), that the effect of using different trained raters for training the algorithm is relatively small; intriguingly, the algorithm agrees with human raters (even those who were not involved in training it) as well as two independent human raters agree with each other, which is a reasonable benchmark for the target performance accuracy of a machine labeling a structure. For the “testing” subjects (i.e., scans not used for training but used as the basis for the results), we had 97 AD subjects (49 men/48 women), 245 MCI subjects (160 men/85 women) and 148 control subjects (75 men/73 women).
Table 1 gives an overview of baseline statistics (including cognitive scores, genotype, and other demographics) and Table 2 gives an overview of the follow-up information 1 year later.
The MRI acquisition and pre-processing steps were exactly as in Morra et al. (2008b), but they are summarized here for completeness. All subjects were scanned with a standardized MRI protocol, developed after a major effort evaluating and comparing 3D T1-weighted sequences for morphometric analyses (Jack et al., 2008a; Leow et al., 2006).
High-resolution structural brain MRI scans were acquired at multiple ADNI sites using 1.5 T MRI scanners manufactured by General Electric Healthcare, Siemens Medical Solutions, and Philips Medical Systems. ADNI also collects data at 3.0 T from a subset of subjects but, as noted earlier, only 1.5 T images were used, to avoid having to model field strength effects in this initial study. All scans were collected according to the standard ADNI MRI protocol (http://www.loni.ucla.edu/ADNI/Research/Cores/index.shtml). For each subject, two T1-weighted MRI scans were collected using a sagittal 3D MP-RAGE sequence. Typical 1.5 T acquisition parameters are a repetition time (TR) of 2400 ms, minimum full echo time (TE), an inversion time (TI) of 1000 ms, a flip angle of 8°, a 24 cm field of view, and an acquisition matrix of 192×192×166 in the x-, y-, and z-dimensions, yielding a voxel size of 1.25×1.25×1.2 mm³ (Jack et al., 2008a). In-plane, zero-filled reconstruction (i.e., sinc interpolation) yielded a 256×256 matrix for a reconstructed voxel size of 0.9375×0.9375×1.2 mm³. The ADNI MRI quality control center at the Mayo Clinic (in Rochester, MN, USA) selected the MP-RAGE image with higher quality based on standardized criteria (Jack et al., 2008a). Additional phantom-based geometric corrections were applied to ensure spatial calibration was kept within a specific tolerance level for each scanner involved in the ADNI study (Gunter et al., 2006).
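The in-plane voxel sizes quoted above follow directly from dividing the field of view by the matrix size. The short check below is only an illustration of that arithmetic; the function name is hypothetical, and it assumes the 24 cm field of view applies to both in-plane axes.

```python
def voxel_size_mm(fov_mm: float, matrix: int) -> float:
    """In-plane voxel edge length: field of view divided by matrix size."""
    return fov_mm / matrix

FOV_MM = 240.0  # 24 cm field of view, as quoted in the text

acquired = voxel_size_mm(FOV_MM, 192)       # acquisition matrix
reconstructed = voxel_size_mm(FOV_MM, 256)  # after zero-filled reconstruction
print(acquired, reconstructed)
```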
Additional image corrections were also applied, using a processing pipeline at the Mayo Clinic, consisting of: (1) a procedure termed GradWarp for correction of geometric distortion due to gradient non-linearity (Jovicich et al., 2006), (2) a “B1-correction”, to adjust for image intensity non-uniformity using B1 calibration scans (Jack et al., 2008a), (3) “N3” bias field correction, for reducing intensity inhomogeneity (Sled et al., 1998), and (4) geometrical scaling, according to a phantom scan acquired for each subject (Jack et al., 2008a), to adjust for scanner- and session-specific calibration errors. In addition to the original uncorrected image files, images with all of these corrections already applied (GradWarp, B1, phantom scaling, and N3) are available to the general scientific community, as described at http://www.loni.ucla.edu/ADNI. Ongoing studies are examining the influence of N3 parameter settings on measures obtained from ADNI scans (Boyes et al., 2008).
To adjust for global differences in brain positioning and scale across individuals, all scans were linearly registered to the stereotactic space defined by the International Consortium for Brain Mapping (ICBM-53) (Mazziotta et al., 2001) with a 9-parameter (9P) transformation (3 translations, 3 rotations, 3 scales) using the Minctracc algorithm (Collins et al., 1994). Globally aligned images were resampled in an isotropic space of 220 voxels along each axis (x, y, and z) with a final voxel size of 1 mm³. Both baseline and follow-up scans were independently registered to the same ICBM template.
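To make the 9-parameter transformation concrete, the sketch below composes 3 translations, 3 rotations, and 3 scales into a single 4×4 homogeneous matrix. It is an illustration only: the composition order (scale, then rotations about x, y, z, then translation) and all function names are assumptions, and Minctracc's internal parameterization may differ.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def nine_param_matrix(translation, rotation_rad, scale):
    """Build T * Rz * Ry * Rx * S (assumed order) from the 9 parameters."""
    tx, ty, tz = translation
    ax, ay, az = rotation_rad
    sx, sy, sz = scale
    cx, sn_x = math.cos(ax), math.sin(ax)
    cy, sn_y = math.cos(ay), math.sin(ay)
    cz, sn_z = math.cos(az), math.sin(az)
    S = [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]
    Rx = [[1, 0, 0, 0], [0, cx, -sn_x, 0], [0, sn_x, cx, 0], [0, 0, 0, 1]]
    Ry = [[cy, 0, sn_y, 0], [0, 1, 0, 0], [-sn_y, 0, cy, 0], [0, 0, 0, 1]]
    Rz = [[cz, -sn_z, 0, 0], [sn_z, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    T = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
    return matmul(T, matmul(Rz, matmul(Ry, matmul(Rx, S))))

def transform_point(m, point):
    """Apply a 4x4 homogeneous matrix to a 3D point."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Example: scale x by 2, no rotation, shift x by 1 mm.
m = nine_param_matrix((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
moved = transform_point(m, (1.0, 1.0, 1.0))
```

Because the three scale factors are independent, such a transform can adjust for overall brain size along each axis without introducing shear.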
Throughout this paper we use an automated segmentation approach derived from AdaBoost to segment the hippocampus from a whole brain MRI. For a complete description of the algorithm including statistical motivations, please see our previous work (Morra et al., 2008c). We used the exact same feature database, along with the same method, a cascade of AdaBoosts wrapped inside of the auto context model (ACM), and the same model parameters. Fig. 1 gives an overview of AdaBoost, while Fig. 2 gives an overview of ACM.
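For readers unfamiliar with boosting, the following is a minimal sketch of plain AdaBoost with threshold "stumps" as weak learners. It is not the cascaded AdaBoost or the auto context model used in this paper, and it does not use our feature database; the function names and the toy one-feature data are hypothetical, chosen only to show how weighted weak classifiers are combined into a voxel-style binary decision.

```python
import math

def train_adaboost(X, y, n_rounds=5):
    """X: list of feature vectors; y: labels in {-1, +1}.
    Returns a list of (alpha, feature_index, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    model = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[f] > thr else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        err, f, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x[f] > thr else -pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1

# Toy data: one intensity-like feature; +1 = "hippocampus", -1 = "background".
X = [[0.1], [0.2], [0.8], [0.9]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y)
predictions = [predict(model, x) for x in X]
```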
After segmentation was completed, we created various maps to link different disease-related factors with the rate of hippocampal atrophy over time. This is accomplished by first creating 3D parametric surface maps of each individual’s hippocampus; a medial curve is then threaded down the center of the hippocampus. For each surface model, a medial curve was derived from the line traced out by the centroid of the boundary for each hippocampal surface model (Thompson et al., 2004). The local radial size was defined as the radial distance between each boundary point and its associated medial curve. This radial measure may be interpreted as a local measure of thickness, and has been shown to be sensitive to the volumetric atrophy at different locations along the anterior to posterior axis. Both this method (Thompson et al., 2004) and others (Liu et al., 2008; Shi et al., 2007; Terriberry et al., 2007; Vaillant and Glaunes, 2005; Wang et al., 2007) are currently under development to link radial atrophy with covariates of interest.
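The radial distance measure can be sketched as follows. This is a simplified illustration that assumes the surface is stored as an ordered list of cross-sectional rings of boundary points, with the medial curve taken as the centroid of each ring; the data layout and function names are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a set of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def radial_distances(rings):
    """For each ring of boundary points, return its centroid (one point on
    the medial curve) and the distance from every boundary point to it."""
    medial_curve = [centroid(ring) for ring in rings]
    radii = [[math.dist(p, c) for p in ring]
             for ring, c in zip(rings, medial_curve)]
    return medial_curve, radii

# Toy example: a single square cross-section of half-width 2 mm.
ring = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (-2.0, 0.0, 0.0), (0.0, -2.0, 0.0)]
curve, radii = radial_distances([ring])
```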
The current method has been proven capable of detecting disease-related differences in AD (Apostolova and Thompson, 2007; Thompson et al., 2004), MCI (Apostolova et al., 2007; Becker et al., 2006; Morra et al., 2008c), pre-MCI (Apostolova et al., in press), frontotemporal dementia (Frisoni et al., 2006), and in other neurobiological or neuropsychiatric disorders such as epilepsy, depression, bipolar illness, psychopathy, and autism (Ballmaier et al., 2008; Bearden et al., 2008; Boccardi et al., submitted for publication; Lin et al., 2005; Nicolson et al., 2006; Ogren et al., submitted for publication).
To create spatially detailed maps of atrophic rates, the parametric meshes imposed on each subject’s hippocampal anatomy were compared between baseline and follow-up scans, and at each surface point, a measure of the estimated change in radial distance over time was plotted as a percentage of the baseline value. As in prior work (Thompson et al., 2004), group average maps of hippocampal loss rates were computed by using the parametric grids to associate corresponding anatomy across subjects, pooling data on atrophic rates, by averaging the percentage loss rates across subjects (other definitions of average loss are possible, but this is a reasonable operational definition). These maps were color-coded and visualized separately for each hippocampus and for each group, prior to making statistical comparisons, as detailed next.
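A minimal sketch of this computation is given below. It assumes that each subject's radial distances are stored as a list indexed by corresponding surface points, and that the scan interval is expressed in years; the function names and toy values are illustrative only.

```python
def percent_change_per_year(baseline, followup, interval_years=1.0):
    """Pointwise change in radial distance, as a percentage of baseline per year."""
    return [100.0 * (f - b) / b / interval_years
            for b, f in zip(baseline, followup)]

def group_average(maps):
    """Average the percentage-change maps across subjects, point by point."""
    n = len(maps)
    return [sum(m[i] for m in maps) / n for i in range(len(maps[0]))]

# Two toy subjects, two surface points each (radial distances in mm).
subject_a = percent_change_per_year([10.0, 20.0], [9.0, 19.0])
subject_b = percent_change_per_year([10.0, 20.0], [10.0, 18.0])
average_map = group_average([subject_a, subject_b])
```

Negative values denote local contraction; averaging percentages, rather than absolute distances, keeps large and small hippocampi on a comparable scale.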
Rates of surface contractions over time were statistically compared between groups at equivalent locations using Student’s t-tests (1-tailed), and were correlated with different clinical characteristics including diagnosis, ApoE genotype, cognitive scores, and declines in these scores over the 1-year interval. Because multiple covariates were assessed, we divided them into two categories – those for which prior hypotheses and some prior evidence exist regarding the association (e.g., change in MMSE should link with change in volume), and others that were more exploratory (e.g., blood pressure may be associated with rates of volume loss); for the exploratory correlations, a Bonferroni correction for multiple comparisons was enforced, to avoid inflating the probability of false positive findings.
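The Bonferroni rule applied to the exploratory covariates amounts to dividing the nominal significance level by the number of tests. The sketch below illustrates this; the p-values are invented placeholders, not results from this study, and the overall level of 0.05 is an assumption.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant only if p <= alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}

# Placeholder p-values for five hypothetical exploratory covariates.
example = {"cov_a": 0.004, "cov_b": 0.03, "cov_c": 0.2, "cov_d": 0.011, "cov_e": 0.5}
flags = bonferroni_significant(example)
```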
In each statistical map, a p-value was assigned to each surface point that associates the rate of change (in percent per year) in the radial distance at that point to the covariate of interest. For visualization, the associated p-values describing the uncorrected significance of these statistics were plotted onto geometric average hippocampi created from the surface models of all subjects involved in each statistical test.
Overall p-values, for the spatial pattern of effects observed in each map, were computed using a permutation testing approach. Permutation methods measure the distribution of features in statistical maps that would be observed by accident if the subjects were randomly assigned to groups (Thompson et al., 2003), and they provide a p-value for the observed effects that is corrected for multiple comparisons (Nichols and Holmes, 2002; Thompson et al., 2003). All our permutation tests are based on the total area of the hippocampus with suprathreshold statistics, after setting the pointwise threshold at p<0.01: the tests determine how likely it is that the observed proportion of suprathreshold statistics within each p-map would occur by chance (Thompson et al., 2003, 2004). The number of permutations N was chosen to be 100,000, to control the standard error SEp of the omnibus probability p, which follows a binomial distribution B(N, p) with known standard error (Edgington and Onghena, 2007). When N=8000, the approximate margin of error (95% confidence interval) for p is around 5% of p; to improve upon this, we ran 100,000 permutations, with 0.01 chosen as the significance level. We prefer to use the overall extent of the suprathreshold region, as we know that atrophy is relatively broadly distributed over the hippocampus, and a set-level inference is more appropriate for detecting diffuse effects with moderate effect sizes at many voxels, rather than focal effects with very high effect sizes (which would generally be better detected using a test for peak height in a statistical map).
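The logic of the permutation test can be sketched as follows. This is a simplified illustration, not our implementation: it thresholds the pooled-variance t statistic directly (a fixed cutoff of 2.33 stands in for the pointwise p<0.01 threshold), counts suprathreshold points rather than measuring surface area, and uses far fewer permutations than the 100,000 reported here. All names are hypothetical.

```python
import random
from statistics import mean, variance

def t_stat(a, b):
    """Pooled-variance two-sample t statistic for mean(a) > mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (pooled * (1.0 / na + 1.0 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def suprathreshold_count(group_a, group_b, t_threshold):
    """Number of surface points whose t statistic exceeds the threshold."""
    count = 0
    for i in range(len(group_a[0])):
        a = [subject[i] for subject in group_a]
        b = [subject[i] for subject in group_b]
        if t_stat(a, b) > t_threshold:
            count += 1
    return count

def permutation_p(group_a, group_b, t_threshold=2.33, n_perm=500, seed=0):
    """Fraction of random relabelings whose suprathreshold count is at
    least as large as the count observed with the true labels."""
    rng = random.Random(seed)
    observed = suprathreshold_count(group_a, group_b, t_threshold)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if suprathreshold_count(pooled[:n_a], pooled[n_a:], t_threshold) >= observed:
            hits += 1
    return observed, hits / n_perm

# Toy data: 10 subjects per group, 2 surface points, group A clearly higher.
group_a = [[5.0 + 0.1 * k, 5.0 + 0.2 * k] for k in range(10)]
group_b = [[0.1 * k, 0.2 * k] for k in range(10)]
observed, p_overall = permutation_p(group_a, group_b)
```

Because whole subjects are shuffled between groups, the spatial correlation between neighboring surface points is preserved in every permutation, which is what allows a single corrected p-value to be assigned to the map.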
When reporting permutation test results, one-sided hypothesis testing was used, i.e., we only considered statistics in which the AD or MCI group showed more rapid atrophy than the controls, in line with prior findings. Likewise, the correlations are reported as one-sided hypotheses, i.e., statistics are shown in the map where the correlations are in the expected direction, e.g., greater atrophy associated with lower MMSE scores, and with higher CDR scores.
Throughout this paper we report whether or not a linkage is detected between the local rate of hippocampal atrophy and different covariates of interest, including diagnosis (normal, MCI, AD), MMSE, global CDR, sum-of-boxes CDR, changes (over 1 year) in these scores, the ApoE genotype, depression severity assessed using the Geriatric Depression Scale (GDS) (Yesavage et al., 1982), systolic and diastolic blood pressure, plasma homocysteine levels, and educational level.
First we performed a volumetric analysis of overall hippocampal volumes. We aimed to confirm that the volumetric loss rate for each group was (1) progressively greater, in the order: control<MCI<AD, and (2) in the expected numerical range based on a recent MRI-based meta-analysis of hippocampal loss rates (Barnes et al., in press).
Fig. 3 shows that, in line with many prior longitudinal studies of AD versus controls (Jack et al., 1998, 2004) and versus MCI subjects (Jack et al., 2000, 2004), hippocampal tissue loss is detected in all 3 diagnostic groups. As noted in Fig. 3, the mean hippocampal loss rate increases with worsening diagnosis (AD=5.59%/year [95% confidence interval, CI: +/−1.44%]; MCI=3.12%/year [95% CI: +/−0.79%]; healthy controls=0.66%/year [95% CI: +/−0.96%]).
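Group means and confidence intervals of this kind can be computed from per-subject annualized loss rates as sketched below. The sketch assumes a normal approximation for the 95% interval (mean ± 1.96 standard errors); the function names and toy values are illustrative and are not data from this study.

```python
from statistics import mean, stdev

def annual_loss_percent(baseline_volume, followup_volume, interval_years=1.0):
    """Volume loss as a percentage of baseline volume, per year."""
    return 100.0 * (baseline_volume - followup_volume) / baseline_volume / interval_years

def mean_with_ci95(rates):
    """Mean loss rate and half-width of a normal-approximation 95% CI."""
    half_width = 1.96 * stdev(rates) / len(rates) ** 0.5
    return mean(rates), half_width

# Toy loss rates (%/year) for five hypothetical subjects.
toy_rates = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_mean, toy_half_width = mean_with_ci95(toy_rates)
toy_loss = annual_loss_percent(2000.0, 1900.0)
```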
When comparing baseline volume to follow-up volume, all groups were losing tissue with statistical significance (p<0.01) except for the normals on the right side, where no significant change was detected. This is not entirely surprising as many prior studies report low or undetectable rates of loss in cognitively intact, healthy controls. Our AD rate is marginally higher, and our control rate somewhat lower, than those reported in the meta-analysis by Barnes et al. (in press), who included nine studies from seven centers, with data from a total of 595 AD and 212 matched controls. In Barnes et al. (in press), mean (95% CIs) annualized hippocampal atrophy rates were found to be 4.66% (95% CI 3.92, 5.40) for AD subjects and 1.41% (0.52, 2.30) for controls. The difference between AD and control subjects in this rate was 3.33% (1.73, 4.94). These slight differences may depend on the anatomical definition of the hippocampus, or on the severity of dementia in the AD group. Also, the left/right asymmetry in the loss rates proved to be statistically significant (p<0.05) in the normal and MCI groups, at least in this sample. Although a left/right difference in rates has been found before, what is surprising is that the asymmetry is reversed in MCI compared to controls (and compared to AD for that matter). This reversal was not hypothesized and requires independent confirmation; if true, it may suggest that the right hemisphere lags behind the left in its rate of atrophy initially, but then accelerates in MCI to exceed that on the left.
Next, we examined the correlations between annual hippocampal volume loss and annual change in MMSE, global CDR and sum-of-boxes CDR across the full sample and separately within each diagnostic group (Tables 3 and 4). All correlations were highly significant, and in the expected directions. Their absolute values were in the range |r|=0.10–0.23 for all measures. In the full sample with all diagnostic groups combined (N=490), there is a significant correlation between every clinical score and the rate of change in hippocampal volume. This is not surprising, however, as there is a group difference in atrophic rate and the clinical data strongly correlate with group. It is more useful to know whether these correlations were maintained when the sample is broken down by diagnosis; Table 4 shows that correlations are generally maintained in the non-AD groups. This suggests that progressive hippocampal degeneration is more strongly correlated with clinical scores in individuals who have not yet lost a large proportion of their hippocampus (non-AD subjects). Also, the power to detect these correlations depends on the sample size, and the MCI sample (N=245) was more than twice as large as the AD sample (N=97).
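To illustrate why correlations of modest magnitude can still be statistically significant in a sample of this size, the sketch below computes a Pearson correlation and converts a given r to its t statistic with N−2 degrees of freedom. The function names and toy data are hypothetical; no p-values from this study are reproduced.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Toy check: perfectly linear data give r = 1.
r_toy = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# With N = 490, even |r| = 0.10 yields a t statistic above 2.
t_small_r = r_to_t(0.10, 490)
```

The same r computed in a subgroup of fewer subjects gives a smaller t statistic, which is one reason correlations may be harder to detect in the smaller diagnostic groups.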
An active area of AD research focuses on whether the ApoE4 genotype predicts future mental decline, whether the ApoE2 genotype is protective against dementia, and whether the characteristic morphometric signatures of AD (such as profiles of loss rates) differ by genotype. Answering these questions has practical value, because a drug may be more effective in patients with a certain genotype, or it may be easier to detect a drug effect in genetic subgroups for which the annual rate of decline is greater. For all experiments involving ApoE, individuals with the 2/4 genotype (one protective and one high-risk allele) were excluded because it is not clear how to group these subjects. First, Table 5 shows that there is a highly significant correlation between the ApoE4 genotype and the rate of decrease in hippocampal volume, when all individuals are pooled together. This is somewhat expected, as individuals with ApoE4 were more likely to have AD (Table 1). As such, this correlation is not as interesting, as it simply reflects the over-representation of E4-carriers in the AD group. What is more interesting is that this same correlation (on the left side; Table 5, top row) was still detected in the healthy normal group. This suggests that even cognitively intact individuals carrying ApoE4 have faster hippocampal loss. In groups split by diagnosis, E4-carriers lost hippocampal tissue faster than non-carriers, except in the case of the right hippocampus in MCI (Table 5).
It has also been hypothesized that ApoE2 is a protective factor against the development of AD; in prior studies with a related morphometric technique, called tensor-based morphometry, we found that ApoE2 gene carriers – 1/6 of the normal group – showed reduced ventricular expansion, suggesting a protective effect (Hua et al., 2008b). Here we found no significant effects of ApoE2, perhaps due to the low number of individuals with the ApoE2 genotype (Table 6). The association between ApoE2 and left hippocampal loss rates in normals was just outside of trend level (p=0.109), but may be detectable in an even larger sample (Fig. 4).
In addition to severity of disease measures and ApoE genotype, we investigated associations between other clinical factors and hippocampal loss rates. We chose to examine conversion from MCI to AD (treated as a binary variable over the 1-year follow-up interval), the Geriatric Depression Scale (GDS) (Yesavage et al., 1982), homocysteine levels, baseline systolic and diastolic blood pressure, and years of education. As multiple measures were tested, these correlations should be regarded as exploratory and were subjected to a Bonferroni-type correction for multiple comparisons. Table 7 shows the correlation between these covariates and atrophic rates in all subjects tested; Table 8 shows them broken down by diagnosis. First, those MCI subjects who converted to AD showed a faster rate of reduction in hippocampal volume on the right side. This effect, only significant on the right side, is consistent with an independent study (Apostolova et al., in press), in which we found that the right CA1 radial distance is predictive (increases the hazards) of conversion to AD in MCI. None of the other correlations were significant in the whole sample; some correlations were detected in the subgroups broken down by diagnosis. Although failure to detect a correlation does not mean that there is no correlation in the overall population, the fact that correlations were not detectable in a sample of N=490 subjects scanned twice suggests that the effects of these covariates on hippocampal loss rates are likely to be relatively small, or that a longer follow-up interval (more than 1 year) or a more accurate measurement method may be required to detect their influence on hippocampal degeneration.
Linking volumetric measurements to clinical covariates is interesting, but fails to give a picture of how the hippocampus is degenerating, and which specific regional changes link with worsening cognition. Therefore in this section we present a complementary analysis to the volumetric measures using 3D statistical maps (p-maps). First, percent change maps are presented, broken down by diagnosis, to show how the hippocampus is changing. Next, we have investigated, in 3D, the correlations between disease severity and other clinical covariates and atrophic rates. The global significance levels for all p-maps, adjusted for multiple comparisons, were established by permutation testing (Table 9).
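The permutation-based correction mentioned above can be illustrated with a short sketch. This is not the pipeline used in this study: it assumes a simple two-group comparison, a per-point Welch t statistic, and a summary statistic equal to the fraction of surface points whose |t| exceeds a fixed threshold. The function names, the threshold of 2.6, the number of permutations, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def welch_t(a, b):
    """Per-point Welch t statistic (rows = subjects, columns = surface points)."""
    va = a.var(axis=0, ddof=1) / a.shape[0]
    vb = b.var(axis=0, ddof=1) / b.shape[0]
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va + vb)

def global_permutation_p(a, b, t_thresh=2.6, n_perm=1000, seed=0):
    """Global p-value for a whole map: how often a random relabeling of
    subjects yields at least as large a suprathreshold fraction as the
    true group labels. The threshold is an assumed, illustrative value."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([a, b])
    n_a = a.shape[0]
    observed = np.mean(np.abs(welch_t(a, b)) > t_thresh)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        t_perm = welch_t(pooled[idx[:n_a]], pooled[idx[n_a:]])
        if np.mean(np.abs(t_perm) > t_thresh) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)

# Synthetic data (not from this study): 40 vs. 40 subjects, 200 surface points,
# with a true group difference confined to the first 50 points.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=(40, 200))
group_b = rng.normal(0.0, 1.0, size=(40, 200))
group_b[:, :50] -= 1.0

p_global = global_permutation_p(group_a, group_b)
print(f"global permutation p-value: {p_global:.4f}")
```

Because a single number summarizes the whole map, the resulting p-value is already corrected for the many spatially correlated tests performed across the surface.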
The first statistical maps compare baseline and follow-up scans. Fig. 5 shows that, in all three diagnostic groups, baseline hippocampi differed significantly from follow-up, as expected from the volumetric results of Fig. 3.
First, Fig. 6 shows differences in atrophic rates between the two most diagnostically dissimilar groups (AD and normals); AD patients lost hippocampal tissue more rapidly than normals, in line with our volumetric results (Fig. 3). No other comparisons proved significant by permutation testing, although there was a trend on the left for faster loss rates in AD vs. MCI groups (p=0.07; Table 9), suggesting an acceleration in the loss rate between these stages. Consistent with this, AD and MCI groups showed a greater difference in hippocampal volume on the left side than the right side (Fig. 3).
Next, Fig. 7 shows the statistically significant correlations between atrophic rates and various clinical covariates. The four clinical tests (MMSE, global CDR, sum-of-boxes CDR, and change in sum-of-boxes CDR) showed correlations with numerically greater effect sizes on the left side than the right, suggesting either that the left hippocampus is changing more as the disease progresses, or that such changes have a greater impact on cognition. When educational level was used as the covariate, years of education were linked with atrophic rates for the right hippocampus only. The correlation between education and hippocampal loss rates was only detected in the p-maps, and not in the overall volumetric analysis. This underscores that volumetric comparisons and p-maps may reveal different linkages, although the finding requires confirmation in independent samples.
Our last set of p-maps, in Fig. 8, show hippocampal regions where atrophic rates differed according to the presence of the ApoE4 gene. Only when all subjects are pooled was the map significant (and only on the left side). However, as noted before, this is not an especially meaningful association as ApoE4 carriers are present in higher proportions among those with AD than among controls, so that over-representation in the disease group alone would explain the gene effect on rates of atrophy. What is more interesting is whether ApoE4 carriers show faster atrophic rates if we split up the groups diagnostically – and none of the split maps proved significant. Nevertheless the same area showed genotype effects at the voxel level in each diagnostic split (see the top of the left hippocampal head of both the control group and MCI group). Because this same area appears to be correlated with hippocampal decline in all diagnostic groups, it may serve as a region of interest for independent confirmation in future studies.
We also attempted to show that ApoE2 was a protective factor, but no maps proved significant, perhaps due to the small number of ApoE2 individuals in this study.
To better understand whether the relatively faster atrophic rates in the healthy ApoE4 risk gene carriers arose simply because the ApoE4 carriers in the group were more cognitively impaired than the non-carriers, we ran a logistic regression to determine what percentage of the variation in baseline MMSE scores was explainable based on the subjects’ genotype. Intriguingly, in the healthy controls, genotype did not explain a significant proportion of variance in baseline MMSE scores (r=0.049; p=0.58; N=126), nor was there any association between genotype and baseline MMSE scores in the MCI group (r=0.08; p=0.23; N=228) or in the AD group (r=0.015; p=0.88; N=92).
This means that the ApoE4-carrying controls had significantly faster atrophic rates than non-carriers even without any detectable cognitive differences on the MMSE test at the time of the baseline scan. This underscores the potential relevance of genetic testing in identifying those with genetically accelerated but latent neurodegeneration, especially at a time when no sign of accelerated decline can be detected on a global cognitive test such as the MMSE.
Here we presented the largest hippocampal mapping study to date (using 980 scans from 490 subjects). There were four main findings. First, our results with an automated analysis technique agreed well with prior studies in smaller samples, performed using manual or other relatively labor intensive methods. We found that the mean hippocampal volume loss rates increased with worsening diagnosis (AD: 5.59%/year; MCI: 3.12%/year; normal: 0.66%/year). Our estimated hippocampal loss rates in AD are a little high compared to some other studies (Du et al., 2003; Jack et al., 2000, 1998), but are nonetheless close to the average rates reported in a recent meta-analysis. Barnes et al. (in press) included nine studies from seven centers, pooling data from a total of 595 AD and 212 matched controls. They found that mean (95% CIs) annualized hippocampal atrophy rates were 4.66% (95% CI 3.92, 5.40) for AD subjects and 1.41% (0.52, 2.30) for controls. Factors that might affect these rates include differences in anatomic delineation criteria — for example, some protocols include the hippocampal tail and others do not. Also, if hippocampal loss rates truly accelerate as the disease progresses, it is plausible that any study of AD patients who are more severely affected may find a greater mean rate of atrophy. Even so, this is not a compelling argument in our case, as our AD patients are relatively mildly impaired and yet show atrophic rates slightly higher than the average rate in the Barnes et al. meta-analysis. Second, all of our change measures were correlated with MMSE, global and sum-of-boxes CDR, and with changes in these measures. All such correlations were in the anticipated directions. The automated measures satisfied a further criterion expected of a reliable measure of disease progression — reliable correlation with cognitive decline. 
Third, we found that ApoE4 carriers had faster rates of volume loss, both in the full sample of 490, and when only cognitively intact controls were examined. This finding may have a practical benefit for treatment trials, because samples pre-selected to over-represent E4 carriers may show greater interval changes on MRI and may therefore be better powered to detect a statistical reduction in loss rates (Jack et al., 2008b). In addition, one might argue that those carrying the risk gene are better candidates for early intervention as an accelerated disease process is already detectable on MRI, at least at the group level. Even so, the case for using ApoE4 as an enrichment criterion in addition to using information on baseline cognitive deficits also depends on whether the knowledge of a person’s genotype provides added information for predicting atrophic rates relative to what can be predicted from cognition alone; this requires further study. Fourth, of the 245 MCI subjects examined, those who converted to AD during the 1-year evaluation period had greater loss rates than non-converters.
We presented several analyses using both hippocampal volumes and 3D significance maps, to determine whether maps provide more information than simple volumetric summaries. For the main effects, both maps and volumes gave similar results, for example regarding the changes over time in all 3 diagnostic groups, and the correlations with MMSE, CDR scores and interval changes in the scores. The map data also showed similarly localized regions of ongoing hippocampal loss in controls, MCI and AD groups (Fig. 5).
The cases where maps and volumes gave different answers were typically when detecting effects of subtle factors that were associated with loss rates with borderline significance. In the full sample, a higher educational level was associated with slower right hippocampal loss rates (p=0.0051), an effect detected in the maps but not in the volumetric analysis. This significance level is borderline when corrected for multiple comparisons (0.0051×10=0.051), so it requires independent replication.
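The Bonferroni-type adjustment quoted above is simple arithmetic, sketched here for clarity. The multiplier of 10 and the uncorrected p-value are taken from the text; the function name and the conventional 0.05 threshold are assumptions made for illustration.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

adjusted = bonferroni(0.0051, 10)
print(f"adjusted p = {adjusted:.3f}")
print("below 0.05" if adjusted < 0.05 else "not below 0.05")
```

The adjusted value falls just above a 0.05 threshold, which is why the education effect is described as borderline.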
ApoE4 gene carriers showed faster loss rates in the volumetric analysis, even in controls, but these effects did not reach the corrected significance threshold in the maps (after correction for multiple comparisons by permutation testing). Even so, local effects of genotype were consistently found in the same regions in controls, MCI and AD groups, providing a possible search region for future studies of gene effects.
A major surprise was that the maps comparing atrophy rates between diagnostic groups did not differentiate MCI from controls. The volumetric analyses showed that atrophy rates increased in the order CTL, MCI, AD, and all groups showed progressive loss (except for normals on the right side). While both the volumetric and the 3D analyses showed that AD subjects have significantly greater atrophy rates vs. controls, only the volumetric data showed a difference between MCI and controls. While maps tend to outperform volumetric summaries when detecting effects that are relatively concentrated or localized, as the most affected subregions may show higher effect sizes than an overall volumetric measure, maps may not outperform overall numerical summary measures when the effects are relatively diffuse. Numeric summaries such as hippocampal volume often involve an implicit averaging that may counteract any highly localized boundary segmentation errors, which can deplete the power of maps in the same regions. In addition, the inability to distinguish MCI from controls based on the maps alone may reflect the clinical heterogeneity of the MCI group, as a combination of various conditions in a sample likely yields a regionally diffuse mean atrophy pattern for the group as a whole. This highlights the need for both volumetric measures and map-based statistical analysis.
To obtain the most powerful statistics from map-based statistical analysis, it may be beneficial in the future to use these thresholded maps as search regions of interest for factors that influence the rates of atrophy. By focusing on the regions that are changing the most, it may be possible to statistically define surface subregions of interest in training samples to develop more powerful or precise measures of hippocampal atrophy in future independent samples. In this sense, this is the same logic as using AdaBoost to label the hippocampus, by combining classifiers with the greatest capacity for error reduction.
Some comparison is warranted between this approach and other methods for hippocampal mapping. Other hippocampal mapping studies by our group have contrasted the atrophy patterns in Lewy Body dementia, vascular dementia (Scher et al., in press), amnestic MCI (Becker et al., 2006), fronto-temporal dementia (Frisoni et al., 2006), and in those at genetic risk for AD (Boccardi et al., 2004) revealing morphometric signatures characteristic of each condition. These prior reports used the same surface parameterization methods and radial distance measures as were used in this report (statistics of radial atrophy), but the studies relied on manual tracing rather than automated segmentation, which greatly accelerates the rate at which scans can be analyzed.
Our work is related to that of Styner, Gerig, and Yushkevich on M-reps (medial representations) (Styner et al., 2003, 2005; Yushkevich et al., 2005), in which a medial curve is defined through a structure, and distances of boundary points to the centerline are examined. The radial atrophy mapping technique is robust to small rotational or translational errors in registering the images across time, as the radial distances are always measured with respect to a central line threading down the center of the structure. Other methods, such as voxel-based morphometry, for example, may incorrectly pick up global shifts of the hippocampus as compressions or expansions inside the hippocampus, as the nonlinear deformations that register structures are spatially regularized (smooth) and may not be precise enough to register the hippocampal boundaries exactly. As radial atrophy is measured with respect to the medial curve, the distance measure reflects the thickness of the hippocampus in a given section, and it is not exactly the same as a volume difference. For example, any anterior-to-posterior shortening of the hippocampus would be detected by a volume measure, but the radial distance measure is not sensitive to this type of change — it only measures the radial thickness of the structure relative to a centerline. This may in general be considered a benefit rather than a limitation, as there are some variations across normal subjects in the anterior–posterior extent of the hippocampus, and these variations will be discounted by the radial mapping approach and will not be a source of confounding variance. Even so, discounting information on the anterior–posterior extent of the hippocampus may not be beneficial when measuring within-subject rates of change, unless the benefit of the information is outweighed by variations in segmentation accuracy across time that tend to cause errors in reproducibly defining the posterior limit of the hippocampus.
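The difference between a radial distance measure and a volume measure can be made concrete with a toy example. The sketch below uses an idealized straight tube rather than a real hippocampal surface, and approximates the medial curve by the centroid of each cross-section; it is not the surface parameterization used in this study, and all function names and dimensions are hypothetical.

```python
import numpy as np

def tube_surface(radius, length, n_sections=50, n_angles=36):
    """Surface points of an idealized tube whose long axis runs along y
    (standing in for the anterior-posterior axis)."""
    y = np.linspace(0.0, length, n_sections)
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    yy, tt = np.meshgrid(y, theta, indexing="ij")
    return np.stack([radius * np.cos(tt), yy, radius * np.sin(tt)], axis=-1)

def radial_distances(surface):
    """Distance from each surface point to the centroid of its own
    cross-section; the line of centroids plays the role of the medial curve."""
    medial_curve = surface.mean(axis=1, keepdims=True)
    return np.linalg.norm(surface - medial_curve, axis=-1)

def tube_volume(radius, length):
    """Volume of the idealized tube."""
    return np.pi * radius**2 * length

shapes = {
    "baseline": (5.0, 40.0),
    "shortened": (5.0, 36.0),  # 10% shorter, same thickness
    "thinned": (4.5, 40.0),    # same length, 10% thinner
}
mean_radius = {name: radial_distances(tube_surface(r, l)).mean()
               for name, (r, l) in shapes.items()}
volume = {name: tube_volume(r, l) for name, (r, l) in shapes.items()}

for name in shapes:
    print(f"{name:9s} mean radial distance = {mean_radius[name]:.2f}, "
          f"volume = {volume[name]:.0f}")
```

In this toy setting, shortening the tube lowers the volume but leaves the mean radial distance unchanged, whereas thinning changes both, which mirrors the distinction drawn in the paragraph above.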
A related approach using large-deformation diffeomorphic metric mapping (LDDMM) (Joshi and Miller, 2000) has been used to deform labeled anatomical templates of the hippocampus onto new images, using a combination of manual landmarking of points on the hippocampus and 3D fluid image registration (Csernansky et al., 2000; Haller et al., 1996; Wang et al., 2007). The surface of the hippocampus was parcellated a priori on a neuroanatomical template into three zones approximating the locations of underlying subfields, and LDDMM was used to generate the hippocampal surfaces of all subjects and to register the surface zones across subjects. In a cross-sectional study, Wang et al. (2006) found that inward deformities of the hippocampal surface in proximity to the CA1 subfield and subiculum may be used to distinguish subjects with questionable AD from nondemented subjects. In a longitudinal study similar to ours, Wang et al. (2003) used LDDMM to analyze hippocampal change over time in 18 subjects with questionable AD (CDR 0.5) and 26 age-matched nondemented controls (CDR 0) scanned 2 years apart. In CDR 0 subjects, they observed shape changes between baseline and follow-up largely confined to the head of the hippocampus and subiculum, while in the CDR 0.5 subjects, shape changes involved the lateral body of the hippocampus as well as the head region and subiculum. In a subanalysis of 9 subjects from the same sample, who converted from the nondemented (CDR 0) to the questionable AD (CDR 0.5) state, Qiu et al. (2008) found that, compared to the non-converters, the converters showed inward surface deformation in the lateral aspect of the left hippocampal tail. With a similar method, Csernansky et al. (2005) found that inward deformation of the left hippocampal surface in a zone corresponding to the CA1 subfield is an early predictor of the onset of dementia of the Alzheimer type (DAT) in nondemented elderly subjects.
This is consistent with our finding of more rapid hippocampal loss rates in MCI converters than non-converters. Surface-based maps may be used in the future to define hippocampal subregions where changes predict imminent cognitive decline or the onset of dementia. Using the radial distance method we have likewise demonstrated that CA1 and subicular atrophy associates with future conversion from MCI to AD (Apostolova et al., 2006b) and from normal cognition to MCI (Apostolova et al., in press).
Related shape modeling studies have involved modeling the hippocampal surface using spherical harmonic functions (SPHARM) (Styner et al., 2004; Thompson and Toga, 1996), and using the coefficients of the harmonic expansion to infer shape differences between dementia patients and controls. In Gutman et al. (2008), we used a support vector machine classifier based on spherical harmonics to classify 49 AD patients and 63 controls with 75.5% sensitivity and 87.3% specificity, with 82.1% correct overall. This approaches the 89–96% classification accuracy of the best diagnostic predictors (Vemuri et al., 2008). Given the proliferation of new MRI-based measures of hippocampal degeneration based on automated surface matching methods (Wang et al., 2005), automated partitions of surfaces (Shi et al., 2007), random field theory on surfaces (Bansal et al., 2007) and machine learning methods (Li et al., 2007), future studies will likely focus on defining which MR-based measures provide optimal statistical power for detecting factors that slow the progression of AD, to optimize the power of future interventional trials.
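As a consistency check on the classification figures quoted above, overall accuracy is the mean of sensitivity and specificity weighted by group size. The sketch below uses only the numbers given in the text (49 AD patients, 63 controls, 75.5% sensitivity, 87.3% specificity); the function name is hypothetical.

```python
def overall_accuracy(sensitivity, specificity, n_patients, n_controls):
    """Fraction of all subjects classified correctly, given per-group rates."""
    true_positives = sensitivity * n_patients
    true_negatives = specificity * n_controls
    return (true_positives + true_negatives) / (n_patients + n_controls)

accuracy = overall_accuracy(0.755, 0.873, n_patients=49, n_controls=63)
print(f"{100 * accuracy:.1f}% correct overall")
```

The reported sensitivity and specificity correspond to roughly 37 of 49 patients and 55 of 63 controls classified correctly, which reproduces the 82.1% overall figure.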
Data used in preparing this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative database (www.loni.ucla.edu/ADNI). Many ADNI investigators therefore contributed to the design and implementation of ADNI or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators is available at www.loni.ucla.edu/ADNI/Collaboration/ADNI_Citation.shtml. This work was primarily funded by the ADNI (Principal Investigator: Michael Weiner; NIH grant number U01 AG024904). ADNI is funded by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the Foundation for the National Institutes of Health, through generous contributions from the following companies and organizations: Pfizer Inc., Wyeth Research, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, Merck & Co. Inc., AstraZeneca AB, Novartis Pharmaceuticals Corporation, the Alzheimer’s Association, Eisai Global Clinical Development, Elan Corporation plc, Forest Laboratories, and the Institute for the Study of Aging (ISOA), with participation from the U.S. Food and Drug Administration. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. Algorithm development for this study was also funded by the NIA, NIBIB, the National Library of Medicine, and the National Center for Research Resources (AG016570, EB01651, LM05639, RR019771 to PT). Author contributions were as follows: JM, ZT, LA, AG, CA, SM, NP, AT, and PT performed the image analyses; CJ, NS, and MW contributed substantially to the image acquisition, study design, quality control, calibration and pre-processing, databasing and image analysis. We thank the members of the ADNI Imaging Core for their contributions to the image pre-processing and the ADNI project.
Conflict of interest
The authors declare that there are no conflicts of interest.