We used a previously validated automated machine learning algorithm based on adaptive boosting to segment the hippocampi in baseline and 12-month follow-up 3D T1-weighted brain MRIs of 150 cognitively normal elderly (NC), 245 mild cognitive impairment (MCI) and 97 DAT ADNI subjects. Using the radial distance mapping technique, we examined the hippocampal correlates of delayed recall performance on three well-established verbal memory tests – ADAScog delayed recall (ADAScog-DR), the Rey Auditory Verbal Learning Test-DR (AVLT-DR) and Wechsler Logical Memory II-DR (LM II-DR). We observed no significant correlations between delayed recall performance and hippocampal radial distance on any of the three verbal memory measures in NC. All three measures were associated with hippocampal volumes and radial distance in the full sample and in the MCI group at baseline and at follow-up. In DAT we observed stronger left-sided associations between hippocampal radial distance, LM II-DR and ADAScog-DR both at baseline and at follow-up. The strongest linkage between memory performance and hippocampal atrophy in the MCI sample was observed with the most challenging verbal memory test, the AVLT-DR, whereas in the DAT sample the least challenging test, the ADAScog-DR, showed the strongest associations with hippocampal structure. After controlling for baseline hippocampal atrophy, memory performance showed regionally specific associations with hippocampal radial distance predominantly in CA1 but also in a subicular distribution.
Alzheimer’s disease (AD) is the most common cause of dementia among the elderly. At the time when dementia of Alzheimer’s type (DAT) can be clinically diagnosed with current criteria, AD pathology has already spread and irreversibly destroyed the brain parenchyma. Thus recent major efforts in AD research have concentrated on the search for disease-associated biomarkers that can reliably identify patients in prodromal DAT (e.g., MCI and pre-MCI) stages and support the expedited evaluation of novel disease-modifying therapies.
MCI is an intermediate state between normal aging and dementia. Amnestic MCI patients suffer from memory impairment while still enjoying functional lifestyles (Petersen, 2007). Most amnestic MCI patients have the pathological hallmarks of AD - neocortical senile plaques, neurofibrillary tangles, atrophy and neuronal loss in layer II of the entorhinal cortex (Jicha et al., 2006a; Price and Morris, 1999), but have not yet progressed sufficiently to meet criteria for DAT. MCI, and more recently pre-MCI, i.e. cognitively normal elderly who progress and develop MCI and DAT in the future (Apostolova et al., 2008), have become an intense scientific focus.
Historically, DAT clinical trials have relied on cognitive and functional outcome measures alone. In recent years, there has been increased interest in developing laboratory and imaging disease biomarkers in addition to cognitive and functional endpoints or even as substitutes for them, i.e., as surrogate markers (Cummings et al., 2007; Thal et al., 2006). Biomarkers are currently the only feasible approach to quantifying disease-associated changes in the pre-symptomatic AD (pre-DAT) stages (Cummings et al., 2007; Dubois et al., 2007).
One major international AD scientific effort, the Alzheimer’s Disease Neuroimaging Initiative (ADNI), was established to collect and evaluate putative clinical, imaging and laboratory AD biomarkers. The ADNI (Principal Investigator: Michael W. Weiner, M.D., VA Medical Center and University of California – San Francisco) is a large multi-site longitudinal MRI and fluorodeoxyglucose positron emission tomography (FDG-PET) study of 200 elderly controls, 400 subjects with amnestic MCI, and 200 patients with DAT (Mueller et al., 2005) (also see http://www.loni.ucla.edu/ADNI and ADNI-info.org).
Hippocampal atrophy remains the best-studied structural AD imaging biomarker to date. Hippocampal atrophy progresses steadily throughout the course of AD (Jack et al., 2000; Jack et al., 1998; Jack et al., 1997) and shows strong correlations with Braak and Braak pathological staging (Bobinski et al., 1997; Bobinski et al., 1995; Schonheit et al., 2004) and verbal memory performance (Apostolova et al., 2006b; de Toledo-Morrell et al., 2000). Hippocampal dysfunction manifests with memory loss (de Toledo-Morrell et al., 2000; Fleischman et al., 2005; Mortimer et al., 2004).
The present study uses ADNI baseline and one-year follow-up cognitive and imaging data from 490 subjects to examine the relationships between three memory tests and hippocampal atrophy. The cognitive portion of the Alzheimer’s Disease Assessment Scale (ADAScog) (Welsh et al., 1994) is one of the most commonly used cognitive instruments in clinical trials in DAT (Doody et al., 2008; Mulnard et al., 2000; Rogers et al., 1998; Wilcock et al., 2000) and MCI (Petersen et al., 2005; Salloway et al., 2004). Its verbal memory portion is adopted from the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) battery (Welsh et al., 1994) and a delayed recall test has been added. The Rey Auditory Verbal Learning Test (AVLT) and the Wechsler Memory Scale - Logical Memory II test (LM II) are commonly used by neuropsychologists to assess verbal memory and have been extensively validated for use in cognitively normal and demented subjects (Rey, 1964; Wechsler, 1987). LM II also has been utilized as a screening tool in MCI clinical trials (Petersen et al., 2005; Salloway et al., 2004).
The ADNI is a 5-year longitudinal study of 800 adults, ages 55–90, including 400 amnestic MCI, 200 DAT and 200 NC subjects. The current analyses used all subjects with available baseline and 1-year follow-up cognitive and imaging data as of September 2008. The sample consisted of 490 subjects, of whom 97 were diagnosed with DAT, 245 were diagnosed with MCI and 148 were cognitively normal elderly (NC). Diagnosis of DAT was based on the National Institute of Neurological and Communicative Disorders and Stroke and the AD and Related Disorders Association (NINCDS-ADRDA) criteria (McKhann et al., 1984). DAT subjects had Mini-Mental State Examination (Folstein et al., 1975) (MMSE) scores of 20–26 and a Clinical Dementia Rating scale (Morris, 1993) (CDR) score of 0.5–1 at baseline; they may be considered mild DAT patients. All MCI subjects had memory complaints but did not meet criteria for dementia. They scored 24–30 on the MMSE, had a global CDR score of 0.5 and a memory score of 0.5 or greater on the CDR. In addition, they exhibited objective memory impairment on LM II. NC subjects did not meet criteria for MCI or DAT. Their MMSE scores were 24–30 and their global CDR was 0. Subjects were excluded if they refused or were unable to undergo magnetic resonance imaging (MRI). Also excluded were those with other neurological disorders, those with active depression or a history of psychiatric diagnosis (including major depression or alcohol or substance dependence) within the past 2 years, and those who had fewer than 6 years of education or who were not fluent in English or Spanish. The full list of inclusion/exclusion criteria may be accessed on pages 23–29 of the online ADNI protocol (see http://www.adni-info.org/images/stories/Documentation/adni_protocol_03.02.2005_ss.pdf).
We used baseline and follow-up delayed verbal recall scores from three previously validated cognitive tests – the ADAScog delayed recall (ADAScog-DR), AVLT 30 minute delayed recall (AVLT-DR) and LM II delayed recall (LM II-DR). These data are freely distributed to interested researchers (see http://www.loni.ucla.edu/ADNI and ADNI-info.org). All three measures were used as continuous variables in our analyses. In addition, we also included global cognitive scores from the MMSE and CDR sum of boxes (CDR-SOB) for comparison in some of the analyses. The 3D hippocampal maps showing these correlations in the full sample have been previously published elsewhere (Morra et al., 2008b; Morra et al., 2008c).
All subjects were scanned with a standardized high-resolution MRI protocol (http://www.loni.ucla.edu/ADNI/Research/Cores/index.shtml) (Jack et al., 2008; Leow et al., 2006). Images were obtained on scanners developed by one of three manufacturers (General Electric Healthcare, Siemens Medical Solutions, and Philips Medical Systems). ADNI also collects data at 3.0 T from a subset of subjects, but to avoid having to model field strength effects in this study, only 1.5 T images were used. At each visit, two T1-weighted MRI scans were collected using a sagittal 3D MP-RAGE sequence for each subject. The TE/TR/TI (echo, repetition, and inversion time) parameters were optimized for best contrast to noise in a feasible acquisition time. The raw data had an acquisition matrix of 192×192×166 and voxel size 1.25×1.25×1.2 mm3 in the x-, y-, and z- dimensions (Jack et al., 2008). An in-plane, zero-filled reconstruction (i.e., sinc interpolation) resulted in a 256×256 matrix and a reconstructed voxel size of 0.9375×0.9375×1.2 mm3 in the x-, y-, and z- dimensions. The image with higher quality (of two that were obtained identically for each subject) was selected by the ADNI MRI quality control center at the Mayo Clinic (in Rochester, MN, USA)(Jack et al., 2008). Phantom-based geometric corrections were applied to ensure that spatial calibration was kept within a specific tolerance level for each scanner involved in the ADNI study (Gunter et al., 2006). Additional image corrections included GradWarp correction for geometric distortion due to gradient non-linearity (Jovicich et al., 2006), a “B1-correction” for image intensity non-uniformity (Jack et al., 2008) and an “N3” bias field correction, for reducing intensity inhomogeneity (Sled et al., 1998). The B1-correction (Jack et al., 2008) is different from the N3 bias field correction as it adjusts for image intensity inhomogeneity due to the B1 magnetic field non-uniformity using calibration scans. 
B1 calibration scans are collected to correct the image intensity non-uniformity that results when RF transmission is performed with a more uniform body coil but MRI signal reception is performed with a less uniform head coil. By contrast, the “N3” bias field correction, for reducing intensity inhomogeneity (Sled et al., 1998), is an image post-processing routine that is not dependent on calibration scans derived from the scanner. It essentially adjusts the spatial profile of image intensities using a multiplicative spline function, to make the histogram as sharp as possible. It also aims to adjust for the central bright artifact that can occur due to the dielectric effect. Both the uncorrected and corrected image files are freely available to interested researchers at http://www.loni.ucla.edu/ADNI.
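As a quick check of the zero-filled reconstruction described above, the following minimal Python sketch recomputes the reconstructed in-plane voxel size from the acquisition matrix and the acquired voxel size. The function name is hypothetical and the calculation assumes only that the field of view is unchanged by the reconstruction.

```python
def reconstructed_voxel_size(acq_matrix, acq_voxel_mm, recon_matrix):
    """In-plane voxel size (mm) after zero-filled reconstruction.

    Assumption: the field of view stays the same; only the sampling
    grid becomes finer (192 -> 256 samples in this protocol).
    """
    field_of_view_mm = acq_matrix * acq_voxel_mm
    return field_of_view_mm / recon_matrix


# Values quoted in the text: 192 samples of 1.25 mm, reconstructed to 256.
in_plane_mm = reconstructed_voxel_size(192, 1.25, 256)
print(in_plane_mm)  # 0.9375
```

The slice thickness (1.2 mm) is unaffected because the reconstruction is applied in-plane only.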
All brain scans were linearly registered to the International Consortium for Brain Mapping (ICBM-53) standard brain template (Mazziotta et al., 2001) with a 9-parameter (9P) transformation (3 translations, 3 rotations, 3 scales) using the Minctracc algorithm (Collins et al., 1994). Globally aligned images were resampled in an isotropic space of 220 voxels along each axis (x, y, and z), with a final voxel size of 1 mm3.
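To make the 9-parameter transformation concrete, the following minimal Python sketch composes three translations, three rotations and three scales into a single 4×4 matrix and applies it to a point. The function names and the composition order (scale first, then rotations about x, y and z, then translation) are assumptions made for illustration; they are not taken from Minctracc, whose internal parameterization may differ.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def nine_parameter_matrix(translation, rotation_deg, scale):
    """Build a 4x4 matrix from 3 translations, 3 rotations and 3 scales.

    Assumed order: scale, then rotate about x, y, z, then translate.
    """
    tx, ty, tz = translation
    rx, ry, rz = (math.radians(angle) for angle in rotation_deg)
    sx, sy, sz = scale
    T = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]
    Rx = [[1, 0, 0, 0],
          [0, math.cos(rx), -math.sin(rx), 0],
          [0, math.sin(rx), math.cos(rx), 0],
          [0, 0, 0, 1]]
    Ry = [[math.cos(ry), 0, math.sin(ry), 0],
          [0, 1, 0, 0],
          [-math.sin(ry), 0, math.cos(ry), 0],
          [0, 0, 0, 1]]
    Rz = [[math.cos(rz), -math.sin(rz), 0, 0],
          [math.sin(rz), math.cos(rz), 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 1]]
    S = [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]
    return matmul(T, matmul(Rz, matmul(Ry, matmul(Rx, S))))


def apply_transform(matrix, point):
    """Apply a 4x4 matrix to a 3D point using homogeneous coordinates."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[i][k] * v[k] for k in range(4)) for i in range(3))


# Hypothetical parameters: shift by 1 mm in x and stretch x by a factor of 2.
m = nine_parameter_matrix((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
print(apply_transform(m, (1.0, 1.0, 1.0)))
```

Unlike a 12-parameter affine transformation, this parameterization has no shear terms, so it can correct for overall differences in head size and position but not for skew.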
The hippocampi were segmented with our new automated machine-learning hippocampal segmentation approach (AdaBoost), which is based on a statistical method called adaptive boosting originally developed by Freund and Schapire (Freund and Schapire, 1997). The technique has been described in detail in several publications by Morra et al. (Morra et al., 2008a; Morra et al., 2008c; Morra et al., 2008d). AdaBoost uses a training set of image data to develop mathematical rules for classifying future data, i.e., for labeling each voxel in a new image as belonging to the hippocampus or not. The training set consists of a small number of representative images and their manual segmentations (in this case 21 subjects: 7 NC, 7 MCI and 7 DAT) delineated by an expert (A.E.G., inter-rater reliability: Cronbach’s Alpha = 0.97, intra-rater reliability: Cronbach’s Alpha = 0.98) using a well-established, detailed anatomical tracing protocol. Based on the specific feature information contained in the positive and negative voxels of the training dataset (i.e., those belonging and not belonging to the structure of interest), AdaBoost develops a set of rules and computes the optimal combination of features for accurate segmentation of unknown images. Thousands of local features are taken into account, such as image gradients, local curvature of image interfaces, tissue classification as gray or white matter, and statistical information on the likely stereotaxic position of the hippocampus. Using established numerical procedures from the fields of machine learning and computer vision (Morra et al., 2009), the training phase estimates the optimal weighting of these features in a mathematical formula that computes the probability that a voxel lies inside the hippocampus.
The algorithm’s performance has been validated in prior reports: when labeling new data previously unseen by the algorithm, it has been found to agree with human raters as well as human raters agree with each other (Morra et al., 2008a). Once a successful classification model has been created, the AdaBoost algorithm is applied to the full study cohort (in this case to all 490 baseline and follow-up scans).
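The general idea of adaptive boosting can be illustrated with a minimal Python sketch of textbook discrete AdaBoost using one-feature threshold rules ("decision stumps"). This is not the authors' implementation: the two features per voxel, the toy training data and all function names are hypothetical, whereas the published method draws on thousands of image features.

```python
import math


def stump_predict(sample, feature, threshold, polarity):
    """Weak rule: vote `polarity` if the feature is >= threshold, else the opposite."""
    return polarity if sample[feature] >= threshold else -polarity


def train_adaboost(samples, labels, rounds=3):
    """Discrete AdaBoost; labels are +1 (inside structure) or -1 (outside)."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(samples[0])):
            for thr in sorted({s[f] for s in samples}):
                for pol in (1, -1):
                    err = sum(w for s, y, w in zip(samples, labels, weights)
                              if stump_predict(s, f, thr, pol) != y)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Up-weight misclassified voxels, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(s, f, thr, pol))
                   for s, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble, sample):
    """Weighted vote of all weak rules."""
    score = sum(a * stump_predict(sample, f, thr, pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy "voxels": two made-up features each; +1 only when both features are low.
samples = [[0.2, 0.1], [0.3, 0.2], [0.25, 0.15], [0.8, 0.9],
           [0.7, 0.8], [0.9, 0.7], [0.2, 0.9], [0.8, 0.1]]
labels = [1, 1, 1, -1, -1, -1, -1, -1]
model = train_adaboost(samples, labels, rounds=3)
print([classify(model, s) for s in samples])
```

No single threshold rule separates these toy classes, but the weighted vote of three rules does, which is the sense in which boosting combines many weak features into one stronger classifier.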
After converting each hippocampal segmentation into a 3D parametric mesh model, we computed the medial core (a 3D medial curve threading down the center of each structure). The radial distance from each 3D hippocampal surface point to the medial core was computed (Apostolova et al., 2006a; Apostolova et al., 2006b). This provides a measure of the thickness of the structure at each surface point. The cognitive scores were then entered as covariates in a general linear model predicting the radial distance at each surface point of the mesh models. Associations between cognitive performance and hippocampal radial distance were sought in the full sample (N=490) and separately for the NC, MCI and DAT groups.
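The radial distance measure can be sketched in a few lines of Python. The sketch assumes a simplified mesh representation in which the surface is stored as a sequence of cross-sectional rings of points and the medial core is approximated by the centroid of each ring; these data structures and function names are illustrative assumptions, not the authors' code.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def medial_core(rings):
    """Approximate the medial curve as the centroid of each cross-sectional ring."""
    return [centroid(ring) for ring in rings]


def radial_distances(rings):
    """Distance from every surface point to the medial-core point of its ring."""
    result = []
    for ring, core in zip(rings, medial_core(rings)):
        result.append([math.sqrt(sum((p[i] - core[i]) ** 2 for i in range(3)))
                       for p in ring])
    return result


# Toy surface: two square rings of "radius" 1 and 2 around the z axis.
rings = [
    [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)],
    [(2, 0, 1), (0, 2, 1), (-2, 0, 1), (0, -2, 1)],
]
print(radial_distances(rings))
```

Because every subject's mesh has the same parameterization, the value at a given ring and point index can be compared across subjects, which is what allows a regression model to be fitted at each surface point.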
We used one-way Analyses of Variance (ANOVAs) with a post hoc Bonferroni correction for multiple comparisons for continuous variables to examine group differences in baseline and follow-up measures of age, education, MMSE, LM II-DR, AVLT-DR and ADAScog-DR. A chi-squared test for categorical variables was used to determine any group differences in gender. Pearson’s correlation analyses were used to investigate possible associations between hippocampal volume and memory performance in the full sample, and separately within each diagnostic group.
Linear regression models were used to map the association between LM II-DR, AVLT-DR, ADAScog-DR and hippocampal radial distance at baseline and at follow-up in all subjects, and separately within each diagnostic group, as well as to map the associations between change in memory performance and hippocampal thinning over the 12-month follow-up period. The 3D statistical maps were further subjected to multiple comparisons correction by permutation analyses (permuting the predictor variable, in this case the memory scores) with the stringent threshold of p<0.01. Permutation tests on maps have been widely used in the brain mapping literature, but there are differences among the approaches. Other than FDR tests, there are 3 types of tests commonly used on statistical maps – 1) peak height, 2) cluster size, 3) total supra-threshold volume (total volume of all clusters of any size, i.e., set-level inference). These are described in detail by Frackowiak et al. (Frackowiak et al., 2007). The approach used here, from Thompson et al. (Thompson et al., 2003), differs somewhat from the approach of Nichols and Holmes (Nichols and Holmes, 2002), and the two approaches aim to control different error rates. Some permutation-based approaches, e.g., Nichols and Holmes (Nichols and Holmes, 2002), aim to control the family-wise error rate (the chance of one or more false positives in the entire map) by building an empirical null distribution for the image-wise maximum statistic from randomizations of the data. The most extreme 5% of the null distribution for the maximal statistic may be used to threshold the raw statistical map. This allows one to reject the null hypothesis at individual voxels while knowing that the chance of family-wise error (FWE) is controlled at 5%.
Corrected p-values for each voxel are obtained by evaluating the percentage of the permutation distribution for the maximal statistic that exceeds the voxel statistic. This allows particular points to be declared significant. In contrast, our approach (Thompson et al., 2003) determines a single corrected p-value for each map (which is reported in Table 3), based on the number of points surviving a particular a priori threshold (which we set to 0.01 in our analyses). When this is used, one may argue that an overall-significant map must contain one or more (corrected) significant points, but this does not allow the interpretation of the p-values at each point as being corrected. Even so, our approach is akin to set-level inference in functional imaging, which is more sensitive to a distributed pattern of weak effects than null distributions based on the maximum statistic. As one moves from peak height to cluster size and set-level inference approaches (we use the latter in this and our other papers), there is a trade-off of localization ability for statistical power, as it is easier for the total supra-threshold volume to catch effects all over the structure even if they do not coalesce into regions that exceed a pre-set number of voxels. We find this to be the best-suited approach for weak distributed effects.
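The map-level permutation test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the authors' code: point-wise p-values come from a Pearson correlation with a normal approximation to the t distribution, the corrected p-value uses the common (exceedances + 1) / (permutations + 1) convention, and the data and function names are made up.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def two_sided_p(r, n):
    """Approximate two-sided p-value for r (normal approximation, an assumption)."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def suprathreshold_count(scores, radial_by_point, threshold=0.01):
    """Number of surface points whose uncorrected p-value is below the threshold."""
    n = len(scores)
    return sum(1 for values in radial_by_point
               if two_sided_p(pearson_r(scores, values), n) < threshold)


def map_permutation_p(scores, radial_by_point, threshold=0.01, n_perm=200, seed=0):
    """Single corrected p-value for the whole map (set-level inference)."""
    rng = random.Random(seed)
    observed = suprathreshold_count(scores, radial_by_point, threshold)
    shuffled = list(scores)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the predictor (the memory scores)
        if suprathreshold_count(shuffled, radial_by_point, threshold) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Made-up data: 40 subjects, 20 surface points that all track the memory score.
rng = random.Random(1)
scores = [rng.gauss(0, 1) for _ in range(40)]
radial_by_point = [[s + rng.gauss(0, 0.5) for s in scores] for _ in range(20)]
print(map_permutation_p(scores, radial_by_point))
```

The returned value describes the map as a whole; as noted in the text, it does not license interpreting the p-value at any individual surface point as corrected.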
Overall permutation corrected p-values for our statistical maps are presented in Table 3, but the maps themselves are illustrated with uncorrected p-values.
We conducted a separate linear regression analysis with hippocampal radial distance at follow-up as the dependent variable and memory test scores at follow-up as the predictor variable, while controlling for baseline hippocampal volume, in the full dataset and in each diagnostic group. To avoid the issue of permuting two linear regression covariates in sync, we applied a map-wise false discovery rate (FDR) correction (Benjamini and Hochberg, 1995) to these maps. FDR methods are widely used for multiple comparisons correction in statistical brain maps; they derive, where possible, a statistical threshold (critical value, t) that controls the expected proportion of false positive results (FDR) in the map. If there is such a threshold that controls the FDR at 5%, then the pattern of results in the map is declared, by convention, to be significant overall (Benjamini and Hochberg, 1995). Overall FDR-corrected p-values for our statistical maps are presented in Table 4, while the maps themselves are illustrated with uncorrected p-values.
In addition, we used the map-wise FDR correction method to derive cumulative distribution function (CDF) plots of the p-values in our main regression maps (Figures 1–3). These cumulative plots of p-values, or CDFs, allow for direct visual comparisons of the strength of correlations between each memory measure, MMSE and CDR-SOB and hippocampal radial distance. CDF plots can be used to rank statistical maps in terms of their effect sizes. In other words, statistical maps with CDFs that rise more steeply at the origin also tend to have a higher proportion of voxels with effect sizes exceeding any given fixed threshold. In the CDF plots, the x axis represents any arbitrary p-value threshold that is applied to the map (between 0 and 1), and the y axis shows the proportion of the statistical map (i.e., a fraction between 0 and 1) showing effects that are more significant than that chosen p-value threshold. The y=20x line denotes the allowed 5% false discovery rate, which is the maximum proportion of false positives that is allowed for a map to be declared significant overall. If a given CDF rises above the y=20x line and then crosses it again at a point other than the origin, then the map shows a statistically significant effect. In general, when effect sizes in the maps are greater overall, the CDF crosses the y=20x line at a higher statistical threshold (x value), meaning that a broader range of statistical thresholds can be applied to the data, and therefore more voxels reported as significant, while still keeping the false discovery rate below the conventional 5% (see Figure 6). In these graphs, we show the critical value, t, which is the highest threshold that can be applied to the statistical map while keeping the expected proportion of false positives below 5%. This is termed the “critical” value, i.e., the critical uncorrected p-value that maintains a specified FDR.
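The link between the critical value and the y=20x line can be shown with a short Python sketch of the standard Benjamini-Hochberg step-up rule. The function names and the ten p-values are hypothetical; the sketch only illustrates the rule described above and is not the authors' software.

```python
def fdr_critical_value(p_values, q=0.05):
    """Largest sorted p-value p_(i) with p_(i) <= q * i / m, or None if there is none.

    Equivalent to the last point where the empirical CDF lies on or above
    the line y = x / q (y = 20x for q = 0.05).
    """
    ordered = sorted(p_values)
    m = len(ordered)
    critical = None
    for i, p in enumerate(ordered, start=1):
        if p <= q * i / m:
            critical = p
    return critical


def cdf_at(p_values, threshold):
    """Fraction of the map with p-values at or below the threshold."""
    return sum(1 for p in p_values if p <= threshold) / len(p_values)


# Ten made-up uncorrected p-values standing in for a statistical map.
p_map = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
t = fdr_critical_value(p_map)
print(t)                 # 0.008
print(cdf_at(p_map, t))  # 0.2
```

Here 20% of the toy map is significant at the critical value 0.008, which is above 20 × 0.008 = 16%, so the CDF sits above the y=20x line at that threshold and the map would be declared significant overall. A map with no such threshold returns None.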
The results from the Bonferroni-corrected ANOVAs and the chi-squared test for demographic and cognitive comparisons are presented in Table 1. There were no group differences in age. The MCI group had significantly more male subjects (65.3%) than the DAT (50.5%, p=0.01) and NC (51.3%, p=0.004) groups. The DAT group had fewer years of education relative to the MCI and NC groups [mean = 14.9 years for DAT vs. mean = 16 years for MCI (p=0.006) and NC (p=0.02)]. As expected, all cognitive measures at baseline and at follow-up were significantly worse in the DAT and MCI groups relative to the NC group, as were the cognitive differences between the MCI and DAT groups (all p<0.001 after Bonferroni correction for multiple comparisons).
The results of the correlation analyses between overall hippocampal volume and the memory tests in the full sample, and within each diagnostic group, are presented in Table 2. All three verbal memory measures showed significant correlations with hippocampal volume in the full sample and in the MCI group. In DAT, LM II-DR was correlated with both the left and right hippocampus at baseline (left r=0.33, p=0.001 and right r=0.35, p<0.0001) but only with the left hippocampus in follow-up (left r=0.21, p=0.044), while ADAScog-DR was correlated only with the left hippocampus both at baseline (r=−0.28, p=0.005) and in follow-up (r=0.21, p=0.04). There were no significant correlations between hippocampal volume and memory performance in NC.
The uncorrected significance maps for each memory measure are shown in Figures 1–3. Table 3 lists the global permutation-corrected significance of the statistical maps shown in Figures 1–3. In agreement with the volumetric results, no significant associations between hippocampal radial distance and verbal memory performance were detected in NC (see third row images in Figures 1–3), but significant associations were detected in the full sample (see second row images in Figures 1–3). Significant associations between memory and hippocampal radial distance were detected in both DAT and MCI. LM II-DR showed significant associations with hippocampal radial distance bilaterally in MCI at both time points, with greater significance at follow-up (see Table 3 and the fourth row images in Figure 1). In DAT, significant associations were seen with LM II-DR bilaterally at baseline but only on the left at follow-up (see Table 3 and the bottom row images in Figure 1). AVLT-DR showed significant associations with hippocampal radial distance bilaterally in MCI both at baseline and at follow-up (see Table 3 and the fourth row images in Figure 2). DAT subjects failed to show significant associations with AVLT-DR at either time point, probably as a result of a floor effect (see Table 3 and the bottom row images in Figure 2). In MCI, ADAScog-DR showed significant associations with hippocampal radial distance in the left hippocampus at baseline and bilaterally at follow-up (see Table 3 and the fourth row images in Figure 3). In DAT, significant ADAScog-DR associations were seen only for the left hippocampus at both time points (see Table 3 and the bottom row images in Figure 3). To assess whether naturally occurring hippocampal asymmetry might lead to the findings above (more bilateral associations in MCI and more left-sided associations in DAT), we conducted a post hoc left vs. right hippocampal radial distance comparison in each diagnostic group and in the full sample at each time point.
Figure 4 demonstrates a very well-conserved asymmetry pattern from normal aging to DAT, with larger posterior CA1 radial distances on the left and larger subicular and CA2-3 distances on the right. There were no changes in the asymmetry pattern from baseline to follow-up in any group or in the pooled sample. All left vs. right comparisons across the three diagnostic groups and in the pooled sample, both at baseline and at follow-up, were highly statistically significant (pcorrected<0.001 for all maps depicted in Figure 4).
Our analyses of the correlations between hippocampal thinning over 12 months and 12-month change in verbal memory performance did not yield statistically significant results.
The results from the linear regression analyses of the relationship between follow-up memory performance and follow-up hippocampal radial distance, while controlling for baseline hippocampal volume, are shown in Figure 5 and in the supplemental figure on the journal’s website. Figure 5 shows that, after controlling for pre-existing atrophy, memory performance shows strong regionally specific correlations in areas corresponding to the CA1 hippocampal subfield and parts of the subiculum. After applying FDR correction, the pooled sample maps for all three verbal memory measures as well as the LM II-DR maps for the DAT group remained statistically significant (see Figure 5 and the supplemental figure on the journal’s website, as well as Table 4, which lists the highest p-value threshold (critical value, t) that keeps FDR at 5%).
We used CDF plots to objectively compare and rank the associations between the three memory measures, MMSE and CDR-SOB and hippocampal radial distance, as well as to demonstrate which of the cognitive tests showed the best linkage with hippocampal atrophy within each diagnostic group (Figure 6). In agreement with the results presented so far, Figure 6 showed no significant associations between memory test score and hippocampal radial distance in NC, while the strongest associations were between AVLT-DR and hippocampal radial distance in MCI and between ADAScog-DR and hippocampal radial distance in DAT. MMSE and CDR-SOB were less sensitive than the memory measures in all three diagnostic groups.
As seen in Figures 1–3, the associations between memory performance and hippocampal structure are not uniformly distributed. In MCI and DAT, there is consistently a strong relationship between delayed recall performance and the lateral hippocampal area closely corresponding to the CA1 subfield. The inferior hippocampal surface, which captures most of the subiculum region, is another subregion that shows significant associations with memory performance in MCI and DAT. CA2 and CA3, or the top medial part of the hippocampus, show a correlation in MCI with AVLT-DR at baseline and follow-up, LM II-DR at follow-up and ADAScog-DR at baseline.
DAT is already an epidemic among the elderly in the US and worldwide. To address the pressing need to better understand and treat AD, many researchers are focused on developing and validating AD-related biomarkers - quantitative AD-associated measures that serve as an indirect metric of disease severity. In the present study, we analyzed the relationship between memory loss - the most pervasive AD symptom - and hippocampal atrophy, the most established AD imaging biomarker. We examined how well hippocampal radial distance, a measure of hippocampal thickness, correlates with one task that assesses delayed recall for short stories (LM II) and two tests of delayed recall of a list of unrelated words (AVLT and the verbal memory test from the ADAScog) across the full sample (N=490, consisting of 148 NC, 245 MCI and 97 mild DAT subjects) and separately within each diagnostic group. We used a newly developed high-throughput hippocampal automated segmentation technique that has been previously applied in MCI and DAT (Morra et al., 2008a; Morra et al., 2008c; Morra et al., 2008d). As this method shows promise as a potential analytic tool for clinical trials, we wanted to explore the associations between hippocampal morphology and several of the memory measures that have been repeatedly used as screening tests or as primary and secondary outcomes in MCI and DAT clinical trials.
None of the three verbal memory measures showed significant associations with hippocampal volume/morphology among cognitively normal elderly. In MCI, all three measures showed significant associations with atrophy in both the left and right hippocampi, with AVLT-DR showing the strongest linkages. One explanation for this finding is that subjects with mild cognitive problems find AVLT more challenging than the other two tasks. AVLT consists of a list of 15 words, whereas the ADAScog list consists of only 10 words. In addition, AVLT requires subjects to learn a distractor list following learning of the to-be-remembered words, which can interfere with consolidation of the first list. It is known that memory consolidation is highly dependent upon hippocampal functions. Further, the two word list tests (AVLT and ADAScog-DR) are comprised of unrelated words, in which it would be more difficult for subjects to utilize memory mnemonics (e.g., drawing associations among the words). In contrast, a short story (as used in LM II) provides both a context and built-in associations. Subjects are more readily able to recall the information content of short stories as opposed to lists comprised of unrelated words as they can more easily utilize memory strategies, such as retaining the theme of the story and using pictorial rehearsal.
The stronger association between LM II-DR and hippocampal atrophy observed in MCI at follow-up is another interesting finding. It could be due to two separate processes. First, for some MCI subjects it could reflect disease progression, in which the progressive loss of the ability to compensate by means of these memory strategies results in a tighter association between hippocampal structure and memory performance. Second, many MCI subjects demonstrated a learning effect (i.e., had improved LM II scores at follow-up, see Table 1). Perhaps those benefiting most were the subjects with the least atrophy, which in turn further strengthened the association between memory performance and hippocampal radial atrophy.
The DAT subjects were the only group where differences were detected only on one hippocampus. They showed predominantly left sided hippocampal-memory recall associations. The greatest effect was observed with the cognitive test specifically designed for DAT subjects – the ADAScog. This task is the easiest of the three and showed higher performance levels in DAT subjects, suggesting that the ADAScog would be a better measure for studies hoping to provide above-floor performance levels. DAT subjects showed better performance on LM II-DR at baseline and a significant LM II-DR association with the hippocampal formations. However, this association becomes nonsignificant at follow-up when most DAT subjects can remember only one information unit on average (floor effect, see Table 1). AVLT-DR, the most challenging memory test of the three, showed no associations with hippocampal volume or radial distance in DAT, likely due to a floor effect (DAT subjects recalled an average of less than 1 word, see Table 1).
We used two different criteria to assess whether a map was significant after multiple comparisons correction, as both tests are prevalent in the brain mapping literature, although they are usually considered alternatives and are not always applied to the same data. First, the total supra-threshold surface area (with p-values more extreme than 0.01) was used and a corrected p-value was given for its rank in a null distribution obtained by randomization. Second, we used FDR theory to see if there was a statistical threshold that could be applied to the map that controlled the false discovery rate at the conventional 0.05 level. These are slightly different criteria and they are not always satisfied in the same situations; they agreed for the control and DAT groups, but permutation gave slightly more powerful results in MCI. These two tests, which have different definitions, generally declare the same maps as significant, but for some effects one of the tests may detect an effect while the other does not. There is a point of connection between the CDF plots used in the FDR theory and the permutation tests, in that one could look up whether the map, thresholded at p=0.01, controls the FDR. By contrast, permutation tests find the null distribution for this suprathreshold area by randomization. For the memory scores in the MCI subjects, the CDF plots show that the suprathreshold area for the map thresholded at p=0.01 is around 30 times higher than that which would be expected by chance for AVLT-DR, but only 5–10 times higher than that which would be expected by chance for the other memory scores. This number has to exceed 20 for the FDR to be considered to be controlled at 5% when the data are thresholded at p=0.01, i.e., 20% of the map has to be significant at p=0.01. By contrast, the permutation tests establish a non-parametric distribution for the suprathreshold area.
As seen in Table 3, it was rare (less than 1 in 20 occurrences) for the suprathreshold area in randomized (null) data to exceed that observed in the real data. We therefore conclude that permutation tests at the 0.01 threshold were sometimes more powerful in detecting effects than an FDR test applied at the same threshold. Even so, FDR is adaptive and considers all possible thresholdings of the data, so some effects in MCI, and all effects in DAT, had critical values lower than 0.01. As such, they passed FDR because there was some threshold that controlled the false discovery rate.
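The two criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, the pointwise p-values come from a Fisher z approximation to the correlation test (an assumption made to keep the example dependent on NumPy only), and the FDR step uses the standard Benjamini-Hochberg procedure. Here `radial` is assumed to be a subjects-by-surface-points array of radial distances and `score` a vector of memory scores.

```python
import math
import numpy as np

def pointwise_pvalues(radial, score):
    """Approximate two-sided p-values for the correlation between a memory
    score and radial distance at every surface point (Fisher z approximation)."""
    n = len(score)
    x = (score - score.mean()) / score.std()
    y = (radial - radial.mean(axis=0)) / radial.std(axis=0)
    r = (x[:, None] * y).mean(axis=0)  # Pearson r per surface point
    z = np.arctanh(np.clip(r, -0.999999, 0.999999)) * math.sqrt(n - 3)
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

def suprathreshold_fraction(p, threshold=0.01):
    """Fraction of surface points with p-values below the threshold."""
    return float(np.mean(p < threshold))

def permutation_pvalue(radial, score, threshold=0.01, n_perm=1000, seed=0):
    """Corrected p-value: rank of the observed suprathreshold area in a null
    distribution built by shuffling the scores across subjects."""
    rng = np.random.default_rng(seed)
    observed = suprathreshold_fraction(pointwise_pvalues(radial, score), threshold)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(score)
        null_area = suprathreshold_fraction(
            pointwise_pvalues(radial, shuffled), threshold)
        exceed += null_area >= observed
    return observed, (exceed + 1) / (n_perm + 1)

def fdr_critical_value(p, q=0.05):
    """Benjamini-Hochberg critical value: the largest sorted p-value p_(k)
    with p_(k) <= q * k / m, or 0.0 if no threshold controls the FDR at q."""
    p_sorted = np.sort(p)
    m = len(p_sorted)
    ok = p_sorted <= q * np.arange(1, m + 1) / m
    return float(p_sorted[ok].max()) if ok.any() else 0.0
```

In this sketch a map passes the permutation criterion when the returned corrected p-value is below 0.05, and passes the FDR criterion when `fdr_critical_value` is greater than zero. Because the FDR critical value is chosen adaptively, it can fall below 0.01 even when the map thresholded at exactly 0.01 does not cover 20% of the surface.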
Our regression analyses assessing associations between the change in memory scores and the change in hippocampal radial distance were not significant. This may be due to the relatively short follow-up interval of only 12 months. Over such a short period, changes in measures of disease progression such as memory scores and hippocampal volumes or thickness can be quite noisy. Table 1 shows that these measures have (1) very small mean changes over the 12-month period, and (2) quite large standard deviations. In combination, these make it hard to detect significant associations. We plan to repeat these analyses with the 2- or 3-year follow-up data once enough follow-up scans become available.
Once we controlled for baseline hippocampal volume in all subjects, we uncovered highly significant, regionally specific associations between follow-up memory test performance and hippocampal radial distance, predominantly in a CA1 distribution.
The CA1 subfield is known to be very susceptible to Alzheimer’s type pathology. It is one of the first regions to be affected and consistently shows the highest pathology burden throughout the disease course (Bobinski et al., 1997; Bobinski et al., 1995; Schonheit et al., 2004; Zarow et al., 2005). The observed linkage between CA1 atrophy and verbal memory is in strong agreement with our previous report (Apostolova et al., 2006b) as well as with recent data from another research group (Fletcher et al., 2008). In addition, an association has also been reported between CA1 and the global ADAScog score, which taps more broadly into global cognitive function (Csernansky et al., 2005a).
Most studies that have investigated the associations between memory performance and hippocampal atrophy to date have conducted their analyses either on mixed samples of demented and nondemented subjects or in DAT subjects alone. Only a few studies have separated DAT and NC subjects in their analyses, and among those several reported a lack of or even a negative correlation between memory performance and hippocampal volume in cognitively normal elderly (Chantome et al., 1999; Foster et al., 1999; Kohler et al., 1998; Ylikoski et al., 2000). One recent study extracted hippocampal volumes from postmortem MR images and sought possible correlations between hippocampal volume and verbal memory performance within 1 year of death (Mortimer et al., 2004). The authors reported a weaker association between memory performance and hippocampal volume in nondemented vs. demented individuals. These reports are in line with our findings. Even though the hippocampus plays a crucial role in memory encoding and consolidation in NC, age-associated memory decline depends less on hippocampal and more on extra-hippocampal (e.g., white matter) integrity (Grady, 2008). Therefore, even though we found associations between the hippocampi and memory in the entire sample of controls, MCI, and DAT subjects, our data indicate the importance of analysing specific subgroups separately, as the correlations for the whole sample were driven by the MCI and DAT data.
The observed bilateral hippocampal/memory association in MCI is in agreement with our previous report (Apostolova et al., 2006b). We also observed a pronounced left-sided laterality effect in DAT as already documented by de Toledo-Morrell et al (de Toledo-Morrell et al., 2000). It has also been reported that atrophy of the left but not the right CA1 subfield was predictive of future progression from cognitively normal (CDR=0) to early DAT (CDR=0.5) (Csernansky et al., 2005b). We do not have a good explanation for this bilateral to unilateral shift from MCI to DAT but one could perhaps hypothesize that in the MCI stage, subjects have greater compensatory capabilities and it takes bilateral disruption of the hippocampal networks for verbal memory impairment to manifest itself. As AD pathology increases and the ability to employ complex, higher order compensatory strategies dissipates (i.e., in the DAT stage), the associations between memory and the hippocampus may become detectable on the left side of the brain where the majority of language processing takes place. Another plausible explanation is that the substantially larger MCI group sample size gives us sufficient power to uncover right-sided verbal memory associations although admittedly our first report of bilateral hippocampal verbal memory associations in MCI came from a much smaller sample size (Apostolova et al., 2006b).
To our knowledge this is the largest study investigating the correlations between verbal memory and hippocampal structural integrity. The strengths of this study are its large size, the detailed subject assessment, the unified MRI protocol across multiple sites and its meticulous data quality control. Additional strengths are the advanced preprocessing and 3D modelling techniques used to map discrete structural-functional correlations from normal aging to dementia. One limitation of the study stems from the limited generalizability. The ADNI study uses rigorous exclusion criteria, as it was designed to closely resemble a clinical trial population. As such, it does not necessarily represent the general elderly population and its findings should be generalized with caution. Another weakness is the etiologic/pathologic uncertainty in the MCI stage. At least 30% of amnestic MCI have been found to harbor non-AD pathology (Jicha et al., 2006b). Such subjects if present in ADNI could be reducing our power to find statistically significant associations. Another limitation is that the link between an inward movement of the hippocampal surface and volumetric atrophy of the underlying subfield is yet to be validated. Such validation is difficult to do with 1.5T magnetic field strength. High-field and ultra-high field structural imaging can potentially provide us with enough resolution for subfield tracing and allow for such validation to take place.
In summary, our findings highlight the importance of the use of hippocampal atrophy as a biomarker by showing significant and robust relationships with the most pervasive symptom of AD, memory loss. Complex measures such as the AVLT-DR seem to be well suited for the MCI population and show strong association with hippocampal radial distance while DAT subjects may require less challenging verbal memory recall measures such as LM II-DR and ADAScog-DR.
Supplementary Figure: 3D significance maps showing the associations between follow-up memory test scores and 12-month hippocampal radial distance while controlling for baseline hippocampal volume in each diagnostic group. In the significance maps, red and white colors denote uncorrected p < 0.05. The final FDR-corrected global p-values are listed in Table 4.
Data used in preparing this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative database (www.loni.ucla.edu/ADNI). Many ADNI investigators therefore contributed to the design and implementation of ADNI or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators is available at www.loni.ucla.edu/ADNI/Collaboration/ADNI_Citation.shtml. All data collection was funded by the following ADNI funding sources (Principal Investigator: Michael Weiner; NIH grant number U01 AG024904): National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Foundation for the National Institutes of Health, Pfizer Inc., Wyeth Research, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, Merck & Co. Inc., AstraZeneca AB, Novartis Pharmaceuticals Corporation, the Alzheimer’s Association, Eisai Global Clinical Development, Elan Corporation plc, Forest Laboratories, and the Institute for the Study of Aging (ISOA), with participation from the U.S. Food and Drug Administration. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. We thank the members of the ADNI Imaging Core for their contributions to the image pre-processing and the ADNI project.
All study analyses were funded by NIA K23 AG026803 (jointly sponsored by NIA, AFAR, The John A. Hartford Foundation, The Atlantic Philanthropies, The Starr Foundation and an anonymous donor; to LGA), the Turken Foundation (to LGA); NIA P 50 AG16570 (to LGA, JLC and PMT); NIBIB EB01651, NLM LM05639, NCRR RR019771 (to PMT); and NIMH R01 MH071940, NCRR P41 RR013642 and NIH U54 RR021813 (to AWT).