This study investigated the test-retest reliability of an fMRI face-name associative encoding paradigm in cognitively intact and mildly memory-impaired older individuals over a four- to six-week inter-scan interval. Importantly, we observed good reliability of activation patterns over this intermediate interval in non-demented older adults, both in whole-brain maps and specifically in the hippocampus. Furthermore, hippocampal activation was also reproducible with an abbreviated fMRI paradigm that would be suitable to add on to standard safety and volumetric protocols in a clinical trial. Using the same analytic methods, we observed substantially more variability in the pattern and magnitude of deactivation within the medial parietal (precuneus) regions of the default network, suggesting that, in older and cognitively impaired subjects, task-induced deactivation may be less reliable overall than hippocampal activation.
We investigated percent signal change in regions of interest (ROIs) using a method that allowed us to sample from regions engaged in the task within each individual's anatomy. We restricted our analyses to voxels that were activating or deactivating at either scanning session, and further defined each ROI as the portion of these voxels lying within the individual's anatomically defined hippocampus or precuneus. This was done to better account for differences in individual anatomy, as older subjects may have early regional atrophy in the medial temporal lobe and the medial parietal cortices. Notably, when analyses were repeated using only MNI-based, structurally defined ROIs, the results were very similar to those reported here using individual anatomically defined ROIs. Thus, it may not be critical to include individual volumetric information in fMRI analyses when only normal or mildly impaired subjects are studied. However, volumetrically individualized fMRI analysis may be required when studying more impaired MCI or AD patients, who are likely to have greater regional atrophy.
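The ROI definition described above can be sketched with NumPy as follows. This is a minimal illustration, not the pipeline used in this study: the function names, the single t-threshold, the boolean task/baseline vector, and the simple block-average percent signal change are all assumptions made for the example.

```python
import numpy as np


def functional_anatomical_roi(tmap_s1, tmap_s2, anat_mask, t_thresh, sign=1):
    """Hypothetical helper: voxels passing threshold at EITHER session,
    kept only where they fall inside the subject's anatomical mask.

    sign=1 selects activation (t > t_thresh); sign=-1 selects
    deactivation (t < -t_thresh).
    """
    supra_s1 = sign * np.asarray(tmap_s1) > t_thresh
    supra_s2 = sign * np.asarray(tmap_s2) > t_thresh
    # Union across sessions, intersected with the individual anatomical ROI.
    return (supra_s1 | supra_s2) & np.asarray(anat_mask, dtype=bool)


def percent_signal_change(bold, roi, task_on):
    """Percent change of the ROI-mean time course, task vs. baseline.

    bold: 4-D array (x, y, z, time); roi: 3-D boolean mask;
    task_on: 1-D boolean sequence over time (True = task block).
    """
    task_on = np.asarray(task_on, dtype=bool)
    roi_ts = bold[roi].mean(axis=0)  # (n_voxels, time) -> (time,)
    baseline = roi_ts[~task_on].mean()
    return 100.0 * (roi_ts[task_on].mean() - baseline) / baseline
```

Because the same union mask is applied to each session's data, both sessions are sampled from identical voxels, which keeps the session-to-session comparison focused on signal magnitude rather than on which voxels happened to pass threshold.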
Behavioral performance on the post-scan memory test did not differ significantly between scanning sessions, which may reflect the fact that most of our subjects performed fairly well at both sessions. Nevertheless, it is important to note that event-related contrasts demonstrated signal reliability similar to that of block-design contrasts, because event-related designs may prove more useful in evaluating individuals who show significant cognitive change over time. However, event-related designs carry an inherent loss of statistical power, and some contrasts may have insufficient bin sizes owing to inter-subject variability in memory performance, particularly when an abbreviated version of this paradigm is used.
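For event-related contrasts, the bin-size concern can be made concrete with a simple check run before modeling. The sketch below assumes each encoding trial has already been labeled by its post-scan memory outcome; the labels, contrast names, and the minimum of 10 trials per bin are hypothetical placeholders, not values from this study.

```python
from collections import Counter

# Hypothetical cut-off for illustration only; not a value used in this study.
MIN_TRIALS_PER_BIN = 10


def underpopulated_contrasts(trial_outcomes, contrasts, min_trials=MIN_TRIALS_PER_BIN):
    """Return {contrast: (n_a, n_b)} for contrasts with a bin below `min_trials`.

    trial_outcomes: one post-scan memory label per encoding trial.
    contrasts: mapping of contrast name -> (bin_a, bin_b) labels.
    """
    counts = Counter(trial_outcomes)  # missing labels count as 0
    return {
        name: (counts[a], counts[b])
        for name, (a, b) in contrasts.items()
        if counts[a] < min_trials or counts[b] < min_trials
    }
```

A subject who remembers almost every pair leaves few "forgotten" trials, so a shorter paradigm with fewer trials per run makes this failure more likely for high and low performers alike.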
One important objective of this study was to establish the reliability of hippocampal signal during an associative memory task, given this region's importance in memory function and previous concerns about the inter-subject variability of hippocampal activity (Daselaar, et al. 2003; Rombouts, et al. 1997). Similar to the results of reliability studies of fMRI paradigms investigating auditory working memory (Wei, et al. 2004) and fear responses in the amygdala (Johnstone, et al. 2005), we found that paired-associate encoding-related hippocampal activation has good reproducibility, based on both ICC values and within-subject variance measures. ICCs for hippocampal ROIs mostly ranged between 0.4 and 0.8, indicating moderate to high signal reliability for this associative encoding task. Individual ICCs from the left hippocampus (~0.5) were generally lower than those from the right hippocampus (~0.7). These ICCs are in the range of those reported in other imaging reliability studies (Eaton, et al. 2008; Johnstone, et al. 2005; Manoach, et al. 2001; Wei, et al. 2004), and are also consistent with results from a recent fMRI study of a memory paradigm in older subjects (Clement and Belleville 2009). It is perhaps not surprising that we observed somewhat better signal reliability in the right hippocampus, as the literature implicates the right hippocampus as playing a crucial role in spatial encoding (Schacter, et al. 1996; Sperling, et al. 2001; Squire, et al. 1992). Furthermore, our paradigm may specifically probe the role of the right hippocampus in novel face encoding (Chua, et al. 2007; Sperling, et al. 2003b). Our recent longitudinal studies in aging and MCI may implicate the right hippocampus as predictive of clinical decline (O'Brien, et al. 2010), suggesting that the right (non-dominant) hippocampus may be more vulnerable to the process of neurodegeneration.
Dice spatial overlap coefficients calculated within each individual's anatomically defined hippocampus had low to moderate values (0.3–0.5), similar to those reported in other studies (Clement and Belleville 2009; Machielsen, et al. 2000; Rombouts, et al. 1997). These results are consistent with suggestions in the literature that statistical comparisons of signal magnitude in activation contrasts may be more reliable than voxel-wise spatial comparisons (Clement and Belleville 2009). Because spatial overlap is partially determined by the proportion of the region activated, this may have particular implications for the hippocampus and, more specifically, for our task, which primarily activates the anterior hippocampal formation. Given the marked inter-subject variability in hippocampal morphology and in the extent of activation, the reproducibility of fMRI signal in this region may not be best evaluated using spatial overlap methods (Rombouts, et al. 1997).
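The Dice coefficient is twice the number of voxels suprathreshold at both sessions, divided by the total number of suprathreshold voxels across the two sessions. A minimal NumPy sketch, assuming binary suprathreshold maps for each session and restricting both to the same anatomical mask (the function name is illustrative):

```python
import numpy as np


def dice_within_mask(supra_s1, supra_s2, anat_mask):
    """Dice overlap of two sessions' suprathreshold voxels inside an anatomical mask."""
    mask = np.asarray(anat_mask, dtype=bool)
    a = np.asarray(supra_s1, dtype=bool) & mask
    b = np.asarray(supra_s2, dtype=bool) & mask
    total = a.sum() + b.sum()
    if total == 0:
        return float("nan")  # no suprathreshold voxels at either session
    return float(2.0 * (a & b).sum() / total)
```

When only a few voxels in a small structure pass threshold, a shift of one or two voxels removes a large fraction of the overlap, so Dice can be low even when the signal magnitude in the region is stable across sessions.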
Establishing hippocampal reliability with this associative memory paradigm in cognitively normal and mildly impaired older subjects should inform future work investigating and quantifying cognitive decline in older subjects at risk for Alzheimer's disease. The comparison of reliability between the full associative memory paradigm and an abbreviated version of the same paradigm suggests that the short version may be reliable enough to be usefully incorporated into clinical trial MRI protocols. Interestingly, when subjects were separated into two groups based on their Clinical Dementia Rating (CDR) score, both groups showed moderately high overall signal reliability, but the CDR 0 group showed higher reliability than the CDR 0.5 group in the left hippocampus. However, reliability in the right hippocampus was comparable across both block and event-related designs. Our findings are generally consistent with those of Clement and Belleville (2009), who also observed a similar overall degree of reproducibility in older controls and MCI subjects for the single-measure ICC averaged across multiple regions, although they did observe variability in hippocampal activation across conditions and clinical groups.
Interestingly, we observed a dissociation between the reliability of behavioral memory measures and that of fMRI activity when comparing the CDR 0 and CDR 0.5 groups. Although CDR 0.5 subjects showed a higher ICC for behavioral memory performance across scan sessions than CDR 0 subjects, they also showed a lower ICC for right hippocampal activation. We speculate that the higher behavioral ICC could reflect high between-subject variability in objective memory performance among CDR 0.5 subjects, who are known to be clinically heterogeneous; because the ICC expresses between-subject variance as a proportion of total variance, greater heterogeneity tends to raise it. However, it is also possible that CDR 0.5 subjects demonstrate slightly greater within-subject variability in neural activity over short time frames, reflecting early vulnerability of the neural systems engaged in memory encoding.
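This reasoning rests on the definition of the ICC. Under a simple random-effects model, in which each measurement is a subject's true level plus independent session-to-session noise, the population ICC is the share of total variance attributable to differences between subjects, with between-subject variance in the numerator and within-subject variance added in the denominator:

$$
\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}, \qquad
\sigma_w^2 = 1:\quad \sigma_b^2 = 1 \;\Rightarrow\; \mathrm{ICC} = 0.5, \qquad
\sigma_b^2 = 3 \;\Rightarrow\; \mathrm{ICC} = 0.75
$$

Holding the within-subject noise fixed, a more heterogeneous group therefore yields a higher ICC without being measured any more precisely, whereas greater within-subject variability lowers the ICC for a given level of heterogeneity.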
A secondary objective of this study was to examine the reproducibility of task-related deactivation, as the default network has received increasing interest in the aging and neurodegenerative disease literature. The precuneus/posterior cingulate area is documented as exhibiting “beneficial” task-related deactivation, that is, an increase in negative signal (Miller, et al. 2008a). Some studies further report left lateralization of the areas that deactivate in response to a task (Binder, et al. 1999; Mazoyer, et al. 2001), a finding also demonstrated in the current study. Although deactivation was present at both scanning sessions in the full paradigm, the location and magnitude of deactivation varied greatly between subjects, as well as within subjects across time. One potential explanation is that the areas of the precuneus involved in task-related deactivation may inherently show more individual regional variability in activity than the anatomically smaller and more reliably activated hippocampal formation.
It is also thought that normal aging disrupts both coordinated intrinsic activity among the components of the default network, most prominently in the precuneus/posterior cingulate region (Andrews-Hanna, et al. 2007; Damoiseaux, et al. 2007; Grady, et al. 2006; Persson, et al. 2007), and task-induced deactivation (Sambataro, et al. 2010; Lustig, et al. 2003). In particular, recent data suggest that the presence of amyloid pathology, even in cognitively intact older subjects, may disrupt normal default network activity during the resting state (Hedden, et al. 2009; Sheline, et al. 2010). Interestingly, older subjects with high amyloid burden, as well as cognitively normal older subjects with genetic risk factors for AD, demonstrate significant reductions in deactivation during cognitive tasks, with some subjects even demonstrating paradoxical activation in this area, similar to reports in MCI and AD (Fleisher, et al. 2009; Lustig, et al. 2003; Petrella, et al. 2007; Pihlajamaki, et al. 2009; Sperling, et al. 2009). For these reasons, it may not be surprising that we observed much more variability in patterns of deactivation at each scanning session than has previously been seen in younger subjects (Gusnard and Raichle 2001; Lustig, et al. 2003). The results of the present study suggest that, in older subjects, the intra- and inter-subject variability of default network activity during memory encoding, in terms of both magnitude and spatial extent, is higher than the variability of activation in the “task-positive” network. Thus, although the default network may be a very sensitive indicator of early neural alterations in aging and prodromal AD, these regions may provide less reliable metrics for demonstrating pharmacologic effects.
As a large number of potential treatments for memory impairment enter clinical trials, it is important to develop measures that can detect a “signal of efficacy” over a short time period that may predict subsequent effects over a longer trial. Our power calculations based on these results indicate that a relatively small number of subjects would be needed to detect significant pharmacological effects on hippocampal activity within a six-week span. These results are particularly encouraging for the abbreviated version of the paradigm, which is more feasible than the full paradigm to add on to early “proof of concept” trials or to scans acquired for safety monitoring. Although slightly more subjects would be needed to detect these effects with the abbreviated version, the benefits of adding such a short cognitive paradigm may outweigh the cost. The results of the power analyses lend further support to the promise of fMRI as a potential biomarker of acute efficacy in proof-of-concept clinical trials. We are currently including an abbreviated version of this paradigm in ongoing fMRI studies in the context of a placebo-controlled clinical trial in mild AD patients.
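The link between test-retest reliability and the number of subjects needed can be illustrated with a normal-approximation sample-size formula for a within-subject (pre/post) change. This is an assumption-laden sketch, not the power analysis performed in this study: it assumes a single-arm pre/post design, equal variance σ² at both sessions, a two-sided test, and treats the ICC as the session-to-session correlation so that the variance of the change score is 2σ²(1 − ICC). The function name and the example effect size are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def subjects_needed(effect, sigma, icc, alpha=0.05, power=0.80):
    """Approximate n to detect a mean pre/post change of size `effect`.

    Assumes equal between-subject SD `sigma` at both sessions and uses the
    ICC as the test-retest correlation, so the SD of change scores is
    sqrt(2 * sigma**2 * (1 - icc)).
    """
    sd_change = sqrt(2.0 * sigma ** 2 * (1.0 - icc))
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha / 2.0) + z(power)  # two-sided alpha plus desired power
    return ceil((z_sum * sd_change / effect) ** 2)
```

With σ = 1 and a hypothetical effect of 0.5, this gives 32 subjects at ICC = 0.5 but 19 at ICC = 0.7, illustrating why a somewhat less reliable version of a paradigm needs a few more subjects. The normal approximation slightly underestimates n for small samples, and a placebo-controlled parallel-group comparison would require a different formula.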
It is important to note the limitations and challenges of this study. One limitation inherent to fMRI studies is inconsistency in signal intensity due to extensive variability in individual subjects' hemodynamic responses and neurovascular coupling, particularly in older subjects (D'Esposito, et al. 2003; Miller, et al. 2002). We chose to include every subject from the initial dataset and did not exclude any subjects on the basis of motion in either their functional or their structural images. Our aim was to make the analysis as applicable as possible to a clinical trial setting, where it may not be feasible to select only the scans that are optimal for analysis. However, because some older subjects had more head motion than is typically accepted in studies of young subjects, including these data likely lowered our reproducibility estimates. Lastly, unlike our previous reliability study in young subjects (Sperling, et al. 2002), we did not control for over-the-counter medication (e.g., antihistamines), alcohol, or caffeine use, as this may not be feasible in a clinical trial. These factors may also have affected fMRI activity or cognitive performance during scan sessions. It is also important to note that we assessed only short-term test-retest reliability, which does not provide information about the utility of fMRI in assessing long-term change in the context of a potential disease-modifying medication for AD.
Strengths of this study include its focus on reliability in an aging and cognitively declining population, a demographic that is often overlooked in reliability studies. Additionally, all 27 subjects were scanned at the same site and at the same time of day to reduce potential secondary biases. We also used freely available analysis platforms that have been shown to be reliable and generalizable across various types of data. Beyond demonstrating similar reliability of hippocampal activation in cognitively normal and mildly impaired individuals at an inter-scan interval of 4–6 weeks, we also found that a more practical, abbreviated version of our memory paradigm demonstrated reliability similar to that of the full version.