Episodic memory is the first and most severely affected cognitive domain in Alzheimer's disease (AD), and it is also the key early marker in prodromal stages including amnestic mild cognitive impairment (MCI). The relative ability of memory tests to discriminate between MCI and normal aging has not been well characterized. We compared the classification value of widely used verbal memory tests in distinguishing healthy older adults (n = 51) from those with MCI (n = 38). Univariate logistic regression indicated that the total learning score from the California Verbal Learning Test-II (CVLT-II) ranked highest in terms of distinguishing MCI from normal aging (sensitivity = 90.2; specificity = 84.2). Inclusion of the delayed recall condition of a story memory task (i.e., WMS-III Logical Memory, Story A) enhanced the overall accuracy of classification (sensitivity = 92.2; specificity = 94.7). Combining Logical Memory recognition and CVLT-II long delay best predicted progression from MCI to AD over a 4-year period (accurate classification = 87.5%). Learning across multiple trials may provide the most sensitive index for initial diagnosis of MCI, but inclusion of additional variables may enhance overall accuracy and may represent the optimal strategy for identifying individuals most likely to progress to dementia.
Episodic memory, defined as recollection of specific past events or information, is the first and most severely affected cognitive domain in Alzheimer's disease (AD) (Bäckman, Jones, Berger, Laukka, & Small, 2005). Episodic memory deficits are a key indicator of prodromal dementia stages, specifically for amnestic mild cognitive impairment (MCI) (de Jager & Budge, 2005; Ganguli, Dodge, Shen, & DeKosky, 2004; Kavé & Heinik, 2004; Marruff et al., 2004), and they are thought to represent the effect of early neuropathological changes to the hippocampal and entorhinal cortices (Becker et al., 2006; Petrella et al., 2006; Wicklund, Johnson, Rademaker, Weitner, & Weintraub, 2006). MCI is considered a transitional period between normal aging and dementia (Morris, 2006; Petersen et al., 2001). MCI patients experience memory loss that differs significantly from normal age-related change, but these individuals do not meet criteria for dementia due to their generally preserved overall cognitive function and activities of daily living (ADLs). Several studies have reported episodic memory impairment in preclinical AD anywhere from 3 to 8 years prior to formal diagnosis of AD (Amieva et al., 2005; Bäckman, Small, & Fratiglioni, 2001; Saxton et al., 2004).
The usefulness of neuropsychological measures for detecting early cognitive change is well established (Arnaiz & Almkvist, 2003), with some studies reporting accuracy rates of 85–90% for correct identification of MCI patients who will progress to AD (Collie & Marruff, 2000; Salmon et al., 2002; Small, Fratiglioni, Viitanen, Winblad, & Bäckman, 2000). With such high accuracy, however, come variable levels of sensitivity and specificity for individual neuropsychological instruments as predictors of diagnostic conversion. Modrego (2006) reported that sensitivity and specificity varied from 64 to 89% and 76 to 97%, respectively, depending on the measure of cognitive status utilized. In addition to this variability, memory measures have limitations when used to distinguish normal aging from MCI (Collie & Marruff, 2000). Widely used screening tests such as the Mini-Mental State Examination (MMSE) and Mattis Dementia Rating Scale-2 (DRS-2) have a ceiling effect in MCI, making it difficult to distinguish normal aging from early cognitive decline when using these measures (de Jager, Hogervorst, Combrinck, & Budge, 2003). Practice effects are another potential limitation, masking the detection of true serial change over time in MCI (McCaffrey, Duff, & Westervelt, 2000). Another factor that confounds the interpretation of neuropsychological test performance is the standardization process. Normative samples can be distorted by standardization procedures that fail to consider risk factors for vascular disease and dementia, serving to lower mean performance levels and increase variability, resulting in an unreliable estimate of the “normal” range of functioning on certain tests (Brooks, Iverson, Holdnack, & Feldman, 2008; de Jager et al., 2003; Gurol et al., 2006; Holtzer et al., 2008; Ritchie, Frerichs, & Tuokko, 2007).
These confounding variables (i.e., ceiling effects, practice effects, standardization limitations) make it difficult to identify appropriate memory measures that are both sensitive and specific for detection of MCI.
Arnaiz and Almkvist (2003) reviewed the literature and reported that verbal episodic memory was generally considered the best predictor of cognitive decline in MCI, though instruments utilized across studies were not homogenous and likely differed in terms of sensitivity. Moreover, differential results appeared to arise from variation in task difficulty in memory tests rather than from variations in the specific processes that the tasks intended to measure. In another study comparing different forms of episodic memory measures, Ivanoiu and colleagues (2005) found high specificity (100%) but extremely low sensitivity (12%) for the MMSE in the diagnosis of MCI. Even though patients attained significantly lower scores on both the MMSE and the DRS-Memory tests relative to healthy controls, both measures failed to differentiate declining versus stable patients and therefore were not helpful in distinguishing those at risk for conversion to AD. Others have found similar results with incipient AD patients (Karrasch, Sinerva, Gronholm, Rinne, & Laine, 2005; Tierney, Szalai, Dunn, Geslani, & McDowell, 2000), highlighting the importance of using instruments that sufficiently tax memory resources.
List learning tasks, which involve learning across multiple trials, are perhaps the most challenging and sensitive measures of episodic memory when used to assess early cognitive changes in MCI and AD. De Jager and colleagues (2003) found that compared to story recall and a visual memory test, only the total learning score of the Hopkins Verbal Learning Test (HVLT) (Brandt, 1991) discriminated MCI patients and healthy controls. Kavé and Heinik (2004) found that study participants met criteria for MCI on delayed recall when assessed using a list learning test (Rey-Auditory Verbal Learning Test, RAVLT; Schmidt, 1996), though they demonstrated normal performance on a visual memory test (Rey-Osterrieth Complex Figure Test; Spreen & Strauss, 1988) and story recall (Logical Memory I and II from the Wechsler Memory Scale-Third Edition, WMS-III; Wechsler, 1997). The authors concluded that in order to make a diagnosis of MCI, a marked decline in at least one memory test, most likely a list learning task, should be obtained. They also argued that requiring impairment on all memory measures would probably lead to a delay in diagnosis until the criteria for dementia had already been met, in turn skipping the MCI stage altogether.
In addition to providing strong diagnostic sensitivity for MCI, list learning tests can accurately predict diagnostic conversion to AD (Griffith et al., 2006; Marruff et al., 2004) and help characterize the severity of the memory impairment in early stages of dementia (Fox, Olin, Erblich, Ghosh Ippen, & Schneider, 1998). Karrasch and colleagues (2005) reported modest sensitivity (33%) and high specificity (93%) for the 10-item word list test from the Consortium to Establish a Registry in Alzheimer's Disease (CERAD; Morris et al., 1989) battery in MCI. The authors also noted that utilizing a test with a greater number of words should lead to increased sensitivity. Tierney, Yao, Kiss, and McDowell (2005) found that within a comprehensive dementia work-up that included a neuropsychological assessment, the short delayed recall of the 15-item RAVLT was the only measure to emerge from a 10-year regression analysis as a predictor of AD; the RAVLT also emerged as the most significant test in the 5-year prediction analysis. Estevez-Gonzalez, Kulisevsky, Boltes, Otermin, and Garcia-Sanchez (2003) found that the total learning and delayed recall scores of the RAVLT helped identify individuals with subjective memory complaints who would progress to AD over at least a 1-year interval, and that these scores also differentiated between MCI and normal aging. It should be noted that other memory tests have shown predictive validity for clinical progression from MCI to AD, including delayed recall of story memory and verbal paired associates (Guarch, Marcos, Salamero, Gastó, & Blesa, 2008) and visuospatial paired associate learning (Ahmed, Mitchell, Arnold, Nestor, & Hodges, 2008), but to our knowledge these measures have not been directly compared to list learning tasks.
The current study compared the ability of widely used clinical memory tests to classify individuals as either MCI or healthy controls (HC). We also sought to determine whether a single memory variable or combination of variables would provide the greatest diagnostic sensitivity and specificity for MCI. Finally, in our cohort of MCI participants, we investigated which memory test(s) accurately predicted conversion from MCI to probable AD over a follow-up period.
Participants were recruited into a longitudinal memory and aging study through flyers, public lectures, newspaper advertisements, and referrals from various Medical Center clinics. Screening for eligibility involved review of recent medical records and standardized phone interview, which included basic demographic information, medical and psychiatric history, and objective and subjective memory screens (Rabin et al., 2007). Inclusion criteria were age 60 years or greater, right-handedness, and fluency in English. To be consistent with the population characteristics from which this sample was recruited, participants were required to have at least 12 years of formal education or a GED. Exclusion criteria included medical, psychiatric, or neurological conditions (other than MCI) that could significantly affect brain structure or cognitive status; head injury with loss of consciousness greater than five minutes; history of cancer with chemotherapy and/or radiation therapy; use of psychotropic medication; and current or past substance dependence. Also excluded were individuals with non-amnestic forms of MCI (Petersen, 2004; Winblad et al., 2004) as determined on the basis of a multidisciplinary, clinical consensus meeting that included neuropsychological test findings and other relevant data. Despite active attempts to recruit minority participants, the sample was ultimately composed entirely of Caucasian individuals, consistent with the predominant demographic composition of the surrounding northern New England region. All participants provided written informed consent according to procedures approved by the Institutional Committee for the Protection of Human Subjects.
Approximately eight weeks after telephone screening, eligible participants underwent a comprehensive neuropsychological evaluation, including measures of memory, attention, language, executive functions, visuospatial ability, general intellectual ability, and psychomotor speed, as well as standard dementia screens. Tests were administered by postdoctoral fellows and highly trained research assistants who were blind to diagnostic status at baseline assessment. All tests were administered during a single session at our medical center, with a fixed order of administration designed to minimize the introduction of any new verbal stimuli during delay periods for verbal memory tests. Tests included: Action Naming Test (Piatt, Fields, Paolo, & Troster, 1999; 2004), Barona Index (Barona, Reynolds, & Chastain, 1984), Boston Naming Test (Goodglass, Kaplan, & Barresi, 2001), California Verbal Learning Test, Second Edition (CVLT-II; Delis, Kramer, Kaplan, & Ober, 2000), Mattis Dementia Rating Scale-2 (DRS-2; Jurica, Leitten, & Mattis, 2001), Mini-Mental State Examination (MMSE; Folstein, Folstein, & McHugh, 1975), Test of Practical Judgment (Rabin et al., 2007), Trail Making Test (Delis, Kaplan, & Kramer, 2001; Reitan & Wolfson, 1985), Wechsler Adult Intelligence Scale-Third Edition (WAIS-III Digit Symbol, Digit Span, Block Design, Vocabulary, and Information subtests; Wechsler, 1997), Wechsler Memory Scale-Third Edition (WMS-III Logical Memory [LM] I and II and Visual Reproduction [VR] I and II subtests and LM and VR Recognition; Wechsler, 1997), and Wisconsin Card Sorting Test (short form; Heaton, Chelune, Talley, Kay, & Curtis, 1993). Estimates of baseline intellectual functioning included WAIS-III Vocabulary and the American National Adult Reading Test (Grober & Sliwinski, 1991).
Participants and their informants also completed a series of cognitive questionnaires. A Cognitive Complaint Index (CCI; Saykin et al., 2006) ranging from 0 to 100 was calculated based on the total number of items that could be endorsed across all subjective measures including: Memory Self-Rating Questionnaire (Squire, Wetzel, & Slater, 1979), self- and informant- versions of the Neurobehavioral Function and Activities of Daily Living Rating Scale (Saykin, 1992; Saykin et al., 1991), self- and informant-versions of the Informant Questionnaire on Cognitive Decline in the Elderly (Jorm & Jacomb, 1989), four cognitive items from the Geriatric Depression Scale (GDS; Yesavage et al., 1982), 10 cognitive items from a telephone-based screening for MCI (Rabin et al., 2007) and 23 items from the Memory Assessment Questionnaire (Santulli et al., 2005), adapted in part from the Functional Activities Questionnaire (Pfeffer et al., 1982). A Board-certified geropsychiatrist ruled out significant impairment in activities of daily living (ADLs), depression, and other Axis I psychiatric conditions based on a semi-structured interview that included the Hamilton Rating Scale for Depression (Hamilton, 1960). A neurological examination and the original and revised Hachinski Ischemia Scales (Hachinski et al., 1975; Small, 1985) were completed to rule out incidental pathology.
Criteria for MCI were based on the Petersen recommendations (Petersen, 2004; Petersen et al., 2001), operationalized for the purpose of this study:¹ (1) Presence of significant cognitive complaints, preferably corroborated by an informant. Specifically, we required endorsement of > 20% of possible complaints on the CCI. (2) Presence of objective memory impairment. Specifically, consistent with the methodology utilized by Troyer et al. (2008), we required that the age-corrected scaled score (age SS) on one or more memory tests be considerably lower (i.e., ≥ 1.5 standard deviations lower) than was the age SS for Verbal IQ (as measured by WAIS-III Vocabulary, mean = 10, SD = 3). The standardized memory tests selected for classification purposes were WMS-III Visual Reproduction Immediate Recall, Delayed Recall, and Recognition conditions and the DRS-2 memory subtest. The memory tests were selected to avoid criterion contamination of the verbal memory measures of particular interest for the present report and were not used in the logistic regression analyses. (3) Unimpaired activities of daily living as determined by a multidisciplinary clinical consensus team, incorporating data from the geropsychiatric evaluation, telephone screen, and Neurobehavioral Function and Activities of Daily Living Rating Scale. (4) Preserved general cognitive functioning as measured by cutoff scores of > 24 of 30 or ≥ 128 of 144 on the MMSE and DRS-2, respectively. (5) No clinical depression or other psychiatric disorder as determined by a multidisciplinary clinical consensus team, incorporating information from the geropsychiatric evaluation, Geriatric Depression Scale, and Hamilton Rating Scale for Depression.
Healthy control participants had: (1) No significant cognitive complaints, as indicated by endorsement of ≤ 20% of possible complaints on the CCI. (2) Intact functioning on memory testing. Specifically, we required that the age-corrected scaled score (age SS) for all of the following memory tests not fall considerably (i.e., ≥ 1.5 standard deviations) below the age SS for Verbal IQ (as measured by WAIS-III Vocabulary, mean = 10, SD = 3): WMS-III Visual Reproduction Immediate Recall, Delayed Recall, and Recognition and DRS-2 memory. (3) Normal activities of daily living measured as described for the MCI group. (4) Preserved general cognitive functioning also measured as described for the MCI group. (5) No clinical depression or other psychiatric disorder as described for the MCI group. As noted above, participants who demonstrated impairment in cognitive domains other than memory were not eligible for study inclusion. We also analyzed the baseline scores of individuals who later went on to develop AD. These participants met criteria for probable mild AD, as defined by the NINCDS-ADRDA criteria (McKhann et al., 1984), over the course of longitudinal follow-up.
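For illustration only, the operationalized group criteria described above can be written as a simple decision rule. The Python sketch below is not the procedure's actual implementation: the `Participant` fields, the function names, and the "unclassified" label are hypothetical, and criteria (3) and (5), which rested on clinical consensus, are reduced to boolean inputs. Requiring both the MMSE and the DRS-2 cutoff to be met is an assumption about how the two cutoffs were combined.

```python
from dataclasses import dataclass
from typing import Sequence

SCALED_SCORE_SD = 3.0   # age-corrected scaled scores: mean = 10, SD = 3
DISCREPANCY_SD = 1.5    # memory score must fall >= 1.5 SD below Vocabulary


@dataclass
class Participant:
    """Hypothetical record holding the inputs to the group criteria."""
    cci_percent: float            # % of possible complaints endorsed on the CCI
    vocabulary_ss: float          # WAIS-III Vocabulary age-corrected scaled score
    memory_ss: Sequence[float]    # scaled scores on the classification memory tests
    mmse: int                     # MMSE total (0-30)
    drs2_total: int               # DRS-2 total (0-144)
    adls_intact: bool             # consensus judgment, criterion (3)
    psychiatric_disorder: bool    # consensus judgment, criterion (5)


def has_memory_discrepancy(p: Participant) -> bool:
    """True if any memory score is >= 1.5 SD below the Vocabulary score."""
    cutoff = p.vocabulary_ss - DISCREPANCY_SD * SCALED_SCORE_SD
    return any(score <= cutoff for score in p.memory_ss)


def classify(p: Participant) -> str:
    """Return 'MCI', 'HC', or 'unclassified' under the stated criteria."""
    # Assumption: both screening cutoffs must be met.
    general_ok = p.mmse > 24 and p.drs2_total >= 128
    shared = p.adls_intact and general_ok and not p.psychiatric_disorder
    if not shared:
        return "unclassified"
    discrepancy = has_memory_discrepancy(p)
    if p.cci_percent > 20 and discrepancy:
        return "MCI"
    if p.cci_percent <= 20 and not discrepancy:
        return "HC"
    return "unclassified"


# Invented example values, for demonstration only.
example_mci = Participant(35.0, 12, [7, 10, 11, 9], 28, 138, True, False)
example_hc = Participant(10.0, 12, [10, 11, 12, 9], 29, 140, True, False)
print(classify(example_mci), classify(example_hc))
```

With a Vocabulary scaled score of 12, the discrepancy cutoff is 12 − 4.5 = 7.5, so a single memory scaled score of 7 satisfies criterion (2) in this sketch.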
Tests selected for the regression equations encompassed a range of memory processes, task procedures, types of stimuli, and difficulty levels. For example, we included the three word memory from the MMSE and two list learning tasks: the 16-item California Verbal Learning Test-Second Edition (CVLT-II) with five learning trials and short- and long-delay recall conditions as well as a 10-item list learning test developed in our lab for purposes of telephone screening with three learning trials and a delayed recall condition (Memory and Aging Telephone Screen, MATS; Rabin et al., 2007). As a measure of story recall, we used the WMS-III Logical Memory (LM) subtest with immediate, delayed recall, and recognition conditions. We also included delayed recall of LM Story A, as an earlier version of this measure is currently used as part of the diagnostic criteria for amnestic MCI in the Alzheimer's Disease Neuroimaging Initiative (Mueller et al., 2005) and the Alzheimer's Disease Cooperative Study (ADCS) clinical trials (Thal, 2005). Finally, we used WAIS-III Information (INFO), considered a test of remote memory or longer-term recall in older adult populations (Lezak, Howieson, & Loring, 2004), which has been shown to be predictive of AD within 5 years of diagnosis (Tierney et al., 2005). Notably, we were unable to include WMS-III Visual Reproduction or DRS-2 memory subtests because these tasks were used to classify participants.
Logistic regression was employed to determine the maximal sensitivity and specificity of each test and to permit a ranking of tests that were considered to be most suitable for group differentiation. Multivariate logistic regression was used to calculate the optimal test combination for group discrimination. Age, education, and sex were included as covariates in all statistical analyses.
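To make the analytic approach concrete, the following Python sketch fits a logistic regression to a single standardized test score and reads off per-group classification rates at a probability cutoff of .5. It is a minimal stand-in rather than the software or settings used in this study: the fitting routine is plain gradient ascent on the log-likelihood, the scores are invented, and all function names are hypothetical. Covariates such as age, education, and sex would enter as additional columns in each row.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def standardize(values):
    """Convert raw scores to z-scores so a fixed step size behaves well."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def predict_probability(weights, row):
    """Probability of group 1 given weights (intercept first) and one row."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))


def fit_logistic(rows, labels, lr=0.1, steps=5000):
    """Fit weights by batch gradient ascent on the mean log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = y - predict_probability(weights, row)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def classification_rates(rows, labels, weights, cutoff=0.5):
    """Return (rate correct in group 1, rate correct in group 0, overall)."""
    correct = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for row, y in zip(rows, labels):
        predicted = 1 if predict_probability(weights, row) >= cutoff else 0
        totals[y] += 1
        correct[y] += int(predicted == y)
    overall = (correct[0] + correct[1]) / len(labels)
    return correct[1] / totals[1], correct[0] / totals[0], overall


# Invented learning scores: first six are controls (0), last six are MCI (1).
scores = [52, 48, 55, 50, 46, 58, 35, 40, 38, 47, 33, 42]
labels = [0] * 6 + [1] * 6
rows = [[z] for z in standardize(scores)]
weights = fit_logistic(rows, labels)
mci_rate, hc_rate, overall = classification_rates(rows, labels, weights)
print(weights, mci_rate, hc_rate, overall)
```

The two per-group rates correspond to sensitivity and specificity once one group is designated the target class; ranking tests then amounts to repeating the fit for each test score and comparing these rates.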
Participants included 89 older adults who met the eligibility criteria specified above and were classified as either MCI (n = 38) or HC (n = 51). The average number of memory measures (out of five possible) on which MCI participants fell below the mean was 2.48, SD = .99. There were no significant group differences in demographics (see Table 1). As mentioned above, age, education, and sex were all entered as covariates in the analysis and did not significantly affect the pattern of results.
We used univariate logistic regression to calculate the optimal sensitivity and specificity for each test (see Table 2). The CVLT-II learning score was ranked highest (sensitivity = 90.2; specificity = 84.2) in distinguishing MCI from normal aging. LM recognition (LMREC) and CVLT-II long delay scores ranked second and third, respectively, in terms of discriminability. LM immediate recall (LM I) and INFO emerged lower on the list and attained relatively low specificity. A summary of the regression analysis for each test is provided in Table 3. Next, we used multivariate logistic regression to calculate the optimal combination of tests for group discrimination. Combining CVLT-II learning and delayed recall of LM Story A (LM II-A) (−2 Log Likelihood = 27.68; Nagelkerke R² = .88) yielded the highest overall classification (sensitivity = 92.2; specificity = 94.7). Combining CVLT-II learning and MATS delayed recall (−2 Log Likelihood = 30.26; Nagelkerke R² = .82) yielded the lowest overall classification (sensitivity = 88.4; specificity = 88.2).
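The relation between the −2 log likelihood and the Nagelkerke statistic reported above can be sketched as follows. The standard definition rescales the Cox and Snell value by its maximum, which requires the deviance of a null model. That null deviance is not reported here, so the Python sketch below assumes an intercept-only null computed from the group sizes (38 MCI, 51 HC); this assumption and the function names are illustrative only.

```python
import math


def null_deviance(n_group1, n_group0):
    """-2 log likelihood of an intercept-only model (assumed null model)."""
    n = n_group1 + n_group0
    p = n_group1 / n
    return -2.0 * (n_group1 * math.log(p) + n_group0 * math.log(1.0 - p))


def nagelkerke_r2(model_deviance, null_dev, n):
    """Cox and Snell R-squared divided by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(-(null_dev - model_deviance) / n)
    max_cox_snell = 1.0 - math.exp(-null_dev / n)
    return cox_snell / max_cox_snell


d0 = null_deviance(38, 51)
r2 = nagelkerke_r2(27.68, d0, 89)
print(round(d0, 2), round(r2, 3))
```

Under this assumption the value for the CVLT-II learning plus LM II-A model falls close to the reported .88; an exact match is not expected if the null model or the number of cases with complete data differed.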
We also used univariate logistic regression to determine which test(s) predicted progression from MCI to probable AD. As mentioned previously, these participants met criteria for probable mild AD, as defined by the NINCDS-ADRDA criteria (McKhann et al., 1984).² Only MCI participants with at least one wave of follow-up data (n = 32) were included in this analysis. The six MCI participants who were lost to follow-up were statistically comparable to the remaining 32 in terms of baseline age, years of education, sex, and performance across memory tests: CVLT-II total, CVLT-II short delay, CVLT-II long delay, LM I, LM II, LMREC, MATS Total, MATS delay, MMSE W-Missed, and INFO (p > .05 for all analyses). The average duration of follow-up was 2.97 years, SD = 1.16 (range = 1–4 years). Nine participants from the MCI group developed probable AD (27.3%), resulting in an MCI comparison group of 23 participants. The mean time of diagnostic conversion for the nine participants was 1.89 years, SD = 1.05 (range = 1–4 years). As expected based on prior research (DeCarli et al., 2004; Howieson et al., 2008; Karrasch et al., 2005; Petersen & Negash, 2007), MCI participants who underwent diagnostic conversion showed significant performance reductions on memory tests at baseline as compared to their MCI counterparts who did not convert to AD during the observation period (see Table 4).
Results indicated that LMREC (sensitivity = 95.7; specificity = 62.5) and CVLT-II long delay (sensitivity = 100; specificity = 44.4) were the highest ranked tests in terms of distinguishing converters from non-converters (see Table 5). LM II-A ranked third in terms of discriminability. Notably, specificity was low across memory tests in this analysis (range 0% to 62.5%). A summary of the regression analysis for each test is provided in Table 6. Next, we used multivariate logistic regression to determine which test combination would yield the highest group discrimination. Combining LMREC and CVLT-II long delay (−2 Log Likelihood = 26.10; Nagelkerke R² = .45) resulted in increased specificity (66.7); however, there was no meaningful change in sensitivity (97.5) or accurate classification (87.5). Combining LMREC with LM I (−2 Log Likelihood = 25.64; Nagelkerke R² = .46) yielded the lowest overall group classification (sensitivity = 62.6; specificity = 55.6; accurate classification = 75.0).
This study compared various memory tests in terms of their ability to classify older adults as MCI or healthy controls. Results indicated that the CVLT-II total learning score across five trials provided the most accurate classification of participants, followed closely by recognition and delayed recall trials of LM and the CVLT-II, respectively. This finding is consistent with prior reports regarding the usefulness of list learning tasks for detecting early AD (Estevez-Gonzalez et al., 2003; Tierney et al., 2005) and with Petersen's observation that a sensitive index of MCI is the inability to acquire information beyond the short-term memory span, and that this deficit becomes apparent over several learning trials (Petersen, Smith, Ivnik, Kokmen, & Tangalos, 1994). Not surprisingly, total learning score from a shorter, 10-item list learning task (Rabin et al., 2007) did not provide as strong diagnostic sensitivity or specificity as the 16-item CVLT-II. This is consistent with the idea that sensitivity of word list learning tests may be enhanced by increasing both the number of items to be remembered (> 10 items) and number of learning trials (> 3 trials) (Karrasch et al., 2005), while exercising caution to ensure that task demands are not too overwhelming for the population of interest. It is also important to note that number of words missed on the MMSE, immediate recall of a story memory test (i.e., WMS-III LM I), and remote memory (as measured by WAIS-III Information) were among the lowest ranked tests in terms of discriminability. These findings should be borne in mind when designing neuropsychological test batteries for clinical or research purposes.
We also sought to determine whether a single memory test or combination of variables would provide the greatest diagnostic sensitivity and specificity for MCI. Inclusion of a delayed recall condition of a story memory task (i.e., WMS-III LM Story A) enhanced the overall accuracy of classification (from 87.6% to 93.3%). This finding is interesting in light of controversy regarding the sensitivity of story recall versus list learning tests in detecting early memory changes. For example, some have suggested that story recall should be paired with a more demanding, list learning task to ensure proper detection of MCI, which often presents as subtle memory change (Kavé & Heinik, 2004). Indeed, combining list learning with story memory could make up for the lack of sensitivity of story memory, generally considered a less taxing measure because it provides a logical context for the examinee. This may be especially obvious in assessments of highly educated or highly functioning individuals who can better compensate for their declining memory capacity. The strategy of combining tasks also might provide a more accurate assessment of MCI individuals who perform poorly on list learning tasks for reasons other than true memory deficits (e.g., they become overwhelmed by task demands). Another possibility would be to develop new, more challenging measures of story memory, which might be able to detect subtle memory decline on their own.
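The overall accuracy figures quoted above can be checked against the reported per-group rates and the group sizes (51 HC, 38 MCI). The Python sketch below is a worked arithmetic check, not part of the original analysis: the per-group counts are recovered by rounding rather than taken from the tables, and the pairing of each rate with a group size is inferred from the arithmetic (this pairing yields near-whole counts) rather than stated in the text.

```python
def correct_count(rate_percent, group_size):
    """Recover the number of correctly classified cases from a rate (inferred)."""
    return round(rate_percent / 100 * group_size)


def overall_accuracy(rate_a, n_a, rate_b, n_b):
    """Overall percent correct across two groups of sizes n_a and n_b."""
    correct = correct_count(rate_a, n_a) + correct_count(rate_b, n_b)
    return 100 * correct / (n_a + n_b)


# CVLT-II learning alone: reported rates of 90.2 and 84.2.
single = overall_accuracy(90.2, 51, 84.2, 38)
# CVLT-II learning plus LM Story A delayed recall: reported rates of 92.2 and 94.7.
combined = overall_accuracy(92.2, 51, 94.7, 38)
print(f"{single:.1f} {combined:.1f}")  # prints "87.6 93.3"
```

Under these inferred counts, adding LM Story A delayed recall corresponds to five additional participants classified correctly out of 89 (78 versus 83).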
Another goal was to determine which memory test(s) best predicted conversion from MCI to AD. It should be noted that the sample size for these analyses was considerably smaller than for the previous analyses, and results should be interpreted with caution given issues of statistical power and generalizability of findings. In our sample of 32 MCI participants with available follow-up data, 27% (n = 9) progressed to probable AD, consistent with published conversion rates (Busse, Hensel, Guhne, Angermeyer, & Riedel-Heller, 2006; Daly et al., 2000). Findings revealed that LM recognition was the highest ranked test in terms of classification (87.1%), followed closely by the delayed recall condition of the CVLT-II (84.4%) and LM II-A (81.3%). It is notable that four of the top five tests for predicting diagnostic conversion were episodic memory tests of delayed recall or recognition as opposed to tests of learning or short-term recall. These findings are consistent with the idea that immediate recall trials may measure attention and working memory (extensively mediated by prefrontal cortex and related circuitry) to a greater degree than delayed-recall trials, which primarily tap retention and delayed-retrieval effects (and on which performance is highly dependent on entorhinal and hippocampal systems) (Shankle et al., 2005). These distinctions may account for the superiority of delayed recall or recognition tasks for predicting diagnostic conversion as compared to making the initial diagnosis. Additionally, previous research has shown that the delayed recall condition of a lengthy supraspan test (i.e., RAVLT) can predict conversion to AD (Estevez-Gonzalez et al., 2003). Our findings together with these observations make a compelling case for using a combination of story memory and list learning tasks to ensure accurate detection of MCI and the ability to monitor progressive decline.
Future research might expand the current findings to include memory paradigms not traditionally assessed during neuropsychological evaluations. For example, Troyer et al. (2008) reported on the clinical usefulness of calculating associative memory scores from standardized tests of object-location recall (i.e., BVMT-R) and symbol-symbol recall (i.e., Digit symbol incidental recall) as part of MCI evaluations. Specifically, their ROC curve analyses showed better classification of individuals to MCI or healthy control groups with measures of associative recall as opposed to item recall. Anderson et al. (2008) explored the roles of automatic and controlled memory mechanisms in older adults and found that individuals with MCI had intact familiarity but a marked difficulty recollecting prior items in their context. The authors suggested that this selective effect likely reflects disrupted hippocampal functioning and offered suggestions for rehabilitation focused on enhancing contextual recollection or training recollection directly.
Given issues of respondent burden and battery construction within our longitudinal study, we were limited to a relatively small group of memory tests. Further, to avoid circularity in defining and discriminating MCI, we used the WMS-III Visual Reproduction and DRS-2 Memory for diagnostic purposes and did not include these measures in the regression analyses. Ideally, in future research we would classify MCI and HC participants using memory tests with stronger diagnostic validity to minimize the possibility of misclassification. We would also strive to include additional measures to enable comparison of the relative value of visual versus verbal memory tests. Combining the CVLT-II and WMS-III LM with other types of memory tasks might lead to the optimal assessment method for detection of MCI and monitoring of clinical course. As well, it might be possible to enhance the diagnostic and predictive value of list learning tests by using techniques such as correspondence analysis, in which weighted scores are derived for each participant from their item responses on immediate- and delayed-recall trials. This method, which weights each word's relative importance by its position in the list and by the trial in which it was recalled, has proven superior to use of straightforward total learning or cutoff scores for early detection of AD (Shankle et al., 2005).
Generalization of our results is limited by the high level of education and estimated baseline level of functioning of the participants. Cognitive reserve theory (Spitznagel, Tremont, Brown, & Gunstad, 2006; Stern, 2002) predicts that highly educated older adults will perform better for a longer period of time before the load of the medial temporal pathology impairs memory functioning. Thus, it will be important to replicate these findings in a more diverse group of elders who may demonstrate impairment on some of our less taxing memory measures (e.g., MMSE words missed). The current results are not meant to characterize non-amnestic forms of MCI, which may present with cognitive impairment in multiple domains or a single non-memory domain. Given the focus of the present report and power limitations, we were unable to consider non-memory variables in the analyses. Furthermore, although our sample was recruited on the basis of amnestic features of MCI and no participant demonstrated severe or disproportionate impairment on non-memory tests, it is widely accepted that some MCI participants will show subtle deficits on tests of other cognitive domains. For example, deficits on neuropsychological measures of executive function have been shown to be a prominent feature of prodromal AD and useful in some predictive models (Albert, Blacker, Moss, Tanzi, & McArdle, 2007; Albert, Moss, Tanzi, & Jones, 2001; Chen et al., 2000; Elias et al., 2000). Inclusion of executive function tests in our predictive models may have yielded valuable diagnostic information and improved the specificity of our measures for predicting dementia.
Finally, although the combination of CVLT-II learning and LM Story A delayed recall offered the optimal group discrimination for our participant groups, all combinations of available tests yielded fairly high sensitivity (> 88%) and specificity (> 88%), implying that use of more than one memory measure is a key factor in enhancing classification accuracy (perhaps nearly as important as the specific choice of tests). It is likely that inclusion of multiple memory processes and the combined total number of memory items presented are important. Measurement reliability is related to total item count, which may be a key psychometric factor.
In summary, learning across multiple trials appears to be the most sensitive diagnostic index for MCI, though inclusion of additional variables can enhance overall accuracy. Further, though delayed recall tests best predicted progression from MCI to AD, the specificity of memory measures on their own was low and may be enhanced by using multiple tests covering various cognitive domains. Overall, it is clear that learning across multiple trials and memory for new supraspan information will continue to play an important role in research on preclinical dementia in terms of determining study eligibility, examining treatment effects, and monitoring the cognitive status of participants over time. To this end, it will be important to develop and utilize instruments with multiple, equivalent test forms to reduce practice effects or use empirical methods to determine whether observed change scores exceed chance expectations (Chelune, 2003). Future research also might investigate the usefulness of specific memory tests in relation to important biomarkers such as APOE epsilon 4 genotype status, hippocampal and entorhinal cortex volumes, CSF and plasma Aβ42 concentrations, and PET amyloid imaging. These biomarkers are important for research but are expensive and ultimately better cognitive assessment protocols, if validated, could help reduce clinical diagnostic and monitoring costs for individuals known to be at risk for developing AD. Finally, list learning and story memory tests permit investigation into various aspects of memory functioning (e.g., encoding, retrieval, recognition, strategy use, the contribution of meaning to retention and recall). Future research should more closely examine which memory processes are tapped by specific tests and the nature and extent of decline in these processes during the insidious transition from healthy aging to MCI to AD.
Supported, in part, by grants to AJS from the National Institute on Aging (R01 AG19771) and the Alzheimer's Association (IIRG-99-1653 sponsored by the Hedco Foundation). The authors thank Margaret Nordstrom, Katherine Nutter-Upham, and Heather Pixley for their assistance with this study.
¹ Our procedure for diagnosing MCI in the current study differs from our standard approach reported elsewhere (e.g., Saykin et al., 2004, 2006; Rabin et al., 2007), in which classification decisions are made by a multidisciplinary clinical consensus panel, applying criteria developed by Petersen and colleagues and incorporating all neuropsychological test data and other available information. In order to avoid criterion contamination in the current study, we rigorously applied these criteria and used only the measures described herein to assign participants to groups.
² Examiners could not always be blinded at follow-up, but the clinical ratings were adjudicated by a multidisciplinary consensus conference composed of clinicians and researchers, some of whom had had contact with the participant and some of whom had not.