Brain Imaging Behav. Author manuscript; available in PMC 2013 December 1.
Published in final edited form as:
PMCID: PMC3806057

Development and assessment of a composite score for memory in the Alzheimer’s Disease Neuroimaging Initiative (ADNI)


We sought to develop and evaluate a composite memory score from the neuropsychological battery used in the Alzheimer’s Disease (AD) Neuroimaging Initiative (ADNI). We used modern psychometric approaches to analyze longitudinal Rey Auditory Verbal Learning Test (RAVLT, 2 versions), AD Assessment Schedule - Cognition (ADAS-Cog, 3 versions), Mini-Mental State Examination (MMSE), and Logical Memory data to develop ADNI-Mem, a composite memory score. We compared RAVLT and ADAS-Cog versions, and compared ADNI-Mem to AVLT recall sum scores, four ADAS-Cog-derived scores, the MMSE, and the Clinical Dementia Rating Sum of Boxes. We evaluated rates of decline in normal cognition, mild cognitive impairment (MCI), and AD, ability to predict conversion from MCI to AD, strength of association with selected imaging parameters, and ability to differentiate rates of decline between participants with and without AD cerebrospinal fluid (CSF) signatures. The second version of the RAVLT was harder than the first. The ADAS-Cog versions were of similar difficulty. ADNI-Mem was slightly better at detecting change than total RAVLT recall scores. It was as good as or better than all of the other scores at predicting conversion from MCI to AD. It was associated with all our selected imaging parameters for people with MCI and AD. Participants with MCI with an AD CSF signature had somewhat more rapid decline than did those without. This paper illustrates appropriate methods for addressing the different versions of word lists, and demonstrates the additional power to be gleaned with a psychometrically sound composite memory score.

Keywords: Memory, psychometrics, longitudinal analysis, cognition, hippocampus


Impairments in memory are a hallmark of Alzheimer’s disease (AD) and are requisite for diagnoses of the disease (McKhann et al. 1984). Assessment of memory was a crucial criterion influencing the composition of the neuropsychological battery used in the AD Neuroimaging Initiative (ADNI). The battery includes a variety of indicators of memory, including the Rey Auditory Verbal Learning Test (RAVLT) (Rey 1964), elements from the AD Assessment Scale – Cognitive Subscale (ADAS-Cog) (Mohs et al. 1997), the recall of three items from the Mini-Mental State Examination (MMSE) (Folstein et al. 1975), and recall of elements from a story from Logical Memory I of the Wechsler Memory Scale-Revised (Wechsler 1987).

There are at least two reasons a memory composite score may be useful. First, summarizing all of the memory data with a single score facilitates comparisons with other variables without needing to address challenges raised by testing multiple hypotheses that would ensue if each of the memory indicators was considered separately. These other variables could be neuroimaging summaries, biomarkers, clinical diagnoses, or measures of other cognitive domains. Second, by including multiple indicators in a single score, the impact of measurement error due to idiosyncratic single items or subdomains is minimized.

Different word lists for the RAVLT and ADAS-Cog were administered at different study visits. A particular challenge that arose in these analyses was to address the two different versions of the RAVLT word lists and the three different versions of the ADAS-Cog word lists. It is important to determine whether these different versions of the RAVLT and ADAS-Cog have the same difficulty level before using total scores in longitudinal analyses. The assumption that different forms are equivalent is a strong assumption that needs to be checked (Millsap 2011). One of our goals was to compare the difficulties of the different versions of the RAVLT and ADAS-Cog used in ADNI.

Our primary goal was to develop and evaluate the validity of a psychometrically sophisticated memory composite score from the ADNI neuropsychological battery. We compared our composite memory score to a variety of other scores in a series of analyses to address the validity and performance of our composite score. First, we determined the ability of the composite to detect change over time in each diagnostic group. Second, we determined the ability to predict conversion from mild cognitive impairment (MCI) to AD. Third, we evaluated the strength of the relationship with MRI-derived parameters found in previous studies to be related to memory, including hippocampal volume, cortical thickness of the parahippocampal region, fusiform gyrus, and entorhinal cortex (Yonelinas et al. 2007; Walhovd et al. 2009; Fjell et al. 2008; Murphy et al. 2010; Van Petten et al. 2004). Finally, we compared rates of decline among people with normal cognition and with MCI who had a pattern of cerebrospinal fluid (CSF) biomarkers consistent with early AD (an “AD signature”) to rates of decline among people without the AD signature.


Participants and data source

Data used in this study were obtained from the ADNI database. The ADNI was initiated in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-profit organizations. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. Determination of sensitive and specific markers of very early AD progression is intended to help researchers and clinicians develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. Michael W. Weiner, MD, VA Medical Center and University of California-San Francisco, is the Principal Investigator of this initiative. This $60 million, multiyear public-private partnership involves many co-investigators from a broad range of academic institutions and private corporations. More than 800 participants, aged 55 to 90, have been recruited from more than 50 sites in the US and Canada. This includes approximately 200 patients diagnosed with early AD who were followed for up to 2 years. Longitudinal imaging data, including structural 1.5 Tesla MRI scans, were collected on the full sample. Neuropsychological and clinical assessments were collected at baseline and at follow-up visits occurring at six- to twelve-month intervals. Further information about ADNI can be found in Jack et al. (2010a) and on the ADNI web site. The study was conducted after Institutional Review Board approval at each site. Written informed consent was obtained from all study participants or their authorized representatives.

Diagnosis of amnestic MCI required patient-reported memory complaints, objective memory deficits, intact functional activities, a Clinical Dementia Rating (CDR) Scale (Morris 1993) global score of 0.5, and a MMSE (Folstein et al. 1975) score of 24 or more. Participants with AD met the National Institute of Neurological and Communicative Diseases and Stroke—Alzheimer’s Disease and Related Disorders Association criteria for probable AD (McKhann et al. 1984).

Cognitive and clinical measures

Memory indicators

Considerations for compiling the ADNI neuropsychological battery were that the tests should: 1. cover the domains of interest (memory, executive functions, language, attention, and visuospatial abilities); 2. adequately sample those domains in subjects who are normal or who have MCI or AD; 3. measure change over a 2–3 year period; 4. avoid ceiling or floor effects; 5. be efficient and meet practical demands; and 6. have been used in the AD Clinical Study (ADCS) MCI trial and worked well in that setting. Additionally, the tests are widely used in AD Centers (ADCs) that are required to collect a Uniform Data Set, which reduces the amount of testing needed for participants enrolled in ADNI from ADCs.

The RAVLT uses a 15-item list of unrelated words. This list is read to the participant, who is asked to recall aloud as many of the words as they can. The number of successfully recalled words is recorded. The list is then repeated, and the participant again asked to recall as many words as they can. This process is repeated for a total of 5 learning trials, resulting in 5 scores. Then the examiner reads a new list of 15 words to the participant (an interference word list), and the participant is asked to recall as many of these words as possible. The participant is then asked to recall the initial word list, and the number of words recalled is recorded. After thirty minutes of other testing, the participant is again asked to recall as many words from the initial list as they can. The two versions of the RAVLT include different versions of the initial and interference word lists.

The ADAS-Cog includes two different memory tasks. First is a word list learning task similar to but distinct from that of the RAVLT. The ADAS-Cog word list includes 10 unrelated words (rather than 15) that are printed on cards. The participant is asked to read them aloud (while in the RAVLT they are read to the participant) and to remember them. There are three learning trials (rather than five in the RAVLT). After five minutes (rather than 30) of unrelated testing, the participant is asked to recall as many words as possible from the list.

The second memory task included in the ADAS-Cog is a word recognition task. In this task, the participant is given 12 cards with words printed on them, and asked to read them aloud and to remember them. Then the target words along with 12 distractor words are shown to the participant, who is asked to indicate whether the word was one they were supposed to recall. Two scores are recorded: the number of target words correctly identified as being part of the list (i.e., true positives), and the number of distractor words correctly identified as not being part of the list (i.e., true negatives).
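The two recognition scores described above can be illustrated with a minimal sketch. The function name and the three-word lists are hypothetical and chosen only for illustration; the actual task uses 12 target and 12 distractor words.

```python
def score_recognition(targets, distractors, said_present):
    """Return (true_positives, true_negatives) for one recognition trial.

    targets / distractors: sets of words shown at test.
    said_present: set of words the participant judged to be on the study list.
    """
    true_positives = len(targets & said_present)       # targets correctly endorsed
    true_negatives = len(distractors - said_present)   # distractors correctly rejected
    return true_positives, true_negatives


# Hypothetical three-word illustration (not ADAS-Cog stimuli).
targets = {"river", "candle", "forest"}
distractors = {"button", "meadow", "pencil"}
said_present = {"river", "candle", "button"}
hits, rejections = score_recognition(targets, distractors, said_present)
```

Keeping the two counts separate, rather than summing them, is what later allows each to enter the model as its own indicator.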

The three different versions of the ADAS-Cog include different lists of the 10 words for the list learning trial as well as different lists of the 12 words for the recognition task.

For logical memory, a brief fact-laden passage is read aloud once. The participant is asked to recall as many of the passage’s 25 elements as they can, and the number of elements correctly recalled is recorded. After 30–40 minutes of other cognitive testing, the participant is again asked to recall the passage, and the number of elements correctly recalled in this delay condition is recorded.

In the MMSE, 3 words are read to the participant, who is asked to repeat them. Distractor tasks are then administered, after which the participant is asked to spontaneously recall the three words. Scores of 1 point are recorded for each item correctly recalled, and 0 for each item not correctly recalled.

Comparator measures

We compared our composite (described below) to a variety of comparators. The standard sum score for the five learning trials of the RAVLT was a primary comparator. Others included four versions and scores for the ADAS-Cog, including the original version (ADAS-Classic), the modified version of the ADAS-Cog that includes delayed recall (ADAS-Modified), a Rasch score developed for the original version of the ADAS-Cog (ADAS-Rasch (Wouters et al. 2008)), and a score obtained by recursive partitioning of the ADAS-Cog (ADAS-Tree (Llano et al. 2011)). Other comparators included the total MMSE score and the sum of boxes from the CDR.

Dementia evaluation

Conversion from normal or MCI to AD was a primary outcome for ADNI and so was tracked very closely. Complete methods for identifying dementia cases can be found in the ADNI protocol available at the ADNI web site.

Selected MRI-based imaging parameters

All participants had an MRI evaluation at each study visit. We identified four MRI parameters a priori as being associated with memory: hippocampal volume, thickness of the parahippocampus, thickness of the entorhinal cortex, and thickness of the fusiform gyrus. The neuroimaging methods utilized by ADNI have been described in detail previously (Jack et al. 2008) and use calibration techniques to maintain consistent protocols across scanners and sites. Raw DICOM data of T1-weighted MP-RAGE scans acquired from 1.5 Tesla scanners at baseline visits from all participants were obtained via the ADNI database. Images were processed through FreeSurfer version 4.0.3, a freely available software program, to obtain measurements of hippocampal volume and of cortical thickness for the parahippocampal, entorhinal, and fusiform gyrus regions.


A subset of participants (n=415) had baseline lumbar punctures for CSF, which was evaluated for assays of amyloid β1-42 (Aβ), total tau, and phosphorylated tau181p (ptau). De Meyer et al. used Aβ and ptau to classify ADNI participants as having an “AD signature” or not (De Meyer et al. 2010), and provided us with the classes for these analyses.

Psychometric analyses of baseline data

Our initial modeling of memory focused on baseline data to determine whether a single factor model would be appropriate or whether a more complicated model would be necessary.

We used Mplus statistical software for all models (Muthén and Muthén 2006). Mplus facilitates very flexible modeling but allows a maximum of 10 categories per categorical indicator. We re-coded memory indicators to have a maximum of 10 categories. We developed a re-coding algorithm based on preserving variability at the extremes of the distribution at the expense of variability in the middle range of the distribution. Specific re-coding we used is shown in Table S1.
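The idea of collapsing a score to at most 10 categories while keeping resolution at the extremes can be sketched as follows. This is only an illustration under assumed cut points: the actual re-coding is given in Table S1, which is not reproduced here, and the function name and the choice to keep four values at each end are hypothetical.

```python
def recode_preserving_extremes(score, max_score, keep_each_end=4, n_categories=10):
    """Map an integer score in 0..max_score to a category in 0..n_categories-1.

    The lowest and highest `keep_each_end` score values keep their own
    category; the remaining middle values are pooled into the categories
    that are left over. Cut points are illustrative, not those of Table S1.
    """
    if max_score + 1 <= n_categories:
        return score                                # already few enough categories
    n_middle_cats = n_categories - 2 * keep_each_end
    low_end = keep_each_end                         # first score in the pooled middle
    high_start = max_score - keep_each_end + 1      # first score kept at the top
    if score < low_end:
        return score
    if score >= high_start:
        return n_categories - 1 - (max_score - score)
    # Pool the middle scores into roughly equal-width bins.
    middle_width = high_start - low_end
    bin_index = (score - low_end) * n_middle_cats // middle_width
    return keep_each_end + bin_index


# A 0-15 recall score (16 possible values) collapsed to 10 ordered categories.
recoded_15 = [recode_preserving_extremes(s, 15) for s in range(16)]
```

With these assumed settings, scores 0 to 3 and 12 to 15 each keep a distinct category, while 4 to 7 and 8 to 11 are pooled into two middle categories.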

We compared a single factor model to a bi-factor model that included additional terms to capture covariance not due to the underlying factor defined by all of the indicators (McDonald 1999; Reise et al. 2007). Our initial task was then to identify one or more specific candidate bi-factor models to compare with the single factor model. We considered two approaches: one accounting for theoretical considerations regarding memory subtypes assessed by each of the indicators, and the other accounting for methods effects.

For the first approach, before we looked at data we (P.K.C., A.C., and D.M.) assigned memory indicators from the ADNI data set into categories based on the memory subtype each assessed (“content” models). Specific subtypes we considered were list learning and paragraph recall. For the second approach, we considered whether the same stimulus was being assessed several times (“methods” models). For example, for the ADAS-Cog, there were three word list learning trials and a recall trial of the same list of words, while the recognition task was of a different list of words but had both true and false positives. We thus modeled a secondary methods factor for the first four indicators, which would capture the facility people had with those specific words beyond their overall memory ability, and a secondary residual correlation between the true and false positives for the recognition task, which captures additional covariation between those indicators beyond their relationship with overall memory.

We compared these candidate secondary domain structures on the basis of published desirable thresholds for the fit statistics (Reeve et al. 2007). We specifically focused on the comparative fit index (CFI), where values >0.95 are consistent with excellent fit; on the Tucker-Lewis Index (TLI), where values >0.95 are consistent with excellent fit; and on the root mean squared error of approximation (RMSEA), where values <0.08 are consistent with adequate fit and values <0.05 are consistent with excellent fit. Based on these analyses, the bi-factor model with methods effects was far superior to the content bi-factor model, so we only considered the methods effects model in subsequent analyses.
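The thresholds quoted above can be written down as a small helper for labeling a model's fit indices. The function name, the label strings, and the example values are hypothetical; only the cut-offs come from the text.

```python
def describe_fit(cfi, tli, rmsea):
    """Label each fit index using the thresholds quoted in the text."""
    if rmsea < 0.05:
        rmsea_label = "excellent"
    elif rmsea < 0.08:
        rmsea_label = "adequate"
    else:
        rmsea_label = "below threshold"
    return {
        "CFI": "excellent" if cfi > 0.95 else "below threshold",
        "TLI": "excellent" if tli > 0.95 else "below threshold",
        "RMSEA": rmsea_label,
    }


# Made-up index values, used only to show the labeling.
example_fit = describe_fit(cfi=0.96, tli=0.97, rmsea=0.06)
```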

Finally we compared the single factor and the methods bi-factor models. We noted the fit indices for these two models, though fit statistics were not deciding criteria. Much more important for our purposes was the correlation between memory factor scores from the two models, and the scatter plot showing the relationship between these scores. We also compared the loadings for each indicator on the overall memory factor, with and without the secondary domain structure.

Mplus code for all of these analyses is available on request from the first author.

Psychometric analyses of longitudinal data

The task of modeling the longitudinal memory data was complicated by the multiple forms of the ADAS-Cog word lists and the RAVLT word list. Furthermore, Logical Memory I was only assessed at annual visits. The only indicators consistently present across visits were the three word recall items from the MMSE. Technically these three dichotomous indicators (i.e., correct / incorrect) could be used to anchor the scales across time points (Reise et al. 1993), but we were concerned that this anchoring would be too sparse for firm conclusions to be drawn. Because of the multiple versions of the RAVLT and the ADAS-Cog administered at different ADNI study visits, we needed to use longitudinal data to establish our final composite scores, since we could not assume that the different versions were of the same difficulty.

Based on results from initial cross-sectional modeling described above, we limited ourselves to single factor models. We divided the data set into two parts: first, the annual visits (baseline, month 12, and month 24), and second, the other visits (month 6, 18, and 36). Logical Memory I and II were assessed at each of the visits in the first half of the data set, so those much richer indicators were used as anchors alongside the three dichotomous MMSE indicators. Furthermore, at each of those visits, only the first version of the RAVLT was assessed, so it could also act as an anchor. The only thing that varied at those visits was thus the three different versions of the ADAS-Cog. We fit a longitudinal model using all available data for the annual visits of the first half of the data set. We identified the scale by specifying the variance of the general factor to be 1 at the baseline visit, when its mean was 0. We allowed the mean and the variance of the general factor to vary at other time points, and the general factors were freely correlated with each other. We freely estimated the loadings on the general factor, but constrained those loadings from the same indicators to be the same across time points. For example, for the first MMSE item, we freely estimated the loading on the overall memory factor at each time point, but constrained that loading to be the same at baseline, month 12, and month 24.
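The identification constraints described above can be summarized in a short sketch of the measurement model. The notation is illustrative and not taken from the source; in particular, the ordered-category threshold formulation and the treatment of thresholds as equal across visits are assumptions, since the text states the equality constraint only for loadings.

```latex
% Illustrative notation: i = participant, j = indicator, t = annual visit (0, 12, 24 months).
\begin{align*}
y^{*}_{ijt} &= \lambda_{j}\,\eta_{it} + \varepsilon_{ijt}
  && \text{loading } \lambda_{j} \text{ constrained equal across visits} \\
y_{ijt} &= c \quad \text{if } \tau_{j,c} < y^{*}_{ijt} \le \tau_{j,c+1}
  && \text{ordered categories defined by thresholds (assumed form)} \\
\eta_{i0} &\sim N(0,\,1)
  && \text{baseline mean and variance fixed to identify the scale} \\
\eta_{it} &\sim N(\mu_{t},\,\sigma^{2}_{t}), \quad t \in \{12, 24\}
  && \text{mean and variance freely estimated} \\
\operatorname{Cov}(\eta_{it},\,\eta_{it'}) &\ \text{unconstrained}
  && \text{general factors freely correlated}
\end{align*}
```

Under this sketch, each ADAS-Cog version contributes its own set of indicators with their own parameters, while the MMSE items, Logical Memory, and the first RAVLT version supply the common anchors.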

We captured point estimates for the loadings and thresholds for the three MMSE items, Logical Memory I and II, and the three versions of the ADAS-Cog from the first half of the data set. We then turned our attention to the second half of the data set, which included data from study visits at months 6, 18, and 36. The second version of the RAVLT word list was used at each of these study visits. We used the MMSE items, the ADAS-Cog version 2 (month 6), version 1 (month 18), and version 3 (month 36), and Logical Memory (month 36) as anchors to estimate item parameters for the second version of the RAVLT. The longitudinal modeling strategy was similar to that described for the first half of the data. Because we fixed item loadings and thresholds for the anchor items, the scale was still anchored to the mean of 0 and variance of 1 at the baseline visit, so we freely estimated the means and variances at each of the study visits included in this second half of the data. Script files for these analyses are available on request.

We extracted factor scores for each participant at each study visit (named ADNI-Mem in the ADNI data set). We compared item parameters (factor loadings and category thresholds) across the three different versions of the ADAS-Cog and the two different versions of the RAVLT.

Mplus code for all of these analyses is available on request from the first author.

Comparisons of scores

We performed several analyses to compare our memory composite to other scores.

Rates of change

We examined the sensitivity of each measure to change over time in each of the three diagnostic groups using z-statistics based on the coefficients and standard errors for time from mixed models for the cognitive outcomes using random intercepts and slopes and an unstructured covariance matrix, controlling for age, education, sex and presence of one or more APOE ε4 alleles. We used the coefficients for year and the adjusted residual standard deviation from these models to determine sample sizes needed per group to detect a 25 % reduction in the rate of decline in 12 months for a two-arm trial, with 80 % power and alpha = 0.05, assuming a two-sided test.
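A sample size calculation of this kind can be sketched with the standard two-sample formula for a difference in means. This is an assumption about the general approach, not the authors' exact computation, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(annual_change, residual_sd, reduction=0.25, power=0.80, alpha=0.05):
    """Participants needed per arm to detect a proportional slowing of decline.

    Uses n = 2 * (sd * (z_{1-alpha/2} + z_{power}) / delta)^2 with
    delta = reduction * |annual_change|, two-sided test. This textbook
    formula is assumed here; the source does not spell out its formula.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = reduction * abs(annual_change)
    return math.ceil(2 * (residual_sd * (z_alpha + z_beta) / delta) ** 2)


# Made-up inputs: a decline of 0.2 units per year with residual SD 0.4.
n_example = n_per_arm(annual_change=-0.2, residual_sd=0.4)
```

Because the required sample size scales with the squared ratio of residual standard deviation to annual change, a measure with a larger standardized rate of change needs fewer participants per arm.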

Time to conversion for people with MCI

We compared the strength of association between cognition and risk of developing dementia, using accelerated failure time models of time to AD, with a Weibull distribution, controlling for age, education, sex, and presence of one or more APOE ε4 alleles. We performed two sets of analyses. First, we evaluated baseline cognitive scores. Second, we performed a lagged analysis to compare the strength of association between cognitive variability at each visit and risk of developing dementia at the subsequent study visit.
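In an accelerated failure time model, a coefficient on the log-time scale is converted to a time ratio by exponentiation. The sketch below illustrates that conversion with a normal-approximation confidence interval; the function name and the coefficient and standard error are hypothetical.

```python
import math


def time_ratio(coef, se, z=1.96):
    """Return (TR, lower, upper) from an AFT coefficient on the log-time scale."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Made-up coefficient: each one-unit-higher score multiplies the expected
# time to conversion by 1.5.
tr, tr_lower, tr_upper = time_ratio(coef=math.log(1.5), se=0.1)
```

A time ratio above 1 for a score where higher is better means that better baseline performance is associated with a longer time to conversion.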

Strength of association with MRI parameters

We determined the strength of association between cognitive scores and selected MRI values from baseline in each of the diagnostic groups using linear regression models predicting the cognitive outcome, adjusting for total intracranial volume, age, education, sex, and presence of one or more APOE ε4 alleles.

Ability to differentiate trajectories of participants with CSF AD signatures among people with normal cognition and with MCI

We used mixed effects models to determine the ability of each cognitive measure to differentiate the cognitive trajectories of participants with an AD profile of CSF biomarkers compared to people without that profile. Our rationale for limiting these analyses to participants with normal cognition and with MCI was that people ultimately destined to develop AD should have greater rates of decline in cognition in general and memory in particular than people not destined to develop AD, but that the AD CSF profile might not have a relationship with subsequent trajectories of cognition among people with established AD (Jack et al. 2010b). Analyses were conducted within each diagnostic group with random intercepts and slopes and an unstructured covariance matrix, controlling for age, education, sex, and presence of one or more APOE ε4 alleles.


Characteristics of participants

Of the 819 ADNI participants eligible at baseline, 803 had complete data for our cognitive outcomes at one or more study visits. Of these, 225 had normal cognitive functioning, 394 had mild cognitive impairment (MCI), and 184 had AD. Demographic, clinical, CSF, and imaging data for these individuals are shown in Table 1.

Table 1
Demographic, clinical, CSF and MRI data by baseline diagnosis (n=803 with complete cognitive data)

Cross-sectional analyses of memory indicators

We compared candidate bi-factor models as described in the Methods section. Our best-fitting candidate model had secondary domains for methods effects, and split the RAVLT into a learning factor (including the interference list) and a recall factor. The path diagram for the selected bi-factor model is shown in Fig. 1. Loadings for the bi-factor model are shown in Table 2. The first column of data shows standardized loadings for the overall “Memory” factor. The second column of data shows loadings for the relevant subdomain. We shaded the rows to highlight membership of particular memory indicators in particular subdomains. Two pairs of items had residual correlations rather than underlying factors; we show the residual correlation in one row of the table and place one or two asterisks in the corresponding row of the partner indicator. All of the standardized factor loadings on the overall “Memory” factor were well over 0.30, McDonald’s threshold for salience (McDonald 1999), suggesting that all of the items—including the three dichotomous MMSE words—are salient indicators of overall memory. For each indicator, loadings on the overall “Memory” factor were higher than the loading on the method subdomain factor. Several of the loadings on the method subdomain factors were below the 0.30 threshold for salience. There was a negative correlation between the true and false positive indicators for the ADAS-Cog recognition task. The factor loadings for these two items indicate that both true hits and true misses are salient indicators of overall memory, and that they have a negative residual correlation, meaning that beyond their overall relationship with memory they have a negative relationship with each other. We suspect this reflects the effects of strategies for guessing. 
If a respondent is not sure whether a candidate word was truly presented and guesses, and has a strategy of guessing “present,” then the number of true hits will be higher and the number of true misses will be lower; conversely, if a respondent has a strategy of guessing “absent,” then the number of true hits will be lower and the number of true misses will be higher. Taken together, these strategies for guessing result in a negative residual correlation: the parts of these scores not reflecting overall memory are negatively related to each other.
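The guessing argument can be made concrete with a simple expected-value sketch. It assumes, purely for illustration, that a respondent truly knows the status of a fixed proportion of words and guesses “present” with a fixed probability on the rest; the function and parameter names are hypothetical.

```python
def expected_recognition_scores(p_known, p_guess_present, n_targets=12, n_distractors=12):
    """Expected (true hits, true misses) under a simple guessing model.

    p_known: proportion of words whose status the respondent truly remembers.
    p_guess_present: probability of answering "present" when unsure.
    """
    p_unsure = 1 - p_known
    hits = n_targets * (p_known + p_unsure * p_guess_present)
    true_misses = n_distractors * (p_known + p_unsure * (1 - p_guess_present))
    return hits, true_misses


# Same memory ability (0.6), three different guessing strategies.
by_bias = {bias: expected_recognition_scores(0.6, bias) for bias in (0.2, 0.5, 0.8)}
```

Holding memory ability constant, moving the guessing strategy toward “present” raises expected true hits and lowers expected true misses by the same amount, which is the pattern a negative residual correlation would reflect.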

Fig. 1
Bi-factor model path diagram for baseline data. RAVLT=Rey Auditory Verbal Learning Test. ADAS=Alzheimer’s Disease Assessment Schedule. MMSE=Mini-Mental State Examination. Covariation across all the indicators is modeled with loadings on the primary ...
Table 2
Factor loadings for the primary and secondary factors for the bi-factor model from baseline

We compared the bi-factor model described above to the single factor model that assumed no residual relationships. The bi-factor model fit the data better than the single factor model. For the bi-factor model, the CFI was 0.97, the TLI was 0.99, and the RMSEA was 0.086. For the single factor model, the CFI was 0.89, the TLI was 0.97, and the RMSEA was 0.179.

Category thresholds are determined from the proportions of people responding in each category, and threshold values for all indicators are identical for the single and bi-factor models; the only difference between the models was to be found in the factor loadings. We show a comparison of the factor loadings in Table 3. As expected, most loadings on the general factor were somewhat attenuated in the bi-factor model compared to the single factor model, since some of the covariation assumed to be related to the general factor in the single factor model was modeled in secondary domains and residual correlations in the bi-factor model. The largest absolute difference was for Trial 1 of the RAVLT, which had loadings of 0.62 in the single factor model and 0.55 in the bi-factor model, a difference of 0.07, or 11 % of the single factor loading. None of the other indicators had differences as large as 10 %. As expected, when ignoring the negative residual correlation between the recognition tasks for the ADAS-Cog, the loadings on the primary factor were somewhat smaller. Differences in loadings for those two indicators were small between the single factor and the bi-factor model, and loadings on the overall factor were still over the 0.30 threshold for salience.
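The attenuation reported for Trial 1 of the RAVLT can be checked with a short calculation, expressing the difference as a proportion of the single factor loading. The function name is hypothetical; the two loadings are those quoted above.

```python
def relative_attenuation(single_factor_loading, bifactor_loading):
    """Difference in loadings as a proportion of the single factor loading."""
    return (single_factor_loading - bifactor_loading) / single_factor_loading


ravlt_trial1 = relative_attenuation(0.62, 0.55)
print(f"{ravlt_trial1:.0%}")  # 11%
```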

Table 3
Factor loadings on the general (overall memory) factor for the single factor and bi-factor models

The overall correlation between single-factor and bi-factor scores for memory at the baseline exam was 0.99. The correlation for participants with AD was 0.98; for participants with MCI it was 0.99; for participants with normal cognition it was 0.98. A scatter plot did not suggest any systematic differences from the diagonal (Figure S1).

These results suggested that a single-factor model was appropriate for our purposes, as there was negligible difference between single-factor and bi-factor scores.

Version effects for the RAVLT and the ADAS-Cog

The loadings for each of the indicators from the two versions of the RAVLT were very similar (Table S2); as a proportion, they ranged from 5 % smaller to 3 % larger between the two versions. The difficulty levels for the category thresholds, however, displayed important differences between the two versions, as shown in Fig. 2. The values for the thresholds between item categories are plotted on the Y axis. Version 1 thresholds are shown in blue circles, while version 2 thresholds are shown in green diamonds. For all of the trials with the exception of List B (the distractor list), the Version 2 list is more difficult (has higher thresholds) than the Version 1 list. As expected, recall is more difficult than recognition (see two right-most sets of thresholds). These differences in difficulty thresholds mean RAVLT total scores for any person with high memory ability levels would be expected to differ by 5 or 6 points entirely as a function of which version of the test was used. For people with lower memory ability levels, expected differences in RAVLT total scores are smaller, but the expected difference would still be 2 or 3 points entirely as a function of which version of the test was used.

Fig. 2
Difficulty levels for the elements of the two versions of the Rey Auditory Verbal Learning Test. The five learning trials are indicated by the numbers 1 through 5; the interference trial by the letter B, the first recall trial by the number 6; delayed ...

The ADAS-Cog versions were more similar to each other, at least in terms of category thresholds (see Fig. 3). Version 1 had a greater spread of thresholds than Version 2 and to a lesser extent than Version 3, which means that it should be somewhat better able to differentiate among people at the extremes of memory ability with fewer ceiling or floor scores. The loadings for the learning trials and recall of the three versions of the ADAS-Cog list learning task were very similar to each other, with differences ranging from 4 percent lower to 2 percent higher (Table S3). The recognition present and recognition absent tasks had somewhat dissimilar loadings. In no case were these strong indicators of overall memory (standardized loadings ranged from 0.43 to 0.56, roughly half the magnitude of loadings for the list learning indicators). The largest overall difference in loading between versions was 0.13 for recognition correct between Version A and Version C, which in terms of percentage was a 30 % difference in loadings.

Fig. 3
Difficulty levels for the elements of the three versions of the Alzheimer’s Disease Assessment Scale – Cognitive Subscale. Recog=Recognition. Version A threshold difficulty levels are depicted with blue circles, Version B with green diamonds, ...

Comparison of the ADNI-Mem to other measures

Table 4 shows the standardized coefficients for change over time for our ADNI-Mem composite score and for the comparison measures. The table highlights the two tests of memory (ADNI-Mem and the RAVLT) in the top section, and proceeds to address tests of global cognition (several scores derived from the ADAS-Cog and the MMSE) and a global clinical measure (the CDR sum of boxes). There is not much change that occurs over the course of two years for ADNI participants with normal cognition. This is reflected in the small standardized coefficients for all of the measures. Indeed, on average, ADNI-Mem and two of the global ADAS-Cog scores indicate very modest improvement in cognition over two years (positive coefficients). Among people with MCI, ADNI-Mem performed somewhat better than the RAVLT sum score, and nearly as well as the global ADAS-Cog scores or the clinical CDR sum of boxes. Among people with AD, all of the scores are able to detect robust changes over time, and ADNI-Mem performed somewhat better than the RAVLT total score.

Table 4
Coefficients for time, in mixed models for cognition controlling for age, education, gender and presence of one or more APOE ε4 alleles. Bold font indicates p<0.05. Sample size needed per group to detect a 25 % decrease over 12 months, ...

Table 5 shows results for the ability of the scores to predict conversion to dementia. Results appeared similar for all of the scores, though time ratios (the equivalent of hazard ratios had we used Cox models) for ADNI-Mem were either the best or second best among all of the measures assessed.
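For readers more used to hazard ratios, the following minimal sketch shows how a time ratio is read. In an accelerated failure time model, the time ratio is the exponentiated coefficient, so a value above 1 means that a higher (better) score is associated with a longer time to conversion. The coefficient and shape values below are hypothetical, and the conversion to a hazard ratio assumes a Weibull distribution, which is an assumption of this illustration rather than a detail taken from the analyses above.

```python
import math


def time_ratio(aft_coefficient):
    """Time ratio per one-unit higher score in an accelerated failure time model."""
    return math.exp(aft_coefficient)


def weibull_hazard_ratio(aft_coefficient, shape):
    """Equivalent hazard ratio, valid only if the AFT model is Weibull (assumed here)."""
    return math.exp(-shape * aft_coefficient)


beta = 0.40   # hypothetical coefficient for a one-unit higher memory score
shape = 1.5   # hypothetical Weibull shape parameter

tr = time_ratio(beta)
hr = weibull_hazard_ratio(beta, shape)

print(f"time ratio   = {tr:.2f}")   # 1.49: expected time to conversion is about 49 % longer
print(f"hazard ratio = {hr:.2f}")   # 0.55: the corresponding reduction in hazard
```

Under these assumed values, a time ratio above 1 and a hazard ratio below 1 describe the same protective association, which is why the two quantities move in opposite directions.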

Table 5
Time ratios (TR), with 95 % confidence intervals (CI), for predicting conversion to dementia, controlling for age, education, gender and presence of one or more APOE ε4 alleles. ADAS and CDR-SB scores reversed so that higher scores represent better ...

Table 6 shows results for the cross-sectional association of each score with four neuroimaging parameters from MRI. Findings among people with normal cognition are difficult to interpret, as there was a statistically significant inverse relationship between fusiform thickness and our ADNI-Mem composite score. This inverse relationship was also present for the RAVLT total score. For people with MCI, there were strong associations in the expected direction between ADNI-Mem and all four neuroimaging markers, suggesting that poorer memory performance was associated with smaller hippocampal volumes and with thinner cortex in the parahippocampal, fusiform, and entorhinal regions. Further, in each case the association was somewhat stronger than that for the total RAVLT score, and comparable to that of the various versions of the ADAS-Cog. Among people with AD, there was again a strong association between ADNI-Mem and each of the imaging parameters, and this association was somewhat stronger in each case than that for the RAVLT total score.

Table 6
Coefficients for MRI thickness measures from regression models for the cognitive measure controlling for age, education, gender, presence of one or more APOE ε4 alleles, and intracranial volume. Bolded coefficients indicate p-values < ...

Table 7 shows results for the differences in intercept and rates of decline associated with having an AD CSF signature for people with normal cognition and MCI. Among people with normal cognition, there was little difference in trajectories associated with having the AD CSF signature, though there were differences in trajectories for the modified ADAS-Cog and the CDR sum of boxes in the hypothesized direction (i.e., people with the AD CSF signature had faster rates of decline). Among people with MCI, all of the measures considered suggested faster rates of decline among people with the AD CSF signature. This difference was largest for ADNI-Mem.

Table 7
Z-scores for the slope and intercept of CSF-based AD signature group from mixed models for change in the cognitive outcomes, controlling for age, education, sex and presence of one or more APOE ε4 alleles. Bolded coefficients indicate p-values ...


In this paper we present the methods we used to derive a memory composite from the neuropsychological battery administered in ADNI. We found a single factor model to be quite acceptable for the memory indicators from this battery. Our composite addresses an under-appreciated challenge in these data, which is that the study administered three different versions of the ADAS-Cog word lists and two different versions of the RAVLT word lists. We found that the ADAS-Cog item thresholds were similar across versions, though the relative importance of the recognition tasks varied somewhat. For the RAVLT, on the other hand, we found an important difference in difficulty levels, as the second version of the RAVLT was systematically more difficult than the first version. Failing to account for these differences in difficulty levels could produce misleading results if standard sum scores are used. Our memory composite performed well in comparison to other cognitive measures. It was able to detect change over time well among people with MCI and AD. It was a strong predictor of conversion from MCI to AD. It was strongly associated with a priori specified neuroimaging parameters selected on the basis of their known association with memory performance. It was able to detect differences in changes over time for people with MCI who had CSF biomarkers suggesting an AD signature.

These results suggest that the two RAVLT word lists used in ADNI are not equivalent to each other (list 2 is systematically harder than list 1). If standard total scores are used, this may result in artifactual saw-tooth patterns in plots of performance over time, since people with no change in actual memory performance would be expected to have higher scores / lower scores / higher scores / lower scores at alternating visits. Because of the design of the study, participants with AD did not have an 18-month study visit, so their four observations would have the pattern higher scores / lower scores / higher scores / higher scores. The scoring approach adopted for ADNI-Mem accounts for the different difficulty levels of the two versions of the RAVLT. We did not account for different versions of the RAVLT when using changes in the RAVLT in analyses; we are not familiar with traditional methods for doing so, and to our knowledge different version effects have not been considered in publications that have analyzed ADNI RAVLT data.
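The saw-tooth artifact can be illustrated with a small sketch. It assumes that the two word lists alternate across scheduled visits at 0, 6, 12, 18, and 24 months, starting with list 1, and that the harder list costs a fixed number of words. The penalty and the true scores are hypothetical values chosen for illustration, not estimates from our models.

```python
# Illustrative only: the visit schedule, difficulty penalty, and true scores
# are assumptions, not values estimated in this paper.
SCHEDULED_MONTHS = [0, 6, 12, 18, 24]   # assumed visit schedule
LIST_2_PENALTY = 3.0                    # hypothetical: words lost on the harder list


def list_version(month):
    """Alternate word lists across scheduled visits: list 1, list 2, list 1, ..."""
    return 1 if SCHEDULED_MONTHS.index(month) % 2 == 0 else 2


def observed_sum_scores(true_score, attended_months):
    """Sum score at each attended visit for a person whose memory never changes."""
    return [
        true_score - (LIST_2_PENALTY if list_version(m) == 2 else 0.0)
        for m in attended_months
    ]


def pattern(scores):
    """Label each visit 'higher' or 'lower' relative to the midpoint of the scores."""
    midpoint = (max(scores) + min(scores)) / 2
    return ["higher" if s > midpoint else "lower" for s in scores]


# Normal cognition or MCI: all five scheduled visits.
full = observed_sum_scores(40.0, [0, 6, 12, 18, 24])
# AD: no 18-month visit, so four observations.
ad = observed_sum_scores(30.0, [0, 6, 12, 24])

print(pattern(full))  # ['higher', 'lower', 'higher', 'lower', 'higher']
print(pattern(ad))    # ['higher', 'lower', 'higher', 'higher']
```

Even though the simulated person's memory is constant, the sum scores oscillate purely because of which list was administered, and skipping the 18-month visit produces the higher / lower / higher / higher pattern described above.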

The three versions of the ADAS-Cog were much more similar to each other than were the two versions of the RAVLT. Nevertheless, the relative importance of the recognition tasks differed across the ADAS-Cog versions. These differences merit attention, especially if the scoring to be applied to these versions does not account for them.

The ADNI-Mem composite score has several desirable features. It appears to have good validity, as it performed as well as or better than the RAVLT in each of the analyses performed. Unlike the standard sum scores used for the RAVLT, however, ADNI-Mem accounts for the different versions of the RAVLT and the ADAS-Cog. ADNI-Mem also includes additional information from logical memory and from the MMSE, incorporating all of the memory-related information available from the neuropsychological battery administered in ADNI. Basing inferences on a multiple indicator composite rather than single measures conserves statistical power by reducing the number of potential comparisons, and may reduce measurement error. ADNI-Mem uses a sophisticated modern psychometric approach that is based entirely on inter-relationships among items rather than external criteria such as those used in the recursive partitioning approach that generated the ADAS-Tree scores. The modern psychometric approach used to generate the ADNI-Mem scores has linear scaling properties that are appropriate for tracking changes over time (Crane et al. 2008; Mungas and Reed 2000).

The rationale for using the ADNI-Mem score in analyses of ADNI data is thus multifaceted. From a theory perspective, it has many desirable properties: it incorporates all memory indicators, thus maximizing measurement precision of the memory level underlying responses to memory items; it has linear scaling properties that are especially important in longitudinal analyses; and it accounts for version effects in the RAVLT and ADAS-Cog. From a data-driven perspective, it also has desirable properties: it appears to be at least as valid as its constituent parts, and did well in predicting which people would progress from MCI to AD and in detecting changes over time. We have submitted our ADNI-Mem scores to the ADNI database and recommend their use by any researcher using the ADNI data set who has substantive questions about memory. Specifically, the ADNI-Mem scores may be particularly useful for imaging researchers who wish to compare image processing and analysis techniques in terms of the strength of associations between imaging and memory.

Limitations should be considered in interpreting our results. We were limited by the battery of tests administered by ADNI. We suspect, but cannot confirm, that similar findings would have been obtained had other tests been used. Although the ADNI battery is fairly rich in its assessment of memory, the advantages of a composite score approach would presumably be even more apparent if more tests were available. We did not compare the ADNI neuropsychological battery to any other battery of tests, and cannot comment on whether it may be superior to other batteries used clinically or in other research studies. The ADNI data set includes rich neuroimaging results available from study participants, making it an ideal setting for our analyses comparing various scores to imaging findings. We selected four specific measures a priori. Had we selected different measures, we might have obtained different results. Similarly, there are a variety of ways of estimating hippocampal volume, and we relied on one particular technique. Only a subset of the ADNI sample had CSF measures. Our findings would have been more robust had our sample sizes for the CSF analyses been larger.

In conclusion, this paper outlines the methods for developing the ADNI-Mem composite measure of memory for the ADNI study, and compares it to several other cognitive tests. We also found that the two versions of the RAVLT are of very different difficulty levels, a fact that is accounted for in the composite ADNI-Mem scores. The ADNI-Mem scores should be used when a single indicator of memory performance is desired. We have supplied these scores so that they are available in the ADNI data set.




Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Amorfix Life Sciences Ltd.; AstraZeneca; Bayer HealthCare; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory of Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation. Data management and the specific analyses reported here were supported by NIH grants R01 AG029672 (Paul Crane, PI), P50 AG05136 (Murray Raskind, PI), and R13 AG030995 (Dan Mungas, PI).


Electronic supplementary material: The online version of this article (doi:10.1007/s11682-012-9186-z) contains supplementary material, which is available to authorized users.

Contributor Information

Paul K. Crane, University of Washington, Box 359780, Harborview Medical Center, 325 Ninth Avenue, Seattle, WA 98104, USA.

Adam Carle, University of Cincinnati School of Medicine, Cincinnati Children’s Hospital Medical Center, and University of Cincinnati College of Arts and Sciences, 3333 Burnet Avenue, MLC 7014, Cincinnati, OH 45229, USA.

Laura E. Gibbons, University of Washington, Box 359780, Harborview Medical Center, 325 Ninth Avenue, Seattle, WA 98104, USA.

Philip Insel, Center for Imaging of Neurodegenerative Diseases (CIND), San Francisco VA Medical Center, 4150 Clement Street, San Francisco, CA 94121, USA.

R. Scott Mackin, Center for Imaging of Neurodegenerative Diseases (CIND), San Francisco VA Medical Center, 4150 Clement Street, San Francisco, CA 94121, USA.

Alden Gross, Department of Psychiatry, Institute for Aging Research, Hebrew Senior Life, 1200 Center Street, Boston, MA 02131, USA.

Richard N. Jones, Department of Psychiatry, Institute for Aging Research, Hebrew Senior Life, 1200 Center Street, Boston, MA 02131, USA.

Shubhabrata Mukherjee, University of Washington, Box 359780, Harborview Medical Center, 325 Ninth Avenue, Seattle, WA 98104, USA.

S. McKay Curtis, University of Washington, Box 359780, Harborview Medical Center, 325 Ninth Avenue, Seattle, WA 98104, USA.

Danielle Harvey, Division of Biostatistics, Department of Public Health Sciences, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA.

Michael Weiner, Center for Imaging of Neurodegenerative Diseases (CIND), San Francisco VA Medical Center, 4150 Clement Street, San Francisco, CA 94121, USA.

Dan Mungas, Department of Neurology, University of California at Davis, 4860 Y St., Suite 0100, Sacramento, CA 95817, USA.


  • Crane PK, Narasimhalu K, Gibbons LE, Mungas DM, Haneuse S, Larson EB, et al. Item response theory facilitated cocalibrating cognitive tests and reduced bias in estimated rates of decline. Journal of Clinical Epidemiology. 2008;61(10):1018–1027. e1019.
  • De Meyer G, Shapiro F, Vanderstichele H, Vanmechelen E, Engelborghs S, De Deyn PP, et al. Diagnosis-independent Alzheimer disease biomarker signature in cognitively normal elderly people. Archives of Neurology. 2010;67(8):949–956. doi: 10.1001/archneurol.2010.179.
  • Fjell AM, Walhovd KB, Amlien I, Bjornerud A, Reinvang I, Gjerstad L, et al. Morphometric changes in the episodic memory network and tau pathologic features correlate with memory performance in patients with mild cognitive impairment. AJNR American Journal of Neuroradiology. 2008;29(6):1183–1189. doi: 10.3174/ajnr.A1059.
  • Folstein MF, Folstein SE, McHugh PR. Mini-mental state. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research. 1975;12(3):189–198.
  • Jack CR, Jr, Bernstein MA, Borowski BJ, Gunter JL, Fox NC, Thompson PM, et al. Update on the magnetic resonance imaging core of the Alzheimer’s disease neuroimaging initiative. Alzheimers Dement. 2010a;6(3):212–220. doi: 10.1016/j.jalz.2010.03.004.
  • Jack CR, Jr, Knopman DS, Jagust WJ, Shaw LM, Aisen PS, Weiner MW, et al. Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade. Lancet Neurology. 2010b;9(1):119–128.
  • Jack CR, Jr, Bernstein MA, Fox NC, Thompson P, Alexander G, Harvey D, et al. The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods. J Magn Reson Imaging. 2008;27(4):685–691. doi: 10.1002/jmri.21049.
  • Llano DA, Laforet G, Devanarayan V. Derivation of a new ADAS-cog composite using tree-based multivariate analysis: prediction of conversion from mild cognitive impairment to Alzheimer disease. Alzheimer Disease and Associated Disorders. 2011;25(1):73–84.
  • McDonald RP. Test theory: a unified treatment. Mahwah: Erlbaum; 1999.
  • McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology. 1984;34(7):939–944.
  • Millsap RE. Statistical approaches to measurement invariance. Routledge; 2011.
  • Mohs RC, Knopman D, Petersen RC, Ferris SH, Ernesto C, Grundman M, et al. Development of cognitive instruments for use in clinical trials of antidementia drugs: additions to the Alzheimer’s Disease Assessment Scale that broaden its scope. The Alzheimer’s Disease Cooperative Study. Alzheimer Disease and Associated Disorders. 1997;11(Suppl 2):S13–21.
  • Morris JC. The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology. 1993;43(11):2412–2414.
  • Mungas D, Reed BR. Application of item response theory for development of a global functioning measure of dementia with linear measurement properties. Statistics in Medicine. 2000;19(11–12):1631–1644.
  • Murphy EA, Holland D, Donohue M, McEvoy LK, Hagler DJ, Jr, Dale AM, et al. Six-month atrophy in MTL structures is associated with subsequent memory decline in elderly controls. NeuroImage. 2010;53(4):1310–1317. doi: 10.1016/j.neuroimage.2010.07.016.
  • Muthén L, Muthén B. Mplus users guide. Version 4.1 ed. Los Angeles: Muthen and Muthen; 2006.
  • Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45(5 Suppl 1):S22–31.
  • Reise SP, Morizot J, Hays RD. The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation. 2007;16(Suppl 1):19–31.
  • Reise SP, Widaman KF, Pugh RH. Confirmatory factor analysis and item response theory: two approaches for exploring measurement invariance. Psychological Bulletin. 1993;114(3):552–66.
  • Rey A. L’examen clinique en psychologie. Paris: Presses Universitaires de France; 1964.
  • Van Petten C, Plante E, Davidson PS, Kuo TY, Bajuscak L, Glisky EL. Memory and executive function in older adults: relationships with temporal and prefrontal gray matter volumes and white matter hyperintensities. Neuropsychologia. 2004;42(10):1313–1335. doi: 10.1016/j.neuropsychologia.2004.02.009.
  • Walhovd KB, Fjell AM, Amlien I, Grambaite R, Stenset V, Bjornerud A, et al. Multimodal imaging in mild cognitive impairment: metabolism, morphometry and diffusion of the temporal-parietal memory network. NeuroImage. 2009;45(1):215–223. doi: 10.1016/j.neuroimage.2008.10.053.
  • Wechsler D. WMS-R: Wechsler Memory Scale-Revised manual. NY: Psychological Corporation / HBJ; 1987.
  • Wouters H, van Gool WA, Schmand B, Lindeboom R. Revising the ADAS-cog for a more accurate assessment of cognitive impairment. Alzheimer Disease and Associated Disorders. 2008;22(3):236–244.
  • Yonelinas AP, Widaman K, Mungas D, Reed B, Weiner MW, Chui HC. Memory in the aging brain: doubly dissociating the contribution of the hippocampus and entorhinal cortex. Hippocampus. 2007;17(11):1134–1140. doi: 10.1002/hipo.20341.