Data from the Einstein Aging Study (EAS) were used to prospectively evaluate the free recall score from the Free and Cued Selective Reminding Test (FCSRT-FR) and Logical Memory I immediate recall (LM-IR) subtest of the Wechsler Memory Scale–Revised for prediction of incident Alzheimer disease (AD) dementia among individuals from a community-based cohort with memory complaints.
Analyses included 854 participants, age ≥70 years, who were free of dementia at baseline and reported memory complaints. Clinic evaluations were completed annually, and AD dementia was diagnosed using standard criteria (n = 86 cases; average follow-up 4.1 years). Time-dependent receiver operating characteristic analysis was used to evaluate the prognostic ability of FCSRT-FR and LM-IR for incident AD over various durations of follow-up.
For identifying those with memory complaints who will develop incident AD dementia over 2–4 years, the FCSRT-FR had better operating characteristics than LM-IR. APOE ε4 status, age, and education did not affect cut points; however, positive predictive values were higher among APOE ε4-positive individuals.
For follow-up intervals of 2–4 years, the FCSRT-FR is more predictive than the LM-IR for identifying individuals with memory complaints who will develop incident AD. APOE ε4 status improves positive predictive value, but does not affect the choice of optimal cuts.
Diagnostic guidelines for Alzheimer disease (AD) have been revised to reflect the view of the disease as a continuum.1,2 The long preclinical course of AD dementia suggests the potential for preventive measures aimed at the preclinical stages, and the search for biomarkers to identify asymptomatic individuals at high risk is underway. In the absence of definitive biomarkers to facilitate primary prevention, secondary prevention trials target persons without dementia with memory impairment and aim to delay or prevent onset of clinical AD dementia.1,2 A critical step in the design of these trials is the ability to discriminate between those with memory complaints attributable to underlying AD and those with impairment due to other causes.3−5 There are different approaches to conceptualizing the symptomatic predementia phase of AD. Definitions of mild cognitive impairment (MCI) rely on norming cognitive performance relative to an age- and education-matched sample. However, if the goal is to identify individuals with high probability of developing AD dementia, the ideal cognitive cut scores for cognitive screening could be determined prospectively based on ability to predict incident AD dementia. Demographic and genetic covariates might enhance the predictive ability of these longitudinally derived screening cut scores. Because intervention studies may involve various lengths of follow-up, it is important to determine whether the cut scores that optimize prediction depend upon length of follow-up.
The Einstein Aging Study (EAS) cohort provides the opportunity to evaluate the ability of 2 previously validated tests, the free recall component of the Free and Cued Selective Reminding Test (FCSRT-FR)6,7 and Logical Memory I immediate recall (LM-IR) of the Wechsler Memory Scale–Revised,8 to identify which community-based older individuals with cognitive complaints are likely to develop AD dementia over specified follow-up periods. We determine the sensitivity, specificity, and positive predictive values, the cut points that optimize prediction of incident AD dementia over 2–4 years of follow-up, and assess whether APOE ε4 status, a known genetic risk factor for AD dementia,9 age, or education level affect test operating characteristics.
The analysis is based on longitudinal data from the EAS. The cohort includes systematically recruited adults from a multiethnic, community-dwelling population in Bronx County, New York. Enrolled participants were at least 70 years of age, Bronx residents, noninstitutionalized, and English-speaking. Exclusions included visual or auditory impairments, active psychiatric symptomatology that interfered with assessments, and nonambulatory status.10
Written informed consent was obtained according to protocols approved by the institutional review board.
In-person evaluations were completed at baseline and annually. Assessments included demographic information, medical history, medications, and health behaviors. Participants underwent a standard neurologic examination adapted from the Unified Parkinson's Disease Rating Scale.11 The neurologist documented clinical impression of cognitive impairment using the Clinical Dementia Rating (CDR).12 Subjective cognitive complaints were ascertained using the Clinical History–Cognitive Impairment/Dementia Questionnaire of the Consortium to Establish A Registry for Alzheimer's Disease (CERAD)–self 13 and the Albert Einstein Health Self-Assessment Form (HSA), which inquires about current memory problems and changes in memory compared to 1 and 10 years ago. Global cognitive status was ascertained by the Blessed Information-Memory-Concentration test.14 Memory was measured using the picture version of the FCSRT-FR6 and the LM-IR test.8 Each test is scored such that a higher score indicates better performance (FCSRT-FR range = 0–48; LM-IR range = 0–50).
The FCSRT begins with a study phase in which participants are asked to name pictures of 16 common items. They are then presented with these pictures in a series of pages, each with 4 objects arranged in a grid. Participants orally identify each picture following a categorical prompt. After all 16 are correctly identified, immediate recall is tested by removing the visual stimulus. In the free recall condition, the participant recalls as many of the pictures as possible. If the participant fails to recall a picture, he or she is provided with a category cue to test cued recall. There are 3 free and 3 cued recall trials. A 20-second interference (counting backward) is administered between each.6,7 This analysis utilizes the sum of free recall across the 3 trials (score range 0–48). Free recall has been shown to be predictive of incident dementia, and has been correlated with hippocampal measures in the EAS sample.6,15 Free rather than total recall (sum of free and cued items recalled) was selected as total recall demonstrates ceiling effects in our community-based sample, and because FCSRT-FR has been shown to be independent of education and race.6
The LM-IR assesses the ability to learn and retrieve contextually related verbal information through immediate free recall of 2 short stories, each containing 25 elements of information.8
Dementia diagnosis was based on standardized criteria from the DSM-IV16 and required impairment in memory plus at least one additional cognitive domain, accompanied by functional decline. Diagnoses were assigned at consensus case conferences which included comprehensive review of all cognitive tests, neurologic signs and symptoms, and functional status by 2 neurologists and a neuropsychologist. Memory impairment was defined as scores in the impaired range on any memory test in the neuropsychological battery (a score of 24 or lower on FCSRT-FR6 or a score 1.5 SDs below the age-adjusted mean on LM-IR). Functional impairment was determined by scores on the Lawton Brody Scale,17 clinical evaluation, and informant questionnaires. AD dementia was diagnosed in dementia cases meeting established criteria for probable or possible AD.18 AD refers to AD alone or in combination with other dementia disorders. A subset of individuals who participated in the EAS have come to autopsy, providing an important quality control for diagnostic accuracy. Based on an autopsy sample of 175, a clinical diagnosis of dementia had a positive predictive value (PPV) of 96% for significant pathology upon autopsy. A clinical diagnosis of possible or probable AD had a PPV of 79% for the presence of NIA-Reagan19 intermediate or high likelihood Alzheimer-type pathology, while 71% of those with Braak stage ≥3 had been assigned a clinical diagnosis of AD.
Because both LM and FCSRT are part of the neuropsychological battery reviewed at consensus case conference, there is potential for diagnostic circularity to bias this assessment of the tests. We therefore repeated the analyses using CDR >1.0 to define dementia independent of these measures. Results using either outcome were not materially different, suggesting that circularity did not confound our results. Furthermore, diagnoses are based on the entire range of neuropsychological and neurologic assessments across multiple time points, whereas this analysis compares the predictive ability of baseline memory assessments. The analyses presented are for the outcome of AD diagnosis. Results of analyses using CDR diagnosis are presented for reference as supplemental tables on the Neurology® Web site at www.neurology.org.
DNA was extracted from whole blood, or was isolated from buffy coat that had been stored at −70°C using the Puregene DNA Purification System (Gentra System, Minneapolis, MN). Amplification and sequencing primers for genotyping of the target APOE single nucleotide polymorphisms, rs429358 (position 112) and rs7412 (position 158), were designed using PSQ version 1.0.6 software (Biotage, Uppsala, Sweden); in each case, the reverse primer was biotinylated. Genotyping was performed using a Pyrosequencing PSQ HS 96A system (http://www.pyrosequencing.com). Individuals were classified as APOE ε4 carriers if they had at least one ε4 allele.
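The carrier classification described above can be sketched in a few lines. This is a minimal illustration, not the study's genotyping pipeline: it assumes unphased two-letter genotypes reported on the forward strand (rs429358 T>C, rs7412 C>T), uses the standard haplotype definitions (ε2 = T/T, ε3 = T/C, ε4 = C/C at rs429358/rs7412), and ignores the rare ε1 haplotype. The function names `apoe_genotype` and `is_e4_carrier` are hypothetical.

```python
def apoe_genotype(rs429358: str, rs7412: str) -> tuple:
    """Infer the APOE genotype from unphased SNP genotypes such as "CT" and "CC".

    Assumes forward-strand alleles and no e1 haplotype, so each chromosome
    carries at most one of: C at rs429358 (e4) or T at rs7412 (e2).
    """
    g112, g158 = rs429358.upper(), rs7412.upper()
    for g in (g112, g158):
        if len(g) != 2 or set(g) - {"C", "T"}:
            raise ValueError(f"expected a two-letter C/T genotype, got {g!r}")
    n_e4 = g112.count("C")  # each C at position 112 marks one e4 allele
    n_e2 = g158.count("T")  # each T at position 158 marks one e2 allele
    if n_e4 + n_e2 > 2:
        raise ValueError("genotype implies the rare e1 haplotype, not handled here")
    alleles = ["e4"] * n_e4 + ["e2"] * n_e2 + ["e3"] * (2 - n_e4 - n_e2)
    return tuple(sorted(alleles))


def is_e4_carrier(rs429358: str, rs7412: str) -> bool:
    """Carrier status as defined in the text: at least one e4 allele."""
    return "e4" in apoe_genotype(rs429358, rs7412)
```

Under these assumptions, a participant heterozygous at both sites is read as ε2/ε4 and therefore counted as a carrier.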
Baseline characteristics were reported using descriptive statistics. The Wilcoxon rank sum test for continuous variables and the χ² test for categorical variables were used to compare those who developed AD during follow-up with those who did not. To evaluate the prognostic capacity of baseline FCSRT-FR and LM-IR for predicting incident AD over various durations of follow-up, we applied time-dependent receiver operating characteristic (ROC) analysis20 using the Nearest Neighbor Estimation21 of the joint distribution of the cognitive test and time to AD. The method can accommodate censored data. Areas under the ROC curves (AUC) were calculated. Bootstrap techniques were used for confidence interval (CI) estimation and hypothesis testing. We drew 1,000 bootstrap samples from the data with replacement. For each sample, we computed the AUC for predicting AD dementia at 2–4 years of follow-up for each test, as well as the difference in AUC between the 2 tests; the 95% CI for each AUC and for the difference was then obtained. ROC curves could not be fit for a follow-up interval of 1 year because too few cases developed in this interval. Youden's index, the sum of sensitivity and specificity minus one, was used to select the optimal cutoff value.22 The independent contributions of FCSRT-FR and LM-IR were assessed in Cox proportional hazards models adjusted for age, sex, education, and race. The risk score from each model was calculated as a weighted sum of the predictors in the model, with weights determined by the estimated coefficients of the Cox model. Time-dependent ROC analysis was then applied to the risk scores. Interactions of each cognitive test with age, education, and APOE ε4 status were tested in separate Cox models to determine whether these factors alter the relation of cognitive test to incident AD dementia.
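The cut-point selection and bootstrap comparison can be illustrated with a simplified sketch. It deliberately departs from the method described above: it treats the outcome at a fixed horizon as a known binary label and therefore ignores censoring, whereas the study used a time-dependent ROC estimator based on nearest neighbor estimation. All function names are hypothetical, and a screen is taken as positive when the score falls at or below the cut, since lower memory scores indicate worse performance.

```python
import random


def sens_spec(scores, events, cut):
    """Sensitivity and specificity when score <= cut counts as screen-positive."""
    tp = sum(1 for s, e in zip(scores, events) if e and s <= cut)
    fn = sum(1 for s, e in zip(scores, events) if e and s > cut)
    tn = sum(1 for s, e in zip(scores, events) if not e and s > cut)
    fp = sum(1 for s, e in zip(scores, events) if not e and s <= cut)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cut(scores, events):
    """Return (cut, J) maximizing Youden's J = sensitivity + specificity - 1."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        se, sp = sens_spec(scores, events, cut)
        j = se + sp - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def auc(scores, events):
    """P(case scores lower than non-case), ties counted as 1/2 (Mann-Whitney form)."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    wins = sum((c < k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_diff(test_a, test_b, events, n_boot=1000, seed=0):
    """Percentile 95% CI for AUC(test_a) - AUC(test_b), resampling participants."""
    rng = random.Random(seed)
    n = len(events)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ev = [events[i] for i in idx]
        if not any(ev) or all(ev):
            continue  # AUC is undefined without both cases and non-cases
        a = auc([test_a[i] for i in idx], ev)
        b = auc([test_b[i] for i in idx], ev)
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]
```

A confidence interval for the AUC difference that excludes zero corresponds to a significant difference in classification accuracy at the 0.05 level. Resampling whole participants keeps each person's two test scores paired.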
A total of 1,187 participants had at least one annual follow-up evaluation, and were free of dementia at baseline. Of these, 896 reported memory complaints on either the CERAD or the HSA, and 854 had complete data on FCSRT-FR and LM-IR and were included in the analysis. Analyses including APOE ε4 status are based on the subset of 564 individuals with available information on APOE ε4 status. The groups with and without genotyping were similar with respect to age, sex, ethnicity, medical comorbidities, baseline FCSRT-FR and LM-IR, and risk of incident AD dementia.
Those who developed incident AD dementia were slightly older at baseline, with less education and worse performance on baseline FCSRT-FR and LM-IR, but were not significantly different with respect to sex, race, medical index, and follow-up time (table 1).
Results of time-dependent ROC analyses are shown in table 2. These data allow for comparisons of specificity of each test for a fixed level of sensitivity, and vice versa. FCSRT-FR consistently shows higher specificity for a given sensitivity, and also demonstrates higher sensitivity for a given specificity. Based on Youden's index,22 the FCSRT-FR and LM-IR cuts that optimize prediction of incident AD dementia are similar over the range of 2–4 years of follow-up. For FCSRT-FR, the optimal cuts at 2, 3, and 4 years were 28, 29, and 27, respectively; for LM-IR, they were 16, 17, and 16.
Figure 1 shows the time-dependent ROC curves for FCSRT-FR and LM-IR for prediction of AD over 2, 3, or 4 years of follow-up, along with optimal cuts derived by Youden's index, and the corresponding AUC. Regardless of length of follow-up, the ROC curve for FCSRT-FR dominates that for LM-IR across the entire range of false-positive rates, demonstrating that for any fixed specificity, FCSRT-FR is a more sensitive marker for AD. AUCs at 2, 3, and 4 years of follow-up are 0.87, 0.88, and 0.89, respectively, for FCSRT-FR and 0.78, 0.77, and 0.75 for LM-IR. The bootstrap 95% CIs for the difference in AUC between FCSRT-FR and LM-IR at years 2, 3, and 4 are (0.025, 0.18), (0.036, 0.16), and (0.070, 0.21), respectively, indicating a statistically significant difference in classification accuracy at each point (p ≤ 0.05).
Table 3 presents the results of Cox models predicting incident AD dementia. When added to a model with FCSRT-FR, LM-IR was only moderately associated with incident AD (models 1, 2). In the subset of participants with APOE ε4 status, having an APOE ε4 allele was significantly associated with risk of AD dementia, but this did not attenuate the association of FCSRT-FR with the outcome (models 3–5). ROC curves for the risk scores derived from the Cox models demonstrate that the addition of LM-IR and APOE ε4 status to FCSRT-FR does not appreciably improve the diagnostic accuracy of incident AD over that based on FCSRT-FR (figure 2).
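The risk score referred to here is the linear predictor of a Cox model, that is, a weighted sum of the covariates with the estimated coefficients as weights. A minimal sketch follows; the coefficient values and covariate names are hypothetical placeholders chosen for illustration and are not the estimates reported in table 3.

```python
import math

# Hypothetical log hazard ratios, NOT the study's estimates.
EXAMPLE_COEFFICIENTS = {"fcsrt_fr": -0.15, "age": 0.05, "apoe_e4": 0.6}


def risk_score(covariates, coefficients):
    """Cox linear predictor: sum of coefficient * covariate value."""
    return sum(coefficients[name] * covariates[name] for name in coefficients)


def relative_hazard(score_a, score_b):
    """Hazard of person A relative to person B under proportional hazards."""
    return math.exp(score_a - score_b)
```

Because the score is a single number per participant, it can be passed to an ROC analysis in the same way as a raw test score, with higher values indicating higher risk.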
To test whether test performance differs by age, education, or APOE ε4 status, we tested additional Cox models that included interaction terms for these variables with either FCSRT-FR or LM-IR. None of the interactions tested were significant (p for interactions all >0.10). Furthermore, stratified ROC analyses showed similar operating characteristics for LM-IR and FCSRT-FR over strata defined by age, education, or APOE ε4 status. The Youden's index optimal cuts were similar to those derived for the entire cohort (FCSRT-FR cuts 27–29 and LM-IR cuts 16 and 17) and were consistent across strata. However, the positive predictive power of either test was somewhat higher in the group with an APOE ε4 allele.
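One way to see how PPV can differ between strata while the optimal cut stays the same is the standard Bayes identity relating PPV to sensitivity, specificity, and the proportion who develop the outcome. The symbols below are introduced here for illustration and are not taken from the study: Se and Sp are the sensitivity and specificity at a given cut, and π is the cumulative incidence of AD dementia in the stratum over the follow-up window.

```latex
\mathrm{PPV} \;=\; \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})\,(1-\pi)}
```

For fixed Se and Sp, this expression increases with π. A stratum at higher risk, such as APOE ε4 carriers, would therefore be expected to show a higher PPV at the same cut even if the operating characteristics of the test were identical across strata.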
The goal of this analysis was to prospectively evaluate the predictive validity of 2 commonly used memory tests, the FCSRT-FR and the LM-IR, for identifying which individuals with memory complaints will develop incident AD dementia. To date, no community-based study has prospectively compared the predictive ability of these 2 tests. Furthermore, the extent to which duration of follow-up and APOE ε4 status affect the choice of screening cut points has not been evaluated.
For follow-up times of 2–4 years, the cut scores that maximized prediction were fairly consistent for FCSRT-FR or LM-IR. However, when the goal is to distinguish those who will develop AD dementia, the FCSRT-FR appears preferable. FCSRT-FR has a higher specificity for a given value of sensitivity and ROC analysis demonstrated consistently greater AUCs for FCSRT-FR compared with LM-IR. These findings are consistent with prior studies showing that FCSRT-FR has high discriminative validity for dementia.3,6 Recently, the FCSRT total recall score has been shown to be superior to LM for discriminating between MCI cases with and without AD biomarkers.23 We examined free recall given that our sample was not selected for MCI and thus exhibited ceiling effects for total recall. Although both the LM-IR and FCSRT-FR assess short-term declarative verbal memory, the FCSRT-FR is unique in that it utilizes control of attention and strategy use to maximize performance and improve encoding and retrieval of learned information. In addition, when applied to a community-based sample, the test does not require age- and education-specific norming, and does not suffer from ceiling effects.6
The stratified analyses suggest that APOE ε4 status does not affect the choice of screening cut point for either test. Although APOE ε4 was a significant predictor of AD dementia, the addition of APOE ε4 status to a prediction model did not improve the diagnostic accuracy compared to using the FCSRT-FR alone, adjusting for demographic factors, and the relation of either FCSRT-FR or LM-IR to dementia incidence did not vary by APOE ε4 status.
These data are useful for the design of clinical trials, as they provide operating characteristics of 2 memory tests widely used for determining which patients are likely to progress to dementia in a relatively short time horizon. The ability to select eligibility criteria derived prospectively over specified time frames should increase efficiency of screening for participants with memory complaints who are likely to develop dementia over specified follow-up periods. The information provided allows for comparison of the tradeoffs between sensitivity and specificity. This balance depends upon the context for the screening. In a situation where potential study participants will undergo a 2-stage screen, the choice would depend on the extent to which the second screen is costly or invasive. When the second test is relatively noninvasive and inexpensive, the choice may be to maximize sensitivity of the cognitive screen and follow it with a more specific assay to exclude persons with memory complaints not related to underlying AD dementia. If, however, the second test is costly or invasive, such as PET scanning or lumbar puncture, the goal may be to maximize specificity of the cognitive screen to minimize secondary testing of those unlikely to develop incident dementia. Such tradeoffs must also be evaluated in the context of risk vs benefit for the intervention. In the clinical setting, the information provided here has utility for estimating the time frame for follow-up screening. In this context, the goal may be to maximize the PPV.
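The tradeoff in a 2-stage screen can be made concrete by counting how many people a cognitive screen would refer to the second test and how many future cases it would miss. The sketch below is illustrative only: the function name is hypothetical, and any input values are assumptions supplied by the user rather than results from this study.

```python
def first_stage_yield(n_screened, incidence, sensitivity, specificity):
    """Expected counts after a first-stage cognitive screen.

    incidence: assumed proportion who develop AD dementia over the trial horizon.
    Returns the expected number referred to the second test, the expected
    number of future cases missed, and the PPV of the first stage.
    """
    cases = n_screened * incidence
    non_cases = n_screened - cases
    true_pos = sensitivity * cases
    false_pos = (1.0 - specificity) * non_cases
    referred = true_pos + false_pos
    return {
        "referred": referred,
        "missed_cases": cases - true_pos,
        "ppv": true_pos / referred if referred else float("nan"),
    }
```

Comparing the output for a high-sensitivity cut with that for a high-specificity cut shows the exchange directly: the first misses fewer future cases but sends more people to the second test, which matters most when that test is costly or invasive.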
A strength of this study is that the population was community-based and not derived from a clinic setting. Thus, results may be generalizable to trials enrolling subjects from the community. The EAS cohort is representative of the Bronx, New York population with respect to demographics and is clinically well-characterized.10 We were able to prospectively assess the operating characteristics of the screening tests with respect to specified lengths of follow-up. This comparison is restricted to the FCSRT-FR and LM-IR tests. While these are widely used, we acknowledge that a number of other instruments exist; however, they were not in our test battery. In particular, other studies utilize the delayed rather than the immediate recall condition of the LM.24 This was not available for this analysis, and results should be interpreted with this distinction in mind.
The consistency of our results using either case conference diagnosis or CDR diagnosis alleviates concerns regarding diagnostic circularity (tables e-1 and e-2 and tables 2 and 3). In addition, case conference diagnosis is based on information from a broad neuropsychological battery including other tests of episodic memory, a clinical evaluation, informant reports, as well as neurologic examinations and functional data. The influence of any single cognitive test at the baseline assessment is thus likely to be small.
For distinguishing individuals with memory complaints who will develop dementia over 2–4 years, the FCSRT-FR test is a better predictor than LM-IR. These data are useful for planning clinical trials with fixed duration, or for determining the follow-up intervals for rescreening in the clinical setting. While APOE ε4 is a significant predictor of incident AD dementia, the choice of cut scores did not vary by APOE ε4 status. Thus, when screening for a clinical trial, the cognitive test alone may be sufficient. In the clinical setting, knowledge of APOE ε4 status increases the PPV and thus may be of benefit for the patient.
The authors thank Charlotte Magnotta, Diane Sparracio, and April Russo for assistance with participant recruitment; Betty Forro, Alicia Gomez, Wendy Ramratan, and Mary Joan Sebastian for assistance in clinical and neuropsychological assessments; Michael Potenza for assistance with data management; and all of the study participants who generously gave their time in support of this research.
Dr. Derby participated in study concept and design, design and interpretation of the statistical analysis, and in drafting and revising the manuscript. Ms. Burns participated in study concept and design, revising the manuscript for content, and interpretation of data. Dr. Wang participated in the design of the analyses, performed the statistical analyses, and participated in the interpretation of results and the drafting and revising of the manuscript. Ms. Katz participated in study design, data collection, conceptualizing the analysis plan, interpreting the statistical results, and revising the manuscript. Dr. Zimmerman participated in the data collection and interpreting the statistical results and revising the manuscript. Dr. L’Italien participated in the analysis and interpretation of data. Dr. Guo participated in drafting and revising the manuscript for content, including medical writing for content, the analysis and interpretation of data, and statistical analysis of data. Dr. Berman participated in drafting and revising the manuscript for content, including medical writing for content, and the analysis and interpretation of data. Dr. Lipton participated in study design, data collection, conceptualizing the analysis plan, interpreting the statistical results, and revising the manuscript.
Supported by a research grant from Bristol-Myers Squibb, the S and L Marx Foundation, and the National Institutes of Health (NIA-AG03949).
C.A. Derby receives research support from NIH P01 AG039409 (project leader), P01AG027734 (coinvestigator), 5R01 AG22374 (site principal investigator), 2U01AG012535-16 (principal investigator), NCRR 5UL1RR025750-03 (investigator), Bristol-Myers Squibb, and the S and L Marx Foundation. L.C. Burns is an employee of Bristol-Myers Squibb. C. Wang receives research support from NIH P01 AG039409 (investigator), R01 AG039330 (investigator), R01 AG036921 (investigator), R01 HL094581 (investigator), and Bristol-Myers Squibb. M.J. Katz receives research support from NIH-NIA P01 AG027734 (investigator), R01 AG022374 (investigator), NIH-NIA R01 AG034119 (investigator), NIH-NIA R01 AG012101 (investigator), NIH-NIA P01 AG039409 (investigator), NIH-NIA AG038651 (investigator), and Bristol-Myers Squibb. M.E. Zimmerman receives research support from the NIH [P01 AG03949 (coinvestigator), P01AG027734 (coinvestigator)] and the Alzheimer's Association [NIRG-11-206369 (principal investigator)] and has received research support from Bristol-Myers Squibb and Merck. G. L’Italien is an employee of and holds stock options from Bristol-Myers Squibb. Z. Guo is an employee of and holds stock options from Bristol-Myers Squibb. R.M. Berman is an employee of and holds stock options from Bristol-Myers Squibb. R.B. Lipton receives research support from the NIH [PO1 AG03949 (program director), PO1AG027734 (project leader), RO1AG025119 (investigator), RO1AG022374-06A2 (investigator), RO1AG034119 (investigator), RO1AG12101 (investigator), K23AG030857 (mentor), K23NS05140901A1 (mentor), and K23NS47256 (mentor)] and the S and L Marx Foundation; serves on the editorial boards of Neurology® and as senior advisor to Headache; has reviewed for the NIA and NINDS; holds stock options in eNeura Therapeutics; and serves as consultant, advisory board member, or has received research support from Allergan, American Headache Society, Autonomic Technologies, Boston Scientific, Bristol-Myers Squibb, Cognimed, Colucid, Eli Lilly, Endo, eNeura Therapeutics, GlaxoSmithKline, MAP, Merck, Nautilus Neuroscience, Novartis, NuPathe and Pfizer. Go to Neurology.org for full disclosures.