Although the oldest old are the fastest growing segment of the population, little is known about their cognitive performance. Our aim was to compile a relatively brief test battery that could be completed by a majority of individuals aged 90 or over, compensates for sensory losses, and incorporates previously validated, standardized, and accessible instruments. Means, standard deviations, and percentiles for 10 neuropsychological tests covering multiple cognitive domains are reported for 339 nondemented members of the 90+ Study. Cognitive performance declined with age for two-thirds of the tests. Performance on some tests was also affected by gender, education, and depression scores.
Individuals over the age of 90 years represent one of the fastest growing segments of the United States population. According to the U.S. Census, approximately 1.5 million individuals were aged 90 and older in 2000 (U.S. Census Bureau, 2001). Within the elderly population, the number of individuals aged 90 and older showed the largest increase (45%) between 1990 and 2000 and is expected to increase to over 10 million people by 2050. Despite these changing demographics and the increasing numbers of older adults referred for neuropsychological evaluation, little in the way of normative data is available to clinicians who evaluate the oldest old.
Neuropsychological assessment has retained its key role in the diagnosis of Alzheimer's disease (AD) and other forms of dementia despite improvements in neuroimaging techniques such as magnetic resonance imaging (MRI) and positron emission tomography (PET). According to criteria from the National Institute of Neurological and Communicative Disorders and Stroke–Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA), a clinical diagnosis of possible or probable AD can be based on an individual's neuropsychological test profile after all other possible medical, psychiatric, and neurological explanations for the symptoms have been excluded (McKhann et al., 1984). The NINCDS-ADRDA guidelines established neuropsychological criteria, with the cutoff score for impairment as being the 5th percentile or lower in eight cognitive domains including orientation, memory, attention, language, perceptual skills, praxis, reasoning, and functional status. Since memory and other cognitive impairments are the primary and most important criteria for the diagnosis of dementia, and given the increasing prevalence of dementia with advancing age, it is essential that clinicians have reliable and valid neuropsychological tests with appropriate norms in order to successfully differentiate elderly individuals with cognitive deficits from those who remain mentally intact. Consequently, the purpose of the present study was to develop a battery of neuropsychological instruments appropriate for assessing the cognitive functioning of the oldest old and to collect sufficient normative data to allow clinicians to differentiate healthy from impaired elderly in this advanced age group.
During the past 40 years, clinicians and researchers have developed numerous instruments (e.g., Halstead–Reitan Neuropsychological Battery, Reitan, 1985; Wechsler Adult Intelligence Scale, WAIS, Wechsler, 1981; Wechsler Memory Scale, Wechsler, 1997b) for the purpose of assessing changes in an individual's cognitive status. Due to long administration time, many of these instruments have proved too taxing for elderly individuals who are more susceptible to fatigue (Putnam & DeLuca, 1990), frustration, and uncooperativeness (Lichtenberg & MacNeill, 2003). Thus, it is difficult for clinicians and researchers to draw valid conclusions regarding individuals' actual cognitive abilities. In an effort to decrease administration time and maintain rapport with the patient, clinicians have frequently relied on shorter screening measures such as the Mini-Mental State Examination (MMSE; Folstein, Folstein, & McHugh, 1975). While brief, screening instruments lack sensitivity for detecting subtle or mild forms of cognitive impairment (Petersen, Smith, Ivnik, Kokmen, & Tangalos, 1994).
Recently, researchers have started to address issues such as testing fatigue by developing batteries of reasonable length and administration time, such as the Consortium to Establish a Registry for Alzheimer's Disease (CERAD; Morris et al., 1989) and the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS; Randolph, 1998). While scores on the CERAD and RBANS batteries can help differentiate healthy from cognitively impaired individuals, neither battery provides adequate normative data for persons over the age of 89. Additionally, one of the most frequently used normative data sets for older individuals is the “Neuropsychological Tests Norms Above Age 55” (Ivnik, Malec, Smith, Tangalos, & Petersen, 1996). These norms, derived from Mayo's Older Americans Normative Studies (MOANS), have made a significant contribution to the literature, but ages were aggregated so that individuals as young as 76 years of age were included in the oldest age category. Indeed, a review of the neuropsychological literature yielded relatively few studies with adequate sample sizes to allow clinicians to draw any clear conclusions regarding test performance in individuals aged 90 and older.
Consequently, the need for a standardized neuropsychological battery with norms appropriate for use with the oldest old is paramount. To address this limitation, we compiled a battery of 10 tests assessing seven domains including global cognition, language, recent memory, executive function, psychomotor speed, visual-spatial ability, and attention/working memory. Our aim was to compile a battery of tests that would (a) discriminate between cognitive changes associated with normal aging and those seen in dementia, (b) be relatively brief and easily completed by a majority of individuals over the age of 90, (c) compensate for sensory losses (hearing and vision deficits) often present in the oldest old, and (d) incorporate previously validated, standardized, and accessible instruments already familiar to many clinicians and researchers.
In this study, we report normative data on the MMSE, the Modified Mini-Mental State Examination (3MS; Teng & Chui, 1987), the 15-item Boston Naming Test (BNT; Kaplan, Goodglass, & Weintraub, 1978; Mack, Freed, Williams, & Henderson, 1992), Letter and Category Verbal Fluency (Benton & Hamsher, 1989; Morris, Mohs, Rogers, Fillenbaum, & Heyman, 1988), California Verbal Learning Test-II Short Form (CVLT-II, Short Form; Delis, Kramer, Kaplan, & Ober, 1987, 2000), Trail Making Test A, B, and C (TMT A, B, & C; Army Individual Test Battery, 1944; Delis, Kaplan, & Kramer, 2001), Clock Drawing Test (Freedman et al., 1994; Rouleau, Salmon, Butters, Kennedy, & McGuire, 1992), CERAD Constructions (Morris et al., 1988), and WAIS-III Digit Span (Wechsler, 1997a).
The 90+ Study is a longitudinal, population-based investigation of aging and dementia in the oldest old. In the early 1980s, a health survey was mailed to residents of Leisure World, a retirement community in southern California. The 13,978 residents who completed the survey became members of the Leisure World Cohort Study (Paganini-Hill, Chao, Ross, & Henderson, 1989; Paganini-Hill, Ross, & Henderson, 1986). These participants are followed by periodic resurvey (1983, 1985, 1992, and 1998) and determination of vital status by search of national and commercial death indices and ascertainment of death certificates. The 1,150 individuals still alive and aged 90 years or older on January 1, 2003, were eligible for participation in the 90+ Study. All participants were asked to undergo a comprehensive in-person evaluation or to provide information via self-completed or informant-completed mailed questionnaires. The in-person evaluation included past medical history, family history, functional assessments, neurological examination, and neuropsychological battery. The subjects of this study comprise the first 339 nondemented participants of the initial 481 participants who were examined in person as of October 2004. All participants provided written informed consent, and all procedures performed were approved by the Institutional Review Board of the University of California, Irvine.
We sought a neuropsychological test battery that could assess multiple domains in a large population of individuals aged 90 and older with varying abilities, without excessive floor or ceiling scores. Several memory tests, including CERAD word list (Morris et al., 1988), Cued Selective Reminding (Grober & Buschke, 1987), New York University Paragraph Recall (Kluger, Golomb, Mittelman, & Reisberg, 1999), and Logical Memory (Wechsler, 1997a, 1997b), were piloted and rejected due to either floor scores or length of administration, or both. The resulting battery included 10 tests assessing multiple cognitive domains. Tests were administered in the order shown in Table 1 with standardized administration by trained and certified psychometrists. Amplifiers were provided for participants who were extremely hard of hearing, and visual stimuli were presented in Size 90 boldface font to promote visibility. The average time to complete the entire battery was 1 hour. The Geriatric Depression Scale (GDS; Yesavage et al., 1982) was included to explore the relation between affective state and cognition and was administered after the test battery. A brief description of the individual tests in the battery and any modifications made in the administration procedures follow.
The participant's overall cognitive functioning was evaluated with the Modified Mini-Mental State Examination (3MS), which tests 10 cognitive domains: attention, concentration, orientation, short-term memory, long-term memory, verbal fluency, reading, writing, constructional praxis, and abstraction. As all of the items from the MMSE are incorporated in the 3MS, a MMSE score can be easily derived for each individual. Total scores on the 3MS range from 0 to 100 points while scores on the MMSE range from 0 to 30 points. The only change made to the standard administration procedure was that the three to-be-remembered words were printed on separate cards using enlarged font and were presented to the participant at the same time as the examiner said the words aloud.
Three tests were used to assess language abilities, namely, confrontational object naming, category fluency for animal names, and letter fluency (F). A short, 15-item version of the BNT was used rather than one of the longer 30-, 45-, or 60-item versions to minimize fatigue. The Animal Fluency test required the participant to name aloud as many animals as he or she could in 1 minute. This test was performed as part of the 3MS. Participants received credit for naming general categories as well as specific exemplars, but not for both. For example, if the examinee gave an exemplar (e.g., eagle) from an already named category (e.g., bird), credit was given only for the exemplar. Extinct animals (e.g., dinosaur) were credited, but mythical creatures (e.g., unicorn) were not. Repeated responses were counted only once. Letter fluency was assessed using only the letter “F” rather than the more traditional three letters (F, A, S) to reduce administration time and fatigue. On this test, the participant was asked to name aloud in 1 minute as many words starting with the letter “F” as he or she could. To avoid confusion with similar-sounding letters, a large F was printed in 200-size font on a card and was presented as a prompt during this test. Points were not awarded for responses that included proper nouns or variations on the same word (e.g., fall, falling).
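The Animal Fluency crediting rules described above can be illustrated with a short sketch. It is not the scoring procedure used by the study's psychometrists, who scored responses by hand. The lookup tables `CATEGORY_OF` and `MYTHICAL` and the function name `score_animal_fluency` are hypothetical and contain only a few entries for illustration.

```python
# Hypothetical lookup tables for illustration only; the study publishes no such lists.
CATEGORY_OF = {"eagle": "bird", "robin": "bird", "salmon": "fish"}
MYTHICAL = {"unicorn", "dragon"}


def score_animal_fluency(responses):
    """Count creditable animal names under the rules described in the text."""
    # Repeated responses count only once; normalise case and whitespace first.
    unique = {r.strip().lower() for r in responses if r.strip()}
    # Mythical creatures earn no credit; extinct animals are left in.
    unique -= MYTHICAL
    # A category earns no credit when one of its exemplars was also named,
    # so credit goes to the exemplar rather than to both.
    covered = {CATEGORY_OF[r] for r in unique if r in CATEGORY_OF}
    return len(unique - covered)


if __name__ == "__main__":
    print(score_animal_fluency(["bird", "eagle", "dog", "Dog", "unicorn", "dinosaur"]))
```

In this example, "bird" is dropped because "eagle" was named, the second "Dog" is a repeat, and "unicorn" is mythical, leaving credit for eagle, dog, and dinosaur.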
Recent memory was assessed with a modification of the short nine-item version of the CVLT-II. In this test, the participant is asked to remember a list of nine words across four learning trials. The list is composed of three words from three different categories presented in a random order. The same order of stimulus presentation is used across the four trials, and each learning trial is followed by a test of immediate free recall. Our primary modification was to present the words both verbally and visually during the four learning trials, rather than only saying the words aloud as recommended in the standard instructions. A Short Delay Free Recall test was administered following an interference task of counting backwards from 100 by ones for 30 seconds. After approximately 10 minutes of nonverbal testing, the Long Delay Free Recall was administered and was immediately followed by tests of cued-recall and yes/no recognition.
Parts A and B of the TMT were administered with standard procedures. TMT A requires the participant to connect the dots in numerical order, 1–2–3, and so on. TMT B requires the participant to connect the dots in order by shifting set, 1–A–2–B–3–C, and so on. The maximum time limits for Parts A and B were extended to 180 and 300 seconds, respectively.
On Part C of the TMT, the participant uses a colored marker to trace a dotted line connecting 25 circles. The original Delis–Kaplan Executive Functioning version of the TMT Part C is a two-page task, which our participants found daunting. Therefore we proportionately modified the two-page version into a comparable one-page version, which was better received by participants. The amount of time the participant needed to trace over the dotted line from the “start” to “finish” circles was recorded in seconds.
In the Clock Drawing Test the participant was asked to place the numbers as on a clock on a predrawn circle and draw the minute and hour hands to show “ten after eleven.” Scoring was based on the presence and sequencing of the numbers and the positioning of the two hands. The CERAD Construction Test asked the participant to copy four line drawings of increasing complexity (i.e., circle, four-sided diamond, intersecting rectangles, and cube). Standard scoring criteria for each figure with a maximum total score of 11 points were used.
The WAIS-III Digit Span Test was administered and scored using standard procedures. The Digit Span Test requires a participant to repeat number sequences of increasing length immediately after hearing the number sequence.
Floor scores were assigned on all neuropsychological tests whenever the participant did not understand the instructions for administration, quit the test before finishing, or became confused during the test. In addition, floor scores were assigned on the Trail Making tests when the participant was unable to complete the test in the time allowed.
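As a rough sketch of the floor-score rule for the timed Trail Making tests, the function below flags a floor score when Part A or B is not finished within the extended limits stated earlier (180 and 300 seconds). The function name `score_tmt` and the returned fields are hypothetical. The text does not say what numeric value a floor score takes, so the sketch records only a flag. No limit for Part C is given in the text, so Part C is not handled.

```python
# Extended time limits in seconds for Parts A and B, as stated in the text.
TIME_LIMITS = {"A": 180, "B": 300}


def score_tmt(part, elapsed_seconds, finished):
    """Return the completion time, or a floor flag if the limit was not met.

    The numeric value assigned to a floor score is not given in the text,
    so this sketch (an assumption) reports only a boolean flag.
    """
    limit = TIME_LIMITS[part]  # raises KeyError for parts without a stated limit
    if not finished or elapsed_seconds > limit:
        return {"part": part, "seconds": None, "floor": True}
    return {"part": part, "seconds": elapsed_seconds, "floor": False}


if __name__ == "__main__":
    print(score_tmt("A", 66, True))
    print(score_tmt("B", 300, False))
```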
Neurological examiners (trained physicians or nurse practitioners) performed a structured neurological examination that included mental status testing covering multiple domains, including memory, language, orientation, and calculations. To determine dementia status, examiners also had access to the participants' MMSE and 3MS scores and their responses from that day to selected items of the Functional Activities Questionnaire (FAQ; Pfeffer, Kurosaki, Harrah, Chance, & Filos, 1982), Activities of Daily Living (ADL; Katz, Ford, Moskowitz, Jackson, & Jaffe, 1963), and Clinical Dementia Rating (CDR; Morris, 1993). Although these instruments were designed for self-report, the examiners asked the questions of the participants. When available at the time of the visit, informants were asked the same questions regarding the participants' functional abilities. The neurological examiner's trained judgment was used to differentiate functional loss due to cognitive impairment from that due to physical impairment. The neurological examiners were blinded to all neuropsychological test results other than the MMSE and 3MS. Based on the participant's cognitive and functional status during the neurological evaluation, the examiner determined the presence or absence of dementia by applying Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV; American Psychiatric Association, 1994) criteria. Participants meeting DSM-IV criteria for dementia were excluded from this study (N=142). Of the participants included (N=339), 47% were deemed to have normal cognition, and 53% had some cognitive or functional loss but not of sufficient severity to meet DSM-IV criteria for dementia (Cognitively Impaired–Not Demented, CIND).
Means, standard deviations, and percentiles (5, 10, 25, 50, 75, 90, and 95%) were derived for the overall sample and were stratified according to age in groups of approximately similar size (90–91, 92–94, and 95+ years). The effect of age was assessed in a regression analysis with age as a continuous variable. The age-adjusted independent effects of gender, education (≤high school, some college to college graduate, and some graduate school or higher), and GDS score (<4 vs. ≥4) were assessed by multiple regression analyses with categorical covariates. A GDS score of 4 was selected as a cutoff for depression based on published data in elderly populations (de Craen, Heeren, & Gussekloo, 2003). All statistical analyses were performed using SAS software version 8.01 for Windows (SAS Institute Inc., Cary, NC).
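The descriptive part of this analysis can be sketched as follows. This is a minimal illustration, not the SAS code used in the study. The function names `age_band`, `percentile`, and `summarize` are hypothetical, and the linear-interpolation percentile definition is an assumption, since the text does not state which definition was applied.

```python
from statistics import mean, stdev

# Percentiles reported in the normative tables.
PERCENTILES = (5, 10, 25, 50, 75, 90, 95)


def age_band(age):
    """Map an age to the strata used in the text (90-91, 92-94, 95+)."""
    years = int(age)  # age in completed years
    if years < 90:
        raise ValueError("norms cover ages 90 and older")
    if years <= 91:
        return "90-91"
    if years <= 94:
        return "92-94"
    return "95+"


def percentile(sorted_scores, p):
    """Linear-interpolation percentile (an assumed definition)."""
    rank = (len(sorted_scores) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_scores) - 1)
    frac = rank - lower
    return sorted_scores[lower] + frac * (sorted_scores[upper] - sorted_scores[lower])


def summarize(records):
    """records: iterable of (age, score) pairs; returns statistics per age band."""
    groups = {}
    for age, score in records:
        groups.setdefault(age_band(age), []).append(score)
    summary = {}
    for band, scores in groups.items():
        scores.sort()
        summary[band] = {
            "n": len(scores),
            "mean": mean(scores),
            # The sample standard deviation needs at least two scores.
            "sd": stdev(scores) if len(scores) > 1 else None,
            "percentiles": {p: percentile(scores, p) for p in PERCENTILES},
        }
    return summary
```

Because different percentile definitions can give slightly different cutoffs in small strata, values computed this way need not match published tables exactly.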
Characteristics of the first 339 nondemented participants in the 90+ Study who completed the neuropsychological battery are shown in Table 2. The sample included 231 women and 108 men with an average age of 94 years (range 90–103). The majority of participants were living in the community, and 52% lived alone. Almost half of participants reported a history of heart disease and nearly one third a history of cancer. All participants were Caucasian, although 2 participants also identified themselves as Hispanic or Latino.
Normative data for the sample by age categories are shown in Table 3. Performance declined with increasing age for more than two thirds of the tests. Age was significantly associated with performance on the MMSE, 3MS, BNT, Animal Fluency, all CVLT tests, TMT A & B, Clock Drawing Test, and Digit Span Backward. Gender, education, and depressive state were also related to test performance. After adjusting for age, women scored on average significantly better than men on the MMSE (women=26.4 vs. men=25.6, p=.03), CVLT Trial 4 (women=6.5 vs. men=6.0, p=.04), CVLT Sum (women=22.7 vs. men=21.0, p=.02), CVLT Long Delay (4.8 vs. 4.1, p=.03), CVLT Cued Long Delay (5.4 vs. 4.5, p < .01), and TMT A (66.1 vs. 75.3, p=.05). Education was associated with performance on five tests. Age-adjusted scores increased with higher education on the 3MS (p for trend < .001), BNT (p for trend=.03), Animal Fluency (p for trend < .01), Letter F Fluency (p for trend < .001), and Clock Drawing Test (p for trend=.03). After adjusting for age, participants with GDS scores ≥ 4 had poorer scores on the 3MS (85.8 vs. 89.7, p < .01), Animal Fluency (11.4 vs. 13.5, p < .001), and Clock Drawing Test (4.9 vs. 5.6, p=.03). The neuropsychological scores of the 76 participants who did not complete the GDS were more similar on all tests to those participants scoring ≥ 4 (results not shown).
Table 4 shows the percentage of people completing each procedure and the reasons for failure to complete. At one end of the range, most participants completed Animal Fluency. In contrast, more than one third of the participants were unable to complete TMT A, B, or C because of vision problems, fatigue, or inadequate time. Tests administered towards the end of the session (Letter Fluency, Digit Span) were frequently not completed because the testing took longer than the participant expected or the participant complained of fatigue.
The current study extends the available norms on a comprehensive battery of neuropsychological tests to people 90 years and older. Data on 10 widely available and well established neuropsychological instruments were collected from over 300 nondemented individuals in this age group. These tests span seven cognitive domains (i.e., global cognition, language, recent memory, executive function, psychomotor speed, visual-spatial ability, and attention/working memory) commonly impaired in AD and other dementias. We made considerable effort to keep total administration time fairly short (approximately 1 hour), minimize fatigue, and compensate for any sensory losses in vision and hearing that might compromise performance. Overall, the oldest old participants received the battery favorably.
The number of individuals in this study is considerably larger than that in other published normative studies. A comprehensive review (Mitrushina, Boone, & D'Elia, 1999) of the existing normative data for many commonly utilized neuropsychological instruments included six tests in the current battery (BNT, Verbal Fluency, CVLT, TMT A & B, Clock Drawing Test). Without exception, reviewed studies had small samples of the oldest old. For example, of 24 studies evaluating the TMT, only two studies included individuals 90 years and older. Moreover each of these studies included only a few individuals in this age range: 21 participants aged 85–94 (Ivnik et al., 1996) and 50 participants aged 81–91 (Richardson & Marottoli, 1996). Another study on older adults (age range 62–95 years) residing in retirement villages and hostels in Australia gathered normative data for several neuropsychological tests (Anstey, Matters, Brown, & Lord, 2000). However, the number of participants in the oldest age range (90–95) was very small; at most 23 individuals over 90 years of age contributed data for any given instrument. Thus, the numbers of 90+ participants have been too small to generalize findings.
Individuals' scores on the neuropsychological tests in the present study were influenced by age, gender, education, and affective state. Performance decreased significantly with increasing age on approximately two thirds of the tests in the battery—namely, the MMSE, 3MS, BNT, Animal Fluency, CVLT, TMT Parts A and B, Clock Drawing Test, and Digit Span Backwards. Thus, advancing age affected test scores in all domains. Women had better performance on the CVLT, MMSE, and TMT A. Despite a relatively narrow range of education in this sample, individuals with more schooling significantly outperformed their less educated peers on the 3MS, BNT, Animal and Letter Fluency, and Clock Drawing tests.
In the current investigation, participants with four or more depressive symptoms on the GDS had lower scores on the 3MS, Animal Fluency, and Clock Drawing. This suggests that mood may affect cognitive performance in the oldest old. However, the cross-sectional design of our study and our use of a brief screening instrument that provided a measure of depressive symptoms rather than a comprehensive psychiatric evaluation limit definitive conclusions. Furthermore, other studies present conflicting results of the relation between depression and neuropsychological functioning in the oldest old. Palsson, Johansson, Berg, and Skoog (2000) reported a poorer cognitive performance in depressed versus nondepressed oldest old participants, whereas Backman, Hassing, Forsell, and Viitanen (1996) did not find an association between level of depression and neuropsychological functioning. Given the mixed results across studies, the effects of depression in the oldest old age group need to be studied using larger and more diverse samples that include formal mood and cognitive evaluations.
Despite our best efforts to design a battery of neuropsychological measures appropriate for use with the oldest old, some of the participants were not able to complete all 10 tests. Because individuals with comorbidities, including visual or hearing impairments, were not excluded, some participants may not have been able to complete specific tests. Approximately 37% of nondemented 90+ participants failed to complete TMT A, TMT B, or both despite these tests being positioned halfway through the battery. Problems with visual disabilities (11%), fatigue (8%), and lack of time (10%) accounted for much of the missing data in TMT B, but a significant number of participants either refused to do the test (7%) or failed to complete it for other reasons (3%). Since the TMT A and B measure executive functioning, this may represent a significant decline in frontal lobe function associated with extreme aging. It is interesting to note that we did not see a similar effect of age on TMT C, which primarily measures motor speed. Also, as is apparent in Table 5, individuals who completed specific tests demonstrated higher levels of cognitive performance, as measured by both the MMSE and the 3MS, than did individuals who failed to complete them. Lower global cognitive ability may be associated with failure to complete individual neuropsychological tests. For example, for TMT B, 55% of noncompleters were CIND, while 45% of noncompleters were classified as normal. As individuals experience cognitive declines, they may be more likely to refuse testing or may fatigue more rapidly in the testing environment, and thus be less likely to complete some components of the neuropsychological battery.
Norms for neuropsychological tests are useful to the extent that they can be generalized. To examine the representativeness of the 339 older participants in this study, we compared their demographic characteristics to those of individuals aged 90+ years in the general U.S. population. In the 2000 U.S. Census, the vast majority (89%) of the 90+ adults in the United States were Caucasian, with the remaining composed of 8.5% Black, 2% Asian, and 0.6% Native American or Inuit. A total of 76% of all 90+ year olds were female regardless of race. Thus, our sample largely reflects the current composition of the 90+ population in the U.S. Investigations in other ethnic/racial groups and in less educated populations will be needed particularly since these demographics are likely to change in the future.
In conclusion, this study describes a battery of neuropsychological tests selected and modified for use in very elderly adults. Strengths of the battery include its relative brevity, use of multiple well-established, widely utilized, and readily available instruments, and capacity to assess a broad range of cognitive domains. This study provides normative data for these neuropsychological tests from the largest group of individuals aged 90+ years published to date. These results provide a foundation for the evaluation of cognitive functioning in the rapidly growing number of individuals in their 10th decade and beyond.
The 90+ Study is supported by the National Institute on Aging Grant R01AG21055 and the Al and Trish Nichols Chair in Clinical Neuroscience. The authors thank the extraordinary participants and families of the 90+ Study who made this work possible. We also acknowledge the great work of all the psychometric testers, neurological examiners, and study staff.