Neuropsychological tests, including tests of language ability, are frequently used to differentiate normal from pathological cognitive aging. However, language can be particularly difficult to assess in a standardized manner in cross-cultural studies and in patients from different educational and cultural backgrounds. This study examined the effects of age, gender, education and race on performance on two language tests, the animal fluency task (AFT) and the Indiana University Token Test (IUTT). We report population-based normative data on these tests from two ethnically divergent, cognitively normal, representative population samples of older adults, combined for analysis.
Participants aged ≥65 years from the Monongahela-Youghiogheny Healthy Aging Team (MYHAT) and from the Indianapolis Study of Health and Aging (ISHA) were selected based on (1) a Clinical Dementia Rating (CDR) score of 0; (2) non-missing baseline language test data; and (3) race self-reported as African American or white. The combined sample (n=1885) was 28.1% African American. Multivariable ordinal logistic regression was used to model the effects of demographic characteristics on test scores.
On both language tests, better performance was significantly associated with higher education, younger age, and white race. On the IUTT, better performance was also associated with female gender. There were no significant interactions between age and sex or between race and education.
Age and education influence performance on these language tests more strongly than do race and gender. Demographically stratified normative tables for these measures can be used to guide test interpretation and aid clinical diagnosis of impaired cognition.
Differentiating normal from pathological cognitive aging is the cornerstone of dementia diagnosis in older adults, and neuropsychological tests are frequently used in this process. Interpreting test performance becomes more technically challenging when a patient’s background diverges significantly from typical or published normative reference samples. Individuals from less educated or ethnic minority populations often have educational and cultural characteristics that complicate interpretation of cognitive test performance (Unverzagt et al., 2007; Byrd et al., 2005; Manly et al., 1998a; Manly et al., 1998b). Ethnic minority elders are the fastest-growing segment of older adults in developed countries such as the United States (Day, 1996), highlighting the need for cognitive tests and normative data that can more accurately identify individuals with true cognitive impairment and early dementia. Normative data can be particularly valuable when they are drawn from population-based samples of older adults from different racial/ethnic groups and from different economic, educational, and cultural backgrounds.
Alzheimer’s disease (AD) is the most common cause of dementia in late life (Canadian Study of Health and Aging Working Group, 1994; Hendrie et al., 1995). Current diagnostic criteria specify the presence of memory impairment plus deficits in at least one other cognitive area, such as language, motor, visual-spatial, or executive disturbances (American Psychiatric Association, 1994). Of these, language impairment may be the most difficult to assess in older adults who differ demographically from the typical normative groups on which published test reference ranges are based. One reason is that language test performance is strongly influenced by educational and cultural factors (Lezak et al., 1994; Manly et al., 2004; Unverzagt et al., 2007).
The token test, in various forms, has a long history in neuropsychology (De Renzi and Faglioni, 1978; Benton and Hamsher, 1989). Historically, it has been characterized primarily as a test of verbal comprehension, including immediate memory span for verbal sequences and the capacity to understand syntax. Secondarily, the task also taps aspects of executive ability (via working memory), especially in its longer and more syntactically complex items. Likely for this reason, the token test is highly sensitive to brain damage and to dementia (Swihart et al., 1989). With slight modifications, the task has been adapted for dementia screening batteries (Unverzagt et al., 1999). The IUTT is one such modified version and has shown diagnostic utility in differentiating demented from non-demented older adults (Jones and Ayers, 2006).
The animal fluency task (AFT) has also been widely used in neuropsychology (Lezak et al., 1994). It is a ‘category fluency’ task, which in turn is a type of ‘verbal fluency’ task. In contrast to the token test, the AFT is an expressive language test. Factor analytic studies indicate that the AFT (like other verbal fluency tasks) involves several dimensions of cognitive processing: primarily language functions, such as vocabulary storage, but also speeded mental processing and executive functions (Lamar et al., 2002). As with the token test, this multidimensionality, including the executive component, likely contributes to the high sensitivity of verbal fluency tasks to global brain damage as well as to dementia.
The goals of the present study were to establish population-based norms and to examine the effects of age, gender, education, and race on language task performance in two ethnically divergent, cognitively normal, representative population samples of older adults. To allow comparison of white and African-American groups, we pooled data from two ongoing epidemiological population-based studies in the United States, one largely white and one exclusively African American. Specifically, we examined two commonly used clinical measures of language ability: verbal fluency (the animal fluency task; AFT) (Rosen, 1980) and a modified version of the token test, the Indiana University Token Test (IUTT) (Unverzagt et al., 1999).
The communities selected for the MYHAT study are close to the city of Pittsburgh, Pennsylvania, and surround the confluence of the Monongahela and Youghiogheny rivers. Given the stability of the local population, official voter registration lists have been used as the sampling frames for our community studies for the past two decades. The 2004 voter registration list was used as the frame for an age-stratified random sample, with recruitment beginning in selected towns in 2006. Sampling aimed to achieve approximately equal numbers of participants in the age intervals 65–74, 75–84, and 85+ years. Sampling ratios were derived accordingly, with subsequent over-sampling of those aged 80–84 at study entry to compensate for the small numbers of individuals aged 85+ years. Community outreach and recruitment procedures were approved by the University of Pittsburgh Institutional Review Board. Outreach efforts included press releases to local newspapers, speaking engagements with community groups, and meetings with municipal officials and community leaders to explain the study and obtain local support. Selected individuals were contacted first by mail and then by telephone or by a flyer personally delivered to the home. Recruitment criteria were (a) age 65 years or older, (b) living within the selected area, and (c) not already residing in a long-term care institution. Individuals were considered ineligible if they (d) were too ill to participate, (e) had severe vision impairment, (f) had severe hearing impairment, or (g) were decisionally incapacitated (Ganguli et al., In press). A total of 2036 individuals were recruited. The current report is based on data from the first (baseline) data collection wave of this ongoing longitudinal study.
Indianapolis is the capital of the state of Indiana. ISHA had two phases of recruitment. The initial phase began in 1992, when interviewers approached a random sample of residences in 29 contiguous census tracts within Indianapolis, seeking residents who self-identified as African American and were aged 65 years or older. A total of 2,212 individuals were enrolled (85% of the eligible persons approached). In 2001, the cohort was enriched using Medicare rolls to include African Americans aged 70 years or older living in Indianapolis; 1,892 individuals were enrolled in this phase. ISHA is a longitudinal study; the data reported here were collected from the third wave onward and correspond to each participant’s first exposure to the IUTT.
All participants provided written informed consent following procedures approved by the University of Pittsburgh Institutional Review Board. Testing took place at either the participant’s residence or the project field office. A single-stage assessment was used. First, the Mini-Mental State Examination (MMSE) was administered and scored with correction for age and education (Mungas et al., 1996). Those scoring ≥21 on the age- and education-corrected MMSE were designated as either cognitively normal or mildly cognitively impaired and proceeded to further evaluation, including neuropsychological assessment, clinical history, physical and neurological examination, and completion of the Clinical Dementia Rating (CDR) scale (Hughes et al., 1982). The neuropsychological battery consisted of tests tapping multiple cognitive domains and took approximately one hour to administer.
All participants provided informed consent, and the project had ongoing review and approval by the IUPUI Institutional Review Board. Clinical assessments were conducted by clinicians, usually in the participant’s home. The study used a two-stage design: the entire cohort was screened with the Community Screening Interview for Dementia (CSI-D), and a subgroup then received a clinical assessment including neuropsychological assessment, physical and neurological examinations, a standardized health examination and assessment of function, and a semi-structured informant interview. CDR scores (Hughes et al., 1982) were assigned in a consensus diagnostic conference. The neuropsychological assessment used a modified and expanded version of the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) test battery, which took 45 minutes to administer.
This report is focused on two tests of language ability that were administered in both studies. The IUTT was administered to participants in Indianapolis beginning with the third wave of the ISHA study, whereas the AFT was administered starting with the first wave. Both language tests were administered in MYHAT in the first wave.
Participants were asked to name as many different animals as possible in 60 seconds. The raw score is the number of animal names generated, including superordinate and specific category names, male/female variants, and parent/offspring variants. This is an expressive language task tapping multiple cognitive processes including fluency of speech, integrity of semantic storage and retrieval, and self-directed search strategy (Lezak et al., 1994).
This is a test of aural comprehension and execution of two- and three-step commands. Participants are presented with an 8½ × 11-inch sheet of paper containing four rows of tokens that vary in shape (circles and squares), color (red, black, yellow, and green), and size (large and small). The examiner reads a series of 12 commands of increasing complexity (e.g., simple command: “Point to a circle”; complex command: “Before pointing to the small red circle, point to the large green square.”). A correct response on the first presentation receives 2 points. After an incorrect response, the command is re-stated, and a correct response to the repetition receives 1 point (score range 0–24).
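The IUTT scoring rule described above can be written as a small function. This is a minimal sketch, not the study's scoring software: the names `score_command` and `score_iutt` are hypothetical, and the sketch assumes that a command is re-stated once and that a failure on the repetition scores 0, which is implied by the 0–24 range rather than stated explicitly.

```python
from typing import Optional, Sequence, Tuple

N_COMMANDS = 12  # the IUTT has 12 commands of increasing complexity


def score_command(first_correct: bool, repeat_correct: Optional[bool] = None) -> int:
    """Score one command: 2 points if correct on the first presentation,
    1 point if correct only after the command is re-stated, otherwise 0
    (the 0-point case is assumed from the 0-24 score range)."""
    if first_correct:
        return 2
    return 1 if repeat_correct else 0


def score_iutt(responses: Sequence[Tuple[bool, Optional[bool]]]) -> int:
    """Total IUTT score (0-24) from one (first_correct, repeat_correct) pair per command.
    repeat_correct is ignored (may be None) when the first attempt was correct."""
    if len(responses) != N_COMMANDS:
        raise ValueError(f"expected {N_COMMANDS} responses, got {len(responses)}")
    return sum(score_command(first, repeat) for first, repeat in responses)
```

For example, a participant who is correct on the first presentation of 10 commands, correct only on the repetition of one, and incorrect on both presentations of the last would score 10 × 2 + 1 + 0 = 21.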
For this study, we selected participants from Wave 1 of MYHAT who had a Clinical Dementia Rating (CDR) global score of 0, indicating absence of dementia and normal cognitive functioning for age. From the ISHA sample, we selected participants at their first exposure to the IUTT who had a CDR of 0 and a diagnosis of normal cognition. Participants from either site were excluded if they had missing values on the outcome measures (IUTT, AFT) or on the covariates (age, sex, education, and race).
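The inclusion rules shared by both sites can be sketched as a simple record filter. The field names (`cdr_global`, `iutt`, `aft`, and so on) and the function names are hypothetical. The ISHA-specific requirements (first exposure to the IUTT and a consensus diagnosis of normal cognition) are not encoded here.

```python
from typing import Any, Dict, List

# Hypothetical field names; the studies' actual variable names are not given.
OUTCOMES = ("iutt", "aft")
COVARIATES = ("age", "sex", "education", "race")
ALLOWED_RACES = {"white", "African American"}


def is_eligible(rec: Dict[str, Any]) -> bool:
    """Shared inclusion rules: CDR global score of 0, race reported as white
    or African American, and no missing outcome or covariate values."""
    if rec.get("cdr_global") != 0:
        return False
    if rec.get("race") not in ALLOWED_RACES:
        return False
    return all(rec.get(field) is not None for field in OUTCOMES + COVARIATES)


def select_analytic_sample(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the records that meet every shared inclusion rule."""
    return [rec for rec in records if is_eligible(rec)]
```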
We categorized the combined MYHAT-ISHA sample into subgroups aged <80 vs. ≥80 years, and with <12 vs. ≥12 years of education. We first examined the distribution of AFT and IUTT scores in the combined sample across these age, sex, education, and race groups. Scores were approximately normally distributed on the AFT but skewed on the IUTT. To establish norms, we divided the combined sample into white and African-American groups and reported the mean (SD) and the 5th, 10th, and 50th percentile scores in each group, overall and within further demographic subgroups.
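The subgrouping and the descriptive norms (mean, SD, and 5th, 10th, and 50th percentiles) can be sketched as follows. The text does not say which percentile definition was used in SAS or Stata, so this sketch assumes a simple nearest-rank rule; other definitions can give slightly different cutoffs, especially in small cells. All function and field names are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple


def age_group(age: float) -> str:
    return "<80" if age < 80 else ">=80"


def education_group(years: float) -> str:
    return "<12" if years < 12 else ">=12"


def nearest_rank_percentile(scores: Sequence[float], p: int) -> float:
    """Smallest score with at least p% of the sample at or below it.
    (Assumed definition; the source does not state one.)"""
    ordered = sorted(scores)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def norm_row(scores: Sequence[float]) -> Dict[str, float]:
    """Mean, SD, and 5th/10th/50th percentiles for one normative cell."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample SD; needs n >= 2
        "p5": nearest_rank_percentile(scores, 5),
        "p10": nearest_rank_percentile(scores, 10),
        "p50": nearest_rank_percentile(scores, 50),
    }


def norms_by(records: List[Dict[str, Any]], score_key: str,
             stratifier: Callable[[Dict[str, Any]], str]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Norms within each race group, further split by one demographic stratifier."""
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for rec in records:
        cells[(rec["race"], stratifier(rec))].append(rec[score_key])
    # Skip cells with fewer than two scores, where a sample SD is undefined.
    return {key: norm_row(vals) for key, vals in cells.items() if len(vals) >= 2}
```

For example, `norms_by(records, "aft", lambda r: education_group(r["education"]))` would give one normative cell per race-by-education combination.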
Because IUTT scores were not normally distributed, they could not appropriately be treated as a continuous variable in a linear model. We therefore treated both IUTT and AFT scores as ordinal variables with three ordered categories (<10th percentile, 10th to <50th percentile, ≥50th percentile), which we refer to as low, medium, and high scores. We did not subdivide the upper end of the distributions further because the lower percentiles were of main clinical interest, consistent with our goal of identifying cognitive impairment.
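Assigning each score to the low, medium, or high category can be sketched as below. As in any such sketch, the nearest-rank percentile definition and the function names are assumptions. Because the IUTT is skewed, many participants may share the same score at a cutoff; with the boundaries used here (≥10th and ≥50th percentile), tied scores all fall into the higher category, so the three categories need not contain exactly 10%, 40%, and 50% of the sample.

```python
import math
from typing import List, Sequence


def nearest_rank_percentile(scores: Sequence[float], p: int) -> float:
    """Nearest-rank percentile (assumed definition; not stated in the source)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def ordinal_category(score: float, p10: float, p50: float) -> int:
    """0 = low (< 10th percentile), 1 = medium (>= 10th but < 50th),
    2 = high (>= 50th percentile)."""
    if score < p10:
        return 0
    if score < p50:
        return 1
    return 2


def categorize(scores: Sequence[float]) -> List[int]:
    """Categorize every score using cutoffs computed from the same sample."""
    p10 = nearest_rank_percentile(scores, 10)
    p50 = nearest_rank_percentile(scores, 50)
    return [ordinal_category(s, p10, p50) for s in scores]
```

With a heavily tied, ceiling-skewed sample such as `[1] + [5] * 9`, no score falls below the 10th-percentile cutoff and nine of ten scores are "high", which illustrates why category sizes can depart from the nominal percentages.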
Test scores, ordered at the levels noted above, were the outcome variables in multivariable analyses performed with ordinal logistic regression. This is a proportional odds model: it assumes that the cumulative odds ratio comparing any two values of a covariate is the same at every cutpoint of the outcome variable, and this assumption can be tested. The covariates entered in the model were age (<80, ≥80 years), sex, education (<12 years, ≥12 years), and race (white, African-American). The odds ratio (OR) is the multiplicative change in the odds of a higher-level test score associated with a change in the level of a predictor variable; its logarithm is the corresponding change in the logit (log odds). For example, an odds ratio of 1.2 for education would indicate that being in the higher education group (≥12 years) was associated with 20% greater odds of a higher-level test score compared with the lower education group. An approximate likelihood ratio test was used to verify the proportional odds assumption in the final model. As a further check, we fit binary logistic regressions using IUTT cutoffs of at least the 10th percentile and at least the 50th percentile and compared the results with those of the ordinal regressions. Variables were considered significant if the p value of the corresponding likelihood ratio test was <0.05. Interactions among the final covariates were then tested.
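To make the proportional odds assumption concrete, the sketch below computes cumulative probabilities for a three-category outcome under the model logit P(Y ≥ j | x) = α_j + βx, where j = 1 means medium or better and j = 2 means high, for a single binary covariate x. The intercepts are made-up illustrative values and the function names are hypothetical. Statistical packages differ in sign convention, so this parameterization may not match their printed coefficients. The check shows that the cumulative odds ratio equals exp(β) at both cutpoints, which is exactly what the model assumes, and it reproduces the worked example in which an OR of 1.2 means 20% greater odds.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_probs(alphas: Sequence[float], beta: float, x: float) -> List[float]:
    """P(Y >= j | x) for j = 1..J-1 under logit P(Y >= j | x) = alpha_j + beta * x.
    alphas must be decreasing so the cumulative probabilities are ordered."""
    return [logistic(a + beta * x) for a in alphas]


def category_probs(alphas: Sequence[float], beta: float, x: float) -> List[float]:
    """Probability of each ordered category (low, medium, high)."""
    cum = [1.0] + cumulative_probs(alphas, beta, x) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def odds(p: float) -> float:
    return p / (1.0 - p)


def cutpoint_odds_ratios(alphas: Sequence[float], beta: float) -> List[float]:
    """Cumulative odds ratio (x = 1 vs. x = 0) at each cutpoint."""
    p1 = cumulative_probs(alphas, beta, 1.0)
    p0 = cumulative_probs(alphas, beta, 0.0)
    return [odds(a) / odds(b) for a, b in zip(p1, p0)]


# Illustrative intercepts: in the reference group (x = 0), 90% score at or above
# the 10th percentile and 50% at or above the 50th percentile.
ALPHAS = [math.log(9.0), 0.0]
BETA = math.log(1.2)  # an OR of 1.2, e.g. for >=12 vs. <12 years of education
```

Here `cutpoint_odds_ratios(ALPHAS, BETA)` returns two values equal to 1.2 up to floating-point error. If separate binary logistic regressions at the two cutpoints (the check described above) gave clearly different odds ratios, that would point to a violation of the assumption.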
The sample size provided 90% power, at alpha level 0.05, to detect an odds ratio as small as 1.49 for greater education, 1.45 for younger age, 1.45 for female gender, and 1.56 for white race.
We also conducted a post hoc analysis, in the MYHAT sample only (n = 1331; African Americans: n = 50, 3.8%), examining how the effect of race on language test performance changed after adding a measure of reading ability (Wide Range Achievement Test Reading, 3rd ed.; WRAT-3) as an estimate of the quality of education. This measure was available in MYHAT but not in ISHA, so it was not included in the main a priori analyses of the combined MYHAT/ISHA sample.
SAS 9.0 and Stata 9.0 were used for all data management and statistical analyses.
The total MYHAT cohort numbered 2036, of whom 1982 had age- and education-adjusted MMSE scores ≥21 and underwent the full assessment, and 1967 completed the neuropsychological battery. The analyses reported here are restricted to those who reported their race as African American or white, were rated 0 on the CDR, and had complete data on the IUTT and AFT, yielding a selected sample of n = 1413 (see Table 1). Of these, 55 (3.9%) were African American.
From 1993 to 2004, 516 ISHA participants were tested with the IUTT for the first time. After further restriction to those with a CDR rating of 0 and no missing data on the animal fluency score or demographic variables, 472 participants were included in this analysis (121 tested during 1993–1995, 98 in 1998, 146 in 2002, and 107 in 2004). All were African American (see Table 1).
Table 1 presents demographic characteristics of the combined study sample with CDR ratings of 0 (n = 1885). The mean age was 77.1 years (SD 7.1), the mean education was 12.2 years (SD 3.0), and 10.1% of the sample had self-reported education in the range of 0–8 years. Overall, 65.2% were women and 28.1% were African American.
Table 2 shows the results of the multivariable ordinal logistic regression analysis, giving the effects and relative importance of the covariates (predictor variables) on the two language test scores. Results are given as odds ratios (ORs). Under the proportional odds model, each OR applies both to scoring at or above the 10th percentile of the sample (medium or high score) versus below it (low score), and to scoring at or above the 50th percentile (high score) versus below it (low or medium score). Because this is a multivariable model, the OR for each covariate is adjusted for the effects of the other significant covariates. Greater age was independently associated with lower scores on the IUTT (OR, 0.50, 95% CI, 0.40–0.61). The odds ratio of 0.5 indicates that, compared with being younger (under age 80), being older (aged 80 or more) is associated with 50% lower odds of a high rather than a low or medium score, and of a medium or high rather than a low score. Stated inversely, being younger is associated with better scores with an odds ratio of 2.0. Similarly, greater age was associated with lower scores on the AFT (OR, 0.56, 95% CI, 0.46–0.68). Female gender was associated with higher scores on the IUTT only (OR, 1.59, 95% CI, 1.28–1.96). Higher education was associated with better scores on both the IUTT (OR, 3.06, 95% CI, 2.39–3.93) and the AFT (OR, 1.91, 95% CI, 1.51–2.42). Race was associated with IUTT scores (African-American vs. white: OR, 0.74, 95% CI, 0.58–0.95) and with AFT scores (African-American vs. white: OR, 0.79, 95% CI, 0.63–0.99). The test of the proportional odds assumption was not significant (chi-square statistic with 4 degrees of freedom = 3.92, p = 0.42), giving no evidence against the proportional odds model for these data.
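The reported test of the proportional odds assumption (chi-square = 3.92 on 4 degrees of freedom, p = 0.42) can be checked by hand, because the upper tail of a chi-square distribution has a closed form when the degrees of freedom are even: P(X ≥ x) = exp(−x/2) Σ_{i=0}^{df/2−1} (x/2)^i / i!. This is a minimal stdlib sketch with a hypothetical function name, offered only as an arithmetic check of the reported p value.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with an even,
    positive number of degrees of freedom, via the closed-form series."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(3.92, 4)  # statistic and df as reported above
```

For df = 4 the series reduces to exp(−x/2)(1 + x/2), which gives p ≈ 0.417 and rounds to the reported 0.42.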
Rank-ordering the effects of the covariates on the IUTT, higher scores were associated with higher education (odds ratio 3.06, a threefold increase in the odds of a better score), younger age (odds ratio 2.0, inverse of 0.50, a twofold increase), female gender (odds ratio 1.59), and white race (odds ratio 1.35, inverse of 0.74). On the AFT, the odds of a higher score were associated with higher education (odds ratio 1.91), younger age (odds ratio 1.79, inverse of 0.56), and white race (odds ratio 1.27, inverse of 0.79).
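The inverted odds ratios quoted above are reciprocals of the estimates reported for the older-age and African-American groups. A minimal sketch of the conversion follows; the function name is hypothetical. Inverting an OR also inverts its confidence limits and swaps their order.

```python
from typing import Tuple


def invert_or(or_point: float, ci_low: float, ci_high: float) -> Tuple[float, float, float]:
    """Re-express an OR and its CI for the opposite contrast
    (e.g., younger vs. older instead of older vs. younger).
    The inverted lower limit comes from the original upper limit."""
    return 1.0 / or_point, 1.0 / ci_high, 1.0 / ci_low


# (label, OR, lower 95% limit, upper 95% limit) as reported in the text
reported = [
    ("IUTT age >=80 vs <80", 0.50, 0.40, 0.61),
    ("IUTT African American vs white", 0.74, 0.58, 0.95),
    ("AFT age >=80 vs <80", 0.56, 0.46, 0.68),
    ("AFT African American vs white", 0.79, 0.63, 0.99),
]

inverted = {label: invert_or(o, lo, hi) for label, o, lo, hi in reported}
for label, (o, lo, hi) in inverted.items():
    print(f"{label}: inverted OR {o:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

For instance, the IUTT age estimate of 0.50 (95% CI 0.40–0.61) becomes 2.0 (about 1.64–2.50) when it is expressed as younger versus older.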
We examined potential variation in effects across the combined sample using interaction terms in the regression models. Interactions between age and sex, and between race and education, were not significant.
Table 3 provides normative data (mean, standard deviation, and 5th, 10th, and 50th percentile scores) for the AFT in the African American and white subgroups of the combined MYHAT-ISHA sample, overall and stratified by gender, education, and age. Table 4 provides the corresponding normative data for the IUTT in the African American and white subgroups of the combined MYHAT-ISHA sample, overall and stratified by gender, education, and age.
In the post hoc analysis of the restricted MYHAT sample, adding the WRAT-3 Reading score to the regression models attenuated the effect of race (African-American vs. white) on both the AFT (OR from 0.49 to 0.57) and the IUTT (OR from 0.446 to 0.55). For both tests, the effect of race on test performance became non-significant (p > .05) once reading level was included in the multivariable models.
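One way to summarize this attenuation is the percentage reduction in the magnitude of the log odds ratio for race after reading level is added. This summary was not reported by the study; it is shown only as an illustrative sketch, and the function name is hypothetical. Odds ratios from nested logistic models are not strictly collapsible, so part of any change can reflect the model change itself rather than confounding. The figure should therefore be read descriptively.

```python
import math


def pct_attenuation(or_unadjusted: float, or_adjusted: float) -> float:
    """Percent reduction in |log OR| after adding a covariate:
    100 * (ln OR_unadjusted - ln OR_adjusted) / ln OR_unadjusted."""
    b0 = math.log(or_unadjusted)
    b1 = math.log(or_adjusted)
    if b0 == 0:
        raise ValueError("an unadjusted OR of 1 has no effect to attenuate")
    return 100.0 * (b0 - b1) / b0


# Race ORs (African American vs. white) from the MYHAT-only post hoc models
aft_attenuation = pct_attenuation(0.49, 0.57)
iutt_attenuation = pct_attenuation(0.446, 0.55)
```

Applied to the estimates above, this gives roughly 21% attenuation for the AFT and 26% for the IUTT on the log-odds scale.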
In this study, we examined the influence of demographic factors on language-based cognitive test performance among cognitively normal older adults from two U.S. population-based samples. In this ethnically diverse sample, verbal fluency and aural comprehension were influenced by age and education. Race and gender (the latter on the IUTT only) also affected performance, but to a much smaller degree. Stratifying the normative tables by demographic factors shifts the lower-percentile cutoff scores, at the tails of the test score distributions, by up to several points. These analyses therefore formed the basis for demographically stratified normative tables for these measures, which can be used to guide test interpretation and aid clinical diagnosis.
Regarding the IUTT, we found that performance within our clinically unimpaired cohort was significantly influenced by age, gender, education, and race. These findings are consistent with other studies reporting that younger adults performed better than older adults (Ivnik et al., 1996), that better-educated individuals performed better than those with fewer years of education (De Renzi and Faglioni, 1978), and that women performed better than men (Sarno et al., 1985). We are unaware of other studies examining the effect of race on IUTT scores. In the present study, white participants performed better than African Americans. However, in the MYHAT sample, for which a measure of reading ability was available, the reading score was a significant predictor of both language test scores, and the effect of race was attenuated and became non-significant for both tests. This post hoc analysis is consistent with literature suggesting that “years of education” may be an inadequate measure of the educational experience of multicultural elders, and that race is likely confounded with literacy and quality of education.
Regarding the AFT, we found that better performance was associated with younger age, higher education, and white race. The literature on demographic influences on verbal fluency tests is mixed: many studies are consistent with our significant findings for age (e.g., Acevedo et al., 2000) and education (e.g., Ivnik et al., 1996; Troyer, 2000), but others report no age effect (e.g., Bolla et al., 1990) or no education effect (Stuss et al., 1998). Similarly, some investigators have reported gender differences in verbal fluency favoring women (Acevedo et al., 2000; Bolla et al., 1990), while others have reported no gender difference (Troyer, 2000). One study reported differences in AFT performance among racial groups of male veterans, with white participants performing better than African-American participants, who in turn scored higher on average than Hispanic participants (Johnson-Selfridge et al., 1998). It is likely that some negative studies lacked sufficient power to detect demographic effects on test performance, which may account for inter-study differences in findings. We are unaware of any studies that contradict the direction of the associations reported here.
A principal reason for focusing on tests of language, rather than on other cognitive domains, is that language can be particularly difficult to assess in a standardized and unbiased manner in cross-cultural studies and in patients from different cultural backgrounds. Although all participants in the ISHA and MYHAT studies were English-speaking Americans, our experience may be of value to clinicians and researchers working with older adults from a variety of backgrounds. Age and education (and related cohort effects) are likely to influence test performance within any group (Unverzagt et al., 1996). However, although our study did show statistically significant effects of education and race at the group level, the IUTT and the AFT may nevertheless be useful in truly illiterate populations. In such populations, other widely used tests such as initial letter fluency cannot be used, because by its very nature that task requires literacy (i.e., generating words that begin with a given letter). In terms of face validity, both the category fluency task and the IU Token Test appear relatively resistant to cultural bias. They primarily require naming familiar objects within a category or pointing to recognizable shapes and colors, provided the concepts of shape and color are recognized within a cultural group. For example, unlike confrontation naming tasks, they do not require lists of objects of high and low frequency to be generated for a given language and culture. In addition, the instructions appear straightforward and easy to translate, and scoring is relatively easy, making training of testers less of a challenge. However, these tests should be interpreted and applied in cross-cultural and cross-national studies only after adequate piloting and norming. We did this in the present study.
Further, to our knowledge, this is the only study to have simultaneously examined the influence of age, education, gender, and race on performance on these language tasks. Overall, our results suggest that age and education influence performance more strongly than race and gender do.
One limitation of the present study is that no measure of literacy or reading ability common to both samples was available with which to estimate individual differences in the quality of education, which may vary as a function of race. There is evidence that differences in educational quality, as estimated by reading ability, contribute to racial differences in cognitive test performance (Manly et al., 1998a; Manly et al., 1998b; Manly et al., 2004; Gasquoine, 1999).
Another limitation of the present study is that ethnicity is confounded with study site. It is possible that unknown, regionally specific factors may be interacting with race to contribute to the obtained pattern of results. Finally, caution is warranted when applying any normative test data to individual clinical patients who are significantly different from the normative samples on relevant demographic characteristics.
Normative studies from large community samples, such as the present one, help establish the demographic factors that affect cognitive test performance and aid in identifying cognitive impairment in older adults, especially those of minority background or lower educational attainment. Important future directions for similar normative studies include approaches that capture unmeasured variance in demographic factors such as “race” (e.g., quality of education, acculturation, persistence of poverty) (Manly et al., 1998a; Manly et al., 1998b; Manly et al., 2004; Gasquoine, 1999). In addition, future studies should continue to extend the normative literature to more ethnic and cultural groups, to keep pace with the ever-increasing cultural diversity of older adults in many societies around the world.
Conflict of Interest Declaration: This study was supported by the following grants from the National Institutes of Health: R01 AG023651, K24 AG022035, R01 NR04508, R01 AG026096, AG 0009956 and P30 AG10133. The sponsor had no direct role in the conduct and reporting of this study. There are no conflicts of interest to disclose for any of the authors.
Description of authors’ roles: B. Snitz: conceptualization and design, interpretation of results, writing. F. Unverzagt: obtaining grant support, supervision of data collection and management, conceptualization and design, interpretation of results, editing and writing. C-C. Chang: data analysis and interpretation, writing and editing. J. Vander Bilt: supervision of data collection and management, conceptualization and design, editing. S. Gao: obtaining grant support, data management, interpretation of results, editing and writing. J. Saxton: conceptualization and design, interpretation of results, editing. K. Hall: obtaining grant support, interpretation of results, writing and editing. M. Ganguli: overall project design, obtaining grant support, conceptualization and design, interpretation of results, writing and editing.