The Trail Making Test (TMT; Army Individual Test Battery, 1944) is among the most commonly used neuropsychological tests in clinical practice (Rabin, Barr, & Burton, 2005), in part because it is among the instruments most sensitive to brain damage (Reitan & Wolfson, 1994). The cognitive alternation required by Part B reflects executive functioning, although other cognitive abilities, such as psychomotor speed and visual scanning, are also required for the successful completion of the test (Lezak, Howieson, & Loring, 2004). Despite early research suggesting that TMT performance is independent of age (Boll & Reitan, 1973), more recent studies using larger and more representative samples have reported declining performance with increasing age (e.g., Ivnik, Malec, Smith, Tangalos, & Petersen, 1996; Kennedy, 1981; Rasmusson et al., 1998).
TMT normative data have been reported in many studies using a wide variety of sample demographics (e.g., Tombaugh, 2004). Larger-scale efforts examining performance include the Mayo Older Adults Normative Study (Ivnik et al., 1996) and the Revised Comprehensive Norms for an Expanded Halstead-Reitan Battery (Heaton, Miller, Taylor, & Grant, 2004). Most published normative data for the TMT capture time to completion. Additional research has examined the clinical value of a ratio (B/A; Giovagnoli et al., 1996) or difference score (B-A; Arbuthnott & Frank, 2000) between the two TMT conditions. The ratio score in particular has been examined as a symptom validity indicator (O’Bryant, Hilsabeck, Fisher, & McCaffrey, 2003; Ruffolo, Guilmette, & Willis, 2000). Frequency of errors, while often recorded and reported clinically, has not been empirically evaluated in prior TMT normative studies. Some investigators have pointed out that the examiner’s correction of errors adds additional time to the total score, thus accounting for difficulties reflected in the number of errors (Stuss et al., 2001).
The TMT error rate is difficult to interpret in isolation, particularly because errors are common among cognitively normal adults. In one study, 34.7% of control participants committed at least one error on TMT Part B (Ruffolo et al., 2000). Stuss and colleagues (2001) examined error rates in brain-lesioned patients in relation to lesion location (i.e., frontal vs. nonfrontal). For a cutoff of >1 error, they found high positive predictive power (i.e., a high error rate suggests the presence of a frontal lesion vs. other lesion) but poor negative predictive power (i.e., fewer than 2 errors does not necessarily implicate a nonfrontal lesion). Some have found that head-injured individuals are more likely to commit errors and less likely to correct them without prompting (e.g., Armitage, 1946), though other studies failed to distinguish head-injured participants from controls based on errors (Klusman, Cripe, & Dodrill, 1989).
Three different TMT error types have been identified (Mahurin et al., 2006; McCaffrey, Krahula, & Heimberg, 1989), including: (1) sequential or tracking errors (proceeding to the incorrect number or letter on Part A or B); (2) perseverative errors (failure to proceed from a number to a letter, or vice versa, on Part B); and (3) proximity errors (proceeding to an incorrect nearby circle on Part A or B). Mahurin and colleagues (2006) examined TMT errors, time to completion, and other neuropsychological measures among patients with schizophrenia and depression as well as healthy controls. While healthy controls completed both parts faster than the patients with depression (who completed each part faster than the patients with schizophrenia), elevated error rates were only observed in the schizophrenia group. Regression analyses implicated visual search in TMT A time performance, processing speed and mental tracking in TMT B time score, and mental tracking ability in TMT B error rate. However, a nonstandard administration was used in this study, as errors were not corrected by the examiner.
The purpose of the present investigation was twofold: (1) to provide normative data for TMT total errors in cognitively normal older participants, and (2) to determine the diagnostic accuracy of TMT B errors in a sample consisting of cognitively normal elderly and individuals meeting diagnostic criteria for mild cognitive impairment (MCI) or Alzheimer’s disease (AD).
All study participants were enrolled in the Boston University Alzheimer’s Disease Core Center (BU-ADCC) patient/control registry, which longitudinally follows older adults with and without memory problems. As previously reported (Jefferson et al., 2006; 2007), participants undergo an annual neurological examination and neuropsychological evaluation. Inclusion criteria require that participants be community-dwelling and English-speaking with adequate hearing and visual acuity to participate in the examinations. All participants must have a study partner (to provide collateral information about functioning). Exclusion criteria include a history of major psychiatric illness (e.g., schizophrenia or bipolar disorder) or other significant central nervous system disorder (e.g., stroke, head injury with loss of consciousness). A multidisciplinary diagnostic review team consisting of neurologists, neuropsychologists, and research staff reviews evaluation results in order to reach a consensus diagnosis for each participant.
The current study was based on 622 participants evaluated between 2002 and 2006. Of these participants, 96 were unable to complete the TMT due to cognitive impairment (n=74), physical limitations (n=8), or refusal (n=14). These included individuals who could not attempt the task as well as individuals who discontinued during the task due to confusion; only participants who completed the entire task were included. Control participants included 269 individuals who were designated by the consensus diagnostic conference as cognitively normal. Inclusion criteria for this group included: (1) cognitive test performances within the normal range (i.e., no scores >1.5 SD below published norms), and (2) a Clinical Dementia Rating (CDR; Morris, 1993) global rating of 0. One hundred twelve individuals met criteria for “possible MCI.” Diagnostic criteria for this group required functional independence, as reported in the clinical interview, and objective cognitive impairment in one or more domains (i.e., neuropsychological performance falling ≥1.5 standard deviations below available normative data for at least one test in that domain). Eighty-eight individuals met criteria for “probable MCI,” which is similar to “possible MCI” but with the additional criterion of having a cognitive complaint by the participant and/or study partner. These participants (44% of total MCI sample) also met the Petersen et al. (2004) workgroup criteria for MCI. All MCI participants (possible and probable) had a CDR=0.5. The AD sample included 57 individuals meeting NINCDS-ADRDA criteria for probable (n=25) or possible (n=32) AD (McKhann et al., 1984) with CDR scores ≥1.0.
Descriptive statistics for all demographic variables are provided in Table 1. The entire sample of 526 participants comprised 204 men and 322 women with a mean age of 73.2 (SD = 8.7) and education of 15.4 years (SD = 3.1). Non-Hispanic Caucasian individuals accounted for 77% of the sample, while African-American participants comprised the remaining 23%.
If the TMT A or B time to completion was >1.5 SD below the mean using Tombaugh’s (2004) normative data, then the score was considered impaired. The TMT was used in the consensus conference for diagnostic decision-making (e.g., impairment on the TMT might contribute to a participant being labeled “MCI probable”). To avoid any tautological concerns, 14 participants with MCI were excluded from the study because their cognitive impairment was detected solely by TMT A and/or B.
Each participant underwent a neuropsychological protocol assessing global cognition, language, verbal and visuospatial memory, attention and processing speed, executive functioning, visuospatial skills, and motor skills (Jefferson et al., 2006; 2007). For the purpose of the present study, only the TMT, Mini Mental State Examination (MMSE; Folstein, Folstein, & McHugh, 1975), and Wide Range Achievement Test – 3rd Edition (WRAT-3; Wilkinson, 1993) Reading subtest are reported. The MMSE was included to provide a general estimate of cognitive functioning, while WRAT-3 Reading was included as an estimate of premorbid ability as well as literacy (Manly, Touradji, Tang, & Stern, 2003) to confirm that participants had adequate language skills to complete TMT B.
In administering the TMT, errors were defined as any incorrect line that reached its target. As soon as an error was committed, the examinee was told only that he or she had made an error, and was directed to return to the last correct target. If an incorrect line was begun but not completed, the segment was not scored as an error. While a 5-minute time limit is typically imposed on TMT B performance, participants in this study continued to completion; only individuals who were able to complete the task were included in this study.
The local Institutional Review Board approved data collection for this study. All participants provided written informed consent prior to testing. Cognitive evaluations were conducted in a single session for all participants by trained research assistants.
The sample was classified by diagnostic category (i.e., Control, MCI, or AD) and between-group comparisons were made using analyses of variance (ANOVAs) or, when appropriate, analyses of covariance (ANCOVAs) for age, education, TMT A and B time to completion, and MMSE and WRAT-3 Reading standard scores. Pairwise multiple comparisons were conducted using Tukey HSD tests. Mann-Whitney tests were used to evaluate differences in error rate by age, sex, and education. The number of errors committed on TMT B was categorized into groups (0 errors, 1–2 errors, and 3 or more errors) and diagnostic group comparisons were made using a Kruskal-Wallis test.
Among the controls, cumulative percentages were calculated for errors made on TMT A and B. For TMT A and B time to completion, descriptive categories (i.e., superior, above average, etc.) were identified using percentile rankings within each of several demographically-derived subgroups based on age (55–74, 75–98) and education (<16 years, ≥16 years). Finally, for all participants, classification accuracy statistics (i.e., sensitivity, specificity, and positive and negative predictive powers) were calculated using combinations of error and time to completion scores. Sensitivity refers to the proportion of the “disordered” group correctly identified by the test; specificity is the correctly-classified proportion of the non-target sample. Positive predictive power (PPP) represents the rate at which a positive finding on the test accurately predicts diagnostic group membership, while negative predictive power (NPP) is the analogous statistic for negative findings on the test.
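As a minimal sketch of the four classification accuracy statistics defined above, the following Python function computes them from the cells of a 2×2 table of test result by diagnostic group. The function name, field names, and the counts in the usage line are illustrative assumptions and are not values from this study.

```python
from typing import NamedTuple


class Accuracy(NamedTuple):
    sensitivity: float
    specificity: float
    ppp: float  # positive predictive power
    npp: float  # negative predictive power


def classification_accuracy(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Compute accuracy statistics from a 2x2 table.

    tp: target-group members with a positive (abnormal) test result
    fp: non-target members with a positive test result
    fn: target-group members with a negative (normal) test result
    tn: non-target members with a negative test result
    Assumes every row and column total is nonzero.
    """
    return Accuracy(
        sensitivity=tp / (tp + fn),  # proportion of target group detected
        specificity=tn / (tn + fp),  # proportion of non-target group cleared
        ppp=tp / (tp + fp),          # positive results that are correct
        npp=tn / (tn + fn),          # negative results that are correct
    )


# Hypothetical counts, for illustration only.
example = classification_accuracy(tp=40, fp=20, fn=10, tn=80)
```

Unlike sensitivity and specificity, PPP and NPP are computed across the two groups, so they change with the relative sizes of the groups being compared.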
Diagnostic between-group comparisons revealed a main effect for age, F(2,523)=18.7, p<.001. Post-hoc comparisons revealed that the AD sample was significantly older than the Control and MCI groups, p<.001. No differences were observed between the Control and MCI groups. There was a main effect for MMSE score, F(2,523)=168.3, p<.001, and post-hoc comparisons revealed the three diagnostic groups differed in the expected direction (Controls>MCI>AD; p<.001). A main effect was also detected for WRAT-3 Reading scaled score, F(2,495)=16.0, p<.001. The Control group had a higher mean scaled score than the MCI (p<.001) and AD (p=.020) group; MCI and AD groups did not differ on this variable.
When demographic variables (age, education, and sex) were related to TMT time to completion scores among Control participants, age (r=.451, p<.001) and education (r=−.216, p<.001) were significantly correlated with TMT A time to completion. These variables were also correlated with TMT B time to completion (age: r=.458, p<.001; education: r=−.329, p<.001). Time to completion did not differ by sex for either TMT A or TMT B.
Based on these demographic differences, normative data for TMT A and B time to completion scores are presented for 4 groups categorized by age (55–74, 75–98) and education (<16 years and ≥16 years; see Tables 2 and 3).
In the normative (i.e., Control) sample, demographic variables were not related to TMT A error score. Though TMT B error rate did not vary by sex, it was significantly related to both age (r=.170, p=.006) and education (r=−.268, p<.001).
Participants who made no errors on TMT B had higher MMSE scores (M = 29.3, SD = 1.1) than those who committed at least one error (M = 28.8, SD = 1.5), t(118.8)=2.86, p=.005. No difference was found on WRAT-3 Reading scaled score between individuals with no errors (M=111.8, SD=12.0) and those with at least one error (M=110.0, SD=9.9).
Normative data for TMT A error rates are presented in Table 4 for the entire Control sample. TMT B error rates are presented for two age groups (55–74, 75–98) within each of the two education groups (less than 16 years of education, and at least 16 years) in Table 5. TMT A error totals in this sample ranged from 0 to 2, while TMT B errors ranged from 0 to 6.
TMT A and B time to completion scores were compared across the Control, MCI, and AD groups. After controlling for age, education, and (for TMT B) literacy level, main effects were detected for both TMT A (F(4,521)=54.91, p<.001) and B (F(5,520)=34.38, p<.001) in the expected direction for both variables (i.e., Control<MCI<AD; all p<.001; see Table 1).
Kruskal-Wallis and Chi-square tests were employed to identify error rate differences across diagnostic categories. There were no group differences for TMT A errors. However, an overall effect of diagnostic category was found for TMT B errors, Kruskal-Wallis H = 62.1, p<.001. Post-hoc tests revealed that Control participants differed from MCI (p<.001) and AD (p<.001) participants, though the MCI group did not differ from the AD group.
In many individual cases, TMT B time to completion and error scores did not agree (see Table 6). For example, among individuals with AD, most (71.9%) committed TMT B errors, whereas approximately half (52.6%) had impaired time to completion scores. Only 5 AD participants (8.8%) had impaired time to completion scores without any errors, while 28% of the AD sample (n=16) committed errors with normal time to completion scores. Given the frequent disagreement between these scores, it can be concluded that the two variables are independently meaningful.
The ability of error score or time to completion score to predict diagnostic classification (i.e., Control, MCI, or AD) was evaluated. This cannot be done using the time cutoff used in the diagnostic process (z-score < −1.5), as all participants in the Control sample had scores above this level by definition. The cutoff for TMT B time score impairment was therefore set to be a z-score < −1.0, which is often accepted as a more liberal cutoff for impairment (e.g., Heaton et al., 2004). Based on a receiver-operating characteristic (ROC) curve analysis, it was determined that an error score ≥1 provided an optimal error cutoff for diagnostic classification of Control vs. AD group membership. Both the time and error cutoff scores were independently effective; however, in spite of a significant correlation between TMT B errors and TMT B time to completion (r=.572, p<.001), the scores were often not in agreement. Using the error score to predict impairment on the time to completion score resulted in a PPP of only 44.4% (meaning that only 44.4% of individuals who committed errors also had an impaired time score). Using time score to predict impairment on the error score resulted in a PPP of 75.7% (meaning that 75.7% of individuals whose time score was impaired also committed errors).
Because of the discrepancy in diagnostic utility of the two cutoff scores, a combination of the variables was examined by determining classification accuracy statistics for four separate possible combinations of error and time scores: abnormal error score (any errors committed), abnormal time to completion (z ≤−1.0), impairment for both scores (both z ≤−1.0 and commission of greater than 1 error), and any impaired scores (z ≤−1.0 and/or any errors committed). Using these criteria, classification accuracy statistics were compiled for each of three sets of diagnostic comparisons (see Table 7). Overall, the incorporation of both time and error scores led to comparable or slightly improved classification rates.
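The four combinations described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function and key names are hypothetical, the time z-score is assumed to be signed so that slower performance yields a more negative value, and the thresholds follow the wording of the preceding paragraph (including "greater than 1 error" for the combined rule).

```python
from typing import Dict


def tmt_b_flags(z_time: float, errors: int, z_cutoff: float = -1.0) -> Dict[str, bool]:
    """Apply the four TMT B decision rules to one participant.

    z_time: time-to-completion z-score (assumed: more negative = slower)
    errors: total number of TMT B errors
    """
    abnormal_error = errors >= 1        # any errors committed
    abnormal_time = z_time <= z_cutoff  # z <= -1.0
    return {
        "abnormal_error": abnormal_error,
        "abnormal_time": abnormal_time,
        # Both scores impaired: slow time and more than 1 error, as worded in the text.
        "both_impaired": abnormal_time and errors > 1,
        # Any impaired score: slow time and/or any errors.
        "any_impaired": abnormal_time or abnormal_error,
    }


# Hypothetical participant: z = -1.2 with a single error.
flags = tmt_b_flags(z_time=-1.2, errors=1)
```

For this hypothetical participant, the error, time, and "any impaired" rules are all positive, while the "both impaired" rule is negative because only one error was committed.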
The present study described TMT error rates in a sample of well-characterized, elderly, cognitively intact participants. Data were generated for age-and-education-based norms for TMT Parts A and B time to completion and cumulative percentages of errors committed. In addition, time to completion and error scores on TMT B were explored to determine their incremental utility in diagnostic classification of cognitively healthy older adults and patients with MCI or AD. While many studies have described TMT time performance in aging adults, the present study also provides error frequencies. The additional normative data provided in this study allow the clinician to use the error and time to completion variables in combination rather than in isolation.
In this study, error rate was less susceptible to subtle age differences than was time to completion, which is consistent with past work (Robins-Wahlin et al., 1996). This finding may reflect the well-known association between processing speed (a major component of successful TMT performance) and age (Salthouse & Fristoe, 1995). Errors may not increase substantially with aging, and may thus be a consistent measure of impairment across the lifespan.
In classifying individuals into diagnostic categories, the use of a combined error-and-time algorithm demonstrated a slightly higher specificity and PPP than the use of TMT B time to completion or error score alone. Also, the presence of errors demonstrated a greater sensitivity in each set of comparisons than the presence of an impaired time to completion score. While these findings did not demonstrate a strong advantage to the use of an algorithm toward diagnostic classification, they did show that errors and time to completion lack strong dependence on each other, and both should therefore be considered in interpreting TMT performance. These findings were not surprising based on work by Mahurin et al. (2006), in which much variance in time to completion was accounted for by variables related to visual scanning. In contrast, working memory and executive functioning contributed to TMT B error score. The two scores thus appear to tap different neuropsychological domains, suggesting that they could individually be clinically important.
Future studies may wish to examine error subtypes (i.e., sequential vs. perseverative; McCaffrey, Krahula, & Heimberg, 1989) in addition to total error rates, and explore error rates in neurologically and psychiatrically diverse populations. Different degrees of impairment and different degenerative disorders may be associated not only with varying error rates, but also with different types of errors. For instance, examining error rates in other dementias (e.g., vascular dementia, frontotemporal dementia), neurological conditions (e.g., Parkinson’s disease, multiple sclerosis), or psychiatric disorders (e.g., schizophrenia, depression) would yield more specific information regarding the underlying neuroanatomic substrates associated with error types.
There are several limitations to the current work. First, the AD participants comprised a smaller sample size as compared with the MCI and control groups. This difference is largely due to cognitive limitations that prevented many AD participants from completing the TMT. Sample size differences significantly influence classification accuracy statistics, such as those used in this study, suggesting that our findings should be interpreted with due consideration of the base rates of AD and/or MCI in the population of comparison. Second, our neuropsychological protocol excludes data for those participants who are unable to establish mental set during TMT B. This exclusion is most commonly found among the AD participants, so those who had the greatest difficulty in overall functioning are not included; this likely reduces the overall performance differences between our MCI and AD groups and contributes to the diminished PPP in MCI versus AD classification. Third, sample characteristics may limit the generalizability of findings to certain populations; for example, the present sample is highly educated, and therefore the findings may not extend to populations of less-educated individuals. Last, all participants whose MCI diagnoses were based solely on TMT A or B performance (n=14) were excluded from the present study. This exclusion was intended a priori to avoid tautological concerns, but it may have resulted in a subtle diagnostic bias or a bias in the overall TMT findings.
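The dependence of predictive power on base rates noted above can be illustrated with Bayes' theorem. The sketch below is an assumption-laden illustration rather than an analysis from this study: the function name is hypothetical, and the sensitivity, specificity, and base rate values are invented for the example.

```python
def ppp_from_base_rate(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Positive predictive power for a given base rate (Bayes' theorem).

    base_rate: proportion of the comparison population in the target group.
    """
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (sensitivity = specificity = 0.80) at two base rates.
ppp_balanced = ppp_from_base_rate(0.80, 0.80, base_rate=0.50)
ppp_rare = ppp_from_base_rate(0.80, 0.80, base_rate=0.10)
```

With these invented values, PPP is 0.80 when half of the comparison population has the condition but falls to roughly 0.31 when only 10% does, even though sensitivity and specificity are unchanged.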
In summary, the present study presented geriatric normative data for TMT time to completion and error scores, and compared TMT performance among Control, MCI, and AD participants. The results demonstrate the clinical utility of TMT error scores, in addition to time to completion, in assessing individuals referred for dementia evaluations.
Author Note: This research was supported by P30-AG13846 (Boston University Alzheimer’s Disease Core Center), M01-RR00533 (General Clinical Research Centers Program of the National Center for Research Resources, NIH), R03-AG026610 (ALJ), R03-AG027480 (ALJ), K12-HD043444 (ALJ), and K23-AG030962 (ALJ).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.