The current study sought to evaluate the diagnostic accuracy of the RBANS in detecting milder cognitive deficits, such as those associated with amnestic MCI. The results of this study provide equivocal support for the RBANS in these mildly impaired individuals. On the one hand, older adults classified as amnestic MCI (either single- or multidomain) scored significantly below their cognitively intact peers on the Total score, 3 of the 5 Indexes, and 6 of the 12 subtests. Additionally, the AUC from the ROC analyses suggested adequate separation between the two groups in the current study on measures of learning and memory. Finally, specificity values for all memory-related subtests and Indexes were 0.82 or better and negative predictive power was similarly high. On the other hand, sensitivity values and positive predictive powers were quite poor for these memory subtests and Indexes on the RBANS (with the Delayed Memory Index and Total Scale having the best combination of sensitivity and specificity at the −1.0
SD cutoff). Although this is not an ideal situation, mixed results in assessing the diagnostic accuracy of a test are not uncommon in medicine. For example, in a study comparing several diagnostic criteria for dementia (including NINCDS-ADRDA criteria for AD) to neuropathology, the diagnostic criteria had low sensitivity and high specificity (
Holmes, Cairns, Lantos, & Mann, 1999). Regardless, caution should be exercised when using the RBANS in cases of possible amnestic MCI.
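As a reading aid for the AUC values discussed above, the following is a minimal Python sketch and not the analysis code used in this study. It assumes that lower scores indicate impairment and computes the AUC as the proportion of patient-control pairs in which the patient scores below the control, with ties counted as one half. The function name and all scores are hypothetical.

```python
def auc_lower_is_impaired(patient_scores, control_scores):
    """Probability that a randomly chosen patient scores below a randomly
    chosen control (ties count one half). This pairwise proportion equals
    the area under the ROC curve when low scores signal impairment."""
    if not patient_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    correctly_ordered = 0.0
    for patient in patient_scores:
        for control in control_scores:
            if patient < control:
                correctly_ordered += 1.0
            elif patient == control:
                correctly_ordered += 0.5
    return correctly_ordered / (len(patient_scores) * len(control_scores))


# Hypothetical index scores, invented for illustration only.
hypothetical_patients = [80, 90]
hypothetical_controls = [85, 100]
example_auc = auc_lower_is_impaired(hypothetical_patients, hypothetical_controls)
print(example_auc)  # 0.75 (3 of the 4 patient-control pairs are ordered correctly)
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of the two groups.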
The RBANS has already demonstrated strong diagnostic accuracy in AD. Two studies (
Duff, Humphreys Clark, et al., 2008;
Randolph, Tierney, Mohr, & Chase, 1998) found significant differences between patients with AD and healthy elders with nearly 40 standard score points separating these two groups on the Delayed Memory Index. Since amnestic MCI is suspected to be the prodrome of AD, it was expected that the RBANS would again separate individuals with MCI from intact peers, at least on the memory Indexes of the RBANS. Smaller, but still statistically significant, differences were observed in the current study (e.g., 9.0 standard score points on the Delayed Memory Index). However, the sensitivity of the RBANS was very different between these two studies (Delayed Memory Index at −1.0
SD: Duff et al. = 0.97, current study = 0.56). Although there are similarities between the study by Duff and colleagues and the current study, differences also exist. As expected, the AD patients from Duff and colleagues were more impaired than the MCI patients in the present study (mean Total score: 64.5 vs. 92.4). The present MCI sample was larger and older and included more women than Duff and colleagues' AD sample. Likewise, our comparison group was larger and slightly younger and included more women than the comparison group of Duff and colleagues. Although the demographic differences between the samples probably explain some of the differences in diagnostic accuracy, we suspect that the severity of cognitive impairments in these two samples explains most of the difference in diagnostic accuracy (i.e., very large RBANS differences between AD and controls lead to stronger diagnostic accuracy than the modest RBANS differences between MCI and controls).
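The link between group separation and sensitivity at a fixed cutoff can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not a reanalysis of either study: both groups are assumed to be normally distributed with equal variance, scores are expressed in control-group SD units, and the effect sizes in the loop are hypothetical values chosen only to contrast small and large group differences.

```python
from statistics import NormalDist


def accuracy_at_cutoff(effect_size_d, cutoff_z):
    """Sensitivity and specificity when scores below cutoff_z are flagged.

    Assumes controls ~ N(0, 1) and patients ~ N(-effect_size_d, 1),
    i.e., an equal-variance normal model in control SD units.
    """
    controls = NormalDist(mu=0.0, sigma=1.0)
    patients = NormalDist(mu=-effect_size_d, sigma=1.0)
    sensitivity = patients.cdf(cutoff_z)        # patients falling below the cutoff
    specificity = 1.0 - controls.cdf(cutoff_z)  # controls staying above the cutoff
    return sensitivity, specificity


# Hypothetical effect sizes (Cohen's d), not estimates from any study.
for d in (0.5, 1.0, 2.5):
    sens, spec = accuracy_at_cutoff(d, cutoff_z=-1.0)
    print(f"d = {d}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Under this model, specificity at a given cutoff depends only on the control distribution, whereas sensitivity rises as the patient group moves further from the controls, which is the pattern described above for AD versus MCI.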
In the AD sample of
Duff and colleagues (2008), the participants with dementia fell significantly below comparison subjects on all 5 Index scores and all 12 subtest scores. In the current study, significant differences were observed between patients diagnosed with amnestic MCI and comparison elders on only three Indexes (Immediate Memory, Language, and Delayed Memory) and only six subtests (List Learning, Semantic Fluency, Coding, List Recall, Story Recall, and Figure Recall). These differences are largely expected given the pathological conditions examined in each study. Since the current subjects were classified as amnestic MCI (i.e., prodromal AD), they should primarily have impairments of memory, which is assessed by 2 of the 5 Indexes and 6 of the 12 subtests (i.e., non-memory tasks should not necessarily be affected). However, since our MCI participants included multidomain subtypes (i.e., amnestic plus non-memory deficits), some non-memory differences were expected and found. The other identified cognitive differences in the MCI sample were on measures of semantic fluency and processing speed, and both of these types of tasks have been reported to fall below expectations in cases of MCI (
Cooper, Lacritz, Weiner, Rosenberg, & Cullum, 2004;
Economou, Papageorgiou, Karageorgiou, & Vassilopoulos, 2007). In a related vein, the RBANS Indexes with the two best sensitivity values at the −1.0
SD cutoff in the current study were the Delayed Memory Index and the Language Index.
Sensitivity, specificity, positive and negative predictive powers, ROC curves, and AUC estimates are routinely used in medicine to evaluate clinical measures (
Nash et al., 2006;
Schmidt et al., 2006;
Stephan et al., 2006). Unfortunately, despite strong specificity, none of the RBANS Indexes or subtests achieved sensitivity that would be considered acceptable for clinical diagnostic purposes when a 1, 1.5, or 2
SD cutoff was implemented. Sensitivity refers to the proportion of actual positive cases that are correctly identified as such (e.g., the percentage of MCI cases who are identified as having MCI). Specificity, however, refers to the proportion of negative cases that are correctly identified as such (e.g., the percentage of controls who are identified as not having MCI). Although an ideal diagnostic test would have an optimal balance of sensitivity and specificity, the current study did not find that balance in the RBANS. The high specificity values suggest that the RBANS can be used to identify negative cases (e.g., those without MCI), which still could be useful for clinical trials by excluding inappropriate subjects. However, the generally low sensitivity suggests that the RBANS does not accurately identify the cases of interest (e.g., those with MCI). Despite these less than optimal test characteristic values, there is some movement in them as the cutoff changes from −1.0 to −2.0
SD in Table . For example, as the cutoff on the Total Scale score shifts from −1.0 to −2.0
SD, sensitivity decreases (0.549 to 0.099) and specificity increases (0.800 to 0.968). Although these shifts are somewhat expected, they might provide avenues for fine tuning of the RBANS diagnostic accuracy.
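The definitions of sensitivity, specificity, and positive and negative predictive power given above can be written out as a short Python sketch. The function name and the counts are hypothetical and are not taken from the current study; they only show how the four values follow from a 2 × 2 classification table.

```python
def diagnostic_metrics(true_pos, false_neg, false_pos, true_neg):
    """Compute test characteristics from the four cells of a 2 x 2 table.

    Raises ZeroDivisionError if any row or column of the table is empty.
    """
    return {
        # Proportion of actual cases (e.g., MCI) that the test flags
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of non-cases (e.g., controls) that the test clears
        "specificity": true_neg / (true_neg + false_pos),
        # Proportion of flagged individuals who are actual cases
        "positive_predictive_power": true_pos / (true_pos + false_pos),
        # Proportion of cleared individuals who are actual non-cases
        "negative_predictive_power": true_neg / (true_neg + false_neg),
    }


# Hypothetical counts, invented for illustration only.
hypothetical = diagnostic_metrics(true_pos=50, false_neg=50, false_pos=20, true_neg=80)
for name, value in hypothetical.items():
    print(f"{name}: {value:.3f}")
```

Lowering the cutoff (e.g., from −1.0 to −2.0 SD) moves individuals from the "flagged" to the "cleared" cells, which is why sensitivity falls and specificity rises together.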
One possible explanation for the low sensitivity is that our cases of amnestic MCI do not really have this condition. In fact, the RBANS Immediate and Delayed Memory Indexes in this group averaged 97.9 and 92.4, respectively. Although these two Indexes do fall approximately 1
SD below premorbid intellect, these two Memory Indexes still fall in the average range. It should be reiterated that all subjects in the current study were classified by scores on two other memory tests, the BVMT-R and the HVLT-R, to avoid circularity with the RBANS. The scores from these two measures tended to be more impaired, especially for the delayed recall measures (BVMT-R: Total Recall = 72.1, Delayed Recall = 69.2; HVLT-R: Total Recall = 90.7, Delayed Recall = 78.9; effect sizes [Cohen's
d] between intact and MCI for Delayed Recall: BVMT-R = 2.2, HVLT-R = 1.5). Furthermore, although there were some statistical differences between the MCI and intact groups on non-memory measures (e.g., COWAT, Animals, TMT, and SDMT), the MCI group generally performed in the average range on these measures (e.g., scores ranged from 39th to 63rd percentiles). On the basis of the results of these non-RBANS measures, our amnestic MCI subjects appear to have this condition, at least psychometrically. However, as noted in the “Materials and Methods” section, we did take some liberties with our application of the Petersen criteria for MCI (e.g., averaging two delayed recall measures, memory discrepancies from premorbid intellect, reliance on a single baseline assessment to determine MCI status), and these may have affected the classification of our sample, the resulting RBANS test characteristics, and the generalization of our findings to other studies. In one additional study that examined the RBANS in MCI,
Hobson et al. (2010) found considerably lower scores on the Delayed Memory Index than in the current sample (77.0 vs. 92.8, respectively). However, there were notable differences between these two samples (e.g., Hobson's sample was recruited from a Memory Disorder Clinic vs. our community-dwelling sample; Hobson's study used age-corrected scores vs. our age- and education-corrected scores; Hobson's study examined multiple subtypes of MCI vs. only amnestic MCI in ours).
As noted above, our method of classifying MCI required individuals to fall 1.5
SD below an estimate of premorbid intellect (i.e., WRAT-3 Reading). Some may view this approach as “unconventional,” as others in the field require individuals to fall 1.5
SD below the mean of normative data. However, the stricter criteria (i.e., 1.5
SD below the normative mean) might unfairly penalize individuals with relatively higher and lower intellectual functioning, as they have to present with more or less decline from premorbid levels before breaking the rigid cutoff, respectively. For example, an individual who is premorbidly in the high average range (e.g., 84th percentile) needs to decline by approximately 77 percentile points to break the cutoff of 1.5 SD below the normative mean. Conversely, an individual who is premorbidly in the low average range (e.g., 16th percentile) only needs to decline by approximately 9 percentile points to break this same diagnostic barrier. By using a more flexible and individualized barrier (i.e., 1.5
SD decline from one's premorbid level), decline (and the resulting diagnostic decisions) can be determined more comparably across individuals. Our method of approximating the MCI barrier is quite consistent with the literature. For example, the initial studies of MCI from the Mayo Clinic group used a threshold that was “generally 1.5 SDs below age- and education-matched control subjects” (
Petersen et al., 1999, p. 307). Within this same article (p. 305), the authors present means and standard deviations for their MCI subjects on several memory measures. When these means are compared to MOANS normative data for 79-year-olds, most fall at about 1.5
SD below the mean (e.g., Logical Memory II = scaled score of 5, Visual Reproductions II = scaled score of 7, RAVLT percent retention = scaled score of 6). However, these are “mean” scores, which suggests that some sizable minority of the sample had scores above this point. This trend of loosely defined MCI has carried throughout most of the Mayo Clinic MCI papers. Other authors have also viewed the MCI criteria as flexible (e.g.,
Bennett et al., 2002, p. 199: “judged to have cognitive impairment by a neuropsychologist but did not meet accepted criteria for dementia”, and the Logical Memory II data presented for their MCI group fell at a MOANS scaled score of 7;
Busse et al., 2003, p. 73: “more than one SD below age- and education-specific norms”;
Farias et al., 2009, p. 1152: “fell approximately 1.5 SDs below age-corrected norms”;
Fleischer et al., 2007, p. 2: “cutoff score approximately 1.5 to 2 SDs below the education adjusted norms”;
Griffith et al., 2006, p. 168: “objective memory impairment falling approximately 1.5 standard deviations or more below”;
Luis et al., 2004, p. 308: “cognitive impairment but of insufficient magnitude to negatively affect daily functioning”). Although these references do not encompass all MCI papers and their criteria for defining this state, they do suggest that there are many different definitions of MCI (both conventional and unconventional). So should one decide to use a rigid or flexible criterion for MCI? One opinion on this matter comes from Dr Ronald Petersen in his 2004 paper (p. 189):
In the literature, the cutoff score of 1.5 SD below age norms has been suggested by some investigators. In the original description of the MCI cohort followed at the Mayo Clinic, the MCI group's mean performance was 1.5 SD below their agemates. However, this was not a cutoff score, and of course, nearly half of the group had memory performance score falling somewhat <1.5 SD below the mean. This criterion should be interpreted in conjunction with the first criterion. The memory complaint is meant to represent a change in function for the person. The second criterion corroborates the complaint by attesting to and an actual impairment in performance. The clinician may be challenged by persons who are of either high intellect whose performance is now in the statistically ‘normal’ range, but this level of performance represents a change for that person, and by the person with a low education whose lower cognitive performance may not represent a change.
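The percentile arithmetic behind the 84th and 16th percentile examples given earlier can be checked with a short Python sketch. It assumes normally distributed scores, and the function names are hypothetical. The first function gives the percentile-point drop needed to reach a fixed cutoff of 1.5 SD below the normative mean; the second gives the individualized cutoff (in z-score units) that results from requiring a 1.5 SD decline from the premorbid level.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def drop_to_normative_cutoff(premorbid_percentile, cutoff_z=-1.5):
    """Percentile points a person must decline to reach a fixed normative cutoff."""
    cutoff_percentile = 100.0 * STANDARD_NORMAL.cdf(cutoff_z)  # about the 7th percentile
    return premorbid_percentile - cutoff_percentile


def individualized_cutoff_z(premorbid_percentile, decline_sd=1.5):
    """Cutoff z-score when impairment is defined relative to the premorbid level."""
    premorbid_z = STANDARD_NORMAL.inv_cdf(premorbid_percentile / 100.0)
    return premorbid_z - decline_sd


for percentile in (84, 16):
    points = round(drop_to_normative_cutoff(percentile))
    print(f"{percentile}th percentile: about {points} percentile points to the fixed cutoff")
# prints about 77 points for the 84th percentile and about 9 for the 16th
```

With the individualized function, every person must decline by the same 1.5 SD regardless of starting level, which is the sense in which the flexible criterion treats individuals more comparably.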
There are numerous examples in the literature to suggest that correcting for premorbid intellect is appropriate and wise in neuropsychology, especially in evaluating milder cognitive deficits in older adults. For example, Brooks and colleagues have recently published several papers that review the relatively high base rates of “impaired” performances in healthy subjects. In one (
Brooks et al., 2008), the authors specifically address the relevance to MCI, as they report that healthy individuals who are “premorbidly” low functioning tend to achieve more impaired scores than healthy individuals who are “premorbidly” high functioning. One consequence of a rigid criterion for impairment is that low-functioning individuals will be classified as MCI (or with other cognitive disorders) at a much higher rate. Another consequence is that high-functioning individuals will rarely be identified with cognitive impairments. For these latter individuals, there is the risk of missing an appropriate diagnosis because they have so much further to fall before passing a set threshold. In a related article (
Brooks et al., 2009), these authors recommend that premorbid functioning be considered when assessing evidence of cognitive decline. They also provide information that suggests that clinicians evaluating higher functioning individuals might use a more lenient criterion (e.g., 16th percentile) to define objective evidence of cognitive impairment. These authors are not alone in suggesting that intelligence (current or premorbid) provides value in assessing cognitive decline. The classic normative data for older adults (MOANS) have recently been re-calculated to adjust for IQ scores of patients (
Steinberg et al., 2005a,
2005b;
Steinberg, Bieliauskas, Smith, Ivnik, et al., 2005;
Steinberg, Bieliauskas, Smith, Langellotti, et al., 2005). Although any deviations from convention need to be supported with validation and longitudinal findings, these studies represent a growing trend within the field of neuropsychology to develop better methods for defining cognitive impairment, particularly in the elderly. For example, the ongoing revisions of the Diagnostic and Statistical Manual of Mental Disorders 5th Edition (
www.dsm5.org) suggest that premorbid intellect be considered (along with age, education, gender, and cultural factors) when determining if there has been a significant decline in cognition to support a diagnosis of Major Neurocognitive Disorder (formerly Dementia). Given our sample of highly educated individuals, our methods appear appropriate for capturing mild impairments in high-functioning individuals. Nonetheless, we also re-ran all our analyses after classifying individuals based on a stricter criterion for MCI (i.e., 1.5
SD below the normative mean). The results were very similar to those presented above, and the interested reader can contact the first author for a copy of those results.
Another explanation for the low sensitivity might be the clinical condition that we studied, as other studies comparing MCI to controls have generated similar results (
De Jager, Hogervorst, Combrinck, & Budge, 2003). It should not be surprising that a milder condition (e.g., MCI) separates less well from healthy controls than a more severe condition (e.g., AD). In clinical practice, it may be more feasible to tailor diagnostic decisions to the individual with some flexibility (e.g., weighting multiple sources of information and test data), whereas research requires more standardized cutoff scores that might somewhat arbitrarily separate a true continuum (e.g., cognitive functioning). The resulting mixed groups, when compared with distinct groups, could lead to lowered diagnostic accuracy.
There are several important limitations of this study. First, the classification of the current subjects was based almost entirely on cognitive test scores. Future studies should utilize additional clinical information to make this diagnosis (e.g., thorough physical examination, neuroimaging, biomarkers). Second, the amnestic subtype of MCI (single- or multidomain) was the only subtype examined in the current study, and these diagnostic accuracy estimates might not apply to non-amnestic MCI subtypes. Similarly, the diagnostic accuracy of the RBANS for other neuropsychiatric conditions with milder cognitive impairments (e.g., depression and substance abuse) should not be inferred from the current findings. Third, although most cognitive tests were corrected for age and education, three were not (BVMT-R, HVLT-R, and WRAT-3 Reading). These three tests were corrected only for the age of the participants using data from the test manuals. This inconsistency in the norming of the measures could create some anomalies in classification of the participants or possibly bias against the RBANS. Finally, the current sample was exclusively Caucasian and well-educated, so the generalizability of these findings to a more diverse sample is uncertain. Despite these limitations, the current study provides some information about the diagnostic accuracy of the RBANS in suspected MCI, although this information suggests caution when using this measure in patients with milder cognitive deficits, such as those seen in MCI.