Contrary to clinical lore, this study did not find the expected relationships between the Stroop Interference test and behavioral disinhibition or frontal lobe atrophy. In the first of the study's two main findings, performance on the Stroop Interference test showed no association with behavioral disinhibition after controlling for color naming speed, age, and global cognitive functioning. Indeed, Color Naming performance showed a stronger relationship with behavioral disinhibition than did Stroop Interference performance, and this association remained significant after controlling for age and MMSE. In the second main finding, poorer Stroop Interference performance was associated most significantly with greater parietal lobe atrophy bilaterally, but also with greater left MFG atrophy, and less ACC atrophy bilaterally, after controlling for color naming speed, age, global cognitive functioning, and intracranial volume. The relationship between Interference and parietal lobe atrophy remained even after individuals with Alzheimer's disease, who are liable to show the highest rates of parietal lobe atrophy, were removed from the sample.
The relationship between the Stroop Interference test and behavioral disinhibition highlighted the importance of accounting for Color Naming speed when using the Stroop for clinical or research purposes. Without controlling for color naming speed when investigating the Interference effect, any slowing on the Interference task could be due simply to slowing in processing speed. That is, if color naming speed is not controlled for when investigating the relationship between Stroop Interference and inhibition difficulty, it would be impossible to distinguish between general slowing and true inhibition difficulty. Indeed, our results suggest that a very large proportion of variance in Stroop Interference scores can be attributed to simple color naming speed. Subsequent analyses showed that before accounting for color naming speed, Stroop Interference performance showed a weak, but statistically significant, negative correlation with behavioral disinhibition. After controlling for Color Naming speed using hierarchical regression, Interference speed was no longer a significant predictor of behavioral disinhibition.
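The logic of entering Color Naming before Interference can be illustrated with a minimal sketch. This is not the analysis reported in the study, which also adjusted for age and global cognitive functioning; the four toy scores below are hypothetical and were constructed by hand so that everything Interference shares with disinhibition runs through Color Naming. The function names are likewise illustrative. The sketch computes the zero-order correlation and then the change in R² when Interference is added to a model that already contains Color Naming.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(y, x1, x2):
    """R-squared of an OLS model y ~ x1 + x2, from the three pairwise correlations."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical toy scores (NOT study data), chosen so the arithmetic can be
# checked by hand: Interference and disinhibition are related only through
# their shared dependence on Color Naming.
color_naming = [1, 2, 3, 4]
interference = [2, 1, 2, 5]
disinhibition = [8, 11, 4, 7]

# Before adjustment: a modest negative zero-order correlation.
r_zero_order = pearson(interference, disinhibition)

# Step 1 of the hierarchy: Color Naming alone predicts disinhibition.
r2_step1 = pearson(color_naming, disinhibition) ** 2

# Step 2: add Interference and see how much extra variance it explains.
r2_step2 = r2_two_predictors(disinhibition, color_naming, interference)
delta_r2 = r2_step2 - r2_step1

print(f"zero-order r = {r_zero_order:.3f}")
print(f"R2 step 1 = {r2_step1:.3f}, R2 step 2 = {r2_step2:.3f}, change = {delta_r2:.3f}")
```

In this constructed example the zero-order correlation is about −0.33, yet Interference adds essentially nothing to R² once Color Naming is in the model, which is the qualitative pattern described above. With real data the change in R² would be tested for significance rather than compared against zero.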
This pattern of results was consistent across several statistical approaches. In each set of analyses, Interference performance failed to show the expected significant negative association with behavioral disinhibition, after simple color naming speed had been accounted for. When using logistic regression or the sample of people who had some evidence of behavioral disinhibition (i.e., ratings of ‘1’ or higher on NPI Disinhibition), the results were identical to those of the hierarchical linear regression. The lack of a statistically or clinically-significant relationship between slowed Interference speed and behavioral disinhibition in the analyses focusing on only individuals who were showing some level of behavioral disinhibition suggests that, even before controlling for Color Naming speed, Stroop Interference tasks are insensitive to behavioral disinhibition. Moreover, it does not appear that an association between Interference speed and behavioral disinhibition in the present study was attenuated to non-significance due to a speed-accuracy tradeoff; Interference errors were not significant predictors of behavioral disinhibition. Put simply, it does not appear that people with good inhibition slowed down in order to minimize errors, and that individuals with high levels of disinhibition failed to slow down and consequently committed more errors.
Previous studies have similarly failed to find an association between Stroop Interference and measures of behavioral disinhibition after accounting for Color Naming speed (
Cheung, Mitsis, & Halperin, 2004;
Back-Madruga et al., 2002). Indeed,
Marra and colleagues (2007) found that patients with frontotemporal dementia (FTD) completed the Stroop test more quickly than patients with Alzheimer's disease (AD) or Progressive Nonfluent Aphasia (PNFA), and these same FTD patients were more likely to be rated as disinhibited on the NPI than the AD or PNFA patients, although the latter finding did not meet the Bonferroni-adjusted significance level. Taken together, FTD patients were both faster on the Stroop, and also more disinhibited, than AD or PNFA patients (
Marra et al., 2007), suggesting that individuals with greater levels of disinhibition might complete the Stroop more quickly than individuals with lower levels of disinhibition.
A similar pattern of counterintuitive results emerged when examining the relationship between Stroop Interference and frontal lobe atrophy, with Stroop Interference showing weak relationships with frontal lobe regions after accounting for Color Naming. In simple bivariate correlations, all the frontal regions, but not left or right ACC, showed significant positive associations with Stroop Interference performance, suggesting initially that individuals with frontal lobe atrophy tended to complete fewer items on Stroop Interference than individuals with less frontal lobe atrophy. These correlations are consistent with previous research that has found an association between slowed Interference speed and frontal lobe atrophy or frontal lobe lesions when Color Naming speed was not accounted for (
Foong et al., 1997;
Stuss et al., 2001;
Soderlund et al., 2004). Once Color Naming performance was accounted for using partial correlations, right and left MFG were the only frontal regions that were significantly positively associated with Interference speed. After controlling for Color Naming speed, age, intracranial volume, MMSE, and non-frontal brain regions using hierarchical regressions, poorer Stroop Interference performance was significantly associated with bilateral parietal atrophy and left MFG atrophy, but larger left and right ACC volumes. This finding was also generally consistent with previous research: when Color Naming speed is accounted for, Stroop Interference speed tends to show poor associations with frontal lobe damage (
Hanninen et al., 1997;
Stuss et al., 2001).
The strength of the association between poor Stroop Interference performance and parietal lobe atrophy, rather than frontal lobe atrophy, was among the most unexpected findings in the study. Parietal atrophy accounted for a larger amount of variance in Stroop Interference performance than did left MFG atrophy, the only frontal lobe region in which atrophy was significantly associated with poorer performance. Due to the association between bilateral parietal atrophy and poorer Stroop performance, the present study suggests that the Stroop is not a specific indicator of atrophy in frontal lobe regions among individuals with mild cognitive impairment or dementia. Replication of this finding would substantially alter the interpretation of Stroop performance when it is used in neuropsychological assessment for localization, since poor performance may be more indicative of parietal atrophy than left dorsolateral prefrontal cortex atrophy.
One potential explanation for the poor relationship between the Stroop, behavioral disinhibition, and most of the regions of frontal lobe atrophy measured is suggested by a distinction between different types of inhibition and the neurological circuitry that may underlie them. Inhibition may not be a unitary construct (
Friedman & Miyake, 2004;
Nigg, 2000;
Harnishfeger, 1995), and while there is little evidence from this or other studies that the Stroop measures behavioral disinhibition (
Aarsland et al., 1999;
Back-Madruga et al., 2002;
Cheung, Mitsis, & Halperin, 2004), it may better measure a cognitive type of inhibition. Indeed, recent research examining frontal brain regions associated with behavioral disinhibition and executive dysfunction found a double dissociation, with atrophy in the OFC predicting NPI behavioral disinhibition, and atrophy in the MFG predicting executive functioning, including Stroop performance (
Krueger et al., under review). The possibility that neuroanatomically distinct pathways are necessary for cognitive and behavioral inhibition could help explain the poor association between Interference and behavioral disinhibition, the counterintuitive finding of an association between ACC atrophy and superior Interference performance, and why Interference performance was positively associated only with left MFG (which includes the left dorsolateral prefrontal cortex) rather than diffuse frontal regions.
In his theoretically-guided taxonomy of inhibitory functions,
Nigg (2000) classified the Stroop as measuring ‘interference control,’ or the ability to suppress competing internal or external stimuli when making a response. He distinguished interference control from intentional motor inhibition, which he described as “inhibiting a dominant or prepotent response” (p. 223), and from inhibition in the context of attentional orienting, oculomotor inhibition, motivational inhibition, and automatic inhibition of attention (
Nigg, 2000). Using an empirical approach,
Friedman and Miyake (2004) used confirmatory factor analysis and structural equation modeling to investigate the relationship between different types of inhibition and their measurement. In their most parsimonious model, they found the Stroop task to load on a factor of Response-Distractor Inhibition along with antisaccade, stop-signal, Eriksen flanker, word naming, and shape matching tasks. Demonstrating an ability to discriminate between different types of inhibition, the Response-Distractor Inhibition factor correlated very poorly (0.01) with a Resistance to Proactive Interference factor of inhibition, which included three tasks that required individuals to inhibit irrelevant intrusions from working memory. They did not, however, include measures of behavioral or motivational disinhibition, so it is impossible to know from this research the extent to which behavioral/motivational and cognitive disinhibition can be empirically dissociated. Nevertheless, both of these taxonomies of inhibitory functions would likely predict the Stroop to have a poor relationship with behavioral disinhibition such as that measured by the NPI.
A differentiation between inhibition subtypes is supported by the distinct roles that are theorized for different frontal-subcortical circuits, with dorsolateral frontal-subcortical circuitry believed to be involved in working memory and attentional or cognitive control (
Petrides, 2000;
MacDonald et al., 2000), orbitofrontal-subcortical circuitry involved in social and behavioral inhibition (
Fuster, 1997), and ACC circuitry involved during tasks with conflicting information (
Carter, 2000;
MacDonald et al., 2000). Within this context, real-world behavioral inhibition might be expected to be most reliant upon healthy orbitofrontal cortex and circuitry, whereas the more cognitive Stroop task would be expected to be most reliant upon healthy dorsolateral and ACC cortex and circuitry, if fast Stroop performance requires healthy frontal cortex. Indeed, out of all the frontal regions, behavioral disinhibition on the NPI showed the strongest negative correlation with right and left OFC volumes (r=−0.43 for left OFC and r=−0.54 for right OFC), suggesting that greater OFC atrophy is associated with greater behavioral disinhibition. This correlation is also consistent with the distinction that
Dillon and Pizzagalli (2007) made between response inhibition and cognitive set inhibition in their review of the neuroanatomical substrates of three types of inhibition, insofar as it suggests that OFC is involved in inhibition of affectively meaningful, or reinforced, responses such as those that likely guide much of our real-world behavior. Of the frontal lobe regions, the Stroop Interference task, in contrast, showed (in the hierarchical regressions) a significant negative association with only left middle frontal gyrus volume, the area that corresponds to left dorsolateral prefrontal cortex. This relationship between left dorsolateral prefrontal cortex atrophy and poorer Stroop Interference performance is consistent with
MacDonald et al.'s (2000) finding that dorsolateral prefrontal cortex is involved in attentional control necessary for performing the Stroop Interference task.
Although its contribution is small, explaining around 1% of the variance, we did find that ACC volumes bilaterally were negatively associated with Interference performance. Conceptualizing the ACC as a structure that is involved in conflict monitoring (e.g.,
Carter & van Veen, 2007), rather than one involved in inhibition, renders the present finding less surprising. Within this model, when an individual performs the Stroop task, the normally-functioning ACC should become activated upon recognizing the conflict in the task (i.e., the conflict between the overlearned word reading response and the incongruent ink color identification response), and the individual's performance should slow modestly. The ACC should then recruit frontal regions including the dorsolateral prefrontal cortex (DLPFC) and possibly the medial frontal cortex to improve cognitive control (e.g., improve focus on the response-appropriate stimuli) and perform the task more efficiently (
Carter & van Veen, 2007;
Nessler et al., 2007). However, if the DLPFC is working sub-optimally, the individual may have difficulty improving cognitive control, thus leading to poorer performance on the Stroop task despite good ACC functioning. Increased conflict monitoring in the ACC within the context of sub-optimal DLPFC functioning thus may impede Stroop performance by causing individuals to slow down despite being unable to benefit from improved cognitive control. Indeed, even cognitively normal older adults may show intact conflict monitoring but reduced engagement of cognitive control processes when faced with high-conflict tasks such as the Stroop (
Nessler et al., 2007).
If we consider regional volume to be an indicator of possible neurological activity such that smaller ACC volumes are likely to be less active than larger ACC volumes, our present finding is also consistent with previous findings of an association between greater activation and slower Stroop performance (
Raz et al., 2005;
MacDonald et al., 2000). Similarly,
Milham et al. (2002) found greater ACC activation using functional MRI during the Stroop amongst older adults compared to younger adults, and interpreted this ACC activation as a response to poorer attentional control and greater propensity for errors amongst older adults, rather than a necessary part of inhibition. Within the context of the ACC's apparent role in conflict-monitoring (
Carter & van Veen, 2007;
Carter et al., 2000; Gehring & Knight, 2000;
Carter et al., 1998), the relationship between larger ACC volume and slower Stroop Interference speed is consistent with the idea that ACC involvement during tasks of conflict might serve a cautionary function of slowing responses, or that slowing may be a by-product of monitoring during conflict-laden tasks.
Despite the null findings in the first portion of the study, it remains possible that the Stroop could better measure inhibition deficits in this population if other—possibly more cognitive or attentional—types of inhibition were investigated. Because we did not include measures of different types of inhibition, one limitation of the present study is that it cannot shed any light on whether the Stroop Interference task may be a better measure of another type of disinhibition than behavioral disinhibition. This may be an important avenue of future research within samples of individuals who are at high risk for inhibitory deficits, given the evidence and theoretical basis for distinct inhibitory processes (
Friedman & Miyake, 2004;
Nigg, 2000;
Harnishfeger, 1995). In addition, because our study focused entirely on older adults with neurodegenerative disorders, it is possible that the Stroop may have stronger associations with disinhibition in other populations of individuals with inhibitory deficits.
The sole use of an informant report as a measure of behavioral disinhibition is another limitation of the present study. Although informant ratings of patient behavior have several advantages over a clinician's ratings of a patient's behavioral disinhibition (such as knowledge of what a patient was like prior to the onset of a neurodegenerative disease, and the ability to observe the patient over a longer period of time and a range of situations), rating scales based upon informant reports may have their own problems. Because different caregivers may have different internal criteria for endorsing symptoms of behavioral disinhibition in the patient, inter-informant error is introduced that would not be present if one clinician rated each patient on the level of disinhibition exhibited during an evaluation. Such error variance could come from inter-informant differences in observational skills, willingness to endorse undesirable behavior, and overall level of stress. In the present study, error variance is a particular problem since we are reporting primarily null findings, and increased error variance reduces one's ability to find significant results.
However, if added inter-informant error variance played a large role in preventing us from finding significant results, we would expect that Interference would show poor associations with other indices from the NPI, since all indices from the NPI would logically be expected to have similar inter-informant error variance. When we examined bivariate associations between Interference and other indices from the NPI, Interference was significantly negatively correlated with Agitation (p<0.01), Depression (p<0.01), Anxiety (p<0.01), Euphoria (p<0.05), Apathy (p<0.0001), Irritability (p<0.01), Aberrant Motor Behavior (p<0.05), Night-time Behavior Disturbances (p<0.01), and Eating Behavior Changes (p<0.001). After controlling for Color Naming, only NPI Anxiety remained significantly associated with Stroop Interference (ρ=−0.21, p<0.05). As mentioned earlier, previous research has also found an association between Stroop Interference and other indices of the NPI, but not Disinhibition (
Aarsland et al., 1999;
Back-Madruga et al., 2002), suggesting that informant-based measures of behavioral disinhibition can show significant correlations with neuropsychological test results.
Because statistical factors can also influence one's ability to find significant results, multiple statistical approaches were used within the first part of the study, with each approach compensating for weaknesses in the other approaches. Hierarchical linear regression allowed us to model the data in a manner that is among the most commonly used in psychology research, and which preserves good statistical power. However, because the NPI Disinhibition variable was ordinal rather than continuous, and could not be made to approximate a good normal distribution even with data transformations, we also used the more statistically appropriate—albeit less powerful and less commonly used—approach of proportional odds ordinal logistic regression. Finally, we examined the association between Color-Word Interference and behavioral disinhibition using only individuals who had received ratings of ‘1’ or higher on the Disinhibition rating scale, replicating the poor relationship between Color-Word Interference and behavioral disinhibition even among a sample of individuals who showed some evidence of disinhibition. This last approach reduced the sample size, therefore reducing statistical power, but it allowed a focus upon the individuals at highest risk for behavioral disinhibition. As mentioned earlier, each of these approaches resulted in null findings, showing no association between Stroop Interference and behavioral disinhibition after controlling for color naming.
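For readers less familiar with the ordinal approach, the general form of a proportional odds model can be written out as follows. This is the textbook form rather than a specification taken from the study: the covariate set shown is an assumption based on the covariates named elsewhere in this section, Y stands for the NPI Disinhibition rating with J ordered categories, and the sign convention on the coefficients differs between software packages.

```latex
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  = \alpha_j - \left( \beta_1\,\mathrm{Interference}
                    + \beta_2\,\mathrm{ColorNaming}
                    + \beta_3\,\mathrm{Age}
                    + \beta_4\,\mathrm{MMSE} \right),
  \qquad j = 1, \dots, J-1
```

The thresholds \alpha_j differ across rating levels, whereas each \beta is shared across all thresholds; that shared slope is the proportional odds assumption. Because the model uses only the ordering of the ratings, it does not require the outcome to be normally distributed.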
Although the failure to reject a null hypothesis does not mean that the null hypothesis should be accepted, we thought it noteworthy that the Stroop interference task failed to be a significant predictor of behavioral disinhibition and atrophy in most regions of the frontal lobes. This failure occurred despite a large sample size, multiple methods of examining the data, and focusing upon a population at high risk for behavioral disinhibition, a population for whom the Stroop is thought to be a useful and meaningful measure. The findings not only indicate that there was insufficient evidence to reject the null hypothesis, but also show a clinically insignificant relationship between the Stroop and behavioral disinhibition once basic color naming speed is accounted for. While the present study did not examine other types of inhibition, previous studies have not found strong evidence that Stroop Interference is a good measure of inhibition after color naming time is accounted for (
Shilling et al., 2002;
Earles et al., 1997;
Salthouse & Meinz, 1995;
Shum et al., 1990), as noted earlier. Moreover, the present findings showed unimpressive or counterintuitive relationships between the Stroop and atrophy in frontal lobe regions, with parietal lobe atrophy predicting poor Stroop performance better than dorsolateral prefrontal cortex atrophy, and with ACC atrophy predicting improved Stroop performance. Hence, the data compelled us to come to the unconventional conclusion of accepting the null hypothesis in this instance: the Stroop does not appear to be a meaningful measure of behavioral disinhibition (although it may be a measure of cognitive inhibition) or a specific indicator of frontal lobe damage. We suggest that neuropsychologists be cautious in interpreting results from the Stroop and avoid interpreting them in isolation from color naming speed.