The Stroop effect has captivated researchers for over 75 years and has resulted in a vast (and daunting) body of literature. Versions of the Stroop paradigm have been used to study diverse cognitive phenomena, such as selective attention, inhibition and executive control, conflict detection and monitoring, and automaticity and lexical access (see MacLeod, 1991), and have been used clinically to test for deficits in many areas (Green et al., 2010; Peckham et al., 2010; Pukrop and Klosterkötter, 2010). In the field of bilingualism, the Stroop paradigm has commonly been used to analyze the degree of interference, or alternatively the degree of automaticity of access to words, in each language and across languages (see Francis, 1999, for a review). The color word Stroop task (Stroop, 1935) has participants name the ink color of words printed in a congruent (RED in red) or incongruent (RED in green) ink color. The Stroop effect occurs when incongruent items elicit slower naming times than congruent items, which is generally thought to reflect interference due to the automaticity of reading words compared to naming colors. Bilinguals add the complexity of being able to perform the Stroop task in both of their languages. Moreover, the language of the distracter word and the naming language can match (within language) or not (between language), so that interference can be measured both within each language and between languages. Because the Stroop paradigm taps into a complex set of cognitive processes, there is still much debate over the nature of this powerful effect. The goal of the current study is to examine the behavioral and neural correlates of the bilingual Stroop task in order to inform our understanding of word access, attention, and inhibition in the bilingual brain, as well as of the nature of the Stroop effect more generally.
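The within/between and congruent/incongruent distinctions above define a 2 × 2 classification of trials. As a minimal Python sketch, assuming a hypothetical four-color English/Spanish lexicon (the names LEXICON, TrialLabel, and label_trial are illustrative and are not the study's materials), each trial can be labeled from the distracter word, its ink color, and the naming language:

```python
from dataclasses import dataclass

# Hypothetical lexicon: each color word maps to (language, color concept).
LEXICON = {
    "RED": ("English", "red"),
    "GREEN": ("English", "green"),
    "BLUE": ("English", "blue"),
    "YELLOW": ("English", "yellow"),
    "ROJO": ("Spanish", "red"),
    "VERDE": ("Spanish", "green"),
    "AZUL": ("Spanish", "blue"),
    "AMARILLO": ("Spanish", "yellow"),
}


@dataclass(frozen=True)
class TrialLabel:
    congruency: str  # "congruent" or "incongruent"
    relation: str    # "within" or "between"


def label_trial(word: str, ink: str, naming_language: str) -> TrialLabel:
    """Label a bilingual Stroop trial by color congruency and language relation."""
    word_language, concept = LEXICON[word.upper()]
    # Congruency is defined on the color concept, not on the word form.
    congruency = "congruent" if concept == ink.lower() else "incongruent"
    # Within = distracter written in the naming language; between = the other language.
    relation = "within" if word_language == naming_language else "between"
    return TrialLabel(congruency, relation)


print(label_trial("RED", "green", "English"))   # incongruent, within
print(label_trial("ROJO", "red", "English"))    # congruent, between
print(label_trial("VERDE", "blue", "English"))  # incongruent, between
```

Under this labeling, ROJO printed in red while naming in English counts as a congruent between language trial, because congruence is defined at the level of the color concept rather than the word form.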
The Stroop effect has commonly been explained as a response-level conflict, by accounts such as relative speed of processing (in which competition occurs strictly at response, because the color must be selected over the more quickly processed word) and automaticity of access (in which activation spreads faster through a network of concepts, and attentional demands are correspondingly smaller, for more automatic processes such as reading than for naming; see MacLeod, 1991). Connectionist models of the Stroop task, such as Cohen et al.’s (1990) model, propose that interference can arise at any level of processing, from input to output. Information from the color and the word is processed in parallel in a distributed network whose interconnections are weighted based on experience. Attention plays a critical role in tuning these weights, such that an attentional set can be created for the specific task, and even for the specific response set, simply by virtue of the strength of the connections between the attended items. MacLeod (1991; MacLeod and MacDonald, 2000) has argued that connectionist models offer a more parsimonious account of the many factors that affect performance on Stroop tasks, accommodating both speed-of-processing and automaticity differences. However, these models do not fully address the nature of the bilingual Stroop effect.
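To illustrate how experience-weighted parallel pathways plus attention can produce the classic asymmetry (words interfere with color naming far more than colors interfere with word reading), here is a deliberately simplified two-pathway accumulator in Python. It is not Cohen et al.'s (1990) network: for brevity, conflict is placed only at the output, and every parameter (pathway strengths standing in for practice, attentional gains, step size, threshold) is an arbitrary assumption.

```python
def cycles_to_respond(attend_color: bool, congruent: bool,
                      word_strength: float = 2.0, color_strength: float = 1.0,
                      attended_gain: float = 1.0, unattended_gain: float = 0.2,
                      step: float = 0.1, threshold: float = 1.0) -> int:
    """Toy accumulator: count cycles until evidence for the correct response reaches threshold.

    Pathway strengths stand in for experience-weighted connections (reading is
    more practiced, so word_strength > color_strength); attention multiplies
    each pathway by a gain. All values are illustrative assumptions.
    """
    color_input = color_strength * (attended_gain if attend_color else unattended_gain)
    word_input = word_strength * (unattended_gain if attend_color else attended_gain)
    relevant, irrelevant = (color_input, word_input) if attend_color else (word_input, color_input)
    # The irrelevant dimension supports the correct response on congruent trials
    # and a competing response on incongruent trials.
    drift = relevant + irrelevant if congruent else relevant - irrelevant
    if drift <= 0:
        raise ValueError("irrelevant pathway dominates; the correct response never wins")
    evidence, cycles = 0.0, 0
    while evidence < threshold:
        evidence += drift * step
        cycles += 1
    return cycles


for task, attend_color in (("color naming", True), ("word reading", False)):
    congruent = cycles_to_respond(attend_color, congruent=True)
    incongruent = cycles_to_respond(attend_color, congruent=False)
    print(f"{task}: interference = {incongruent - congruent} cycles")
```

With these default values, color naming shows 9 cycles of interference and word reading only 1, purely because the word pathway is stronger; attention keeps the weaker color pathway in control without eliminating the conflict.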
The Stroop effect is modulated by factors unique to operating in a bilingual mode. There is even some evidence that bilinguals can perform better on the Stroop task than monolinguals (Bialystok et al., 2008), an advantage thought to emerge from the cognitive demands of managing two languages. Individual factors, such as dominance and relative proficiency in the languages (Mägiste, 1985; Chen and Ho, 1986; Tzelgov et al., 1990; Francis, 1999; Rosselli et al., 2002; Zied et al., 2004; Gasquoine et al., 2007), and form-level factors of the stimuli, such as orthographic or phonological overlap between the languages (Preston and Lambert, 1969; Roelofs, 2003), both affect performance on the Stroop task. Bilinguals with one dominant language (herein, unbalanced bilinguals) experience greater Stroop interference when performing in the dominant than in the weaker language on within language trials, and experience more interference from distracter words written in the dominant than in the weaker language on between language trials. In contrast, bilinguals with equivalent proficiency in both languages (herein, balanced bilinguals) generally show no difference in the amount of interference across their languages, whether naming within or between languages. This pattern has been shown to change as the relative proficiency of a bilingual’s languages changes (Mägiste, 1984, 1985; Chen and Ho, 1986).
In addition, bilinguals experience Stroop interference of different magnitudes depending on the degree of overlap of the word forms across languages (Sumiya and Healy, 2004). When color words share orthographic features across languages (green, grün), the magnitude of the Stroop effect is equivalent within a language (written and naming languages are the same) and between languages (Roelofs, 2003). However, when there is no orthographic overlap across languages (black, schwarz), the within language Stroop effect (incongruent versus congruent) is on average twice the magnitude of the between language effect (Francis, 1999). This has recently been referred to as the within language Stroop superiority effect (WLSSE; Goldfarb and Tzelgov, 2007), but we feel this label inappropriately deemphasizes the importance of the between language effect. Therefore, we refer to this difference between the within and between language Stroop effects herein as the BWLS or the bilingual Stroop effect, interchangeably. This phenomenon was first observed by Dalrymple-Alford (1968), Dyer (1971), and Preston and Lambert (1969) and has since been replicated across several languages and tasks (Dyer, 1971; Chen and Ho, 1986; Tzelgov et al., 1990; Goldfarb and Tzelgov, 2007; see reviews by MacLeod, 1991; Francis, 1999). Spanish–English bilinguals (our target sample) generally show this BWLS (Preston and Lambert, 1969; Dyer, 1971), with few exceptions (Rosselli et al., 2002).
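Operationally, the BWLS is a difference of differences: the within language Stroop effect (mean incongruent RT minus mean congruent RT) minus the between language Stroop effect. A minimal Python sketch, assuming trials are already labeled by language relation and congruency (the function names and the RT values below are invented for illustration and are not data from any study):

```python
from statistics import mean


def stroop_effects(trials):
    """Return {relation: mean incongruent RT - mean congruent RT} for each language relation."""
    rts = {}
    for relation, congruency, rt in trials:
        rts.setdefault((relation, congruency), []).append(rt)
    relations = sorted({relation for relation, _ in rts})
    return {r: mean(rts[(r, "incongruent")]) - mean(rts[(r, "congruent")]) for r in relations}


def bwls(trials):
    """Between-within language Stroop difference: within effect minus between effect."""
    effects = stroop_effects(trials)
    return effects["within"] - effects["between"]


# Illustrative RTs in ms (invented for the example, not data).
example = [
    ("within", "congruent", 600), ("within", "congruent", 620),
    ("within", "incongruent", 700), ("within", "incongruent", 720),
    ("between", "congruent", 610), ("between", "congruent", 630),
    ("between", "incongruent", 660), ("between", "incongruent", 680),
]
print(stroop_effects(example))  # {'between': 50, 'within': 100}
print(bwls(example))            # 50
```

A positive BWLS means the within language Stroop effect is larger; in practice this difference would be computed per participant and then tested across participants.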
Under the accounts of the Stroop effect discussed above, which do not directly address the bilingual language system, it is clear how the proficiency of a language could affect the automaticity and/or speed of processing of the words in each language, but it is not clear how within language distracters could elicit a significantly larger effect than between language distracters without imposing further restrictions on the processors. This complexity arises because bilinguals have two lexical representations for a single concept (“red” and “rojo” for the concept RED; Okuniewska, 2007). There is growing support for a model of bilingual lexical access in which both languages are non-selectively activated, at least at some stages of word recognition, even when processing demands are restricted to one language (Green, 1998; Spivey and Marian, 1999; Dijkstra and Van Heuven, 2002; Rodriguez-Fornells et al., 2005; Costa et al., 2006; Sunderman and Kroll, 2006). These lexical items must be kept at bay when they are not needed, but there is less consensus about how bilinguals, particularly those with high proficiency in a second language, prevent cross language interference.
Some contend that a mechanism of inhibition is required (Green, 1998; Kroll et al., 2010), while others propose that only language-relevant items are “flagged” when attending to one language during a task, creating an attentional set of plausible responses (Roelofs, 2003, 2010). A third account proposes a mechanism of access through activation thresholds, similar to other connectionist models (Dijkstra and Van Heuven, 2002). Activation can spread between languages at various levels of processing, from semantic (Dijkstra et al., 1998; Lemhöfer and Dijkstra, 2004) to orthographic (Dijkstra et al., 1998; Jared and Kroll, 2001), and as a function of proficiency (see also Sunderman and Kroll, 2006, for a different account). Only one of these models has addressed the BWLS directly, claiming that it is equivalent to a response set effect in monolinguals (Roelofs, 2003, 2010; Goldfarb and Tzelgov, 2007).
A response set effect (or membership effect) is observed when distracter words that are actively used for responding on the task, e.g., GREEN, RED, YELLOW, BLUE, cause more interference (a larger Stroop effect) than other color words that are not being actively used to respond, e.g., PINK (Klein, 1964; Proctor, 1978; Glaser and Glaser, 1989; Lamers et al., 2010). Most accounts of the response set effect propose that it arises at the response stage and not during access to meaning. Cohen et al. (1990) describe response set effects as occurring at the output level of processing, through attentional selection of a set of relevant responses. In a slightly different account, Roelofs (2003, 2010) restricts the response set effect to the response level, but does so by “flagging” the response-relevant items at the conceptual level in the multi-tiered WEAVER++ model. The flag results in setting and maintaining an attentional set for the response-relevant items (see also Treisman and Fearnley, 1969), shielding valid responses from interference anywhere except at the output layer (response selection). Hence, response set effects elicit response conflict not because the response-irrelevant words elicit competing responses directly, but because activation spreads from them to the response set at the semantic level. It has been argued that this attentional set account explains the response set effect better than models that propose inhibition of irrelevant responses during stimulus evaluation (see Lamers et al., 2010). Roelofs has argued that the BWLS can be explained parsimoniously as a response set effect, a phenomenon already established with monolingual data. Like the word PINK in the example above, the between language words, that is, words that are viewed but not actively prepared for naming (e.g., VERDE, ROJO, AMARILLO, AZUL when naming in English), receive less activation than the equivalent within language response set words. In this way, the BWLS would be caused by differential spread of activation from the response set to related color words in the other language. If this is the case, then response set items should always receive greater activation, and color-incongruent items should be named more slowly when the distracter is in the response-relevant language than when it is in the irrelevant language. Similarly, the neural correlate of the BWLS should reflect this differential spread of activation, perhaps as a modulation of amplitude from response-relevant to irrelevant but related words.
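The prediction of the response set account can be made concrete with a small Python sketch. It is not WEAVER++: it assumes English is the naming language, so only English color words are flagged, and it assumes that an unflagged Spanish distracter reaches the response set only through its translation, attenuated by a hypothetical factor SPREAD. The base time, gain, and SPREAD values are arbitrary placeholders, not estimates from any study.

```python
# Hypothetical parameters (placeholders, not fitted values).
BASE_RT_MS = 600.0  # naming time with no competing distracter
GAIN_MS = 100.0     # extra naming time per unit of competitor activation
SPREAD = 0.5        # fraction of activation an unflagged word passes to the response set

# Assumption: naming in English, so English color words form the flagged response set.
RESPONSE_SET = {"RED", "GREEN", "YELLOW", "BLUE"}
TRANSLATIONS = {"ROJO": "RED", "VERDE": "GREEN", "AMARILLO": "YELLOW", "AZUL": "BLUE"}


def competitor_activation(word: str, ink: str) -> float:
    """Activation that a distracter delivers to a competing member of the response set."""
    word, ink = word.upper(), ink.upper()
    if word in RESPONSE_SET:
        # A flagged response set member competes directly at response selection.
        target, activation = word, 1.0
    elif word in TRANSLATIONS:
        # An unflagged translation competes only via spread to its response set equivalent.
        target, activation = TRANSLATIONS[word], SPREAD
    else:
        return 0.0
    # If the activated response matches the ink color, there is no competitor.
    return 0.0 if target == ink else activation


def predicted_rt(word: str, ink: str) -> float:
    return BASE_RT_MS + GAIN_MS * competitor_activation(word, ink)


for word, ink in [("RED", "red"), ("RED", "green"), ("ROJO", "green")]:
    print(word, ink, predicted_rt(word, ink))
```

Any SPREAD value below 1 makes the within language Stroop effect (here 700 - 600 = 100 ms) larger than the between language effect (650 - 600 = 50 ms); the sketch captures the direction of the BWLS predicted by this account, not its size.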
This is the first study to use event-related potentials (ERPs) to address the source of the BWLS. In recent years, the debate over the source of Stroop interference more generally has been informed by electrophysiological techniques, which provide a way of experimentally disentangling semantic and response level effects. Scalp-recorded ERPs, which have extraordinary temporal resolution (on the order of milliseconds), are especially well suited to investigating the timing of cognitive events. Early ERP studies of Stroop interference focused on the P300, a component whose latency varies with stimulus evaluation but not with response selection (Kutas et al., 1977; for a review of the P300, see Polich, 2007). Because P300 latency is insensitive to color congruence on the Stroop task, these studies concluded that the Stroop effect must occur later in processing, that is, at response selection (Duncan-Johnson and Kopell, 1981; Ilan and Polich, 1999; Rosenfeld and Skogsberg, 2006; but see Lansbergen and Kenemans, 2008, who found modulation of the P300 when Stroop trials occurred with low probability).
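The P300 argument turns on a component's latency rather than its amplitude. A minimal Python sketch of peak-latency measurement on synthetic waveforms shows how two conditions can differ in amplitude yet share the same latency. The 250 Hz sampling rate, the -100 ms epoch start, the 250–600 ms search window, and the function names are assumptions for illustration, not parameters of the cited studies.

```python
def peak_latency_ms(erp, srate_hz, epoch_start_ms, window_ms):
    """Return the time (ms) of the most positive sample inside window_ms (inclusive)."""
    start_ms, end_ms = window_ms
    best_time, best_value = None, float("-inf")
    for i, value in enumerate(erp):
        t = epoch_start_ms + i * 1000.0 / srate_hz
        if start_ms <= t <= end_ms and value > best_value:
            best_time, best_value = t, value
    if best_time is None:
        raise ValueError("search window does not overlap the epoch")
    return best_time


def triangular_wave(peak_ms, amplitude_uv, srate_hz=250, epoch_start_ms=-100, n_samples=225):
    """Synthetic positive deflection peaking at peak_ms (illustration only)."""
    times = [epoch_start_ms + i * 1000.0 / srate_hz for i in range(n_samples)]
    return [amplitude_uv * max(0.0, 1.0 - abs(t - peak_ms) / 150.0) for t in times]


# Same peak latency, different amplitude: a latency measure does not distinguish these.
congruent = triangular_wave(peak_ms=400, amplitude_uv=8.0)
incongruent = triangular_wave(peak_ms=400, amplitude_uv=6.0)
print(peak_latency_ms(congruent, 250, -100, (250, 600)))    # 400.0
print(peak_latency_ms(incongruent, 250, -100, (250, 600)))  # 400.0
```

Real P300 analyses operate on averaged, filtered data and often use more robust latency measures than a single maximum, but the distinction between "when" and "how large" is the same.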
In fact, robust Stroop effects have been observed later in time, at the N450 (or medial frontal negativity, MFN) and the conflict sustained potential (conflict SP; Rebai et al., 1997; West and Alain, 1999; Liotti et al., 2000; Markela-Lerenc et al., 2004; West et al., 2004, 2005; Larson et al., 2009). While the functional significance of these components is not yet fully understood, they are thought to index different levels of conflict processing and are distinguished both by what modulates them and by their topographical distribution. The conflict SP, which can vary in latency and duration with task demands, generally follows the N450 and shows greater amplitude for color-incongruent than for congruent trials (West and Alain, 1999; Liotti et al., 2000; West, 2003; Markela-Lerenc et al., 2004; West et al., 2005; Larson et al., 2009). The activity in this window may reflect a complex of cognitive processes, including response selection, response monitoring, and conflict adaptation, with different processes reflected in different regions of the SP (West et al., 2005; Chen et al., 2011).
The N450 precedes the SP as a medial fronto-central negativity between 300 and 500 ms after stimulus onset. It is more negative in amplitude for color-incongruent than color-congruent stimuli, and increasing the degree of conflict increases N450 amplitude (West and Alain, 2000). Although its timing can vary with task demands, the N450 has been observed on a variety of Stroop-like tasks (West et al., 2005), with both covert (silent naming) and overt (naming aloud) responses (Liotti et al., 2000). The component’s neural generators have been source-localized to the anterior cingulate cortex (ACC; West, 2003; Markela-Lerenc et al., 2004). Some have argued that the ACC is responsible for “directing attention to a goal, even in the absence of conflict” (MacLeod and MacDonald, 2000), while others contend that it is responsible for conflict detection and monitoring (Van Veen and Carter, 2002; Carter and Van Veen, 2007) and that separate parts of the ACC respond to semantic (stimulus) and response conflict (Roelofs, 2003; van Veen and Carter, 2005; Wendt et al., 2007; Aarts et al., 2009; Bialystok and Craik, 2010). At least one study suggests that the ACC is more engaged by between- than within-language processing, in order to prevent interference from the non-target language (Abutalebi et al., 2008).
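An N450 effect of the kind described above is commonly quantified as a difference in mean amplitude within the component's time window. The Python sketch below baseline-corrects each condition average, takes the mean amplitude from 300 to 500 ms, and subtracts congruent from incongruent, so a negative value indicates a larger N450 for incongruent trials. The sampling rate, epoch start, pre-stimulus baseline, and function names are assumptions; real pipelines (for example, MNE-Python) additionally handle channels, filtering, and artifact rejection.

```python
def sample_time_ms(i, srate_hz, epoch_start_ms):
    return epoch_start_ms + i * 1000.0 / srate_hz


def baseline_correct(erp, srate_hz, epoch_start_ms):
    """Subtract the mean of the pre-stimulus (t < 0 ms) samples."""
    baseline = [v for i, v in enumerate(erp) if sample_time_ms(i, srate_hz, epoch_start_ms) < 0]
    offset = sum(baseline) / len(baseline)
    return [v - offset for v in erp]


def mean_amplitude(erp, srate_hz, epoch_start_ms, start_ms, end_ms):
    """Mean amplitude over samples with start_ms <= t < end_ms."""
    window = [v for i, v in enumerate(erp)
              if start_ms <= sample_time_ms(i, srate_hz, epoch_start_ms) < end_ms]
    return sum(window) / len(window)


def n450_effect(incongruent, congruent, srate_hz=250, epoch_start_ms=-100,
                window_ms=(300, 500)):
    """Incongruent minus congruent mean amplitude; negative values mean a larger N450."""
    inc = baseline_correct(incongruent, srate_hz, epoch_start_ms)
    con = baseline_correct(congruent, srate_hz, epoch_start_ms)
    return (mean_amplitude(inc, srate_hz, epoch_start_ms, *window_ms)
            - mean_amplitude(con, srate_hz, epoch_start_ms, *window_ms))


# Synthetic averages (microvolts): a constant 1 uV offset everywhere, plus a
# -2 uV deflection from 300 to 500 ms in the incongruent condition only.
n = 225  # -100 ms to 796 ms at 250 Hz
times = [sample_time_ms(i, 250, -100) for i in range(n)]
congruent = [1.0 for _ in times]
incongruent = [1.0 - (2.0 if 300 <= t < 500 else 0.0) for t in times]
print(n450_effect(incongruent, congruent))  # -2.0
```

Baseline correction removes the constant offset, so only the condition difference inside the window survives in the effect.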
The N450 effect has been observed for both response and non-response conflict on a counting task, suggesting that it might be sensitive both to incongruent but response-eligible items (i.e., response set members) and to incongruent but response-ineligible items (West et al., 2004). This would suggest that both within and between language words might modulate N450 amplitude. However, a more recent study showed that only response conflict, and not stimulus conflict, modulated the N450 on a color word Stroop task with a 2:1 mapping of colors to responses (Chen et al., 2011). By mapping two color words onto each finger (index finger, BLUE/GRAY; middle finger, GREEN/WHITE; ring finger, YELLOW/PURPLE), Chen et al. (2011) separated the sources of conflict: color-incongruent trials in which the word and the ink color shared a finger created stimulus conflict only (GREEN/WHITE), whereas trials in which they mapped to different fingers created response (and stimulus) conflict (YELLOW/GRAY). N450 amplitude was more negative for response-incongruent than color-congruent trials, but did not differ between stimulus-incongruent and congruent trials. Based on these findings, the BWLS may be reflected as a modulation of the N450, with a larger Stroop effect for between than within language trials.
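The logic of the 2:1 mapping design reduces to a simple classification rule. In this Python sketch the finger assignments follow the description above, while the function name and the reading of GREEN/WHITE as "the word GREEN in white ink" are our assumptions:

```python
# Finger assignments for the 2:1 mapping described above (two colors per finger).
FINGER = {
    "BLUE": "index", "GRAY": "index",
    "GREEN": "middle", "WHITE": "middle",
    "YELLOW": "ring", "PURPLE": "ring",
}


def conflict_type(word: str, ink: str) -> str:
    """Classify a trial by whether word and ink differ in color and/or response finger."""
    word, ink = word.upper(), ink.upper()
    if word == ink:
        return "congruent"
    if FINGER[word] == FINGER[ink]:
        # Different colors, but both call for the same finger: stimulus conflict only.
        return "stimulus-incongruent"
    # Different colors mapped to different fingers: response (and stimulus) conflict.
    return "response-incongruent"


for word, ink in [("BLUE", "blue"), ("GREEN", "white"), ("YELLOW", "gray")]:
    print(word, ink, conflict_type(word, ink))
```

Because stimulus-incongruent trials never require a competing motor response, any component that differs between them and congruent trials cannot be attributed to response conflict alone.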
Finally, response set (and the BWLS) may modulate ERP components earlier than the N450 and conflict SP, in particular the N2 (Folstein and Van Petten, 2008). Although the conflict N2 has not been robustly elicited in a Stroop task (West et al., 2005), its amplitude increases with increasing magnitude of conflict on other tasks, such as the Eriksen flanker task (Van Veen and Carter, 2002; Wendt et al., 2007). If the conflict N2 is sensitive to the degree of conflict on the bilingual Stroop task, then greater N2 amplitude might be expected for within than between language distracters. Alternatively, attention to response-relevant information, or attentional set, has been shown to modulate N2 (or N200) amplitude specifically in word recognition tasks, with increased negativity when attention is directed to the orthographic features of a word (Ruz and Nobre, 2008; see also Grainger et al., 2006, for a similar component that is modulated by orthographic processes in a priming paradigm). The N2 is also modulated in bilingual tasks that focus attention on one language at a time or require switching between languages (Jackson et al., 2001; Rodriguez-Fornells et al., 2005). In addition, Proverbio et al. (2009) found that bilinguals can use orthographic information to distinguish real words from pseudowords in their native language (Italian) as early as 160–180 ms. Hence, the language of response-relevant words in the bilingual Stroop task may be detected and processed early, as reflected by modulation of the N2 (see Atkinson et al., 2003, for early perceptual effects in a Stroop task).
The current study used behavioral and electrophysiological measures across two experiments to investigate how Spanish–English bilinguals process language and color congruence in a modified bilingual Stroop task. Our central aims were to investigate (1) the unique contribution of language incongruence in the bilingual Stroop paradigm and (2) the temporal dynamics and neural correlates of cognitive control in balanced bilinguals performing a bilingual Stroop task. In Experiment 1, we collected response time (RT) and error data across single and mixed language blocks to determine the pattern of within and between language effects for our sample (Spanish–English bilinguals) and to explore the possibility that balanced and unbalanced bilinguals use different strategies in mixed versus single language contexts to manage cross language interference. In Experiment 2, we recorded EEG while balanced bilinguals performed the single language blocks from Experiment 1, both overtly (for behavioral analysis) and covertly (for ERP analysis), to determine the source of the bilingual Stroop effect, or BWLS.