Although bilinguals rarely make random errors of language when they speak, research on spoken production provides compelling evidence to suggest that both languages are active when only one language is spoken (e.g., Poulisse, 1999). Moreover, the parallel activation of the two languages appears to characterize the planning of speech for highly proficient bilinguals as well as second language learners. In this paper we first review the evidence for cross-language activity during single word production and then consider the two major alternative models of how the intended language is eventually selected. According to language-specific selection models, both languages may be active but bilinguals develop the ability to selectively attend to candidates in the intended language. The alternative model, that candidates from both languages compete for selection, requires that cross-language activity be modulated to allow selection to occur. On the latter view, the selection mechanism may require that candidates in the non-target language be inhibited. We consider the evidence for such an inhibitory mechanism in a series of recent behavioral and neuroimaging studies.
When bilinguals perform even the simplest production task, such as speaking the name of a familiar object in one of their two languages, there is evidence that both languages are active and influence performance (e.g., Colomé, 2001; Costa, Miozzo, & Caramazza, 1999; Hermans, Bongaerts, De Bot, & Schreuder, 1998; Kroll, Bobb, & Wodniecka, 2006). Although there is abundant evidence for parallel activity of the bilingual’s two languages in comprehension tasks (e.g., Van Heuven, Dijkstra, & Grainger, 1998; Spivey & Marian, 1999), finding that the unintended language is available during spoken production remains surprising. Unlike the bottom-up processing that characterizes word recognition (e.g., Dijkstra & Van Heuven, 2002), speaking is initiated by a conceptually-driven process that takes a thought and maps it on to available lexical information. In theory, the conceptual nature of spoken production should allow the language to be selected early in speech planning as one aspect of the thought to be expressed. Although some instances of early language selection may be possible, for example, when bilinguals who have a stronger first (L1) than second (L2) language speak in their L1 (e.g., Bloem & La Heij, 2003; La Heij, 2005), most recent studies of bilingual word production have shown that the intention to speak one language only does not suffice to limit activation to alternatives in that language alone.
A striking aspect of bilingual speech is that proficient bilinguals do not make random errors of language. At the same time, they are able to code switch with ease with others who are similarly bilingual (e.g., Muysken, 2000; Myers-Scotton, 2002). Although one might argue that fluency at the level of sentence or discourse production is supported by a range of mechanisms that might be unavailable in decontextualized word production, the fact remains that bilingual spoken production is better than might be expected if we assume that both languages are available in parallel and potentially compete for selection. That observation has led some to propose that bilinguals possess an exquisite mechanism of cognitive control that develops as they gain skill in the L2 (e.g., Green, 1998) and that has consequences more generally for executive control processes (e.g., Bialystok, Craik, Klein, & Viswanathan, 2004) and for their neural representation (e.g., Abutalebi & Green, 2007).
Understanding the way in which spoken word production is accomplished in bilinguals when two or more alternatives are available requires that the nature of cross-language activity be characterized and that a mechanism of selection be specified. A number of previous studies have considered the first of these questions in detail (see Costa, 2005, and Kroll et al., 2006, for recent reviews). The available evidence provides support for a number of different loci of cross-language activation during the planning of a single word utterance. Figure 1 is a representative model of bilingual word production adapted from previous work by Hermans (2000) and Poulisse and Bongaerts (1994). The general assumption in models of lexical production (e.g., De Bot & Schreuder, 1993; Levelt, 1989) is that at least three component processes must be engaged prior to articulation. A concept and its closest lexical representation must be selected and the phonology that corresponds to that lexical representation must be specified. For bilinguals, because there are multiple alternatives in each language, there can be activation of abstract candidates at the lemma level or among phonological competitors.1 Note that the model illustrated in Figure 1 assumes that a language cue represents the intention to name the object in one of the two languages. As we will discuss later, the representation of the intention to speak one language alone may be influenced by the relative dominance of the two languages for the bilingual, by the context in which spoken production occurs, and by features of the two languages themselves.
Kroll et al. (2006) argued that cross-language alternatives may be active at any of the loci shown in the model. The degree to which there is sustained activity of the nontarget language will depend on a variety of factors, including the language of production, proficiency in the L2, the task that initiates speech planning, and the degree to which specific lexical alternatives are primed. As noted above, when production occurs in L1, there may be little evidence of L2 influence because L1 is more skilled than L2 and the rapid time course of speech planning in L1 may not provide an opportunity for L2 to come into play (e.g., Bloem & La Heij, 2003; Kroll, Dijkstra, Janssen, & Schriefers, in preparation).
In contrast, when bilinguals speak in the L2, particularly when they are more dominant in L1, there may be multiple influences of L1 on L2 (e.g., Costa, Caramazza, & Sebastián-Gallés, 2000; Costa, Roelstraete, & Hartsuiker, 2006; Hermans et al., 1998; Hoshino & Kroll, 2008). Notably, these cross-language interactions in production appear to function between lexical and sub-lexical levels (Costa et al., 2006), to extend to feed-back as well as feed-forward interactions (Kroll et al., in preparation), and to reach the late specification of the phonetic properties of realized speech (e.g., Engstler & Goldrick, 2007; Gerfen, Jacobs, & Kroll, 2005).
Although there is not complete agreement about the extent to which each result in past studies uniquely demonstrates the presence and locus of cross-language activation in the planning of words in each of the bilingual’s two languages (see Costa, La Heij, & Navarrete, 2006), taken together, the evidence is quite compelling. If alternatives are active in the two languages, how is the correct word selected? Two types of selection mechanisms have been contrasted. According to a language-specific selection model (e.g., Costa et al., 1999), information about words in the unintended language may be activated but those words are not themselves candidates for selection. Note that the presence of cross-language activation itself rules out an extreme language-specific model in which one of the two languages is effectively switched off or inhibited in advance to enable the bilingual to function as a monolingual speaker (and see Wang, Xue, Chen, Xue, & Dong, 2007, for recent neuroimaging evidence suggesting that there is no brain area uniquely associated with a language switch). The proposed language-specific model is functionally a “mental firewall” such that the language cue effectively signals which activated alternatives are on the right side of the wall. A threshold version of the language-specific model assumes that the language cue acts to set the activation level higher for candidates in the target language, thereby avoiding potential competition between them at the point when selection occurs. Finkbeiner, Gollan, and Caramazza (2006) proposed the threshold model to be a mechanism to avoid what they consider to be the “hard” problem of lexical competition. In contrast, the language non-specific model assumes that words in both languages are potential candidates for selection. 
The language non-specific model allows competition for selection: candidates within and across languages actively compete, and alternatives in the unintended language are eventually inhibited to allow accurate production to proceed (e.g., Green, 1998). Costa and Santesteban (2004) recently proposed a reconciliation of these alternatives by arguing that more proficient bilinguals have acquired the skills to avoid the “hard” problem whereas L2 learners and less proficient bilinguals may be more likely to face cross-language competition that requires subsequent inhibition. On this view, the models are both correct but describe different states of bilingualism.
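The contrast among these selection accounts can be caricatured in a few lines of code. The sketch below is only a toy illustration, not an implementation of any published model: the `Candidate` class, the function names, the activation values, and the `boost` and `inhibition` parameters are all hypothetical and chosen for exposition. It shows a language-specific rule that ignores nontarget candidates, a threshold rule in which the language cue raises target-language activation, and a language non-specific rule in which nontarget candidates compete unless they are inhibited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str
    language: str
    activation: float  # arbitrary units, invented for illustration


def select_language_specific(candidates, target):
    # Nontarget words may be active, but only target-language words
    # are considered for selection.
    pool = [c for c in candidates if c.language == target]
    return max(pool, key=lambda c: c.activation)


def select_threshold(candidates, target, boost):
    # The language cue raises the activation of target-language candidates;
    # every candidate remains in the pool.
    return max(
        candidates,
        key=lambda c: c.activation + (boost if c.language == target else 0.0),
    )


def select_with_inhibition(candidates, target, inhibition):
    # All candidates compete; nontarget candidates are suppressed
    # by a fixed amount before the most active one is chosen.
    return max(
        candidates,
        key=lambda c: c.activation - (inhibition if c.language != target else 0.0),
    )


# A Spanish-dominant speaker naming a picture of a dog in English (toy values).
candidates = [
    Candidate("dog", "English", 0.6),
    Candidate("perro", "Spanish", 0.9),
    Candidate("cat", "English", 0.3),
]

specific = select_language_specific(candidates, "English").word
boosted = select_threshold(candidates, "English", boost=0.5).word
inhibited = select_with_inhibition(candidates, "English", inhibition=0.5).word
uncontrolled = select_with_inhibition(candidates, "English", inhibition=0.0).word
```

With these toy values the first three rules all select the English word, whereas open competition without inhibition selects the stronger Spanish translation. The accounts therefore differ in how the intended word comes to win, not in whether it does, which is why the behavioral evidence reviewed here has to be indirect.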
In the remainder of this paper we consider the selection models in more detail, reviewing existing findings as well as new evidence that we believe provides a more compelling case for the need for an inhibitory mechanism for even proficient bilinguals. In the course of our review we consider the role that the production tasks used may have contributed to this debate and the evidence that bilinguals can exploit language-specific cues when they are present. Most critically, we consider the emerging literature that examines the neural basis of language selection using event related potentials (ERPs) and functional magnetic resonance imaging (fMRI) to examine the time course and localization of bilingual speech planning processes.
Three approaches have been adopted to test the language specific vs. non-specific models of spoken word production. The first approach is to use a variant of the Stroop task (Stroop, 1935) in which pictures are named (e.g., Hermans et al., 1998) or words are translated (e.g., La Heij et al., 1990). During speech planning, a distractor word, related to the name of the picture or word to be translated, is presented visually or auditorily. The general prediction is that at the point in planning when the language of speaking is selected, there should no longer be any effect of distractors in the nontarget language.
The second approach involves mixing and/or switching the languages of production to examine the consequence of having to prepare alternatives in both languages. If both language alternatives are normally active during the planning of a single word utterance in one language alone, then mixing the language of production should have little consequence for performance relative to blocked language naming conditions (e.g., Kroll et al., in preparation). In a sense, forcing the languages to be mixed potentially disrupts the mechanism of selection that would ordinarily be adopted under blocked conditions. Identifying the way in which that disruption is manifest provides insight into the selection mechanism itself (e.g., Meuter & Allport, 1999). Likewise, when bilinguals are required to switch between their two languages, we can examine the magnitude of switch costs and the relative impact for the L1 vs. the L2. If one language must be inhibited to produce the other, these differential inhibitory demands should be revealed in the pattern of processing costs observed following a language switch.
A third approach to examining language selection processes is to exploit the presence of shared cross-language features. The logic of this approach is similar to the one that has been used extensively in the literature on bilingual word recognition in which performance on language ambiguous words, such as cognates, interlingual homographs, and interlingual neighbors can be compared with performance on language unambiguous words (e.g., Dijkstra, 2005). Because languages often share aspects of their lexical and/or sublexical representations, it is possible to have bilinguals perform a task in one language alone and to ask whether their performance in producing words with shared-language features is similar to words that are unique to one language alone. If both languages are active when even a single language is required, then these language ambiguous materials should give rise to a different pattern of performance than language unique materials. Furthermore, bilinguals and monolinguals would be expected to perform similarly on the language unique words but only bilinguals would be predicted to respond differentially to the language ambiguous words. In previous production studies, the effect of cross-language cognate status has been examined as a means of determining whether the phonology of the unintended language is active during the planning of the target utterance (e.g., Christoffels, De Groot, & Kroll, 2006; Costa et al., 2000; Kroll et al., in preparation). Although the presence of cross-language activity does not, in and of itself, reveal the nature of the selection mechanism, a comparison of these effects for language pairs with different properties provides a means to determine whether language selection is sensitive to language-specific features (e.g., Hoshino & Kroll, 2008).
In the next section of the paper we review briefly the evidence from investigations that have adopted each of these approaches and consider the implications for each of the models of language selection. We summarize the findings of a set of experiments that we believe provide support for the competition-for-selection alternative and the need for an inhibitory process. Here we describe the central findings in past research using each of the three primary empirical approaches used to examine bilingual word production: picture-word interference, language switching, and the effects of cognate status on picture naming. In the section that follows, we then consider how very recent ERP and fMRI studies on bilingual speech planning might allow us to test the alternative selection models more sensitively.
One of the claims for cross-language competition in bilingual word production comes from findings using the Picture-Word Interference (PWI) paradigm. As noted earlier, in the standard version of the task, a picture is presented together with a distractor word that appears at a variable stimulus onset asynchrony (SOA) relative to the onset of the picture. The objective of the task is to name the picture while ignoring the distractor. The typical finding in PWI is that words that are semantically related to the picture cause interference, while words that are phonologically and/or orthographically related facilitate picture naming (La Heij, Van der Heijden, & Schreuder, 1985; Lupker, 1979, 1982). In the bilingual version of the task, distractor words from the nontarget language that are semantically or phonologically related to the picture’s name also produce interference and facilitation, respectively (e.g., Costa et al., 1999; Hermans et al., 1998). Hermans et al. (1998) also reported an inhibitory effect for distractors that were phonological neighbors of the translation. The “phono-translation” effect followed a time course similar to effects observed for semantic distractors, which led Hermans et al. to conclude that the nontarget language was active to the level of the lemma. In and of themselves, these effects of the nontarget language distractors suggest that speech planning is open to the influence of the unintended language.
Most critically, pictures paired with distractors that are the translation of the target word itself also facilitate picture naming (Costa et al., 1999; Hermans, 2004). Costa et al. argued that if lexical items compete for selection, then a translation distractor should cause the most interference of all since it directly activates the competing lexical alternative. They took the translation facilitation effect as support for a language-specific mechanism that does not consider lexical nodes in the competing language for selection. An important feature of the translation facilitation effect is that it is short lived relative to the effects of identical distractors (i.e., the picture’s name in the target language). The larger and more extended facilitation effect for the identity condition was taken as further support for language-specific access since it suggests that lexical activation of the unintended language had less of an effect on the production of the target name.
Hermans (2004) argued that the differential time course and magnitude of the identity condition versus the translation condition could be accommodated in either a language-nonspecific model that assumes competition or a language-specific model of lexical selection that does not. Under the former logic, cross-language activation reduces the amount of facilitation of the conceptual, lexical, and phonological activation. Under the latter model, other factors such as a delay in activation from the translation distractor or stronger activation by identity distractors could change the time course and effect magnitude respectively.
The debate concerning the interpretation of translation facilitation also raises an important issue regarding the processing of the distractor word in PWI. If bottom-up activation of the distractor word intrudes into processing that would not otherwise occur during the planning of the picture’s name, it becomes difficult to tell whether the observed effects reveal the locus of cross-language activation and language selection during speech planning or a complex interaction between the processing of the word and picture. Following this line of argument, it becomes difficult to use PWI data alone to adjudicate clearly between the two selection models.
In a seminal study investigating control in bilingual production, Meuter and Allport (1999) empirically tested the proposal of cross-language competition and subsequent suppression/inhibition using a switching paradigm. Central to the logic of their experiment is the idea of the Task Set Inertia hypothesis developed first in non-linguistic task-switching studies (Allport, Styles, & Hsieh, 1994). In these tasks one observes what seems to be a paradoxical effect where, following a task switch, the dominant task shows a greater switching cost than the nondominant task. Meuter and Allport explain this phenomenon in terms of suppression, such that the dominant task must be actively suppressed in order to prevent interference with the nondominant task. When switching into the dominant task, the active suppression of the dominant task on the preceding non-dominant trial persists to disrupt processing on the subsequent trial. To investigate whether language switching reveals a similar pattern, Meuter and Allport asked bilingual participants to rapidly name Arabic numbers in either their L1 or L2, cued by the background color on which the number appeared. Results supported the Task Set Inertia hypothesis, with asymmetric switch costs observed such that switch costs were greater for naming in the L1 than in the L2. Meuter and Allport (1999) proposed a mechanism of suppression whereby the stronger L1 is inhibited to allow production in the weaker L2.
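The arithmetic behind a switch cost and its asymmetry can be made concrete with a short sketch. The function name, the trial format, and the latencies below are assumptions made for illustration only; the numbers are invented and are not data from Meuter and Allport (1999) or any other study. A switch cost is taken here as the mean latency on trials whose response language differs from that of the preceding trial minus the mean latency on trials that repeat the preceding language, computed separately for each response language.

```python
from statistics import mean


def switch_costs(trials):
    """Return the switch cost in ms for each response language.

    `trials` is a list of (language, rt_ms) tuples in presentation order.
    The first trial has no predecessor, so it is neither a switch nor a
    repeat trial and is skipped.
    """
    rts = {}  # (language, is_switch) -> list of latencies
    for (prev_lang, _), (lang, rt) in zip(trials, trials[1:]):
        rts.setdefault((lang, lang != prev_lang), []).append(rt)

    costs = {}
    for lang in {lang for lang, _ in rts}:
        if (lang, True) in rts and (lang, False) in rts:
            costs[lang] = mean(rts[(lang, True)]) - mean(rts[(lang, False)])
    return costs


# Invented naming latencies, for illustration only.
trials = [
    ("L1", 600), ("L1", 610), ("L2", 700), ("L2", 660),
    ("L1", 720), ("L1", 590), ("L2", 690), ("L2", 650),
    ("L1", 740),
]

costs = switch_costs(trials)
asymmetry = costs["L1"] - costs["L2"]  # positive: larger cost switching into L1
```

In this made-up sequence the cost of switching into L1 is 130 ms and the cost of switching into L2 is 40 ms, so the asymmetry is 90 ms in the direction described above: the dominant language pays more for a switch.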
More recently, Finkbeiner, Almeida, Janssen, and Caramazza (2006) challenged the original interpretation of Meuter and Allport’s (1999) results due to what they argued was a confound in the nature of the design. Their claim is based on the idea of valence. Bivalent stimuli that are paired with two distinct responses typically elicit the characteristic asymmetric pattern of switch costs. Univalent stimuli, however, do not typically show switch costs. In Meuter and Allport’s data, valence was confounded with language switching since digits were named in both L1 and L2. To tease apart valence from language switching, Finkbeiner et al. used a bivalent digit-naming task similar to that of Meuter and Allport. They then interspersed a picture naming task in which pictures were only named in the participants’ L1. These stimuli were therefore univalent. Their results replicated the asymmetrical switch costs for bivalent digit naming, but there were no switch costs for univalent picture naming. When the picture naming task was replaced by a dot pattern in which participants named the number of dots only in the L1, they again found no evidence of switch costs. These results led them to conclude that Meuter and Allport’s language switching results support neither the suppression of the unintended language as a whole nor of the specific lexical item in the unintended language. In a further experiment, they manipulated the speed of response and found that the speed of response availability, and not language identity per se, appears to be crucial in determining the pattern of switch cost.
The role of valence and speed of response in accounting for the pattern of switch costs suggests that switch costs may not provide a simple or direct means to reveal cross-language competition. Although the Finkbeiner et al. (2006) results are also potentially problematic for an inhibitory account, they suggest that the symmetry or asymmetry of switch costs in and of itself may not reveal the means of lexical selection.
A recent study by Gollan and Ferreira (2007) reached the same conclusion regarding the difficulty in relying on the symmetry of switch costs to adjudicate between alternative models of bilingual speech planning. They asked bilinguals to switch between their two languages in simple picture naming. However, unlike previous studies, the decision to switch was under the person’s control so that he or she could use whichever language was preferred to name a given picture. Under these conditions, Gollan and Ferreira found that even L1 dominant bilinguals who typically produce an asymmetric pattern of switch costs when switches are required, produced symmetric switch costs when switches were under their own control. Critically, even under these voluntary switching conditions with symmetric switch costs, there was an overall inhibitory effect for the more dominant language, suggesting that language mixing requires that the dominant language be inhibited. The symmetric pattern of switch costs observed in this study also shows that valence per se is not the critical factor in determining the pattern of switch costs. In the context of voluntary switching, items are inherently bivalent and yet allowing bilinguals to control the pattern of switching eliminated asymmetric switch costs.
A final study took a different approach but also reached the conclusion that the symmetry of switch costs may not be the most reliable index of the presence of inhibition of the non-target language. Wodniecka, Bobb, Kroll, and Green (in preparation) used a competitor priming paradigm to determine whether the presence of asymmetric switch costs was uniquely correlated with the presence of inhibitory processing. The logic of this study was to develop a new means to induce competition that might require inhibition and to then ascertain whether only those bilinguals who were likely to produce asymmetric switch costs would also reveal evidence for inhibition. Costa and Santesteban (2004) demonstrated that L2 learners, like the L1 dominant bilinguals in the original Meuter and Allport (1999) study, showed an asymmetric pattern of switch costs, with larger costs when switching into the L1 than into the L2. Highly proficient and balanced bilinguals in that study produced a symmetric pattern of switch costs, but they also produced longer naming latencies for the L1 than the L2, a result that suggested the presence of L1 inhibition. A later study (Costa, Santesteban, & Ivanova, 2006) replicated the symmetrical pattern for proficient bilinguals but also showed that proficiency alone was sufficient to produce symmetry; it was not necessary that they also be early bilinguals.
Wodniecka et al. (in preparation) first demonstrated that the asymmetric vs. symmetric pattern of switch costs in picture naming could be replicated in two groups of late bilinguals who differed in whether they were dominant in the L1 or relatively balanced across the two languages. They then showed that both groups, regardless of the symmetry of the switch costs produced, were subject to inhibition in the context of a competitor priming paradigm. Briefly, the design of the competitor priming task involved a study phase and a test phase. At study, participants named pictures in separate blocks in Spanish and in English. At test, they again named pictures in Spanish and English but some of the pictures at test were new pictures, not seen at study, and others were old pictures, previously named at study. Of the old pictures, some were named in the same language at study and test (i.e., were congruent) and others were named in different languages at study and test (i.e., were incongruent). If in the process of planning a spoken word there is competition for selection among alternatives within and across languages, then previously naming a picture in the nontarget language should increase cross-language competition. If there is not active competition for selection, then naming the same picture that had been named previously but in a different language should produce facilitation attributable to repetition priming of the picture itself and its concept. That is, a language-specific selection mechanism would not be affected by the mismatch of language other than to reduce the facilitation predicted in the congruent condition when picture, concept, and word are repeated identically. The critical result in the Wodniecka et al. study was that there was significant facilitation only in the congruent condition.
Because the incongruent condition included the repetition of the picture and concept, factors that have been shown in previous research to produce facilitatory priming (e.g., Francis, Augustini, & Saenz, 2003; Hernandez & Reyes, 2002; Sholl, Sankaranarayanan, & Kroll, 1995; Van Turennout, Bielamowicz, & Martin, 2003), these results suggest that an inhibitory component must also have been present to counteract that facilitation. Statistically, the pattern of competitor priming was identical for both balanced and L1 dominant bilinguals although these two groups differed in whether they produced symmetric or asymmetric language switch costs.
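The logic of the competitor priming comparison can be written out as a small computation. The sketch below is a hypothetical scoring scheme, not the analysis used by Wodniecka et al.: the function names, the trial format, and every latency are invented for illustration. Each test item is labeled as new, congruent, or incongruent from its study history, and priming is expressed as the mean latency for new pictures minus the mean latency for each type of old picture, so that positive values indicate facilitation.

```python
from statistics import mean


def classify(test_lang, study_lang):
    """Label a test item by its study history (study_lang is None for new pictures)."""
    if study_lang is None:
        return "new"
    return "congruent" if study_lang == test_lang else "incongruent"


def priming_effects(test_trials):
    """Return priming in ms for old pictures relative to new pictures.

    `test_trials` is a list of (test_lang, study_lang, rt_ms) tuples.
    Positive values mean that old pictures were named faster than new ones.
    """
    by_condition = {}
    for test_lang, study_lang, rt in test_trials:
        by_condition.setdefault(classify(test_lang, study_lang), []).append(rt)
    baseline = mean(by_condition["new"])
    return {
        condition: baseline - mean(by_condition[condition])
        for condition in ("congruent", "incongruent")
    }


# Invented latencies, for illustration only.
test_trials = [
    ("English", None, 800), ("Spanish", None, 820),
    ("English", "English", 740), ("Spanish", "Spanish", 760),
    ("English", "Spanish", 805), ("Spanish", "English", 815),
]

effects = priming_effects(test_trials)
```

With these made-up values the congruent condition shows 60 ms of facilitation and the incongruent condition shows none. Because the picture and concept were repeated in both conditions, a null incongruent effect of this kind is the pattern that the inhibitory account interprets as repetition facilitation offset by a cost of having previously named the picture in the other language.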
Taken together, the results of these studies, which used different converging tasks, suggest that inhibition is required to overcome the activation of competitors from the nontarget language when bilinguals produce words in one of their two languages.
A third source of evidence on bilingual production comes from studies that exploit language-specific properties to determine whether the non-target language is active and available for selection. A number of studies have taken this approach by examining the effects of cross-language cognates on production. As previously noted, cognates are translation pairs that possess shared lexical features across languages (e.g., phonology and/or orthography). Picture naming studies have shown that bilinguals name pictures whose names are cognates faster than pictures whose names are non-cognates (e.g., Costa et al., 2000; Hoshino & Kroll, 2008; Kroll et al., in preparation). These results suggest that the phonology of lexical candidates in the unintended language is active during the planning of speech so that cognate items receive activation from two sources – from both the L1 and the L2. Critically, only bilinguals show these effects; performance for monolinguals is similar for both types of pictures, suggesting that this is an effect of the activation of the nontarget language. Costa et al. found that the magnitude of the cognate effect in picture naming was also larger in the bilingual’s L2 than in their L1, suggesting that the more dominant language is more likely to influence the less dominant language than the reverse. These results support not only parallel activation of a bilingual’s languages but also suggest that parallel activation to the level of phonology interacts and competes in the selection of the name of a picture.
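The cognate effect described here is a simple difference score, and a short sketch shows how the three comparisons in this paragraph fit together. The function name, the data layout, and all latencies below are assumptions made for illustration; the values are invented and do not reproduce the results of Costa et al. (2000) or any other study. The effect is computed as the mean naming latency for non-cognate pictures minus the mean latency for cognate pictures, so that positive values indicate cognate facilitation.

```python
from statistics import mean


def cognate_effects(latencies):
    """Return cognate facilitation in ms for each (group, language) cell.

    `latencies` maps (group, language, status) to a list of naming
    latencies, where status is "cognate" or "noncognate".
    """
    cells = {(group, language) for group, language, _ in latencies}
    return {
        cell: mean(latencies[cell + ("noncognate",)])
        - mean(latencies[cell + ("cognate",)])
        for cell in cells
    }


# Invented latencies, for illustration only.
latencies = {
    ("bilingual", "L2", "cognate"): [690, 710],
    ("bilingual", "L2", "noncognate"): [750, 770],
    ("bilingual", "L1", "cognate"): [640, 660],
    ("bilingual", "L1", "noncognate"): [660, 680],
    ("monolingual", "L1", "cognate"): [630, 650],
    ("monolingual", "L1", "noncognate"): [630, 650],
}

effects = cognate_effects(latencies)
```

In this made-up data set the effect is 60 ms for bilinguals naming in L2, 20 ms for bilinguals naming in L1, and 0 ms for monolinguals, which mirrors the qualitative ordering reported in the text: a larger effect in L2 than in L1, and no effect without knowledge of the second language.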
Hoshino and Kroll (2008) replicated the cognate facilitation for picture naming reported by Costa et al. (2000) for Spanish-English bilinguals and then extended the results to Japanese-English bilinguals whose languages do not share the same script. Their results, in addition to highlighting the extent to which both languages of a bilingual are active (i.e., to the level of phonology), also demonstrate that bilinguals cannot easily exploit all possible cues that might be available to achieve early language selection. Although the script itself is not present in a simple picture naming experiment, other research suggests that resonance among orthographic and phonological codes as a consequence of fluent literacy can be observed during spoken word recognition and production (e.g., Damian & Bowers, 2003; Ziegler, Muneaux, & Grainger, 2003). In the Hoshino and Kroll experiment, the life experience of proficient Japanese-English bilinguals in reading different-script languages did not function as a cue to mitigate cross-language activation during picture naming. If inhibitory processes are not necessary to modulate cross-language activity, then bilinguals must be able to exploit available cues to direct attention to the intended language and/or to raise activation of lexical items in the intended language above the activation threshold of lexical alternatives in the unintended language. These results contribute to a growing body of evidence showing that it is difficult to identify how a language cue would function if it is used at all (e.g., Emmorey, Borinstein, & Thompson, 2005; Schwartz & Kroll, 2006). Guo and Peng (2006) also recently reported an ERP study showing that Chinese-English bilinguals produce the same translation facilitation effect in a variant of picture-word interference as same-script bilinguals.
By all accounts, bilinguals do not appear to use what might seem to be obvious choices for cues to language selection such as script, sentence context, or language modality.2
Kroll et al. (in preparation) used a delayed picture naming task in which relatively proficient Dutch-English bilinguals were cued as to the language in which to name the picture when they heard an auditorily presented tone. In mixed conditions, one of two tones signaled the production of one of the bilingual’s two languages. In blocked conditions, one tone signaled naming in the target language and the other tone required a “no” response. In both mixed and blocked conditions, the tone was presented at a variable delay following the onset of the picture. By comparing the time to name pictures whose names were cognates or not under these conditions of mixed and blocked naming, it was possible to track the time course of parallel activation of the phonology associated with both languages.
The results of the Kroll et al. (in preparation) study revealed clear evidence for an asymmetry that resembles the asymmetry observed in language switching (e.g., Meuter & Allport, 1999). Picture naming in L1 (Dutch) was slower under mixed conditions when L2 (English) was required to be active than when it was optional under blocked conditions. In contrast, picture naming in L2 was relatively unaffected by the requirement to have L1 active, suggesting that even under the blocked naming conditions in L2, L1 was also active. Perhaps the most striking result, however, was that picture naming was slower in L1 than in L2 under the mixed language conditions. This finding, like the results of the previously reviewed language switching experiments, suggests that the mixed language conditions potentially impose a requirement to inhibit the L1.
To test the hypothesis that the longer latencies for L1 relative to L2 in the mixed conditions reflect active inhibition of the L1, Kroll et al. (in preparation) examined the effects of cognate status on L1 and L2 production. In previous studies of simple picture naming (e.g., Costa et al., 2000; Hoshino & Kroll, 2008), reliable facilitation has been observed when bilinguals name cognate pictures in their L2. That facilitation has been attributed to the parallel activation of the L1 phonology of the picture’s name. The past studies have also reported facilitation for naming L1 cognates, but the effects in these studies have been smaller and less reliable for L1 than for the L2. In the Kroll et al. study, the blocked conditions replicated the past findings. There were significant effects of cognate facilitation for naming in L2 but no effects of cognate status on L1. In contrast, in the mixed conditions, cognate facilitation was present for L2 only at a 0 ms SOA; by 500 ms there was no facilitation for naming a picture in L2 whose name shared phonology with its translation equivalent in L1. In contrast to the results of past studies and the results of the blocked conditions, there was robust cognate facilitation for L1 under mixed conditions that extended throughout the entire time course, from 0 to 1000 ms. Taken together with the slower RTs for L1 in the mixed condition, the absence of a cognate effect for L2 under just those conditions when the cost to L1 processing was greatest suggests that L1 was actively inhibited while L2 was prepared. Although L1 is the more dominant language, under these conditions it no longer influenced the processing of L2.
Taken together, the results of these studies suggest that inhibition may be required to overcome the activation of competitors from the nontarget language when bilinguals produce words in one of their two languages. We now turn to a set of very recent studies that have adopted electrophysiological and neuroimaging methods to examine the time course of cross-language activation and language selection. As we will see, the evidence on the neural basis of language selection also converges on the conclusion that inhibitory control is required to enable the bilingual to speak one language alone.
In the search for a locus of selection effects in bilingual word processing, we and others have turned to methods that can better elucidate the time-course and locus of cognitive processes involved in selecting and producing words in both languages. The different accounts of language selection that we have already discussed would seem to make contrasting predictions about the time at which these effects should reveal themselves. Accounts which claim that language cues, operating at a conceptual level, can guide the selection process might suggest that effects of language selection should be seen early in the time course of processing, at least in a language production task. In contrast, accounts that propose an inhibitory process in response to competition from alternatives in both languages would be more consistent with a later locus of selection following activation and competition among within- and between-language alternatives. While behavioral methods have been used to examine time-course issues by varying stimulus onset asynchronies and manipulating task constraints, behavioral approaches have the distinct disadvantage of relying on a discrete measure which may reflect the combined result of many stages and loci of processing. Response times on the order of hundreds of milliseconds may obscure the fine-grained series of events which underlie fluent language processing.
Unlike response time measures, event-related potentials (ERPs) can allow for evaluation of neurocognitive processes with millisecond resolution. This sensitivity to time-course, coupled with the fact that the ERP method is non-invasive, relatively inexpensive, and well-suited for use with a variety of populations, makes it the cognitive neuroscience method of choice when questions of time-course of processing are at issue3. The ERP technique provides an invaluable opportunity to “observe” on-line processing of stimuli without requiring overt responses or additional decision processes needed for most behavioral measures. However, ERP recording can also be performed concurrently with many behavioral measures to allow for a direct comparison. This feature of the method is particularly useful, since recent evidence has suggested that ERPs may reveal aspects of L2 acquisition that are obscured in behavioral measures (e.g., McLaughlin, Osterhout, & Kim, 2004; Tokowicz & MacWhinney, 2005).
In the section below we focus on the literature pertaining to tasks involving language switching or mixing, primarily in contexts that should require processing to the phonological level, either because language production is required or because a task involves a decision based on the name of the stimulus. Thus, we will not review the growing ERP literature examining reading in the L2 (e.g., Alvarez, Holcomb, & Grainger, 2003; De Bruijn, Dijkstra, Chwilla, & Schriefers, 2001; Kotz, 2001; Kotz & Elston-Güttler, 2004; Kotz & Hernandez, 2004; Weber-Fox & Neville, 1996) or reading of code-switched sentences (Moreno, Federmeier, & Kutas, 2002; Proverbio, Cok, & Zani, 2004). In addition, we have focused on studies in which the stimuli themselves should not cue the language choice (i.e., numerals or pictures, rather than words), but where task demands have been made explicit by the experimenter through external language cues. These studies most closely mirror the logic of the behavioral approaches described above.
Previous ERP studies have suggested that both languages are activated even when bilinguals intend to speak only one of their languages, and that the time course and magnitude of nontarget language activation might be modulated by the relative proficiency of their two languages (Guo & Peng, 2006). However, results concerning how far into processing both languages are active have been somewhat inconsistent. Similar to behavioral studies, ERP studies using cognates (Rodriguez-Fornells et al., 2005; Christoffels, Firk, & Schiller, 2007) have found evidence that phonological information of the nontarget language was activated in tasks which involved overt or tacit picture naming. In contrast, when using non-cognates in a picture-word interference paradigm, Guo and Peng (2005) did not obtain significant activation of the L1 phonology when Chinese-English bilinguals spoke words in L2, although Guo and Peng (2006) reported significant activation of the L1 translation in L2 production. It is not clear how the different scripts for Chinese and English may account for the observed differences with respect to the activation of cross-language phonology. Despite the somewhat conflicting results, these ERP studies show consistent evidence that both languages are activated during speech planning and that activation of the non-target language may even spread to the phonological level, at least for certain tasks or language pairings. Thus, a further concern of current ERP studies is how bilinguals can select the correct words in the correct language and whether they have to inhibit activation of the nontarget language.
One approach in the literature has been to evaluate the role of executive function as a possible locus to inhibit the activation of the non-target language. These studies have primarily used the switching paradigm or “language mixing” to examine this issue. The main finding of these studies has been that modulations of the N2 component, observed to be maximal over the frontal and central scalp, may reflect the cognitive control system in bilingual speech production. Effects on the N2 component have been interpreted as evidence for inhibitory effects in these tasks, since this component has also been found to be sensitive to response inhibition processes required for the performance of go/no-go tasks (e.g., Schmitt, Rodriguez-Fornells, Kutas, & Münte, 2001). Jackson, Swainson, Cunnington, and Jackson (2001) investigated executive control during language switching by recording ERPs during a visually cued naming task in which bilinguals named digits in either L1 or L2. Switch-related modulation of ERP components was observed on the N2 component, around 310 ms after stimulus onset over the parietal and frontal cortices. As illustrated in Figure 2, switch trials were observed to increase this negative ERP component compared to non-switch trials. Importantly, this effect persists throughout the recording epoch, suggesting that it is not a transient effect, but rather reflects a process that remains active throughout the process of lexical selection. This effect over the frontal scalp was significant when switching from L1 to L2 but not when switching from L2 to L1, suggesting that switching into the non-dominant language from the dominant language required greater allocation of resources than making switches in the opposite direction. These results are consistent with claims that speaking in the L2 may require active inhibition of the L1. However, switches to the dominant L1 should not require such a demanding process.
Verhoef, Roelofs, and Chwilla (2006) also examined switch costs using the ERP technique. In their study, highly proficient, but unbalanced, Dutch-English bilinguals were asked to perform a cued picture naming task. They manipulated the time available for preparing the picture’s name, allowing either a short interval of 500 ms or a long interval of 1250 ms. The behavioral data revealed a larger switch cost for L1 than for L2 at short intervals. However, for long intervals, switch costs were symmetrical for both languages. They argued that these results challenged the proficiency hypothesis proposed by Costa and Santesteban (2004) who attributed symmetry differences in switching costs to language proficiency, since symmetrical switch costs could be observed in the very same unbalanced bilinguals when longer preparation intervals were provided. Furthermore, ERP data in this experiment also showed evidence for differential switching costs by preparation interval, reflected in modulations of the N2. However, these authors interpret the N2 to reflect attentional control mechanisms rather than response inhibition and suggest that the observed pattern reveals that more attentional resources were engaged for the long preparation trials, but this engagement was not fully maintained during long non-switch trials in the L1, thus contributing to the observed switch-cost patterns.
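The switch costs discussed in these studies are difference scores: for each language, the mean naming latency on switch trials minus the mean latency on non-switch trials, with an asymmetry present when the cost is larger for one language than for the other. The following is a minimal sketch of that arithmetic in Python. The latencies, the field names, and the function names are hypothetical and chosen only for illustration; they are not data or analysis code from any of the studies reviewed here.

```python
from statistics import mean

# Hypothetical naming latencies in ms (illustrative values only, not study data).
trials = [
    {"language": "L1", "trial_type": "switch", "rt_ms": 720},
    {"language": "L1", "trial_type": "switch", "rt_ms": 740},
    {"language": "L1", "trial_type": "nonswitch", "rt_ms": 640},
    {"language": "L1", "trial_type": "nonswitch", "rt_ms": 660},
    {"language": "L2", "trial_type": "switch", "rt_ms": 700},
    {"language": "L2", "trial_type": "switch", "rt_ms": 720},
    {"language": "L2", "trial_type": "nonswitch", "rt_ms": 670},
    {"language": "L2", "trial_type": "nonswitch", "rt_ms": 690},
]


def mean_rt(trials, language, trial_type):
    """Mean latency for one language and one trial type."""
    rts = [
        t["rt_ms"]
        for t in trials
        if t["language"] == language and t["trial_type"] == trial_type
    ]
    return mean(rts)


def switch_cost(trials, language):
    """Switch cost: mean switch latency minus mean non-switch latency."""
    return mean_rt(trials, language, "switch") - mean_rt(trials, language, "nonswitch")


def switch_cost_asymmetry(trials):
    """Positive values mean a larger switch cost for L1 than for L2."""
    return switch_cost(trials, "L1") - switch_cost(trials, "L2")


for language in ("L1", "L2"):
    print(language, "switch cost (ms):", switch_cost(trials, language))
print("asymmetry, L1 minus L2 (ms):", switch_cost_asymmetry(trials))
```

With these made-up values the L1 cost exceeds the L2 cost, which is the direction of the asymmetry reported at short preparation intervals. In practice the same difference scores would be computed separately for each preparation interval and tested statistically across participants.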
Christoffels, Firk, and Schiller (2007) recently examined bilingual language control using a language switching task. ERPs and naming latencies were recorded while unbalanced German–Dutch bilinguals named pictures. The bilinguals attended university in their L2 context and commonly switched between their languages in daily life. Picture names were trained in advance, and participants named pictures in both blocked and mixed language conditions. Additionally, Christoffels et al. manipulated cognate status between translation equivalents to examine phonological activation of the non-target language. Both behavioral results and ERP results revealed a cognate facilitation effect in both languages and for both blocked and mixed language naming, suggesting that phonological information from the non-target language was activated. However, in contrast to previous studies, equal switch costs were observed for both languages behaviorally, which the authors attribute to participants’ experience of commonly switching between languages in their daily life. In addition, a small switching effect in the ERP data was obtained for L1 but not for L2 during two windows interpreted to be consistent with the N2 (275–375 ms and 375–475 ms). However, in contrast to previous ERP studies on language switching, non-switch trials elicited more negative ERP waveforms than switch trials. Finally, both their behavioral and ERP data showed that the mixed language context had a strong effect on L1 and L2, as compared to the blocked language context. Specifically, both languages showed a greater negativity for non-switch trials in the mixed naming context as compared to trials in the blocked naming context in the earlier epoch, while in the later epoch a reversal of this effect was found for L1, but not for L2. Thus, blocked naming trials showed an enhanced negativity in the L1 in the later epoch. Taken together, Christoffels et al. argued that their results suggest that language control takes place via global inhibition of languages which acts specifically to change the availability of the L1.
In recent experiments (Guo, Misra, Bobb, & Kroll, 2007; Misra, Guo, Bobb, & Kroll, 2007), we have evaluated the time-course of lexical activation and the interaction of a bilingual’s languages during speech production using ERPs. In two experiments unbalanced Chinese-English bilinguals named pictures while ERPs were recorded. Picture names were untrained and repetitions of each picture were carefully minimized and controlled. Participants named pictures in Chinese or English, depending on the picture’s background color. In addition, pictures were named at both short (250 ms) and long (1000 ms) delays, with ERPs evaluated only at the long delays to minimize artifact. The short delay naming trials were included to ensure the early preparation of responses and to allow for evaluation of immediate behavioral responses (using logic similar to that described by Jackson et al., 2001). In one experiment, pictures to be named in Chinese and English alternated in a predictable fashion in a mixed naming paradigm. In another experiment, participants named pictures in one language in the first block and then named the same pictures in the other language in the second block. The effects of switching from one language to another were evaluated for each experiment, and mixed naming was compared to blocked naming between experiments. Results suggest that there is a processing cost associated with forcing both languages to be active, reflected in effects on the P200, N300 (consistent in latency with the “N2” described in other ERP language switching paradigms), and N400 (see Figure 2). However, in contrast to expectations based on the behavioral literature, in which costs to the first language are typically greater, processing costs were similar for both languages in most conditions. Also, similar to results from Jackson et al. (2001), effects of language mixing and language switching began early, but persisted throughout the recording epoch. Representative results from this paradigm are presented in Figure 3.
Although there is little doubt that executive control is involved in tasks where people have to change frequently from one language to the other, there remains a question as to whether bilinguals have to inhibit the non-target language when speaking in only one of their two languages. Rodriguez-Fornells et al. (2005) examined ERP and neuroimaging evidence for interference of phonological information from the bilinguals’ non-target language and inhibition of this interference by the frontal cortex in a task where responses could be based on access to only one of a bilingual’s two languages. In their study, in order to avoid vocalization artifacts during EEG and fMRI data acquisition, a variant of the go/no-go picture-naming task was employed. German-Spanish bilinguals were required to respond when the name of the picture began with a consonant and to withhold a response for words starting with a vowel. The target language was changed on every block, but responses within a block did not require switching between languages. Stimuli were selected such that on half of the trials the names in both languages (Spanish and German) would lead to the same response (coincidence condition, e.g., vowel coincidence Esel - asno “donkey” or consonant coincidence Spritze - jeringuilla “syringe”), whereas on the other half responses were different for the two languages (noncoincidence condition, e.g., Erdbeere - fresa “strawberry”). Interference was evident behaviorally in slower response times (RTs) for noncoincidence (incongruent) than coincidence (congruent) trials for bilinguals as compared to monolingual controls. For the ERPs, an enhanced negativity with a frontal maximum was found between 300 and 600 ms (similar to the N2 described elsewhere) for incongruent as compared to congruent trials. These results provide evidence for cross-language interference at the phonological level in bilinguals.
In addition, the results of fMRI data collected in the same paradigm showed two regions associated with the noncoincidence effect in bilinguals when compared to monolinguals: the left dorso-lateral prefrontal cortex (DLPFC) and the supplementary motor area (SMA). These neural areas have been associated with executive function in a variety of other tasks, suggesting that bilinguals recruit “typical ‘executive function’ brain areas” (Rodriguez-Fornells et al., 2005, p. 427).
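The coincidence manipulation in the go/no-go design of Rodriguez-Fornells et al. (2005) can be made concrete with a short sketch in Python. It applies the response rule described above (respond when the picture name begins with a consonant, withhold when it begins with a vowel) to the German and Spanish names of the example items and labels each pair as coincidence or noncoincidence. The function names and the vowel set are our own assumptions for illustration. In particular, the sketch classifies by the first written letter, which is a simplification of a decision that participants presumably based on the sound of the word.

```python
# Assumption: the first orthographic letter stands in for the first phoneme.
# This is a simplification (e.g., it would misclassify Spanish words with silent "h").
VOWEL_LETTERS = set("aeiouäöüáéíóú")


def starts_with_vowel(name):
    """True if the first written letter of the picture name is a vowel."""
    return name[0].lower() in VOWEL_LETTERS


def is_go_trial(name):
    """Go/no-go rule from the text: respond (go) to consonant-initial names."""
    return not starts_with_vowel(name)


def classify_pair(german_name, spanish_name):
    """Coincidence if both names call for the same response, else noncoincidence."""
    same_response = is_go_trial(german_name) == is_go_trial(spanish_name)
    return "coincidence" if same_response else "noncoincidence"


# Example items quoted in the text (German name, Spanish name, English gloss).
items = [
    ("Esel", "asno", "donkey"),
    ("Spritze", "jeringuilla", "syringe"),
    ("Erdbeere", "fresa", "strawberry"),
]

for german_name, spanish_name, gloss in items:
    print(gloss, "->", classify_pair(german_name, spanish_name))
```

On noncoincidence items such as Erdbeere - fresa, the nontarget name calls for the opposite response from the target name, which is why slower responses and an enhanced frontal negativity on those items are taken as evidence that the nontarget phonology was active.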
In a recent review of the literature, Rodriguez-Fornells, De Diego Balaguer, and Münte (2006) further suggested that cognitive control executed by the left dorso-lateral prefrontal cortex is required in bilinguals, and the degree of activation of this mechanism might be related to the similarity of languages in use at the lexical, grammatical, and phonological levels. Abutalebi and Green (2007) also reviewed fMRI evidence on bilingual language production and claimed that there is a single network mediating the representation of a person’s L1 and L2 and that cortical and subcortical structures generally associated with executive function such as LPFC (left prefrontal cortex) and ACC (anterior cingulate cortex) are engaged by bilinguals to inhibit lexical competition between languages in order to successfully select the intended language. The implication is that as a bilingual’s proficiency in L2 is increased, a reduction in prefrontal activity should be observed due to changes in the internal structure that will mediate the way in which control mechanisms are used.
A recent fMRI study by Wang et al. (2007) reported that both the frontal gyrus and the ACC are involved in language switching, providing further evidence for the neural mechanism of inhibition in bilingual word production. Another recent fMRI study (Abutalebi, Annoni, Zimine, Pegna, Seghier, Lee-Jahnke, et al., in press) investigated whether the neural network underlying language control in bilinguals differs from that involved in general executive functions that control switching between competing tasks within language. They found that language control processes engaged in contexts during which both languages must remain active recruited the left caudate and the ACC in a manner that could be distinguished from areas engaged in within-language task switching. Taken together, the evidence from both ERPs and other neuroimaging methods supports a view in which brain areas associated with inhibitory processing function to aid bilinguals in selecting the appropriate language alternative.
The behavioral and neuroimaging studies we have reviewed suggest that the problem of language selection is indeed a hard problem, contrary to the suggestion that it may be possible to bypass processes that negotiate competition and potential inhibition across the bilingual’s two languages (Finkbeiner, Gollan, et al., 2006). The available evidence is not conclusive, as this is a relatively young area of investigation and, like other research on bilingualism, there are a host of factors whose influence is not fully understood (e.g., proficiency in the L2, context of language use, age of acquisition, similarity of the two languages). Despite that, there is a strong suggestion in the material that we have reviewed that not only are the bilingual’s two languages activated in parallel, but that they compete for selection during spoken production. In this final section, we consider a set of remaining issues that relate to this conclusion.
As noted earlier in the discussion, a language-specific selection model has to assume that bilinguals are able to effectively represent the intention to speak one language alone. The evidence on cross-language activation suggests that the intention to speak one language is not sufficient to restrict activation to that language. The model of bilingual language production shown in Figure 1 assumes that there is a representation of a language cue. The language cue potentially provides a means to represent the intention to use one of the bilingual’s two languages and also to weight the influence of information that should bias production towards one of the languages. For a model to enable language-specific selection, it must be able to exploit available language cues. Those cues may take different forms, depending on the linguistic context in which the two languages are used and the specific goals of the task that is performed. They may also be related to aspects of the larger cultural and perceptual environment in which the two languages are used. An important consideration in testing the language-specific alternative is to determine whether bilinguals can indeed utilize information that might functionally enable a language cue to direct attention to alternatives in the target language alone. The available evidence on this issue is mixed at best, and again, very few studies have directly investigated the nature of language cues. In the studies we have reviewed, there is a suggestion that cross-language script differences sometimes affect the degree of observed cross-language activation during speech planning (e.g., Guo & Peng, 2005; Hoshino, 2006) and sometimes do not (e.g., Hoshino & Kroll, 2008).
Studies addressing other levels of language processing have also demonstrated the degree to which parallel activation of the two languages proceeds even in the face of clear evidence for the presence of one language rather than the other. For example, a number of experiments have asked whether the language of a sentence context constrains lexical access for language ambiguous words. Schwartz and Kroll (2006) examined the performance of Spanish-English bilinguals naming words that were cognates or controls when they were embedded within a sentence context that appeared in one language alone. The sentences were either highly constrained semantically so that the upcoming target word to be named was highly predictable from the context or not. The cognates used in context had already been shown to reveal the effects of cross-language activation when named out of context. Schwartz and Kroll found that the cognate facilitation observed out of context was eliminated in context when sentences were highly semantically constrained. Most critically, under conditions of low semantic constraint, the magnitude of cognate facilitation was identical to what was observed out of context. That is, the language of the sentence context itself, in the absence of high semantic constraints, was not able to be exploited to direct access to the target language alternative (see Van Hell, 1998, for similar results). Although it is difficult to know how far we might generalize these findings, they suggest that obvious cues to the target language are not necessarily functional cues to language selection. Duyck, Van Assche, Drieghe, and Hartsuiker (2007) recently reported a similar result in an eye tracking study, suggesting that the failure to overcome the parallel activation of the two languages in context can be observed even under experimental conditions that are more ecologically valid.
Another alternative to the selection-by-competition model is the weak link hypothesis (e.g., Gollan & Acenas, 2004; Gollan, Montoya, Cera, & Sandoval, in press; Gollan, Montoya, Fennema-Notestine, & Morris, 2005). The claim here is that because bilinguals are likely to use each of their two languages less often than a monolingual uses one language, the words in each language will be functionally lower frequency and therefore less available than they are for monolingual speakers. The focus of this work has been on the performance of bilingual vs. monolingual speakers, showing that bilinguals experience more tip-of-the-tongue (TOT) states than monolingual speakers and are generally slower than their monolingual counterparts to name pictures, even in the language that is nominally their L1. On this view, slower spoken production and increased TOTs do not indicate cross-language competition that must be resolved but rather weaker links between the semantics and phonology. It remains to be seen whether it will be possible to adjudicate between this alternative and those that are the focus of the current paper, but there are at least two features of this work that make it seem unlikely that it will provide a comprehensive account of bilingual production. At an empirical level, the bilingual participants in most of the studies on which the weak link alternative is based have been heritage speakers of Spanish who, for the most part, had Spanish as their L1 but were educated almost entirely in English and for whom English has become the dominant language. In contexts in which it is likely that there is language attrition or the failure to fully acquire the two languages equally and in similar contexts, it seems likely that there will be processing consequences that may be quite different than those encountered by bilinguals for whom the two languages are used more equally or in similar contexts (but see Ivanova & Costa, in press, for evidence that the same pattern may be observed for early bilinguals who actively use both languages). At a theoretical level, it is very difficult to see how the weak link hypothesis can account for the benefits to cognitive performance in the realm of executive function that have been reported in a series of recent papers (e.g., Bialystok et al., 2004; Costa, Hernandez, & Sebastián-Gallés, 2008). Although the relation between the resolution of cross-language competition and the acquisition of expertise in the realm of executive control is indirect at best, there is clear evidence that bilingualism confers benefits more generally to just those control skills that affect the ability to resolve conflicting information. The selection-by-competition model of language choice could easily have the consequence of conferring such expertise as bilinguals negotiate cross-language competition in multiple contexts. It is difficult to see how the weak links alternative would have any way of accounting for these positive cognitive consequences of bilingualism. To the contrary, it supports a deficit view of bilingualism that would be unlikely to confer these benefits.
A number of recent papers have proposed that inhibition of the L1 may be required to modulate the activity of the L2. Levy, McVeigh, Marful, and Anderson (2007) used the retrieval-induced forgetting paradigm that has been studied primarily in the domain of memory research (Anderson, Bjork, & Bjork, 1994), to investigate the degree to which the phonology of the L1 is inhibited when the L2 is spoken. They demonstrated that increasing practice in retrieving the Spanish name of a picture has the effect of suppressing retrieval of the phonology associated with the English name of the same picture. Although the participants were native English speakers who were not highly proficient in Spanish as the L2, and may therefore reflect the stage of L2 development that Costa and Santesteban (2004) identified as requiring inhibition, they demonstrate a phenomenon that is clearly inhibitory and that like other evidence reviewed here, affects the more dominant L1.
Related findings have been reported by Linck, Kroll, and Sunderman (in preparation) in a study of L2 learners immersed in the L2 environment during study abroad. Linck et al. compared the performance of a group of immersed learners with a group of classroom learners matched on length of L2 study and working memory span. Participants performed a translation recognition task in which distractors to be rejected included pairs that were related to the target translation by similarity to lexical form (e.g., man-mano, which means hand in Spanish) or by similarity to the lexical form of the translation (e.g., man-hambre, which means hunger in Spanish), or to the meaning of the translation (e.g., man-mujer, which means woman in Spanish). Classroom learners were slower to reject these false pairs that were lexically and semantically related than completely unrelated control pairs. In contrast, immersed learners appeared to be immune to the effects of lexical interference, even when the form similarity of the lexical pairs was easily perceived. In a verbal fluency task in which they were asked to generate as many members of a semantic category as they could think of in 30 sec, the immersed learners generated a larger number of Spanish exemplars than the classroom learners. But most critically, the immersed learners generated fewer exemplars in English, their dominant language, even when the English task was blocked for retrieval. The overall pattern of results suggests that in the L2 immersion context, the L1 is actively inhibited. How the inhibition observed in a brief immersion experience later maps on to enduring consequences for cognitive control is a rich topic for future research. For present purposes, the findings simply underline the need for inhibition.
To summarize, we have reviewed a range of empirical studies that address the question of how bilinguals select the language they intend to speak.4 Although the findings in any particular study may be constrained by aspects of the methodology, the overall picture, including evidence from both behavioral and ERP experiments, suggests that there is cross-language activation during the planning of speech that potentially extends quite far along into the time course of processing. The available evidence also suggests that although some of these effects may be larger when bilinguals are less proficient in the L2, providing greater opportunity for L1 to influence performance in L2, they are present and characterize the speech of even highly proficient bilinguals. Taken together with the available evidence on the degree to which language cues are difficult to exploit, the data demonstrate that the “hard problem” of language selection does not go away with increasing bilingual expertise. They suggest that bilinguals become skilled in negotiating the existing cross-language competition rather than in learning to avoid that competition from the start. That conclusion is compatible with the claims that bilingualism more generally confers enhanced cognitive control (e.g., Bialystok et al., 2004). Mapping the connections more directly between mechanisms of language control and cognitive skill will be a rich topic for future research.
The writing of this article was supported in part by NSF Grant BCS-0418071 and NIH Grant F33HD055003 to Judith F. Kroll, by NIH Grant R56-HD053146 to Judith F. Kroll, Maya Misra, Taomei Guo, and Chip Gerfen, and by the Open Project Grant at State Key Laboratory for Cognitive Neuroscience and Learning, Beijing Normal University to Judith F. Kroll, Maya Misra, and Taomei Guo, and the National Natural Science Foundation of China (30600179) to Taomei Guo. Correspondence concerning this article should be addressed to Judith F. Kroll, Department of Psychology, Pennsylvania State University, University Park, PA 16802 USA. E-mail: jfk7/at/psu.edu.
1For the purpose of this discussion we assume that concepts and conceptual features are largely shared across languages although the same concept may give rise to different patterns of lexical activation across the bilingual’s two languages (e.g., De Groot, 1992; Francis, 2005; Tokowicz & Kroll, 2007).
3While the spatial resolution of ERP signals is relatively poor, since different combinations of neural generators could propagate similar patterns, this need not be a limitation when research questions focus on time-course.
4Given space constraints, we have not reviewed the literature on bilingual speech errors although that area of research also provides support for the presence of cross-language interactions that sometimes result in errors that reveal the activation of the nontarget language (see Poulisse, 1997, 1999, for reviews of that work).
Judith F. Kroll, Department of Psychology, Pennsylvania State University.
Susan C. Bobb, Department of Psychology, Pennsylvania State University.
Maya Misra, Department of Communication Sciences and Disorders, Pennsylvania State University.
Taomei Guo, State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University.