Cogn Psychol. Author manuscript; available in PMC 2010 August 1.
Published in final edited form as:
PMCID: PMC2746365

Vowel Categorization during Word Recognition in Bilingual Toddlers


Toddlers’ and preschoolers’ knowledge of the phonological forms of words was tested in Spanish-learning, Catalan-learning, and bilingual children. These populations are of particular interest because of differences in the Spanish and Catalan vowel systems: Catalan has two vowels in a phonetic region where Spanish has only one. The proximity of the Spanish vowel to the Catalan ones might pose special learning problems. Children were shown picture pairs; the target picture’s name was spoken correctly, or a vowel in the target word was altered. Altered vowels either contrasted with the usual vowel in Spanish and Catalan, or only in Catalan. Children’s looking to the target picture was used as a measure of word recognition. Monolinguals’ word recognition was hindered by within-language, but not non-native, vowel changes. Surprisingly, bilingual toddlers did not show sensitivity to changes in vowels contrastive only in Catalan. Among preschoolers, Catalan-dominant bilinguals but not Spanish-dominant bilinguals revealed mispronunciation sensitivity for the Catalan-only contrast. These studies reveal monolingual children’s robust knowledge of native-language vowel categories in words, and show that bilingual children whose two languages contain phonetically overlapping vowel categories may not treat those categories as separate in language comprehension.

Keywords: language development, bilingual, word recognition, phonology, vowel, word learning, infant, toddler, speech perception

Over the course of the first year of life, infants learn about the speech sound categories (consonants and vowels) that are constituents of their native language’s phonological system. Infants improve in differentiating similar sounds that are used in their language (Kuhl, Stevens, Hayashi, Deguchi, Kiritani & Iverson, 2006; Tsao, Liu & Kuhl, 2006; Narayan, Werker, & Beddor, under review) and worsen in differentiating sounds that are not used in their language (Best & McRoberts, 2003; Bosch & Sebastián-Gallés, 2003a, b; Kuhl, Williams, Lacerda, Stevens & Lindblom, 1992; Polka & Werker, 1994; Werker & Tees, 1984). In principle, this perceptual tuning should be of considerable help in language acquisition because it should lead infants to recognize words more accurately. For example, better categorization of vowels like /i/ and /eI/ should help English-learning children differentiate words like wheel and whale. Decreased attention to distinctions not made in the language should also help by preventing children from misinterpreting different instances of the same word as distinct words.

But the in-principle benefits of phonetic learning might, in fact, be of limited use in the word learning and word recognition process, if children’s encoding of words in memory is very vague, or if children cannot separate the linguistically relevant aspects of words, like their consonants and vowels, from formally irrelevant aspects, like whose voice produced the word, how quickly he was talking, whether he was mumbling. The purpose of speech sound categorization is to recognize words, but if the words of the child’s vocabulary are encoded in memory with few phonological details intact, or with an overabundance of irrelevant experiential detail, infants’ sound-categorization skills do not imply accurate identification and differentiation of words. To make an analogy to adult performance, the ability to perfectly categorize the sounds in a person’s name (“Hi – I’m Ellen”) does not, alas, guarantee recovery of the name the next day (“EileenEllenHelenEmily?”). Children’s speech discrimination performance may substantially overestimate their actual memory representations of words. If so, children’s linguistic representations might be qualitatively different from those of adults—indeed, to take an extreme view, they might be so different that phonetic categorization experiments on infants are largely irrelevant to young children’s word recognition.

The goal of much work on the early development of phonological perception is to characterize the format with which children store the sound forms of words, and to determine how children make use of this knowledge in interpreting language. Do children represent words in terms of the speech-sound categories they learned as infants, or in a more holistic way, with phonological categories (which are tied to linguistic interpretation by definition) emerging only in middle childhood? The present paper addresses this issue in monolinguals and bilinguals by comparing children with varying language backgrounds.

First, we ask whether monolingual toddlers from two different language environments differ in language-particular ways in their responses to changes in words’ pronunciations. Toddlers learning a language that treats two sounds as falling into separate categories should respond differently to words spoken with the normal pronunciation and words (mis)spoken using the other sound. However, toddlers learning a language that treats those two sounds as falling into the same category should treat the two realizations as equivalent. This pattern would be consistent with a model in which words are stored in terms of phonological categories specific to the language. Alternative patterns in which children of both language backgrounds respond the same way, by either ignoring the sound change or detecting it, would be consistent with models in which children’s knowledge of words is more language-generic, and either vague (if both groups ignore small changes in words, even when linguistically relevant) or overly precise (if both groups respond to small changes, even when linguistically irrelevant).

Second, we ask whether bilingual toddlers show the same pattern as monolinguals. Bilinguals face a complex learning problem, because no two languages use exactly the same set of phonetic categories. Learning two systems of categories is not simply a matter of having to learn more sounds. To see why, consider the fact that languages tend to position phonological categories within the space of possible speech sounds in a way that maximizes the distinctiveness of the categories; for example, languages with only three vowels tend to use the very distinct sounds /i/, /u/, /a/ (as in heat, hoot, hot), and not, for example, the neighboring sounds /e/, /ε/, /æ/ (as in hate, head, hat; Liljencrantz & Lindblom, 1972). It stands to reason that maximizing distinctiveness aids learning by making categories easier to identify; in a sense, individual languages are “designed” to be learnable. But because pairs of separate languages need not be optimized over the intersection of their sounds, some bilingual situations may pose challenges that learners overcome relatively late in development, or even not at all.

This prediction follows from the dominant view of infant phonetic category formation, in which infants’ learning of the speech sounds of their language is based on analysis of the distribution of speech sounds in phonetic space. Although individual tokens (instances) of a given speech sound vary in their exact realization, variability within a category may be constrained enough that the categories emerge as clusters of similar sounds (e.g., Vallabha, McClelland, Pons, Werker, & Amano, 2007). Infants and adults can learn categories on the basis of distributional information if the categories are sufficiently distinct (Ashby, Queller, & Berretty, 1999; Goudbeek, Swingley, & Smits, in press; Maye, Werker, & Gerken, 2002; Younger, 1985). Given the fact that infants learn at least some aspects of the categories of their native language before they learn what words mean (e.g., Kuhl et al., 1992; Polka & Werker, 1994), phonetic category learning cannot proceed from infants’ analysis of meaningful differences between similar words (e.g., /b/ and /p/ are different because bear and pear mean different things). Instead, this phonetic learning must be based on information in the speech signal itself. Because categories are easier to learn when they are more separate in perceptual space than when they are close, distinctiveness-driven spreading of sounds within a language should make learning easier. Conversely, learning the phonetic categories of two languages at once could be more difficult if the categories of the combined systems overlap.
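The link between category separation and learnability can be illustrated with a small calculation. The sketch below treats two vowel categories as equal-variance Gaussian distributions along a single phonetic dimension and computes the sensitivity index d' together with the best classification accuracy an ideal observer could reach. The function names and all numerical values are illustrative assumptions, not measurements from this study.

```python
import math

def d_prime(mean_a, mean_b, sd):
    """Separation of two equal-variance Gaussian categories, in SD units."""
    return abs(mean_a - mean_b) / sd

def ideal_accuracy(mean_a, mean_b, sd):
    """Best possible proportion correct for two equally likely categories.

    With the decision boundary midway between the means, accuracy is
    Phi(d'/2), where Phi is the standard normal CDF.
    """
    d = d_prime(mean_a, mean_b, sd)
    return 0.5 * (1.0 + math.erf((d / 2.0) / math.sqrt(2.0)))

# Hypothetical first-formant means (Hz) and spread; not data from the paper.
well_separated = ideal_accuracy(300.0, 700.0, 60.0)
close_together = ideal_accuracy(450.0, 500.0, 60.0)
```

In this toy setting, moving two category means closer together lowers the ceiling on how accurately tokens can be assigned to categories, which is one way to picture why overlapping categories across two languages might be harder to learn from distributional evidence alone.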

We studied just such a case. In a series of experiments, we tested word recognition in three populations of children: those learning only Spanish, those learning only Catalan, and those learning both. These populations are particularly well suited to the present inquiry because of phonological properties of the two languages. Spanish has a 5-vowel system, whereas Catalan has 7 vowels (8 counting schwa). The set of Spanish front and mid vowels, or vowels produced with the tongue body relatively forward in the mouth, includes only /i/, /e/, and /a/; the analogous set of Catalan vowels includes /i/, /e/, /ε/, and /a/. The Catalan vowels /e/ and /ε/ are similar to the English vowels in bait and bet, except that the English bait is diphthongized, starting with a sound much like the Catalan /e/ and finishing with the vowel in English beat. The Spanish and Catalan vowels are monophthongal. The Spanish vowel /e/ is similar to the Catalan /e/ and falls between the Catalan /e/ and /ε/ phonetically (Figure 1; Recasens & Espinosa, 2006; Martínez Celdrán, 2007; Carrera-Sabaté & Fernández-Planas, 2005). What makes this set of phonetic circumstances interesting is the proximity of the three /e/-like vowels. Spanish monolinguals who go on to learn Catalan have been shown to be extremely poor in their ability to distinguish the two Catalan vowels, even if they began learning Catalan as young as 3 years of age (Pallier, Bosch, & Sebastián-Gallés, 1997; Pallier, Colomé, & Sebastián-Gallés, 2001; Sebastián-Gallés & Soto-Faraco, 1999), and studies of bilingual infants described below show reduced discrimination of these sounds in some tasks (Bosch & Sebastián-Gallés, 2003a).

Figure 1
Typical first and second formant values of Spanish and Catalan front vowels (plus /a/), for a male speaker. Axes are reversed, following standard practice in phonetics. Note the Spanish /e/ vowel, lying between two Catalan vowels.

The present studies examined children’s responses to words of their language in a sentence-picture matching task that used eye movements as the response measure (i.e., we employed the “language-guided looking” or “preferential looking” procedure). On some trials, words were pronounced normally; on others, a vowel in the word was replaced with another vowel, creating a nonword (a mispronunciation). As described in more detail below, children tend to look at named pictures less when hearing mispronunciations than when hearing correct pronunciations (Swingley & Aslin, 2000). Here we asked: what counts as a mispronunciation for these children? If children have learned the phonological system of their language, Spanish children should be indifferent to changes in pronunciation from /e/ to /ε/ (which Spanish monolingual adults hear as /e/); Catalan children should find words changed in this way to be harder to recognize. Bilingual children would be expected to pattern like the Catalan monolingual children when hearing Catalan.

Previous work on children’s lexical representations

Prior studies have illustrated some of the difficulties in estimating what children know about language structure while they are still learning to talk. Early experiments appeared to indicate that infants’ phonetic knowledge of words is vague or unspecified, and suggested that because toddlers’ lexicons are small, phonetic detail is not needed in order to distinguish among words (Shvachkin, 1973; Edwards, 1974; Eilers & Oller, 1976; Charles-Luce & Luce, 1990; Jusczyk, 1986; Walley, 1993). From this perspective, initial representations stored in memory have a global, underspecified format, and are gradually refined as needed along with vocabulary growth. Learning minimal pairs (words that only differ in one vocalic or consonantal segment) is said to play a crucial role in motivating the inclusion of greater detail in lexical representations (Charles-Luce & Luce, 1990; but see Coady & Aslin, 2003; Swingley, 2003). This is one of the central claims of the Lexical Restructuring Model (Metsala & Walley, 1998; Storkel, 2002), which is supported primarily by demonstrations of better performance on both metalinguistic judgment tasks and word learning tasks for words that sound similar to several other words than for words that are relatively “isolated” in phonological space (e.g., Storkel, 2002). Better performance on words that sound like several other words (i.e., words in “dense phonological neighborhoods”) is consistent with dense neighborhoods themselves serving a causal role in phonological specification.

However, for at least some words learned at home (such as Mommy), and for novel words taught in the laboratory, infants of 8 months or even younger show better recognition of words presented with their canonical pronunciation than with a mispronunciation created by changing a single consonant (Bortfeld, Morgan, Golinkoff, & Rathbun, 2006; Jusczyk & Aslin, 1995; see also Johnson, 2005). By 11 months, infants prefer listening to a list of familiar spoken words over either a set of unfamiliar words or the familiar words with one of the consonants changed (Swingley, 2005; Vihman, Nakai, DePaolis, & Hallé, 2004; though see Hallé & de Boysson-Bardies, 1996). Preferences for canonical pronunciations over mispronunciations can only emerge if infants know the words’ canonical forms, and would not be expected if infants only retained vague, fragmentary knowledge of words. However, infants in such studies are arguably engaged in a task quite different from language comprehension.

Several studies have addressed this issue by testing children’s knowledge of words using tasks in which the child’s response is tied to his or her grasping the meanings of those words. Swingley and Aslin (2000) used children’s visual fixations to named pictures as a response measure, following Golinkoff, Hirsh-Pasek, Cauley, and Gordon (1987) and Fernald, Pinto, Swingley, Weinberg and McRoberts (1998). On each of a series of trials, children were shown two pictures, one of which was named in a sentence. Children’s gaze at the pictures was monitored to determine how much children looked at the named picture. On some trials, Swingley and Aslin (2000) played children sentences in which the target word was slightly mispronounced (e.g., gall for ball, or opple for apple). Children ranging from 18 to 23 months fixated the target picture less upon hearing a mispronunciation than upon hearing a correct pronunciation. Inferior recognition performance for mispronounced words has now been demonstrated in several studies (Bailey & Plunkett, 2002; Ballem & Plunkett, 2005; Mani & Plunkett, 2007; Mani, Coleman, & Plunkett, 2008; Swingley, 2003, in press; Swingley & Aslin, 2002; White & Morgan, 2008; but see also van der Feest & Fikkert, 2005). These studies have repeatedly found no relation between children’s age (ranging from 14 to 24 months) and the degree to which mispronunciation impairs performance, and, similarly, no relation between children’s spoken or receptive vocabulary size and the size of the mispronunciation effect, though performance on both correct pronunciations and mispronunciations improves with age and with vocabulary size. The effect holds similarly for consonants and vowels, and for onset, medial, and coda consonants.

Studies of word learning have produced less consistent results. In one approach, children are told (with multiple repetitions) that a novel word corresponds to some novel object, and are then tested using object choice tasks (Shvachkin, 1973) or two-alternative visual fixation tasks (Ballem & Plunkett, 2005; Swingley, 2007; Swingley & Aslin, 2007). Under some conditions, newly-learned words are differentiated from similar-sounding words by children as young as 14 months. Such effects tend to be fragile (Ballem & Plunkett, 2005), or require numerous repetitions of the word during training (e.g., 22, in Swingley, 2007). Also, young children do not always assume that a distinct phonological neighbor of a familiar word is, in fact, a different word (Nazzi, 2005; Swingley & Aslin, 2007; White & Morgan, 2008).

A second approach, pioneered by Stager and Werker (1997), uses a habituation procedure. Children are habituated to one or (more often) two labeling events in which an object is repeatedly named with a novel-word label. Upon habituation, children are presented with either a familiar scene, or an altered “Switch” scene in which the speech no longer matches the object (e.g., an object previously named with bin being named with din). Children are considered to have succeeded if they look at the display longer on “Switch” trials than on familiar-scene trials. Stager and Werker (1997) found that 14-month-olds could learn two phonetically distinct words (nim and leef), but not two phonetically similar words (bih and dih). This result has been replicated using other consonant contrasts (Pater, Stager & Werker, 2004).

Werker and colleagues argue that 14-month-olds fail to encode the novel words accurately because of cognitive resource limitations (Werker, Fennell, Corcoran, & Stager, 2002; Werker & Curtin, 2005). The fact that 14-month-olds’ difficulty with the task is not purely phonological is demonstrated by experiments in which task manipulations lead 14-month-olds to succeed: familiarizing children with the objects in advance (Fennell, 2004; Fennell & Werker, 2004); clarifying the habituation phase’s meaning by putting novel words in an English carrier phrase (Fennell, 2006); adding a “warm-up trial” in which a familiar object is labeled (Fennell, Waxman & Weisleder, 2007); using already-familiar words (Fennell & Werker, 2003), or testing using the two-picture fixation task (Yoshida, Fennell, Swingley & Werker, in press).

Taking all of these results together, it seems clear that for many familiar words that children have learned from their daily experience with language, those words are represented with sufficient detail for children to be hindered in recognizing subtle phonological variants of those words. For novel words taught in the laboratory, the picture is rather mixed, particularly at 14 months. In addition, the large majority of these findings have concerned consonants, rather than vowels (but see Mani & Plunkett, 2007, 2008; Mani, Coleman, & Plunkett, 2008); a more in-depth examination of vowels in word recognition is warranted (e.g., Nazzi, 2005; Fennell, Curtin, & Werker, 2004; Dietrich, Swingley, & Werker, 2007).

One major gap in our present knowledge concerns the role of phonological categorization in children’s lexical representations—a question that is partly independent of the issue of whether children’s knowledge lacks phonetic detail. As described earlier, current theories disagree about whether children should be viewed as having lexical representations that are composed of discrete, language-specific phonological categories, or rather lexical representations that are holistic (e.g., Beckman & Edwards, 2000). On the latter sort of theory, the mispronunciation effects shown to date might be explained simply by arguing that the mispronounced words tested fell outside their normal range of phonetic variation, irrespective of any learned, language-specific categories, leading to slower or less robust recognition. Such an account would minimize the role of language-specific phonological learning of the sort documented by Kuhl et al. (1992) and Werker and Tees (1984) in favor of a more generic sensitivity to phonetic distances. This issue is addressed in the present experiments by comparing Catalan and Spanish learners’ responses to sounds that are the same in Spanish but different in Catalan. If only Catalan learners are hindered by exchanges of one Catalan category for the other, then either the phonological categories are relevant descriptors of the child’s lexical representations, or it must be assumed that the Spanish learners’ experience with the tested words includes sufficient instances that happen to include the Catalan vowel to render children insensitive to the difference. We return to this issue in the General Discussion.

Bilingual environments

The consensus view of infant phonetic category formation as a statistical clustering process does not itself make clear predictions about when and how phonetic categories are sorted by language among children in bilingual environments, nor about how young children’s language-specific phonological knowledge is exercised in word recognition or word learning. For infants growing up in Spanish-Catalan bilingual environments, a U-shaped developmental pattern has been described, both for sounds present in Catalan but not Spanish (Bosch & Sebastián-Gallés, 2003a) and also for sounds present in both languages (Sebastián-Gallés & Bosch, in press). Bilingual children at an early age (4 months) and several months later (12 months of age) discriminate the Catalan-only vowel contrast of /e/ and /ε/ and the common contrast /o/-/u/, but fail to discriminate these sounds at 8 months of age. Infants in monolingual Catalan environments discriminate these sounds at 4 and 8 months. A similar pattern was found in bilinguals tested on the Catalan-only distinction between /s/ and /z/: discrimination at 4 and 16 months, but not 12 months, in bilinguals (Bosch & Sebastián-Gallés, 2003b). In both the /e,ε/ and /s,z/ cases, the Catalan-only contrasts involve sounds that are infrequent in Catalan relative to a nearby sound that is frequent in Spanish. Considering the fricatives, /s/ instances amount to about 9% of Spanish speech sound tokens (easily the most frequent consonant), whereas in Catalan, /s/ accounts for 6.4% of tokens (also the most frequent consonant), and /z/ only 3.1% (the 8th most common consonant). Considering the vowels, /e/ instances account for about 12% of Spanish speech sounds (the 2nd most common sound), whereas Catalan /e/ is only 3.0% of Catalan speech sounds (the 5th most common vowel, of 8), and /ε/ is only 1.1% (the least frequent vowel; Alcina & Blecua, 1975; Esquerra, Febrer, & Nadeu, 1998). The low frequency of the Catalan sounds relative to their similar Spanish counterparts likely contributes to the bilingual infants’ difficulty in determining whether the Catalan categories are distinct (although the bilingual children’s difficulty with the /o/-/u/ contrast, which is present in both of their languages, suggests that other factors play important roles as well; Sebastián-Gallés & Bosch, in press).
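To give a rough sense of what these frequencies could imply for a bilingual learner, the sketch below mixes the vowel token frequencies reported above under an assumed exposure split. The 50/50 split and the function name are assumptions made purely for illustration; actual exposure varies from child to child, and the Spanish figure is reported only approximately.

```python
# Token frequencies (percent of speech sounds) as reported in the text above.
SPANISH_E = 12.0        # Spanish /e/ ("about 12%")
CATALAN_E = 3.0         # Catalan /e/
CATALAN_EPSILON = 1.1   # Catalan /ε/

def mixed_input(share_catalan):
    """Percent of all speech sound tokens in bilingual input, by vowel.

    share_catalan is the assumed fraction of input that is Catalan (0 to 1).
    """
    share_spanish = 1.0 - share_catalan
    return {
        "Spanish /e/": SPANISH_E * share_spanish,
        "Catalan /e/": CATALAN_E * share_catalan,
        "Catalan /ε/": CATALAN_EPSILON * share_catalan,
    }

balanced = mixed_input(0.5)  # assumed equal exposure to both languages
# Ratio of Spanish /e/ tokens to Catalan /ε/ tokens in the mixed input.
ratio = balanced["Spanish /e/"] / balanced["Catalan /ε/"]
```

Under this assumed split, Spanish /e/ tokens would outnumber Catalan /ε/ tokens by roughly eleven to one in the child's combined input.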

The Catalan-Spanish bilingual developmental pattern is not universal. For example, 10- to 12-month-olds learning both English and French learn the appropriate phonetic boundary between the English [p] and [pʰ] stop categories (the onsets of bit and pit), and also the boundary between the French [b] and [p] (the onsets of bain and pain); children learning only English differentiate only the English contrast (Burns, Yoshida, Hill, & Werker, 2007). Furthermore, French-English bilinguals from 6 to 12 months can discriminate the English and French /d/ sounds, which vary slightly in their place of articulation (Sundara, Polka, & Molnar, 2008). The success of English-French bilinguals in keeping these stop categories distinct, in contrast to the Catalan-Spanish bilinguals, might be due to some intrinsic difference in the discriminability of the sounds, or might be affected by how common the relevant sounds are in each language. As indicated above, the /ε/ and /e/ vowels are infrequent in Catalan compared to the very common Spanish [e], whereas the [b] and [p] are about as frequent in French as they are in English. (In French they make up about 3.6% and 6.0% of word onsets respectively, being the 9th and 5th most common onset consonants; in English the [b] and [pʰ] make up about 5.2% and 2.6% of word onsets, being the 5th and 14th most common onset consonants; the proportions are greater in both languages if one excludes function words).1

The question of how phonetic perception is related to linguistic interpretation has been addressed in relatively few studies even among monolinguals. It is clear that children do not always assume that perceptible phonetic variation signals phonological contrast. Sometimes children correctly disregard phonetic variation that is not used contrastively in the native language (Dietrich et al., 2007), yet in other cases children seem unable to use native-language phonetic distinctions to signal lexical contrast (Stager & Werker, 1997; Swingley & Aslin, 2007; Thiessen, 2007; White & Morgan, 2008). When we turn to the bilingual case, it is an open question whether we should expect children to interpret phonetic variation as required by each language, even for variations children have been shown to discriminate. Children might be less likely to draw language-appropriate inferences about pronunciation variation, reflecting the more complex task they face. Or children might perform better, if the bilingual environment spurs greater attention to sound structure.

The study that has addressed this issue most directly used the Switch procedure to examine children’s learning of the phonologically similar words “bih” and “dih” (Fennell, Byers-Heinlein, & Werker, 2007). Monolinguals typically fail at 14 months but succeed at 17 months and above. Fennell et al. found that neither Chinese-English nor French-English bilinguals succeeded in remembering the link between “bih” and “dih” and their referents at 17 months, and many did not succeed even at 20 months. The generality of this surprising result is not yet known, but it does provide a demonstration of language processing differences associated with bilingualism.2

The present series of experiments was conducted to address the issue of phonological specificity and phonological interpretation in bilinguals’ early words. Vowels were targeted in this research because prior work in our laboratory has demonstrated that by 12 months, Catalan monolinguals and Catalan-Spanish bilinguals can discriminate the /e/ and /ε/ vowels of Catalan, while Spanish monolinguals cannot; thus, differences between Catalan monolinguals and bilinguals in detecting vowel mispronunciations are unlikely to be due to the inability to discriminate these sounds (Bosch & Sebastián-Gallés, 2003a; Sebastián-Gallés & Bosch, 2005). In addition, vowels have been explored less than consonants in studies of toddlers’ lexical knowledge, and at least some work on toddlers’ interpretation of vowels in word learning suggests that vowels are weighted less heavily than consonants (Nazzi, 2005; Nespor, Mehler, & Peña, 2003).

Experiments 1a and 1b: Monolingual toddlers

A first series of experiments assessed monolingual toddlers’ representation of vowels in their lexicon. Following Swingley & Aslin (2000), on each of a series of trials pictures of familiar objects were displayed side by side and one of the pictures was named by playing a pre-recorded sentence. The pronunciation of the target words was either correct or deviant (mispronounced). Mispronunciations presented to Spanish monolinguals involved changing Spanish /e/ into the non-Spanish vowel /ε/, whereas mispronunciations presented to Catalan monolinguals involved changing Catalan /e/ into Catalan /ε/ and vice versa.

Children’s eye movements were recorded and coded off-line. Past work has shown that children tend to fixate the picture that is being named. If this response is significantly attenuated when children hear mispronunciations, we infer that the mispronunciations match children’s stored lexical representations less well than the correct pronunciations.
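The logic of this dependent measure can be sketched in a few lines. The example assumes that each trial has been coded frame by frame as a look to the target ("T"), to the distracter ("D"), or away from both pictures ("A"), and that only frames inside the analysis window are included. The coding labels, function names, and sample data are hypothetical and do not reproduce the scoring procedure used in these experiments.

```python
def target_proportion(frames):
    """Proportion of on-picture frames spent on the target.

    Frames coded "A" (away) are excluded from the denominator.
    Returns None if the child never looked at either picture.
    """
    target = frames.count("T")
    distracter = frames.count("D")
    if target + distracter == 0:
        return None
    return target / (target + distracter)

def mean_proportion(trials):
    """Average target proportion over trials that have usable looking data."""
    scores = [p for p in map(target_proportion, trials) if p is not None]
    if not scores:
        raise ValueError("no trials with usable looking data")
    return sum(scores) / len(scores)

def mispronunciation_effect(correct_trials, mispronounced_trials):
    """Positive values mean less target looking for mispronunciations."""
    return mean_proportion(correct_trials) - mean_proportion(mispronounced_trials)

# Hypothetical coded trials for one child (one character per video frame).
correct = ["TTTTTDAT", "DTTTTTTT"]
mispronounced = ["TDDTTDAD", "DDTTTTDD"]
effect = mispronunciation_effect(correct, mispronounced)
```

A reliably positive effect across children would correspond to the attenuated target looking described above; an effect near zero would suggest that the altered vowel was treated as an acceptable pronunciation.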

If Catalan toddlers have well-specified vowel representations in their lexicon, mispronunciations of /e/ and /ε/ in familiar words would be expected to hinder their recognition of those words, compared to trials on which object labels are accurately pronounced. Spanish children, however, would be expected to treat mispronounced words like homophones of correct pronunciations, according to the phonology of their own language.



Participants were 48 children from monolingual environments (Experiment 1a: Catalan; 1b: Spanish, with 24 in each group). All children were healthy, full-term and without hearing problems. The Spanish children ranged in age from 17;22 to 24;26 (months, days; range from 540 to 756 days) with a mean of 642 days and median of 647 days. Thirteen were girls. Four additional children were not included in the analysis due to fussiness (n=3) or poor health on the date of testing (n=1). The Catalan group ranged in age from 18;04 to 27;23 (range from 552 to 833 days) with a mean of 661 days and median of 651 days. Ten were girls. Five additional children were tested but not included in the sample due to fussiness. The monolingual status of the families was carefully assessed via a questionnaire providing information about each of the child’s caregivers over the child’s history, including estimates of their time spent with the child, the language used to talk to the child, and the language used to converse with other adults. Thus, the questionnaire offered an estimate of children’s daily exposure to the languages under study. To be included in the monolingual samples, participants had to have a daily exposure to the family language within the 85%–100% range, and parents had to use only one language (either Catalan or Spanish) between them and when interacting with the child.


The Catalan target words used were: [ə’βελə] (“bee”), [gə’lεtə] (“cookie”), [λet] (“milk”) and [pe∫] (“fish”) and their equivalents in Spanish [a’βexa], [ga’λeta], [let∫e] and [peθ]. Cognate words were selected in order to make comparable versions of the same test in both languages. The selection of the items was constrained by several factors. Words in Catalan had to have the target vowels, belong to children’s vocabularies, correspond to picturable objects, and have a cognate counterpart in Spanish. The selected words fulfilled these requirements but they differed in familiarity (for instance, words for “milk” and “cookie” are more familiar to many young children than the word for “bee”) and in syllabic length (two of them are monosyllabic and the other two are disyllabic, with the target vowels appearing in the second syllable). Six filler words were also included. They were the Catalan and Spanish words for “car”, “ball”, “cat”, “plane”, “hen” and “cow” ([kot∫ə], [pi’lɔtə], [gat], [ə’βio], [gə’λinə] and [bakə] in Catalan and [kot∫e], [pe’lota], [gato], [a’βion], [ga’λina] and [baka] in Spanish). The fillers were always presented correctly pronounced.

The target and distracter words did not always share the same grammatical gender, because of the many constraints on item selection. Galleta/galeta and the distracters gallina/gallina are all feminine, and so matched. However, peix/pez (Catalan/Spanish) are masculine, and llet/leche are feminine. In principle, on trials testing these words children had information about which item was the target or distracter before the noun began, because the preceding determiner (el/el, la/la) varies with gender. Similarly, abella/abeja are feminine whereas their distracter pictures (avió/avión) are masculine; thus, children hearing Spanish ¿Dónde está la abeja? could have had a slight temporal advantage in responding to these words, based on the incompatibility of the determiner la and the distracter picture, for which the article would have been el. For this item the Catalan learners would not have the same advantage because in Catalan both masculine and feminine articles are reduced to /l/ before vowels. It is not known whether Spanish-learning two-year-olds can use the difference between el (/el/) and la (/la/) to help identify words, though Spanish-learning three-year-olds can (Lew-Williams & Fernald, 2007). Of course, in no case did these facts about gender differ for correct pronunciations and mispronunciations, so this issue is orthogonal to the concerns of the present study.

The speech stimuli were recorded by a female bilingual Catalan-Spanish speaker using an infant-directed speech register. Following Swingley and Aslin (2000), target and filler words were inserted in carrier sentences, beginning with “where is the…?” (¿Dónde está el/la…? / On és el/la…?, in Spanish and Catalan respectively) for the targets, or “Look, a…” (Mira, un/una…) for the fillers. These test sentences were followed by a 500 ms pause and then a second sentence that did not name either picture (“can you see it?”, ¿lo/la ves? / el/la veus?, for the targets, or “do you like it?”, ¿Te gusta? / ¿T’agrada?, for the fillers). Second sentences were included to help maintain children’s interest in the task.

The vowel change in the Catalan material went in two directions, from /e/ to /ε/ in the two monosyllabic words ([λet] to [λεt]; [pe∫] to [pε∫]) and from /ε/ to /e/ in the two disyllabic words ([ə’βελə] to [ə’βeλə]; [gə’lεtə] to [gə’letə]). For the Spanish stimuli the vowel change was always from /e/ to /ε/ because only the former exists in Spanish. Spectral analyses of the formant frequencies measured at a central, steady-state portion of the vowels in the target words (CP and MP) were undertaken (see Figure 2). F1 and F2 values corresponded to the regions expected for high and low mid-front Catalan vowels. Values for Spanish targets also conformed to an extended distribution within which two groups of phonetically different exemplars might be identified (although these differences are not phonologically contrastive in Spanish). The mispronunciations did not form actual words, though in Catalan they could in principle be new words (they were phonotactically legal).

Figure 2
First and second formant values of Spanish (top panel) and Catalan (bottom panel) vowels in the test words of the experimental stimuli for Experiments 1, 2, and 4. Plotted letters a, g, l, and p refer to the target words abeja, galleta, leche, and pez ...

The visual stimuli were digitized photographs of objects on a light blue background, averaging about 14 cm in length. These pictures were presented side by side on two 15” color video monitors, separated about 25 cm from each other.

Apparatus and procedure

The experiments were run in a portable lab consisting of a white four-sided cloth-walled booth (2 × 1.5 × 2 m tall). Inside the booth there was a chair, where the parent sat with her child on her lap, facing the two monitors where the images were displayed. A central loudspeaker was placed between the two monitors, concealed from children’s gaze by a cloth. Children’s eye movements were videotaped using a digital video camera, and coded offline. Parents were instructed to close their eyes and not to speak to the baby during the experiment. The experiment began with a short familiarization phase where all the pictures were displayed one after the other in both monitors at the same time while being named in sentences using only canonical pronunciations of the words. Each picture was named twice. This familiarization phase, which lasted 54 seconds, was intended to ensure agreement about the identity of the pictured objects (Swingley, Pinto, & Fernald 1999). Immediately afterward the testing phase began. It consisted of 28 trials: 16 test trials and 12 fillers. Each trial began with the presentation of two different images simultaneously for 2s; then the audio file was played. The pictures remained for an additional 2s after the end of the sentences. After 2500 ms another trial began.

Four stimulus orders were created. The second order was a left/right reflection of the first, and the third and fourth orders reversed the first and second. Each picture was used as a target four times (twice on the left screen and twice on the right) and as a distracter four times (twice left and twice right). Pictures were presented in pairs: Catalan abella-avió (“bee-plane”); galeta-gallina (“cookie-hen”); peix-llet (“fish-milk”); pilota-cotxe (“ball-car”); vaca-gat (“cow-cat”). Each picture appeared an equal number of times on the left and on the right in each half of the experiment and each of the target words appeared twice on each half of the experiment, once correctly pronounced and once mispronounced. (A sample trial order is provided as an Appendix.) Note that when the critical vowel appeared in the second syllable of the target word, the two pictured objects’ names began with the same syllable. This design feature was intended to eliminate the possibility that children would choose the target picture based only upon the initial syllable, ignoring or weighting less heavily the vowel of interest.
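The relationship among the four stimulus orders can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Trial` structure, the function names, and the three-trial fragment are hypothetical, and "reversed" is read here as reversing the sequence of trials, which the text does not spell out.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    left: str       # picture shown on the left monitor
    right: str      # picture shown on the right monitor
    target: str     # picture named in the carrier sentence
    condition: str  # "CP", "MP", or "filler"


def mirror(order: List[Trial]) -> List[Trial]:
    """Left/right reflection: swap picture sides on every trial."""
    return [t._replace(left=t.right, right=t.left) for t in order]


def four_orders(base: List[Trial]) -> List[List[Trial]]:
    """Order 2 mirrors order 1; orders 3 and 4 reverse orders 1 and 2.

    Assumption: 'reverse' means presenting the trials in the opposite sequence.
    """
    second = mirror(base)
    return [list(base), second, list(reversed(base)), list(reversed(second))]


# Hypothetical three-trial fragment, NOT the trial list used in the study.
base = [
    Trial("abella", "avió", "abella", "CP"),
    Trial("gat", "vaca", "vaca", "filler"),
    Trial("llet", "peix", "peix", "MP"),
]
orders = four_orders(base)
```

Deriving the other three orders from a single base list keeps the target and condition of every trial fixed, so only picture side and serial position vary across orders.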

One of four trial orders used in the Catalan version of Experiment 1.

After the test, parents were asked whether they thought their child knew the target words. Parents were also asked to fill in a questionnaire about their child’s expressive lexicon. This questionnaire was an adapted (shorter) version of the MacArthur-Bates CDI (Fenson, Dale, Reznick, Bates, Thal & Pethick, 1994), given in the child’s language. The questionnaires permitted assessment of the potential relationship between children’s ability to detect mispronounced words and their vocabulary size.


Children were video recorded during the procedure. On the video recording, visual signals indicated when the two pictures were first displayed and when the sound stimulus (noun label) began. Independent coders analyzed the videotapes offline, frame by frame (frame duration 40 ms). Coders measured looking times to the right and left pictures over the 28 trials (filler and target trials), coding every change in the location of children’s fixations. Following previous studies (e.g. Swingley & Fernald, 2002), a window of analysis was established, extending from 360 ms to 2000 ms after the onset of the target word, in which children’s eye movements were analyzed to obtain the accuracy measure (proportion of target fixation) and the latency measure (amount of time needed to initiate a shift from the distracter picture to the target)3. Eye movements outside the 360 to 2000 ms window may not be related to the target word recognition process under study and were not analyzed in detail (see Swingley & Aslin, 2000). Minor variations in the length of the window did not modify the pattern of results.
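As a rough illustration of the accuracy measure, the sketch below computes the proportion of target fixation from frame-by-frame codes inside the 360 to 2000 ms window. The coding scheme ("T", "D", "A"), the function names, and the choice to exclude off-picture frames from the denominator are assumptions made for the example; the coding software actually used is not described in this section.

```python
FRAME_MS = 40  # frame duration reported in the text


def target_proportion(frames, word_onset_ms, start_ms=360, end_ms=2000):
    """Proportion of target looking among on-picture frames in the window.

    frames: one code per video frame from trial onset, where "T" = target,
            "D" = distracter, "A" = away (hypothetical coding scheme).
    Returns None if neither picture was fixated inside the window.
    """
    target = distracter = 0
    for i, code in enumerate(frames):
        t = i * FRAME_MS - word_onset_ms  # time relative to target-word onset
        if start_ms <= t < end_ms:
            if code == "T":
                target += 1
            elif code == "D":
                distracter += 1
    looked = target + distracter
    return target / looked if looked else None


def onset_type(frames, word_onset_ms):
    """Classify a trial by the picture fixated when the target word began."""
    code = frames[word_onset_ms // FRAME_MS]
    return {"T": "T-onset", "D": "D-onset"}.get(code, "away")


# Made-up trial: 20 distracter frames followed by 30 target frames.
frames = ["D"] * 20 + ["T"] * 30
proportion = target_proportion(frames, word_onset_ms=0)
trial_kind = onset_type(frames, word_onset_ms=0)
```

In this made-up trial, 11 distracter frames and 30 target frames fall inside the window, so the proportion is 30/41, and the trial counts as one on which the child was fixating the distracter at word onset.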

Coding reliability was established by comparing the measures obtained by independent coders on randomly selected blocks of test trials for a sample of the participants (15 in each language group). High inter-coder agreement was found (r = 0.99; p < 0.0001).

Results and Discussion

We consider results from each experiment in turn. Catalan monolingual children’s mean proportion of fixation to the target pictures was 71.3% in the correct-pronunciation (CP) condition and 62.9% in the mispronunciation (MP) condition. These means surpassed the chance level of 50% in both conditions (CP: t(23) = 8.85, p < 0.0001; MP: t(23) = 4.77, p < 0.0001; all t-tests two-tailed). Thus, 18- to 24-month-old Catalan children recognized the test words both when correctly pronounced and when mispronounced. In addition, children fixated the target more upon hearing correct pronunciations than mispronunciations (t(23) = 3.42, p < .005; see Figure 3). Eighteen out of the 24 participants showed greater fixation times to the well-pronounced words, a proportion consistent with previous research. Over all 24 children, CP performance exceeded MP performance for three of the four test pairs (abella, galeta, and llet) and did not differ for the fourth (peix).
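The two kinds of test reported here (each condition against the 50% chance level, and CP against MP within children) can be sketched as follows. The per-child proportions are invented for illustration and are not data from the study; the function names are likewise hypothetical, and only the t statistic is computed, not its p value.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.5):
    """t statistic for the mean of `values` against a fixed value `mu`."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


def paired_t(cp, mp):
    """t statistic for within-child CP minus MP difference scores."""
    diffs = [c - m for c, m in zip(cp, mp)]
    return one_sample_t(diffs, mu=0.0)


# Invented proportions of target fixation for three children (illustration only).
cp_scores = [0.7, 0.8, 0.9]
mp_scores = [0.6, 0.6, 0.6]

t_cp_vs_chance = one_sample_t(cp_scores, mu=0.5)
t_cp_vs_mp = paired_t(cp_scores, mp_scores)
```

With 24 children per group, each statistic would be evaluated on 23 degrees of freedom, as in the values reported in the text.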

Figure 3
The effects of mispronouncing /e/ as /ε/ or vice versa among Catalan and Spanish monolinguals (Experiments 1a and 1b) and Catalan-Spanish bilinguals (Experiment 2). The y axis displays, for each child, the difference between target fixation proportions ...

Two individual-difference measures were evaluated in each set of children: their age, and their vocabulary size as estimated by the CDI. The chief interest in examining these measures was to evaluate the possibility that only the more linguistically advanced children would perform better on correct pronunciations than mispronunciations. The Catalan learners’ ages ranged from 18 to 27 months (mean 21;22) and spoken vocabulary (CDI) scores ranged from 4 to 624 words (mean 253, median 194). Neither age nor vocabulary size correlated significantly with the difference between CP and MP trial performance (age, r = −0.316; vocabulary, r = 0.098, both ns). These results are consistent with prior research showing that age and vocabulary size are not predictors of mispronunciation effects in this task.

Children in the Spanish monolingual group (Experiment 1b) showed a different pattern, as expected given the phonology of Spanish. Spanish children showed no indication of discriminating the Catalan contrast /e/-/ε/. The average proportion of target fixation in the CP condition was 62.8%, and in the MP condition, 66.6%. Children recognized the words in both conditions, as fixation percentages were significantly above chance (CP: t (23) = 4.7, p < 0.0001; MP: t (23) = 6.0, p <0.0001). Performance in the CP and MP conditions did not differ (t (23) = −1.08, ns; see Figure 3). Nine of the 24 children in this group showed greater fixation times for the correctly pronounced words. Two items yielded somewhat higher performance for CP realizations and two the reverse, with no individual items’ difference scores significantly different from chance levels.

Spanish learners’ ages ranged from 17 to 25 months (mean 21;03) and vocabulary size ranged from 17 to 480 words (mean 205, median 188; 2 children were excluded due to lack of data). Neither of these measures correlated with the difference between CP and MP trial performance (age, r = 0.092; vocabulary, r = −0.147, both ns). Proportion of Spanish language exposure did not correlate significantly with the effect of mispronunciations either (exposure range 80% to 99%, mean 94.2; r = 0.337, p > 0.10).

Catalan and Spanish learners’ responses to the mispronunciation manipulation were significantly different, as revealed by an ANOVA involving data from both monolingual groups, with condition (CP and MP) and linguistic group (Catalan or Spanish) as variables [interaction F (1, 46) = 8.04, p < 0.01; no significant main effects (F <1.2)].

Because children were sometimes fixating the target, and sometimes the distracter, when the spoken word began, effects of mispronunciation could arise from children making different sorts of decisions on separate trials. Children fixating the target picture might reject this picture as an instance of the spoken word more often for mispronunciations than correct pronunciations. Children fixating the distracter might shift away from the distracter less often for mispronunciations than correct pronunciations, perhaps suggesting difficulty matching the spoken word to any stored mental representation, or uncertainty concerning whether the mispronounced word could be an unfamiliar word referring to the distracter picture. In past work (Swingley, in press; Swingley & Aslin, 2000, 2002) both effects were found. To explore the basis for children’s responses, we supplemented the overall proportion-to-target analyses for both language groups with separate analyses considering only those trials on which children happened to be fixating the distracter when the target word began (“D-onset trials”), or those trials when children were fixating the target (“T-onset trials”).

Catalan monolingual children’s target fixation proportion on D-onset trials averaged 61.9% in the CP condition and 47.3% in the MP condition, a significant difference (t(23) = 3.99, p < .001). By contrast, Spanish monolinguals’ target fixation was equivalent in the two conditions (CP, 47.5%; MP, 50.3%, ns).4 An ANOVA comparing the two groups in the two conditions revealed a marginal effect of pronunciation condition (F(1,46) = 3.80, p = 0.057) and a significant interaction between pronunciation and language group (F(1,46) = 8.30, p = .006). Neither group revealed mispronunciation effects on target-onset (“T-onset”) trials (Catalan CP, 77.8%; MP, 78.6%; Spanish CP, 79.2%; MP, 77.7%). We will return to the possible implications of this difference between D-onset and T-onset trials in the General Discussion.

By showing better recognition of correctly pronounced than mispronounced words, toddlers from monolingual Catalan families behaved similarly to participants in previous studies with monolingual Dutch and English toddlers (Swingley & Aslin, 2000; Swingley 2003). The Catalan /e/-/ε/ contrast is well represented in their lexicon, permitting discrimination of accurately and inaccurately produced words, even though the number of minimal pairs in their vocabulary is small and the density of phonological neighbors is thus still low.

As in previous studies with 18- to 24-month-old monolingual toddlers (Swingley & Aslin 2000; Bailey and Plunkett 2002), no correlation between age and the impact of mispronunciations was observed, nor was such a correlation observed for vocabulary size and mispronunciation effects. It seems likely that the encoding of phonological detail in the first lexicon does not depend on age, at least inasmuch as detail in encoding is revealed by sensitivity to phonological substitutions in familiar words. Phonetic categories learned in infancy thus appear ready to be applied in learning the words of the early vocabulary. In addition, the lack of an effect of vocabulary size provides suggestive evidence contrary to models that rely on vocabulary growth or minimal pair contrasts for the specification of phonological features.

As for the Spanish monolingual group, results followed predictions based on Spanish phonology. Toddlers seemed to accept both pronunciations as adequate forms of the target words. Age and vocabulary size did not seem to have any influence on their failure to distinguish the vowels.

The Catalans’ results provide the strongest evidence to date for toddlers’ accurate phonological representation of vowels in familiar words. Other studies have used a similar procedure to examine vowels, but have not tested for the statistical significance of vowel mispronunciations alone (Swingley & Aslin, 2000, 2002), or have focused primarily on vowel mispronunciations more phonetically distinct than those tested here (Mani & Plunkett, 2007, 2008; Mani, Coleman & Plunkett, 2008). In addition, the contrast between the Catalan and Spanish monolinguals reveals that children’s responses to altered pronunciations are not based on a language-generic phonetic similarity metric: vowel changes in the same region of phonetic space will or will not have consequences for word recognition depending on the language of the listener.

Experiment 2: Bilingual toddlers

Experiments 1a and 1b provided the foundation for our examination of phonological representation in bilingual toddlers: Spanish monolinguals did not appear to detect a vowel change contrastive in Catalan but not Spanish, while Catalan monolinguals did. In Experiment 2, Catalan/Spanish bilingual children were tested in the same word recognition task. The tested children had been continuously exposed to Catalan and Spanish on a daily basis from birth. They had heard the target Catalan contrast (/e/ - /ε/), at least from the speech of one of their parents. Two logical outcomes were, thus, contemplated: (a) “part-time” exposure would be sufficient to mimic the effects of monolingual Catalan experience; or (b) longer exposure would be required for bilinguals to encode these vowels in their lexicon, resulting in a failure to react to the mispronunciations as monolingual Catalans did. The latter would be consistent with other studies of bilinguals revealing differences in phonetic categorization around 8 months of age (Bosch & Sebastián-Gallés, 2003a) and later success in word learning tasks using minimal pairs (Werker & Fennell, 2004).



The simultaneous Catalan-Spanish bilingual sample comprised 24 children between 18;22 and 26;08 months of age (range from 562 to 788 days, mean 661, median 644 days). Fourteen were girls. The sample was comparable to the monolingual samples in terms of age and productive vocabulary size5. Eight additional children were tested but not included in the sample for different reasons: fetal suffering at delivery (1), prematurely born (1), recent ear infection (1), not enough valid trials in the test (4), experimenter error (1).

The amount of Catalan and Spanish input was carefully assessed via the same questionnaire used for monolinguals. The sample was composed of 20 babies with a Catalan-speaking mother/main caretaker and 4 with a Spanish-speaking mother/main caretaker. Because 20 out of 24 participants belonged to bilingual homes where Catalan was the predominant language in parent-child daily interaction, we maximized the probability for this group to react to this Catalan-specific contrast. (In some prior research, Catalan-dominant subjects categorized Catalan sounds more like Catalan monolinguals than the Spanish-dominant subjects did; Sebastián-Gallés & Bosch, 2002; Sebastián-Gallés, Echeverría & Bosch, 2005.) Here, because the parent who was reported to spend the most time with the child was almost always the mother, we use the terms “dominant language” and “maternal language” interchangeably. Daily exposure to both languages ranged from 51%–49% to 21%–79%, with an average Catalan exposure of 65%.

Materials and procedure

Materials and experimental procedures were the same as in Experiment 1. All participants were tested with the Catalan version. Materials were kept the same for two reasons. First, testing bilingual children on the same words and pictures ensured that any differences in responding among the experiments were not due to extraneous factors like the recognizability of each picture as an instance of its target concept. Second, testing these children on words that are cognates in Spanish and Catalan provided a stringent test of children’s sensitivity to phonological changes. Cognates might be harder for children to keep distinct between their two languages and might therefore suffer interference, just as similar-sounding words within a single language can interfere with one another in word learning (Stager & Werker, 1997; Swingley & Aslin, 2007). Although the tested vowel change was never the only difference between the Spanish and Catalan cognates (there was always at least one consonant marking the Catalan words as Catalan and not Spanish), children might find, say, Catalan galeta (/gə’lεtə/) and Spanish galleta (/ga’λeta/) similar enough that they might disregard their differences to some degree. Thus, children differentiating the CP and MP realizations of the target words would provide a strong confirmation of phonological encoding. Children failing to make this differentiation would indicate a lack of sensitivity to the change in vowels, which might or might not be related to the cognate status of the tested words. We return to discussion of this issue below.

Results and Discussion

The mean proportion of fixation to the target pictures in the correct-pronunciation condition (CP) was 64.5%, while in the mispronunciation condition (MP) it was 61.6%. As in the monolingual groups, recognition was above chance in both conditions (CP: t(23) = 5.72, p < 0.0001; MP: t(23) = 3.71, p < 0.002; see Figure 3). However, the difference between conditions did not approach significance (F(1, 23) < 1, ns). Only 15 out of the 24 participants showed greater fixation times for the correctly pronounced words. Over all 24 children, CP performance slightly exceeded MP performance for abella and galeta, was slightly inferior to MP performance for llet, and did not differ for peix, reflecting the null results overall. Analyses conditioned on initial fixation location (D-onset and T-onset trials) again showed equivalent performance on CP and MP trials (D-onset CP, 54.8%; MP, 53.8%; T-onset CP, 73.3%; MP, 71.2%; all ns).

Additional analyses were carried out to study the influence of age, productive vocabulary size and amount of Catalan exposure. Age did not correlate with the size of the MP effect (r = −0.02, ns). Productive vocabulary size (total number of words for different concepts/objects) ranged from 25 to 421 words (average 208, median 231). No correlation was found between vocabulary size and the size of the MP effect (r = −0.23; ns).

The third factor under analysis was the amount of exposure to the familiar languages. Proportion of Catalan exposure was positively correlated with the size of the MP effect (i.e., children hearing more Catalan showed a greater difference in target fixation accuracy for CP and MP trials; r = 0.412, p < .05). This correlation was driven in part by the two children whose mispronunciation effect sizes were more than 2 SDs from the mean (one 3.0 SDs below, with a Catalan proportion of 56%; one 2.7 SDs above, with a Catalan proportion of 76%). Without these two children, the correlation dropped to 0.22 (p = 0.33, ns). A split-half analysis of proportion of Catalan exposure showed that the 12 children with the greatest estimated Catalan exposure (range: 65 to 80%) produced a marginally significant effect of mispronunciation, fixating targets 9.0% more for CP than MP words (t(11) = 2.18; p(two-tailed) = 0.052). In this group, 8 of 12 children’s CP performance numerically exceeded their MP performance. The 12 children with the least estimated Catalan exposure (range: 33 to 63%) showed no such effects (t(11) < 1, ns). Thus, the evidence is not incompatible with a model in which bilingual children are sensitive to substitutions of the /e/-/ε/ vowels if their home exposure to Catalan is sufficient, though such a conclusion must be considered tentative. The overall pattern of results from the 24 children indicates that bilingual children as a group were not sensitive to substitutions of the Catalan vowel.
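The significance of the two correlations can be checked with the standard conversion from a Pearson r to a t statistic, t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below applies it to the reported values (r = 0.412 with 24 children; r = 0.22 with 22 children after removing the two outliers). The function name is hypothetical, and the critical values are the usual two-tailed .05 table values for 22 and 20 degrees of freedom; this is a consistency check, not the authors' analysis script.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson correlation over n observations to a t statistic (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Standard two-tailed critical t values at alpha = .05
T_CRIT_DF22 = 2.074
T_CRIT_DF20 = 2.086

t_all = t_from_r(0.412, 24)      # all 24 bilingual children
t_trimmed = t_from_r(0.22, 22)   # without the two extreme children

significant_all = abs(t_all) > T_CRIT_DF22
significant_trimmed = abs(t_trimmed) > T_CRIT_DF20
```

The full-sample statistic (about 2.12) only just clears the critical value, while the trimmed one (about 1.01) falls well short, which is in line with the tentative wording of the conclusion.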

A final analysis compared data from the bilingual and the Catalan monolingual groups. Both groups were tested with the same materials and in both cases children had been exposed to Catalan since birth, although with a different total amount of exposure. An ANOVA with linguistic group (Catalan vs. bilingual) and condition (CP vs. MP) as factors showed a significant effect of condition (F(1, 46) = 6.38, p = 0.015), no significant group difference (F(1, 46) = 2.34, p = 0.133) and a nonsignificant interaction between these factors (F(1, 46) = 2.18, p = 0.146). Considering only the D-onset trials, the analogous ANOVA revealed a significant main effect of condition (F(1, 46) = 6.20, p = 0.016) and a significant interaction between condition and language group (F(1, 46) = 4.81, p = 0.033).

Taken together, the results indicate a statistical pattern intermediate between those for the monolingual Spanish and Catalan children. In the context of the published literature, the bilinguals’ failure to show an effect of mispronunciation for a contrastive pair of vowels in one of their family languages is unusual; to date, nearly every substitution of one sound in the language for another sound has resulted in decrements in the recognition of familiar words, whether consonants or vowels are tested (Bailey & Plunkett, 2002; Swingley & Aslin, 2000, 2002), and whether the substitutions are word-initial, word-medial, or word-final (Swingley, 2003; Swingley, in press). In addition, the bilinguals’ failure to be hindered in recognition is unlikely to be due to a pure inability to discriminate the sounds, given that 12-month-old bilinguals from the same population are fully capable of discriminating /ε/ and /e/ (Bosch & Sebastián-Gallés, 2003a). We consider alternate explanations in the remaining four experiments.

Experiments 3a and 3b: Common contrast

Because Experiments 1 and 2 tested only one vowel contrast, it is possible to interpret the findings as suggesting that Spanish learners and Catalan/Spanish bilinguals are, in some global sense, less alert to vowel substitutions than the monolingual Catalans are, although for different reasons. For Spanish participants, the vowels in Experiment 1 were not phonologically contrastive in their native language. Moreover, the small vowel inventory of Spanish (only five vowels) might lead to wider variance in vowels’ realization and this may have as a consequence a greater tolerance toward variability. For bilingual participants, it is possible that the need to establish vowel categories for two languages means that children’s establishment of the proper categories in both languages takes longer. These two factors, although different in nature, may lead to a similar result in a word recognition task, that is, inability to react to vowel mispronunciations in known words. To examine this possibility, Experiment 3 tested Spanish and Spanish/Catalan bilinguals for their sensitivity to a pair of vowel contrasts that are present in both Catalan and Spanish. If Spanish and bilingual children show effects of altered vowels here, it suggests that these children do not have a global lack of sensitivity to vowels, but a more specific tendency to ignore the Catalan /e/-/ε/ contrast.



Two groups of toddlers from Catalan-Spanish bilingual environments (Experiment 3a) and Spanish monolingual environments (Experiment 3b) participated. All were healthy, full-term babies without parental report of hearing problems. The Catalan-Spanish bilingual group (n = 24) was comparable to the previous samples in terms of age and vocabulary size. Children ranged in age from 18;12 to 23;20 (range from 542 to 710 days, mean 619, median 616). Twelve were girls. As in Experiments 1 and 2, the amount of Catalan and Spanish input to the bilingual children was carefully assessed by means of a questionnaire. Twenty-one children had a Catalan-speaking mother and three a Spanish-speaking mother. Percentages of exposure to both languages at home ranged from 51%–49% to a maximum of 21%–79%, the average Catalan exposure being 65%. The Spanish monolingual sample (n = 24) was also comparable to the previous ones. The children ranged in age from 18;09 to 23;23 (549 days to 713 days, mean 631, median 634). Eleven of the children were girls. The average Spanish exposure was 93%. Across Experiments 3a and 3b, seven additional toddlers were tested but not included in the sample due to fussiness (5), experimental error (1) or persistent ear infections (1).

Materials and procedures

Materials and experimental procedures were the same as in Experiments 1 and 2. The only difference was the vocalic contrast used to create the mispronunciations. The vowel change in the Catalan material went from /e/ or /ε/ to /i/ in the words for “fish” and “bee” ([pe∫] to [pi∫], [ə’βελə] to [ə’βiλə]), and from /ε/ or /e/ to /a/ in the words for “milk” and “cookie” ([λet] to [λat], [gə’lεtə] to [gə’latə]). In the Spanish version the same substitutions were applied ([peθ] to [piθ], [a’βexa] to [a’βixa], [let∫e] to [lat∫e] and [ga’λeta] to [ga’λata]). The mispronunciations did not form actual words of either language, though they were phonotactically legal. Bilingual babies were tested with the Catalan version and the monolingual Spanish group with the Spanish version.

Results and Discussion

Results from the bilingual group showed a fixation proportion to target pictures in the CP condition of 64.6% and in the MP condition of 51.8%. These proportions were above chance levels only in the CP condition (CP: t(23) = 4.82, p < .0001; MP: t(23) = .61, ns). Children fixated the target more upon hearing CP than MP realizations (t(23) = 3.75, p = .001). Eighteen children showed greater target fixation proportions in the correct pronunciation condition (see Figure 4). CP performance exceeded MP performance for three of the four test pairs (abella, galeta, and peix) and did not differ for the fourth (llet). Results considering only the trials on which children were fixating the distracter at target-word onset (D-onset trials) were similar, revealing an effect of mispronunciation (CP: 52.3%; MP: 40.4%; t(23)=2.29, p=0.03). There was not an analogous effect on T-onset trials (CP: 76.2%; MP: 71.0%, t<1).

Figure 4
Results of Experiments 3a and 3b: the effects of mispronouncing words by substituting vowels that are contrastive in both Catalan and Spanish, among bilinguals (left portion of graph) and Spanish monolinguals (right portion). Open circles represent each ...

The influence of age and vocabulary size on the MP effect was also analyzed. The magnitude of the MP effect did not correlate significantly with age (r = 0.212, ns). Productive vocabulary sizes ranged from 3 to 377 words (average 149 words, median 135; four infants were not included in this analysis because of lack of vocabulary data), and did not correlate with the MP effect (r = 0.243, ns). Finally, no correlation was expected between differences in the amount of exposure to each of the languages in the environment and the MP effect because the vowels involved in this case belong to both language phonologies, and this is what was found (r = 0.023, ns).

The Spanish monolinguals’ mean proportion of fixation to the target pictures was 66.8% in the CP condition and 58.9% in the MP condition. These proportions were both above chance (CP: t(23) = 6.10, p < 0.0001; MP: t(23) = 3.21, p < 0.005), demonstrating that children recognized the words in both conditions. Differences in fixation time to CP and MP words were significant (t(23) = 2.71, p < .015). Seventeen of 24 children showed greater fixation times for well-pronounced words (see Figure 4). CP performance exceeded MP performance for three of the four test pairs (abella, galeta, and leche) and did not differ for the fourth (pez). Once again, the effect of mispronunciation was driven by D-onset trials (CP: 55.8%; MP: 38.4%; t(23) = 4.41, p = 0.0002) and not T-onset trials (CP: 78.7%; MP: 79.3%; ns).

Correlations between the size of the MP effect (i.e., CP minus MP proportions) and the individual difference measures of age and vocabulary size did not reveal significant associations, though vocabulary size showed a marginal correlation among the 18 children for whom CDI data were available (age, r = 0.110, ns; CDI, r = 0.435, p = 0.071; CDI range 3 to 178 words; mean 178, median 168).

The bilinguals and the Spanish learners showed similar effects of mispronunciation, as shown by an ANOVA with language group (bilingual, Spanish) and condition (CP, MP) as factors (main effect of condition, F(1, 46) = 21.1, p < .0001; no effect of language group, F(1,46) = 1.9, p > 0.15; no interaction, F(1,46) = 1.2, p > .25). These results suggest that for the contrasts under study, which are common to Spanish and Catalan, both linguistic groups behaved similarly, being able to detect mispronunciations for familiar words (see Figure 4). The results rule out the hypotheses that Spanish learners do not attend to distinctions among any vowels, and the possibility that bilinguals do not either by virtue of being bilingual. To confirm these cross-experiment generalizations statistically, we compared Spanish children and bilingual children’s performance on the ε/e contrast (Experiments 1 and 2) and the other contrasts tested in Experiment 3. As expected, Spanish children revealed an interaction between contrast type (ε/e, others) and pronunciation condition (CP, MP) in an ANOVA (interaction F (1, 46) = 6.48, p = 0.014; no significant main effects). Bilingual children also revealed an interaction between contrast type and pronunciation condition (interaction F(1, 46) = 4.91, p = 0.03) and a main effect of condition carried by the Experiment 3 results (F(1, 46) = 9.76, p = 0.003).

The vowel substitutions tested in Experiment 3 were, in all likelihood, more acoustically distinct than the /e/ and /ε/ exchanges tested in the first two experiments; the /e/ and /ε/ are, according to spectral measurements, more similar than any other pairs of vowels in either language. In addition, the mispronunciations we implemented involved changes from non-peripheral vowels to peripheral vowels, which have been shown in some discrimination studies (Polka & Bohn, 2003) to be easier to detect than changes from peripheral to non-peripheral vowels (and, conceivably, easier to detect than exchanges of the non-peripheral vowels /e/ and /ε/). Thus, although we cannot infer that the Spanish and bilingual learners showed vowel sensitivity equivalent to the Catalan monolinguals, we may conclude that the Spanish and bilingual children showed a similar sensitivity to vowel mispronunciations relative to one another, when tested on vowel contrasts that are phonemic in both languages. Taken together, the first three experiments show that monolinguals and bilinguals find familiar words harder to recognize when a vowel in the words is replaced by another vowel – except in the case of exchanges of /e/ and /ε/, which are only reliably detected by Catalan monolinguals.

In the following experiment we explored the capacity to react to the critical vowel mispronunciation (/e/ or /ε/) in older bilingual children, because it is important to know when bilinguals show evidence of encoding this Catalan contrast in their lexicon.

Experiments 4a and 4b: Older Bilingual toddlers

The failure of the bilingual toddlers to react to exchanges of the /e/ and /ε/ vowels in word recognition might be explained in two different ways. One possibility is that differentiating these vowels in words requires some critical amount of Catalan exposure, which only the monolingual Catalans have attained. If this is true, then sensitivity to the difference between these vowels would be expected to correlate with age. This was not strongly supported in Experiment 2, but perhaps the age range tested there (18 to 26 months) was too restricted: if a child hears Catalan 65% of the time and therefore receives only 65% of the Catalan speech experience one would expect for a monolingual, then all else being equal (and assuming that infant-directed speech is constant over this age range), such a child would need to reach 27.7 months of age to have heard the quantity of Catalan speech typical of an 18-month-old Catalan monolingual.
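The age estimate in this paragraph follows from dividing the monolingual reference age by the child's share of Catalan input. A minimal sketch of that arithmetic, under the same simplifying assumption the text makes (a constant amount of speech input per month); the function name is hypothetical:

```python
def age_for_equivalent_exposure(monolingual_age_months, catalan_share):
    """Age at which a child hearing Catalan `catalan_share` of the time has
    accumulated as much Catalan input as a monolingual of the given age.
    Assumes total speech input per month is constant across children and ages."""
    if not 0 < catalan_share <= 1:
        raise ValueError("catalan_share must be in (0, 1]")
    return monolingual_age_months / catalan_share


# 65% Catalan exposure, compared with an 18-month-old Catalan monolingual
print(round(age_for_equivalent_exposure(18, 0.65), 1))  # 27.7
```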

A distinct possibility is that the bilinguals failed to differentiate the similar /e/ and /ε/ vowels not only because they lacked experience with Catalan, but because their experience with Spanish made the learning problem more difficult. Given that Spanish has a vowel that is similar to both of the relevant Catalan vowels, Spanish language experience could hinder the learning of the Catalan contrast in words (Bosch & Sebastián-Gallés, 2003a). If this is so, we might expect to find effects of the ratio of Catalan to Spanish exposure well beyond infancy.

In Experiments 4a and 4b, two groups of Spanish/Catalan bilingual children were tested, differing in whether their dominant (maternal) language was Spanish or Catalan.



Participants were 48 children from bilingual environments: half Catalan-dominant (Experiment 4a) and half Spanish-dominant (4b). All children were healthy, full-term and without hearing problems. The 24 Catalan-dominant children ranged in age from 31;16 to 51;08 (months; days) with a mean of 1327 days (approximately 3 years, 7 months; median 1350 days). Nine were girls. The Spanish-dominant group ranged in age from 35;15 to 55;06, with a mean of 1346 days (approximately 3 years, 8 months; median 1380 days). Nine were girls. In this group, 3 additional children were tested but not included, due to fussiness (n=2) or having been born premature (n=1). The amount of Catalan and Spanish exposure was assessed primarily by means of the same questionnaire used in previous experiments. Because older children are frequently exposed to language from a wider range of sources than younger children, we asked parents to provide details of children’s family environment, time in daycare, languages used in preschools, and so forth. From this information we estimated an average Catalan exposure of 67% (range from 54% to 79%; SD 8%) for the Catalan-dominant group and of 34% (range from 21% to 48%; SD 10%) for the Spanish-dominant group. The exposure proportions for the Catalan-dominant group were similar to those for the bilingual toddlers of Experiment 2 (mean Catalan exposure 65%, SD 13%). Language dominance coincided in almost every case with the language spoken at home by the mother or the main caretaker (for a small number of children, both parents used the same language at home, but the child’s primary daily caregiver was a speaker of the other language).

Materials and procedures

Materials and experimental procedures were exactly the same as in Experiment 1. Both groups were tested with the Catalan version involving the /e/ -/ε/ vowel contrast.

Results and Discussion

In the Catalan-dominant group, the mean proportion of target fixation in the correct pronunciation condition was 87.3%, and for the mispronunciation condition was 81.7% (see Figure 5). The mean surpassed 50% in both conditions (both t (23) > 15, both p < 0.00001). Children fixated the target more upon hearing correct pronunciations than mispronunciations (t (23) = 2.61, p < .02). Seventeen out of the 24 children showed greater target fixation proportions in the CP condition than in the MP condition. All four items contributed to the effect, though the condition difference for peix was negligible (1.3%). As in previous experiments, the mispronunciation effect came from the D-onset trials (CP: 84.0%; MP: 73.3%; t (23)=2.56; p<0.02) and not T-onset trials (CP: 91.1%; MP: 90.9%; ns).

Figure 5
Results of Experiments 4a and 4b: the effects of mispronouncing /e/ as /ε/ or vice versa among Catalan-dominant and Spanish-dominant bilinguals. Open circles represent each child’s difference score (correct pronunciation fixation minus ...

Correlational analyses revealed no significant associations between any measure of task performance and either age or proportion of Catalan exposure (all |r| < 0.25).

In contrast to the Catalan-dominant bilinguals, Spanish-dominant bilinguals did not react differently to CP and MP trials. Mean target fixation was 82.0% in the CP condition and 83.1% in the MP condition. Performance was well above chance in both conditions (both t (23) > 13, both p < 0.00001). The mispronunciation effect (CP minus MP) was not significant (t (23) < 0.5). Ten children showed better performance on CP than MP trials. No differences were found for D-onset trials (CP: 75.1%; MP: 76.1%; ns) or T-onset trials (CP: 87.2%, MP: 90.9%; ns).

Children’s age was not correlated with performance on CP or MP trials, nor was age correlated with the difference between CP and MP performance (all |r| < 0.33, ns). However, overall performance was better among children with a greater proportion of Catalan exposure: CP trials, r = 0.31, p = 0.06; MP trials, r = 0.69, p < .0002; mean of CP and MP performance, r = 0.58, p < .003. This superior performance with increasing proportion of Catalan exposure did not lead to greater ability to detect the MP substitutions, however; there was no correlation between Catalan exposure and the mispronunciation effect (i.e., CP-MP; r = −0.08, ns), and the 8 Spanish-dominant children with the highest proportion of Catalan exposure (mean 46%) showed no mispronunciation effect (mean −1.0%, ns).

An ANOVA with linguistic group (Catalan dominant and Spanish dominant) and condition (CP and MP) showed no main effect of condition (F (1, 46) = 2.30; p = 0.14), but a significant interaction of language group and condition (F (1,46) = 4.80, p = .03), reflecting the effect of mispronunciation on the Catalan-dominant but not Spanish-dominant bilinguals.

These results suggest that bilingual toddlers’ failure to differentiate the /e/ and /ε/ vowels in word recognition (Experiment 2) was not simply due to these children’s lacking sufficient exposure to the Catalan language. Suppose we estimate children’s lifetime Catalan exposure by multiplying their age by their estimated proportion of Catalan exposure. This yields a (rather rough) “months of Catalan exposure” value that allows comparison among participant populations. On this measure the twelve Spanish-dominant bilinguals (Experiment 4) with the greatest total Catalan exposure cover the range of Catalan exposure for the full set of monolingual Catalan toddlers tested in Experiment 1 (Spanish-dominant bilinguals, mean = 19;21 months of exposure, SD = 3;5; Catalan monolinguals, mean = 20;21, SD = 3;3). These twelve Spanish-dominant bilinguals, like the sample as a whole, showed no sign of a mispronunciation effect (mean CP-MP score, −4.6%, ns). Although these estimates of total exposure are imprecise, they accord with the fact that age was not a significant predictor of mispronunciation effects within any of the present experiments. The only age effect on mispronunciation sensitivity was that implicit in the comparison of (largely Catalan-dominant) toddlers in Experiment 2, who for the most part showed no such sensitivity, and the Catalan-dominant preschoolers of Experiment 4, who did.
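The rough "months of Catalan exposure" measure described above is age multiplied by the estimated proportion of Catalan input. The sketch below illustrates it for an invented child; the function names are hypothetical, and the 30-day month used to print values in months;days notation is an assumption made here for display, since the text does not state which convention was used.

```python
DAYS_PER_MONTH = 30  # assumption for display only; not stated in the text


def months_of_exposure(age_months, catalan_share):
    """Rough cumulative exposure: age multiplied by the proportion of Catalan input."""
    if not 0 <= catalan_share <= 1:
        raise ValueError("catalan_share must be in [0, 1]")
    return age_months * catalan_share


def as_months_days(months):
    """Format fractional months in months;days notation (30-day months assumed)."""
    total_days = round(months * DAYS_PER_MONTH)
    whole_months, days = divmod(total_days, DAYS_PER_MONTH)
    return f"{whole_months};{days}"


# Hypothetical child (not a participant): 44 months old, 34% Catalan exposure
exposure = months_of_exposure(44, 0.34)
print(as_months_days(exposure))
```

Because the measure is a simple product, an older Spanish-dominant child and a younger Catalan monolingual can receive similar values, which is what makes the cross-group comparison in this paragraph possible.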

Thus, although age is not irrelevant to bilingual children’s mispronunciation sensitivity (hence the change seen as Catalan-dominant bilingual toddlers mature into preschoolers), it is not central to this sensitivity in the way that would be expected if the raw amount of Catalan exposure were the driving force behind mispronunciation sensitivity. Rather, the pattern of data suggests that language exposure from a Spanish-speaking parent makes the /e/ - /ε/ contrast harder to learn. It is remarkable that even preschoolers ranging from 3 to over 4 1/2 years of age showed no sensitivity to substitutions of these vowels, if their dominant language was Spanish.

General Discussion

The main goal of the current research was to analyze the impact of simultaneous Catalan/Spanish bilingual exposure on the representation of a Catalan vowel contrast in bilinguals’ first words. A secondary goal was to supplement existing data on monolingual children’s sensitivity to vowel changes, focusing in particular on the /e/ and /ε/ sounds, which are very close in phonetic space, and which are phonologically contrastive in the language of one of our participant populations and not the other. We examined children’s recognition of words that were presented in their canonical form, or were mispronounced by substituting one vowel with a similar-sounding vowel. The results of this manipulation among monolingual children may be stated simply: children’s recognition of mispronounced words was impaired, relative to correctly pronounced words, only when the canonical and substituted vowels were contrastive in the language of the child. Thus, Catalan monolinguals found words with /e/ and /ε/ substituted harder to recognize, reflecting the phonological status of these vowels in Catalan; Spanish monolinguals did not detect this change, but were sensitive to changes that involved contrasting Spanish vowels. These results were as would be expected based on monolinguals’ vowel categorization performance earlier in infancy (Bosch & Sebastián-Gallés, 2003a). Bilingual children’s performance depended partly on their amount of exposure to each language (language dominance), and partly on their age. Bilingual toddlers as a whole did not react differently to canonical and deviant forms involving exchanges of /e/ and /ε/, though there was a trend toward such an effect in children with a greater proportion of Catalan exposure. Bilingual preschoolers (3- and 4-year-olds) behaved differently depending on which language was predominant at home. Catalan-dominant preschoolers recognized words with exchanged /e/ and /ε/ poorly, like the Catalan monolingual toddlers; Spanish-dominant preschoolers showed no sign of detecting the mispronounced vowels.

The two bilingual groups that apparently treated the Catalan /e/ and /ε/ vowels equivalently probably did not do so out of a general delay in vowel encoding, as suggested by the results of Experiment 3. How, then, can we best account for the bilinguals’ performance? As stated previously, a number of factors might contribute to the failure of bilingual toddlers and Spanish-dominant preschoolers to respond to these mispronunciations. First, it is possible that these children simply cannot reliably encode the difference in these sounds, much as adult Spanish monolinguals (and even Spanish-dominant bilinguals) respond to them identically in a range of tasks (e.g., Pallier, Colomé, & Sebastián-Gallés, 2001). This possibility is unlikely, however, given the ability of 12-month-old bilinguals to discriminate the sounds in a task involving memory of multiple speakers’ realizations of a word over a short delay (Bosch & Sebastián-Gallés, 2003a). If 12-month-olds can succeed in this challenging task, we doubt that this ability would decline significantly in the ensuing 9 months or so, and indeed other evidence described below indicates a continuing ability to discriminate.

If the bilingual children can hear the distinction, why do they not show an effect of mispronunciation in word recognition? A second possibility is that children’s experience in listening to Catalan speech from non-native (Spanish) speakers has taught children to ignore the distinction between /e/ and /ε/. Even when bilingual parents aim to hold to a “one-parent, one-language” policy at home, in general they do not rigidly switch to their own dominant language every time they take a conversational turn. Rather, conversations usually maintain a single language, to the slight disadvantage of the speaker using his or her non-dominant language. Thus, children likely hear many words spoken both with a native accent and a nonnative one. The bilingual toddlers in our experiments may well have heard our test words being spoken both with the proper Catalan vowel, and with some variation in this vowel, perhaps even including a prototypical Spanish vowel as used in our studies. It is plausible that bilingual children do command the /e/ - /ε/ difference well enough to (for example) learn two Catalan words that vary only in those sounds, but failed to show looking-time effects of mispronunciation, simply because they have learned to overcome variation in these sounds. Our toddlers’ results show a striking parallelism with adult data (Sebastián-Gallés et al., 2005). In that study, bilingual adults who had begun learning Spanish and Catalan from birth (“simultaneous bilinguals”) were less able to detect exchanges of /e/ and /ε/ in Catalan words than were bilingual adults who in their first few years of life learned only Catalan, and then learned Spanish while continuing with Catalan (Catalan-dominant sequential bilinguals). This was true even among simultaneous bilinguals whose mother spoke Catalan natively. Early exposure to Spanish appears to permanently degrade sensitivity to the Catalan /e/ - /ε/ contrast in words. Sebastián-Gallés et al. suggested that children’s early exposure to Spanish (mis)pronunciations of these vowels contributes to inferior learning of them in the bilinguals.

An additional factor that should be considered is our use of cognates as test items. The Spanish and Catalan labels for our test pictures were phonologically similar (e.g., for bee, [a’βexa] and [ə’βελə]). This situation is not exceptional; indeed, around 60% of the words in the Catalan and Spanish vocabulary checklists are cognates (based on examination of the Spanish and Catalan versions of the MacArthur CDI; Águila, Ramon-Casas & Bosch, 2007). Bilingual children familiar with both the Spanish and Catalan forms of these words might conflate them in memory to some degree, increasing their uncertainty about the phonological details that differentiate the words. Of course, children are not totally at sea, as shown by the toddlers’ detection of mispronunciations other than /e/ - /ε/ exchanges. But the fact that this vowel pair is contrastive only in Catalan, and realized as a single intermediate vowel in Spanish, probably makes children especially vulnerable to interference. The Catalan-dominant preschoolers’ effects of /e/ - /ε/ mispronunciations become all the more striking on this view. This interference may have two components: some degradation in the representations of the words in children’s mental lexicons, and also uncertainty during the recognition process. Studies of adult bilinguals indicate retrieval of words from language A while hearing sentences of language B, provided that the realizations of the words are sufficiently close to the canonical forms in both lexicons (Colomé, 2001; Ju & Luce, 2004; Spivey & Marian, 1999; Weber & Cutler, 2004). Thus, if bilingual children hearing (e.g.) [ə’βελə] also activated [a’βexa], which means the same thing in Spanish, it is possible that this “double” lexical match would enhance target looking enough to overcome effects of sensitivity to the phonological difference between /e/ and /ε/.

It is significant to note that follow-up work testing mispronunciations of non-cognate words has begun to suggest effects of /e/ - /ε/ mispronunciations among bilingual toddlers (Ramon-Casas & Bosch, 2007). For example, 24-month-old Catalan-Spanish bilinguals tested on [əs’pεlmə] (English: candle; Spanish: [bela]) recognize the CP [əs’pεlmə] better than the MP [əs’pelmə] in a procedure otherwise identical to that used in the present experiments. Should continuing research uphold this pattern, it will show that bilingual toddlers can encode the Catalan vowels in at least some Catalan words. In this case we would conclude that the difference between bilinguals and the Catalan monolinguals is not primarily in their system of Catalan phonological categories per se, but is specific to the representation or interpretation of certain words, namely the Catalan-Spanish cognates. For the moment it is not known whether similar differences characterize bilingual children’s representation of other sounds in cognate words, such as the initial vowels of pilota (Catalan, ball) and pelota (Spanish, ball), where the two vowels are each contrastive in both languages. Our intuition is that the blurring of cognates’ phonological representations would be most severe in particularly difficult cases like the one studied here, where two vowels in one language are conflated in the other.

One unexpected finding in the present studies was that when mispronunciation effects were found, they were concentrated in the set of trials on which children happened to be fixating the distracter object when the spoken target word began (the “D-onset trials”). This pattern has not been found in our previous studies using single-sound mispronunciations, although it may have a precedent in one set of experiments (Fernald, Swingley & Pinto, 2001). In that study, 18- and 21-month-olds viewed pairs of pictures and heard sentences naming one of the pictures. On some trials, children heard a normal pronunciation of the target word; on other trials, children heard only the first portion of the target (e.g., [beI] from baby). Eye movement analyses showed that children initially fixating the distracter began shifting their gaze away from the distracter equivalently in both conditions, at first. Shortly after the interruption on truncated-word trials, though, children became much less likely to leave the distracter object, and more likely to shift back to it from the target, than on whole-word trials. Thus, children showed large effects of word truncation, once they heard it, if they had started out on the distracter. But among children initially fixating the target picture, word truncation had no effect. Children already looking at a picture of a baby persisted in fixating it whether they heard “baby” or just “bay—“.

One way to account for larger effects of lexical distortions on distracter-onset than on target-onset trials is to assume that children need more evidence to shift away from the target than the distracter. As discussed in Swingley (in press), children in the language-guided looking task behave as though they expect the sentences they hear to be consistent with one of the available options. When viewing the distracter, it is quickly clear to the child that this expectation of consistency is not being met (the notions evoked by the word do not match the category exemplified by the picture), and the child shifts away. The response is slower when the word is hard to recognize (e.g. because it has been mispronounced). When viewing the target, the expectation of consistency is met, and the child maintains his or her gaze. Mispronounced words on target-onset trials are still partially consistent with the child’s focus of attention—more consistent than the child might expect by chance—and so children might be less inclined to reject the target. Still, as noted previously, our prior studies have found effects of mispronunciation for both D-onset and T-onset trials. We can only speculate about the reason for the difference between the current work and the prior studies. It is possible that the vowel distinctions tested here were more subtle than the consonant and vowel substitutions examined in prior research; or that testing with picture pairs whose names overlapped at onset (e.g., abeja and avión) played a role.

Our results support existing evidence indicating that even simultaneous childhood bilingualism does not necessarily result in children who treat each language as monolingual speakers of that language do. The duration of a learner’s exposure to a language is one significant factor in determining his or her eventual performance, but it is not the only factor; the nature of the learners’ childhood exposure is of special relevance. This was seen in the effects of language dominance on the looking patterns of bilingual preschoolers, and is consistent with studies of adults. Adult Catalan- and Spanish-dominant bilinguals who are functionally completely fluent in both languages nevertheless vary in how consistently and quickly they categorize the Catalan /e/ and /ε/ sounds (Sebastián-Gallés, 2005; see also Bosch, Costa, & Sebastián-Gallés, 1994; Bosch, Costa, & Sebastián-Gallés, 1997; Bosch et al., 2000; Pallier et al., 1997; Pallier et al., 2001; Sebastián-Gallés & Soto-Faraco, 1999). In this work, dominance effects are predicted by maternal language (the language heard most often in early childhood), not adult usage patterns. Differences between simultaneous bilinguals and early bilinguals (participants from Spanish monolingual families, but exposed very early to Catalan in day-care or pre-school centers) are also observed, suggesting that the amount of exposure during the first months of life has permanent effects that can be measured in adulthood.

Finally, the present experiments illustrate the relationship between perceptual categorization of speech as revealed in experiments on infants, and perceptual interpretation of speech sounds in word recognition. Infants growing up in Spanish monolingual households do not discriminate the Catalan /e/ and /ε/ vowels after about 8 months. Just as one would expect on the assumption that discrimination in late infancy is criterial for inclusion in early lexical representations, Spanish monolingual toddlers do not detect exchanges in these vowels during word recognition, though Catalan monolinguals do. Our view holds that these facts are related, and that phonetic categorization in infancy is a primary determinant of phonological categorization in toddlerhood. Interpretation of phonetic variation does not occur in the same way in children learning different languages; identical variations in particular regions of vowel space are more important to children who, as infants, began learning a language whose phonology partitions that region into separate categories, than to children whose language does not.

We note, though, that the monolinguals’ results may be consistent with another account, in which children’s experience with individual words is what drives the word recognition process. Spanish monolinguals may have heard words like leche pronounced with a wider range of realizations of the vowels than were present in the Catalan child’s experience with the cognate llet. If variation among heard instances of words defines lexical representations, might we not simply do away with the assumption that phonological categories are relevant in word recognition? Indeed, we ourselves appeal to variation in the realizations of individual words in our explanation of the bilingual children’s apparent insensitivity to /e/-/ε/ changes for cognates but not non-cognates.

Because we have not measured the actual realizations of the words in our child participants’ experience of language, we cannot rule out the latter possibility on the basis of our findings here. One way to handle this problem empirically is to teach children new words, thereby controlling exposure, and evaluate responses to deviant pronunciations of various sorts; such a study has yet to be completed in the present population of children. Other evidence, however, implicates native phonology in interpretation even when children of different language backgrounds have received exactly the same exposure to novel words (Dietrich, Swingley, & Werker, 2007). Though considerable work remains to be done to evaluate the generality of the present results, we suspect that in most cases, even among bilinguals, infants’ tendencies in category discrimination (as assessed from about 10 months onward) will be good predictors of toddlers’ sensitivity to mispronunciations in words. The case of the Catalan /e/ and /ε/ presents what may prove to be an unusual alignment of factors that discourage ready interpretation by children: the presence of a single Spanish category in between the prototypical Catalan ones within a small region of phonetic space; the low frequency (rarity) of the Catalan vowels relative to their Spanish counterpart; and the very large number of cognate words in bilingual children’s vocabularies. As we have shown, under these circumstances bilingual toddlers and even Spanish-dominant bilingual preschoolers readily accept nonstandard pronunciations involving exchanges of these sounds.


This research was supported by the Spanish Ministerio de Ciencia y Tecnología (Grant Contracts with EC Fondos FEDER SEJ2004-06429/Psic to L.B. and SEJ2007-60751 to N.S.), the CONSOLIDER 2010 Program (CSD2007-012) and the Catalan DURSI (SGR2005-01026); in addition, the work was supported by US NIH grant R01-049681 to D.S. and US NSF grant HSD-0433567 to Delphine Dahan and D.S. The authors want to thank Xavier Mayoral, Eva Águila, and Ferran Pons, and the staff of the following educational institutions where the participants were recruited: Espais Familiars Bon Pastor, Erasme Janer, Casa dels Colors, El Petit Drac; E.B. La Quitxalla, CEIP Mare Nostrum and Escola Avenç. Special thanks are due to all the infants and the families who so generously participated in this research.


1 These counts were made by listing the word tokens in the Anais corpus of Demuth and Tremblay (2007) for French, and 14 mothers in the English corpus of Brent and Siskind (2001). The labial stops are also frequent in counts of word types. Similar results are obtained in counts over all word positions, though for noninitial position there is greater uncertainty about the actual realization of the sounds and about whether initial and final sounds are considered the same by young children.

2 Vihman, Thierry, Lum, Keren-Portnoy, and Martin (2007) found that bilingual Welsh-English children and monolingual English children displayed familiarity for English words (and Welsh words too, among the bilinguals) at the same age, 11 months, in tasks requiring sensitivity to auditory forms alone. Welsh monolinguals, however, showed no word familiarity effects (for Welsh) even as old as 12 months. These results can be interpreted as a bilingual advantage (for Welsh) or at least no disadvantage (for English). Similarly, Sebastián-Gallés & Bosch (2002) found that 10-month-old Catalan-Spanish bilinguals were able to develop specific phonotactic knowledge at the same age as infants from Catalan monolingual environments, although this was not observed in the group of Spanish-dominant bilinguals.

3 In the present experiments, we found that response latency data were too sparse to allow for comparisons of correct-pronunciation and mispronunciation trials. Because a trial only yields an RT if the child is initially fixating the distracter and then shifts away, many trials were unavailable for analysis, and individual children frequently provided response latencies only for different words in each condition. Furthermore, given that children respond to speech as it unfolds (Swingley, Pinto, & Fernald, 1999), response latencies typically reflect children’s interpretation of the first sounds in a word, and therefore would not be expected to be informative about mispronunciations occurring in the second syllable of a word, as was the case for two of our four items.

4 Note that 50% is not “chance” on D-onset trials, because the analysis window of 360–2000 ms includes the time when children are still shifting their gaze away from the distracter. If, for example, a later window of 1000–2000 ms is examined, children’s target fixation proportions are significantly above 50% in every experiment. The subject means for this window, for each experiment, are as follows: Experiment 1a, 75.6%; 1b, 62.2%; 2, 63.4%; 3a, 68.0%; 3b, 65.7%; 4a, 93.9%; 4b, 85.9% (all t > 2, all p < .05). The effects of mispronunciation that are statistically significant using the traditional window are also significant using (e.g.) the later window of 1000–2000 ms.

5 The tool used to collect vocabulary information was not specifically adapted for bilingual populations. Parents were given a questionnaire in the dominant language at home and they signaled the concepts labeled by the child. Sometimes the existence of two labels for the same concept was indicated, but this information was not systematically obtained, so the total number of words produced is likely to be underestimated.



  • Águila E, Ramon-Casas M, Bosch L. Patterns of lexical acquisition in bilingual toddlers: does amount of exposure make a difference? Paper presented at the 6th International Symposium on Bilingualism (ISB6); Hamburg, Germany. 2007.
  • Alcina J, Blecua JM. Gramática española. Barcelona: Editorial Ariel; 1975.
  • Ashby FG, Queller S, Berretty PM. On the dominance of unidimensional rules in unsupervised categorization. Perception & Psychophysics. 1999;61:1178–1199. [PubMed]
  • Bailey TM, Plunkett K. Phonological specificity in early words. Cognitive Development. 2002;17:1265–1282.
  • Ballem K, Plunkett K. Phonological specificity in 1;2 year olds. Journal of Child Language. 2005;32:159–173. [PubMed]
  • Beckman ME, Edwards J. The ontogeny of phonological categories and the primacy of lexical learning in linguistic development. Child Development. 2000;71:240–249. [PubMed]
  • Best CT, McRoberts GW. Infant perception of non-native consonant contrasts that adults assimilate in different ways. Language & Speech. 2003;46:183–216. [PMC free article] [PubMed]
  • Bortfeld H, Morgan J, Golinkoff R, Rathbun K. Mommy and me: Familiar names help launch babies into speech stream segmentation. Psychological Science. 2005;16:298–304. [PMC free article] [PubMed]
  • Bosch L, Costa A, Sebastián-Gallés N. La estructura interna de las categorías fonéticas: Percepción de vocales e identificación de prototipos en catalán y español. Paper presented at the XII National Congress AESLA: New horizons of linguistics; Barcelona, Spain. 1994.
  • Bosch L, Costa A, Sebastián-Gallés N. Vowel discrimination in early bilinguals and the perceptual magnet effect. Paper presented at the 5th International Congress of the ISAPL; Porto, Portugal. 1997.
  • Bosch L, Costa A, Sebastián-Gallés N. First and second language vowel perception in early bilinguals. European Journal of Cognitive Psychology. 2000;12(2):189–222.
  • Bosch L, Sebastián-Gallés N. Simultaneous bilingualism and the perception of a language-specific vowel contrast in the first year of life. Language and Speech. 2003a;46(2–3):217–243. [PubMed]
  • Bosch L, Sebastián-Gallés N. Language experience and the perception of a voicing contrast in fricatives: Infant and adult data. In: Recasens D, Solé MJ, Romero J, editors. Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona: UAB/Casual Prods; 2003b. pp. 1987–1990.
  • Brent MR, Siskind JM. The role of exposure to isolated words in early vocabulary development. Cognition. 2001;81:B33–B44. [PubMed]
  • Burns TC, Yoshida K, Hill K, Werker J. Bilingual and monolingual infant phonetic development. Applied Psycholinguistics. 2007;28(3):455–474.
  • Carrera-Sabaté J, Fernández-Planas AM. Vocals mitjanes tòniques del català. Estudi contrastiu interdialectal. Barcelona: Editorial Horsori; 2005.
  • Charles-Luce J, Luce PA. Similarity neighbourhoods of words in young children’s receptive vocabularies. Journal of Child Language. 1990;17:205–215. [PubMed]
  • Coady JA, Aslin RN. Phonological neighbourhoods in the developing lexicon. Journal of Child Language. 2003;30:441–469. [PubMed]
  • Colomé A. Lexical activation in bilinguals’ speech production: language-specific or language-independent? Journal of Memory and Language. 2001;45:721–736.
  • Demuth K, Tremblay A. Prosodically-conditioned variability in children’s production of French determiners. Journal of Child Language. 2007;34:1–29. [PubMed]
  • Dietrich C, Swingley D, Werker JF. Native language governs interpretation of salient speech sound differences at 18 months. PNAS. 2007;104(41):16027–16031. [PubMed]
  • Eilers RE, Oller MK. The role of speech discrimination in developmental sound substitutions. Journal of Child Language. 1976;3:319–329.
  • Edwards ML. Perception and production in child phonology: The testing of four hypotheses. Journal of Child Language. 1974;1:205–219.
  • Esquerra I, Febrer A, Nadeu C. Frequency analysis of phonetic units for concatenative synthesis in Catalan. Proceedings of the 5th International Conference of Spoken Language Processing; Sydney, Australia. 1998. p. 0817.
  • van der Feest S, Fikkert P. Segmental detail in children’s early lexical representations. Proceedings of the ISCA Workshop on Plasticity in Speech Perception (PSP) on CD; London, UK. 2005.
  • Fennell CT. Infant attention to phonetic detail in word forms: Knowledge and familiarity effects. Unpublished doctoral dissertation. Vancouver: University of British Columbia; 2004.
  • Fennell CT. Infants of 14 months use phonetic detail in novel words embedded in naming phrases. Proceedings of the 30th annual Boston University Conference on Language Development; Somerville, MA: Cascadilla Press; 2006. pp. 178–189.
  • Fennell CT, Werker JF. Early word learners’ ability to access phonetic detail in well-known words. Language and Speech. 2003;46:245–264. [PubMed]
  • Fennell CT, Werker JF. Infant attention to phonetic detail: Knowledge and familiarity effects. Proceedings of the 28th Annual Boston University Conference on Language Development; Somerville, MA: Cascadilla Press; 2004. pp. 165–176.
  • Fennell C, Curtin S, Werker J. Infants’ ability to distinguish vowel contrasts in a word learning task. Poster presented at the 2004 International Conference on Infant Studies; Chicago, USA. 2004.
  • Fennell CT, Waxman SR, Weisleder A. With referential cues, infants successfully use phonetic detail in word learning. Proceedings of the 31st Annual Boston University Conference on Language Development; Somerville, MA: Cascadilla Press; 2007. pp. 206–217.
  • Fennell CT, Byers-Heinlein K, Werker JF. Using speech sounds to guide word learning: the case of bilingual infants. Child Development. 2007;78:1510–1525. [PubMed]
  • Fenson L, Dale PS, Reznick JS, Bates E, Thal DJ, Pethick SJ. Variability in early communicative development. Monographs of the Society for Research in Child Development. 1994;59(5) Serial Number 242. [PubMed]
  • Fernald A, Pinto J, Swingley D, Weinberg A, McRoberts G. Rapid Gains in Speed of Verbal Processing by Infants in the 2nd Year. Psychological Science. 1998;9(3):228–231.
  • Fernald A, Swingley D, Pinto JP. When half a word is enough: infants can recognize spoken words using partial acoustic-phonetic information. Child Development. 2001;72:1003–1015. [PubMed]
  • Golinkoff RM, Hirsh-Pasek K, Cauley KM, Gordon L. The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language. 1987;14:23–45. [PubMed]
  • Goudbeek M, Swingley D, Smits R. Supervised and unsupervised learning of multidimensional acoustic categories. Journal of Experimental Psychology: Human Perception and Performance in press. [PubMed]
  • Hallé PA, de Boysson-Bardies B. The format of representation of recognized words in infants’ early receptive lexicon. Infant Behavior and Development. 1996;19:463–481.
  • Johnson EK. English-learning infants’ representations of word-forms with iambic stress. Infancy. 2005;7:95–105.
  • Ju M, Luce PA. Falling on sensitive ears: constraints on bilingual lexical activation. Psychological Science. 2004;15:314–318. [PubMed]
  • Jusczyk PW. Towards a model for the development of speech perception. In: Perkell J, Klatt DH, editors. Invariance and variability in speech processes. Hillsdale, NJ: Erlbaum; 1986. pp. 1–19.
  • Jusczyk PW, Aslin RN. Infants’ detection of sound patterns of words in fluent speech. Cognitive Psychology. 1995;29:1–23. [PubMed]
  • Kuhl PK, Williams KA, Lacerda F, Stevens KN, Lindblom B. Linguistic experience alters phonetic perception in infants by 6 months of age. Science. 1992;255:606–608. [PubMed]
  • Kuhl PK, Stevens E, Hayashi A, Deguchi SK, Kiritani S, Iverson P. Infants show a facilitation effect for native language phonetic perception between 6 and 12 months. Developmental Science. 2006;9(2):F13–F21. [PubMed]
  • Lew-Williams C, Fernald A. Young children learning Spanish make rapid use of grammatical gender in spoken word recognition. Psychological Science. 2007;18:193–198. [PMC free article] [PubMed]
  • Liljencrants J, Lindblom B. Numerical simulation of vowel quality systems: the role of perceptual contrast. Language. 1972;48:839–862.
  • Mani N, Plunkett K. Phonological specificity of vowels and consonants in early lexical representations. Journal of Memory and Language. 2007;57:252–272.
  • Mani N, Plunkett K. Fourteen-month-olds pay attention to vowels in novel words. Developmental Science. 2008;11(1):53–59. [PubMed]
  • Mani N, Coleman J, Plunkett K. Phonological specificity of vowel contrasts at 18 months. Language and Speech. 2008;51:3–21. [PubMed]
  • Martínez Celdrán E, Fernández Planas AM. Manual de fonética española. Articulaciones y sonidos del español. Barcelona: Ariel; 2007.
  • Maye J, Werker J, Gerken L. Infant sensitivity to distributional information can affect phonetic discrimination. Cognition. 2002;82:B101–B111. [PubMed]
  • Metsala JL, Walley AC. Spoken vocabulary growth and the segmental restructuring of lexical representations: Precursors to phonemic awareness and early reading ability. In: Metsala JL, Ehri LC, editors. Word recognition in beginning literacy. Hillsdale, NJ: Erlbaum; 1998. pp. 89–120.
  • Narayan C, Werker JF, Beddor P. Acoustic salience affects speech perception in infancy: evidence from nasal place discrimination under review. [PubMed]
  • Nazzi T. Use of phonetic specificity during the acquisition of new words: differences between consonants and vowels. Cognition. 2005;98:13–30. [PubMed]
  • Nespor M, Mehler J, Peña M. On the different roles of vowels and consonants in speech processing and language acquisition. Lingue e Linguaggio. 2003;2:221–247.
  • Pallier C, Bosch L, Sebastián-Gallés N. A limit on behavioral plasticity in vowel acquisition. Cognition. 1997;64:B9–B17. [PubMed]
  • Pallier C, Colomé A, Sebastián-Gallés N. The influence of native-language phonology on lexical access: Exemplar-based versus abstract lexical entries. Psychological Science. 2001;12:445–449. [PubMed]
  • Pater J, Stager C, Werker JF. The perceptual acquisition of phonological contrasts. Language. 2004;80:361–379.
  • Polka L, Bohn OS. Asymmetries in vowel perception. Speech Communication. 2003;41:221–231.
  • Polka L, Werker JF. Developmental changes in perception of nonnative vowel contrasts. Journal of Experimental Psychology: Human Perception and Performance. 1994;20:421–435. [PubMed]
  • Ramon-Casas M, Bosch L. Desarrollo léxico y sensibilidad fonológica a un contraste vocálico en el bilingüe: palabras cognadas y no cognadas. Proceedings of the V International Congress of Language Acquisition; Oviedo, Spain. 2007.
  • Recasens D, Espinosa A. Dispersion and variability of Catalan vowels. Speech Communication. 2006;48:645–666.
  • Salverda AP, Dahan D, McQueen JM. The role of prosodic boundaries in the resolution of lexical embedding in speech comprehension. Cognition. 2003;90:51–89. [PubMed]
  • Sebastián-Gallés N. Native-language sensitivities: evolution in the first year of life. Trends in Cognitive Sciences. 2005;10(6):239–241. [PubMed]
  • Sebastián-Gallés N, Bosch L. Building phonotactic knowledge in bilinguals: Role of early exposure. Journal of Experimental Psychology: Human Perception and Performance. 2002;28:974–989. [PubMed]
  • Sebastián-Gallés N, Bosch L. Phonology and bilingualism. In: Kroll JF, de Groot AMB, editors. Handbook of Bilingualism: Psycholinguistic Approaches. New York, NY: Oxford University Press; 2005. pp. 68–87.
  • Sebastián-Gallés N, Bosch L. Developmental shift in the discrimination of vowel contrasts in bilingual infants: Is the distributional account all there is to it? Developmental Science in press. [PubMed]
  • Sebastián-Gallés N, Echeverria S, Bosch L. The influence of initial exposure on lexical representation: Comparing early and simultaneous bilinguals. Journal of Memory and Language. 2005;52:240–255.
  • Sebastián-Gallés N, Soto-Faraco S. On-line processing of native and non-native phonemic contrasts in early bilinguals. Cognition. 1999;72:112–123. [PubMed]
  • Shvachkin NK. The development of phonemic speech perception in early childhood. In: Ferguson CA, Slobin DI, editors. Studies of child language development. New York: Holt, Rinehart and Winston; 1973. pp. 91–127. Original work published 1948.
  • Spivey M, Marian V. Cross talk between native and second languages: partial activation of an irrelevant lexicon. Psychological Science. 1999;10:281–284.
  • Stager CL, Werker JF. Infants listen for more phonetic detail in speech perception than in word-learning tasks. Nature. 1997;388(6640):381–382. [PubMed]
  • Storkel HL. Restructuring of similarity neighbourhoods in the developing mental lexicon. Journal of Child Language. 2002;29(2):251–274. [PubMed]
  • Sundara M, Polka L, Molnar M. Development of coronal stop perception: Bilingual infants keep pace with their monolingual peers in press. [PubMed]
  • Swingley D. Phonetic detail in the developing lexicon. Language and Speech. 2003;46:265–294. [PubMed]
  • Swingley D. 11-month-olds’ knowledge of how familiar words sound. Developmental Science. 2005;8:432–443. [PubMed]
  • Swingley D. Lexical exposure and word-form encoding in 1.5-year-olds. Developmental Psychology. 2007;43:454–464. [PubMed]
  • Swingley D. Onsets and codas in 1.5-year-olds’ word recognition. Journal of Memory and Language. doi: 10.1016/j.jml.2008.11.003. in press. [PMC free article] [PubMed] [Cross Ref]
  • Swingley D, Pinto JP, Fernald A. Continuous processing in word recognition at 24 months. Cognition. 1999;71:73–108. [PubMed]
  • Swingley D, Aslin RN. Spoken word recognition and lexical representation in very young children. Cognition. 2000;76:147–166. [PubMed]
  • Swingley D, Aslin RN. Lexical neighborhoods and the word-form representations of 14-month-olds. Psychological Science. 2002;13:480–484. [PubMed]
  • Swingley D, Aslin RN. Lexical competition in young children’s word learning. Cognitive Psychology. 2007;54:99–132. [PMC free article] [PubMed]
  • Swingley D, Fernald A. Recognition of words referring to present and absent objects by 24-month-olds. Journal of Memory and Language. 2002;46:39–56.
  • Thiessen ED. The effect of distributional information on children’s use of phonemic contrasts. Journal of Memory and Language. 2007;56:16–34.
  • Tsao FM, Liu HM, Kuhl PK. Perception of native and nonnative affricate-fricative contrasts: cross-language tests on adults and infants. JASA. 2006;120:2285–2294. [PubMed]
  • Vallabha GK, McClelland JL, Pons F, Werker JF, Amano S. Unsupervised learning of vowel categories from infant-directed speech. PNAS. 2007;104:13273–13278. [PubMed]
  • Vihman MM, Nakai S, DePaolis RA, Hallé P. The role of accentual pattern in early lexical representation. Journal of Memory and Language. 2004;50:336–353.
  • Vihman MM, Thierry G, Lum J, Keren-Portnoy T, Martin P. Onset of word form recognition in English, Welsh and English-Welsh bilingual infants. Applied Psycholinguistics. 2007;28:475–493.
  • Walley AC. The role of vocabulary development in children’s spoken word recognition and segmentation ability. Developmental Review. 1993;13:286–350.
  • Weber A, Cutler A. Lexical competition in non-native spoken-word recognition. Journal of Memory and Language. 2004;50:1–25.
  • Werker JF, Tees RC. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development. 1984;7:49–63.
  • Werker JF, Fennell CT, Corcoran KM, Stager CL. Infants’ ability to learn phonetically similar words: Effects of age and vocabulary size. Infancy. 2002;3:1–30.
  • Werker J, Fennell CT. From listening to sounds to listening to words: Early steps in word learning. In: Hall G, Waxman S, editors. Weaving a Lexicon. Cambridge: MIT Press; 2004. pp. 79–109.
  • Werker JF, Curtin S. PRIMIR: A developmental model of speech processing. Language Learning and Development. 2005;1:197–234.
  • White KS, Morgan JL. Sub-segmental phonology in infants’ early lexical representations. Journal of Memory and Language. 2008;59:114–132.
  • Yoshida KA, Fennell CT, Swingley D, Werker JF. Fourteen-month-old infants learn similar-sounding words. Developmental Science in press. [PMC free article] [PubMed]
  • Younger BA. The segregation of items into categories by 10-month-old infants. Child Development. 1985;56:1574–1583. [PubMed]