J Child Lang. Author manuscript; available in PMC 2010 July 7.
Published in final edited form as:
J Child Lang. 2007 May; 34(2): 227–249.
PMCID: PMC2898269

Spoken word recognition by Latino children learning Spanish as their first language*


Research on the development of efficiency in spoken language understanding has focused largely on middle-class children learning English. Here we extend this research to Spanish-learning children (n=49; M=2;0; range=1;3–3;1) living in the USA in Latino families from primarily low socioeconomic backgrounds. Children looked at pictures of familiar objects while listening to speech naming one of the objects. Analyses of eye movements revealed developmental increases in the efficiency of speech processing. Older children and children with larger vocabularies were more efficient at processing spoken language as it unfolds in real time, as previously documented with English learners. Children whose mothers had less education tended to be slower and less accurate than children of comparable age and vocabulary size whose mothers had more schooling, consistent with previous findings of slower rates of language learning in children from disadvantaged backgrounds. These results add to the cross-linguistic literature on the development of spoken word recognition and to the study of the impact of socioeconomic status (SES) factors on early language development.

Determining what young language learners understand in the speech they hear can be challenging, because the processes involved in comprehension are only partially and inconsistently revealed in children's behavior in everyday situations. Until recently, studies of early language understanding have had to rely on measures such as the child's ability to pick out a named object or perform a requested action, or a parent's report of words assumed to be understood by the child. These are referred to as offline measures because they are based on children's responses to a spoken word or sentence after it is complete, rather than as it is heard and processed. While such offline procedures enable researchers to assess whether or not a child responds systematically in a way that indicates understanding, they reveal less about the child's developing skill in identifying and interpreting familiar words in continuous speech. Here we use real-time or online measures to investigate the early development of speech processing efficiency by children learning Spanish as their first language.

Questions about the time course of spoken language processing are central to psycholinguistic studies with adults, which rely on online measures to capture listeners' responses to the speech signal as it unfolds. For example, Tanenhaus and colleagues have pioneered the use of eye-tracking methods to study sentence interpretation, monitoring adults' gaze patterns as they survey a scene while listening to speech that is relevant to the visual stimuli (e.g. Dahan, Swingley, Tanenhaus & Magnuson, 2000). For many years, developmental researchers have also used looking behavior as a response measure in studies of infants' visual (e.g. Baillargeon, 1994) as well as auditory preferences (e.g. Fernald, 1985; Jusczyk, 1997). ‘Preferential looking’ techniques that incorporate both visual and auditory stimuli have been modified to investigate spoken word recognition and language comprehension by young children (e.g. Thomas, Campos, Shucard, Ramsay & Shucard, 1981; Golinkoff, Hirsh-Pasek, Cauley & Gordon, 1987), although the summary measures of total looking time used in such looking-preference procedures are not designed to capture the real-time dynamics of sentence interpretation. However, more recent research with infants and young children has incorporated the same high-resolution measures used in eye-tracking studies with adults (Fernald, Pinto, Swingley, Weinberg & McRoberts, 1998; Swingley & Aslin, 2000; Snedeker & Trueswell, 2004). Thus it is now possible to obtain continuous measures of speed and accuracy that enable sensitive assessment of efficiency in spoken language processing even by very young children.

Using this looking-while-listening procedure, Fernald et al. (1998) tracked infants' eye movements as they looked at pictures of familiar objects while listening to speech naming one of the objects. This cross-sectional study of the development of processing efficiency by English-learning children at 1;3, 1;6 and 2;0 revealed age-related changes in the speed and accuracy of responses to familiar words. These findings were replicated in longitudinal research showing similar growth in processing speed and reliability of word recognition across the second year (Fernald, Perfors & Marchman, 2006). Studies using online processing measures have also found that efficiency in word recognition was correlated with individual differences in vocabulary knowledge, as indexed by parental report. Children who oriented more quickly and accurately to the target picture in response to the spoken word tended to have larger productive vocabularies (Fernald, Swingley & Pinto, 2001; Zangl, Klarman, Thal, Fernald & Bates, 2005) as well as faster rates of vocabulary growth across the second year (Fernald et al., 2006). Research on how children process spoken language from moment to moment has begun to yield valuable insights into the early emergence of receptive language competence, and the relation of speech processing skills to lexical and grammatical development.

The purpose of this study is to broaden the existing literature on the time course of spoken word recognition in young language learners in two directions. First, we extend this research to children learning Spanish as a first language. The substantial literature on phonological processing by preverbal infants in the first year includes numerous studies in languages other than English (e.g. Werker, 1989; Kuhl, Williams, Lacerda, Stevens & Lindblom, 1992; Bosch & Sebastián-Gallés, 1997). However, research on the development of competence in online sentence interpretation in the second year has been limited almost exclusively to children learning English. By focusing on Spanish-learning children, we extend research on early processing efficiency to the third most widely used language in the world. In the USA, Spanish is used by nearly 60% of the population who speak a language other than English in the home, representing more than 28 million speakers (US Census, 2000). While several studies have examined early lexical development in Spanish using traditional offline measures (e.g. Pearson, Fernández & Oller, 1993), this study is the first to explore developmental changes in online speech processing in children learning a language other than English.

Second, we extend this research to Latino children from primarily low socioeconomic status (SES) families living in the USA. Another bias in the emerging literature on online processing efficiency is the narrow focus on children in families from mid to high SES backgrounds (e.g. Fernald et al., 2006; Swingley & Aslin, 2000; Zangl et al., 2005). In the present study we begin to examine how SES factors might have an influence on the development of speed and accuracy in online spoken word recognition. Spanish-speaking Latino children under five years comprise a rapidly growing population group in the USA. These children are three times more likely to live in poverty than their non-Latino white peers (Brindis, Driscoll, Biggs & Valderrama, 2002), and comprise nearly 25% of the children currently enrolled in government funded early education programs for low-income children (Collins & Ribeiro, 2004). Although many studies using offline measures have shown that language outcomes such as vocabulary size vary with SES (e.g. Hoff, 2003), little is known about how factors associated with SES may affect the early development of speech processing efficiency. After reviewing some background studies on cross-linguistic differences in early lexical development that highlight both similarities and differences in acquisition across languages, we describe recent research on links between features of maternal talk, SES and language outcomes.

Early lexical development from a cross-linguistic perspective

Research on early lexical development has provided insight into features of early acquisition that are similar across languages, as well as those that vary cross-linguistically. Such studies reveal remarkably similar patterns across languages in how many and what types of words children know at different ages (e.g. Caselli et al., 1995; Jackson-Maldonado, Thal, Marchman, Newton, Fenson & Conboy, 2003; Bornstein et al., 2004). For example, in an extensive study of English, Italian and Spanish, Bornstein & Cote (2005) found few cross-linguistic differences in overall vocabulary size, with similar patterns of noun dominance over other word types in children aged 1;6 to 2;6. These common patterns are typically hypothesized to reflect the universal cognitive and social abilities that guide how children link referents to the words they hear during everyday social interactions. Other studies have focused on language-specific features of early lexical development. For example, Tardif, Gelman & Xu (1999) found that children learning Mandarin produced a higher proportion of verbs than nouns in naturalistic settings as compared to English speakers, even though the total number of words produced was comparable. This effect may be due to structural features related to typological differences between these two languages. Mandarin, unlike English, is a ‘pro-drop’ language with verbal morphology that is relatively transparent. Thus, structural features of the language that serve to place words in more or less salient positions may be prominent in parental speech, contributing to different patterns of lexical development among children learning different languages.

While structural differences are one obvious source of cross-linguistic variability in children's early language input and lexical learning, speech addressed to children may vary across cultures for other reasons as well (Tardif et al., 1999). Fernald & Morikawa (1993) observed Japanese and American mothers interacting with their infants at 0;6, 1;0 and 1;6 during a play session with familiar toys. Although both groups of mothers produced the same amount of speech to the child and were engaged with the toys to the same extent, the focus of mother–child interactions was subtly different in the two groups. For example, when playing with a toy dog, English-speaking mothers labeled the dog frequently and consistently (e.g. ‘Look at this dog. Yeah! See the dog? Do you like the doggie?’), while Japanese-speaking mothers labeled it less often and less consistently, putting greater emphasis on the toy dog as a social partner (e.g. ‘Say hello to the doggie! Hello! Hello! Now give him a love. Love the woof-woof’). One could argue that the lower frequency of naming by Japanese mothers relates to the fact that noun ellipsis is grammatical in Japanese but not in English. However, English-speaking mothers also had the grammatical option of omitting object names by replacing them with pronouns, although they rarely did. Thus, Fernald & Morikawa argued that the robust differences in linguistic features of mothers' speech to infants in Japan and the USA were influenced as much by cultural differences in communicative style (e.g. Clancy, 1986) as by structural differences between English and Japanese. The point to be made here is that when we study children learning different languages, we need to be aware that parents' speech, and thus each child's early experience with language, are shaped by cultural as well as linguistic factors.

Whatever the sources of variability, studies of early language processing in English have shown that certain features of the input can enhance the efficiency with which words are recognized, facilitating children's ability to successfully map those forms onto appropriate referents. For example, English is an SVO language in which the object follows the verb and frequently appears in final position in the utterance. Moreover, the tendency to put object names in utterance-final position is greatly exaggerated in child-directed speech by English-speaking mothers, as compared to adult-directed speech (Fernald & Mazzie, 1991). This may account for the finding that young children learning English identify objects more efficiently when the object name appears in final rather than medial position (Fernald, McRoberts & Swingley, 2001). In addition, children respond more accurately if words occur at the end of a familiar and predictable sentence frame rather than being spoken in isolation (Fernald & Hurtado, 2006). However, it is not yet known whether such findings will generalize to other languages. Spanish is a language that allows relatively free word order, and both SVO and VSO word orders are common. While this variability could make the task of processing words in continuous speech more challenging, other features might work in the opposite direction. For example, portions of the Spanish morphological system are highly regular, with concord morphology adding redundancy, factors that could potentially facilitate the early processing and acquisition of lexical forms, especially nouns. As a first step in understanding how these features might impact the early development of online processing efficiency in a language other than English, the current study examines spoken language understanding by Spanish-learning children, relating developmental gains in the speed and accuracy of word recognition to age and vocabulary size in the second and third years of life.

The impact of environmental factors on children's lexical development

In addition to linguistic and cultural differences between language communities that have an impact on children's early experience, cultural differences within the same language community are also influential. For example, in their comparative study of children learning English, Spanish and Italian across urban and rural settings, Bornstein & Cote (2005) found that children's reported vocabulary size varied more as a function of within-culture environmental differences in rural vs. urban locales than as a function of differences between languages. Other studies in the USA have shown extensive demographic variation in the quantity and quality of the talk that children hear (e.g. Hoff-Ginsberg, 1998). In their longitudinal study of lower-, middle- and higher-SES children, Hart & Risley (1995) found that by 4;0, children from higher-SES families had heard about 30 million more words and had vocabularies that were three to four times larger, on average, than children in lower-SES families. In a recent large-scale study of low-income families, Pan, Rowe, Singer & Snow (2005) reported that variation in rates of vocabulary growth from 1;2 to 3;0 was significantly related to diversity of maternal talk – in particular, the number of different words produced during mother–child interaction. Clearly, children who hear a richer vocabulary that includes a higher proportion of low-frequency or complex words are better positioned to expand their own vocabularies at a faster rate (e.g. Weizman & Snow, 2001). Pan et al. (2005) also found that features of maternal knowledge such as years of education and scores on standardized tests of language and literacy contributed to child outcomes.

Only a few studies have explored relations among SES, features of maternal talk and vocabulary outcomes specifically in Latino populations (e.g. Laosa, 1980; Eisenberg, 2002). These studies have primarily focused on cultural differences in the nature of interactions, as mothers engage with their young children in activities such as making a cake, tying shoelaces or reading a book. For example, Laosa compared the types of talk that Mexican-American and European-American mothers used in contexts in which they were teaching the child how to put a toy together. In general, Mexican-American mothers were more likely than European-American mothers to be directive and to use more negative feedback, teaching strategies that other studies have shown to be ineffective. However, it was also the case that the Mexican-American mothers had less than a high-school education, whereas most of the European-American mothers had at least a high-school education. When this SES disparity was taken into account, group differences in interactional style were no longer evident. Thus, the cross-cultural differences in maternal talk were more attributable to SES differences than to ethnicity per se.

All of the studies to date that have documented relations between language outcomes, maternal input and SES have been conducted using offline measures. Here we extend the exploration of how SES factors influence language outcomes by examining the development of online speech processing in primarily low-income Latino populations. This study has three goals: first, we examine whether age-related changes in receptive language processing are observed in Spanish-learning children over the second and third years of life, as in previous research with English-learning children (Fernald et al., 1998). Second, we evaluate whether these changes in processing efficiency are also associated with gains in expressive vocabulary (Zangl et al., 2005; Fernald et al., 2006). Third, we evaluate the impact of SES on the speech processing abilities of children living in the USA who are learning Spanish as a first language. Because Latino children in the USA are more likely to live in families with recent immigrants who may have lower levels of education and less-skilled occupations, this study extends research on early receptive language development to children from SES backgrounds underrepresented in prior research. By examining experimental measures of children's speed and efficiency of spoken word recognition in relation to SES, this study complements previous research in this area based on parent report measures and naturalistic observation.


Research facility and recruitment of research participants

This research was conducted in a laboratory located a few miles from the Stanford University campus. The majority of the residents of the surrounding community are Latino families, many of whom are recent immigrants to the USA from Mexico. For a variety of reasons, Spanish-speaking families from this community are unable or reluctant to visit our main laboratory on the university campus. Most of the parents speak little English and have limited access to transportation, as well as lacking the time, resources and incentive to participate in a study. For these reasons, we have established a satellite laboratory in a family neighborhood, located in a five-room house that also serves as the residence for one of the Spanish-speaking staff members on the project. Two rooms are used as testing room and office, and the living room serves as a comfortable reception room and play area for visiting families. This laboratory is staffed by bilingual researchers who are native speakers of Mexican Spanish and who conduct all recruitment efforts and communicate with participant families in Spanish. Latino families are recruited through various sources, including county birth records, the university hospital, the community health center, preschools and library programs.


Participants were 49 children (30F, 19M) ranging in age from 1;3 to 3;1 (M=2;0), all from Spanish-speaking Latino families who had recently immigrated to the USA. While 92% of the parents were born in Mexico, all of the children were born in the USA. Parents reported that all children were full term with no perinatal difficulties, major illnesses, developmental delays or hearing loss. An additional 19 participants were tested, but not included in the analyses due to fussiness (n=8), failure to fixate one of the stimulus pictures on at least 50% of trials (n=7), experimenter error (n=2) or parental interference during testing (n=2). To enable descriptive comparison with results from previous studies, participants were grouped by age for some analyses: 1;3–1;8 (M=1;6, n=18), 1;9–2;1 (M=2;0, n=15) and 2;2–3;1 (M=2;6, n=16).

Prior to scheduling, a Spanish-speaking research assistant interviewed the parent about family background, the child's history and the daily experiences of the child. As part of this interview, she inquired about the child's language experience across all sources, including family, daycare, other adults, peers and television. A criterion for participation was that the child was learning ‘only Spanish’ in the home and that no more than 15% of the child's daily language exposure was in a language other than Spanish. While some exposure to English is inevitable given that these families reside in the USA, none of the children had regular interaction with speakers of English and none were reported to know more than just a few English words. The majority of parents (88%) had low levels of English-language proficiency and no siblings or other relatives of the participating children spoke English in the home. Most mothers were either not employed outside the home (n=33) or had non-skilled jobs (n=9); most fathers were in non-skilled (n=32) or semi-skilled (n=9) occupations. As shown in Table 1, the average annual family income was less than $25,000, with 98% reporting an income less than the median family income in the state. In the majority of families, both mothers and fathers reported less than a high-school education, although a range of educational levels was represented (3–18 years). There were no differences in these demographic factors among the parents of children in the three age groups (p>0.05, ns).

Table 1
Participant demographics and vocabulary sizes and percentiles (M and s.d.)

Measures of expressive vocabulary

Spanish-language adaptations of the MacArthur-Bates Communicative Development Inventory (CDI) were used to gather parental report data on children's lexical development. For children younger than 1;6, parents completed the MacArthur-Bates Inventarios del Desarrollo de Habilidades Comunicativas: Inventario I; for children 1;6 and older, parents completed Inventario II (Jackson-Maldonado et al., 2003). In most cases, the Inventario was mailed to the home ahead of time and brought to the visit. In some cases, the parent completed the form at the visit while a research assistant played with the child. Parents with low levels of reading proficiency completed the questionnaires verbally with the assistance of the research assistant.

Vocabulary size was defined as the total number of words reported to be produced, based on the vocabulary checklist portions of the Inventarios. Although these checklists contain the same number of items as their English counterparts (Inventario I: 390 items; Inventario II: 680 items) and are organized into similar semantic categories (e.g. animal names, vehicles), the Inventarios are adaptations of the CDIs designed to be culturally and linguistically appropriate for Mexican and Mexican-American children. Percentile scores were derived for each child (by age and sex) using norms reported by Jackson-Maldonado et al. (2003). As shown in Table 1, there was considerable variation in reported vocabulary, with children spanning the range of percentile values at each age (range=5th to 93rd). It is important to note that norms for the Inventarios are based on a sample of Mexican-Spanish speakers in which 64% of the mothers reported high-school educations or less, notably different from the English CDI normative sample in which only 31.5% of the mothers reported high-school educations or less (Fenson, Marchman, Thal, Dale, Reznick & Bates, 2007).

The looking-while-listening procedure

On each trial in this procedure, children were shown a pair of objects as they listened to speech naming one of the objects. Their eye movements in response to the target word in each sentence were videotaped and later coded frame-by-frame, yielding a high-resolution record of the time course of comprehension. Given that little is known about online speech processing by Spanish-learning children, the stimuli were designed to be comparable to those used in previous research with English-learning children, and thus to reduce the potential influence of language-specific morphosyntactic features. For example, nouns have grammatical gender in Spanish but not in English, and adult speakers of languages with grammatical gender can use gender-marked articles to facilitate word recognition (e.g. Dahan et al., 2000). In this study, the target and distracter objects were always matched in grammatical gender, so that the child had to wait to hear the target noun before identifying the referent on every trial, comparable to test trials in English where the article the is never informative about which object name will follow. Thus, any differences in the performance of Spanish- and English-learning children in this task could not be attributable to features of the stimuli unique to Spanish.

Speech stimuli

The stimuli consisted of Spanish sentences in which a target noun was presented in a simple carrier phrase (e.g. ¿Dónde está el/la [target]? ¿Te gusta? ‘Where's the [target]? Do you like it?’). The eight target nouns (el perro ‘doggie’; el bebé ‘baby’; el carro ‘car’; el globo ‘balloon’; el zapato ‘shoe’; el plátano ‘banana’; la pelota ‘ball’; la galleta ‘cookie’) were chosen based on their familiarity to children learning Mexican Spanish in this age range (Jackson-Maldonado et al., 2003). Noun pairings were matched for grammatical gender and number of syllables. To prepare the stimuli, a female native speaker of Spanish recorded several tokens of each sentence, matching them closely in intonation contour. These candidate stimuli were then digitized, analyzed, and edited using Peak 2.0 LE software for Macintosh. The final tokens were chosen based on naturalness and prosodic comparability. The mean duration of target nouns was 527.4 ms (range=426–630 ms). Five filler trials were interspersed among the 16 test trials (e.g. ¿Te gustan las fotos? ¡Aquí vienen más! ‘Do you like the pictures? Here come some more!’).

Visual stimuli

Visual stimuli consisted of digitized photographs presented on a gray background. Two different picture tokens were used for each target word. Pictures were presented in four fixed pairs (el perro/el bebé; el carro/el globo; el zapato/el plátano; la pelota/la galleta). The pictures in each pair were matched for brightness and visual salience. Each object served as target on two trials and as distracter on two trials. Side of presentation of target picture was counterbalanced across trials. Trials were presented in one of four pseudo-random orders, counterbalanced across participants.
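The counterbalancing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`build_test_trials`, `pseudo_random_order`) are hypothetical, the seeded shuffle stands in for the four fixed pseudo-random orders whose constraints are not reported here, and filler trials and the two picture tokens per word are omitted.

```python
import random

# The four fixed picture pairs listed in the text.
PAIRS = [
    ("el perro", "el bebé"),
    ("el carro", "el globo"),
    ("el zapato", "el plátano"),
    ("la pelota", "la galleta"),
]


def build_test_trials():
    """Return 16 test trials: each object is target twice (once per side)
    and distracter twice."""
    trials = []
    for first, second in PAIRS:
        for target, distracter in ((first, second), (second, first)):
            for side in ("left", "right"):
                trials.append(
                    {"target": target, "distracter": distracter, "target_side": side}
                )
    return trials


def pseudo_random_order(trials, seed):
    """Assumption: a seeded shuffle stands in for one fixed pseudo-random order."""
    rng = random.Random(seed)
    order = list(trials)
    rng.shuffle(order)
    return order


test_trials = build_test_trials()
# Four hypothetical presentation orders, one per seed.
orders = [pseudo_random_order(test_trials, seed) for seed in range(4)]
```

Crossing target identity with side of presentation within each pair is one simple way to satisfy all of the stated constraints at once; the actual orders used in the study may have imposed further restrictions.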


The looking-while-listening procedure was conducted in a 10′×12′ room containing a three-sided testing booth, with two adjacent computer monitors mounted in the front panel at the child's eye level. During testing, the child sat on the parent's lap approximately 60 cm from the monitors. The parent wore opaque sunglasses to block their view of the images. Auditory stimuli were presented through a loudspeaker concealed below the monitors. The child's face was recorded by a video camera connected to the computer controlling the experiment, located behind the test booth.


Upon arrival, two Spanish-speaking research assistants greeted the family in the playroom. One research assistant talked with the parent, obtained informed consent, collected the Inventario, and updated background information. The second research assistant interacted with the participant child and any siblings. When child and parent were comfortable, they were escorted to the testing room and seated in the booth. An experimenter behind the booth spoke briefly over the loudspeaker to acquaint the child with the sound source. When the child was attentive, the experimental session began. On each trial, the two pictures were shown in silence for 2 s before the onset of the stimulus sentence. The pictures remained visible for 1 s after the offset of the speech, for a total trial duration of 6–8 s. The screens were blank for the 1 s interval between trials. The session lasted approximately 4 minutes.

Coding eye movements

Sessions were videotaped with a digital time-code accurate to a single frame (33 ms resolution). Highly trained observers, blind to stimuli and trial types, coded each trial frame-by-frame, indicating at each time point whether the child was looking left or right, between the two images or away from both. The time course of eye movements was coordinated with information in the speech waveform, such as the acoustic onset of the target noun. Trials on which the child's gaze was away from both pictures at the onset of the target noun or for more than 20% of the entire trial length were excluded from the analyses. Fixation times to each image and shifts in gaze between images were also calculated using custom software. Two observers conducted reliability checks by independently coding four trials for 25% of the participants. The reliability analysis focused on trials with at least two shifts in gaze, where the potential for disagreement among coders was highest. The proportion of frames on which observers agreed within a single frame was 94%.
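To make the exclusion rule and the reliability measure concrete, the following Python sketch shows one possible way to express them. The code scheme (`"L"`, `"R"`, `"B"` for between, `"A"` for away), the function names, and the interpretation of agreement 'within a single frame' as a tolerance of one frame on either side are all assumptions made for illustration; they are not taken from the custom software used in the study.

```python
def is_trial_usable(codes, noun_onset_frame, max_away_fraction=0.20):
    """codes: one code per video frame for the whole trial.
    Assumed scheme: 'L' = left picture, 'R' = right picture,
    'B' = between the pictures, 'A' = away from both."""
    # Exclude trials where the child is looking away at target-noun onset.
    if codes[noun_onset_frame] == "A":
        return False
    # Exclude trials where the child is away for more than 20% of the trial.
    away_fraction = codes.count("A") / len(codes)
    return away_fraction <= max_away_fraction


def agreement_within_one_frame(coder_a, coder_b):
    """Proportion of frames on which coder A's judgment matches coder B's
    judgment at the same frame or an immediately adjacent frame
    (an assumed reading of 'agreed within a single frame')."""
    agree = 0
    for i, code in enumerate(coder_a):
        neighbours = coder_b[max(0, i - 1): i + 2]
        if code in neighbours:
            agree += 1
    return agree / len(coder_a)
```

For instance, two observers who place the same gaze shift one frame apart would agree on every frame under this tolerance, whereas a two-frame discrepancy would count as a disagreement on one frame.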

Calculating accuracy and reaction time

Since children do not know in advance which picture will be named, at trial onset they will by chance be looking about half the time at the distracter picture (distracter-initial trials) and half the time at the target picture (target-initial trials). Correct looking is a function of the child's tendency to shift quickly away from the distracter to the target picture on distracter-initial trials in response to the target word, and also to stay fixating the target picture on target-initial trials. To determine the degree to which participants fixated the appropriate picture across trials, mean proportion looking to target was calculated for each participant at each 33 ms frame from the onset of the target noun. Accuracy was defined as the mean proportion of time spent looking at the target picture out of the total time spent on either the target or distracter picture from 367 to 1800 ms from target noun onset. Reaction time (RT) corresponds to the latency to shift away from the distracter to the target picture on distracter-initial trials, measured from the acoustic onset of the target word. Responses prior to 367 ms from noun onset were excluded because they presumably occurred before the child had time to process sufficient acoustic input and to mobilize an eye movement; responses slower than 1800 ms were excluded because these delayed looks are less likely to reflect a response to the target word (see Fernald, Swingley & Pinto, 2001). Note that RT can be calculated only on those trials on which the child happens to be looking at the distracter picture at the onset of the noun and shifts correctly to the target picture within the designated time window. Since children vary in the likelihood that they will by chance start out on the distracter on a given trial, mean RTs are based on different numbers of trials across participants (M=6.3 trials, range=2–13). About 27% of all distracter-initial trials were excluded from the RT analysis, either because the child never shifted to the correct picture or because the shift occurred outside the 367–1800 ms window. Only those children with at least two RTs within the appropriate window (n=44) were included in analyses of mean RT.
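The two dependent measures can be illustrated with a short Python sketch. It assumes a hypothetical per-trial record with one code per 33 ms frame, starting at target-noun onset (`"T"` = target, `"D"` = distracter, `"O"` = off both pictures); the function names are illustrative, and taking RT as the first frame on which gaze leaves the distracter is an assumed operationalization of 'latency to shift'.

```python
FRAME_MS = 33           # duration of one video frame
WINDOW_START_MS = 367   # earliest response counted
WINDOW_END_MS = 1800    # latest response counted


def trial_accuracy(codes):
    """Proportion of target looking out of target + distracter looking
    within the 367-1800 ms window; None if neither picture was fixated."""
    in_window = [
        code for i, code in enumerate(codes)
        if WINDOW_START_MS <= i * FRAME_MS <= WINDOW_END_MS
    ]
    target = in_window.count("T")
    distracter = in_window.count("D")
    if target + distracter == 0:
        return None
    return target / (target + distracter)


def trial_rt(codes):
    """RT in ms on a distracter-initial trial, or None if the trial is not
    distracter-initial, the child never reaches the target, or the shift
    falls outside the window."""
    if not codes or codes[0] != "D":
        return None
    for i, code in enumerate(codes):
        if code != "D":
            # First frame off the distracter; check which picture gaze lands on.
            landing = next((c for c in codes[i:] if c in ("T", "D")), None)
            rt = i * FRAME_MS
            if landing == "T" and WINDOW_START_MS <= rt <= WINDOW_END_MS:
                return rt
            return None
    return None


def mean_rt(rts, min_trials=2):
    """Mean RT for one child, requiring at least two valid RTs."""
    valid = [rt for rt in rts if rt is not None]
    if len(valid) < min_trials:
        return None
    return sum(valid) / len(valid)
```

For example, a child who fixates the distracter for the first 20 frames and then moves to the target would receive an RT of 660 ms on that trial under these assumptions.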



Figure 1 gives an overview of the time course of correct orienting to the referent in response to the spoken target word. The three curves show changes in the mean proportion of trials on which Spanish-learning children in each age group fixated the correct referent at every 33 ms interval as the target word unfolded, with error bars representing SE of the mean computed over participants. Before hearing the target word, participants at all ages started out fixating target and distracter pictures with equal likelihood. Children in the oldest group began to increase their looking to the target picture immediately after the offset of the noun. Children in the middle group remained at chance for several hundred milliseconds after the offset of the noun, and the youngest children showed only a slight increase in looking to the target over the trial. Differences in asymptote reflect the higher levels of accuracy achieved by older children.

Fig. 1
The accuracy of children's looking to target picture as a function of age group (1;6, 2;0, and 2;6). Curves show changes over time in the mean proportion looking to the correct picture, measured in ms from noun onset; error bars represent SEs. Solid vertical ...

Mean accuracy scores, computed over the 367–1800 ms window from noun onset, were examined as a function of age. Accuracy was positively correlated with age (r(49)=0.63, p<0.0001), indicating that older Spanish-learning children were significantly more reliable than younger children in fixating the target picture. A comparison of accuracy scores in the three age groups in a one-way between-subjects ANOVA revealed a significant main effect of age (F(2, 46)=14.3, p<0.0001, ηp2=0.38). Contrasts indicated that children in the oldest age group looked significantly more at the target (M=0.69, s.d.=0.11) than children in both the middle (M=0.55, s.d.=0.12) and youngest (M=0.49, s.d.=0.10) groups (all p<0.01). The difference in looking between the children in the middle and youngest groups was not significant (p=0.11).

Reaction time

Mean RTs were significantly negatively correlated with age (r(43)=−0.45, p<0.002), indicating that older Spanish-speaking children were faster to shift to the target picture than younger ones. Figure 2 presents mean RTs for the three age groups. A one-way between-subjects ANOVA indicated a significant main effect of age (F(2, 40)=5.3, p<0.009, ηp2=0.21). Contrasts showed that children in the oldest age group were significantly faster to shift from the distracter to the target picture (M=841.8 ms, s.d.=207.5) than children in the youngest (M=1084.9 ms, s.d.=188.7) age group (p<0.05). No other group differences were statistically reliable.
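
As a consistency check on the two one-way ANOVAs reported above, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below is illustrative only; the function name is hypothetical and is not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# Values taken from the ANOVAs reported in the text.
print(round(partial_eta_squared(14.3, 2, 46), 2))  # accuracy by age group -> 0.38
print(round(partial_eta_squared(5.3, 2, 40), 2))   # RT by age group -> 0.21
```

Both values agree with the effect sizes reported for accuracy and RT.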

Fig. 2
Mean reaction time (in ms) to initiate a shift in gaze from the distracter to the target picture as a function of age group (1;6, 2;0, 2;6); error bars represent SEs. The graph is aligned with an amplitude waveform of one of the stimulus sentences.

Relations between speech processing measures and vocabulary size

Not surprisingly, age and vocabulary size were strongly intercorrelated in this sample (r(49)=0.82, p<0.0001). Multiple regression analyses indicated that together these factors accounted for approximately 40.3% of the variance in accuracy (F(2, 46)=15.5, p<0.0001). Although vocabulary did not contribute significant variance after age was taken into account (r2-change: <1%, ns), age contributed approximately 12% additional variance beyond vocabulary (p<0.004). Thus, the majority of the variation in accuracy that was accounted for by age and vocabulary size was attributable to the shared variance between these two factors, yet some sources of individual differences in accuracy were attributable to age above and beyond vocabulary. Taken together, these results indicate that children learning Spanish as a first language were more accurate in identifying the referents of familiar words as they got older and developed a larger expressive vocabulary.

Multiple regression analyses also indicated that age and vocabulary together accounted for approximately 22% of the variance in RT. However, in contrast to the accuracy measure, neither age nor vocabulary contributed significant unique variance (r2-change: <3%, ns) on the RT measure. Thus, all of the variation in RT accounted for by age and vocabulary was attributable to the shared variance between these two factors. In sum, consistent with previous research with children learning English, speed of orienting in children learning Spanish improves as children get older and learn more vocabulary words across the second and third years.
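
The logic of partitioning variance into unique and shared components can be sketched in Python. With two predictors, the total R2 follows from the three pairwise correlations, and each predictor's unique contribution (its r2-change when entered last) is the total R2 minus the squared simple correlation of the other predictor. The data below are synthetic and purely illustrative, not the study's data, and all function and variable names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of y regressed on two predictors, from the pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def partition(r_y1, r_y2, r_12):
    """Split total R^2 into parts unique to each predictor and a shared part."""
    total = r2_two_predictors(r_y1, r_y2, r_12)
    unique_1 = total - r_y2 ** 2  # r2-change when predictor 1 is entered last
    unique_2 = total - r_y1 ** 2  # r2-change when predictor 2 is entered last
    shared = total - unique_1 - unique_2
    return {"total": total, "unique_1": unique_1,
            "unique_2": unique_2, "shared": shared}


# Synthetic data for illustration only (NOT the data from this study).
random.seed(0)
age = [random.uniform(15, 37) for _ in range(49)]          # age in months
vocab = [15 * a + random.gauss(0, 60) for a in age]        # words produced
acc = [0.3 + 0.01 * a + random.gauss(0, 0.08) for a in age]

parts = partition(pearson(acc, age), pearson(acc, vocab), pearson(age, vocab))
print(parts)

# Using values reported in the text for accuracy (total R^2 = 0.403,
# r with age = 0.63), the implied unique contribution of vocabulary
# is under 1%, up to rounding of the reported figures.
reported_vocab_unique = 0.403 - 0.63 ** 2
print(round(reported_vocab_unique, 3))
```

When the two predictors are highly intercorrelated, as age and vocabulary are in this sample, most of the explained variance falls into the shared component and the unique components are small.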

Relation of maternal education to development in speech processing efficiency

Occupation, income and education have all been used as indices of SES level in previous studies. However, maternal education was adopted here as the proxy for SES for two reasons. First, maternal education is generally highly correlated with other indices of SES and it is the single most predictive component of SES for developmental outcomes (e.g. Noble, Norman & Farah, 2005). Second, because information about maternal education is more easily obtained than other indices of SES and may be less subject to reporting bias, it has traditionally been employed as the primary measure of SES in studies investigating language outcomes (e.g. Jackson-Maldonado et al., 2003).

Although almost all of the mothers of children in this study had less than a high-school education, there was still a range of educational levels represented in the sample. Maternal education level was examined in relation to both accuracy and RT in spoken word recognition. Both measures were moderately but significantly correlated with mother's years of education (accuracy: r(49)=0.32, p<0.03; RT: r(43)=−0.32, p<0.04). To evaluate the unique contribution of maternal education to speech processing efficiency, we conducted multiple regression analyses examining the effect of maternal education independent of age and vocabulary size. Years of maternal education added a significant r2-change of 9.0% to accuracy, over and above both age and vocabulary size (p<0.01). Similarly, maternal education accounted for 8.9% additional variance in RT, after age and vocabulary size were taken into account (p<0.03). Thus, although the impact of maternal education on speech processing efficiency was relatively small, the observed effects were not reducible to the well-established relation between maternal education and vocabulary size.


The first major finding in this research is that Spanish-learning children demonstrated age-related improvement in the efficiency with which they processed spoken language, as observed in previous research with children learning English (e.g. Fernald et al., 1998; Zangl & Fernald, in press; Zangl et al., 2005). All target words were familiar to children in this age range, yet older children more quickly and accurately identified the correct referent than younger children. Thus, like children learning English, these young Spanish-language learners showed significant developmental gains in speech processing abilities over the second and third years of life.

The second major finding was also consistent with previous findings, namely that by the end of the second year, children's efficiency in spoken language processing was significantly associated with their vocabulary size. Several recent studies have found that English-learning two-year-olds who were lexically more advanced were also faster and more accurate in spoken word recognition, even after controlling for age (Fernald et al., 2001, 2006; Zangl et al., 2005). Here we found that Spanish-learning children who were lexically more advanced were also faster and more accurate in speech processing than those who were lexically less advanced. However, the factors of age and vocabulary size were highly intercorrelated in this sample and the majority of the associations between vocabulary and efficiency of spoken language processing were attributable to variance that was shared between these two factors. Nevertheless, these results with children learning Spanish were consistent with previous studies with English-learning children that demonstrate relations between efficiency in online language comprehension and other concurrent measures of linguistic achievement.

Thus, as these Spanish-learning children got older and developed a larger working vocabulary, they also became more efficient at processing words during real-time spoken language understanding. However, the nature and direction of this relation are far from clear. Do initial differences in processing speed make it easier for some children to learn words more quickly? Recent studies showing that individual differences in speech processing abilities in the first year of life are correlated with vocabulary growth in the second year lend some support to this hypothesis (Newman, Bernstein Ratner, Jusczyk, Jusczyk & Dow, 2006; Tsao, Liu & Kuhl, 2004). It is also likely that having a larger vocabulary facilitates greater efficiency in processing familiar words. Given the multitude of environmental factors known to influence lexical development, it seems likely that children become faster and more accurate as a result of their more extensive experience in interpreting speech. For example, if children whose parents expose them to more complex language input begin to talk sooner, these lexically more advanced children could develop faster processing speed through increased experience both in hearing and using speech. This could then give them an advantage in identifying known words and in learning new ones, so that by the end of the second year greater speech processing efficiency is associated with more rapid vocabulary growth (Fernald et al., 2006). It might also be the case that larger vocabulary size is associated with more efficient word-recognition skills because lexical growth has led to changes in the way that lexical forms are represented. For example, Walley (1993) has proposed that increases in vocabulary size prompt more efficient phonological encoding of lexical forms, required to reduce confusion among the increasing number of lexical entries. 
Thus, children with larger vocabularies may be faster and more efficient processors of spoken language because lexical growth itself has contributed to a shift to more segmentally-based lexical representations. Because the findings from the current study cannot distinguish between these explanations, these questions remain topics for future studies specifically designed to tease apart these possibilities.

A third major finding emerging from this study is that maternal education was also positively correlated with the efficiency of children's spoken language understanding. That is, children whose mothers had more education were faster and more accurate at identifying the correct referent than children of similar ages and vocabulary levels whose mothers had less education. In addition to the age- and vocabulary-related changes observed in previous studies of English-language learners using this experimental paradigm, children's efficiency in spoken language processing was also uniquely associated with factors that co-varied with SES, indexed here by maternal education. Thus, performance in the looking-while-listening procedure appears to have tapped into differences related to SES that are not completely overlapping with offline measures of language competence. These results supplement the large body of literature using offline methods and parental reports that has documented lower language outcomes for children from disadvantaged backgrounds (e.g. Hart & Risley, 1995; Arriaga, Fenson, Cronan & Pethick, 1998; Dollaghan et al., 1999; Hoff, 2003).

Effects of SES on performance in this online processing task could derive from several sources. First, it is possible that children of mothers with more years of formal education were more familiar with the context of a testing procedure in which children's attention was directed to a series of different objects. Mothers with more education may engage in such ostensive labeling routines with their children (i.e. the ‘naming game’) more often than do mothers with less education, and thus their children may simply have had more practice in responding effectively in a task of this sort (see Eisenberg, 2002). Alternatively, the impact of maternal education on children's success in spoken word recognition may be much broader, rather than specific to the task demands of the experimental procedure used here, and thus may be relevant to speech processing proficiency in the real world. In this case, the path of effect for children of mothers with lower education may lie in the child's general language learning experiences. We know that the quantity and quality of daily social interactions vary in families with different educational backgrounds (Hoff, 2003), and that differences in early language experience have long-term consequences for language learning (Hart & Risley, 1995). A child who has the opportunity to participate more often and more effectively in language-related activities in the home would have more practice in processing language in real time, and this experience could contribute to the development of greater efficiency in spoken language understanding.

Of course, it is also possible that maternal education is a proxy for a host of factors that influence cognitive and linguistic growth but have little to do with the child's experience with language-related activities, such as nutrition, health care and other environmental influences critical to early development. However, it has become increasingly clear that maternal education (as an index of SES) affects cognitive and linguistic development quite specifically through maternal talk (e.g. Hoff, 2003). While other cooccurring factors may also play a role in the developmental changes in spoken language understanding seen here, it is likely that these effects have some grounding in the specific day-to-day activities of children's lives. In our ongoing research with Latino families, we are examining relations between characteristics of maternal talk and the emergence of efficiency in spoken language understanding by children learning Spanish as their first language.

In addition to the relation observed in this study between maternal education and children's early speech processing efficiency, an indirect comparison of the present results with those of previous studies provides another perspective on the possible impact of SES. Although we found common developmental patterns across studies in Spanish- and English-learning children, there were also noteworthy differences. In particular, even though older children in the present sample responded relatively quickly and achieved accuracy scores in the 70% range, these children were generally slower and less accurate overall than has been observed in prior studies, especially in younger learners. For example, using the same experimental paradigm in the same lab, Fernald et al. (1998) reported that the mean accuracy score of English-learning two-year-olds was 77%, with a mean RT of 680 ms, results also replicated in a larger longitudinal sample of English-learning children (Fernald et al., 2006). By comparison, the mean accuracy for Spanish-learning children in the same age range in the present study was 55%, with a mean RT of 960 ms. It is also noteworthy that the Spanish-learning children observed here had smaller vocabularies on average than English-learning children at the same age, according to maternal reports. Although they were near the 50th percentile in vocabulary size relative to Inventario norms for Mexican children, the Latino children in this study produced fewer words than did the English-learning children from high- to mid-level SES populations observed in earlier research.

How do we account for the apparent discrepancies in speech processing measures between the Spanish-learning children in this study and English-learning children in previous studies using the same experimental paradigm? One possibility is that linguistic features of Spanish make it inherently harder for children to process sentences in Spanish than in English. However, while such factors might be influential in interpreting more complex sentences, the Spanish stimuli used here were very simple exemplars of child-directed speech presented so as to maximize comparability with the English stimuli. In particular, the names of the target and distracter objects on each trial were matched for grammatical gender (e.g. el plátano/el zapato), so that children could not use the gender-marked article as a cue to identifying the referent before the noun was spoken. Indeed, if mixed-gender trials had been included (e.g. el plátano/la galleta), older children would likely have responded more rapidly than on same-gender trials (Lew-Williams & Fernald, in press). But it cannot be argued that including only same-gender trials put Spanish learners at a disadvantage relative to English learners, since information regarding the identity of the appropriate referent was available at noun onset for both groups, i.e. at exactly the same point in the sentence as in the stimuli used in previous studies with English learners.

Another possibility is that these children were generally slower and less efficient in processing speech than children observed in our earlier studies because the language that they were learning was different from the language of the country in which they were tested. That is, these children may have had more difficulty in the looking-while-listening task in Spanish because of some type of interference from the majority language, i.e. English. However, recall that none of these children had regular exposure to English, their parents were native speakers of Spanish with very low proficiency in English and most lived in primarily Spanish-speaking communities. Moreover, all contact with the parents and children was conducted by native Spanish speakers completely in Spanish, so that the language of the testing situation was consistent with the language of the home. Thus, every effort was made to reduce the possible impact of English exposure, and we suspect that this factor had a relatively minimal effect on our findings. Of course, data documenting the development of processing efficiency in Spanish-learning children living in Mexico would be needed to rule out the possibility that exposure to English influenced the performance of these Latino children learning Spanish in the USA.

A much more powerful determinant of children's performance in this study was presumably the set of factors associated with SES. This research was neither intended nor designed to provide a direct comparison between language groups, and any attempt to make an indirect comparison between the Spanish-learning children observed here and the English-learning children observed in earlier studies must take into account that language group is completely confounded with SES level. The parents of the English-learning children in the Fernald et al. (1998, 2006) studies were almost all in the top 10% of the US population in terms of both education and income level, while parents of the Spanish-learning children in the present study were in the bottom 20% on both measures (US Census Bureau, 2000). Given the well-established relations between SES and language outcomes (e.g. Hart & Risley, 1995), it is likely that the somewhat depressed performance of the Latino children seen here on both online and offline measures of language is attributable to factors associated with demographic features of the sample. Indeed, SES-background comparisons within the participants of the current study revealed that those children with mothers who had higher levels of education tended to be faster and more accurate in spoken word recognition than children of comparable age and vocabulary size whose mothers had less formal schooling. It would be fruitful for future cross-linguistic studies to examine the development of speech processing abilities in populations that avoid confounds with SES, and hence enable more direct and appropriate comparisons between language groups.

In conclusion, three major findings emerged from this research. First, Spanish-learning children became more adept in interpreting spoken language over the second and third years of life, not only because they had learned to identify more words, but also because they had become more efficient in recognizing the same words learned months earlier. Like the English-learning children observed in previous studies, older Latino children learning Spanish as their first language were significantly faster and more accurate in identifying the named referent than were younger learners. Second, these developmental increases were linked to reported vocabulary size, suggesting that the efficiency of processing spoken language in real time is associated with processes that also guide the child's development of a working productive vocabulary. The third finding is that despite these common patterns of improvement in speech processing related to age and vocabulary learning, the SES background of the participants, as operationalized by maternal education level, had a significant impact on their online spoken word recognition. Children from more disadvantaged backgrounds were slower and less accurate than children from higher-SES families within the same population. Moreover, the impact of SES was also observable when looking at the performance of these Spanish-language learners in relation to that of English-language learners in earlier studies using similar measures. There are multiple factors that could account for this difference. However, given the enormous disparities between these two groups in demographic characteristics such as family income and education level, this pattern of results is most consistent with the substantial literature documenting slower rates of language learning in children from disadvantaged backgrounds. This study is the first to show that speech processing efficiency is also potentially compromised in low-SES children, in addition to vocabulary growth. 
These results provide the first look at spoken language understanding in young children learning Spanish, and add to the growing literature exploring the impact of SES factors on early language development.


[*]We are grateful to the many children and parents who participated in this research, to Drs Fernando Mendoza, Deanne Perez-Granados and Guadalupe Valdes, and to the staff of the Ravenswood Clinic, the East Palo Alto Library, East Palo Alto Head Start and Family Connections of San Mateo County. Special thanks to Dr Renate Zangl and to Guadalupe Makasyuk for their invaluable contributions on many levels, as well as to Ana Luz Portillo, Kirsten Thorpe, Rebecca Wedel, Casey Williams, Sara Hernandez, Daisy Rios, Veronica Trejo, Natalie Rios, Monica Prieto, Irene Guerra and the staff of the Center for Infant Studies at Stanford University. This work was supported by a grant from the National Institutes of Health to Anne Fernald (HD 42235) with a Postdoctoral Research Supplement for Underrepresented Minorities to Nereyda Hurtado.


  • Arriaga RI, Fenson L, Cronan T, Pethick SJ. Scores on the MacArthur Communicative Inventory of children from low- and middle-income families. Applied Psycholinguistics. 1998;19:209–23.
  • Baillargeon R. How do infants learn about the physical world? Current Directions in Psychological Science. 1994;3:133–40.
  • Bornstein MH, Cote LR. Expressive vocabulary in language learners from two ecological settings in three language communities. Infancy. 2005;7:299–316.
  • Bornstein MH, Cote LR, Maital S, Painter K, Park S, Pascual L, Pecheux M-G, Ruel J, Venuti P, Vyt A. Cross-linguistic analyses of vocabulary in toddlers: Spanish, Dutch, French, Hebrew, Italian, and English. Child Development. 2004;75:1115–39. [PubMed]
  • Bosch L, Sebastián-Gallés N. Native-language recognition abilities in four-month-old infants from monolingual and bilingual environments. Cognition. 1997;65:33–69. [PubMed]
  • Brindis CD, Driscoll AK, Biggs MA, Valderrama LT. Fact sheet on Latino youth: Income & poverty. University of California, San Francisco, Center for Reproductive Health Research and Policy, Department of Obstetrics, Gynecology and Reproductive Health Sciences and the Institute for Health Policy Studies; San Francisco, CA: 2002.
  • Caselli MC, Bates E, Casadio P, Fenson J, Fenson L, Sanderl L, Weir J. A cross-linguistic study of early lexical development. Cognitive Development. 1995;10:159–99.
  • Clancy PM. The acquisition of communicative style in Japanese. In: Schieffelin BB, Ochs E, editors. Language socialization across cultures. Cambridge University Press; Cambridge: 1986. pp. 213–50.
  • Collins R, Ribeiro R. Toward an early care and education agenda for Hispanic children. Early Childhood Research and Practice. 2004.
  • Dahan D, Swingley D, Tanenhaus M, Magnuson JS. Linguistic gender and spoken-word recognition in French. Journal of Memory & Language. 2000;42:465–80.
  • Dollaghan C, Campbell TF, Paradise JK, Feldman HM, Janosky JE, Pitcairn DN, Kurs-Lasky M. Maternal education and measures of early speech and language. Journal of Speech, Language, and Hearing Research. 1999;42:1432–43. [PubMed]
  • Eisenberg A. Maternal teaching talk within families of Mexican descent: influences of task and socioeconomic status. Hispanic Journal of Behavioral Sciences. 2002;24:206–24.
  • Fenson L, Marchman VA, Thal D, Dale PS, Reznick JS, Bates E. MacArthur-Bates Communicative Development Inventories: User's Guide and Technical Manual. 2nd ed. Brookes Publishing Co.; Baltimore, MD: 2007.
  • Fernald A. Four-month-old infants prefer to listen to motherese. Infant Behavior & Development. 1985;8:181–95.
  • Fernald A, Hurtado N. Names in frames: infants interpret words in sentence frames faster than words in isolation. Developmental Science. 2006;9:F33–40. [PMC free article] [PubMed]
  • Fernald A, Mazzie C. Prosody and focus in speech to infants and adults. Developmental Psychology. 1991;27:209–21.
  • Fernald A, McRoberts GW, Swingley D. Infants' developing competence in understanding and recognizing words in fluent speech. In: Weissenborn J, Höhle B, editors. Approaches to bootstrapping in early language acquisition. John Benjamins; Amsterdam: 2001. pp. 97–123.
  • Fernald A, Morikawa H. Common themes and cultural variations in Japanese and American mothers' speech to infants. Child Development. 1993;64:637–56. [PubMed]
  • Fernald A, Perfors A, Marchman VA. Picking up speed in understanding: speech processing efficiency and vocabulary growth across the second year. Developmental Psychology. 2006;42:98–116. [PMC free article] [PubMed]
  • Fernald A, Pinto JP, Swingley D, Weinberg A, McRoberts GW. Rapid gains in speed of verbal processing by infants in the 2nd year. Psychological Science. 1998;9:228–31.
  • Fernald A, Swingley D, Pinto JP. When half a word is enough: infants can recognize spoken words using partial phonetic information. Child Development. 2001;72:1003–15. [PubMed]
  • Golinkoff RM, Hirsh-Pasek K, Cauley KM, Gordon L. The eyes have it: lexical and syntactic comprehension in a new paradigm. Journal of Child Language. 1987;14:23–45. [PubMed]
  • Hart B, Risley TR. Meaningful differences in the everyday experience of young American children. Brookes Publishing Co.; Baltimore, MD: 1995.
  • Hoff E. The specificity of environmental influence: socioeconomic status affects early vocabulary development via maternal speech. Child Development. 2003;74:1368–78. [PubMed]
  • Hoff-Ginsberg E. The relation of birth order and socioeconomic status to children's language experience and language development. Applied Psycholinguistics. 1998;19:603–29.
  • Jackson-Maldonado D, Thal DJ, Marchman VA, Newton T, Fenson L, Conboy B. MacArthur-Bates Inventarios del Desarrollo de Habilidades Comunicativas: User's guide and technical manual. Brookes Publishing Co.; Baltimore, MD: 2003.
  • Jusczyk PW. Finding and remembering words: some beginnings by English-learning infants. Current Directions in Psychological Science. 1997;6:170–4.
  • Kuhl PK, Williams K, Lacerda F, Stevens K, Lindblom B. Linguistic experience alters phonetic perception in infants by 6 months of age. Science. 1992;255(5044):606–8. [PubMed]
  • Laosa LM. Maternal teaching strategies in Chicano and Anglo-American families: the influence of culture and education on maternal behavior. Child Development. 1980;51:759–65.
  • Lew-Williams C, Fernald A. Young children learning Spanish make rapid use of grammatical gender in spoken word recognition. Psychological Science. in press. [PMC free article] [PubMed]
  • Newman R, Bernstein Ratner N, Jusczyk AM, Jusczyk PW, Dow KA. Infants' early ability to segment the conversational speech signal predicts later language development: a retrospective analysis. Developmental Psychology. 2006;42:643–55. [PubMed]
  • Noble KG, Norman MF, Farah MJ. Neurocognitive correlates of socioeconomic status in kindergarten children. Developmental Science. 2005;8:74–87. [PubMed]
  • Pan BA, Rowe ML, Singer JD, Snow C. Maternal correlates of growth in toddler vocabulary production in low-income families. Child Development. 2005;76:763–82. [PubMed]
  • Pearson BZ, Fernández SC, Oller DK. Lexical development in bilingual infants and toddlers: comparison to monolingual norms. Language Learning. 1993;43:93–120.
  • Snedeker J, Trueswell J. The developing constraints on parsing decisions: the role of lexical-biases and referential scenes in child and adult sentence processing. Cognitive Psychology. 2004;49:238–99. [PubMed]
  • Swingley D, Aslin RN. Spoken word recognition and lexical representation in very young children. Cognition. 2000;76:147–66. [PubMed]
  • Tardif T, Gelman S, Xu F. Putting the ‘noun-bias’ in context: a comparison of English and Mandarin. Child Development. 1999;70:620–35.
  • Thomas DG, Campos JJ, Shucard DW, Ramsay DS, Shucard J. Semantic comprehension in infancy: a signal detection analysis. Child Development. 1981;52:798–803. [PubMed]
  • Tsao F, Liu HM, Kuhl PK. Speech perception in infancy predicts language development in the second year of life. Child Development. 2004;75:1067–84. [PubMed]
  • US Census Bureau 2000.
  • Walley A. The role of vocabulary development in children's spoken word recognition and segmentation ability. Developmental Review. Special Issue: Phonological processes and learning disability. 1993;13:286–350.
  • Weizman ZO, Snow C. Lexical input as related to children's vocabulary acquisition: effects of sophisticated exposure and support for meaning. Developmental Psychology. 2001;37:265–79. [PubMed]
  • Werker JF. Becoming a native listener: a developmental perspective on human speech perception. American Scientist. 1989;77:54–9.
  • Zangl R, Fernald A. Increasing flexibility in children's online processing of grammatical and nonce determiners in fluent speech. Language Learning and Development. in press. [PMC free article] [PubMed]
  • Zangl R, Klarman L, Thal DJ, Fernald A, Bates E. Dynamics of word comprehension in infancy: development in timing, accuracy, and resistance to acoustic degradation. Journal of Cognition and Development. 2005;6:179–208. [PMC free article] [PubMed]