How do infants find the words in the tangle of speech that confronts them? The present study shows that by as early as 6 months of age, infants can already exploit highly familiar words—including, but not limited to, their own names—to segment and recognize adjoining, previously unfamiliar words from fluent speech. The head-turn preference procedure was used to familiarize babies with short passages in which a novel word was preceded by a familiar or a novel name. At test, babies recognized the word that followed the familiar name, but not the word that followed the novel name. This is the youngest age at which infants have been shown capable of segmenting fluent speech. Young infants have a powerful aid available to them for cracking the speech code. Their emerging familiarity with particular words, such as their own and other people’s names, can provide initial anchors in the speech stream.
Imagine listening to people speak a foreign language. They appear to be talking rapidly, and it is unclear where sentences—let alone words—begin and end. The problem of segmenting fluent speech is a great challenge, given that the speech signal does not typically contain breaks at word edges; worse, when breaks do occur, they often do not coincide with perceived word boundaries (Jones, 1918; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). This article explores how infants use familiar names to help them segment the speech stream into wordlike units.
Adults, who already know many words, may segment speech in a top-down fashion, using stored knowledge of the phonological forms of familiar words to match portions of the speech stream and forecast locations of word boundaries (Cole & Jakimik, 1980; Marslen-Wilson & Welsh, 1978; McClelland & Elman, 1986; Norris, 1994). But infants just learning language lack word knowledge, so research has instead focused on how they might segment speech from the bottom up, locating word boundaries by using an array of cues such as word stress (Jusczyk, Houston, & Newsome, 1999), allophonic variants of speech sounds (Jusczyk, Hohne, & Bauman, 1999), and sequences of sounds or patterns of transitional probabilities (Friederici & Wessels, 1993; Goodsitt, Morgan, & Kuhl, 1993; Jusczyk, Luce, & Charles-Luce, 1994; Mattys & Jusczyk, 2001; Mattys, Jusczyk, Luce, & Morgan, 1999; Saffran, Aslin, & Newport, 1996).
In the latter half of the first year of life, infants make considerable progress in their ability to detect and exploit such cues. By 7.5 months, infants can use the predominant strong-weak pattern of stress in English to segment words (Jusczyk, Houston, & Newsome, 1999). By 8 months, infants can exploit patterns of transitional probabilities to identify words (Saffran et al., 1996) and can also use co-articulation of juxtaposed sounds to locate word boundaries (Johnson & Jusczyk, 2001). By 9 months, infants can exploit knowledge of sound sequences that are permissible in their language and likely to occur within words (Friederici & Wessels, 1993). At 10.5 months, English-learning infants can also segment words that exemplify the less common weak-strong stress pattern (Jusczyk, Houston, & Newsome, 1999). Nevertheless, reliance on bottom-up cues for segmentation is suboptimal, because such cues are often unreliable, ambiguous, or altogether missing (Cole & Jakimik, 1980; Davis, Marslen-Wilson, & Gaskell, 2002). Moreover, segmentation from the bottom up frequently requires looking ahead to ascertain properties of the initial sounds or syllables of following words, slowing segmentation decisions.
Computational modeling of infant word segmentation (Brent, 1999; Dahan & Brent, 1999; Venkataraman, 2001) has underscored the potential superiority of segmentation based on lexical knowledge. Top-down segmentation of corpora is both more accurate and more complete than segmentation using one or more bottom-up cues. However, proposed top-down models have certain weaknesses as well. For example, they predict that performance should rise rapidly to asymptotic levels, which is not consistent with developmental observations. This prediction is due in part to a simplifying assumption incorporated in these models, namely, that words occur in invariant form, so that word identification is trivial. Given this assumption, top-down segmentation should be very broadly based, but empirical efforts to demonstrate top-down segmentation in infants have so far been unsuccessful (e.g., Hollich, Jusczyk, & Brent, 2001).
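The invariant-form assumption can be made concrete with a minimal sketch of purely lexical segmentation. This is not the algorithm of Brent (1999) or of the other cited models; it is a plain greedy longest-match parser, swapped in for illustration, that cuts an unsegmented string wherever a stored form matches exactly. The function name, the one-character-per-phoneme ASCII transcriptions, and the toy lexicon are all hypothetical.

```python
def segment_top_down(utterance: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation against a stored lexicon.

    Assumes (as the simplified models do) that every word occurs in one
    invariant form, so identifying a word reduces to exact string matching.
    """
    max_len = max(map(len, lexicon), default=0)
    words, i = [], 0
    while i < len(utterance):
        # Try the longest stored form that matches at position i.
        for j in range(min(len(utterance), i + max_len), i, -1):
            if utterance[i:j] in lexicon:
                words.append(utterance[i:j])
                i = j
                break
        else:
            # No stored form matches here: emit one unanalyzed sound and move on.
            words.append(utterance[i])
            i += 1
    return words


lexicon = {"mamiz", "baik", "kap"}  # hypothetical stored forms
print(segment_top_down("mamizbaik", lexicon))  # ['mamiz', 'baik']
print(segment_top_down("mamizdag", lexicon))   # ['mamiz', 'd', 'a', 'g']
```

Because matching is exact, any variant pronunciation of a stored word simply fails to match; real input, which is never invariant, would require a far more tolerant identification step.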
In fact, words do not occur in invariant forms (Pollack & Pickett, 1964). For infants, who are uncertain about the types and degrees of variation that signal differences between words, word identification is far from trivial. Thus, rather than being broadly based, top-down segmentation in early infancy may be confined to just those few words that are readily identifiable, yet there are not many of these. From birth, there are certain words that babies encounter repeatedly, such as their own names, as well as appellations for parents, such as Mommy. Indeed, infants begin to recognize the sound patterns of their own names as early as 4.5 months (Mandel, Jusczyk, & Pisoni, 1995), and by 6 months, infants may be able to pick their names out of running speech (Mandel et al., 1995; Mandel-Emer, 1997). Could knowing the sound patterns of their own names help infants segment adjoining words from the stream of speech? Perhaps, like guests at a proverbial cocktail party (Cherry, 1953; Moray, 1959; Wood & Cowan, 1995), infants are riveted by the familiar sound pattern of their own name, allowing them to detect words that begin immediately following that name. If infants as young as 6 months of age can recognize their own names in running speech, perhaps they can also use their names to isolate and segment novel words that follow. If so, this would provide them with an important tool for speech segmentation.
Jusczyk and Aslin (1995) familiarized 6- and 7.5-month-old infants with words and tested their preference for passages containing familiarized versus nonfamiliarized words. The older infants preferred passages with familiarized words, showing that they could segment fluent speech. The younger infants did not. In this study, we investigated whether 6-month-old infants can extract unfamiliar words from fluent speech when those words occur adjacent to the infants’ own names. If they can, infants should recognize words that follow their own name during familiarization and not words that follow another, unfamiliar name.
Participants were twenty-four 6-month-old infants (average age = 191 days, range = 167 to 206 days) from an American English language environment. Four additional infants were tested but were not included in the final sample because of fussiness (n = 3) or sleepiness (n = 1).
We used a head-turn preference procedure (Jusczyk & Aslin, 1995; Kemler-Nelson et al., 1995) to familiarize infants with two passages and test them on four individual words. In one of the familiarization passages, each of the six sentences contained the infant’s own first name followed by the same novel word (the familiar-name target). In the second passage, all of the sentences contained another name, followed by a second novel word (the alternate-name target). Example passages are shown in Table 1. To control for differences in the salience of particular names, in addition to possible acoustic differences in the production of the paired words, we yoked pairs of infants together so that the alternate-name passage for one infant was the familiar-name passage for another, and vice versa. Yoked infants had names with the same number and stress pattern of syllables. Thus, Maggie and Hannah were both familiarized with passages about Maggie’s bike and Hannah’s cup, whereas Sam and John were both familiarized with passages about Sam’s feet and John’s dog; all 4 infants were tested with bike, cup, feet, and dog. The experimental question was whether the infants would subsequently show superior recognition of the familiar-name target, even though they had received equal amounts of familiarization with the two target words.
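The yoking scheme can be written out as a small design table. This sketch follows the Maggie/Hannah and Sam/John example above and assumes, as that example implies, that the two words familiarized to one yoked pair serve as the nonfamiliarized controls for the other pair; the function and type names are hypothetical.

```python
# (infant's name, novel word that follows that name in the passages)
Infant = tuple[str, str]


def yoked_design(pair_a: tuple[Infant, Infant],
                 pair_b: tuple[Infant, Infant]) -> dict[str, dict]:
    """Familiar-name target, alternate-name target, and controls per infant."""
    design = {}
    for pair, other in ((pair_a, pair_b), (pair_b, pair_a)):
        # Words familiarized to the other yoked pair are this pair's controls.
        controls = [word for _, word in other]
        (name1, word1), (name2, word2) = pair
        # Each infant's own-name passage is the partner's alternate-name passage.
        design[name1] = {"familiar": word1, "alternate": word2, "controls": controls}
        design[name2] = {"familiar": word2, "alternate": word1, "controls": controls}
    return design


design = yoked_design((("Maggie", "bike"), ("Hannah", "cup")),
                      (("Sam", "feet"), ("John", "dog")))
print(design["Maggie"])
# {'familiar': 'bike', 'alternate': 'cup', 'controls': ['feet', 'dog']}
```

Every infant in the quartet hears the same four test words, so any acoustic advantage of a particular recording is balanced across the familiar-name and alternate-name conditions.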
Each infant was familiarized with the targets while seated on his or her parent’s lap in a three-sided booth. A colored light was mounted at the infant’s eye level on each wall of the booth. The infant’s gaze was monitored remotely with a video camera mounted behind the center wall. Speech stimuli were played at a conversational level (75 dB). Over active noise-cancellation headphones, parents listened to music that masked the experimental stimuli. The experimenter, using custom software, initiated trials when the infant gazed at the central light. Trial onset extinguished that light, and one of the side lights began to blink. When the experimenter judged that the infant had turned toward the blinking light, the speech stimuli began playing through a loudspeaker on that side. The stimuli continued to play as long as the infant looked to the side, up to a maximum of 30 s. Trials also ended if the infant looked away from the side for 2 s or more. If the infant glanced away but returned to look at the side within 2 s, the trial continued. Cumulative time during which the infant’s gaze was oriented to the side with the blinking light was computed for each trial.
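The trial-ending rules above amount to a simple scoring loop. In this minimal sketch, a trial is represented as a hypothetical list of coded gaze segments (looking at the side or away, with durations) starting at stimulus onset, and the 30-s maximum is assumed to cap elapsed trial time rather than looking time alone; the function name and representation are illustrative.

```python
MAX_TRIAL_S = 30.0     # assumed: cap on elapsed trial time
LOOKAWAY_END_S = 2.0   # a look-away of 2 s or more ends the trial


def score_trial(segments: list[tuple[bool, float]]) -> float:
    """Cumulative looking time for one trial.

    segments: chronological (looking_at_side, duration_s) pairs, beginning
    when the speech stimulus starts playing.
    """
    elapsed = 0.0
    looking_total = 0.0
    for looking, duration in segments:
        remaining = MAX_TRIAL_S - elapsed
        if remaining <= 0:
            break  # trial has reached its maximum duration
        if looking:
            counted = min(duration, remaining)
            looking_total += counted
            elapsed += counted
        elif duration >= LOOKAWAY_END_S:
            break  # sustained look-away ends the trial
        else:
            elapsed += min(duration, remaining)  # brief glance: trial continues
    return looking_total


print(score_trial([(True, 5.0), (False, 1.0), (True, 4.0), (False, 2.5), (True, 10.0)]))  # 9.0
```

The brief 1-s glance does not end the trial, but the 2.5-s look-away does, so the final 10 s of looking is never counted.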
Familiarization passages were recorded by a female talker speaking in a lively, infant-directed manner. During familiarization, side of presentation was selected at random, and each stimulus set was presented on both sides over the course of this phase. The two stimulus sets (passages) were initially presented on alternating trials. Once the infant had reached criterion for one passage, all subsequent trials presented the other passage.
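The passage-scheduling rule can be sketched as follows. The 30-s criterion is borrowed from the one stated for Experiment 2 and is only assumed to apply here; random side selection is omitted, and the function name and per-trial looking times are hypothetical.

```python
CRITERION_S = 30.0  # assumed criterion (the value stated for Experiment 2)


def familiarization_schedule(looking_times: list[float],
                             passages=("familiar-name", "alternate-name")):
    """Assign passages to successive familiarization trials.

    Passages alternate until one reaches criterion; later trials present only
    the passage still below criterion. Stops once both have reached it.
    """
    exposure = {p: 0.0 for p in passages}
    schedule = []
    turn = 0
    for look in looking_times:
        pending = [p for p in passages if exposure[p] < CRITERION_S]
        if not pending:
            break  # both passages at criterion: familiarization is over
        if len(pending) == len(passages):
            passage = passages[turn % len(passages)]  # initial alternation
            turn += 1
        else:
            passage = pending[0]  # only the passage below criterion remains
        exposure[passage] += look
        schedule.append(passage)
    return schedule, exposure


schedule, exposure = familiarization_schedule([20, 5, 20, 5, 5, 5, 5, 5])
print(exposure)  # {'familiar-name': 40.0, 'alternate-name': 30.0}
```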
Familiarization was followed immediately by recognition testing, in which stimulus sets comprised multiple tokens of the familiar-name target, the alternate-name target, and two non-familiarized control words. Recognition tokens were recorded by the same female talker who produced the familiarization stimuli. Test trials followed the same procedure as familiarization trials except that in each test trial, the infant heard repetitions of a target or control word. Stimuli were presented through loudspeakers located on either side of the testing booth; the dependent variable was how long infants looked to the side on which the word was being played. Three blocks of 4 trials each were included (12 trials total). Each block included 1 trial per stimulus set. Ordering of trials was randomized for each block. The experimenter was blind to this ordering.
Analysis of results proceeded in two steps.¹ First, we asked whether infants preferred the word that had been linked with their own name. Indeed, infants listened significantly longer to the familiar-name target than to the alternate-name target, t(23) = 2.15, p < .05, d = 0.42 (see Fig. 1a). Although this preference suggests that the infants recognized the familiar-name target, it does not rule out that they also recognized the alternate-name target. If they did, then hearing a familiar name might offer only a small aid to real-world speech segmentation.
To assess whether the infants recognized both familiarized target words, we compared looking times for each of the target types with looking times for the nonfamiliarized control words. Infants listened significantly longer to the familiar-name target than to the control words, t(23) = 2.4, p < .05, d = 0.28, showing that they had indeed stored some representation of that target word. However, there was no difference in looking times to the alternate-name target versus the control words, t(23) = −0.88, n.s., and thus no evidence that the infants had stored any representation of the alternate-name target word. Six-month-old infants succeeded in segmenting and recognizing a novel word that had been linked with their own name, but not a novel word that had been linked with another name, even though they had heard the two words equally often during familiarization. This is the earliest age at which infants have been shown capable of segmenting words from running speech.
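Both analysis steps rest on within-infant (paired) comparisons. The sketch below computes a paired t statistic and one common effect size, Cohen's d_z (mean difference divided by the SD of the differences). The article does not state which d formula it used, so d_z is an assumption and need not reproduce the reported values; averaging the two control words into one control score per infant is likewise assumed. The data are hypothetical.

```python
import math
import statistics


def control_mean(control1: list[float], control2: list[float]) -> list[float]:
    """Per-infant mean looking time across the two control words (assumed)."""
    return [(a + b) / 2 for a, b in zip(control1, control2)]


def paired_t(x: list[float], y: list[float]) -> tuple[float, int, float]:
    """Paired-samples t, its degrees of freedom, and Cohen's d_z."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d / sd_d


# Hypothetical looking times (s) for three infants.
familiar = [3.0, 5.0, 4.0]
controls = control_mean([1.0, 2.0, 2.0], [1.0, 2.0, 4.0])  # [1.0, 2.0, 3.0]
print(paired_t(familiar, controls))  # t = 2*sqrt(3) ≈ 3.46, df = 2, d_z = 2.0
```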
How general is this phenomenon? In adults, the classic cocktail-party phenomenon is limited to one’s own name, though the contribution of top-down processing to speech segmentation undoubtedly involves a much broader range of lexical items. Infants frequently hear names other than their own, such as Mommy and siblings’ names. If infants can use other frequently occurring words to anchor speech segmentation and recognition, then we have discovered a potent language-learning device that allows infants, like adults, to use their knowledge of words for top-down processing of the speech stream.
Other than an infant’s own name, words that are likely to be highly familiar include appellations for parents, siblings, and family pets, as well as names of objects that figure prominently in infants’ routines, such as bottle, pacifier, and diaper. As an initial step to test the generality of the findings of our first experiment, we next asked whether infants could use the moniker used for their mother to segment previously unfamiliar words from fluent speech.
Participants included twenty 6-month-old infants (average age = 188 days, range = 168 to 198 days). Fifteen additional infants were tested but not included because of fussiness or crying (n = 8), sleepiness (n = 2), sibling interference (n = 1), equipment failure (n = 2), or variability in response (i.e., two or more trials with looking times more than 2 SD from the infant’s mean; n = 2).
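The response-variability exclusion can be expressed as a simple check. The article does not say whether a sample or population SD was used; this sketch assumes the sample SD, and the function name is hypothetical.

```python
import statistics


def too_variable(looking_times: list[float], sd_cutoff: float = 2.0,
                 max_outliers: int = 1) -> bool:
    """True if two or more trials lie more than sd_cutoff SDs from the
    infant's own mean looking time."""
    mean = statistics.mean(looking_times)
    sd = statistics.stdev(looking_times)  # assumed: sample SD
    outliers = sum(abs(t - mean) > sd_cutoff * sd for t in looking_times)
    return outliers > max_outliers


print(too_variable([5.0] * 10 + [20.0, 20.0]))  # True
print(too_variable([5, 6, 7, 8, 9, 10] * 2))    # False
```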
The infants were familiarized with two passages, each containing six sentences (see Table 2). In one passage, each of the sentences contained the name most often used for the infant’s mother (either Mommy or Mama, selected according to parental report) followed by the same novel word (again designated as the familiar-name target). In the other passage, each sentence contained an alternate name (either Lola or Lolly, respectively, to contrast with the mother’s appellation) followed by a second novel word (the alternate-name target). Acoustic analyses revealed no systematic differences between the target words following the names (see Table 3). Order of presentation of the passages was randomized across trials. Familiarization continued until the infant had reached the criterion of 30 s of exposure to each passage.² Recognition stimuli consisted of the two target words and two nonfamiliarized control words produced in isolation. As in Experiment 1, these were counterbalanced across pairs of infants.
Infants again displayed a preference for the word that had been paired with the familiar name. They listened significantly longer to the familiar-name target than to the alternate-name target, t(19) = 2.15, p < .05, d = 0.53 (see Fig. 1b). On average, infants listened significantly longer to the familiar-name target than to the nonfamiliarized control words, t(19) = 2.55, p < .01, d = 0.40. As in Experiment 1, there was no difference in looking times to the alternate-name target versus the nonfamiliarized control words, t(19) = −0.76, n.s. Infants segmented, stored, and recognized the word that had been paired with Mommy (or Mama), but failed to recognize the word that had been paired with Lola (or Lolly), even though they had received equal amounts of familiarization with the two words. The results of Experiment 1 thus generalize beyond the infant’s own name—no mere cocktail-party phenomenon—to encompass at least other highly familiar names. Infants can use such names as anchors for segmenting subsequent novel words from the speech stream.
The results of these two experiments are consistent with infants using stored lexical knowledge of familiar words to segment the speech stream in a top-down fashion. However, infants may become so well versed in the sound patterns of familiar words that they are able to exploit specific bottom-up cues associated with those words with particular alacrity.³ For example, consider the word Mommy’s. Because infants have frequently heard the sequence of sounds that constitutes this word, they may have learned that the transitional probabilities between /m/ and /a/, between /a/ and /m/, between /m/ and /i/, and between /i/ and /z/ are relatively high, whereas the transitional probabilities between /z/ and following sounds are relatively low. Such a sequence of probabilities can indicate that /mamiz/ forms a word (Mommy’s) and that the sounds that follow belong to other words (Harris, 1955; Hayes & Clark, 1970; Saffran et al., 1996). If infants are simply well practiced in using bottom-up cues associated with familiar words to segment the speech stream, then there may be no need to appeal to top-down knowledge in explaining the effects we observed.
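The bottom-up alternative can be illustrated with forward transitional probabilities, TP(x → y) = count(xy) / count(x as the first member of a pair), and a boundary placed at the first transition whose TP falls below a threshold. This is a toy, phoneme-level sketch: the corpus, the one-character-per-phoneme ASCII transcription, the 0.4 threshold, and the function names are hypothetical, and Saffran et al. (1996) computed TPs over syllables rather than phonemes.

```python
from collections import Counter


def transitional_probabilities(corpus: list[str]) -> dict[tuple[str, str], float]:
    """Forward TP(x -> y) = count(xy) / count(x followed by anything)."""
    pair_counts, first_counts = Counter(), Counter()
    for utt in corpus:
        for x, y in zip(utt, utt[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def first_boundary(utt: str, tps: dict, threshold: float):
    """Index of the first transition whose TP is below threshold, or None.
    Transitions never seen in the corpus are treated as TP = 0."""
    for i in range(1, len(utt)):
        if tps.get((utt[i - 1], utt[i]), 0.0) < threshold:
            return i
    return None


# Hypothetical input: "Mommy's" (/mamiz/) followed by assorted words.
corpus = ["mamizbaik", "mamizkap", "mamizfit", "mamizdag"]
tps = transitional_probabilities(corpus)
i = first_boundary("mamizlup", tps, threshold=0.4)
print("mamizlup"[:i], "mamizlup"[i:])  # mamiz lup
```

In this corpus the TPs inside /mamiz/ are 0.5 or higher, while every transition out of /z/ is 0.25, so the first dip falls exactly at the offset of the familiar word.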
To adjudicate between these two explanations of how babies solved our recognition task in Experiments 1 and 2—top down or bottom up—we manipulated the sound pattern of the familiar name. Suppose that infants are using bottom-up cues, such as transitional probabilities, to identify the offsets of familiar words and, hence, the onsets of following words. In this case, mispronunciations should be most disruptive at the ends of familiar words, and should become progressively less disruptive as they occur closer and closer to the beginnings of the words. Tommy’s has the same sounds and transitions as Mommy’s, except for the initial sound. If infants are using bottom-up cues to segment the speech stream, then they should be able to segment and recognize words following Tommy’s nearly as well as they do words following Mommy’s. In contrast, if they are using top-down knowledge, then a change in initial sound may be enough to block recognition of the familiar word, disrupting use of such knowledge. Tommy and Mommy are different words. If infants are using stored knowledge of familiar words, then Tommy should not provide an effective anchor for speech segmentation.
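The contrasting predictions can be checked on the two forms directly. In this sketch (hypothetical ASCII transcriptions and function names), /tamiz/ and /mamiz/ differ only in their first transition, and both end in /z/, so any boundary cue at the offset of the name is the same for both; a top-down match against a stored /mamiz/, here simplified to exact matching, succeeds only for the familiar form.

```python
def transitions(form: str) -> list[tuple[str, str]]:
    """Adjacent sound pairs within a word form."""
    return list(zip(form, form[1:]))


def shared_transitions(form_a: str, form_b: str) -> list[bool]:
    """Position by position: is the transition identical in both forms?"""
    return [ta == tb for ta, tb in zip(transitions(form_a), transitions(form_b))]


def top_down_match(form: str, lexicon: set[str]) -> bool:
    """Top-down identification, simplified to exact matching of stored forms."""
    return form in lexicon


lexicon = {"mamiz"}  # hypothetical stored form of "Mommy's"
print(shared_transitions("mamiz", "tamiz"))  # [False, True, True, True]
print(top_down_match("mamiz", lexicon), top_down_match("tamiz", lexicon))  # True False
```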
Participants included twenty 6-month-old infants (average age = 188 days, range = 169 to 210 days). Nine additional infants were tested but not included in the final sample because of fussiness or crying (n = 6), equipment failure (n = 1), or sibling interference (n = 2).
Infants were familiarized with two passages, each containing six sentences (see Table 2). In one passage, each of the sentences contained the name Tommy followed by the same novel word (the “familiar”-name target). In the other passage, each sentence contained the name Lola followed by a second novel word (the alternate-name target). Acoustic analyses revealed no systematic differences between the target words following the two names (see Table 3). Familiarization and test stimuli were counterbalanced and presented as in the preceding two experiments.⁴
Unlike in Experiments 1 and 2, infants failed to display any preference for one target word over the other. There was no difference in looking times to the word paired with Tommy versus the word paired with Lola, t(19) = −0.54, n.s. (see Fig. 1c). Also unlike in Experiments 1 and 2, there was no indication that infants recognized either of the familiarized targets. There was no difference in looking times to the familiar-name target versus the nonfamiliarized control words, t(19) = −1.35, n.s., nor was there any difference in looking times to the alternate-name target versus the nonfamiliarized control words, t(19) = −0.93, n.s.
To compare Experiments 2 and 3, we computed recognition scores (target-word looking time minus control-word looking time) for each infant and conducted a 2 (target-word type) × 2 (experiment) analysis of variance on these data (see Fig. 2). Underscoring the difference between these two experiments, this analysis yielded a significant interaction, F(1, 38) = 4.54, p < .05, η² = .42. Thus, although its sound pattern overlaps greatly with that of Mommy, Tommy does not provide infants an entrée into the speech stream.
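For a 2 × 2 mixed design like this one (target-word type within infants, experiment between infants), the interaction F(1, n₁ + n₂ − 2) equals the square of a pooled-variance independent-samples t computed on each infant's difference between the two recognition scores. The sketch below uses that equivalence rather than a full ANOVA routine; the function names and data are hypothetical. Note that when both scores subtract the same control value, their difference reduces to familiar-name minus alternate-name looking time.

```python
import math
import statistics


def recognition_scores(target: list[float], control: list[float]) -> list[float]:
    """Per-infant recognition score: target minus control looking time."""
    return [t - c for t, c in zip(target, control)]


def interaction_F(fam1, alt1, fam2, alt2) -> tuple[float, int, int]:
    """Target type x experiment interaction as (F, df1, df2).

    fam*/alt*: recognition scores for the familiar-name and alternate-name
    targets, per infant, in each of the two experiments compared.
    """
    d1 = [f - a for f, a in zip(fam1, alt1)]
    d2 = [f - a for f, a in zip(fam2, alt2)]
    n1, n2 = len(d1), len(d2)
    pooled_var = ((n1 - 1) * statistics.variance(d1)
                  + (n2 - 1) * statistics.variance(d2)) / (n1 + n2 - 2)
    t = (statistics.mean(d1) - statistics.mean(d2)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))
    return t ** 2, 1, n1 + n2 - 2
```

With 20 infants per experiment, the denominator degrees of freedom come out as 20 + 20 − 2 = 38, matching the F(1, 38) reported above.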
Our experiments provide evidence that infants as young as 6 months can use knowledge of familiar words to segment input speech in a top-down fashion, akin to that which has been documented in adult speech processing. This is the youngest age at which infants have been shown to segment fluent speech. Nevertheless, infants’ capacities for processing speech are clearly not identical to those of adults. But whereas previous research has suggested that these differences are qualitative, our results indicate that they are likely to be quantitative. When adults recognize a word, for example, they have access to numerous additional facts, including the word’s meaning, grammatical role, and connotations. All of this information is stored associatively in the adult lexicon. Infants’ lexical knowledge is much less rich. However, our findings show that by 6 months, babies have stored knowledge about the phonological forms of some words that they can match against the input speech stream. Infants may even have associated some rudimentary meanings with particular phonological forms, such as the words mommy and daddy (Tincoff & Jusczyk, 1999). However, it is doubtful that infants have progressed sufficiently—lexically or cognitively—to allow them access to information pertaining to syntactic, morphological, pragmatic, or social characteristics of words. Moreover, infants’ phonological representations are not the same as those of adults. In some instances, infants’ representations may encode variations that are irrelevant to adults (such as the voice of the speaker who uttered the word), whereas in other instances, infants’ representations may elide differences that are crucial for distinguishing words (Best, McRoberts, & Sithole, 1988; Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992; Stager & Werker, 1997; Werker & Tees, 1984). 
Although we remain neutral regarding the nature of the representations that the infants in our experiments may have formed, we suggest that they were robust enough to influence the infants’ segmentation abilities. Determining how precise and how long lasting these representations are will require further experimentation.
These observations speak to inequalities in the contents of infants’ and adults’ lexicons; none of our results demonstrate differences in the processes that are available at different ages. Previous accounts have suggested that infants may lack the resources for top-down segmentation and must instead rely exclusively on bottom-up cues in the speech stream (Cutler, 1996; Jusczyk, 1997). Our findings contradict this view. As we have noted, top-down segmentation requires matching some form of stored representation of phonological forms against the input—precisely what we have shown infants to be capable of by 6 months. There is no empirical basis for drawing qualitative distinctions between infants’ and adults’ segmentation abilities. In this respect, at least, lexical processing systems in infants and adults appear to share the same architecture. Differences between infants and adults reside in how this architecture is deployed. Adults are highly familiar with large numbers of words and can use lexical knowledge for top-down segmentation in the wide array of instances in which bottom-up cues to word boundaries are absent or ambiguous. Here, we have shown that 6-month-olds can use their own names and the appellations for their mothers for top-down segmentation.
How much more broadly might this ability generalize? Because there are likely to be few words that young infants can recognize quickly enough to use for top-down segmentation, they will initially be more dependent on bottom-up cues to word boundaries in the speech stream than are adults. With increasing exposure to the native language, more words will have greater familiarity. At the same time, infants’ phonological representations become more sophisticated and stable. Both of these factors contribute to observed increases in the efficiency of infants’ speech processing (Fernald, Pinto, Swingley, Weinberg, & McRoberts, 1998). With increased efficiency, more word forms may be recruited for use in top-down segmentation. By putting familiar names to use in segmentation, infants can begin the transition to adultlike speech processing.
This research was supported by a grant from the National Institutes of Health (5 R01 HD32005) to James Morgan. We thank Jonathan Ring, Leher Singh, Jennifer Sootsman, Katherine White, and Eric Wruck for help with this research, and James Sawusch and Daniel Swingley for comments on an earlier version of this manuscript.
¹ Amount of familiarization and sentence position of the familiarized words were equivalent for the two passages. In each passage, familiarized words appeared twice at the beginning, twice in the middle, and twice at the end of a sentence. Infants required 3.67 trials for the familiar-name passage and 3.79 trials for the alternate-name passage, t(23) = −0.51, n.s. This amounted to 37.20 s and 37.23 s of exposure, respectively, t(23) = −0.013, n.s.
² Infants required 3.00 familiarization trials for the familiar-name passage and 3.45 trials for the alternate-name passage, t(19) = −1.83, n.s. This amounted to 36.48 s and 37.34 s of exposure, respectively, t(19) = −0.41, n.s.
³ We thank Christophe Pallier for pointing this out.
⁴ Only infants whose parents reported using Mommy were included in this study; infants with family members or pets named Tommy or Lola were excluded. Infants required 2.85 familiarization trials for the familiar-name passage and 3.25 trials for the alternate-name passage, t(19) = −1.45, n.s. This amounted to 36.88 s and 37.81 s of exposure, respectively, t(19) = −0.47, n.s.