Word segmentation, detecting word boundaries in continuous speech, is a critical aspect of language learning. Previous research in infants and adults demonstrated that a stream of speech can be readily segmented based solely on the statistical and speech cues afforded by the input. Using functional magnetic resonance imaging (fMRI), the neural substrate of word segmentation was examined on-line as participants listened to three streams of concatenated syllables, containing either statistical regularities alone, statistical regularities and speech cues, or no cues. Despite the participants’ inability to explicitly detect differences between the speech streams, neural activity differed significantly across conditions, with left-lateralized signal increases in temporal cortices observed only when participants listened to streams containing statistical regularities, particularly the stream containing speech cues. In a second fMRI study, designed to verify that word segmentation had implicitly taken place, participants listened to trisyllabic combinations that occurred with different frequencies in the streams of speech they just heard (“words,” 45 times; “partwords,” 15 times; “nonwords,” once). Reliably greater activity in left inferior and middle frontal gyri was observed when comparing words with partwords and, to a lesser extent, when comparing partwords with nonwords. Activity in these regions, taken to index the implicit detection of word boundaries, was positively correlated with participants’ rapid auditory processing skills. These findings provide a neural signature of on-line word segmentation in the mature brain and an initial model with which to study developmental changes in the neural architecture involved in processing speech cues during language learning.
fMRI; language; speech perception; word segmentation; statistical learning; auditory cortex; inferior frontal gyrus
Very little is known about the neural underpinnings of language learning across the lifespan and how these might be modified by maturational and experiential factors. Building on behavioral research highlighting the importance of early word segmentation (i.e. the detection of word boundaries in continuous speech) for subsequent language learning, here we characterize developmental changes in brain activity as this process occurs online, using data collected in a mixed cross-sectional and longitudinal design. One hundred and fifty-six participants, ranging from age 5 to adulthood, underwent functional magnetic resonance imaging (fMRI) while listening to three novel streams of continuous speech, which contained either strong statistical regularities, strong statistical regularities and speech cues, or weak statistical regularities providing minimal cues to word boundaries. All age groups displayed significant signal increases over time in temporal cortices for the streams with high statistical regularities; however, we observed a significant right-to-left shift in the laterality of these learning-related increases with age. Interestingly, only the 5- to 10-year-old children displayed significant signal increases for the stream with low statistical regularities, suggesting an age-related decrease in sensitivity to more subtle statistical cues. Further, in a sample of 78 10-year-olds, we examined the impact of proficiency in a second language and level of pubertal development on learning-related signal increases, showing that the brain regions involved in language learning are influenced by both experiential and maturational factors.
Statistical learning is a candidate for one of the basic prerequisites underlying the expeditious acquisition of spoken language. Infants from 8 months of age exhibit this form of learning to segment fluent speech into distinct words. To test the statistical learning skills at birth, we recorded event-related brain responses of sleeping neonates while they were listening to a stream of syllables containing statistical cues to word boundaries.
We found evidence that sleeping neonates are able to automatically extract statistical properties of the speech input and thus detect the word boundaries in a continuous stream of syllables containing no morphological cues. Syllable-specific event-related brain responses found in two separate studies demonstrated that the neonatal brain treated the syllables differently according to their position within pseudowords.
These results demonstrate that neonates can efficiently learn transitional probabilities or frequencies of co-occurrence between different syllables, enabling them to detect word boundaries and in this way isolate single words out of fluent natural speech. The ability to adopt statistical structures from speech may play a fundamental role as one of the earliest prerequisites of language acquisition.
Language delay is a hallmark feature of autism spectrum disorders (ASD). The identification of word boundaries in continuous speech is a critical first step in language acquisition that can be accomplished via statistical learning and reliance on speech cues. Importantly, early word segmentation skills have been shown to predict later language development in typically developing (TD) children.
Here we investigated the neural correlates of online word segmentation in children with and without ASD with a well-established behavioral paradigm previously validated for functional magnetic resonance imaging. Eighteen high-functioning boys with ASD and 18 age- and IQ-matched TD boys underwent functional magnetic resonance imaging while listening to two artificial languages (containing statistical or statistical + prosodic cues to word boundaries) and a random speech stream.
Consistent with prior findings, in TD control subjects, activity in fronto-temporal-parietal networks decreased as the number of cues to word boundaries increased. The ASD children, however, did not show this facilitatory effect. Furthermore, statistical contrasts modeling changes in activity over time identified significant learning-related signal increases for both artificial languages in basal ganglia and left temporo-parietal cortex only in TD children. Finally, the level of communicative impairment in ASD children was inversely correlated with signal increases in these same regions during exposure to the artificial languages.
This is the first study to demonstrate significant abnormalities in the neural architecture subserving language-related learning in ASD children and to link the communicative impairments observed in this population to decreased sensitivity to the statistical and speech cues available in the language input.
Autism; implicit learning; language; neuroimaging; speech perception
We examined the influence of bilingual experience and inhibitory control on the ability to learn a novel language. Using a statistical learning paradigm, participants learned words in two novel languages that were based on the International Morse Code. First, participants listened to a continuous stream of words in a Morse code language to test their ability to segment words from continuous speech. Since Morse code does not overlap in form with natural languages, interference from known languages was minimized. Next, participants listened to another Morse code language composed of new words that conflicted with the first Morse code language. Interference in this second language was high due to conflict between languages and due to the presence of two colliding cues (compressed pauses between words and statistical regularities) that competed to define word boundaries. Results suggest that bilingual experience can improve word learning when interference from other languages is low, while inhibitory control ability can improve word learning when interference from other languages is high. We conclude that the ability to extract novel words from continuous speech is a skill that is affected both by linguistic factors, such as bilingual experience, and by cognitive abilities, such as inhibitory control.
language acquisition; statistical learning; bilingualism; inhibitory control; Morse code; Simon task
In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of high transitional probabilities, but also because of a consistently co-occurring object. But additional statistics require additional processing, and thus might not be useful to cognitively constrained learners. We show that the structure of child-directed speech makes simultaneous speech segmentation and word learning tractable for human learners. First, a corpus of child-directed speech was recorded from parents and children engaged in a naturalistic free-play task. Analyses revealed two consistent regularities in the sentence structure of naming events. These regularities were subsequently encoded in an artificial language to which adult participants were exposed in the context of simultaneous statistical speech segmentation and word learning. Either regularity was independently sufficient to support successful learning, but no learning occurred in the absence of both regularities. Thus, the structure of child-directed speech plays an important role in scaffolding speech segmentation and word learning in parallel.
statistical learning; speech segmentation; word learning; child-directed speech; frequent frames
Infants are adept at tracking statistical regularities to identify word boundaries in pause-free speech. However, researchers have questioned the relevance of statistical learning mechanisms to language acquisition, since previous studies have used simplified artificial languages that ignore the variability of real language input. The experiments reported here embraced a key dimension of variability in infant-directed speech. English-learning infants (8–10 months) listened briefly to natural Italian speech that contained either fluent speech only or a combination of fluent speech and single-word utterances. Listening times revealed successful learning of the statistical properties of target words only when words appeared both in fluent speech and in isolation; brief exposure to fluent speech alone was not sufficient to facilitate detection of the words’ statistical properties. This investigation suggests that statistical learning mechanisms actually benefit from variability in utterance length, and provides the first evidence that isolated words and longer utterances act in concert to support infant word segmentation.
A portable real-time speech processor that implements an acoustic simulation model of a cochlear implant (CI) has been developed on the Apple iPhone/iPod Touch to permit testing and experimentation under extended exposure in real-world environments. This simulator allows for both a variable number of noise band channels and electrode insertion depth. Utilizing this portable CI simulator, we tested perceptual learning in normal hearing listeners by measuring word and sentence comprehension behaviorally before and after 2 weeks of exposure. To evaluate changes in neural activation related to adaptation to transformed speech, fMRI was also conducted. Differences in brain activation after training occurred in the inferior frontal gyrus and areas related to language processing. A 15–20% improvement in word and sentence comprehension of cochlear implant simulated speech was also observed. These results demonstrate the effectiveness of a portable CI simulator as a research tool and provide new information about the physiological changes that accompany perceptual learning of degraded auditory input.
cochlear implants; fMRI; portable media players
The left inferior frontal gyrus (LIFG) exhibits increased responsiveness when people listen to words composed of speech sounds that frequently co-occur in the English language (Vaden, Piquado, Hickok, 2011), termed high phonotactic frequency (Vitevitch & Luce, 1998). The current experiment aimed to further characterize the relation of phonotactic frequency to LIFG activity by manipulating word intelligibility in participants of varying age. Thirty six native English speakers, 19–79 years old (mean = 50.5, sd = 21.0) indicated with a button press whether they recognized 120 binaurally presented consonant-vowel-consonant words during a sparse sampling fMRI experiment (TR = 8 sec). Word intelligibility was manipulated by low-pass filtering (cutoff frequencies of 400 Hz, 1000 Hz, 1600 Hz, and 3150 Hz). Group analyses revealed a significant positive correlation between phonotactic frequency and LIFG activity, which was unaffected by age and hearing thresholds. A region of interest analysis revealed that the relation between phonotactic frequency and LIFG activity was significantly strengthened for the most intelligible words (low-pass cutoff at 3150 Hz). These results suggest that the responsiveness of the left inferior frontal cortex to phonotactic frequency reflects the downstream impact of word recognition rather than support of word recognition, at least when there are no speech production demands.
Many studies have shown that listeners can segment words from running speech based on conditional probabilities of syllable transitions, suggesting that this statistical learning could be a foundational component of language learning. However, few studies have shown a direct link between statistical segmentation and word learning. We examined this possible link in adults by following a statistical segmentation exposure phase with an artificial lexicon learning phase. Participants were able to learn all novel object-label pairings, but pairings were learned faster when labels contained high probability (word-like) or non-occurring syllable transitions from the statistical segmentation phase than when they contained low probability (boundary-straddling) syllable transitions. This suggests that, for adults, labels inconsistent with expectations based on statistical learning are harder to learn than consistent or neutral labels. In contrast, infants seem learn consistent labels, but not inconsistent or neutral labels.
statistical learning; word segmentation; word learning; language acquisition
Speakers convey meaning not only through words, but also through gestures. Although children are exposed to co-speech gestures from birth, we do not know how the developing brain comes to connect meaning conveyed in gesture with speech. We used functional magnetic resonance imaging (fMRI) to address this question and scanned 8- to 11-year-old children and adults listening to stories accompanied by hand movements, either meaningful co-speech gestures or meaningless self-adaptors. When listening to stories accompanied by both types of hand movements, both children and adults recruited inferior frontal, inferior parietal, and posterior temporal brain regions known to be involved in processing language not accompanied by hand movements. There were, however, age-related differences in activity in posterior superior temporal sulcus (STSp), inferior frontal gyrus, pars triangularis (IFGTr), and posterior middle temporal gyrus (MTGp) regions previously implicated in processing gesture. Both children and adults showed sensitivity to the meaning of hand movements in IFGTr and MTGp, but in different ways. Finally, we found that hand movement meaning modulates interactions between STSp and other posterior temporal and inferior parietal regions for adults, but not for children. These results shed light on the developing neural substrate for understanding meaning contributed by co-speech gesture.
Biologically salient sounds, including speech, are rarely heard in isolation. Our brains must therefore organize the input arising from multiple sources into separate “streams” and, in the case of speech, map the acoustic components of the target signal onto meaning. These auditory and linguistic processes have traditionally been considered to occur sequentially and are typically studied independently [1, 2]. However, evidence that streaming is modified or reset by attention , and that lexical knowledge can affect reports of speech sound identity [4, 5], suggests that higher-level factors may influence perceptual organization. In two experiments, listeners heard sequences of repeated words or acoustically matched nonwords. After several presentations, they reported that the initial /s/ sound in each syllable formed a separate stream; the percept then fluctuated between the streamed and fused states in a bistable manner. In addition to measuring these verbal transformations, we assessed streaming objectively by requiring listeners to detect occasional targets—syllables containing a gap after the initial /s/. Performance was better when streaming caused the syllables preceding the target to transform from words into nonwords, rather than from nonwords into words. Our results show that auditory stream formation is influenced not only by the acoustic properties of speech sounds, but also by higher-level processes involved in recognizing familiar words.
•Linguistic processing affects perceptual organization•The acoustic elements of words fuse more readily than those of nonwords•Bistable speech sounds share dynamics with other ambiguous perceptual objects
Linguistic stress and sequential statistical cues to word boundaries interact during speech segmentation in infancy. However, little is known about how the different acoustic components of stress constrain statistical learning. The current studies were designed to investigate whether intensity and duration each function independently as cues to initial prominence (trochaic-based hypothesis) or whether, as predicted by the Iambic-Trochaic Law (ITL), intensity and duration have characteristic and separable effects on rhythmic grouping (ITL-based hypothesis) in a statistical learning task. Infants were familiarized with an artificial language (Experiments 1 & 3) or a tone stream (Experiment 2) in which there was an alternation in either intensity or duration. In addition to potential acoustic cues, the familiarization sequences also contained statistical cues to word boundaries. In speech (Experiment 1) and non-speech (Experiment 2) conditions, 9-month-old infants demonstrated discrimination patterns consistent with an ITL-based hypothesis: intensity signaled initial prominence and duration signaled final prominence. The results of Experiment 3, in which 6.5-month-old infants were familiarized with the speech streams from Experiment 1, suggest that there is a developmental change in infants’ willingness to treat increased duration as a cue to word offsets in fluent speech. Infants’ perceptual systems interact with linguistic experience to constrain how infants learn from their auditory environment.
linguistic stress; rhythmic grouping; statistical learning; perceptual biases; speech segmentation; language acquisition
Speech sound disorders (SSD) are the largest group of communication disorders observed in children. One explanation for these disorders is that children with SSD fail to form stable phonological representations when acquiring the speech sound system of their language due to poor phonological memory (PM). The goal of this study was to examine PM in individuals with histories of SSD employing functional MR imaging (fMRI). Participants were 6 right-handed adolescents with a history of early childhood SSD and 7 right-handed matched controls with no history of speech and language disorders. We performed an fMRI study using an overt non-word repetition (NWR). Right lateralized hypoactivation in the inferior frontal gyrus and middle temporal gyrus was observed. The former suggests a deficit in the phonological processing loop supporting PM, while the later may indicate a deficit in speech perception. Both are cognitive processes involved in speech production. Bilateral hyperactivation observed in the pre and supplementary motor cortex, inferior parietal, supramarginal gyrus and cerebellum raised the possibility of compensatory increases in cognitive effort or reliance on the other components of the articulatory rehearsal network and phonologic store. These findings may be interpreted to support the hypothesis that individuals with SSD may have a deficit in PM and to suggest the involvement of compensatory mechanisms to counteract dysfunction of the normal network.
Four experiments examined listeners’ segmentation of ambiguous schwa-initial sequences (e.g., a long vs. along) in casual speech, where acoustic cues can be unclear, possibly increasing reliance on contextual information to resolve the ambiguity. In Experiment 1, acoustic analyses of talkers’ productions showed that the one-word and two-word versions were produced almost identically, regardless of the preceding sentential context (biased or neutral). These tokens were then used in three listening experiments, whose results confirmed the lack of local acoustic cues for disambiguating the interpretation, and the dominance of sentential context in parsing. Findings speak to the H&H theory of speech production (Lindblom, 1990), demonstrate that context alone guides parsing when acoustic cues to word boundaries are absent, and demonstrate how knowledge of how talkers speak can contribute to an understanding of how words are segmented.
Word segmentation; Casual speech; Speech production; Speech perception
To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This demonstration of sensitivity to statistical structure in speech, weighted more heavily than phonological cues to segmentation at an early age, is consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life.
statistical learning; word segmentation; lexical stress; infant language; phonology
Selective attention to speech versus nonspeech signals in complex auditory input could produce top-down modulation of cortical regions previously linked to perception of spoken, and even visual, words. To isolate such top-down attentional effects, we contrasted 2 equally challenging active listening tasks, performed on the same complex auditory stimuli (words overlaid with a series of 3 tones). Instructions required selectively attending to either the speech signals (in service of rhyme judgment) or the melodic signals (tone-triplet matching). Selective attention to speech, relative to attention to melody, was associated with blood oxygenation level–dependent (BOLD) increases during functional magnetic resonance imaging (fMRI) in left inferior frontal gyrus, temporal regions, and the visual word form area (VWFA). Further investigation of the activity in visual regions revealed overall deactivation relative to baseline rest for both attention conditions. Topographic analysis demonstrated that while attending to melody drove deactivation equivalently across all fusiform regions of interest examined, attending to speech produced a regionally specific modulation: deactivation of all fusiform regions, except the VWFA. Results indicate that selective attention to speech can topographically tune extrastriate cortex, leading to increased activity in VWFA relative to surrounding regions, in line with the well-established connectivity between areas related to spoken and visual word perception in skilled readers.
complex sounds; fusiform gyrus; pure-tone judgment; rhyming; speech perception
The present work examined the discovery of linguistic cues during a word segmentation task. Whereas previous studies have focused on sensitivity to individual cues, this study addresses how individual cues may be used to discover additional, correlated cues. Twenty-four 9-month-old infants were familiarized with a speech stream, in which syllable-level transitional probabilities and an overlapping novel cue served as cues to word boundaries. Infants’ behavior at test indicated they were able to discover the novel cue. Additional experiments showed that infants did not have a preexisting preference for specific test items, and that transitional probability information was necessary to acquire the novel cue. Results suggest one way learners can discover relevant linguistic structure amidst the multiple overlapping properties of natural language.
Two primary areas of damage have been implicated in apraxia of speech (AOS) based on the time post-stroke: (1) the left inferior frontal gyrus (IFG) in acute patients, and (2) the left anterior insula (aIns) in chronic patients. While AOS is widely characterized as a disorder in motor speech planning, little is known about the specific contributions of each of these regions in speech. The purpose of this study was to investigate cortical activation during speech production with a specific focus on the aIns and the IFG in normal adults. While undergoing sparse fMRI, 30 normal adults completed a 30-minute speech-repetition task consisting of three-syllable nonwords that contained either (a) English (native) syllables or (b) Non-English (novel) syllables. When the novel syllable productions were compared to the native syllable productions, greater neural activation was observed in the aIns and IFG, particularly during the first 10 minutes of the task when novelty was the greatest. Although activation in the aIns remained high throughout the task for novel productions, greater activation was clearly demonstrated when the initial 10 minutes were compared to the final 10 minutes of the task. These results suggest increased activity within an extensive neural network, including the aIns and IFG, when the motor speech system is taxed, such as during the production of novel speech. We speculate that the amount of left aIns recruitment during speech production may be related to the internal construction of the motor speech unit such that the degree of novelty/automaticity would result in more or less demands respectively. The role of the IFG as a storehouse and integrative processor for previously acquired routines is also discussed.
The fluency and reliability of speech production suggests a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.
Individuals suffering from vision loss of a peripheral origin may learn to understand spoken language at a rate of up to about 22 syllables (syl) per second - exceeding by far the maximum performance level of normal-sighted listeners (ca. 8 syl/s). To further elucidate the brain mechanisms underlying this extraordinary skill, functional magnetic resonance imaging (fMRI) was performed in blind subjects of varying ultra-fast speech comprehension capabilities and sighted individuals while listening to sentence utterances of a moderately fast (8 syl/s) or ultra-fast (16 syl/s) syllabic rate.
Besides left inferior frontal gyrus (IFG), bilateral posterior superior temporal sulcus (pSTS) and left supplementary motor area (SMA), blind people highly proficient in ultra-fast speech perception showed significant hemodynamic activation of right-hemispheric primary visual cortex (V1), contralateral fusiform gyrus (FG), and bilateral pulvinar (Pv).
Presumably, FG supports the left-hemispheric perisylvian “language network”, i.e., IFG and superior temporal lobe, during the (segmental) sequencing of verbal utterances whereas the collaboration of bilateral pulvinar, right auditory cortex, and ipsilateral V1 implements a signal-driven timing mechanism related to syllabic (suprasegmental) modulation of the speech signal. These data structures, conveyed via left SMA to the perisylvian “language zones”, might facilitate – under time-critical conditions – the consolidation of linguistic information at the level of verbal working memory.
Speech perception; Compressed speech; Late- and early-blind subjects; Cross-modal plasticity; Timing
How do listeners manage to recognize words in an unfamiliar language? The physical continuity of the signal, in which real silent pauses between words are lacking, makes it a difficult task. However, there are multiple cues that can be exploited to localize word boundaries and to segment the acoustic signal. In the present study, word-stress was manipulated with statistical information and placed in different syllables within trisyllabic nonsense words to explore the result of the combination of the cues in an online word segmentation task.
The behavioral results showed that words were segmented better when stress was placed on the final syllables than when it was placed on the middle or first syllable. The electrophysiological results showed an increase in the amplitude of the P2 component, which seemed to be sensitive to word-stress and its location within words.
The results demonstrated that listeners can integrate specific prosodic and distributional cues when segmenting speech. An ERP component related to word-stress cues was identified: stressed syllables elicited larger amplitudes in the P2 component than unstressed ones.
Event-related potential (ERP) evidence indicates that listeners selectively attend to word onsets in continuous speech, but the reason for this preferential processing is unknown. The current study measured ERPs elicited by syllable onsets in an artificial language to test the hypothesis that listeners direct attention to word onsets because their identity is unpredictable. Both before and after recognition training, participants listened to a continuous stream of six nonsense words arranged in pairs, such that the second word in each pair was completely predictable. After training, first words in pairs elicited a larger negativity beginning around 100 ms after onset. This effect was not evident for the completely predictable second words in pairs. These results suggest that listeners are most likely to attend to the segments in speech that they are least able to predict.
speech perception; predictability; selective attention; auditory; ERP; N1
Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs. hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs. read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylized landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favorable/unfavorable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behavior. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically based cues in the segmentation of natural conversational speech.
speech segmentation; semantics; phonotactics; conversational speech; cross-modal priming
Infants have been described as ‘statistical learners’ capable of extracting structure (such as words) from patterned input (such as language). Here, we investigated whether prior knowledge influences how infants track transitional probabilities in word segmentation tasks. Are infants biased by prior experience when engaging in sequential statistical learning? In a laboratory simulation of learning across time, we exposed 9- and 10-month-old infants to a list of either bisyllabic or trisyllabic nonsense words, followed by a pause-free speech stream composed of a different set of bisyllabic or trisyllabic nonsense words. Listening times revealed successful segmentation of words from fluent speech only when words were uniformly bisyllabic or trisyllabic throughout both phases of the experiment. Hearing trisyllabic words during the pre-exposure phase derailed infants’ abilities to segment speech into bisyllabic words, and vice versa. We conclude that prior knowledge about word length equips infants with perceptual expectations that facilitate efficient processing of subsequent language input.
statistical learning; infant language learning; word segmentation; transfer; prior experience