Related Articles
Background
Statistical learning is a candidate for one of the basic prerequisites underlying the expeditious acquisition of spoken language. Infants from 8 months of age exhibit this form of learning to segment fluent speech into distinct words. To test the statistical learning skills at birth, we recorded event-related brain responses of sleeping neonates while they were listening to a stream of syllables containing statistical cues to word boundaries.
Results
We found evidence that sleeping neonates are able to automatically extract statistical properties of the speech input and thus detect the word boundaries in a continuous stream of syllables containing no morphological cues. Syllable-specific event-related brain responses found in two separate studies demonstrated that the neonatal brain treated the syllables differently according to their position within pseudowords.
Conclusion
These results demonstrate that neonates can efficiently learn transitional probabilities or frequencies of co-occurrence between different syllables, enabling them to detect word boundaries and in this way isolate single words out of fluent natural speech. The ability to adopt statistical structures from speech may play a fundamental role as one of the earliest prerequisites of language acquisition.
doi:10.1186/1471-2202-10-21
PMCID: PMC2670827
PMID: 19284661
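The transitional probabilities mentioned in the abstract above can be illustrated with a short sketch. It is not the study's stimulus or analysis code: the three pseudowords, their ordering, and the 0.75 threshold are hypothetical choices made only for illustration. The sketch estimates P(next syllable | current syllable) from a continuous syllable stream and places a word boundary wherever that probability drops below the threshold.

```python
from collections import Counter

# Hypothetical trisyllabic pseudowords (illustrative only, not the study's stimuli).
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
# Hypothetical fixed word order; no word ever follows itself.
ORDER = [0, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2]

# Continuous stream of syllables with no pauses between words.
stream = [syllable for i in ORDER for syllable in WORDS[i]]


def transitional_probabilities(syllables):
    """Estimate P(next | current) for every adjacent syllable pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The last syllable has no successor, so it is excluded from the denominator.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, threshold):
    """Split the stream wherever the transitional probability falls below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            # Low predictability: treat this transition as a word boundary.
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


if __name__ == "__main__":
    print(segment(stream, threshold=0.75))
```

In this toy stream every within-word transition has probability 1.0, while transitions across word boundaries are less predictable, so a single threshold recovers the pseudowords. Natural speech is far noisier, and a fixed threshold is only a simplifying assumption.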
Background
Language delay is a hallmark feature of autism spectrum disorders (ASD). The identification of word boundaries in continuous speech is a critical first step in language acquisition that can be accomplished via statistical learning and reliance on speech cues. Importantly, early word segmentation skills have been shown to predict later language development in typically developing (TD) children.
Methods
Here we investigated the neural correlates of online word segmentation in children with and without ASD with a well-established behavioral paradigm previously validated for functional magnetic resonance imaging. Eighteen high-functioning boys with ASD and 18 age- and IQ-matched TD boys underwent functional magnetic resonance imaging while listening to two artificial languages (containing statistical or statistical + prosodic cues to word boundaries) and a random speech stream.
Results
Consistent with prior findings, in TD control subjects, activity in fronto-temporal-parietal networks decreased as the number of cues to word boundaries increased. The ASD children, however, did not show this facilitatory effect. Furthermore, statistical contrasts modeling changes in activity over time identified significant learning-related signal increases for both artificial languages in basal ganglia and left temporo-parietal cortex only in TD children. Finally, the level of communicative impairment in ASD children was inversely correlated with signal increases in these same regions during exposure to the artificial languages.
Conclusions
This is the first study to demonstrate significant abnormalities in the neural architecture subserving language-related learning in ASD children and to link the communicative impairments observed in this population to decreased sensitivity to the statistical and speech cues available in the language input.
doi:10.1016/j.biopsych.2010.01.011
PMCID: PMC3229830
PMID: 20303070
Autism; implicit learning; language; neuroimaging; speech perception
We examined the influence of bilingual experience and inhibitory control on the ability to learn a novel language. Using a statistical learning paradigm, participants learned words in two novel languages that were based on the International Morse Code. First, participants listened to a continuous stream of words in a Morse code language to test their ability to segment words from continuous speech. Since Morse code does not overlap in form with natural languages, interference from known languages was minimized. Next, participants listened to another Morse code language composed of new words that conflicted with the first Morse code language. Interference in this second language was high due to conflict between languages and due to the presence of two colliding cues (compressed pauses between words and statistical regularities) that competed to define word boundaries. Results suggest that bilingual experience can improve word learning when interference from other languages is low, while inhibitory control ability can improve word learning when interference from other languages is high. We conclude that the ability to extract novel words from continuous speech is a skill that is affected both by linguistic factors, such as bilingual experience, and by cognitive abilities, such as inhibitory control.
doi:10.3389/fpsyg.2011.00324
PMCID: PMC3223905
PMID: 22131981
language acquisition; statistical learning; bilingualism; inhibitory control; Morse code; Simon task
Infants are adept at tracking statistical regularities to identify word boundaries in pause-free speech. However, researchers have questioned the relevance of statistical learning mechanisms to language acquisition, since previous studies have used simplified artificial languages that ignore the variability of real language input. The experiments reported here embraced a key dimension of variability in infant-directed speech. English-learning infants (8–10 months) listened briefly to natural Italian speech that contained either fluent speech only or a combination of fluent speech and single-word utterances. Listening times revealed successful learning of the statistical properties of target words only when words appeared both in fluent speech and in isolation; brief exposure to fluent speech alone was not sufficient to facilitate detection of the words’ statistical properties. This investigation suggests that statistical learning mechanisms actually benefit from variability in utterance length, and provides the first evidence that isolated words and longer utterances act in concert to support infant word segmentation.
doi:10.1111/j.1467-7687.2011.01079.x
PMCID: PMC3280507
PMID: 22010892
Many studies have shown that listeners can segment words from running speech based on conditional probabilities of syllable transitions, suggesting that this statistical learning could be a foundational component of language learning. However, few studies have shown a direct link between statistical segmentation and word learning. We examined this possible link in adults by following a statistical segmentation exposure phase with an artificial lexicon learning phase. Participants were able to learn all novel object-label pairings, but pairings were learned faster when labels contained high probability (word-like) or non-occurring syllable transitions from the statistical segmentation phase than when they contained low probability (boundary-straddling) syllable transitions. This suggests that, for adults, labels inconsistent with expectations based on statistical learning are harder to learn than consistent or neutral labels. In contrast, infants seem to learn consistent labels, but not inconsistent or neutral labels.
doi:10.1016/j.cognition.2008.02.003
PMCID: PMC2486406
PMID: 18355803
statistical learning; word segmentation; word learning; language acquisition
In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of high transitional probabilities, but also because of a consistently co-occurring object. But additional statistics require additional processing, and thus might not be useful to cognitively constrained learners. We show that the structure of child-directed speech makes simultaneous speech segmentation and word learning tractable for human learners. First, a corpus of child-directed speech was recorded from parents and children engaged in a naturalistic free-play task. Analyses revealed two consistent regularities in the sentence structure of naming events. These regularities were subsequently encoded in an artificial language to which adult participants were exposed in the context of simultaneous statistical speech segmentation and word learning. Either regularity was independently sufficient to support successful learning, but no learning occurred in the absence of both regularities. Thus, the structure of child-directed speech plays an important role in scaffolding speech segmentation and word learning in parallel.
doi:10.3389/fpsyg.2012.00374
PMCID: PMC3498894
PMID: 23162487
statistical learning; speech segmentation; word learning; child-directed speech; frequent frames
Speakers convey meaning not only through words, but also through gestures. Although children are exposed to co-speech gestures from birth, we do not know how the developing brain comes to connect meaning conveyed in gesture with speech. We used functional magnetic resonance imaging (fMRI) to address this question and scanned 8- to 11-year-old children and adults listening to stories accompanied by hand movements, either meaningful co-speech gestures or meaningless self-adaptors. When listening to stories accompanied by both types of hand movements, both children and adults recruited inferior frontal, inferior parietal, and posterior temporal brain regions known to be involved in processing language not accompanied by hand movements. There were, however, age-related differences in activity in posterior superior temporal sulcus (STSp), inferior frontal gyrus, pars triangularis (IFGTr), and posterior middle temporal gyrus (MTGp) regions previously implicated in processing gesture. Both children and adults showed sensitivity to the meaning of hand movements in IFGTr and MTGp, but in different ways. Finally, we found that hand movement meaning modulates interactions between STSp and other posterior temporal and inferior parietal regions for adults, but not for children. These results shed light on the developing neural substrate for understanding meaning contributed by co-speech gesture.
doi:10.1111/j.1467-7687.2011.01100.x
PMCID: PMC3515080
PMID: 22356173
Speech sound disorders (SSD) are the largest group of communication disorders observed in children. One explanation for these disorders is that children with SSD fail to form stable phonological representations when acquiring the speech sound system of their language due to poor phonological memory (PM). The goal of this study was to examine PM in individuals with histories of SSD employing functional MR imaging (fMRI). Participants were 6 right-handed adolescents with a history of early childhood SSD and 7 right-handed matched controls with no history of speech and language disorders. We performed an fMRI study using an overt non-word repetition (NWR) task. Right lateralized hypoactivation in the inferior frontal gyrus and middle temporal gyrus was observed. The former suggests a deficit in the phonological processing loop supporting PM, while the latter may indicate a deficit in speech perception. Both are cognitive processes involved in speech production. Bilateral hyperactivation observed in the pre- and supplementary motor cortex, inferior parietal, supramarginal gyrus and cerebellum raised the possibility of compensatory increases in cognitive effort or reliance on the other components of the articulatory rehearsal network and phonologic store. These findings may be interpreted to support the hypothesis that individuals with SSD may have a deficit in PM and to suggest the involvement of compensatory mechanisms to counteract dysfunction of the normal network.
doi:10.1016/j.bandl.2011.02.002
PMCID: PMC3162995
PMID: 21458852
To efficiently segment fluent speech, infants must discover the predominant phonological form of words in the native language. In English, for example, content words typically begin with a stressed syllable. To discover this regularity, infants need to identify a set of words. We propose that statistical learning plays two roles in this process. First, it provides a cue that allows infants to segment words from fluent speech, even without language-specific phonological knowledge. Second, once infants have identified a set of lexical forms, they can learn from the distribution of acoustic features across those word forms. The current experiments demonstrate both processes are available to 5-month-old infants. This demonstration of sensitivity to statistical structure in speech, weighted more heavily than phonological cues to segmentation at an early age, is consistent with theoretical accounts that claim statistical learning plays a role in helping infants to adapt to the structure of their native language from very early in life.
doi:10.3389/fpsyg.2012.00590
PMCID: PMC3547220
PMID: 23335903
statistical learning; word segmentation; lexical stress; infant language; phonology
Selective attention to speech versus nonspeech signals in complex auditory input could produce top-down modulation of cortical regions previously linked to perception of spoken, and even visual, words. To isolate such top-down attentional effects, we contrasted 2 equally challenging active listening tasks, performed on the same complex auditory stimuli (words overlaid with a series of 3 tones). Instructions required selectively attending to either the speech signals (in service of rhyme judgment) or the melodic signals (tone-triplet matching). Selective attention to speech, relative to attention to melody, was associated with blood oxygenation level–dependent (BOLD) increases during functional magnetic resonance imaging (fMRI) in left inferior frontal gyrus, temporal regions, and the visual word form area (VWFA). Further investigation of the activity in visual regions revealed overall deactivation relative to baseline rest for both attention conditions. Topographic analysis demonstrated that while attending to melody drove deactivation equivalently across all fusiform regions of interest examined, attending to speech produced a regionally specific modulation: deactivation of all fusiform regions, except the VWFA. Results indicate that selective attention to speech can topographically tune extrastriate cortex, leading to increased activity in VWFA relative to surrounding regions, in line with the well-established connectivity between areas related to spoken and visual word perception in skilled readers.
doi:10.1093/cercor/bhp129
PMCID: PMC2820701
PMID: 19571269
complex sounds; fusiform gyrus; pure-tone judgment; rhyming; speech perception
The left inferior frontal gyrus (LIFG) exhibits increased responsiveness when people listen to words composed of speech sounds that frequently co-occur in the English language (Vaden, Piquado, & Hickok, 2011), termed high phonotactic frequency (Vitevitch & Luce, 1998). The current experiment aimed to further characterize the relation of phonotactic frequency to LIFG activity by manipulating word intelligibility in participants of varying age. Thirty-six native English speakers, 19–79 years old (mean = 50.5, sd = 21.0), indicated with a button press whether they recognized 120 binaurally presented consonant-vowel-consonant words during a sparse sampling fMRI experiment (TR = 8 sec). Word intelligibility was manipulated by low-pass filtering (cutoff frequencies of 400 Hz, 1000 Hz, 1600 Hz, and 3150 Hz). Group analyses revealed a significant positive correlation between phonotactic frequency and LIFG activity, which was unaffected by age and hearing thresholds. A region of interest analysis revealed that the relation between phonotactic frequency and LIFG activity was significantly strengthened for the most intelligible words (low-pass cutoff at 3150 Hz). These results suggest that the responsiveness of the left inferior frontal cortex to phonotactic frequency reflects the downstream impact of word recognition rather than support of word recognition, at least when there are no speech production demands.
doi:10.1016/j.neuropsychologia.2011.09.008
PMCID: PMC3207245
PMID: 21925521
The present work examined the discovery of linguistic cues during a word segmentation task. Whereas previous studies have focused on sensitivity to individual cues, this study addresses how individual cues may be used to discover additional, correlated cues. Twenty-four 9-month-old infants were familiarized with a speech stream, in which syllable-level transitional probabilities and an overlapping novel cue served as cues to word boundaries. Infants’ behavior at test indicated they were able to discover the novel cue. Additional experiments showed that infants did not have a preexisting preference for specific test items, and that transitional probability information was necessary to acquire the novel cue. Results suggest one way learners can discover relevant linguistic structure amidst the multiple overlapping properties of natural language.
doi:10.1111/j.1467-8624.2010.01430.x
PMCID: PMC2892808
PMID: 20573101
Two primary areas of damage have been implicated in apraxia of speech (AOS) based on the time post-stroke: (1) the left inferior frontal gyrus (IFG) in acute patients, and (2) the left anterior insula (aIns) in chronic patients. While AOS is widely characterized as a disorder in motor speech planning, little is known about the specific contributions of each of these regions in speech. The purpose of this study was to investigate cortical activation during speech production with a specific focus on the aIns and the IFG in normal adults. While undergoing sparse fMRI, 30 normal adults completed a 30-minute speech-repetition task consisting of three-syllable nonwords that contained either (a) English (native) syllables or (b) non-English (novel) syllables. When the novel syllable productions were compared to the native syllable productions, greater neural activation was observed in the aIns and IFG, particularly during the first 10 minutes of the task when novelty was the greatest. Although activation in the aIns remained high throughout the task for novel productions, greater activation was clearly demonstrated when the initial 10 minutes were compared to the final 10 minutes of the task. These results suggest increased activity within an extensive neural network, including the aIns and IFG, when the motor speech system is taxed, such as during the production of novel speech. We speculate that the amount of left aIns recruitment during speech production may be related to the internal construction of the motor speech unit such that the degree of novelty/automaticity would result in greater or lesser demands, respectively. The role of the IFG as a storehouse and integrative processor for previously acquired routines is also discussed.
PMCID: PMC2953867
PMID: 19385020
The fluency and reliability of speech production suggests a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.
doi:10.1162/jocn.2009.21324
PMCID: PMC2862116
PMID: 19642886
Event-related potential (ERP) evidence indicates that listeners selectively attend to word onsets in continuous speech, but the reason for this preferential processing is unknown. The current study measured ERPs elicited by syllable onsets in an artificial language to test the hypothesis that listeners direct attention to word onsets because their identity is unpredictable. Both before and after recognition training, participants listened to a continuous stream of six nonsense words arranged in pairs, such that the second word in each pair was completely predictable. After training, first words in pairs elicited a larger negativity beginning around 100 ms after onset. This effect was not evident for the completely predictable second words in pairs. These results suggest that listeners are most likely to attend to the segments in speech that they are least able to predict.
doi:10.1016/j.neuropsychologia.2011.08.014
PMCID: PMC3192267
PMID: 21875609
speech perception; predictability; selective attention; auditory; ERP; N1
Background
How do listeners manage to recognize words in an unfamiliar language? The physical continuity of the signal, in which real silent pauses between words are lacking, makes it a difficult task. However, there are multiple cues that can be exploited to localize word boundaries and to segment the acoustic signal. In the present study, word-stress was manipulated with statistical information and placed in different syllables within trisyllabic nonsense words to explore the result of the combination of the cues in an online word segmentation task.
Results
The behavioral results showed that words were segmented better when stress was placed on the final syllables than when it was placed on the middle or first syllable. The electrophysiological results showed an increase in the amplitude of the P2 component, which seemed to be sensitive to word-stress and its location within words.
Conclusion
The results demonstrated that listeners can integrate specific prosodic and distributional cues when segmenting speech. An ERP component related to word-stress cues was identified: stressed syllables elicited larger amplitudes in the P2 component than unstressed ones.
doi:10.1186/1471-2202-9-23
PMCID: PMC2263048
PMID: 18282274
Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs. hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs. read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylized landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favorable/unfavorable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behavior. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically based cues in the segmentation of natural conversational speech.
doi:10.3389/fpsyg.2012.00375
PMCID: PMC3464055
PMID: 23060839
speech segmentation; semantics; phonotactics; conversational speech; cross-modal priming
Infants have been described as ‘statistical learners’ capable of extracting structure (such as words) from patterned input (such as language). Here, we investigated whether prior knowledge influences how infants track transitional probabilities in word segmentation tasks. Are infants biased by prior experience when engaging in sequential statistical learning? In a laboratory simulation of learning across time, we exposed 9- and 10-month-old infants to a list of either bisyllabic or trisyllabic nonsense words, followed by a pause-free speech stream composed of a different set of bisyllabic or trisyllabic nonsense words. Listening times revealed successful segmentation of words from fluent speech only when words were uniformly bisyllabic or trisyllabic throughout both phases of the experiment. Hearing trisyllabic words during the pre-exposure phase derailed infants’ abilities to segment speech into bisyllabic words, and vice versa. We conclude that prior knowledge about word length equips infants with perceptual expectations that facilitate efficient processing of subsequent language input.
doi:10.1016/j.cognition.2011.10.007
PMCID: PMC3246061
PMID: 22088408
statistical learning; infant language learning; word segmentation; transfer; prior experience
Everyday communication is accompanied by visual information from several sources, including co-speech gestures, which provide semantic information listeners use to help disambiguate the speaker’s message. Using fMRI, we examined how gestures influence neural activity in brain regions associated with processing semantic information. The BOLD response was recorded while participants listened to stories under three audiovisual conditions and one auditory-only (speech alone) condition. In the first audiovisual condition, the storyteller produced gestures that naturally accompany speech. In the second, she made semantically unrelated hand movements. In the third, she kept her hands still. In addition to inferior parietal and posterior superior and middle temporal regions, bilateral posterior superior temporal sulcus and left anterior inferior frontal gyrus responded more strongly to speech when it was further accompanied by gesture, regardless of the semantic relation to speech. However, the right inferior frontal gyrus was sensitive to the semantic import of the hand movements, demonstrating more activity when hand movements were semantically unrelated to the accompanying speech. These findings show that perceiving hand movements during speech modulates the distributed pattern of neural activation involved in both biological motion perception and discourse comprehension, suggesting listeners attempt to find meaning, not only in the words speakers produce, but also in the hand movements that accompany speech.
doi:10.1002/hbm.20774
PMCID: PMC2896896
PMID: 19384890
discourse comprehension; fMRI; gestures; semantic processing; inferior frontal gyrus
A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.
doi:10.3389/fpsyg.2012.00320
PMCID: PMC3434440
PMID: 22973251
intelligibility; language; oscillations; phase locking; speech comprehension; speech rate; theta
Learners can segment potential lexical units from syllable streams when statistically variable transitional probabilities between adjacent syllables are the only cues to word boundaries. Here we examine the nature of the representations that result from statistical learning by assessing learners’ ability to generalize across acoustically different stimuli. In three experiments, we compare two possibilities: that the products of statistical segmentation processes are abstract and generalizable representations, or, alternatively, that products of statistical learning are stimulus-bound and restricted to perceptually similar instances. In Experiment 1, learners segmented units from statistically predictable streams, and recognized these units when they were acoustically transformed by temporal reversals. In Experiment 2, learners were able to segment units from temporally reversed syllable streams, but were only able to generalize in conditions of mild acoustic transformation. In Experiment 3, learners were able to recognize statistically segmented units after a voice change but were unable to do so when the novel voice was mildly distorted. Together these results suggest that representations that result from statistical learning can be abstracted to some degree, but not in all listening conditions.
doi:10.3389/fpsyg.2012.00070
PMCID: PMC3311134
PMID: 22470357
speech perception; representation; generalization; segmentation; acoustics; statistical learning
Summary
This study investigated the neural plasticity associated with perceptual learning of a cochlear implant (CI) simulation. Normal-hearing listeners were trained with vocoded and spectrally-shifted speech simulating a CI while cortical responses were measured with fMRI. A condition in which the vocoded speech was spectrally inverted provided a control for learnability and adaptation. Behavioral measures showed considerable individual variability both in the ability to learn to understand the degraded speech, and in phonological working memory capacity. Neurally, left-lateralized regions in superior temporal sulcus and inferior frontal gyrus (IFG) were sensitive to the learnability of the simulations, but only the activity in prefrontal cortex correlated with inter-individual variation in intelligibility scores and phonological working memory. A region in left angular gyrus (AG) showed an activation pattern that reflected learning over the course of the experiment, and co-variation of activity in AG and IFG was modulated by the learnability of the stimuli. These results suggest that variation in listeners' ability to adjust to vocoded and spectrally-shifted speech is partly reflected in differences in the recruitment of higher-level language processes in prefrontal cortex, and that this variability may further depend on functional links between the left inferior frontal gyrus and angular gyrus. Differences in the engagement of left inferior prefrontal cortex, and its co-variation with posterior parietal areas, may thus underlie some of the variation in speech perception skills that has been observed in clinical populations of CI users.
doi:10.1523/JNEUROSCI.4040-09.2010
PMCID: PMC2883443
PMID: 20505085
Speech perception; cochlear implants; perceptual learning; individual differences; fMRI; Cochlea
The brain uses context and prior knowledge to repair degraded sensory inputs and improve perception. For example, listeners hear speech continuing uninterrupted through brief noises, even if the speech signal is artificially removed from the noisy epochs. In a functional MRI study, we show that this temporal filling-in process is based on two dissociable neural mechanisms: the subjective experience of illusory continuity, and the sensory repair mechanisms that support it. Areas mediating illusory continuity include the left posterior angular gyrus (AG) and superior temporal sulcus (STS) and the right STS. Unconscious sensory repair occurs in Broca’s area, bilateral anterior insula, and pre-supplementary motor area. The left AG/STS and all the repair regions show evidence for word-level template matching and communicate more when fewer acoustic cues are available. These results support a two-path process where the brain creates coherent perceptual objects by applying prior knowledge and filling-in corrupted sensory information.
doi:10.1016/j.neuroimage.2008.09.045
PMCID: PMC2653101
PMID: 18977448
Auditory induction; Continuity illusion; fMRI; Perceptual filling-in; Phonemic restoration; Speech
The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal-tract gestures in humans shares the same functional neuroanatomy as producing nonsense speech syllables. Production of non-speech vocal-tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole-brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings suggest a more general role for the previously proposed “auditory dorsal stream” in the left hemisphere: to support the production of vocal-tract gestures that are not limited to speech processing.
doi:10.1016/j.neuroimage.2009.03.032
PMCID: PMC2711766
PMID: 19327400
sensory-motor interaction; auditory dorsal stream; functional magnetic resonance imaging (fMRI)
Speech segmentation, determining where one word ends and the next begins in continuous speech, is necessary for auditory language processing. However, because there are few direct indices of this fast, automatic process, it has been difficult to study. We recorded event-related brain potentials (ERPs) while adult humans listened to six pronounceable nonwords presented as continuous speech and compared the responses to nonword onsets before and after participants learned the nonsense words. In subjects showing the greatest behavioral evidence of word learning, word onsets elicited a larger N100 after than before training. Thus N100 amplitude indexes speech segmentation even for recently learned words without any acoustic segmentation cues. The timing and distribution of these results suggest specific processes that may be central to speech segmentation.
doi:10.1038/nn873
PMCID: PMC2532533
PMID: 12068301