Word segmentation, detecting word boundaries in continuous speech, is a critical aspect of language learning. Previous research in infants and adults demonstrated that a stream of speech can be readily segmented based solely on the statistical and speech cues afforded by the input. Using functional magnetic resonance imaging (fMRI), the neural substrate of word segmentation was examined on-line as participants listened to three streams of concatenated syllables, containing either statistical regularities alone, statistical regularities and speech cues, or no cues. Despite the participants’ inability to explicitly detect differences between the speech streams, neural activity differed significantly across conditions, with left-lateralized signal increases in temporal cortices observed only when participants listened to streams containing statistical regularities, particularly the stream containing speech cues. In a second fMRI study, designed to verify that word segmentation had implicitly taken place, participants listened to trisyllabic combinations that occurred with different frequencies in the streams of speech they just heard (“words,” 45 times; “partwords,” 15 times; “nonwords,” once). Reliably greater activity in left inferior and middle frontal gyri was observed when comparing words with partwords and, to a lesser extent, when comparing partwords with nonwords. Activity in these regions, taken to index the implicit detection of word boundaries, was positively correlated with participants’ rapid auditory processing skills. These findings provide a neural signature of on-line word segmentation in the mature brain and an initial model with which to study developmental changes in the neural architecture involved in processing speech cues during language learning.
fMRI; language; speech perception; word segmentation; statistical learning; auditory cortex; inferior frontal gyrus
Very little is known about the neural underpinnings of language learning across the lifespan and how these might be modified by maturational and experiential factors. Building on behavioral research highlighting the importance of early word segmentation (i.e. the detection of word boundaries in continuous speech) for subsequent language learning, here we characterize developmental changes in brain activity as this process occurs online, using data collected in a mixed cross-sectional and longitudinal design. One hundred and fifty-six participants, ranging from age 5 to adulthood, underwent functional magnetic resonance imaging (fMRI) while listening to three novel streams of continuous speech, which contained either strong statistical regularities, strong statistical regularities and speech cues, or weak statistical regularities providing minimal cues to word boundaries. All age groups displayed significant signal increases over time in temporal cortices for the streams with high statistical regularities; however, we observed a significant right-to-left shift in the laterality of these learning-related increases with age. Interestingly, only the 5- to 10-year-old children displayed significant signal increases for the stream with low statistical regularities, suggesting an age-related decrease in sensitivity to more subtle statistical cues. Further, in a sample of 78 10-year-olds, we examined the impact of proficiency in a second language and level of pubertal development on learning-related signal increases, showing that the brain regions involved in language learning are influenced by both experiential and maturational factors.
Statistical learning is a candidate for one of the basic prerequisites underlying the expeditious acquisition of spoken language. Infants from 8 months of age exhibit this form of learning to segment fluent speech into distinct words. To test the statistical learning skills at birth, we recorded event-related brain responses of sleeping neonates while they were listening to a stream of syllables containing statistical cues to word boundaries.
We found evidence that sleeping neonates are able to automatically extract statistical properties of the speech input and thus detect the word boundaries in a continuous stream of syllables containing no morphological cues. Syllable-specific event-related brain responses found in two separate studies demonstrated that the neonatal brain treated the syllables differently according to their position within pseudowords.
These results demonstrate that neonates can efficiently learn transitional probabilities or frequencies of co-occurrence between different syllables, enabling them to detect word boundaries and in this way isolate single words out of fluent natural speech. The ability to adopt statistical structures from speech may play a fundamental role as one of the earliest prerequisites of language acquisition.
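The transitional-probability cue described above can be illustrated with a minimal sketch. It assumes a hypothetical lexicon of four trisyllabic pseudowords (not the stimuli used in these studies), concatenates them into a continuous stream, and places a boundary wherever the probability of the next syllable given the current one falls below an arbitrarily chosen threshold. The function names and the threshold value are illustrative assumptions, not the authors' analysis.

```python
import random
from collections import Counter


def transitional_probabilities(syllables):
    """Return P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, tp, threshold):
    """Insert a boundary wherever the transitional probability is below threshold."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tp[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


# Hypothetical pseudowords, chosen only for illustration.
LEXICON = [("pa", "bi", "ku"), ("ti", "bu", "do"),
           ("go", "la", "tu"), ("da", "ro", "pi")]


def make_stream(n_words, seed=0):
    """Concatenate randomly ordered pseudowords, avoiding immediate repeats."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in LEXICON if w != previous])
        stream.extend(word)
        previous = word
    return stream


stream = make_stream(300)
tp = transitional_probabilities(stream)
recovered = set(segment(stream, tp, threshold=0.75))  # threshold is an assumption
print(sorted(recovered))
```

In this toy stream, within-word transitions have probability 1.0 and across-word transitions are close to 1/3, so a single threshold separates them. Natural speech is far less clean, which is why the sketch should be read as a demonstration of the cue rather than a model of infant learning.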
Language delay is a hallmark feature of autism spectrum disorders (ASD). The identification of word boundaries in continuous speech is a critical first step in language acquisition that can be accomplished via statistical learning and reliance on speech cues. Importantly, early word segmentation skills have been shown to predict later language development in typically developing (TD) children.
Here we investigated the neural correlates of online word segmentation in children with and without ASD with a well-established behavioral paradigm previously validated for functional magnetic resonance imaging. Eighteen high-functioning boys with ASD and 18 age- and IQ-matched TD boys underwent functional magnetic resonance imaging while listening to two artificial languages (containing statistical or statistical + prosodic cues to word boundaries) and a random speech stream.
Consistent with prior findings, in TD control subjects, activity in fronto-temporal-parietal networks decreased as the number of cues to word boundaries increased. The ASD children, however, did not show this facilitatory effect. Furthermore, statistical contrasts modeling changes in activity over time identified significant learning-related signal increases for both artificial languages in basal ganglia and left temporo-parietal cortex only in TD children. Finally, the level of communicative impairment in ASD children was inversely correlated with signal increases in these same regions during exposure to the artificial languages.
This is the first study to demonstrate significant abnormalities in the neural architecture subserving language-related learning in ASD children and to link the communicative impairments observed in this population to decreased sensitivity to the statistical and speech cues available in the language input.
Autism; implicit learning; language; neuroimaging; speech perception
The presence of gesture during speech has been shown to impact perception, comprehension, learning, and memory in normal adults and typically developing children. In neurotypical individuals, the impact of viewing co-speech gestures representing an object and/or action (i.e., iconic gesture) or speech rhythm (i.e., beat gesture) has also been observed at the neural level. Yet, despite growing evidence of delayed gesture development in children with autism spectrum disorders (ASD), few studies have examined how the brain processes multimodal communicative cues occurring during everyday communication in individuals with ASD. Here, we used a previously validated functional magnetic resonance imaging (fMRI) paradigm to examine the neural processing of co-speech beat gesture in children with ASD and matched controls. Consistent with prior observations in adults, typically developing children showed increased responses in right superior temporal gyrus and sulcus while listening to speech accompanied by beat gesture. Children with ASD, however, exhibited no significant modulatory effects in secondary auditory cortices for the presence of co-speech beat gesture. Rather, relative to their typically developing counterparts, children with ASD showed significantly greater activity in visual cortex while listening to speech accompanied by beat gesture. Importantly, the severity of their socio-communicative impairments correlated with activity in this region, such that the more impaired children demonstrated the greatest activity in visual areas while viewing co-speech beat gesture. These findings suggest that although the typically developing brain recognizes beat gesture as communicative and successfully integrates it with co-occurring speech, information from multiple sensory modalities is not effectively integrated during social communication in the autistic brain.
Autism spectrum disorders; fMRI; gesture; language; superior temporal gyrus
A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.
intelligibility; language; oscillations; phase locking; speech comprehension; speech rate; theta
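Phase locking between a neural oscillation and the speech envelope is commonly summarized by a phase-locking value: the length of the mean unit vector of the phase differences. The sketch below is a minimal illustration under stated assumptions: the phases are synthetic (a 5 Hz "envelope", a lagged copy, and a 7 Hz signal), whereas in practice they would be estimated from band-pass-filtered recordings, for example with a Hilbert transform. It is not drawn from any of the reviewed studies.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Length of the mean unit vector of phase differences (1 = perfect locking)."""
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors) / len(vectors))


# Synthetic phases sampled at 100 Hz for 2 s; all values are illustrative.
fs = 100
t = [n / fs for n in range(2 * fs)]
envelope_phase = [2 * math.pi * 5.0 * x for x in t]       # 5 Hz "speech envelope"
locked_phase = [p - math.pi / 4 for p in envelope_phase]  # same rhythm, constant lag
drifting_phase = [2 * math.pi * 7.0 * x for x in t]       # unrelated 7 Hz rhythm

plv_locked = phase_locking_value(envelope_phase, locked_phase)
plv_drift = phase_locking_value(envelope_phase, drifting_phase)
print(round(plv_locked, 3), round(plv_drift, 3))
```

A constant lag leaves the value at 1, because only the consistency of the phase relationship matters, not its size; a signal at a different frequency drifts through all phase differences and the vectors cancel.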
Speakers convey meaning not only through words, but also through gestures. Although children are exposed to co-speech gestures from birth, we do not know how the developing brain comes to connect meaning conveyed in gesture with speech. We used functional magnetic resonance imaging (fMRI) to address this question and scanned 8- to 11-year-old children and adults listening to stories accompanied by hand movements, either meaningful co-speech gestures or meaningless self-adaptors. When listening to stories accompanied by both types of hand movements, both children and adults recruited inferior frontal, inferior parietal, and posterior temporal brain regions known to be involved in processing language not accompanied by hand movements. There were, however, age-related differences in activity in posterior superior temporal sulcus (STSp), inferior frontal gyrus, pars triangularis (IFGTr), and posterior middle temporal gyrus (MTGp) regions previously implicated in processing gesture. Both children and adults showed sensitivity to the meaning of hand movements in IFGTr and MTGp, but in different ways. Finally, we found that hand movement meaning modulates interactions between STSp and other posterior temporal and inferior parietal regions for adults, but not for children. These results shed light on the developing neural substrate for understanding meaning contributed by co-speech gesture.
Functional magnetic resonance imaging (fMRI) was used to assess neural activation as participants learned to segment continuous streams of speech containing syllable sequences varying in their transitional probabilities. Speech streams were presented in four runs, each followed by a behavioral test to measure the extent of learning over time. Behavioral performance indicated that participants could discriminate statistically coherent sequences (words) from less coherent sequences (partwords). Individual rates of learning, defined as the difference in ratings for words and partwords, were used as predictors of neural activation to ask which brain areas showed activity associated with these measures. Results showed significant activity in the pars opercularis and pars triangularis regions of the left inferior frontal gyrus (LIFG). The relationship between these findings and prior work on the neural basis of statistical learning is discussed, and parallels to the frontal/subcortical network involved in other forms of implicit sequence learning are considered.
fMRI; statistical learning; word segmentation; artificial language; sequence learning; Broca’s area; LIFG
Biologically salient sounds, including speech, are rarely heard in isolation. Our brains must therefore organize the input arising from multiple sources into separate “streams” and, in the case of speech, map the acoustic components of the target signal onto meaning. These auditory and linguistic processes have traditionally been considered to occur sequentially and are typically studied independently [1, 2]. However, evidence that streaming is modified or reset by attention, and that lexical knowledge can affect reports of speech sound identity [4, 5], suggests that higher-level factors may influence perceptual organization. In two experiments, listeners heard sequences of repeated words or acoustically matched nonwords. After several presentations, they reported that the initial /s/ sound in each syllable formed a separate stream; the percept then fluctuated between the streamed and fused states in a bistable manner. In addition to measuring these verbal transformations, we assessed streaming objectively by requiring listeners to detect occasional targets: syllables containing a gap after the initial /s/. Performance was better when streaming caused the syllables preceding the target to transform from words into nonwords, rather than from nonwords into words. Our results show that auditory stream formation is influenced not only by the acoustic properties of speech sounds, but also by higher-level processes involved in recognizing familiar words.
• Linguistic processing affects perceptual organization
• The acoustic elements of words fuse more readily than those of nonwords
• Bistable speech sounds share dynamics with other ambiguous perceptual objects
Everyday communication is accompanied by visual information from several sources, including co-speech gestures, which provide semantic information listeners use to help disambiguate the speaker’s message. Using fMRI, we examined how gestures influence neural activity in brain regions associated with processing semantic information. The BOLD response was recorded while participants listened to stories under three audiovisual conditions and one auditory-only (speech alone) condition. In the first audiovisual condition, the storyteller produced gestures that naturally accompany speech. In the second, she made semantically unrelated hand movements. In the third, she kept her hands still. In addition to inferior parietal and posterior superior and middle temporal regions, bilateral posterior superior temporal sulcus and left anterior inferior frontal gyrus responded more strongly to speech when it was further accompanied by gesture, regardless of the semantic relation to speech. However, the right inferior frontal gyrus was sensitive to the semantic import of the hand movements, demonstrating more activity when hand movements were semantically unrelated to the accompanying speech. These findings show that perceiving hand movements during speech modulates the distributed pattern of neural activation involved in both biological motion perception and discourse comprehension, suggesting listeners attempt to find meaning, not only in the words speakers produce, but also in the hand movements that accompany speech.
discourse comprehension; fMRI; gestures; semantic processing; inferior frontal gyrus
Models propose an auditory-motor mapping via a left-hemispheric dorsal speech-processing stream, yet its detailed contributions to speech perception and production are unclear. Using fMRI-navigated repetitive transcranial magnetic stimulation (rTMS), we virtually lesioned left dorsal stream components in healthy human subjects and probed the consequences on speech-related facilitation of articulatory motor cortex (M1) excitability, as indexed by increases in motor-evoked potential (MEP) amplitude of a lip muscle, and on speech processing performance in phonological tests. Speech-related MEP facilitation was disrupted by rTMS of the posterior superior temporal sulcus (pSTS), the sylvian parieto-temporal region (SPT), and by double-knock-out but not individual lesioning of pars opercularis of the inferior frontal gyrus (pIFG) and the dorsal premotor cortex (dPMC), and not by rTMS of the ventral speech-processing stream or an occipital control site. rTMS of the dorsal stream, but not of the ventral stream or the occipital control site, caused deficits specifically in the processing of fast transients of the acoustic speech signal. Performance of syllable and pseudoword repetition correlated with speech-related MEP facilitation, and this relation was abolished with rTMS of pSTS, SPT, and pIFG. Findings provide direct evidence that auditory-motor mapping in the left dorsal stream causes reliable and specific speech-related MEP facilitation in left articulatory M1. The left dorsal stream targets the articulatory M1 through pSTS and SPT, which constitute essential posterior input regions, and in parallel via frontal pathways through pIFG and dPMC. Finally, engagement of the left dorsal stream is necessary for processing of fast transients in the auditory signal.
articulatory motor cortex; dorsal auditory stream; motor-evoked potential; phonological processing; repetitive transcranial magnetic stimulation; transient virtual lesion
Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs, of which two, covering non-overlapping areas of the auditory cortex, were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC, four ICs that were defined as intrinsic fluctuated similarly to the time courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds.
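One common way to quantify intersubject correlation at a single voxel is the mean pairwise Pearson correlation of that voxel's time course across subjects. The sketch below illustrates that definition on synthetic data; the abstract does not state which ISC variant was computed, so the pairwise formulation, the toy signals, and the noise levels are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def intersubject_correlation(time_courses):
    """Mean pairwise Pearson r across subjects for one voxel."""
    pairs = list(combinations(time_courses, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Synthetic data: 5 subjects, 200 time points (illustrative values only).
rng = random.Random(1)
n_subjects, n_points = 5, 200
shared = [math.sin(2 * math.pi * k / 40) for k in range(n_points)]  # stimulus-driven part

# "Extrinsic" voxel: shared stimulus response plus subject-specific noise.
extrinsic = [[s + rng.gauss(0, 0.3) for s in shared] for _ in range(n_subjects)]
# "Intrinsic" voxel: independent noise in every subject.
intrinsic = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_subjects)]

isc_extrinsic = intersubject_correlation(extrinsic)
isc_intrinsic = intersubject_correlation(intrinsic)
print(round(isc_extrinsic, 2), round(isc_intrinsic, 2))
```

The stimulus-driven voxel yields a high value because every subject shares the same underlying response, while the independent-noise voxel stays near zero. A full analysis would repeat this per voxel and assess significance, which the sketch does not attempt.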
Many figurative expressions are fully conventionalized in everyday speech. Regarding the neural basis of figurative language processing, research has predominantly focused on metaphoric expressions in minimal semantic context. It remains unclear to what extent metaphoric expressions encountered during continuous text comprehension activate neural networks similar to those engaged by isolated metaphors. We therefore investigated the processing of similes (figurative language, e.g., “He smokes like a chimney!”) occurring in a short story. Sixteen healthy, male, native German speakers listened to similes that occurred naturally in a short story, while blood-oxygenation-level-dependent (BOLD) responses were measured with functional magnetic resonance imaging (fMRI). For the event-related analysis, similes were contrasted with non-figurative control sentences (CS). The stimuli differed with respect to figurativeness, while they were matched for frequency of words, number of syllables, plausibility, and comprehensibility. Similes contrasted with CS resulted in enhanced BOLD responses in the left inferior (IFG) and adjacent middle frontal gyrus. Concrete CS as compared to similes activated the bilateral middle temporal gyri as well as the right precuneus and the left middle frontal gyrus (LMFG). Activation of the left IFG for similes in a short story is consistent with results on single sentence metaphor processing. The findings strengthen the importance of the left inferior frontal region in the processing of abstract figurative speech during continuous, ecologically valid speech comprehension; the processing of concrete semantic contents goes along with a down-regulation of bilateral temporal regions.
figurative speech; simile; abstractness; inferior frontal gyrus; fMRI
Two primary areas of damage have been implicated in apraxia of speech (AOS) based on the time post-stroke: (1) the left inferior frontal gyrus (IFG) in acute patients, and (2) the left anterior insula (aIns) in chronic patients. While AOS is widely characterized as a disorder in motor speech planning, little is known about the specific contributions of each of these regions in speech. The purpose of this study was to investigate cortical activation during speech production with a specific focus on the aIns and the IFG in normal adults. While undergoing sparse fMRI, 30 normal adults completed a 30-minute speech-repetition task consisting of three-syllable nonwords that contained either (a) English (native) syllables or (b) non-English (novel) syllables. When the novel syllable productions were compared to the native syllable productions, greater neural activation was observed in the aIns and IFG, particularly during the first 10 minutes of the task when novelty was the greatest. Although activation in the aIns remained high throughout the task for novel productions, greater activation was clearly demonstrated when the initial 10 minutes were compared to the final 10 minutes of the task. These results suggest increased activity within an extensive neural network, including the aIns and IFG, when the motor speech system is taxed, such as during the production of novel speech. We speculate that the amount of left aIns recruitment during speech production may be related to the internal construction of the motor speech unit, such that the degree of novelty or automaticity would result in greater or lesser demands, respectively. The role of the IFG as a storehouse and integrative processor for previously acquired routines is also discussed.
In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of high transitional probabilities, but also because of a consistently co-occurring object. But additional statistics require additional processing, and thus might not be useful to cognitively constrained learners. We show that the structure of child-directed speech makes simultaneous speech segmentation and word learning tractable for human learners. First, a corpus of child-directed speech was recorded from parents and children engaged in a naturalistic free-play task. Analyses revealed two consistent regularities in the sentence structure of naming events. These regularities were subsequently encoded in an artificial language to which adult participants were exposed in the context of simultaneous statistical speech segmentation and word learning. Either regularity was independently sufficient to support successful learning, but no learning occurred in the absence of both regularities. Thus, the structure of child-directed speech plays an important role in scaffolding speech segmentation and word learning in parallel.
statistical learning; speech segmentation; word learning; child-directed speech; frequent frames
Successful categorization of phonemes in speech requires that the brain analyze the acoustic signal along both spectral and temporal dimensions. Neural encoding of the stimulus amplitude envelope is critical for parsing the speech stream into syllabic units. Encoding of voice onset time (VOT) and place of articulation (POA), cues necessary for determining phonemic identity, occurs within shorter time frames. An unresolved question is whether the neural representation of speech is based on processing mechanisms that are unique to humans and shaped by learning and experience, or is based on rules governing general auditory processing that are also present in non-human animals. This question was examined by comparing the neural activity elicited by speech and other complex vocalizations in primary auditory cortex of macaques, who are limited vocal learners, with that in Heschl’s gyrus, the putative location of primary auditory cortex in humans. Entrainment to the amplitude envelope is neither specific to humans nor to human speech. VOT is represented by responses time-locked to consonant release and voicing onset in both humans and monkeys. Temporal representation of VOT is observed both for isolated syllables and for syllables embedded in the more naturalistic context of running speech. The fundamental frequency of male speakers is represented by more rapid neural activity phase-locked to the glottal pulsation rate in both humans and monkeys. In both species, the differential representation of stop consonants varying in their POA can be predicted by the relationship between the frequency selectivity of neurons and the onset spectra of the speech sounds. These findings indicate that the neurophysiology of primary auditory cortex is similar in monkeys and humans despite their vastly different experience with human speech, and that Heschl’s gyrus is engaged in general auditory, and not language-specific, processing.
phonemes; primary auditory cortex; monkeys; Heschl’s gyrus; electrophysiology; temporal processing
Language fluency is a common diagnostic marker for discriminating among aphasia subtypes and improving clinical inference about site of lesion. Nevertheless, fluency remains a subjective construct that is vulnerable to a number of potential sources of variability, both between and within raters. Moreover, this variability is compounded by distinct neurological aetiologies that shape the characteristics of a narrative speech sample. Previous research on fluency has focused on characteristics of a particular patient population. Less is known about the ways that raters spontaneously weigh different perceptual cues when listening to narrative speech samples derived from a heterogeneous sample of brain-damaged adults.
We examined the weighted contribution of a series of perceptual predictors that influence listeners’ judgements of language fluency among a diverse sample of speakers. Our goal was to sample a range of narrative speech representing most fluent (i.e., healthy controls) to potentially least fluent (i.e., left inferior frontal lobe stroke).
Methods & Procedures
Three raters blind to patient diagnosis made forced choice judgements of fluency (i.e., fluent or nonfluent) for 61 pseudorandomly presented narrative speech samples elicited by the BDAE Cookie Theft picture. Samples were collected from a range of clinical populations, including patients with frontal and temporal lobe pathologies and non-brain-damaged speakers. We conducted a logistic regression analysis in which the dependent measure was the majority judgement of fluency for each speech sample (i.e., fluent or non-fluent). The statistical model contained five predictors: speech rate, syllable type token ratio, speech productivity, audible struggle, and filler ratio.
Outcomes & Results
This statistical model fit the data well, discriminating group membership (i.e., fluent or nonfluent) with 95.1% accuracy. The best step of the regression model included the following predictors: speech rate, speech productivity, and audible struggle. Listeners were sensitive to different weightings of these predictors.
A small combination of perceptual variables can strongly discriminate whether a listener will assign a judgement of fluent versus nonfluent. We discuss implications for these findings and identify areas of potential future research towards further specifying the construct of fluency among adults with acquired speech and language disorders.
Fluency; Perception; Listener judgement; Nonfluent aphasia
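Two steps of the procedure above lend themselves to a short worked sketch: deriving the majority judgement from three raters, and relating the reported 95.1% accuracy to the 61 samples. The rating tuples below are made-up placeholders, not study data, and the helper names are hypothetical. The final calculation only checks which whole number of correctly classified samples out of 61 rounds to 95.1%; the abstract itself does not report that count.

```python
from collections import Counter


def majority_label(ratings):
    """Return the label chosen by most raters (three binary raters cannot tie)."""
    return Counter(ratings).most_common(1)[0][0]


def accuracy(predicted, observed):
    """Proportion of samples whose predicted label matches the observed label."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


# Placeholder ratings for four speech samples (illustrative, not study data).
ratings_per_sample = [
    ("fluent", "fluent", "nonfluent"),
    ("nonfluent", "nonfluent", "nonfluent"),
    ("fluent", "nonfluent", "nonfluent"),
    ("fluent", "fluent", "fluent"),
]
observed = [majority_label(r) for r in ratings_per_sample]

# Hypothetical model output for the same four samples.
predicted = ["fluent", "nonfluent", "fluent", "fluent"]
toy_accuracy = accuracy(predicted, observed)

# Which counts of correct classifications out of 61 round to 95.1%?
n_samples = 61
matching_counts = [k for k in range(n_samples + 1)
                   if round(100 * k / n_samples, 1) == 95.1]
print(observed, toy_accuracy, matching_counts)
```

Only 58 correct classifications out of 61 are consistent with the reported figure (58/61 ≈ 95.08%), which would imply that three samples were misclassified by the model.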
Human speech consists of a variety of articulated sounds that vary dynamically in spectral composition. We investigated the neural activity associated with the perception of two types of speech segments: (a) the period of rapid spectral transition occurring at the beginning of a stop-consonant vowel (CV) syllable and (b) the subsequent spectral steady-state period occurring during the vowel segment of the syllable. Functional magnetic resonance imaging (fMRI) was recorded while subjects listened to series of synthesized CV syllables and non-phonemic control sounds. Adaptation to specific sound features was measured by varying either the transition or steady-state periods of the synthesized sounds. Two spatially distinct brain areas in the superior temporal cortex were found that were sensitive to either the type of adaptation or the type of stimulus. In a relatively large section of the bilateral dorsal superior temporal gyrus (STG), activity varied as a function of adaptation type regardless of whether the stimuli were phonemic or non-phonemic. Immediately adjacent to this region, in a more limited area of the ventral STG, increased activity was observed for phonemic trials compared to non-phonemic trials; however, no adaptation effects were found. In addition, a third area in the bilateral medial superior temporal plane showed increased activity to non-phonemic compared to phonemic sounds. The results suggest a multi-stage hierarchical stream for speech sound processing extending ventrolaterally from the superior temporal plane to the superior temporal sulcus. At successive stages in this hierarchy, neurons code for increasingly complex spectrotemporal features. At the same time, these representations become more abstracted from the original acoustic form of the sound.
speech perception; auditory cortex; phonological processing; fMRI; temporal lobe; spectrotemporal cues
The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as producing nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in these regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings suggest a more general role of the previously proposed “auditory dorsal stream” in the left hemisphere – to support the production of vocal tract gestures that are not limited to speech processing.
sensory-motor interaction; auditory dorsal stream; functional magnetic resonance imaging (fMRI)
Lexical-semantic knowledge is a core language component that undergoes prolonged development throughout childhood and is therefore highly amenable to developmental studies. Most previous lexical-semantic functional MRI (fMRI) studies have been limited to single-word or word-pair tasks, outside a sentence context. Our objective was to investigate the development of lexical-semantic language networks in typically developing children using a more ecological sentence-embedded semantic task that permitted performance monitoring while minimizing head movement by avoiding overt speech. Sixteen adults and 23 children completed two fMRI runs of an auditory lexical-semantic decision task with a button-press response, using reverse speech as a control condition. Children and adults showed similar activation in bilateral temporal and left inferior frontal regions. Greater activation in adults than in children was seen in left inferior parietal, premotor, and inferior frontal regions, and in bilateral supplementary motor area (SMA). Specifically for semantically incongruous sentences, adults also showed greater activation than children in left inferior frontal cortex, possibly related to enhanced top-down control. Age-dependent activation increases in motor-related regions were shown to be unrelated to overt motor responses, but could be associated with covert speech accompanying semantic decision. Unlike previous studies, age-dependent differences were not detected in posterior sensory cortices (such as extrastriate cortex), nor in middle temporal gyrus.
The fluency and reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.
A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. 
Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment.
Broca’s aphasia; functional MRI; speech production; tractography; treatment
Differentiation of logopenic (lvPPA) and nonfluent/agrammatic (nfvPPA) variants of Primary Progressive Aphasia is important yet remains challenging since it hinges on expert-based evaluation of speech and language production. In this study acoustic measures of speech in conjunction with voxel-based morphometry were used to determine the success of the measures as an adjunct to diagnosis and to explore the neural basis of apraxia of speech in nfvPPA. Forty-one patients (21 lvPPA, 20 nfvPPA) were recruited from a consecutive sample with suspected frontotemporal dementia. Patients were diagnosed using the current gold standard of expert perceptual judgment, based on presence/absence of particular speech features during speaking tasks. Seventeen healthy age-matched adults served as controls. MRI scans were available for 11 control and 37 PPA cases; 23 of the PPA cases underwent amyloid ligand PET imaging. Measures, corresponding to perceptual features of apraxia of speech, were periods of silence during reading and relative vowel duration and intensity in polysyllable word repetition. Discriminant function analyses revealed that a measure of relative vowel duration differentiated nfvPPA cases from both control and lvPPA cases (r² = 0.47) with 88% agreement with expert judgment of presence of apraxia of speech in nfvPPA cases. VBM analysis showed that relative vowel duration covaried with grey matter intensity in areas critical for speech motor planning and programming: precentral gyrus, supplementary motor area and inferior frontal gyrus bilaterally, only affected in the nfvPPA group. This bilateral involvement of frontal speech networks in nfvPPA potentially affects access to compensatory mechanisms involving right hemisphere homologues. Measures of silences during reading also discriminated the PPA and control groups, but did not increase predictive accuracy.
Findings suggest that a measure of relative vowel duration from a polysyllable word repetition task may be sufficient for detecting most cases of apraxia of speech and distinguishing between nfvPPA and lvPPA.
We examined the influence of bilingual experience and inhibitory control on the ability to learn a novel language. Using a statistical learning paradigm, participants learned words in two novel languages that were based on the International Morse Code. First, participants listened to a continuous stream of words in a Morse code language to test their ability to segment words from continuous speech. Since Morse code does not overlap in form with natural languages, interference from known languages was minimized. Next, participants listened to another Morse code language composed of new words that conflicted with the first Morse code language. Interference in this second language was high due to conflict between languages and due to the presence of two colliding cues (compressed pauses between words and statistical regularities) that competed to define word boundaries. Results suggest that bilingual experience can improve word learning when interference from other languages is low, while inhibitory control ability can improve word learning when interference from other languages is high. We conclude that the ability to extract novel words from continuous speech is a skill that is affected both by linguistic factors, such as bilingual experience, and by cognitive abilities, such as inhibitory control.
language acquisition; statistical learning; bilingualism; inhibitory control; Morse code; Simon task
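As a rough illustration of how statistical regularities alone can define word boundaries in a continuous stream, the sketch below segments a sequence wherever the forward transitional probability between adjacent units drops. The abstract above does not specify the statistic used, so treating the regularities as transitional probabilities is an assumption, and the function names, the toy stream, and the 0.9 threshold are all hypothetical.

```python
from collections import Counter


def transitional_probabilities(units):
    """Return P(next | current) for every adjacent pair in the sequence."""
    pair_counts = Counter(zip(units, units[1:]))
    # Count each unit only where it has a successor (all but the last position).
    first_counts = Counter(units[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(units, threshold):
    """Split the stream wherever the transitional probability falls below threshold.

    This is an illustrative rule only, not the procedure used in the study.
    """
    if not units:
        return []
    tps = transitional_probabilities(units)
    words, current = [], [units[0]]
    for prev, nxt in zip(units, units[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append(current)  # low predictability: close the current word
            current = []
        current.append(nxt)
    words.append(current)
    return words


# Hypothetical stream built from two made-up "words", abc and def.
stream = list("abcdefabcabcdefdefabc")
words = segment(stream, threshold=0.9)
print(["".join(w) for w in words])
```

Within a word each unit fully predicts the next one, while across word boundaries the next unit is less predictable, which is what the threshold exploits. A competing cue such as compressed pauses would need to be modelled separately.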
Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture – a fundamental type of hand gesture that marks speech prosody – might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions.
gestures; speech perception; auditory cortex; magnetic resonance imaging; nonverbal communication
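The multisensory integration criterion stated in the beat gesture abstract above (the response to speech with beat gesture exceeds the sum of the responses to speech alone and beat gesture alone) can be written as a simple comparison. The sketch below is only a minimal illustration of that inequality; the function names, region labels, and response values are hypothetical, and it omits the statistical testing a real analysis would require.

```python
def is_superadditive(combined, speech_only, gesture_only):
    """True when the combined response exceeds the sum of the two unimodal responses."""
    return combined > speech_only + gesture_only


def superadditive_regions(responses):
    """Return the names of regions meeting the criterion.

    `responses` maps a region name to a tuple
    (combined, speech_only, gesture_only) of response estimates.
    """
    return [
        name
        for name, (combined, speech_only, gesture_only) in responses.items()
        if is_superadditive(combined, speech_only, gesture_only)
    ]


# Made-up response estimates for two hypothetical regions.
example = {
    "region_A": (2.5, 1.0, 1.0),  # 2.5 > 1.0 + 1.0, criterion met
    "region_B": (1.8, 1.0, 1.0),  # 1.8 < 1.0 + 1.0, criterion not met
}
print(superadditive_regions(example))
```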