The presence of gesture during speech has been shown to impact perception, comprehension, learning, and memory in normal adults and typically developing children. In neurotypical individuals, the impact of viewing co-speech gestures representing an object and/or action (i.e., iconic gesture) or speech rhythm (i.e., beat gesture) has also been observed at the neural level. Yet, despite growing evidence of delayed gesture development in children with autism spectrum disorders (ASD), few studies have examined how the brain processes multimodal communicative cues occurring during everyday communication in individuals with ASD. Here, we used a previously validated functional magnetic resonance imaging (fMRI) paradigm to examine the neural processing of co-speech beat gesture in children with ASD and matched controls. Consistent with prior observations in adults, typically developing children showed increased responses in right superior temporal gyrus and sulcus while listening to speech accompanied by beat gesture. Children with ASD, however, exhibited no significant modulatory effects in secondary auditory cortices for the presence of co-speech beat gesture. Rather, relative to their typically developing counterparts, children with ASD showed significantly greater activity in visual cortex while listening to speech accompanied by beat gesture. Importantly, the severity of their socio-communicative impairments correlated with activity in this region, such that the more impaired children demonstrated the greatest activity in visual areas while viewing co-speech beat gesture. These findings suggest that although the typically developing brain recognizes beat gesture as communicative and successfully integrates it with co-occurring speech, information from multiple sensory modalities is not effectively integrated during social communication in the autistic brain.
Autism spectrum disorders; fMRI; gesture; language; superior temporal gyrus
Everyday communication is accompanied by visual information from several sources, including co-speech gestures, which provide semantic information listeners use to help disambiguate the speaker’s message. Using fMRI, we examined how gestures influence neural activity in brain regions associated with processing semantic information. The BOLD response was recorded while participants listened to stories under three audiovisual conditions and one auditory-only (speech alone) condition. In the first audiovisual condition, the storyteller produced gestures that naturally accompany speech. In the second, she made semantically unrelated hand movements. In the third, she kept her hands still. In addition to inferior parietal and posterior superior and middle temporal regions, bilateral posterior superior temporal sulcus and left anterior inferior frontal gyrus responded more strongly to speech when it was further accompanied by gesture, regardless of the semantic relation to speech. However, the right inferior frontal gyrus was sensitive to the semantic import of the hand movements, demonstrating more activity when hand movements were semantically unrelated to the accompanying speech. These findings show that perceiving hand movements during speech modulates the distributed pattern of neural activation involved in both biological motion perception and discourse comprehension, suggesting listeners attempt to find meaning, not only in the words speakers produce, but also in the hand movements that accompany speech.
discourse comprehension; fMRI; gestures; semantic processing; inferior frontal gyrus
Speakers convey meaning not only through words, but also through gestures. Although children are exposed to co-speech gestures from birth, we do not know how the developing brain comes to connect meaning conveyed in gesture with speech. We used functional magnetic resonance imaging (fMRI) to address this question and scanned 8- to 11-year-old children and adults listening to stories accompanied by hand movements, either meaningful co-speech gestures or meaningless self-adaptors. When listening to stories accompanied by both types of hand movements, both children and adults recruited inferior frontal, inferior parietal, and posterior temporal brain regions known to be involved in processing language not accompanied by hand movements. There were, however, age-related differences in activity in posterior superior temporal sulcus (STSp), inferior frontal gyrus, pars triangularis (IFGTr), and posterior middle temporal gyrus (MTGp) regions previously implicated in processing gesture. Both children and adults showed sensitivity to the meaning of hand movements in IFGTr and MTGp, but in different ways. Finally, we found that hand movement meaning modulates interactions between STSp and other posterior temporal and inferior parietal regions for adults, but not for children. These results shed light on the developing neural substrate for understanding meaning contributed by co-speech gesture.
Perception of speech and gestures engage common brain areas. Neural regions involved in speech perception overlap with those involved in speech production in an articulator-specific manner. Yet, it is unclear whether motor cortex also has a role in processing communicative actions like gesture and sign language. We asked whether the mere observation of hand gestures, paired and not paired with words, may result in changes in the excitability of the hand and tongue areas of motor cortex. Using single-pulse transcranial magnetic stimulation (TMS), we measured the motor excitability in tongue and hand areas of left primary motor cortex, while participants viewed video sequences of bimanual hand movements associated or not-associated with nouns. We found higher motor excitability in the tongue area during the presentation of meaningful gestures (noun-associated) as opposed to meaningless ones, while the excitability of hand motor area was not differentially affected by gesture observation. Our results let us argue that the observation of gestures associated with a word results in activation of articulatory motor network accompanying speech production.
transcranial magnetic stimulation; tongue motor excitability; speech perception; gesture perception; sign language
The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed “auditory dorsal stream” in the left hemisphere – to support the production of vocal tract gestures that are not limited to speech processing.
sensory-motor interaction; auditory dorsal stream; functional magnetic resonance imaging (fMRI)
The study of the production of co-speech gestures (CSGs), i.e., meaningful hand movements that often accompany speech during everyday discourse, provides an important opportunity to investigate the integration of language, action, and memory because of the semantic overlap between gesture movements and speech content. Behavioral studies of CSGs and speech suggest that they have a common base in memory and predict that overt production of both speech and CSGs would be preceded by neural activity related to memory processes. However, to date the neural correlates and timing of CSG production are still largely unknown. In the current study, we addressed these questions with magnetoencephalography and a semantic association paradigm in which participants overtly produced speech or gesture responses that were either meaningfully related to a stimulus or not. Using spectral and beamforming analyses to investigate the neural activity preceding the responses, we found a desynchronization in the beta band (15–25 Hz), which originated 900 ms prior to the onset of speech and was localized to motor and somatosensory regions in the cortex and cerebellum, as well as right inferior frontal gyrus. Beta desynchronization is often seen as an indicator of motor processing and thus reflects motor activity related to the hand movements that gestures add to speech. Furthermore, our results show oscillations in the high gamma band (50–90 Hz), which originated 400 ms prior to speech onset and were localized to the left medial temporal lobe. High gamma oscillations have previously been found to be involved in memory processes and we thus interpret them to be related to contextual association of semantic information in memory. The results of our study show that high gamma oscillations in medial temporal cortex play an important role in the binding of information in human memory during speech and CSG production.
Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action (“Mirror System” properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the functional magnetic resonance imaging (fMRI) analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal with articulatory speech gestures.
audio-visual; premotor; multisensory; mirror system; fMRI; internal model
Abstractness and modality of interpersonal communication have a considerable impact on comprehension. They are relevant for determining thoughts and constituting internal models of the environment. Whereas concrete object-related information can be represented in mind irrespective of language, abstract concepts require a representation in speech. Consequently, modality-independent processing of abstract information can be expected. Here we investigated the neural correlates of abstractness (abstract vs. concrete) and modality (speech vs. gestures), to identify an abstractness-specific supramodal neural network. During fMRI data acquisition 20 participants were presented with videos of an actor either speaking sentences with an abstract-social [AS] or concrete-object-related content [CS], or performing meaningful abstract-social emblematic [AG] or concrete-object-related tool-use gestures [CG]. Gestures were accompanied by a foreign language to increase the comparability between conditions and to frame the communication context of the gesture videos. Participants performed a content judgment task referring to the person vs. object-relatedness of the utterances. The behavioral data suggest a comparable comprehension of contents communicated by speech or gesture. Furthermore, we found common neural processing for abstract information independent of modality (AS > CS ∩ AG > CG) in a left hemispheric network including the left inferior frontal gyrus (IFG), temporal pole, and medial frontal cortex. Modality specific activations were found in bilateral occipital, parietal, and temporal as well as right inferior frontal brain regions for gesture (G > S) and in left anterior temporal regions and the left angular gyrus for the processing of speech semantics (S > G). These data support the idea that abstract concepts are represented in a supramodal manner. Consequently, gestures referring to abstract concepts are processed in a predominantly left hemispheric language related neural network.
gesture; speech; fMRI; abstract semantics; emblematic gestures; tool-use gestures
The dual-route model of speech processing includes a dorsal stream that maps auditory to motor features at the sublexical level rather than at the lexico-semantic level. However, the literature on gesture is an invitation to revise this model because it suggests that the premotor cortex of the dorsal route is a major site of lexico-semantic interaction. Here we investigated lexico-semantic mapping using word-gesture pairs that were either congruent or incongruent. Using fMRI-adaptation in 28 subjects, we found that temporo-parietal and premotor activity during auditory processing of single action words was modulated by the prior audiovisual context in which the words had been repeated. The BOLD signal was suppressed following repetition of the auditory word alone, and further suppressed following repetition of the word accompanied by a congruent gesture (e.g. [“grasp” + grasping gesture]). Conversely, repetition suppression was not observed when the same action word was accompanied by an incongruent gesture (e.g. [“grasp” + sprinkle]). We propose a simple model to explain these results: auditory and visual information converge onto premotor cortex where it is represented in a comparable format to determine (in)congruence between speech and gesture. This ability of the dorsal route to detect audiovisual semantic (in)congruence suggests that its function is not restricted to the sublexical level.
Visual speech (lip-reading) influences the perception of heard speech. The literature suggests at least two possible mechanisms for this influence: “direct” sensory-sensory interaction, whereby sensory signals from auditory and visual modalities are integrated directly, likely in the superior temporal sulcus, and “indirect” sensory-motor interaction, whereby visual speech is first mapped onto motor-speech representations in the frontal lobe, which in turn influences sensory perception via sensory-motor integration networks. We hypothesize that both mechanisms exist, and further that lip-reading functional activations of Broca’s region and the posterior planum temporale reflect the sensory-motor mechanism. We tested one prediction of this hypothesis using fMRI. We assessed whether viewing visual speech (contrasted with facial gestures) activates the same network as a speech sensory-motor integration task (listen to and then silently rehearse speech). Both tasks activated locations within Broca’s area, dorsal premotor cortex, and the posterior planum temporal (Spt), and focal regions of the STS, all of which have previously been implicated in sensory-motor integration for speech. This finding is consistent with the view that visual speech influences heard speech via sensory-motor networks. Lip-reading also activated a much wider network in the superior temporal lobe than the sensory-motor task, possibly reflecting a more direct cross-sensory integration network.
A framework is described for understanding the schizophrenic syndrome at the brain systems level. It is hypothesized that over-activation of dynamic gesture and social perceptual processes in the temporal-parietal occipital junction (TPJ), posterior superior temporal sulcus (PSTS) and surrounding regions produce the syndrome (including positive and negative symptoms, their prevalence, prodromal signs, and cognitive deficits). Hippocampal system hyper-activity and atrophy have been consistently found in schizophrenia. Hippocampal activity is highly correlated with activity in the TPJ and may be a source of over-excitation of the TPJ and surrounding regions. Strong evidence for this comes from in-vivo recordings in humans during psychotic episodes. Many positive symptoms of schizophrenia can be reframed as the erroneous sense of a presence or other who is observing, acting, speaking, or controlling; these qualia are similar to those evoked during abnormal activation of the TPJ. The TPJ and PSTS play a key role in the perception (and production) of dynamic social, emotional, and attentional gestures for the self and others (e.g., body/face/eye gestures, audiovisual speech and prosody, and social attentional gestures such as eye gaze). The single cell representation of dynamic gestures is multimodal (auditory, visual, tactile), matching the predominant hallucinatory categories in schizophrenia. Inherent in the single cell perceptual signal of dynamic gesture representations is a computation of intention, agency, and anticipation or expectancy (for the self and others). Stimulation of the TPJ resulting in activation of the self representation has been shown to result a feeling of a presence or multiple presences (due to heautoscopy) and also bizarre tactile experiences. Neurons in the TPJ are also tuned, or biased to detect threat related emotions. Abnormal over-activation in this system could produce the conscious hallucination of a voice (audiovisual speech), a person or a touch. Over-activation could interfere with attentional/emotional gesture perception and production (negative symptoms). It could produce the unconscious feeling of being watched, followed, or of a social situation unfolding along with accompanying abnormal perception of intent and agency (delusions). Abnormal activity in the TPJ would also be predicted to create several cognitive disturbances that are characteristic of schizophrenia, including abnormalities in attention, predictive social processing, working memory, and a bias to erroneously perceive threat.
hippocampus; schizophrenia; parietal; biological motion; superior temporal sulcus; intraparietal sulcus; autism; social cognition
When people talk to each other, they often make arm and hand movements that accompany what they say. These manual movements, called “co-speech gestures,” can convey meaning by way of their interaction with the oral message. Another class of manual gestures, called “emblematic gestures” or “emblems,” also conveys meaning, but in contrast to co-speech gestures, they can do so directly and independent of speech. There is currently significant interest in the behavioral and biological relationships between action and language. Since co-speech gestures are actions that rely on spoken language, and emblems convey meaning to the effect that they can sometimes substitute for speech, these actions may be important, and potentially informative, examples of language–motor interactions. Researchers have recently been examining how the brain processes these actions. The current results of this work do not yet give a clear understanding of gesture processing at the neural level. For the most part, however, it seems that two complimentary sets of brain areas respond when people see gestures, reflecting their role in disambiguating meaning. These include areas thought to be important for understanding actions and areas ordinarily related to processing language. The shared and distinct responses across these two sets of areas during communication are just beginning to emerge. In this review, we talk about the ways that the brain responds when people see gestures, how these responses relate to brain activity when people process language, and how these might relate in normal, everyday communication.
gesture; language; brain; meaning; action understanding; fMRI
In everyday conversation, listeners often rely on a speaker’s gestures to clarify any ambiguities in the verbal message. Using fMRI during naturalistic story comprehension, we examined which brain regions in the listener are sensitive to speakers’ iconic gestures. We focused on iconic gestures that contribute information not found in the speaker’s talk, compared to those that convey information redundant with the speaker’s talk. We found that three regions—left inferior frontal gyrus triangular (IFGTr) and opercular (IFGOp) portions, and left posterior middle temporal gyrus (MTGp)—responded more strongly when gestures added information to non-specific language, compared to when they conveyed the same information in more specific language; in other words, when gesture disambiguated speech as opposed to reinforced it. An increased BOLD response was not found in these regions when the non-specific language was produced without gesture, suggesting that IFGTr, IFGOp, and MTGp are involved in integrating semantic information across gesture and speech. In addition, we found that activity in the posterior superior temporal sulcus (STSp), previously thought to be involved in gesture-speech integration, was not sensitive to the gesture-speech relation. Together, these findings clarify the neurobiology of gesture-speech integration and contribute to an emerging picture of how listeners glean meaning from gestures that accompany speech.
gestures; semantic; language; inferior frontal gyrus; posterior superior temporal sulcus; posterior middle temporal gyrus
In a natural setting, speech is often accompanied by gestures. As language, speech-accompanying iconic gestures to some extent convey semantic information. However, if comprehension of the information contained in both the auditory and visual modality depends on same or different brain-networks is quite unknown. In this fMRI study, we aimed at identifying the cortical areas engaged in supramodal processing of semantic information. BOLD changes were recorded in 18 healthy right-handed male subjects watching video clips showing an actor who either performed speech (S, acoustic) or gestures (G, visual) in more (+) or less (−) meaningful varieties. In the experimental conditions familiar speech or isolated iconic gestures were presented; during the visual control condition the volunteers watched meaningless gestures (G−), while during the acoustic control condition a foreign language was presented (S−). The conjunction of the visual and acoustic semantic processing revealed activations extending from the left inferior frontal gyrus to the precentral gyrus, and included bilateral posterior temporal regions. We conclude that proclaiming this frontotemporal network the brain's core language system is to take too narrow a view. Our results rather indicate that these regions constitute a supramodal semantic processing network.
Manual gestures occur on a continuum from co-speech gesticulations to conventionalized emblems to language signs. Our goal in the present study was to understand the neural bases of the processing of gestures along such a continuum. We studied four types of gestures, varying along linguistic and semantic dimensions: linguistic and meaningful American Sign Language (ASL), non-meaningful pseudo-ASL, meaningful emblematic, and nonlinguistic, non-meaningful made-up gestures. Pre-lingually deaf, native signers of ASL participated in the fMRI study and performed two tasks while viewing videos of the gestures: a visuo-spatial (identity) discrimination task and a category discrimination task. We found that the categorization task activated left ventral middle and inferior frontal gyrus, among other regions, to a greater extent compared to the visual discrimination task, supporting the idea of semantic-level processing of the gestures. The reverse contrast resulted in enhanced activity of bilateral intraparietal sulcus, supporting the idea of featural-level processing (analogous to phonological-level processing of speech sounds) of the gestures. Regardless of the task, we found that brain activation patterns for the nonlinguistic, non-meaningful gestures were the most different compared to the ASL gestures. The activation patterns for the emblems were most similar to those of the ASL gestures and those of the pseudo-ASL were most similar to the nonlinguistic, non-meaningful gestures. The fMRI results provide partial support for the conceptualization of different gestures as belonging to a continuum and the variance in the fMRI results was best explained by differences in the processing of gestures along the semantic dimension.
American Sign Language; gestures; Deaf; visual processing; categorization; linguistic; brain; fMRI
Although the linguistic structure of speech provides valuable communicative information, nonverbal behaviors can offer additional, often disambiguating cues. In particular, being able to see the face and hand movements of a speaker facilitates language comprehension . But how does the brain derive meaningful information from these movements? Mouth movements provide information about phonological aspects of speech [2–3]. In contrast, cospeech gestures display semantic information relevant to the intended message[4–6].We show that when language comprehension is accompanied by observable face movements, there is strong functional connectivity between areas of cortex involved in motor planning and production and posterior areas thought to mediate phonological aspects of speech perception. In contrast, language comprehension accompanied by cospeech gestures is associated with tuning of and strong functional connectivity between motor planning and production areas and anterior areas thought to mediate semantic aspects of language comprehension. These areas are not tuned to hand and arm movements that are not meaningful. Results suggest that when gestures accompany speech, the motor system works with language comprehension areas to determine the meaning of those gestures. Results also suggest that the cortical networks underlying language comprehension, rather than being fixed, are dynamically organized by the type of contextual information available to listeners during face-to-face communication.
Although stuttering is regarded as a speech-specific disorder, there is a growing body of evidence suggesting that subtle abnormalities in the motor planning and execution of non-speech gestures exist in stuttering individuals. We hypothesized that people who stutter (PWS) would differ from fluent controls in their neural responses during motor planning and execution of both speech and non-speech gestures that had auditory targets. Using fMRI with sparse sampling, separate BOLD responses were measured for perception, planning, and fluent production of speech and non-speech vocal tract gestures. During both speech and non-speech perception and planning, PWS had less activation in the frontal and temporoparietal regions relative to controls. During speech and non-speech production, PWS had less activation than the controls in the left superior temporal gyrus (STG) and the left pre-motor areas (BA 6) but greater activation in the right STG, bilateral Heschl’s gyrus (HG), insula, putamen, and precentral motor regions (BA 4). Differences in brain activation patterns between PWS and controls were greatest in the females and less apparent in males. In conclusion, similar differences in PWS from the controls were found during speech and non-speech; during perception and planning they had reduced activation while during production they had increased activity in the auditory area on the right and decreased activation in the left sensorimotor regions. These results demonstrated that neural activation differences in PWS are not speech-specific.
Stuttering; Speech perception; planning; production; non-speech; functional magnetic resonance imaging (fMRI); auditory-motor interaction; forward model
Effective pain communication is essential if adequate treatment and support are to be provided. Pain communication is often multimodal, with sufferers utilising speech, nonverbal behaviours (such as facial expressions), and co-speech gestures (bodily movements, primarily of the hands and arms that accompany speech and can convey semantic information) to communicate their experience. Research suggests that the production of nonverbal pain behaviours is positively associated with pain intensity, but it is not known whether this is also the case for speech and co-speech gestures. The present study explored whether increased pain intensity is associated with greater speech and gesture production during face-to-face communication about acute, experimental pain. Participants (N = 26) were exposed to experimentally elicited pressure pain to the fingernail bed at high and low intensities and took part in video-recorded semi-structured interviews. Despite rating more intense pain as more difficult to communicate (t(25) = 2.21, p = .037), participants produced significantly longer verbal pain descriptions and more co-speech gestures in the high intensity pain condition (Words: t(25) = 3.57, p = .001; Gestures: t(25) = 3.66, p = .001). This suggests that spoken and gestural communication about pain is enhanced when pain is more intense. Thus, in addition to conveying detailed semantic information about pain, speech and co-speech gestures may provide a cue to pain intensity, with implications for the treatment and support received by pain sufferers. Future work should consider whether these findings are applicable within the context of clinical interactions about pain.
Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are integrated systems. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention.
language; syntax; audiovisual; P600; ambiguity
Previous studies have shown that iconic gestures presented in an isolated manner prime visually presented semantically related words. Since gestures and speech are almost always produced together, this study examined whether iconic gestures accompanying speech would prime words and compared the priming effect of iconic gestures with speech to that of iconic gestures presented alone. Adult participants (N = 180) were randomly assigned to one of three conditions in a lexical decision task: Gestures-Only (the primes were iconic gestures presented alone); Speech-Only (the primes were auditory tokens conveying the same meaning as the iconic gestures); Gestures-Accompanying-Speech (the primes were the simultaneous coupling of iconic gestures and their corresponding auditory tokens). Our findings revealed significant priming effects in all three conditions. However, the priming effect in the Gestures-Accompanying-Speech condition was comparable to that in the Speech-Only condition and was significantly weaker than that in the Gestures-Only condition, suggesting that the facilitatory effect of iconic gestures accompanying speech may be constrained by the level of language processing required in the lexical decision task where linguistic processing of words forms is more dominant than semantic processing. Hence, the priming effect afforded by the co-speech iconic gestures was weakened.
co-speech gestures; cross-modal priming; lexical decision; language processing
Speech comprehension is a complex human skill, the performance of which requires the perceiver to combine information from several sources – e.g. voice, face, gesture, linguistic context – to achieve an intelligible and interpretable percept. We describe a functional imaging investigation of how auditory, visual and linguistic information interact to facilitate comprehension. Our specific aims were to investigate the neural responses to these different information sources, alone and in interaction, and further to use behavioural speech comprehension scores to address sites of intelligibility-related activation in multifactorial speech comprehension. In fMRI, participants passively watched videos of spoken sentences, in which we varied Auditory Clarity (with noise-vocoding), Visual Clarity (with Gaussian blurring) and Linguistic Predictability. Main effects of enhanced signal with increased auditory and visual clarity were observed in overlapping regions of posterior STS. Two-way interactions of the factors (auditory × visual, auditory × predictability) in the neural data were observed outside temporal cortex, where positive signal change in response to clearer facial information and greater semantic predictability was greatest at intermediate levels of auditory clarity. Overall changes in stimulus intelligibility by condition (as determined using an independent behavioural experiment) were reflected in the neural data by increased activation predominantly in bilateral dorsolateral temporal cortex, as well as inferior frontal cortex and left fusiform gyrus. Specific investigation of intelligibility changes at intermediate auditory clarity revealed a set of regions, including posterior STS and fusiform gyrus, showing enhanced responses to both visual and linguistic information. Finally, an individual differences analysis showed that greater comprehension performance in the scanning participants (measured in a post-scan behavioural test) were associated with increased activation in left inferior frontal gyrus and left posterior STS. The current multimodal speech comprehension paradigm demonstrates recruitment of a wide comprehension network in the brain, in which posterior STS and fusiform gyrus form sites for convergence of auditory, visual and linguistic information, while left-dominant sites in temporal and frontal cortex support successful comprehension.
speech; fMRI; auditory cortex; individual differences; noise-vocoding
To investigate, by means of fMRI, the influence of the visual environment in the process of symbolic gesture recognition. Emblems are semiotic gestures that use movements or hand postures to symbolically encode and communicate meaning, independently of language. They often require contextual information to be correctly understood. Until now, observation of symbolic gestures was studied against a blank background where the meaning and intentionality of the gesture was not fulfilled.
Normal subjects were scanned while observing short videos of an individual performing symbolic gesture with or without the corresponding visual context and the context scenes without gestures. The comparison between gestures regardless of the context demonstrated increased activity in the inferior frontal gyrus, the superior parietal cortex and the temporoparietal junction in the right hemisphere and the precuneus and posterior cingulate bilaterally, while the comparison between context and gestures alone did not recruit any of these regions.
These areas seem to be crucial for the inference of intentions in symbolic gestures observed in their natural context and represent an interrelated network formed by components of the putative human neuron mirror system as well as the mentalizing system.
Speech-associated gestures are hand and arm movements that not only convey semantic information to listeners but are themselves actions. Broca’s area has been assumed to play an important role both in semantic retrieval or selection (as part of a language comprehension system) and in action recognition (as part of a “mirror” or “observation–execution matching” system). We asked whether the role that Broca’s area plays in processing speech-associated gestures is consistent with the semantic retrieval/selection account (predicting relatively weak interactions between Broca’s area and other cortical areas because the meaningful information that speech-associated gestures convey reduces semantic ambiguity and thus reduces the need for semantic retrieval/selection) or the action recognition account (predicting strong interactions between Broca’s area and other cortical areas because speech-associated gestures are goal-direct actions that are “mirrored”). We compared the functional connectivity of Broca’s area with other cortical areas when participants listened to stories while watching meaningful speech-associated gestures, speech-irrelevant self-grooming hand movements, or no hand movements. A network analysis of neuroimaging data showed that interactions involving Broca’s area and other cortical areas were weakest when spoken language was accompanied by meaningful speech-associated gestures, and strongest when spoken language was accompanied by self-grooming hand movements or by no hand movements at all. Results are discussed with respect to the role that the human mirror system plays in processing speech-associated movements.
Language; Gesture; Face; The motor system; Premotor cortex; Broca’s area; Pars opercularis; Pars triangularis; Mirror neurons; The human mirror system; Action recognition; Action understanding; Structural equation models
When we talk to one another face-to-face, body gestures accompany our speech. Motion tracking technology enables us to include body gestures in avatar-mediated communication, by mapping one's movements onto one's own 3D avatar in real time, so the avatar is self-animated. We conducted two experiments to investigate (a) whether head-mounted display virtual reality is useful for researching the influence of body gestures in communication; and (b) whether body gestures are used to help in communicating the meaning of a word. Participants worked in pairs and played a communication game, where one person had to describe the meanings of words to the other.
In experiment 1, participants used significantly more hand gestures and successfully described significantly more words when nonverbal communication was available to both participants (i.e. both describing and guessing avatars were self-animated, compared with both avatars in a static neutral pose). Participants ‘passed’ (gave up describing) significantly more words when they were talking to a static avatar (no nonverbal feedback available). In experiment 2, participants' performance was significantly worse when they were talking to an avatar with a prerecorded listening animation, compared with an avatar animated by their partners' real movements. In both experiments participants used significantly more hand gestures when they played the game in the real world.
Taken together, the studies show how (a) virtual reality can be used to systematically study the influence of body gestures; (b) it is important that nonverbal communication is bidirectional (real nonverbal feedback in addition to nonverbal communication from the describing participant); and (c) there are differences in the amount of body gestures that participants use with and without the head-mounted display, and we discuss possible explanations for this and ideas for future investigation.
Space and shape are distinct perceptual categories. In language, perceptual information can also be used to describe abstract semantic concepts like a “rising income” (space) or a “square personality” (shape). Despite being inherently concrete, co-speech gestures depicting space and shape can accompany concrete or abstract utterances. Here, we investigated the way that abstractness influences the neural processing of the perceptual categories of space and shape in gestures. Thus, we tested the hypothesis that the neural processing of perceptual categories is highly dependent on language context. In a two-factorial design, we investigated the neural basis for the processing of gestures containing shape (SH) and spatial information (SP) when accompanying concrete (c) or abstract (a) verbal utterances. During fMRI data acquisition participants were presented with short video clips of the four conditions (cSP, aSP, cSH, aSH) while performing an independent control task. Abstract (a) as opposed to concrete (c) utterances activated temporal lobes bilaterally and the left inferior frontal gyrus (IFG) for both shape-related (SH) and space-related (SP) utterances. An interaction of perceptual category and semantic abstractness in a more anterior part of the left IFG and inferior part of the posterior temporal lobe (pTL) indicates that abstractness strongly influenced the neural processing of space and shape information. Despite the concrete visual input of co-speech gestures in all conditions, space and shape information is processed differently depending on the semantic abstractness of its linguistic context.
iconic gestures; deictic gestures; metaphoric gestures; functional magnetic resonance imaging; speech-associated gestures; cognition