The presence of gesture during speech has been shown to impact perception, comprehension, learning, and memory in normal adults and typically developing children. In neurotypical individuals, the impact of viewing co-speech gestures representing an object and/or action (i.e., iconic gesture) or speech rhythm (i.e., beat gesture) has also been observed at the neural level. Yet, despite growing evidence of delayed gesture development in children with autism spectrum disorders (ASD), few studies have examined how the brain processes multimodal communicative cues occurring during everyday communication in individuals with ASD. Here, we used a previously validated functional magnetic resonance imaging (fMRI) paradigm to examine the neural processing of co-speech beat gesture in children with ASD and matched controls. Consistent with prior observations in adults, typically developing children showed increased responses in right superior temporal gyrus and sulcus while listening to speech accompanied by beat gesture. Children with ASD, however, exhibited no significant modulatory effects in secondary auditory cortices for the presence of co-speech beat gesture. Rather, relative to their typically developing counterparts, children with ASD showed significantly greater activity in visual cortex while listening to speech accompanied by beat gesture. Importantly, the severity of their socio-communicative impairments correlated with activity in this region, such that the more impaired children demonstrated the greatest activity in visual areas while viewing co-speech beat gesture. These findings suggest that although the typically developing brain recognizes beat gesture as communicative and successfully integrates it with co-occurring speech, information from multiple sensory modalities is not effectively integrated during social communication in the autistic brain.
Autism spectrum disorders; fMRI; gesture; language; superior temporal gyrus
Everyday communication is accompanied by visual information from several sources, including co-speech gestures, which provide semantic information listeners use to help disambiguate the speaker’s message. Using fMRI, we examined how gestures influence neural activity in brain regions associated with processing semantic information. The BOLD response was recorded while participants listened to stories under three audiovisual conditions and one auditory-only (speech alone) condition. In the first audiovisual condition, the storyteller produced gestures that naturally accompany speech. In the second, she made semantically unrelated hand movements. In the third, she kept her hands still. In addition to inferior parietal and posterior superior and middle temporal regions, bilateral posterior superior temporal sulcus and left anterior inferior frontal gyrus responded more strongly to speech when it was further accompanied by gesture, regardless of the semantic relation to speech. However, the right inferior frontal gyrus was sensitive to the semantic import of the hand movements, demonstrating more activity when hand movements were semantically unrelated to the accompanying speech. These findings show that perceiving hand movements during speech modulates the distributed pattern of neural activation involved in both biological motion perception and discourse comprehension, suggesting listeners attempt to find meaning, not only in the words speakers produce, but also in the hand movements that accompany speech.
discourse comprehension; fMRI; gestures; semantic processing; inferior frontal gyrus
In a natural setting, speech is often accompanied by gestures. Like language, speech-accompanying iconic gestures convey semantic information to some extent. However, whether comprehension of the information contained in the auditory and visual modalities depends on the same or different brain networks is largely unknown. In this fMRI study, we aimed to identify the cortical areas engaged in supramodal processing of semantic information. BOLD changes were recorded in 18 healthy right-handed male subjects watching video clips showing an actor who either performed speech (S, acoustic) or gestures (G, visual) in more (+) or less (−) meaningful varieties. In the experimental conditions, familiar speech or isolated iconic gestures were presented; during the visual control condition the volunteers watched meaningless gestures (G−), while during the acoustic control condition a foreign language was presented (S−). The conjunction of the visual and acoustic semantic processing revealed activations extending from the left inferior frontal gyrus to the precentral gyrus, and included bilateral posterior temporal regions. We conclude that proclaiming this frontotemporal network the brain's core language system is to take too narrow a view. Our results rather indicate that these regions constitute a supramodal semantic processing network.
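The conjunction described above can be illustrated with a minimum-statistic test, in which a voxel counts as supramodal only if it exceeds threshold in both the speech contrast and the gesture contrast. This is a minimal sketch under stated assumptions: the abstract does not specify which conjunction method or threshold was used, and the function name, threshold, and t-values below are hypothetical.

```python
def conjunction_mask(speech_t, gesture_t, threshold):
    """Minimum-statistic conjunction over two per-voxel contrast maps.

    speech_t  : t-values for meaningful speech versus its control (assumed contrast)
    gesture_t : t-values for meaningful gesture versus its control (assumed contrast)
    Returns one boolean per voxel: True where BOTH contrasts exceed the threshold.
    """
    if len(speech_t) != len(gesture_t):
        raise ValueError("contrast maps must cover the same voxels")
    # A voxel survives only if its weaker effect is still above threshold.
    return [min(s, g) > threshold for s, g in zip(speech_t, gesture_t)]


# Hypothetical t-values for four voxels (illustration only, not study data).
speech_t = [4.0, 2.0, 5.0, 1.0]
gesture_t = [3.5, 4.2, 1.2, 0.5]
mask = conjunction_mask(speech_t, gesture_t, threshold=3.0)
```

Only the first voxel passes here, because it is the only one where both modalities show an effect. A voxel that responds strongly to one modality alone is excluded by design.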
Although stuttering is regarded as a speech-specific disorder, there is a growing body of evidence suggesting that subtle abnormalities in the motor planning and execution of non-speech gestures exist in stuttering individuals. We hypothesized that people who stutter (PWS) would differ from fluent controls in their neural responses during motor planning and execution of both speech and non-speech gestures that had auditory targets. Using fMRI with sparse sampling, separate BOLD responses were measured for perception, planning, and fluent production of speech and non-speech vocal tract gestures. During both speech and non-speech perception and planning, PWS had less activation in the frontal and temporoparietal regions relative to controls. During speech and non-speech production, PWS had less activation than the controls in the left superior temporal gyrus (STG) and the left pre-motor areas (BA 6) but greater activation in the right STG, bilateral Heschl’s gyrus (HG), insula, putamen, and precentral motor regions (BA 4). Differences in brain activation patterns between PWS and controls were greatest in the females and less apparent in males. In conclusion, similar differences in PWS from the controls were found during speech and non-speech; during perception and planning they had reduced activation while during production they had increased activity in the auditory area on the right and decreased activation in the left sensorimotor regions. These results demonstrated that neural activation differences in PWS are not speech-specific.
Stuttering; Speech perception; planning; production; non-speech; functional magnetic resonance imaging (fMRI); auditory-motor interaction; forward model
Speakers convey meaning not only through words, but also through gestures. Although children are exposed to co-speech gestures from birth, we do not know how the developing brain comes to connect meaning conveyed in gesture with speech. We used functional magnetic resonance imaging (fMRI) to address this question and scanned 8- to 11-year-old children and adults listening to stories accompanied by hand movements, either meaningful co-speech gestures or meaningless self-adaptors. When listening to stories accompanied by both types of hand movements, both children and adults recruited inferior frontal, inferior parietal, and posterior temporal brain regions known to be involved in processing language not accompanied by hand movements. There were, however, age-related differences in activity in posterior superior temporal sulcus (STSp), inferior frontal gyrus, pars triangularis (IFGTr), and posterior middle temporal gyrus (MTGp) regions previously implicated in processing gesture. Both children and adults showed sensitivity to the meaning of hand movements in IFGTr and MTGp, but in different ways. Finally, we found that hand movement meaning modulates interactions between STSp and other posterior temporal and inferior parietal regions for adults, but not for children. These results shed light on the developing neural substrate for understanding meaning contributed by co-speech gesture.
How does gesturing help children learn? Gesturing might encourage children to extract meaning implicit in their hand movements. If so, children should be sensitive to the particular movements they produce and learn accordingly. Alternatively, all that may matter is that children move their hands. If so, they should learn regardless of which movements they produce. To investigate these alternatives, we manipulated gesturing during a math lesson. We found that children required to produce correct gestures learned more than children required to produce partially correct gestures, who learned more than children required to produce no gestures. This effect was mediated by whether children took information conveyed solely in their gestures and added it to their speech. The findings suggest that body movements are involved not only in processing old ideas, but also in creating new ones. We may be able to lay foundations for new knowledge simply by telling learners how to move their hands.
Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are integrated systems. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention.
language; syntax; audiovisual; P600; ambiguity
Evidence is reviewed for the existence of a core system for moment-to-moment social communication that is based on the perception of dynamic gestures and other social perceptual processes in the temporal-parietal occipital junction (TPJ), including the posterior superior temporal sulcus (PSTS) and surrounding regions. Overactivation of these regions may produce the schizophrenic syndrome. The TPJ plays a key role in the perception and production of dynamic social, emotional, and attentional gestures for the self and others. These include dynamic gestures of the body, face, and eyes as well as audiovisual speech and prosody. Many negative symptoms are characterized by deficits in responding within these domains. Several properties of this system have been discovered through single neuron recording, brain stimulation, neuroimaging, and the study of neurological impairment. These properties map onto the schizophrenic syndrome. The representation of dynamic gestures is multimodal (auditory, visual, and tactile), matching the predominant hallucinatory categories in schizophrenia. Inherent in the perceptual signal of gesture representation is a computation of intention, agency, and anticipation or expectancy (for the self and others). The neurons are also tuned or biased to rapidly detect threat-related emotions. I review preliminary evidence that overactivation of this system can result in schizophrenia.
To investigate, by means of fMRI, the influence of the visual environment in the process of symbolic gesture recognition. Emblems are semiotic gestures that use movements or hand postures to symbolically encode and communicate meaning, independently of language. They often require contextual information to be correctly understood. Until now, observation of symbolic gestures was studied against a blank background where the meaning and intentionality of the gesture was not fulfilled.
Normal subjects were scanned while observing short videos of an individual performing symbolic gestures with or without the corresponding visual context, as well as the context scenes without gestures. The comparison between gestures regardless of the context demonstrated increased activity in the inferior frontal gyrus, the superior parietal cortex and the temporoparietal junction in the right hemisphere and the precuneus and posterior cingulate bilaterally, while the comparison between context and gestures alone did not recruit any of these regions.
These areas seem to be crucial for the inference of intentions in symbolic gestures observed in their natural context and represent an interrelated network formed by components of the putative human mirror neuron system as well as the mentalizing system.
The dual-route model of speech processing includes a dorsal stream that maps auditory to motor features at the sublexical level rather than at the lexico-semantic level. However, the literature on gesture is an invitation to revise this model because it suggests that the premotor cortex of the dorsal route is a major site of lexico-semantic interaction. Here we investigated lexico-semantic mapping using word-gesture pairs that were either congruent or incongruent. Using fMRI-adaptation in 28 subjects, we found that temporo-parietal and premotor activity during auditory processing of single action words was modulated by the prior audiovisual context in which the words had been repeated. The BOLD signal was suppressed following repetition of the auditory word alone, and further suppressed following repetition of the word accompanied by a congruent gesture (e.g. [“grasp” + grasping gesture]). Conversely, repetition suppression was not observed when the same action word was accompanied by an incongruent gesture (e.g. [“grasp” + sprinkle]). We propose a simple model to explain these results: auditory and visual information converge onto premotor cortex where it is represented in a comparable format to determine (in)congruence between speech and gesture. This ability of the dorsal route to detect audiovisual semantic (in)congruence suggests that its function is not restricted to the sublexical level.
Gesturing is ubiquitous in communication and serves an important function for listeners, who are able to glean meaningful information from the gestures they see. But gesturing also functions for speakers, whose own gestures reduce demands on their working memory. Here we ask whether gesture’s beneficial effects on working memory stem from its properties as a rhythmic movement, or as a vehicle for representing meaning. We asked speakers to remember letters while explaining their solutions to math problems and producing varying types of movements. Speakers recalled significantly more letters when producing movements that coordinated with the meaning of the accompanying speech, i.e., when gesturing, than when producing meaningless movements or no movement. The beneficial effects that accrue to speakers when gesturing thus seem to stem not merely from the fact that their hands are moving, but from the fact that their hands are moving in coordination with the content of speech.
When people talk to each other, they often make arm and hand movements that accompany what they say. These manual movements, called “co-speech gestures,” can convey meaning by way of their interaction with the oral message. Another class of manual gestures, called “emblematic gestures” or “emblems,” also conveys meaning, but in contrast to co-speech gestures, they can do so directly and independent of speech. There is currently significant interest in the behavioral and biological relationships between action and language. Since co-speech gestures are actions that rely on spoken language, and emblems convey meaning to the effect that they can sometimes substitute for speech, these actions may be important, and potentially informative, examples of language–motor interactions. Researchers have recently been examining how the brain processes these actions. The current results of this work do not yet give a clear understanding of gesture processing at the neural level. For the most part, however, it seems that two complementary sets of brain areas respond when people see gestures, reflecting their role in disambiguating meaning. These include areas thought to be important for understanding actions and areas ordinarily related to processing language. The shared and distinct responses across these two sets of areas during communication are just beginning to emerge. In this review, we talk about the ways that the brain responds when people see gestures, how these responses relate to brain activity when people process language, and how these might relate in normal, everyday communication.
gesture; language; brain; meaning; action understanding; fMRI
When we talk to one another face-to-face, body gestures accompany our speech. Motion tracking technology enables us to include body gestures in avatar-mediated communication, by mapping one's movements onto one's own 3D avatar in real time, so the avatar is self-animated. We conducted two experiments to investigate (a) whether head-mounted display virtual reality is useful for researching the influence of body gestures in communication; and (b) whether body gestures are used to help in communicating the meaning of a word. Participants worked in pairs and played a communication game, where one person had to describe the meanings of words to the other.
In experiment 1, participants used significantly more hand gestures and successfully described significantly more words when nonverbal communication was available to both participants (i.e. both describing and guessing avatars were self-animated, compared with both avatars in a static neutral pose). Participants ‘passed’ (gave up describing) significantly more words when they were talking to a static avatar (no nonverbal feedback available). In experiment 2, participants' performance was significantly worse when they were talking to an avatar with a prerecorded listening animation, compared with an avatar animated by their partners' real movements. In both experiments participants used significantly more hand gestures when they played the game in the real world.
Taken together, the studies show how (a) virtual reality can be used to systematically study the influence of body gestures; (b) it is important that nonverbal communication is bidirectional (real nonverbal feedback in addition to nonverbal communication from the describing participant); and (c) there are differences in the amount of body gestures that participants use with and without the head-mounted display, and we discuss possible explanations for this and ideas for future investigation.
Although the linguistic structure of speech provides valuable communicative information, nonverbal behaviors can offer additional, often disambiguating cues. In particular, being able to see the face and hand movements of a speaker facilitates language comprehension. But how does the brain derive meaningful information from these movements? Mouth movements provide information about phonological aspects of speech [2–3]. In contrast, cospeech gestures display semantic information relevant to the intended message [4–6]. We show that when language comprehension is accompanied by observable face movements, there is strong functional connectivity between areas of cortex involved in motor planning and production and posterior areas thought to mediate phonological aspects of speech perception. In contrast, language comprehension accompanied by cospeech gestures is associated with tuning of and strong functional connectivity between motor planning and production areas and anterior areas thought to mediate semantic aspects of language comprehension. These areas are not tuned to hand and arm movements that are not meaningful. Results suggest that when gestures accompany speech, the motor system works with language comprehension areas to determine the meaning of those gestures. Results also suggest that the cortical networks underlying language comprehension, rather than being fixed, are dynamically organized by the type of contextual information available to listeners during face-to-face communication.
Humans produce hand movements to manipulate objects, but also make hand movements to convey socially relevant information to one another. The mirror neuron system (MNS) is activated during the observation and execution of actions. Previous neuroimaging experiments have identified the inferior parietal lobule (IPL) and frontal operculum as parts of the human MNS. Although experiments have suggested that object-directed hand movements drive the MNS, it is not clear whether communicative hand gestures that do not involve an object are effective stimuli for the MNS. Furthermore, it is unknown whether there is differential activation in the MNS for communicative hand gestures and object-directed hand movements. Here we report the results of a functional magnetic resonance imaging (fMRI) experiment in which participants viewed, imitated and produced communicative hand gestures and object-directed hand movements. The observation and execution of both types of hand movements activated the MNS to a similar degree. These results demonstrate that the MNS is involved in the observation and execution of both communicative hand gestures and object-directed hand movements.
action; fMRI; mirror neurons; non-verbal communication; STS
In numerous experimental contexts, gesturing has been shown to lighten a speaker’s cognitive load. However, in all of these experimental paradigms, the gestures have been directed to items in the ‘here-and-now’. This study attempts to generalize gesture’s ability to lighten cognitive load. We demonstrate here that gesturing continues to confer cognitive benefits when speakers talk about objects that are not present, and therefore cannot be directly indexed by gesture. These findings suggest that gesturing confers its benefits by more than simply tying abstract speech to the objects directly visible in the environment. Moreover, we show that the cognitive benefit conferred by gesturing is greater when novice learners produce gestures that add to the information expressed in speech than when they produce gestures that convey the same information as speech, suggesting that it is gesture’s meaningfulness that gives it the ability to affect working memory load.
gesture; working memory
We explored how speakers and listeners use hand gestures as a source of perceptual-motor information during naturalistic communication. After solving the Tower of Hanoi task either with real objects or on a computer, speakers explained the task to listeners. Speakers' hand gestures, but not their speech, reflected properties of the particular objects and the actions that they had previously used to solve the task. Speakers who solved the problem with real objects used more grasping handshapes and produced more curved trajectories during the explanation. Listeners who observed explanations from speakers who had previously solved the problem with real objects subsequently treated computer objects more like real objects; their mouse trajectories revealed that they lifted the objects in conjunction with moving them sideways, and this behavior was related to the particular gestures that were observed. These findings demonstrate that hand gestures are a reliable source of perceptual-motor information during human communication.
The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and non-speech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed localizers for the fusiform face area (FFA), the lateral occipital complex (LOC), and the visual motion (V5/MT) regions of interest (ROIs). The FFA, the LOC, and V5/MT were significantly less activated for speech relative to non-speech and control stimuli. Distinct activation of the posterior superior temporal sulcus and the adjacent middle temporal gyrus to speech, independent of media, was obtained in group analyses. Individual analyses showed that speech and non-speech stimuli were associated with adjacent but different activations, with the speech activations more anterior. We suggest that the speech activation area is the temporal visual speech area (TVSA), and that it can be localized with the combination of stimuli used in this study.
visual perception; speech perception; functional magnetic resonance imaging; lipreading; speechreading; phonetics; gestures; temporal lobe; frontal lobe; parietal lobe
Recent research shows that our actions can influence how we think. A separate body of research shows that the gestures we produce when we speak can also influence how we think. Here we bring these two literatures together to explore whether gesture has an impact on thinking by virtue of its ability to reflect real-world actions. We first argue that gestures contain detailed perceptual-motor information about the actions they represent, information often not found in the speech that accompanies the gestures. We then show that the action features in gesture do not just reflect the gesturer’s thinking—they can feed back and alter that thinking. Gesture actively brings action into a speaker’s mental representations, and those mental representations then affect behavior—at times more powerfully than the actions on which the gestures are based. Gesture thus has the potential to serve as a unique bridge between action and abstract thought.
Visual speech (lip-reading) influences the perception of heard speech. The literature suggests at least two possible mechanisms for this influence: “direct” sensory-sensory interaction, whereby sensory signals from auditory and visual modalities are integrated directly, likely in the superior temporal sulcus, and “indirect” sensory-motor interaction, whereby visual speech is first mapped onto motor-speech representations in the frontal lobe, which in turn influences sensory perception via sensory-motor integration networks. We hypothesize that both mechanisms exist, and further that lip-reading functional activations of Broca’s region and the posterior planum temporale reflect the sensory-motor mechanism. We tested one prediction of this hypothesis using fMRI. We assessed whether viewing visual speech (contrasted with facial gestures) activates the same network as a speech sensory-motor integration task (listen to and then silently rehearse speech). Both tasks activated locations within Broca’s area, dorsal premotor cortex, and the posterior planum temporale (Spt), and focal regions of the STS, all of which have previously been implicated in sensory-motor integration for speech. This finding is consistent with the view that visual speech influences heard speech via sensory-motor networks. Lip-reading also activated a much wider network in the superior temporal lobe than the sensory-motor task, possibly reflecting a more direct cross-sensory integration network.
In order to produce a coherent narrative, speakers must identify the characters in the tale so that listeners can figure out who is doing what to whom. This paper explores whether speakers use gesture, as well as speech, for this purpose. English speakers were shown vignettes of two stories and asked to retell the stories to an experimenter. Their speech and gestures were transcribed and coded for referent identification. A gesture was considered to identify a referent if it was produced in the same location as the previous gesture for that referent. We found that speakers frequently used gesture location to identify referents. Interestingly, however, they used gesture most often to identify referents that were also uniquely specified in speech. Lexical specificity in referential expressions in speech thus appears to go hand-in-hand with specification in referential expressions in gesture.
Gesture; Referential expression; Communication; Discourse
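The coding criterion stated in the abstract above (a gesture identifies a referent if it is produced in the same location as the previous gesture for that referent) can be sketched as a simple pass over a transcript. This is an illustrative sketch only: the abstract does not describe how "same location" was operationalized, so the discrete location labels, the function name, and the toy transcript are assumptions.

```python
def code_referent_identification(gestures):
    """Apply the location-reuse criterion to an ordered gesture transcript.

    gestures : list of (referent, location) pairs in order of production.
               Locations are discrete labels here, which is an assumption.
    Returns one boolean per gesture: True if it reuses the location of the
    most recent earlier gesture for the same referent.
    """
    last_location = {}  # referent -> location of its most recent gesture
    coded = []
    for referent, location in gestures:
        seen_before = referent in last_location
        coded.append(seen_before and last_location[referent] == location)
        last_location[referent] = location
    return coded


# Hypothetical transcript of five gestures about two story characters.
transcript = [
    ("cat", "left"),
    ("mouse", "right"),
    ("cat", "left"),      # same spot as the previous "cat" gesture
    ("mouse", "center"),  # the mouse gesture moved, so no identification
    ("mouse", "center"),  # reuses the new spot
]
codes = code_referent_identification(transcript)
```

The first gesture for any referent is never coded as identifying, since there is no earlier location to match.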
When people are asked to perform actions, they remember those actions better than if they are asked to talk about the same actions. But when people talk, they often gesture with their hands, thus adding an action component to talking. The question we asked in this study was whether producing gesture along with speech makes the information encoded in that speech more memorable than it would have been without gesture. We found that gesturing during encoding led to better recall, even when the amount of speech produced during encoding was controlled. Gesturing during encoding improved recall whether the speaker chose to gesture spontaneously or was instructed to gesture. Thus, gesturing during encoding seems to function like action in facilitating memory.
This paper explores how the test-retest reliability is modulated by different groups of participants and experimental tasks. A group of 12 healthy participants and a group of nine stroke patients performed the same language imaging experiment twice, test and retest, on different days. The experiment consists of four conditions, one audio condition and three audiovisual conditions in which the hands are either resting, gesturing, or performing self-adaptive movements. Imaging data were analyzed using multiple linear regression and the results were further used to generate receiver operating characteristic (ROC) curves for each condition for each individual subject. By using area under the curve as a comparison index, we found that stroke patients have less reliability across time than healthy participants, and that when the participants gesture during speech, their imaging data are more reliable than when they are performing hand movements that are not speech-associated. Furthermore, inter-subject variability is less in the gesture task than in any of the other three conditions for healthy participants, but not for stroke patients.
Language; Brain; BOLD; Brain imaging; Test-retest reliability; Receiver operating characteristic (ROC) curves; fMRI; Neurological disease
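The reliability index named in the abstract above, area under the ROC curve, can be computed directly from paired scores and labels. The sketch below is not the authors' pipeline. It assumes, for illustration, that voxels labelled active in one session serve as the reference and that the other session's statistic serves as the score. The function name and toy values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity.

    Equals the probability that a randomly chosen positive item scores
    higher than a randomly chosen negative item, with ties counted as half.
    """
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data for six voxels: t-values from the test session and
# whether each voxel was labelled active in the retest session.
test_t = [3.1, 2.4, 0.3, 1.9, -0.5, 0.8]
retest_active = [True, False, False, True, False, True]
auc = roc_auc(test_t, retest_active)
```

An AUC near 1 would mean the test-session statistic ranks retest-active voxels above inactive ones almost perfectly, while a value near 0.5 would indicate chance-level agreement between sessions.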
The verbal transformation effect (VTE) refers to perceptual switches while listening to a speech sound repeated rapidly and continuously. It is a specific case of perceptual multistability providing a rich paradigm for studying the processes underlying the perceptual organization of speech. While the VTE has been mainly considered as a purely auditory effect, this paper presents a review of recent behavioural and neuroimaging studies investigating the role of perceptuo-motor interactions in the effect. Behavioural data show that articulatory constraints and visual information from the speaker's articulatory gestures can influence verbal transformations. In line with these data, functional magnetic resonance imaging and intracranial electroencephalography studies demonstrate that articulatory-based representations play a key role in the emergence and the stabilization of speech percepts during a verbal transformation task. Overall, these results suggest that perceptuo (multisensory)-motor processes are involved in the perceptual organization of speech and the formation of speech perceptual objects.
multistable perception of speech; verbal transformation effect; speech scene analysis; speech perception; audiovisual speech; multisensory
Manual gestures occur on a continuum from co-speech gesticulations to conventionalized emblems to language signs. Our goal in the present study was to understand the neural bases of the processing of gestures along such a continuum. We studied four types of gestures, varying along linguistic and semantic dimensions: linguistic and meaningful American Sign Language (ASL), non-meaningful pseudo-ASL, meaningful emblematic, and nonlinguistic, non-meaningful made-up gestures. Pre-lingually deaf, native signers of ASL participated in the fMRI study and performed two tasks while viewing videos of the gestures: a visuo-spatial (identity) discrimination task and a category discrimination task. We found that the categorization task activated left ventral middle and inferior frontal gyrus, among other regions, to a greater extent compared to the visual discrimination task, supporting the idea of semantic-level processing of the gestures. The reverse contrast resulted in enhanced activity of bilateral intraparietal sulcus, supporting the idea of featural-level processing (analogous to phonological-level processing of speech sounds) of the gestures. Regardless of the task, we found that brain activation patterns for the nonlinguistic, non-meaningful gestures were the most different compared to the ASL gestures. The activation patterns for the emblems were most similar to those of the ASL gestures and those of the pseudo-ASL were most similar to the nonlinguistic, non-meaningful gestures. The fMRI results provide partial support for the conceptualization of different gestures as belonging to a continuum and the variance in the fMRI results was best explained by differences in the processing of gestures along the semantic dimension.
American Sign Language; gestures; Deaf; visual processing; categorization; linguistic; brain; fMRI