Few studies have attempted to characterize the brain’s response to concurrently and spontaneously produced gesture and speech. We hypothesized that neural responses to natural, rhythmic gesture accompanying speech would be observed not only in visual cortex but also in STG and STS, areas well known for their role in speech processing. This hypothesis was guided by research on iconic gestures and deaf signers indicating that STG/S plays a role in processing movement. Additional cues were provided by studies on visual speech showing STG/S to be crucially involved in audiovisual integration of speech with accompanying mouth movement. Supporting our hypothesis, bilateral posterior STG/S (including PT) responses were significantly greater when subjects listened to speech accompanied by beat gesture than when they listened to speech accompanied by a still body. Further, left anterior STG/S responses were significantly greater when listening to speech accompanied by beat gesture than when listening to speech accompanied by a control movement (i.e., nonsense hand movement). Finally, right posterior STG/S showed increased responses only to beat gesture presented in the context of speech, and not to beat gesture presented alone, suggesting a possible role in the multisensory integration of gesture and speech. Related research on biological motion, deaf signers, visual speech, and iconic gesture highlights the significance of the current data.
As would be expected, canonical speech perception regions in STG/S showed increased bilateral activity while subjects heard speech accompanied by a still body or speech accompanied by beat gesture. Importantly, when these two conditions were compared directly, responses in the posterior portion of bilateral STG (including PT) were significantly greater when speech was accompanied by beat gesture. These data provide further support for STG/S as a polysensory area, as was originally suggested by studies in macaque monkeys (Desimone & Gross, 1979; Bruce et al., 1981; Padberg et al., 2003). Neuroimaging data have shown that STG/S, especially its posterior portion, is responsive to both visual and auditory input in humans. Studies in hearing and nonhearing signers strongly implicate the posterior temporal gyrus in language-related processing, regardless of whether the language input is auditory or visual in nature (MacSweeney et al., 2004). Most recently, Holle et al. (2007) reported that the posterior portion of the left STS showed increased activity for viewing iconic gestures as compared to viewing grooming-related hand movements. Wilson et al. (2007) found a greater degree of intersubject correlation in the right STS when subjects viewed an entire body (e.g., head, face, hands, and torso) producing natural speech than when they heard speech alone. STG/S has also been shown to be more active while listening to speech accompanied by a moving mouth than while listening to speech accompanied by a still mouth (Calvert et al., 1997). Interestingly, the stimuli in these studies may all be said to carry communicative intent, suggesting that the degree of STG/S involvement may be modulated by the viewer’s perception of the stimuli as potentially communicative. Such a characteristic of STG/S would be congruent with Kelly et al.’s (2006) finding that the central N400 effect (i.e., a response known to occur when incongruent stimuli are presented) can be eliminated when subjects know that gesture and speech are being produced by different speakers.
It is important to distinguish between the posterior portion of STG/S and STSp (the posterior superior temporal sulcus). The latter is a much-discussed area in the study of biological motion (for review, see Blake and Shiffrar, 2006), as STSp has consistently shown increased activity for viewing point-light representations of biological motion (Grossman et al., 2004; Grossman and Blake, 2002). Qualitative comparisons suggest that silently viewing beat gesture (versus a still body) leads to increased activity in the vicinity of STSp (right hemisphere), as reported by Grossman et al. (2004), Grossman and Blake (2002), and Bidet-Caulet et al. (2005). Significant increases for speech-accompanied beat gesture over a speech-accompanied still body, however, lie anterodorsal to STSp. That is, speechless biological motion versus a still body yields significant increases in regions known to underlie processing of biological motion, but, when accompanied by speech, biological motion versus a still body yields significant increases in an area dorsal and anterior to that identified by biological motion localizers. Once again, this suggests that the intent to participate in a communicative exchange (e.g., listening to speech) is a crucial determinant of how movement is processed. The idea that perception of gesture can be altered by the presence or absence of speech complements behavioral findings on gesture production, where it has been shown that the presence of speech impacts what is conveyed through gesture (So et al., 2005).
We would like to suggest that processing of movement may, in many cases, be context-driven. Rather than being handled by canonical biological motion regions, speech-accompanying movement may be processed differently when it is interpreted (consciously or unconsciously) as having communicative intent. We are not the first to suggest that, in the case of language, the brain may not break sensory input down into its smallest units and then build meaning from those pieces. In a survey of speech perception studies, Indefrey and Cutler (2004) found that regions that are active while listening to single phonemes are not necessarily active while listening to a speech stream. Hence, it appears that the brain does not break the speech stream down into its component parts in order to extract meaning; instead, the context in which the phonemes are received (e.g., words, sentences) determines neural activity. We suggest that this may be the case for biological motion as well: biological motion with and without speech may be processed differently due to the contextualization afforded by speech.
Returning to activity within STG/S for the contrast of speech accompanied by beat gesture versus speech accompanied by a still body, it is notable that STG/S activity for this contrast includes the planum temporale (PT) bilaterally. Within this study, PT has emerged as a potentially critical site for the integration of beat gesture and speech. When responses to the co-presentation of speech and beat gesture were contrasted with responses to the unimodal presentation of speech (with a still body) and of beat gesture (without speech), the right PT was identified as a putative site of gesture and speech integration.
In other words, beat gesture had no effect on right PT responses in the absence of speech; in the presence of speech, however, beat gesture resulted in a reliable signal increase in this region.
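One way to express this response pattern schematically (a hedged sketch of the selection logic described above, not the exact statistical model used in the analysis) is in terms of condition-wise response estimates $\beta$ for a candidate integration site such as right PT:
\[
\underbrace{\beta_{\text{speech}+\text{gesture}} - \beta_{\text{speech}+\text{still}}}_{\text{gesture effect with speech}} > 0
\qquad \text{and} \qquad
\underbrace{\beta_{\text{gesture alone}} - \beta_{\text{baseline}}}_{\text{gesture effect without speech}} \approx 0 .
\]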
Significant activity in bilateral PT (as well as inferior, middle, and superior temporal gyri) was observed by MacSweeney et al. (2004) while hearing nonsigners viewed blocks of British Sign Language and Tic Tac (a communicative code used by racecourse bettors). We observed no activity in planum temporale for either beat gesture or nonsense hand movements (which are based on ASL signs) when viewed without speech. MacSweeney et al. (2004), in addition to including a highly animated face in their stimuli, informed participants that the stimuli would be communicative and asked them to judge which strings of movements were incorrect. Thus, the participants had several cues indicating that they should search for meaning in the hand movements. In the current study, participants had no explicit instruction to assign meaning to the hand movement. Increased activity in planum temporale was observed only when beat gesture was accompanied by speech and not when beat gesture was presented silently. Hence, it appears that PT activity in particular is modulated by imbuing movement with the potential to convey meaning.
Considering what is known about PT activity, it is likely that beat gesture establishes meaning through its connection to speech prosody. PT has been shown to process meaningful prosodic and melodic input, as significantly greater activity has been observed in this area for producing or perceiving song melody versus speech (Callan et al., 2006; Saito et al., 2006) and for listening to speech with strong prosodic cues (Meyer et al., 2004). Greater activity in PT has also been observed for listening to music with salient metrical rhythm (Chen et al., 2006), processing pitch modulations (Warren et al., 2005; Barrett & Hall, 2006), singing versus speaking, and synchronized production of song lyrics (Saito et al., 2006). The observed right lateralization of multisensory responses for beat gesture and speech may be a further reflection of the link between speech prosody and beat gesture (Krahmer and Swerts, in press). Numerous fMRI, neurophysiological, and lesion studies have demonstrated strong right hemisphere involvement in processing speech prosody (for review, see Kotz et al., 2006). Along these lines, it has also been suggested that the right hemisphere is better suited for musical processing (Zatorre et al., 2002).
Our findings both confirm the role of PT in processing rhythmic aspects of speech and suggest that this region also plays a pivotal role in processing speech-accompanying gesture. This warrants future work to determine the degree to which PT responses may be modulated by temporal synchrony between beat gesture and speech. Additionally, further studies will be necessary to determine the impact of beat gesture in the presence of other speech-accompanying movement (e.g., head and mouth movement). In order to begin to investigate the neural correlates of beat gesture independently of other types of speech-accompanying movement, the current study recreated environmental conditions in which gesture is the only speech-accompanying movement that can be perceived (e.g., viewing a speaker whose face is blocked by an environmental obstacle, or viewing, from the back of a large auditorium, a speaker whose face is barely visible).
Whereas the contrast of beat gesture with speech versus still body with speech showed significant increases in bilateral posterior areas of STG/S, the contrast of beat gesture with speech versus nonsense hand movement with speech showed significant increases in left anterior areas of STG/S. In light of the role of left anterior STG/S in speech intelligibility (Scott et al., 2000; Davis and Johnsrude, 2003), these data suggest that natural beat gesture may impact speech processing at a number of stages. Humphries et al. (2005) found that the left posterior temporal lobe was most sensitive to speech prosody. It may be the case that beat gesture focuses viewers’ attention on speech prosody, which, in turn, leads to increased intelligibility and comprehension. Considering that responses to speech-accompanied beat gesture and nonsense hand movement do not differ significantly within right PT, the synchronicity of beat gesture (or the asynchronicity of the random movements) may contribute to the differential responses observed in anterior temporal cortex when listening to speech accompanied by these two types of movement.
Willems and Hagoort (2007) have suggested that the link between language and gesture stems from a more general interplay between language and action. Perhaps attesting to this interplay, no regions other than anterior STG/S were more active for speech with beat gesture than for speech with nonsense hand movement. The stimuli and design of the present study were also significantly different from those of another recent study, which showed increased responses in Broca’s area for gesture-word mismatches (Willems et al., 2006). Willems et al.’s findings are complementary to those of the current study in that we investigated responses to gesture carrying very little semantic information, whereas Willems et al. examined the impact of semantic incongruency between gesture and speech.
Besides posterior temporal regions, we also observed greater activity for speech with beat gesture (as compared to speech with a still body) in bilateral premotor cortices and inferior parietal regions. This may reflect activation of the “mirror neuron system” (for review, see Rizzolatti & Craighero, 2004, and Iacoboni & Dapretto, 2006), whereby regions responsible for action execution (in this case, gesture production) are thought likewise to be involved in action observation. Wilson et al. (2007) also reported bilateral premotor activity for audiovisual speech (but not for audio-only speech), although this activity was ventral to that observed in the present study and did not reach significance when the audiovisual and audio-only conditions were compared directly. This difference in localization might reflect the fact that, unlike the stimuli used in the current study, the stimuli used by Wilson and colleagues showed the speaker’s head, face, and speech articulators in full view (note that hand muscles are represented dorsal to head and face muscles within premotor cortex).
An important area for the processing of our ASL-derived nonsense hand movement was the parietal cortex. Parietal activity was consistently observed when beat gesture and nonsense hand movement (both with and without speech) were compared to baseline. In addition, parietal activity was significantly greater both when viewing nonsense hand movement accompanied by speech (as compared to viewing beat gesture accompanied by speech) and when viewing nonsense hand movement without speech (as compared to viewing beat gesture without speech). Interestingly, Emmorey et al. (2004) have identified parietal activity as crucial to the production of sign language. Considering that neither our subjects nor the woman appearing in our stimuli spoke or understood ASL, our data suggest that parietal regions may be optimized for perceiving the types of movement used in ASL.
To conclude, our findings of increased activity in posterior STG/S (including PT) for beat gesture with speech indicate that canonical speech perception areas in temporal cortices may process and integrate not only auditory but also visual cues during speech perception. Additionally, our finding that activity in anterior STG/S is impacted by speech-accompanying beat gesture suggests differential but intertwined roles for the anterior and posterior sections of STG/S during speech perception, with anterior areas showing greater effects when speech intelligibility is enhanced and posterior areas showing greater effects when multimodal input is present. In line with extensive research showing that speech-accompanying gesture impacts social communication (e.g., McNeill, 1992) and evidence of a close link between hand action and language (for review, see Willems and Hagoort, 2007), our findings highlight the important role of multiple sensory modalities in communicative contexts.