Hear Res. Author manuscript; available in PMC 2010 December 1.
Published in final edited form as:
PMCID: PMC2787678

The multisensory roles for auditory cortex in primate vocal communication


It is widely accepted that human speech is fundamentally a multisensory behavior, with face-to-face communication perceived through both the visual and auditory channels. Such multisensory speech perception is evident even at the earliest stages of human cognitive development (Gogate et al., 2001; Patterson et al., 2003); its integration across the two modalities is ubiquitous and automatic (McGurk et al., 1976), and at the neural level, audiovisual speech integration occurs at the ‘earliest’ stages of cortical processing (Ghazanfar et al., 2006a). Indeed, there are strong arguments which suggest that multisensory speech is the primary mode of speech perception and is not a capacity that is “piggybacked” on to auditory speech perception (Rosenblum, 2005). This implies that the perceptual mechanisms, neurophysiology and evolution of speech perception are based on primitives which are not tied to a single sensory modality (Romanski et al., 2009). The essence of these ideas is shared by many investigators in the domain of perception (Fowler, 2004; Liberman et al., 1985; Meltzoff et al., 1997), but has only caught on belatedly for those of us who study the auditory cortex.

The auditory cortex of primates consists of numerous fields (Hackett et al., 1998; Petkov et al., 2006), which in turn are connected to numerous auditory-related areas in the frontal and temporal lobes (Romanski et al., 2009). At least a subset of these fields appears to be homologous across monkeys and great apes (including humans) (Hackett et al., 2001). These fields are delineated largely by their tonotopic organization and anatomical criteria. The reasons why there are so many areas are not known, and how each of them, together or separately, relates to behavior is also somewhat of a mystery. That they must be involved in multiple auditory-related behaviors is a given. The fundamental question is thus: how do these multiple auditory areas mediate specific behaviors through their interactions with each other and with other sensory and motor systems?

The question posed above has an underlying assumption: that the different auditory cortical fields do not each have a fixed function deployed in all types of auditory-related behavior. Rather, their roles (or weights) as nodes in a larger network change according to the specific behavior being mediated. In this review, I focus on one behavior, vocal communication, to illustrate how multiple fields of auditory cortex in non-human primates (hereafter, primates) may play a role in the perception and production of this multisensory behavior (see Kayser et al., this volume, for a review of auditory cortical organization and its relationship to visual and somatosensory inputs). I will begin by briefly presenting evidence that primates do indeed link visual and auditory communication signals, and then describe how such perception may be mediated by the auditory cortex through its connections with association areas. I will then speculate as to how proprioceptive, somatosensory and motor inputs into auditory cortex also play a role in vocal communication. Finally, I will conclude with the suggestion that cytoarchitecture and tonotopy represent one way of defining auditory cortical organization, and that there may be behavior-specific functional organizations that do not fall neatly within the delineation of auditory cortical fields by these methods. Identifying specific behaviors a priori is the key to illuminating such putative organizational schemes.

Monkeys link facial expressions to vocal expressions

Human and primate vocalizations are produced by coordinated movements of the lungs, larynx (vocal folds), and the supralaryngeal vocal tract (Fitch et al., 1995; Ghazanfar et al., 2008a). The vocal tract consists of the column of air derived from the pharynx, mouth and nasal cavity. In humans, speech-related vocal tract motion results in the predictable deformation of the face around the oral aperture and other parts of the face (Jiang et al., 2002; Yehia et al., 1998; Yehia et al., 2002). For example, human adults automatically link high-pitched sounds to facial postures producing an /i/ sound and low-pitched sounds to faces producing an /a/ sound (Kuhl et al., 1991). In primate vocal production, there is a similar link between acoustic output and facial dynamics. Different macaque monkey vocalizations are produced with unique lip configurations and mandibular positions and the motion of such articulators influences the acoustics of the signal (Hauser et al., 1994; Hauser et al., 1993). Coo calls, like /u/ in speech, are produced with the lips protruded, while screams, like the /i/ in speech, are produced with the lips retracted (Figure 1). Thus, it is likely that many of the facial motion cues that humans use for speech-reading are present in other primates as well.

Figure 1
Exemplars of the facial expressions produced concomitantly with vocalizations. A. Rhesus monkey coo and scream calls taken at the midpoint of the expressions with their corresponding spectrograms.

Given that both humans and other extant primates use both facial and vocal expressions as communication signals, it is perhaps not surprising that many primates other than humans recognize the correspondence between the visual and auditory components of vocal signals. Macaque monkeys (Macaca mulatta), capuchins (Cebus apella) and chimpanzees (Pan troglodytes) all recognize auditory-visual correspondences between their various vocalizations (Evans et al., 2005; Ghazanfar et al., 2003; Izumi et al., 2004; Parr, 2004). For example, rhesus monkeys readily match the facial expressions of ‘coo’ and ‘threat’ calls with their associated vocal components (Ghazanfar et al., 2003). Perhaps more pertinent, rhesus monkeys can also segregate competing voices in a chorus of coos, much as humans might with speech in a cocktail party scenario, and match them to the correct number of individuals seen cooing on a video screen (Jordan et al., 2005). Finally, macaque monkeys use formants (i.e., vocal tract resonances) as acoustic cues to assess age-related body size differences among conspecifics (Ghazanfar et al., 2007b). They do so by linking across modalities the body size information embedded in the formant spacing of vocalizations (Fitch, 1997) with the visual size of animals who are likely to produce such vocalizations (Ghazanfar et al., 2007b).

Dynamic faces modulate voice processing in auditory cortex

Traditionally, the linking of vision with audition in the multisensory vocal perception described above would be attributed to the functions of association areas such as the superior temporal sulcus (STS) in the temporal lobe or the principal and intraparietal sulci, located in the frontal and parietal lobes, respectively. Although these regions certainly play important roles (see below), they are not necessary for all types of multisensory behaviors (Ettlinger et al., 1990), nor are they the sole regions for multisensory convergence (Driver et al., 2008; Ghazanfar et al., 2006a). The auditory cortex, in particular, has many potential sources of visual input (Ghazanfar et al., 2006a), and this is borne out in the increasing number of studies demonstrating visual modulation of auditory cortical activity (Bizley et al., 2007; Ghazanfar et al., 2008b; Ghazanfar et al., 2005; Kayser et al., 2008; Kayser et al., 2007; Schroeder et al., 2002). Here, I focus on those auditory cortical studies investigating face/voice integration specifically.

Recordings from both primary and lateral belt auditory cortex reveal that responses to the voice are influenced by the presence of a dynamic face (Ghazanfar et al., 2008b; Ghazanfar et al., 2005). Monkey subjects viewing unimodal and bimodal versions of two different species-typical vocalizations (‘coos’ and ‘grunts’) show both enhanced and suppressed local field potential (LFP) responses in the bimodal condition relative to the unimodal auditory condition (Ghazanfar et al., 2005). Consistent with evoked potential studies in humans (Besle et al., 2004; van Wassenhove et al., 2005), the combination of faces and voices led to integrative responses (significantly different from unimodal responses) in the vast majority of auditory cortical sites—both in primary auditory cortex and the lateral belt auditory cortex. These data demonstrated that LFP signals in the auditory cortex reflect multisensory integration of facial and vocal signals in monkeys (Ghazanfar et al., 2005), a finding subsequently confirmed at the single-unit level in the lateral belt as well (Ghazanfar et al., 2008b) (Figure 2A). By ‘integration’, I simply mean that bimodal stimuli elicit significantly enhanced or suppressed responses relative to the best (strongest) response elicited by unimodal stimuli.
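This criterion is easy to state computationally. A minimal sketch (in Python; the firing rates below are hypothetical illustrations, not values from the studies cited):

```python
import numpy as np

def multisensory_index(face_voice, face_only, voice_only):
    """Percent change of the bimodal response relative to the strongest
    unimodal response (the integration criterion described in the text).
    Positive values indicate enhancement, negative values suppression."""
    best_unimodal = max(face_only, voice_only)
    return 100.0 * (face_voice - best_unimodal) / best_unimodal

# Hypothetical responses (spikes/s) for a single lateral-belt site:
print(multisensory_index(face_voice=24.0, face_only=5.0, voice_only=16.0))  # 50.0 (enhanced)
print(multisensory_index(face_voice=10.0, face_only=5.0, voice_only=16.0))  # -37.5 (suppressed)
```

In practice such an index would be computed per stimulus and paired with a significance test against the unimodal responses; the index alone does not establish that the difference is reliable.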

Figure 2
Single neuron examples of multisensory integration of Face+Voice stimuli compared with Disk+Voice stimuli in the middle lateral (ML) belt area. The left panel shows an enhanced response when voices are coupled with faces, but no similar modulation when ...

The specificity of face/voice integrative responses was tested by replacing the dynamic faces with dynamic discs which mimicked the aperture and displacement of the mouth. In human psychophysical experiments, such artificial dynamic stimuli can still lead to enhanced speech detection, but not to the same degree as a real face (Bernstein et al., 2004; Schwartz et al., 2004). When cortical sites or single units were tested with dynamic discs, far less integration was seen than with real monkey faces (Ghazanfar et al., 2008b; Ghazanfar et al., 2005) (Figure 2). This was true primarily for the lateral belt auditory cortex (LFPs and single units) and was observed to a lesser extent in the primary auditory cortex (LFPs only). This is perhaps not surprising given that the lateral belt is well known for its responsiveness to, and to some degree selectivity for, vocalizations and other complex stimuli (Rauschecker et al., 1995; Recanzone, 2008). (See Ghazanfar et al., 1999; Ghazanfar et al., 2001a for a review of vocal responses in auditory cortex.)

Unexpectedly, grunt vocalizations were over-represented relative to coos in terms of enhanced multisensory LFP responses (Ghazanfar et al., 2005). As coos and grunts are both produced frequently in a variety of affiliative contexts and are broadband spectrally, the differential representation cannot be attributed to experience, valence or the frequency tuning of neurons. One remaining possibility is that this differential representation reflects a behaviorally-relevant distinction, as coos and grunts differ in their direction of expression and range. Coos are generally contact calls rarely directed toward any particular individual. In contrast, grunts are often directed towards individuals in one-on-one situations, often during social approaches, as in baboons and vervet monkeys (Cheney et al., 1982; Palombit et al., 1999). Given their production at close range and in such contexts, grunts may produce a stronger face/voice association than coo calls. This distinction appeared to be reflected in the pattern of significant multisensory responses in auditory cortex; that is, the multisensory bias towards grunt calls may be related to the fact that grunts (relative to coos) are often produced during intimate, one-to-one social interactions.

Auditory cortical interactions with the superior temporal sulcus mediate face/voice integration

The “face-specific” visual influence on the lateral belt auditory cortex raises the question of its anatomical source. Although there are multiple possible sources of visual input to auditory cortex (Ghazanfar et al., 2006a), the STS is likely to be a prominent one, particularly for integrating faces and voices, for the following reasons. First, there are reciprocal connections between the STS and the lateral belt and other parts of auditory cortex (Barnes et al., 1992; Seltzer et al., 1994). Second, neurons in the STS are sensitive to both faces and biological motion (Harries et al., 1991; Oram et al., 1994). Finally, the STS is known to be multisensory (Barraclough et al., 2005; Benevento et al., 1977; Bruce et al., 1981; Chandrasekaran et al., 2009; Schroeder et al., 2002). One way of establishing whether auditory cortex and the STS interact at the functional level is to measure their temporal correlations as a function of stimulus condition. Concurrent recordings of LFPs and spiking activity in the lateral belt of auditory cortex and the upper bank of the STS revealed that functional interactions between these two regions, in the form of gamma-band (>30 Hz) correlations, increased in strength during presentations of faces and voices together relative to the unimodal conditions (Ghazanfar et al., 2008b) (Figure 3A). Furthermore, these interactions were not solely modulations of response strength, as phase relationships were significantly less variable (tighter) in the multisensory conditions (Figure 3B).
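The “tighter” phase relationships described here are commonly quantified with a phase-locking value (PLV): the magnitude of the trial-averaged unit vector of the phase difference between the two sites. The sketch below uses simulated phases (none of the numbers come from the recordings discussed) to show how a consistent cross-areal phase lag yields a high PLV while unrelated phases do not:

```python
import numpy as np

def phase_locking_value(phase_a, phase_b):
    """PLV between two band-limited phase series (e.g., gamma-band LFP
    phases from auditory cortex and STS) across trials. Values near 1
    mean the phase difference is consistent across trials; near 0, random."""
    diff = np.asarray(phase_a) - np.asarray(phase_b)
    return np.abs(np.mean(np.exp(1j * diff)))

rng = np.random.default_rng(0)
n_trials = 500
a = rng.uniform(0, 2 * np.pi, n_trials)
# Simulated "face + voice" trials: a consistent lag plus small jitter.
plv_bimodal = phase_locking_value(a, a - 0.8 + rng.normal(0, 0.3, n_trials))
# Simulated unimodal trials: unrelated phases in the two areas.
plv_unimodal = phase_locking_value(a, rng.uniform(0, 2 * np.pi, n_trials))
print(plv_bimodal > plv_unimodal)  # True
```

In real data the phases would first be extracted from the filtered LFP (e.g., via a Hilbert transform), and the PLV compared against a trial-shuffled baseline.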

Figure 3
A. Time-frequency plots (cross-spectrograms) illustrate the modulation of functional interactions (as a function of stimulus condition) between the lateral belt auditory cortex and the STS for a population of cortical sites. X-axes depict the time in ...

The influence of the STS on auditory cortex was not limited to gamma oscillations. Spiking activity seems to be modulated, but not ‘driven’, by on-going activity arising from the STS. Three lines of evidence suggest this scenario. First, visual influences on single neurons were most robust when in the form of dynamic faces and were only apparent when neurons had a significant response to a vocalization (i.e., there were no overt responses to faces alone). Second, these integrative responses were often “face-specific” and had a wide distribution of latencies, suggesting that the face provided an ongoing signal that influenced auditory responses (Ghazanfar et al., 2008b). Finally, this hypothesis of an ongoing signal is supported by the sustained gamma-band activity between auditory cortex and STS and by a spike-field coherence analysis of the relationship between auditory cortical spiking activity and gamma oscillations from the STS (Ghazanfar et al., 2008b) (Figure 3C).

Both the auditory cortex and the STS exhibit multiple bands of oscillatory activity in response to stimuli, and these may mediate different functions (Chandrasekaran et al., 2009; Lakatos et al., 2005). Thus, interactions between the auditory cortex and the STS are not limited to spiking activity and high-frequency gamma oscillations. Below 20 Hz, and in response to naturalistic audiovisual stimuli, there are directed interactions from auditory cortex to STS, while above 20 Hz (but below the gamma range) there are directed interactions from STS to auditory cortex (Kayser et al., 2009). Given that different frequency bands in the STS integrate faces and voices in distinct ways (Chandrasekaran et al., 2009), it is possible that these lower-frequency interactions between the STS and auditory cortex also represent distinct multisensory processing channels.

Two things should be noted here. The first is that functional interactions between STS and auditory cortex are not likely to occur solely during the presentation of faces with voices. Other congruent, behaviorally-salient audiovisual events such as looming signals (Cappe et al., 2009; Gordon et al., 2005; Maier et al., 2004) or other temporally coincident signals may elicit similar functional interactions (Maier et al., 2008; Noesselt et al., 2007). The second is that there are other areas that, consistent with their connectivity and response properties (e.g., sensitivity to faces and voices), could also (and very likely do) have a visual influence on auditory cortex. These include the ventrolateral prefrontal cortex (Romanski et al., 2005; Sugihara et al., 2006) and the amygdala (Gothard et al., 2007; Kuraoka et al., 2007). It is not known whether the STS plays a more influential role than these two other ‘face-sensitive’ areas. Indeed, it may be that all three play very different roles in face/voice integration. What is missing is a direct link between multisensory behaviors and neural activity; that is the only way to assess the true contributions of these regions, along with auditory cortex, to vocal behavior.

Viewing vocalizing conspecifics

Humans and other primates readily link facial expressions with appropriate, congruent vocal expressions. The cues they use to make such matches are not known. One method for investigating such behavioral strategies is the measurement of eye-movement patterns. When human subjects are given no task or instruction regarding which acoustic cues to attend to, they consistently look at the eye region more than the mouth when viewing videos of human speakers (Klin et al., 2002). Macaque monkeys exhibit the same strategy. The eye-movement patterns of monkeys viewing conspecifics producing vocalizations reveal that monkeys spend most of their time inspecting the eye region relative to the mouth (Ghazanfar et al., 2006b) (Figure 4A). When they did fixate on the mouth, fixation was highly correlated with the onset of mouth movements (Figure 4B). This, too, was highly reminiscent of human strategies: subjects asked to identify words increased their fixations on the mouth region with the onset of facial motion (Lansing et al., 2003).

Figure 4
A. The average fixation on the eye region versus the mouth region across three subjects while viewing a 30-sec video of vocalizing conspecific. The audio track had no influence on the proportion of fixations falling onto the mouth or the eye region. Error ...

Somewhat surprisingly, activity in both primary auditory cortex and belt areas is influenced by eye position. When the spatial tuning of primary auditory cortical neurons is measured with the eyes gazing in different directions, ~30% of the neurons are affected by the position of the eyes (Werner-Reiss et al., 2003). Similarly, when LFP-derived current-source density activity was measured from auditory cortex (both primary auditory cortex and caudal belt regions), eye position significantly modulated auditory-evoked amplitude in about 80% of sites (Fu et al., 2004). These eye-position effects occurred mainly in the upper cortical layers, suggesting that the signal is fed back from another cortical area. One possible source is the frontal eye field (FEF) in the frontal lobe: its medial portion, which generates relatively long saccades (Robinson et al., 1969), is interconnected with both the STS (Schall et al., 1995; Seltzer et al., 1989) and multiple regions of the auditory cortex (Hackett et al., 1999; Romanski et al., 1999; Schall et al., 1995).

It does not take a huge stretch of the imagination to link these auditory cortical processes to the oculomotor strategy for looking at vocalizing faces. A dynamic, vocalizing face is a complex sequence of sensory events, but one that elicits fairly stereotypical eye movements: we and other primates fixate on the eyes but saccade to the mouth when it moves, before saccading back to the eyes. Is there a simple scenario that could link the proprioceptive eye-position effects in the auditory cortex with its face/voice integrative properties (Ghazanfar et al., 2007a)? Reframing (ever so slightly) the hypothesis of Schroeder and colleagues (Lakatos et al., 2007; Schroeder et al., 2008), one possibility is that fixation at the onset of mouth movements sends a signal to the auditory cortex that resets the phase of an ongoing oscillation. This proprioceptive signal thus primes the auditory cortex to amplify or suppress (depending on the timing) a subsequent auditory signal originating from the mouth. Given that mouth movements precede the voiced components of both human (Abry et al., 1996) and monkey vocalizations (Chandrasekaran et al., 2009; Ghazanfar et al., 2005), the temporal order of visual to proprioceptive to auditory signals is consistent with this idea. This hypothesis is also supported (though indirectly) by the finding that the sign of face/voice integration in the auditory cortex and the STS is influenced by the timing of mouth movements relative to the onset of the voice (Chandrasekaran et al., 2009; Ghazanfar et al., 2005).
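The phase-resetting account can be caricatured in a few lines. In this toy model (the oscillation frequency, gain depth and delays are all illustrative assumptions, not measured values), a fixation-triggered reset starts an oscillation whose phase at voice onset determines the gain applied to the auditory response:

```python
import numpy as np

F_OSC = 8.0  # assumed ongoing oscillation frequency (Hz); illustrative only

def gain_at(delay_s, reset_phase=0.0):
    """Multiplicative gain on an auditory response arriving `delay_s`
    seconds after a fixation-triggered phase reset. Cortical excitability
    is modeled as a cosine of the oscillation's phase at arrival time."""
    phase = reset_phase + 2 * np.pi * F_OSC * delay_s
    return 1.0 + 0.5 * np.cos(phase)  # gain varies between 0.5 and 1.5

# Voice onset lags mouth movement; some lags land on an excitability peak
# (amplified response), others on a trough (suppressed response).
print(round(gain_at(0.125), 2))   # one full cycle after reset: 1.5
print(round(gain_at(0.0625), 2))  # half a cycle after reset: 0.5
```

The point of the sketch is only that, under this hypothesis, the sign of multisensory modulation falls out of the visual-to-auditory delay, consistent with the timing dependence described above.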

Somatosensory feedback during vocal communication

Numerous lines of both physiological and anatomical evidence demonstrate that at least some regions of the auditory cortex respond to touch as well as sound (Fu et al., 2003; Hackett et al., 2007a; Hackett et al., 2007b; Kayser et al., 2005; Lakatos et al., 2007; Schroeder et al., 2002; Smiley et al., 2007). Yet the sense of touch is not something we normally associate with vocal communication. It can, however, influence what we hear under certain circumstances. For example, kinesthetic feedback from one’s own speech movements integrates with heard speech (Sams et al., 2005). More directly, if a robotic device is used to artificially deform the facial skin of subjects in a way that mimics the deformation seen during speech production, subjects actually hear speech differently (Ito et al., 2009). Strikingly, perception varies systematically with speech-like patterns of skin deformation, implicating a robust somatosensory influence on auditory processing under normal conditions (Ito et al., 2009).

The somatosensory system’s influence on the auditory system may also operate during vocal learning. Applying a mechanical load to the jaw, causing a slight protrusion as subjects repeat words (‘saw’, ‘say’, ‘sass’ and ‘sane’), alters somatosensory feedback without changing the acoustics of the words (Tremblay et al., 2003). Measuring adaptation in the jaw trajectory over many trials revealed that subjects learn to change their jaw trajectories so that they are similar to the pre-load trajectory, despite not hearing anything different. This strongly implicates a role for somatosensory feedback that parallels the role of auditory feedback in guiding vocal production (Jones et al., 2003; Jones et al., 2005). Indeed, the very same learning effects are observed in deaf subjects when they turn their hearing aids off (Nasir et al., 2008).

While the substrates for these somatosensory-auditory effects have not been explored, interactions between the somatosensory system and the auditory cortex are a likely source of the phenomena described above, for the following reasons. First, many auditory cortical fields respond to, or are modulated by, tactile inputs (Fu et al., 2003; Kayser et al., 2005; Schroeder et al., 2001). Second, there are intercortical connections between somatosensory areas and the auditory cortex (Cappe et al., 2005; de la Mothe et al., 2006; Smiley et al., 2006). Third, auditory area CM, where many auditory-tactile responses seem to converge, is directly connected to somatosensory areas in the retroinsular cortex and the granular insula (de la Mothe et al., 2006; Smiley et al., 2006). Oddly enough, a parallel influence of audition on somatosensory areas has also been reported: neurons in the “somatosensory” insula readily and selectively respond to vocalizations (Beiser, 1998; Remedios et al., 2009). Finally, the tactile receptive fields of neurons in auditory cortical area CM are confined to the upper body, primarily the face and neck regions (areas consisting of, or covering, the vocal tract) (Fu et al., 2003) (Figure 5), and the primary somatosensory cortical (area 3b) representation of the tongue (a vocal tract articulator) projects to auditory areas in the lower bank of the lateral sulcus (Iyengar et al., 2007). All of these facts lend further credibility to the putative role of somatosensory-auditory interactions in vocal production and perception.

Figure 5
A. Examples of cutaneous receptive fields of neurons recorded in auditory cortical area CM

Like humans, other primates also adjust their vocal output according to what they hear. For example, macaques, marmosets (Callithrix jacchus) and cotton-top tamarins (Saguinus oedipus) adjust the loudness, timing and acoustic structure of their vocalizations depending on background noise levels and patterns (Brumm et al., 2004; Egnor et al., 2006a; Egnor et al., 2006b; Egnor et al., 2007; Sinnott et al., 1975). The specific number of syllables and the temporal modulations of heard conspecific calls can also differentially trigger vocal production in tamarins (Ghazanfar et al., 2001b; Ghazanfar et al., 2002). Thus, auditory feedback is also very important for nonhuman primates, and altering such feedback can influence neurons in the auditory cortex (Eliades et al., 2008). At this time, however, no experiments have investigated whether somatosensory feedback plays a role in shaping vocal output. The neurophysiological and neuroanatomical data described above suggest that it is not unreasonable to think that it does.

Is auditory cortex organized by action-specific representations?

The putative neural processes underlying multisensory vocal communication in primates call to mind what the philosopher Andy Clark refers to as “action-oriented” representations (Clark, 1997). In generic terms, action-oriented representations simultaneously describe aspects of the world and prescribe possible actions. They are poised between pure control (motor) structures and passive (sensory) representations of the external world. For neural representations of primate vocal communication, this suggests that the laryngeal, articulatory and respiratory movements during vocalizations are inseparable from the visual, auditory and somatosensory processes that accompany vocal perception. This idea seems to fit well with the data reviewed above and suggests an alternative way of thinking about auditory cortical organization.

Typically, we think of auditory cortex as a set of very discrete fields, most of which can be defined by a tonotopic map. These physiological maps often correspond to cytoarchitectural and hodological signatures as well. It is possible, however, that this is just one of many possible schemes for auditory cortical organization (albeit an important one). One alternative is that different behaviors, such as multisensory vocal communication, each have their own organizational scheme superimposed on these tonotopic maps, but not necessarily in one-to-one fashion. This is almost pure speculation at this point, but the main reason for thinking it a possibility is that many of the multisensory and vocalization-related response properties relevant to communication have no relationship to the tonotopic maps of, or frequency tuning of neurons in, those regions. For example, there is no reported relationship between the frequency tuning of auditory cortical neurons and the influence of eye position (Fu et al., 2004; Werner-Reiss et al., 2003), somatosensory receptive fields (Fu et al., 2003; Schroeder et al., 2001), or face/voice integration (Ghazanfar et al., 2005; Ghazanfar, unpublished observations). Likewise, the influence of vocal feedback on auditory cortex has no relationship to the underlying frequency tuning of neurons (Eliades et al., 2008). At a more global level, somatosensory and visual influences on the auditory cortex seem to take the form of a gradient, extending in the posterior-to-anterior direction, rather than having a discrete influence on particular tonotopically-defined subsets (Kayser et al., 2005; Kayser et al., 2007). Finally, the representation for pitch processing, important for vocal recognition, spans the low-frequency borders of two core auditory cortical areas (primary auditory cortex and area R), violating the often implicit “single area, single function” assumption (Bendor et al., 2005).
The idea that there may be multiple behaviorally-specific auditory cortical organizations is very similar to the one recently put forth regarding the organization of the various somatotopically-defined motor cortical areas (Graziano et al., 2007).

To summarize, vocal communication is a fundamentally multisensory behavior, and this is reflected in the different roles brain regions play in mediating it. Auditory cortex is illustrative, being influenced by visual, somatosensory, proprioceptive and motor signals during vocal communication. In all, I hope that the data reviewed above suggest that investigating auditory cortex through the lens of a specific behavior may lead to a much clearer picture of its functions and dynamic organization.




  • Abry C, Lallouache M-T, Cathiard M-A. How can coarticulation models account for speech sensitivity in audio-visual desynchronization? In: Stork D, Henneke M, editors. Speechreading by humans and machines: models, systems and applications. Springer-Verlag; Berlin: 1996. pp. 247–255.
  • Barnes CL, Pandya DN. Efferent Cortical Connections of Multimodal Cortex of the Superior Temporal Sulcus in the Rhesus-Monkey. Journal of Comparative Neurology. 1992;318:222–244. [PubMed]
  • Barraclough NE, Xiao D, Baker CI, Oram MW, Perrett DI. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J Cogn Neurosci. 2005;17:377–91. [PubMed]
  • Beiser A. Processing of twitter-call fundamental frequencies in insula and auditory cortex of squirrel monkeys. Experimental Brain Research. 1998;122:139–148. [PubMed]
  • Bendor D, Wang XQ. The neuronal representation of pitch in primate auditory cortex. Nature. 2005;436:1161–1165. [PMC free article] [PubMed]
  • Benevento LA, Fallon J, Davis BJ, Rezak M. Auditory-visual interactions in single cells in the cortex of the superior temporal sulcus and the orbital frontal cortex of the macaque monkey. Experimental Neurology. 1977;57:849–872. [PubMed]
  • Bernstein LE, Auer ET, Takayanagi S. Auditory speech detection in noise enhanced by lipreading. Speech Communication. 2004;44:5–18.
  • Besle J, Fort A, Delpuech C, Giard MH. Bimodal speech: early suppressive visual effects in human auditory cortex. European Journal of Neuroscience. 2004;20:2225–2234. [PMC free article] [PubMed]
  • Bizley JK, Nodal FR, Bajo VM, Nelken I, King AJ. Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex. 2007;17:2172–2189. [PubMed]
  • Bruce C, Desimone R, Gross CG. Visual Properties of Neurons in a Polysensory Area in Superior Temporal Sulcus of the Macaque. Journal of Neurophysiology. 1981;46:369–384. [PubMed]
  • Brumm H, Voss K, Kollmer I, Todt D. Acoustic communication in noise: regulation of call characteristics in a New World monkey. Journal of Experimental Biology. 2004;207:443–448. [PubMed]
  • Cappe C, Barone P. Heteromodal connections supporting multisensory integration at low levels of cortical processing in the monkey. European Journal of Neuroscience. 2005;22:2886–2902. [PubMed]
  • Cappe C, Thut G, Romei V, Murray MM. Selective integration of auditory-visual looming cues by humans. Neuropsycholgia. 2009;47:1045–1052. [PubMed]
  • Chandrasekaran C, Ghazanfar AA. Different neural frequency bands integrate faces and voices differently in the superior temporal sulcus. Journal of Neurophysiology. 2009;101:773–788. [PubMed]
  • Cheney DL, Seyfarth RM. How vervet monkeys perceive their grunts: field playback experiments. Animal Behaviour. 1982;30:739–751.
  • Clark A. Being there: putting brain, body, and world together again. MIT Press; Cambridge, MA: 1997.
  • de la Mothe LA, Blumell S, Kajikawa Y, Hackett TA. Cortical connections of the auditory cortex in marmoset monkeys: Core and medial belt regions. Journal of Comparative Neurology. 2006;496:27–71. [PubMed]
  • Driver J, Noesselt T. Multisensory interplay reveals crossmodal influences on ‘sensory-specific’ brain regions, neural responses, and judgments. Neuron. 2008;57:11–23. [PMC free article] [PubMed]
  • Egnor SER, Hauser MD. Noise-induced vocal modulation in cotton-top tamarins (Saguinus oedipus). American Journal of Primatology. 2006a;68:1183–1190. [PubMed]
  • Egnor SER, Iguina CG, Hauser MD. Perturbation of auditory feedback causes systematic perturbation in vocal structure in adult cotton-top tamarins. Journal of Experimental Biology. 2006b;209:3652–3663. [PubMed]
  • Egnor SER, Wickelgren JG, Hauser MD. Tracking silence: adjusting vocal production to avoid acoustic interference. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology. 2007;193:477–483. [PubMed]
  • Eliades SJ, Wang XQ. Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature. 2008;453:1102–1107. [PubMed]
  • Ettlinger G, Wilson WA. Cross-modal performance: behavioural processes, phylogenetic considerations and neural mechanisms. Behavioural Brain Research. 1990;40:169–192. [PubMed]
  • Evans TA, Howell S, Westergaard GC. Auditory-visual cross-modal perception of communicative stimuli in tufted capuchin monkeys (Cebus apella). Journal of Experimental Psychology: Animal Behavior Processes. 2005;31:399–406. [PubMed]
  • Fitch WT. Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. Journal of the Acoustical Society of America. 1997;102:1213–1222. [PubMed]
  • Fitch WT, Hauser MD. Vocal production in nonhuman primates: acoustics, physiology, and functional constraints on honest advertisement. American Journal of Primatology. 1995;37:191–219.
  • Fowler CA. Speech as a supramodal or amodal phenomenon. In: Calvert GA, Spence C, Stein BE, editors. The handbook of multisensory processes. MIT Press; Cambridge, MA: 2004. pp. 189–201.
  • Fu KMG, Johnston TA, Shah AS, Arnold L, Smiley J, Hackett TA, Garraghty PE, Schroeder CE. Auditory cortical neurons respond to somatosensory stimulation. Journal of Neuroscience. 2003;23:7510–7515. [PubMed]
  • Fu KMG, Shah AS, O’Connell MN, McGinnis T, Eckholdt H, Lakatos P, Smiley J, Schroeder CE. Timing and laminar profile of eye-position effects on auditory responses in primate auditory cortex. Journal of Neurophysiology. 2004;92:3522–3531. [PubMed]
  • Ghazanfar AA, Hauser MD. The neuroethology of primate vocal communication: substrates for the evolution of speech. Trends in Cognitive Sciences. 1999;3:377–384. [PubMed]
  • Ghazanfar AA, Hauser MD. The auditory behaviour of primates: a neuroethological perspective. Current Opinion in Neurobiology. 2001a;11:712–720. [PubMed]
  • Ghazanfar AA, Logothetis NK. Facial expressions linked to monkey calls. Nature. 2003;423:937–938. [PubMed]
  • Ghazanfar AA, Schroeder CE. Is neocortex essentially multisensory? Trends in Cognitive Sciences. 2006a;10:278–285. [PubMed]
  • Ghazanfar AA, Chandrasekaran C. Paving the way forward: integrating the senses through phase-resetting of cortical oscillations. Neuron. 2007a;53:162–164. [PubMed]
  • Ghazanfar AA, Rendall D. Evolution of human vocal production. Current Biology. 2008a;18:R457–R460. [PubMed]
  • Ghazanfar AA, Nielsen K, Logothetis NK. Eye movements of monkeys viewing vocalizing conspecifics. Cognition. 2006b;101:515–529. [PubMed]
  • Ghazanfar AA, Chandrasekaran C, Logothetis NK. Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience. 2008b;28:4457–4469. [PMC free article] [PubMed]
  • Ghazanfar AA, Flombaum JI, Miller CT, Hauser MD. The units of perception in the antiphonal calling behavior of cotton-top tamarins (Saguinus oedipus): playback experiments with long calls. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology. 2001b;187:27–35. [PubMed]
  • Ghazanfar AA, Smith-Rohrberg D, Pollen AA, Hauser MD. Temporal cues in the antiphonal long-calling behaviour of cottontop tamarins. Animal Behaviour. 2002;64:427–438.
  • Ghazanfar AA, Maier JX, Hoffman KL, Logothetis NK. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience. 2005;25:5004–5012. [PubMed]
  • Ghazanfar AA, Turesson HK, Maier JX, van Dinther R, Patterson RD, Logothetis NK. Vocal tract resonances as indexical cues in rhesus monkeys. Current Biology. 2007b;17:425–430. [PMC free article] [PubMed]
  • Gogate LJ, Walker-Andrews AS, Bahrick LE. The intersensory origins of word comprehension: an ecological-dynamic systems view. Developmental Science. 2001;4:1–18.
  • Gordon MS, Rosenblum LD. Effects of intrastimulus modality change on audiovisual time-to-arrival judgments. Perception & Psychophysics. 2005;67:580–594. [PubMed]
  • Gothard KM, Battaglia FP, Erickson CA, Spitler KM, Amaral DG. Neural responses to facial expression and face identity in the monkey amygdala. Journal of Neurophysiology. 2007;97:1671–1683. [PubMed]
  • Graziano MS, Aflalo TN. Rethinking cortical organization: moving away from discrete areas arranged in hierarchies. The Neuroscientist. 2007;13:138–147. [PubMed]
  • Hackett TA, Stepniewska I, Kaas JH. Subdivisions of auditory cortex and ipsilateral cortical connections of the parabelt auditory cortex in macaque monkeys. Journal of Comparative Neurology. 1998;394:475–495. [PubMed]
  • Hackett TA, Stepniewska I, Kaas JH. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Research. 1999;817:45–58. [PubMed]
  • Hackett TA, Preuss TM, Kaas JH. Architectonic identification of the core region in auditory cortex of macaques, chimpanzees, and humans. Journal of Comparative Neurology. 2001;441:197–222. [PubMed]
  • Hackett TA, De La Mothe LA, Ulbert I, Karmos G, Smiley J, Schroeder CE. Multisensory convergence in auditory cortex, II. Thalamocortical connections of the caudal superior temporal plane. Journal of Comparative Neurology. 2007a;502:924–952. [PubMed]
  • Hackett TA, Smiley JF, Ulbert I, Karmos G, Lakatos P, de la Mothe LA, Schroeder CE. Sources of somatosensory input to the caudal belt areas of auditory cortex. Perception. 2007b;36:1419–1430. [PubMed]
  • Harries MH, Perrett DI. Visual processing of faces in temporal cortex: physiological evidence for a modular organization and possible anatomical correlates. Journal of Cognitive Neuroscience. 1991;3:9–24. [PubMed]
  • Hauser MD, Ybarra MS. The role of lip configuration in monkey vocalizations: experiments using Xylocaine as a nerve block. Brain and Language. 1994;46:232–244. [PubMed]
  • Hauser MD, Evans CS, Marler P. The role of articulation in the production of rhesus monkey (Macaca mulatta) vocalizations. Animal Behaviour. 1993;45:423–433.
  • Ito T, Tiede M, Ostry DJ. Somatosensory function in speech perception. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:1245–1248. [PubMed]
  • Iyengar S, Qi H, Jain N, Kaas JH. Cortical and thalamic connections of the representations of the teeth and tongue in somatosensory cortex of new world monkeys. Journal of Comparative Neurology. 2007;501:95–120. [PubMed]
  • Izumi A, Kojima S. Matching vocalizations to vocalizing faces in a chimpanzee (Pan troglodytes). Animal Cognition. 2004;7:179–184. [PubMed]
  • Jiang JT, Alwan A, Keating PA, Auer ET, Bernstein LE. On the relationship between face movements, tongue movements, and speech acoustics. EURASIP Journal on Applied Signal Processing. 2002;2002:1174–1188.
  • Jones JA, Munhall KG. Learning to produce speech with an altered vocal tract: The role of auditory feedback. Journal of the Acoustical Society of America. 2003;113:532–543. [PubMed]
  • Jones JA, Munhall KG. Remapping auditory-motor representations in voice production. Current Biology. 2005;15:1768–1772. [PubMed]
  • Jordan KE, Brannon EM, Logothetis NK, Ghazanfar AA. Monkeys match the number of voices they hear with the number of faces they see. Current Biology. 2005;15:1034–1038. [PubMed]
  • Kayser C, Logothetis NK. Directed interactions between auditory and superior temporal cortices and their role in sensory integration. Frontiers in Integrative Neuroscience. 2009 In press. [PMC free article] [PubMed]
  • Kayser C, Petkov CI, Logothetis NK. Visual modulation of neurons in auditory cortex. Cerebral Cortex. 2008;18:1560–1574. [PubMed]
  • Kayser C, Petkov CI, Augath M, Logothetis NK. Integration of touch and sound in auditory cortex. Neuron. 2005;48:373–384. [PubMed]
  • Kayser C, Petkov CI, Augath M, Logothetis NK. Functional imaging reveals visual modulation of specific fields in auditory cortex. Journal of Neuroscience. 2007;27:1824–1835. [PubMed]
  • Klin A, Jones W, Schultz R, Volkmar F, Cohen D. Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Archives of General Psychiatry. 2002;59:809–816. [PubMed]
  • Kuhl PK, Williams KA, Meltzoff AN. Cross-modal speech perception in adults and infants using nonspeech auditory stimuli. Journal of Experimental Psychology: Human Perception and Performance. 1991;17:829–840. [PubMed]
  • Kuraoka K, Nakamura K. Responses of single neurons in monkey amygdala to facial and vocal emotions. Journal of Neurophysiology. 2007;97:1379–1387. [PubMed]
  • Lakatos P, Chen C-M, O’Connell MN, Mills A, Schroeder CE. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron. 2007;53:279–292. [PMC free article] [PubMed]
  • Lakatos P, Shah AS, Knuth KH, Ulbert I, Karmos G, Schroeder CE. An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology. 2005;94:1904–1911. [PubMed]
  • Lansing IR, McConkie GW. Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences. Perception & Psychophysics. 2003;65:536–552. [PubMed]
  • Liberman AM, Mattingly IG. The motor theory of speech perception revised. Cognition. 1985;21:1–36. [PubMed]
  • Maier JX, Chandrasekaran C, Ghazanfar AA. Integration of bimodal looming signals through neuronal coherence in the temporal lobe. Current Biology. 2008;18:963–968. [PubMed]
  • Maier JX, Neuhoff JG, Logothetis NK, Ghazanfar AA. Multisensory integration of looming signals by Rhesus monkeys. Neuron. 2004;43:177–181. [PubMed]
  • McGurk H, MacDonald J. Hearing lips and seeing voices. Nature. 1976;264:746–748.
  • Meltzoff AN, Moore M. Explaining facial imitation: a theoretical model. Early Development & Parenting. 1997;6:179–192. [PMC free article] [PubMed]
  • Nasir SM, Ostry DJ. Speech motor learning in profoundly deaf adults. Nature Neuroscience. 2008;11:1217–1222. [PMC free article] [PubMed]
  • Noesselt T, Rieger JW, Schoenfeld MA, Kanowski M, Hinrichs H, Heinze H-J, Driver J. Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience. 2007;27:11431–11441. [PMC free article] [PubMed]
  • Oram MW, Perrett DI. Responses of anterior superior temporal polysensory (STPa) neurons to biological motion stimuli. Journal of Cognitive Neuroscience. 1994;6:99–116. [PubMed]
  • Palombit RA, Cheney DL, Seyfarth RM. Male grunts as mediators of social interaction with females in wild chacma baboons (Papio cynocephalus ursinus). Behaviour. 1999;136:221–242.
  • Parr LA. Perceptual biases for multimodal cues in chimpanzee (Pan troglodytes) affect recognition. Animal Cognition. 2004;7:171–178. [PubMed]
  • Patterson ML, Werker JF. Two-month-old infants match phonetic information in lips and voice. Developmental Science. 2003;6:191–196.
  • Petkov CI, Kayser C, Augath M, Logothetis NK. Functional imaging reveals numerous fields in the monkey auditory cortex. PLoS Biology. 2006;4:1213–1226. [PMC free article] [PubMed]
  • Rauschecker JP, Tian B, Hauser M. Processing of complex sounds in the macaque nonprimary auditory cortex. Science. 1995;268:111–114. [PubMed]
  • Recanzone GH. Representation of conspecific vocalizations in the core and belt areas of the auditory cortex in the alert macaque monkey. Journal of Neuroscience. 2008;28:13184–13193. [PMC free article] [PubMed]
  • Remedios R, Logothetis NK, Kayser C. An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. Journal of Neuroscience. 2009;29:1034–1045. [PubMed]
  • Robinson DA, Fuchs AF. Eye movements evoked by stimulation of frontal eye fields. Journal of Neurophysiology. 1969;32:637–648. [PubMed]
  • Romanski LM, Ghazanfar AA. The primate frontal and temporal lobes and their role in multisensory vocal communication. In: Platt ML, Ghazanfar AA, editors. Primate neuroethology. Oxford University Press; Oxford: 2009. In press.
  • Romanski LM, Bates JF, Goldman-Rakic PS. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. Journal of Comparative Neurology. 1999;403:141–157. [PubMed]
  • Romanski LM, Averbeck BB, Diltz M. Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. Journal of Neurophysiology. 2005;93:734–747. [PubMed]
  • Rosenblum LD. Primacy of multimodal speech perception. In: Pisoni DB, Remez RE, editors. Handbook of Speech Perception. Blackwell; Malden, MA: 2005. pp. 51–78.
  • Sams M, Mottonen R, Sihvonen T. Seeing and hearing others and oneself talk. Cognitive Brain Research. 2005;23:429–435. [PubMed]
  • Schall JD, Morel A, King DJ, Bullier J. Topography of visual cortex connections with frontal eye field in macaque: convergence and segregation of processing streams. Journal of Neuroscience. 1995;15:4464–4487. [PubMed]
  • Schroeder CE, Foxe JJ. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Cognitive Brain Research. 2002;14:187–198. [PubMed]
  • Schroeder CE, Lakatos P, Kajikawa Y, Partan S, Puce A. Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences. 2008;12:106–113. [PMC free article] [PubMed]
  • Schroeder CE, Lindsley RW, Specht C, Marcovici A, Smiley JF, Javitt DC. Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology. 2001;85:1322–1327. [PubMed]
  • Schwartz J-L, Berthommier F, Savariaux C. Seeing to hear better: evidence for early audio-visual interactions in speech identification. Cognition. 2004;93:B69–B78. [PubMed]
  • Seltzer B, Pandya DN. Frontal lobe connections of the superior temporal sulcus in the rhesus monkey. Journal of Comparative Neurology. 1989;281:97–113. [PubMed]
  • Seltzer B, Pandya DN. Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: A retrograde tracer study. Journal of Comparative Neurology. 1994;343:445–463. [PubMed]
  • Sinnott JM, Stebbins WC, Moody DB. Regulation of voice amplitude by the monkey. Journal of the Acoustical Society of America. 1975;58:412–414. [PubMed]
  • Smiley JF, Hackett TA, Ulbert I, Karmos G, Lakatos P, Javitt DC, Schroeder CE. Multisensory convergence in auditory cortex, I. Cortical connections of the caudal superior temporal plane in macaque monkeys. Journal of Comparative Neurology. 2007;502:894–923. [PubMed]
  • Sugihara T, Diltz MD, Averbeck BB, Romanski LM. Integration of auditory and visual communication information in the primate ventrolateral prefrontal cortex. Journal of Neuroscience. 2006;26:11138–11147. [PMC free article] [PubMed]
  • Tremblay S, Shiller DM, Ostry DJ. Somatosensory basis of speech production. Nature. 2003;423:866–869. [PubMed]
  • van Wassenhove V, Grant KW, Poeppel D. Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America. 2005;102:1181–1186. [PubMed]
  • Werner-Reiss U, Kelly KA, Trause AS, Underhill AM, Groh JM. Eye position affects activity in primary auditory cortex of primates. Current Biology. 2003;13:554–562. [PubMed]
  • Yehia H, Rubin P, Vatikiotis-Bateson E. Quantitative association of vocal-tract and facial behavior. Speech Communication. 1998;26:23–43.
  • Yehia HC, Kuratate T, Vatikiotis-Bateson E. Linking facial animation, head motion and speech acoustics. Journal of Phonetics. 2002;30:555–568.