Our ability to recognize and integrate auditory and visual stimuli is the basis for many cognitive processes, but is especially important in meaningful communication. In the present study, we investigated the integration of audiovisual communication stimuli by single cells in the primate frontal lobes. We determined that some neurons in the primate VLPFC are bimodal and respond to both auditory and visual stimuli presented either simultaneously or separately. Some of the stimuli that evoked these prefrontal multisensory responses were rhesus macaque faces and vocalizations that have been shown previously to elicit robust responses from macaque VLPFC neurons when presented separately (
O’Scalaidhe et al., 1997,
1999;
Romanski and Goldman-Rakic, 2002;
Romanski et al., 2005). In the present study, VLPFC multisensory neurons exhibited enhancement or suppression, and face/vocalization stimuli evoked multisensory responses more frequently than nonface/nonvocalization combinations when both were tested. This finding supports the notion that the VLPFC may be specialized for integrating face and vocalization information during communication and sets it apart from other brain regions that integrate sensory stimuli.
Although a lesion study (
Gaffan and Harrison, 1991) suggested the importance of the lateral PFC in sensory integration, only a small number of studies have examined the cellular basis for integrative processing in the primate PFC. An early study by
Benevento et al. (1977) found neurons in the lateral orbital cortex (area 12 orbital) that were responsive to simple auditory and visual stimuli and showed that at least some of these interactions were attributable to convergence on single cortical cells. Fuster and colleagues recorded from the lateral frontal cortex during an audiovisual matching task (
Bodner et al., 1996;
Fuster et al., 2000). In this task, prefrontal cortex cells responded selectively to tones, and most of them also responded to colors according to the task rule (
Fuster et al., 2000). However, the present study is the first to examine the integration of audiovisual communication information at the cellular level in the primate VLPFC.
In the present study, multisensory and unimodal neurons were colocalized in VLPFC and were coextensive with previously identified vocalization and face-cell responsive zones. Some cells, which appeared unimodal when tested with auditory or visual stimuli separately, had robust responses to simultaneously presented audiovisual stimuli. Cells may be incorrectly categorized as unimodal if they are not tested with additional stimuli in an appropriate paradigm. Thus, in future studies, more VLPFC cells may prove to be multisensory, given the importance of task demands on prefrontal responses (
Rao et al., 1997;
Rainer et al., 1998).
Only a small number of neurons were unimodal auditory (14 of 387), whereas a larger proportion was unimodal visual (194 of 387), consistent with previous data showing mostly visual responsive cells in the VLPFC with a small auditory responsive zone located anterolaterally within area 12/47 (
O’Scalaidhe et al., 1997;
Romanski and Goldman-Rakic, 2002). In the current study, visual neurons were responsive to pictures of faces and nonface objects (
n = 46 cells), movies depicting biological motion (
n = 47 cells), or to both static and dynamic visual stimuli (
n = 101 cells). The unimodal and multimodal visual motion cells recorded here suggest a potential role for the VLPFC in the perception and integration of biological motion. Multisensory neurons that respond to face and body movement, as well as auditory stimuli, have been recorded upstream from the VLPFC in the dorsal bank of the STS (
Oram and Perrett, 1994;
Barraclough et al., 2005), an area that is reciprocally and robustly connected with the VLPFC (
Petrides and Pandya, 1988;
Selemon and Goldman-Rakic, 1988;
Cavada and Goldman-Rakic, 1989;
Seltzer and Pandya, 1989;
Barbas, 1992;
Cusick et al., 1995;
Hackett et al., 1999;
Romanski et al., 1999a). This connection raises the possibility that the VLPFC receives already integrated audiovisual information from the STS. Alternatively, the multisensory responses in the VLPFC could result from the integration of separate unimodal auditory and visual afferents, which target the VLPFC (
Selemon and Goldman-Rakic, 1988;
Webster et al., 1994;
Hackett et al., 1999;
Romanski et al., 1999a,
b). The data presented here do not distinguish between these two cellular mechanisms for prefrontal audiovisual responses, and additional studies are needed to determine whether VLPFC cells perform the audiovisual integration or receive already integrated signals.
Our finding that some VLPFC multisensory neurons are selective for face and voice stimuli agrees with human functional magnetic resonance imaging (fMRI) studies indicating that a homologous region of the human brain, area 47 (pars orbitalis), is specifically activated by human vocal sounds compared with animal and nonvocal sounds (
Fecteau et al., 2005). In contrast, the STS appears to be specialized for integrating general biological motion (
Oram and Perrett, 1994; Barraclough et al., 2005) rather than solely communication stimuli, whereas the multisensory responses in the auditory cortex, which receives afferents from a number of cortical areas (
Petrides and Pandya, 1988;
Hackett et al., 1999;
Romanski et al., 1999b), may be a product of these top-down cortical inputs (
Ghazanfar et al., 2005). Thus, each cortical node in a sensory integration network may contribute uniquely to the processing of multisensory communication stimuli.
In both the STS and in the auditory cortex, biologically relevant audiovisual stimuli elicited multisensory enhancement in some cases and multisensory suppression in others (
Barraclough et al., 2005;
Ghazanfar et al., 2005), similar to what we have shown in the VLPFC. In the STS,
Barraclough et al. (2005) found that neurons that exhibited multisensory enhancement, but not suppression, were strongly affected by stimulus congruence.
Ghazanfar et al. (2005) suggested that multisensory suppression occurred more frequently with stimuli that had long voice-onset times (VOTs). There was no correlation between VOT and the occurrence of suppression in the current study. Data from studies in the cat superior colliculus have suggested that enhancement occurs when multisensory stimuli are temporally synchronous and originate from the same region of space (
Meredith et al., 1987;
Meredith and Stein, 1986;
Stanford et al., 2005). Several fMRI studies suggest that congruent multisensory communication stimuli (human vocalizations with corresponding mouth movements) induce enhanced activity, whereas incongruent multisensory stimuli result in decreased activations (
Calvert et al., 2001). However, this does not hold true in all studies (
Miller and D’Esposito, 2005;
Ojanen et al., 2005). In their fMRI analysis of multisensory perception,
Miller and D’Esposito (2005) asked subjects to evaluate temporally asynchronous stimuli and to categorize the stimuli as occurring simultaneously (fused percept) or sequentially (unfused percept). Some brain regions demonstrated an increase in activation when the stimuli were judged as fused and a decrease when stimuli were judged as unfused. The prefrontal cortex, however, showed the opposite effect whereby unfused (incongruent) percepts resulted in an increase and fused (congruent) percepts a decrease in activation.
Ojanen et al. (2005) also noted a decrease in activation when subjects viewed congruent stimuli and an increase in activation during incongruent audiovisual speech stimuli in the prefrontal cortex. We found a higher proportion of multisensory suppressed cells than enhanced cells during the viewing of congruent audiovisual stimuli in the prefrontal cortex. The occurrence of multisensory suppression with congruent audiovisual stimuli could be attributable to the stimulus in one sensory modality acting as a distractor for the processing of the other modality, or suppression might be seen as a mechanism of neuronal efficiency, much like the neuronal suppression attributable to familiarity in inferotemporal cortical neurons (
Ringo, 1996). Alternatively, because our stimuli are presented randomly as separate and conjoined audiovisual stimuli, neuronal activations in the VLPFC may reflect an “unfused” percept of the face and vocal stimuli, which could lead to suppression rather than enhancement (
Miller and D’Esposito, 2005). Furthermore, multisensory suppression might be more likely to occur with the use of “optimum” stimuli, as in the present study. If degraded stimuli were presented, making recognition difficult on the basis of one “degraded” modality, the simultaneous bimodal stimulus presentation could lead to more superadditive responses, because unimodal responses would be decreased and recognition would be facilitated by the addition of a second modality, or more information. Specifically, if the neuronal response is related to the ability to discriminate the call being presented, either visual or auditory information may be sufficient in our experiments, and the addition of the other modality, in the multisensory condition, may not increase the response, in which case the neuron would appear to be multisensory suppressed in our analyses.
The principle of superadditivity, in which multisensory responses exceed the linear sum of the responses to the unimodal stimuli, has been advocated by some as a requirement for brain regions involved in multisensory integration (
Calvert and Thesen, 2004;
Laurienti et al., 2005). Although many VLPFC neurons exhibited multisensory suppression, 27% of multisensory VLPFC cells were superadditive, suggesting that VLPFC is among the candidate brain regions involved in multisensory integration even when strict criteria are applied. Referring to both enhanced and suppressed neuronal populations as nonlinear preserves the idea that, in either case, the interaction could not be explained by the simple, linear sum of the unimodal components.
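The additivity criterion described above can be illustrated with a minimal sketch in Python. This is not the analysis used in the present study. It assumes baseline-subtracted mean firing rates for the auditory (A), visual (V), and audiovisual (AV) conditions. The names MeanRates, classify_additivity, and percent_change_from_best_unimodal, the tolerance parameter, and the example rates are all hypothetical, and a real analysis would replace the simple threshold with a statistical test across trials. The second function computes a percent change relative to the larger unimodal response, an index often used in the multisensory literature, which may differ from the measure applied to these data.

```python
from dataclasses import dataclass


@dataclass
class MeanRates:
    """Hypothetical container: baseline-subtracted mean rates (spikes/s) for one cell."""
    auditory: float
    visual: float
    audiovisual: float


def classify_additivity(rates: MeanRates, tolerance: float = 0.0) -> str:
    """Compare the AV response with the linear sum A + V.

    `tolerance` is an assumed stand-in for a proper statistical test.
    """
    linear_sum = rates.auditory + rates.visual
    difference = rates.audiovisual - linear_sum
    if difference > tolerance:
        return "superadditive"   # AV exceeds the linear sum
    if difference < -tolerance:
        return "subadditive"     # AV falls below the linear sum
    return "additive"            # AV indistinguishable from the linear sum


def percent_change_from_best_unimodal(rates: MeanRates) -> float:
    """Percent change of the AV response relative to the larger unimodal response.

    Positive values indicate enhancement, negative values suppression.
    """
    best_unimodal = max(rates.auditory, rates.visual)
    if best_unimodal == 0:
        raise ValueError("best unimodal response is zero; index is undefined")
    return 100.0 * (rates.audiovisual - best_unimodal) / best_unimodal


# Invented example rates, for illustration only.
enhanced_cell = MeanRates(auditory=4.0, visual=6.0, audiovisual=15.0)
suppressed_cell = MeanRates(auditory=4.0, visual=6.0, audiovisual=3.0)

print(classify_additivity(enhanced_cell))                  # superadditive
print(percent_change_from_best_unimodal(enhanced_cell))    # 150.0
print(classify_additivity(suppressed_cell))                # subadditive
print(percent_change_from_best_unimodal(suppressed_cell))  # -50.0
```

Under this sketch, a suppressed cell is always subadditive when both unimodal responses are positive, whereas an enhanced cell is superadditive only if its AV response also exceeds A + V, which is why superadditivity is the stricter criterion.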
In conclusion, communication-relevant auditory and visual stimulus information reaches single cells of the VLPFC of the rhesus monkey. Integration of congruent audiovisual stimuli is achieved in the form of suppression or enhancement of the magnitude of neuronal responses. Additional work aimed at understanding the mechanism of sensory integration in the frontal lobes of nonhuman primates may provide us with an understanding of object recognition and speech perception in the human brain, which critically depend on the integration of multiple types of sensory information.