Here we illustrate that multiple factors affect how information is combined both within and across the auditory and visual domains. While many studies have considered how this is achieved either within one sensory system or across sensory systems, very few have considered both within- and across-modality integration. We asked whether the rules that govern object formation also influence across-sensory illusions by exploring how spatial configuration affects the strength of sound-induced flash illusions.
By asking subjects to attend to one of two spatially separated streams and to report the number of auditory and visual events that occurred, we are able to assess how the presence of different competing stimuli affects visual and auditory temporal perception. We find that when observers are asked to report the number of events from a stream in one direction and in one modality, attentional focus is imperfect: subjects systematically misreport the number of events from the attended direction when a competing within-modality stream contains a conflicting number of events. In both auditory and visual conditions, the competing streams are readily separable, appearing from opposite hemifields and with distinct content (timbre or color); thus, these illusory percepts show that spatial attention does not fully suppress competing streams even when they are distinct. When we quantify this with a measure of “leakiness,” comparing the perceptual contributions of the to-be-attended and to-be-ignored streams, we find that the stream from the to-be-ignored direction has a weight that is nearly 50% of the weight given to the to-be-attended stream, irrespective of whether subjects are counting auditory or visual stimuli. In addition, this within-modality leakiness is equally strong in the unisensory and cross-modal conditions.
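One way to make this leakiness measure concrete is as a weighted sum of the event counts in the two competing streams (a sketch in our own notation, not the exact estimator used in the analysis):

```latex
\hat{n} \;=\; w_{\mathrm{att}}\, n_{\mathrm{att}} \;+\; w_{\mathrm{ign}}\, n_{\mathrm{ign}},
\qquad
\frac{w_{\mathrm{ign}}}{w_{\mathrm{att}}} \;\approx\; 0.5,
```

where $\hat{n}$ is the reported count, $n_{\mathrm{att}}$ and $n_{\mathrm{ign}}$ are the numbers of events in the to-be-attended and to-be-ignored streams, and the weight ratio of roughly one half holds whether subjects count beeps or flashes.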
When subjects are presented with both flashes and beeps and asked to report the number of each kind of event from one direction, the number of beeps biases the number of flashes perceived, but not the reverse. Thus, our results show perceptual integration both within and across sensory modalities and confirm that, when making temporal judgments, audition has a much stronger effect on vision than vice versa.
One previous study demonstrated that a second flash at a different location could elicit an illusory flash, as we found in our unisensory conditions; however, this study found no evidence of a within-modality flash fusion effect (Chatterjee et al. 2011
). From this, the authors concluded that different underlying processes cause within-vision and cross-modal effects. However, in our study, we find both unisensory illusory flashes and flash fusion, perhaps because of procedural differences between the studies (e.g., here, both attended and unattended stimuli were presented at equal eccentricities, all possible combinations of single and double stimuli were tested, etc.). Regardless, our results are consistent with the possibility that a common mechanism underlies within- and across-modality integration of temporal information. That said, some previous studies have illustrated that seemingly similar cross-sensory effects can arise through different underlying mechanisms (Odgaard et al. 2003
; Odgaard et al. 2004
). Thus, even though we find striking similarities between within-modal effects and the influence of auditory information on visual perception, we cannot definitively conclude that the underlying physiological processes are the same.
Neuroimaging studies reveal potential neurophysiological correlates of sound-induced visual illusions. fMRI studies isolate activity in V1 during both illusory-flash and flash-fusion trials that correlates with the reported number of flashes and parallels the activity evoked on non-illusion trials (Watkins et al. 2006
; Watkins et al. 2007
). EEG studies have shown enhanced oscillatory activity within visual cortex during illusory trials (Shams et al. 2001
; Bhattacharya et al. 2002
) and suggest that a second sound event induces a complex interplay between auditory and visual cortices, resulting in the illusory flash. These studies reveal aspects of the neural basis for the cross-modal interactions that are observed psychophysically. The qualitative similarity between the within-modality effects that we observe and the across-modality effect that auditory perception has on visual perception suggests that similar neural computations are engaged during within- and across-modality interactions. Such interactions may be the functional consequence of the large-scale cross talk that has recently been observed between putatively unisensory cortices (Schroeder and Foxe 2005
). We believe our experimental paradigm can be exploited to explore the neural basis of across- and within-modality integration of sensory information in physiological, neuroimaging, and behavioral paradigms.
We find that visual judgments are more affected by auditory events in the to-be-attended hemifield than by auditory events in the opposite hemifield; however, the influence of auditory streams on visual perception depends on the temporal coincidence of the streams in the two modalities. We therefore argue that the phenomena we observe are consistent with a set of general object-formation principles that cause sensory inputs occurring close together in time or space to be more tightly bound into a single cross-modal object. In particular, we suggest that the perceived properties of such an object are derived from a weighted combination of the underlying cues, with the weight given to a particular cue determined by its relative reliability.
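The reliability-weighted combination we have in mind is the standard inverse-variance (maximum-likelihood) cue-combination rule, written here in our own notation as an illustration rather than a fitted model:

```latex
\hat{s} \;=\; w_A s_A + w_V s_V,
\qquad
w_A \;=\; \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2},
\qquad
w_V \;=\; 1 - w_A,
```

where $s_A$ and $s_V$ are the auditory and visual estimates of the attribute being judged and $\sigma_A^2$ and $\sigma_V^2$ their variances. Because auditory temporal estimates are far less variable than visual ones, $w_A$ dominates temporal judgments; for spatial judgments the ordering reverses, consistent with ventriloquism.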
Previous studies have suggested that the flash-beep illusion is unaffected by the spatial configuration of the auditory and visual sources (Innes-Brown and Crewther 2009
). However, by presenting competing streams from opposite hemifields and asking observers to report the number of events from one of the visual streams, we demonstrate that spatial separation between the auditory and visual streams influences the strength of the flash-beep illusion. Given that spatial proximity strongly affects how sensory inputs group into perceptual objects (Marr 1982
; Bregman 1990
; Shinn-Cunningham 2008
), we suggest that object-formation principles influence the likelihood of perceiving sound-induced visual illusions. Moreover, our subjects reported the perception that the visual flashes were ‘captured’ by the auditory beeps and that the location of the auditory beeps was, in a manner consistent with the ventriloquism illusion (Bertelson and Radeau 1981
), captured by the visual flashes, lending subjective evidence to the claim that the auditory bias of visual temporal perception is related to object formation.
Temporal synchrony also affects object formation (Alain and Arnott 2000
; Blake and Lee 2005
; Shinn-Cunningham 2008
; Hupe and Pressnitzer 2012
). Previous studies show that auditory-induced flash illusions are most salient when the onsets of the stream of flashes and the stream of beeps fall within milliseconds of one another (Shams, Kamitani et al. 2002
), consistent with the idea that across-sensory grouping plays a role in these illusions. Here, in most of the trials, the streams begin at nearly the same moment, a synchrony that promotes both within- and across-modality integration. However, when the flashes were delayed relative to the beeps, across-modality integration decreased.
A number of past studies of auditory object formation show that when object-formation cues are ambiguous or conflicting, some cues can contribute to the perceived qualities of multiple perceptual objects (Darwin and Ciocca 1992
; Hall et al. 2000
; Shinn-Cunningham and Wang 2008
). In our design, various grouping cues are at odds with one another: temporal synchrony promotes grouping; spatial proximity varies, altering the likelihood of grouping; stimulus modality, color, and/or timbre strongly promote segregation of the streams into distinct objects. The inherent conflict of grouping cues may help explain why, in our experiments, there is “leakage” of information across streams, both within and across modality, even though the observers all subjectively perceive four different streams in the mixture.
Our aim was to see to what extent across-modal interactions were obligatory; for this reason, we explicitly instructed subjects that the numbers of auditory and visual events in a trial were independent, and asked them to report the perceived number of flashes (which were likely to be more influenced by across-modal effects) before reporting the number of beeps. We simulated auditory streams at 30 degrees left and right of midline to ensure that they were easily segregated, while the visual streams were at 10 degrees left and right of midline to ensure that they were easy to see while eye gaze was maintained straight ahead; together, these decisions led to a spatial mismatch between auditory and visual events. In contrast, in natural settings, a priori expectations are that one distal source will produce temporally correlated auditory and visual stimuli arising from exactly the same direction. Thus, our instructions and stimulus design are likely to produce weaker binding of auditory and visual inputs than would arise in natural settings, and likely represent a lower bound on the extent to which cross-modal stimuli are integrated. Despite this, we observed a strong influence of temporal auditory information on visual temporal perception.
Taken together, these observations yield testable predictions about how parametric manipulations of spatial and temporal congruence, which alter multi-sensory object formation, will affect illusory percepts. For instance, the degree of temporal coincidence should influence both within- and across-modality interactions. In addition, just as the spatial acuity of a visual input influences the “reverse-ventriloquism” illusion (Alais and Burr 2004
), reducing the reliability of visual temporal perception (e.g., by lowering visual contrast) should increase the influence of the auditory stream on judgments of the number of flashes, while decreasing the reliability of auditory temporal perception should increase the likelihood that visual events influence sound perception. Finally, increasing the semantic congruence of our stimuli should lead to greater integration across sensory modalities, and hence stronger across-modality interactions in perception. These ideas can be tested and contrasted in behavioral, physiological, and neuroimaging experiments to reveal the neural mechanisms that enable across-modality perceptual binding and perceptual integration.
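To make the reliability prediction concrete, consider inverse-variance weighting of the auditory cue, $w_A = \sigma_V^2/(\sigma_A^2 + \sigma_V^2)$, with purely illustrative numbers:

```latex
w_A \;=\; \frac{\sigma_V^2}{\sigma_A^2 + \sigma_V^2}:
\qquad
\sigma_V^2 = \sigma_A^2 \;\Rightarrow\; w_A = 0.5;
\qquad
\sigma_V^2 = 4\,\sigma_A^2 \;\Rightarrow\; w_A = 0.8.
```

Under this scheme, quadrupling the visual variance (e.g., by lowering contrast) raises the auditory weight from 0.5 to 0.8, predicting a correspondingly stronger sound-induced bias on flash counts.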