In temporal ventriloquism, auditory events can illusorily attract the perceived timing of a visual onset [1-3]. We investigated whether timing of a static sound can also influence spatio-temporal processing of visual apparent motion, induced here by visual bars alternating between opposite hemifields. Perceived direction typically depends on the relative interval in timing between visual left-right and right-left flashes (e.g. rightwards motion dominating when left-to-right inter-flash intervals are shortest). In our new multisensory condition, inter-flash intervals were equal, but auditory beeps could slightly lag the right flash, yet slightly lead the left flash, or vice-versa. This auditory timing strongly influenced perceived visual motion direction, despite providing no spatial auditory motion signal whatsoever. Moreover, prolonged adaptation to such auditorily-driven apparent motion produced a robust visual motion aftereffect in the opposite direction, when measured in subsequent silence. Control experiments argued against accounts in terms of possible auditory grouping, or possible attention capture. We suggest that the motion arises because the sounds change perceived visual timing, as we separately confirmed. Our results provide a new demonstration of multisensory influences on sensory-specific perception, with timing of a static sound influencing spatio-temporal processing of visual motion direction.
In classical demonstrations of visual apparent motion, motion can be perceived between two alternately flashing bars (Fig. 1a). The dominant direction of perceived motion can depend critically on visual timing, specifically the visual Stimulus Onset Asynchrony between Left-then-Right flashes (vSOALR), relative to Right-then-Left flashes (vSOARL). Thus, rightwards motion will typically dominate when vSOALR = 333ms and vSOARL = 666ms, but leftwards dominates for the converse timing (Figs 1b & c). In our main multisensory experiment, the vSOA between alternating flashes was constant and symmetric at 500ms (Fig. 1d), but each flash could be paired with an auditory beep (from a central loudspeaker), one beep lagging a particular flash onset by 83ms and the next beep leading the next flash onset by the same asynchrony (see Figs. 1e/f). The beeps thus had their own asymmetric auditory SOAs (e.g. aSOALR = 333ms and aSOARL = 666ms, or vice versa), which differed between conditions (Figs 1e/f), by the same amount as for the asymmetric visual SOAs in the purely visual version (cf. Figs 1b/c, and see on-line animated supplementary demonstration). Given the phenomenon of ‘temporal ventriloquism’ [2,3], we hypothesized that this auditory asymmetry might influence the perceived timing of visual onsets to create an illusory inequality between perceived vSOARL and vSOALR (despite the veridical visual equality, in our multisensory conditions). Our new question was whether such putative illusory shifts in visual onsets, determined solely by auditory timing, could determine the direction of perceived visual motion, analogous to when manipulating visual timing instead (readers may assess the auditory-timing effect from the demonstration movie). In addition to measuring “on-line” percepts of visual motion during exposure with beeps (Experiment 1), we also tested for any subsequent aftereffects produced by such exposure, on later judgements for purely visual displays in silence (Experiment 2).
On-line judgements (Experiment 1) are summarized as average proportion of ‘rightwards’ responses across all 8 observers in Figure 2a, for displays in which either the actual visual timing or instead just the auditory timing were manipulated. As classically found, observers reported rightwards motion more than leftwards when vSOALR was less than vSOARL (i.e. with asymmetric visual timing, rightmost two datapoints in Fig 2a). The new finding was that when using symmetric visual timing instead, and manipulating solely the auditory timing, this too had an analogous impact (leftmost two datapoints in Fig 2a) on judgements of visual motion direction. This multisensory effect arose even though absolutely no biasing auditory motion was presented [unlike 7,8,9,10,11], only a change in auditory timing relative to the regular visual events. The main effect of our timing manipulation was highly significant [F(1,7) = 60.01, p>0.0001] and the magnitude of this effect did not depend on the modality manipulated [2×2 interaction: F(1,7) = 0.68, n.s.]; see Fig 2a. Thus, manipulating just the auditory timing (i.e. to determine which flash was slightly lagged by a sound, and which flash was slightly led by a sound, by 83ms in both cases), had an analogous impact on perception of visual motion direction as a corresponding change in visual timing itself.
Response latencies for the judgements in Experiment 1 tended to be slow, in accord with the task of making unspeeded judgements for cyclical stimuli (mean latency in the Visual Timing conditions was 3.17 sec, SE 0.55; for the Auditory Timing condition the corresponding mean was 2.67 sec, SE 0.33, which did not differ significantly [t(7) = 1.51, n.s.]). While unspeeded tasks potentially allow time for top-down influences, we conducted several further control experiments that argue against specific top-down accounts, based on putative perceptual grouping or attentional capture (see Supplementary Materials, and brief description later below).
First, however, we further assessed the influence of auditory timing on processing of visual motion, via a quantitative measure of the strength of any subsequent visual aftereffects (Experiment 2). The stimulus sequences from which we had obtained the above results (i.e. with symmetric visual timing plus sounds that manipulated auditory timing; or during vision-only with asymmetries in visual timing), can now be reconsidered as adapting ‘exposure’ phases. We had by design interleaved these with short silent test sequences, used to measure any visual aftereffects induced by the preceding, more prolonged exposure phase. We measured any visual aftereffects in silence with a psychophysical nulling procedure, in which visual SOA was varied pseudorandomly over 5 equally-spaced values, according to the method of constant stimuli. The goal was to obtain a measure of the strength of any directional aftereffects, by finding the value of vSOALR at which any illusory aftereffect may now be perfectly cancelled (so that the observer then reports leftwards and rightwards motion with equal frequency; see Supplementary Materials and Fig S1 for sample psychometric functions from illustrative individuals).
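The test schedule for such a nulling procedure can be sketched as follows. This is an illustrative sketch only, not the code used in the study: the 500±83ms range is taken from the procedures described for the aftereffect phases, whereas the function names, the assumption of 20 test phases per block, the equal number of repeats per level, and the seeded shuffle are hypothetical choices made for the example.

```python
import random

def constant_stimulus_levels(centre=500.0, half_range=83.0, n_levels=5):
    """Equally spaced test SOAs (ms) spanning centre +/- half_range."""
    step = 2 * half_range / (n_levels - 1)
    return [centre - half_range + i * step for i in range(n_levels)]

def test_phase_schedule(n_trials=20, seed=0):
    """Pseudorandom order in which every level occurs equally often.

    n_trials = 20 and the equal-repeats rule are assumptions for this sketch.
    """
    levels = constant_stimulus_levels()
    order = levels * (n_trials // len(levels))  # same number of repeats per level
    random.Random(seed).shuffle(order)          # seeded, so the order is reproducible
    return order

levels = constant_stimulus_levels()
order = test_phase_schedule()
print(levels)
print(order)
```

Presenting every level equally often, in shuffled order, is what distinguishes the method of constant stimuli from adaptive staircase procedures, in which the next test value depends on the previous response.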
The group results from this aftereffect procedure are summarized in main Figure 2b, which plots the mean shift of nulling point vSOA, with positive values denoting a shift in the direction predicted for an aftereffect given the prior exposure direction (i.e. with an opposite aftereffect predicted). After a prolonged exposure phase that had a particular asymmetry in visual SOAs, the mean nulling point when measuring any aftereffect was indeed shifted in the opposite direction to the predominant exposure motion. The new result was that an analogous visual aftereffect was produced following exposure to symmetric visual timing, now with just auditory timing (i.e. slight lags and leads of the beeps by 83 ms, relative to the paired flashes) having crossmodally induced the directional motion percept during exposure. Statistical tests confirmed a significant main effect of adapting direction [F(1,7) = 18.30, p<.004], but no interaction with the modality manipulated during exposure [F(1,7) = 1.73, n.s.]; see Fig 2b. This provides further evidence that manipulating just the auditory timing of a static sound (without any auditory motion whatsoever) can impact on visual motion processing, with apparently similar effects to manipulations of visual timing. Moreover, the aftereffect cannot be readily explained by response bias.
Supplementary Materials provide further details of three additional control experiments, summarised briefly here. Experiment 3 used a duration-discrimination task to confirm that our manipulation of beep timing did indeed influence the perceived timing of flashes, as we expected. Experiments 4 and 5 provide some evidence against specific alternative ‘high-level’ accounts of the phenomena, in terms of possible influences from perceptual grouping, or potential attention-capture, that might otherwise potentially have made some intervals between flashes more salient than others, without impacting on perceived visual timing. The final experiment also demonstrates that beeps which lag flashes influence visual motion more effectively than beeps that lead in time, in accord with an observed asymmetry in past studies of temporal ventriloquism [2,12] (which unlike here did not consider the implications for motion).
While several prior studies reported other possible audio-visual interactions in motion processing, none made our current point that changes solely in auditory timing (for static sounds, without any auditory motion) can influence the perceived direction of visual motion, probably via an influence on apparent visual timing. Some studies showed that static sounds and their timing can bias subjective ratings of apparent motion strength [e.g. 13,14, see 15], but here we showed that auditory timing alone can determine direction for perceived visual motion. Our new paradigm also allowed us to compare the impact from auditory-timing versus visual-timing on perceived motion direction and on aftereffects for visual motion, finding these impacts to be analogous.
Several prior studies had investigated whether audition can influence visual motion direction but reported little or no effect [e.g. 7,8,9], although others reported more positive outcomes [10,11]. However, all those studies had manipulated the spatial relation between sound and vision. Here we did not present any auditory motion, manipulating only the timing of beeps from a single source, so that they could slightly lag or lead visual events, to produce ‘temporal ventriloquism’ [1-3], particularly when lagging (see Experiment 5 in Supplementary Materials). The observed influences of audition on perceived timing for vision (and thereby on visual motion direction as shown here) may accord with the hypothesis of ‘modality appropriateness’, given that audition typically has superior temporal resolution to vision. Our results provide decisive new evidence for the recent proposal that auditorily-driven temporal ventriloquism may drive visual motion, an idea that was hitherto based solely on non-directional phenomenal ratings.
Other recent studies show that auditory onsets can cause anomalous visual phenomena [1,17,18], such as one flash appearing to be two if presented with two sounds in quick succession. Our own study bears only an abstract similarity to that (in showing an auditory influence on visual perception), differing in many specifics of the methods used and conclusions reached (e.g. here we varied only the relatively subtle ±83ms audio-visual timing relation, not presence versus absence of a second beep; moreover we measured the impact of auditory timing on perceived direction of visual motion; and we were able to show an auditorily determined visual aftereffect). Only one previous study to our knowledge has tested whether audition can determine some visual-motion aftereffects, but found none. Unlike here, that study concerned motion in depth rather than laterally, and presented auditory motion during exposure (i.e. an apparently looming sound), whereas here we manipulated only auditory timing rather than auditory motion. Thus, while our study accords generally with a burgeoning multisensory literature [5,20] in showing that the senses are more intimately linked than traditionally thought, our specific demonstrations and conclusions here are novel, in showing that timing of a static auditory stimulus can determine direction of perceived visual motion, and induce directional visual aftereffects.
We found that changing just the timing of static auditory events can substantially affect processing of visual motion, to affect on-line visual perception of motion direction and also to induce visual aftereffects. We propose that these motion effects may reflect the well-established influence of auditory timing on perceived visual timing [1-3]. Our new effects illustrate that auditory timing information can penetrate substantially into visual processing of motion.
Eight healthy young adults (5 male) participated for financial compensation. None had previous experience of similar tasks, and all were naïve to the purposes of this study.
The visual display was a 21″ CRT (Sony GDM-F520), using video mode 1024 × 768 at 75 Hz, viewed in a dark chamber from 57 cm. Auditory signals were emitted by a single PC speaker positioned 30 cm below the display, out of view of the observer. On-line eye-tracking (100 Hz) was provided by an infra-red video camera set into the chin-rest (Cambridge Research Systems, Reading), trained on the left eye. Experimental control was provided by a PC running Matlab 7.1 with the Cogent Graphics toolbox (http://www.vislab.ucl.ac.uk/Cogent2000), and the CRS Video Eyetracker Toolbox. A white fixation point (diameter 0.4 deg, luminance 120 cd m⁻²) was displayed 13 degrees below the centre of the display before and during each trial. Visual bar stimuli were red (CIE chromaticity coordinates x=.611, y=.35) with luminance of 8 cd m⁻² on a black background (0.2 cd m⁻²). The auditory signal comprised a rectangular-windowed 480 Hz sine-wave carrier, sampled at 22 kHz with 8-bit quantization.
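For illustration, a beep with these parameters could be synthesised along the following lines. This is a sketch under stated assumptions rather than the original stimulus code (which ran in Matlab with Cogent): the 60 ms duration is taken from the timing procedures, the sample rate is assumed to be exactly 22000 Hz (the text states only "22kHz"), and full-scale amplitude with unsigned 8-bit encoding (128 as the zero level) is a hypothetical choice. The function name `make_beep` is likewise invented for the example.

```python
import math

def make_beep(freq_hz=480.0, dur_s=0.060, rate_hz=22000):
    """Unsigned 8-bit samples of a rectangular-windowed sine burst.

    A rectangular window means the tone simply starts and stops abruptly,
    with no onset or offset ramp.
    """
    n_samples = round(dur_s * rate_hz)
    samples = []
    for i in range(n_samples):
        s = math.sin(2.0 * math.pi * freq_hz * i / rate_hz)  # value in [-1, 1]
        # Quantize to 8 bits: map [-1, 1] onto integer levels centred on 128.
        level = round(128 + 127 * s)
        samples.append(max(0, min(255, level)))
    return samples

beep = make_beep()
print(len(beep), min(beep), max(beep))
```

At these settings the burst contains 1320 samples, i.e. just under 29 cycles of the 480 Hz carrier.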
In Experiments 1 and 2 (run as different parts of the same study), Visual Timing or Audio Timing conditions (see Fig. 1 for illustrations) were blocked in a pseudo-random order that was counterbalanced over subjects. Direction of ‘exposure’ motion (whether determined by visual or auditory SOAs) was also counterbalanced across block orders. Each observer performed 4-8 blocks for each of the four conditions (i.e. leftwards versus rightwards motion induced by visual or auditory timing). Each block consisted of 20 pairs of exposure-then-subsequent-aftereffect phases. The duration of the first exposure phase was 30 secs, followed by an aftereffect measure lasting 5 secs, but all subsequent exposure and aftereffect pairs in a block lasted for 5 seconds each. In order to engage continued attention during exposure, observers were instructed to continually hold down one of two response keys (left-arrow or right-arrow on a standard PC keyboard), corresponding to the direction of motion they were currently experiencing (or no key if ambiguous). For simplicity, only the initial keypress was used for analysis, as this gave similar results to more complex analyses. Very rarely (1% of trials) was no key depressed. Exposure phases (with continuous responding throughout) were followed, after a 500ms empty fixation display, by a silent 5-second after-effect phase, and then a blank interval which remained until a two-alternative forced choice response was made using the same response keys. For these after-effect phases, observers were instructed to only respond after the visual sequence had terminated.
At the start of each trial, both bars were presented simultaneously in silence for 500ms. A sequence of visual flashes and auditory beeps then commenced with the offset of one randomly selected bar. Commencing with an offset helped to avoid a phenomenon observed in piloting, with sequences that originally began with a single visual onset, whereby motion tended to be reported initially in a direction away from that initial onset regardless of visual SOA.
In Experiments 1 and 2, each event-cycle lasted 1000ms. Each 1000ms event-cycle comprised a Left and Right bar each appearing in alternation for 200ms with empty gaps of 300 ms (Figs. 1a, 1d). In each Visual Timing exposure sequence, the SOA between Left and Right bars (vSOALR) was set to either 500 + 166ms (=666 ms) leading to leftward apparent motion (see Fig 1b); or to 500 - 166ms (=333 ms) for rightward apparent motion (see Fig. 1c). Note that the sum of vSOARL and vSOALR was always 1000ms in Experiments 1 and 2, hence a short vSOARL entailed a long vSOALR, and vice versa. For the Auditory Timing exposure sequences, vSOAs were fixed symmetrically at 500ms (Fig. 1d) but 60 ms auditory beeps from a fixed central source were each paired with left or right flashes, with an 83ms lag or lead in onset, relative to the paired flash. When one beep lagged the onset of the left bar by 83ms, while the next beep preceded the right bar by the same interval, and so on, we found that this could induce rightwards visual motion. In this case aSOALR (the interval from the sound paired with the left flash to the sound paired with the right flash) was equal to 333 ms and aSOARL was 666 ms; the opposite motion arose when aSOARL was 333ms, and aSOALR was 666 ms (Fig. 1e/f). During aftereffect sequences (Experiment 2), no sounds were played but vSOARL was varied pseudorandomly in five steps across a range of 500±83ms.
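The arithmetic relating the beep offsets to the auditory SOAs can be made explicit with a short sketch. The function and key names below are hypothetical, and the convention that the LR interval runs from the left-paired event to the right-paired event is an assumption of the example. With a nominal 83ms offset the shorter auditory interval comes out at 334ms rather than the 333ms quoted in the text; the 1ms difference presumably reflects rounding of the nominal values.

```python
def cycle_timing(v_soa_lr=500, beep_offset=83, cycle=1000, beep_lags_left=True):
    """Flash and beep SOAs (ms) for one event-cycle, with the left flash at t = 0.

    beep_lags_left=True: the beep lags the left flash and leads the right flash.
    beep_lags_left=False: the converse pairing.
    """
    left_flash, right_flash = 0, v_soa_lr
    sign = 1 if beep_lags_left else -1
    left_beep = left_flash + sign * beep_offset    # shifted later (+) or earlier (-)
    right_beep = right_flash - sign * beep_offset  # shifted the opposite way
    a_soa_lr = right_beep - left_beep
    return {
        "vSOA_LR": v_soa_lr,
        "vSOA_RL": cycle - v_soa_lr,
        "aSOA_LR": a_soa_lr,
        "aSOA_RL": cycle - a_soa_lr,
    }

print(cycle_timing(beep_lags_left=True))
print(cycle_timing(beep_lags_left=False))
```

In both pairings the visual SOAs remain symmetric at 500ms; only the auditory intervals become asymmetric, by the same 166ms total (2 × 83ms) that separates the two visual SOAs from 500ms in the Visual Timing conditions.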
Prior to the experiment, all observers were initially shown silent sequences with a gross visual SOA asymmetry (e.g. vSOALR or vSOARL = 267ms) and told that these displays could sometimes appear to have either leftwards or rightwards motion. Without further prompting, all observers could readily discriminate between SOAs consistent with either leftward or rightward motion, and likewise for more subtle visual SOA asymmetries.
No feedback was given during the main experiment, apart from occasional reminders to maintain central fixation during stimulus displays. Observers were not told that this was an experiment concerning adaptation or aftereffects, and were given no information leading them to expect that any one sequence should affect their perception of a subsequent one in any particular direction.
Eye data were processed using the ILAB toolbox for Matlab, filtering for blinks and periods of signal loss. Horizontal and Vertical eye positions were then each analysed in an ANOVA with repeated-measures factors of apparent-motion-direction and inducing-modality during exposure (i.e. visual or auditory SOA asymmetries). There were no significant differences in mean eye position between conditions.
Behavioural data were obtained for observers’ responses in each of the exposure (Experiment 1) and aftereffect phases (Experiment 2), split by exposure condition (Auditory or Visual Timing manipulated) and the SOA during exposure (i.e. predicted to induce either rightward or leftward motion). For the exposure phases, responses were summarized according to the percentage of ‘rightwards’ responses (see Figure 2a). For aftereffect phases (which tested a range of visual SOAs in silence), the percentage of ‘right’ versus ‘left’ apparent-motion reports was calculated for each visual SOA (varied over 5 values across a range of 500±83ms), and initially considered as a psychometric function of the visual SOA in the test stimulus (see supplementary Figure S1 for examples). The nulling-point, i.e. the vSOA at which ‘left’ and ‘right’ responses are equally probable, was then estimated via a maximum-likelihood logistic fit of the psychometric function. Shifts in nulling-point vSOALR were summarized relative to the unadapted nulling point vSOA (i.e. 500ms), with positive values indicating shifts in the predicted aftereffect direction given the prior exposure phase (see Fig 2b). The effect of exposure modality on these nulling-point shifts was then analysed by ANOVA.
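A minimal sketch of such a nulling-point estimate is given below. It fits a two-parameter logistic function to binomial response counts by Newton-Raphson maximisation of the likelihood, then solves for the SOA at which the fitted probability of a ‘right’ response is 0.5. The details of the original fitting routine are not given in the text, so the optimiser, the rescaling of SOA, the function name, and in particular the response counts are all hypothetical and serve only to illustrate the computation.

```python
import math

def fit_nulling_point(soas, n_right, n_total, centre=500.0, max_iter=50):
    """Maximum-likelihood logistic fit; returns the SOA (ms) where P('right') = 0.5.

    Model: P('right') = 1 / (1 + exp(-(b0 + b1 * x))), x = (SOA - centre) / 100.
    """
    xs = [(s - centre) / 100.0 for s in soas]  # rescaled for numerical stability
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(xs, n_right, n_total):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += k - n * p            # gradient of the log-likelihood
            g1 += (k - n * p) * x
            w = n * p * (1.0 - p)      # entries of the negative Hessian
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: H^-1 * gradient
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return centre - 100.0 * b0 / b1        # solve b0 + b1 * x = 0 for the SOA

# Hypothetical counts of 'right' responses out of 20 trials per test vSOA_LR.
soas = [417.0, 458.5, 500.0, 541.5, 583.0]
n_right = [18, 15, 11, 6, 2]
n_total = [20] * 5
null_soa = fit_nulling_point(soas, n_right, n_total)
print(null_soa, null_soa - 500.0)  # nulling point and its shift from 500 ms
```

The shift from 500ms obtained in this way for each observer and condition, signed according to the predicted aftereffect direction, would then be the quantity entered into the ANOVA.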
This research was funded by the BBSRC (JD & EF), plus the MRC and the Wellcome Trust (JD). JD also holds a Royal Society Leverhulme Trust Senior Research Fellowship. We are grateful to two anonymous reviewers for insightful comments, and to Christopher Bolton for assistance in data collection.