Twenty-four neurologically normal subjects (10 female, mean age 24) participated after written informed consent in accord with local ethics. Visual stimulation was in the upper left hemifield for 12 subjects, in the upper right for the other 12. This was presented at the top of the MR-bore via clusters of 4 optic fibres arranged into a rectangular shape, and 5 interleaved fibres arranged into a cross shape, 2° above the horizontal meridian at an eccentricity of 18°. Visual stimuli were presented peripherally, which may maximise the opportunity for interplay between auditory and visual cortex (see Falchier et al., 2002), and also allowed us to test for any contralaterality in effects for one visual field or the other. The peripheral fibre-optic endings could be illuminated red or green with a standard luminance of 40 cd/m² and were 1.5° in diameter (see for schematics of the resulting colored ‘shapes’). Streams of visual transients were produced by switching between the differently colored cross and rectangle shapes (red and green respectively in , but shape-color was counterbalanced across subjects). Throughout each experimental run, subjects fixated a central fixation cross of ~0.2° in diameter. Eight red-green (cross/square) reversals occurred in a 2 s interval, with the SOA between each successive color-change ranging in a pseudorandom fashion from 100 to 500 ms (mean reversal rate of 4 Hz, with rectangular distribution from 2 to 10 Hz, but note that reversal rate was never constant for successive transients), to produce a uniquely jittered timing for each 2 s segment. Auditory stimuli were presented via a piezo-electric speaker inside the scanner, just above fixation. Each auditory stimulus was a clearly audible 1 kHz sound-burst with duration of 10 ms at ~70 dB. Identical temporally-jittered stimulation sequences within vision and/or audition were used in all conditions overall (fully counterbalanced), so that there was no difference whatsoever in temporal statistics between conditions, except for the critical temporal relation between auditory and visual streams during multisensory trials (unisensory conditions were also included, see below).
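The jittered timing described above can be illustrated with a minimal sketch. It is not the original Matlab code: the function name `jittered_onsets`, the rejection-sampling scheme, the integer-millisecond resolution and the fixed first onset at 0 ms are all assumptions made for illustration. Only the constraints (8 events per 2 s segment, SOAs of 100 to 500 ms, no two successive SOAs equal) come from the text.

```python
import numpy as np

def jittered_onsets(rng, n_events=8, window_ms=2000, soa_min=100, soa_max=500):
    """Draw onset times (ms) for one stimulation segment by rejection sampling.

    Hypothetical helper: SOAs are drawn uniformly in [soa_min, soa_max] and a
    draw is kept only if all events fit in the window and no two successive
    SOAs are identical (i.e. the rate is never constant across transients).
    """
    while True:
        soas = rng.integers(soa_min, soa_max + 1, size=n_events - 1)
        onsets = np.concatenate(([0], np.cumsum(soas)))
        fits_window = onsets[-1] < window_ms
        never_constant = np.all(np.diff(soas) != 0)
        if fits_window and never_constant:
            return onsets

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
onsets = jittered_onsets(rng)
print(onsets)
```

In the AVC condition the same onset vector would drive both the visual reversals and the tone bursts; the unisensory conditions would use it for one modality only.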
The experimental stimuli (for the visual-only baseline, auditory-only baseline, and for audio-visual temporal correspondence (AVC) or non-correspondence (NC)) were all presented during silent periods (2 s) interleaved with scanning (3 s periods of fMRI acquisition) to prevent scanner-noise interfering with our auditory stimuli or perception of their temporal relation with visual flashes. In the AVC condition, a tone burst was initiated synchronously with every visual transient (see ) and thus had exactly the same pseudorandom temporal pattern. During the NC condition (), tone bursts occurred with a different pseudo-random temporal pattern (but always having the same overall temporal statistics, including mean rate of 4 Hz within a rectangular distribution from 2 to 10 Hz), with a minimal protective ‘window’ of 100 ms now separating each sound from onset of a visual pattern-reversal ().
This provided clear information that the two streams were either strongly related, as in the AVC condition (such perfect coincidence for the erratic temporal patterns is exceptionally unlikely to arise by chance); or were unrelated, as for the NC condition. During the latter non-coincidence, up to two events in one stream could occur before an event in the second stream had to occur. The mean 4 Hz stimulation rate used here, together with the constraints (protective window, see ) implemented to avoid any accidental synchronies in the non-corresponding condition, should optimise detection of audio-visual correspondence versus non-correspondence (see Fujisaki et al., 2006), while making these bimodal conditions otherwise identical in terms of the temporal patterns presented overall to each modality. All sequences were created individually for each subject using Matlab 6.5. Piloting confirmed that the correspondence versus non-correspondence relation could be discriminated readily when requested (mean percent correct 93.8%), even with such peripheral visual stimuli. Irregular stimulus trains were chosen, as this makes an audio-visual temporal relation much less likely to arise by chance alone, and hence (a)synchrony typically becomes easier to detect than for regular frequencies, or for single auditory and visual events rather than stimulus trains (see also Slutsky and Recanzone, 2001; Noesselt et al., 2005).
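The protective-window constraint can be expressed as a simple check on two onset vectors. The sketch below assumes that the window means "at least 100 ms between any tone onset and any visual reversal"; the helper name `min_cross_stream_gap` and the example onset times are hypothetical and hand-picked for illustration, not taken from the study.

```python
import numpy as np

def min_cross_stream_gap(visual_ms, auditory_ms):
    """Smallest absolute time difference (ms) between any visual and any auditory onset."""
    visual = np.asarray(visual_ms)[:, None]      # column vector
    auditory = np.asarray(auditory_ms)[None, :]  # row vector
    return int(np.abs(visual - auditory).min())  # all pairwise differences

WINDOW_MS = 100  # protective window from the text

# Hypothetical onsets within one 2 s segment (SOAs between 100 and 500 ms).
visual = [0, 250, 400, 800, 1000, 1450, 1600, 1900]
avc_auditory = list(visual)                                   # AVC: identical timing
nc_auditory = [125, 500, 650, 900, 1120, 1300, 1700, 1800]    # NC: independent timing

print(min_cross_stream_gap(visual, avc_auditory))               # 0
print(min_cross_stream_gap(visual, nc_auditory) >= WINDOW_MS)   # True
```

A generator for NC sequences could redraw the auditory stream until this check passes, which leaves the per-modality temporal statistics unchanged.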
Two unisensory conditions (i.e. visual or auditory streams alone) were also run. These allowed our fMRI analysis to distinguish candidate multisensory brain regions (responding to either type of unisensory stream) from sensory-specific regions (visually- or auditorily-selective); see below.
Throughout each experimental run, participants performed a central visual monitoring task requiring detection of occasional brief (1 ms) brightening of the fixation point via button press. This could occur at random times (average rate 0.1 Hz) during both stimulation and scan periods. Participants were instructed to perform this fixation-monitoring task, and were told that the auditory and peripheral visual stimuli were task-irrelevant. We chose this fixation-monitoring task for three reasons: to avoid the different multisensory conditions being associated with changes in performance that might otherwise have contaminated the fMRI data; because we were interested in stimulus-determined (rather than task-determined) effects of audio-visual temporal correspondence; and to minimize eye movements. Eye-position was monitored online during scanning (Kanowski et al., 2007).
fMRI data were collected in 4 runs with a neuro-optimized 1.5 T GE scanner equipped with a head-spine-coil. A rapid sparse-sampling protocol was used (136 volumes per run with 30 slices covering whole brain; TR of 3 s; silent pause of 2 s; TE of 40 ms; flip angle of 90°; in-plane resolution of 3.5×3.5 mm; 4 mm slice thickness; FOV of 20 cm). Experimental stimuli were presented during the silent scanner periods (2 s scanner pauses). Each mini-block lasted 20 s per condition, containing 8 s (4 × 2 s) of stimulation (with each successive 2 s segment of stimuli then separated by 3 s of scanning). These mini-blocks of experimental stimulation in one of the four conditions (in random sequence) were each separated by 20 s blocks in which only the central fixation task was presented (unstimulated blocks).
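The timing figures above follow from simple arithmetic, sketched here as a check. The per-run duration assumes that every acquired volume is paired with exactly one silent pause; that pairing is an assumption, not something stated in the text.

```python
ACQUISITION_S = 3        # scanning period per volume (from the text)
SILENT_PAUSE_S = 2       # silent stimulation window (from the text)
SEGMENTS_PER_BLOCK = 4   # 2 s stimulus segments per mini-block (from the text)
VOLUMES_PER_RUN = 136    # from the text

cycle_s = ACQUISITION_S + SILENT_PAUSE_S                # one pause plus one acquisition
stim_per_block_s = SEGMENTS_PER_BLOCK * SILENT_PAUSE_S  # stimulation time per mini-block
block_s = SEGMENTS_PER_BLOCK * cycle_s                  # mini-block duration

# Assumption: one silent pause per acquired volume.
run_s = VOLUMES_PER_RUN * cycle_s

print(cycle_s, stim_per_block_s, block_s, run_s)  # 5 8 20 680
```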
After pre-processing for motion correction, normalisation, and 6 mm smoothing, data were analysed in SPM2 by modelling the 4 conditions and the intervening unstimulated baselines with box-car functions. Voxel-based group-effects were assessed with a second-level random-effects analysis, identifying candidate multisensory regions (responding to both auditory and visual stimulation); sensory-specific regions (visual minus auditory, or vice-versa); and the critical differential effects of coincident minus non-coincident audio-visual presentations.
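As a rough illustration of box-car modelling, the sketch below builds one indicator regressor per condition, sampled once per volume. It is not SPM2's implementation: the function name `boxcar_design`, the condition labels, the block order and the choice to leave unstimulated blocks as all-zero rows (rather than modelling them explicitly, as the analysis above did) are assumptions made for illustration.

```python
import numpy as np

CONDITIONS = ["V", "A", "AVC", "NC"]  # hypothetical labels for the 4 conditions
VOLS_PER_BLOCK = 4                    # 20 s block / 5 s per pause-plus-acquisition cycle

def boxcar_design(block_order, n_volumes):
    """Return an (n_volumes x 4) matrix of box-car regressors.

    block_order lists one label per 20 s block; None marks an unstimulated block.
    """
    X = np.zeros((n_volumes, len(CONDITIONS)))
    for b, label in enumerate(block_order):
        if label is None:
            continue  # unstimulated block: left implicit in this sketch
        start = b * VOLS_PER_BLOCK
        X[start:start + VOLS_PER_BLOCK, CONDITIONS.index(label)] = 1.0
    return X

# Hypothetical block order for part of one run.
order = ["AVC", None, "V", None, "NC", None, "A", None]
X = boxcar_design(order, n_volumes=len(order) * VOLS_PER_BLOCK)
print(X.shape)  # (32, 4)
```

A contrast such as AVC minus NC then corresponds to the weight vector `[0, 0, 1, -1]` over these four columns.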
Conjunction analyses assessed activation within sensory-specific and multisensory cortex (thresholded at p<0.001), within areas that also showed a significant modulation of the omnibus F-test at p<0.001 (see Beauchamp et al., 2005b) for clusters of more than 20 contiguous voxels. To confirm localization to a particular anatomical region (e.g. calcarine sulcus) in individuals, we extracted beta-estimates of BOLD-modulation for each condition, from their local maxima for the comparison AVC>NC, within regions-of-interest (ROIs) comprising early visual and auditory cortex, and within STS. These ROIs were initially identified via a combination of anatomical criteria (calcarine sulcus; medial part of anteriormost Heschl’s gyrus; posterior STS) and functional criteria in each individual (i.e. sensory-specific responses to our visual or auditory stimuli, for calcarine sulcus or Heschl’s gyrus respectively; or multisensory response to both modalities in the case of mSTS). We then tested the voxels within these individually-defined ROIs for any impact of the critical manipulation (which was orthogonal to the contrasts identifying those ROIs) of audio-visual correspondence minus non-correspondence. We also compared each of those two multisensory conditions to the unimodal baselines for the same regions on the extracted data.
Finally, we used connectivity analyses to assess possible influences (or ‘functional coupling’) between mSTS, V1 and A1, for the fMRI data. We first used the established ‘psychophysiological interaction’ (Friston et al., 1997) or PPI approach, which is relatively assumption-free. This assesses condition-specific covariation between a seeded brain area and any other regions, for the residual variance that remains after mean BOLD-effects due to condition have been discounted. Data from the LVF-group were left-right flipped to allow pooling with the RVF group for this, and to assess any effects that generalised across hemispheres (Lipschutz et al., 2002). PPI analyses can serve to establish condition-dependent functional-coupling (or ‘effective connectivity’) between brain regions, but do not provide information about the predominant direction of influence of information transfer. Accordingly, we further assessed potential influences between mSTS, V1 and A1 with a directed information transfer (DIT) measure, as recently developed (Hinrichs et al., 2006). DIT assesses predictability of one time-series from another, in a data-driven approach that makes minimal assumptions. If the joint time-series for, say, regions A and B predict future signals in time-series B, better than B does alone, this is taken to indicate that A influences B with a strength indicated by the corresponding DIT measure. If DIT from A to B is larger than vice-versa, this indicates directed information flow from A to B. Our DIT analysis used 96 time points (4 runs of 4 blocks with 6 points per block) per condition and region. From these data we derived the DIT values from the current samples of A and B to the subsequent sample of B, and vice versa, then averaged over all 96 samples. Here we used the DIT approach to assess possible pairwise relations between mSTS, V1 and A1 for their extracted time-series, assessing DIT-measures for all pairings between these (i.e. V1-A1, or V1-STS, or A1-STS) with paired t-tests.
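The logic of comparing predictability in the two directions can be illustrated with a simple linear stand-in. The sketch below is explicitly not the DIT measure of Hinrichs et al. (2006): it substitutes a Granger-style log ratio of residual variances from ordinary least squares, with a one-sample lag. The function name `directed_gain` and the simulated series (in which A drives B by construction) are hypothetical; only the series length of 96 points (4 runs × 4 blocks × 6 points) is taken from the text.

```python
import numpy as np

def directed_gain(a, b):
    """Gain in predicting b[t+1] from (a[t], b[t]) over b[t] alone.

    Linear stand-in for a directed-influence measure: the log ratio of
    residual variances of the restricted and the full least-squares model.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    target = b[1:]
    ones = np.ones(len(target))
    restricted = np.column_stack([ones, b[:-1]])       # B's own past only
    full = np.column_stack([ones, b[:-1], a[:-1]])     # joint past of A and B

    def resid_var(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.mean((target - X @ coef) ** 2)

    return np.log(resid_var(restricted) / resid_var(full))

# Synthetic example (not study data): A influences the next sample of B.
rng = np.random.default_rng(1)
n = 4 * 4 * 6  # 96 time points, as in the text
a = rng.standard_normal(n)
b = np.empty(n)
b[0] = rng.standard_normal()
b[1:] = 0.8 * a[:-1] + 0.5 * rng.standard_normal(n - 1)

gain_ab = directed_gain(a, b)  # influence of A on B
gain_ba = directed_gain(b, a)  # influence of B on A
print(gain_ab > gain_ba)
```

In an analysis like the one described above, such directional values would be computed per subject for each pair of regions and the two directions then compared across subjects with paired t-tests.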