Combining information across modalities can affect sensory performance. We studied how co-occurring sounds modulate behavioral visual detection sensitivity (d’), and neural responses, for visual stimuli of higher or lower intensity. Co-occurrence of a sound enhanced human detection sensitivity for lower- but not higher-intensity visual targets. fMRI linked this to boosts in activity-levels for sensory-specific visual and auditory cortex, plus multisensory superior temporal sulcus (STS), specifically for a lower-intensity visual event when paired with a sound. Thalamic structures in visual and auditory pathways, the lateral and medial geniculate bodies respectively (LGB, MGB) showed a similar pattern. Subject-by-subject psychophysical benefits correlated with corresponding fMRI-signals in visual, auditory and multisensory regions. We also analysed differential ‘coupling’ patterns of LGB and MGB with other regions in the different experimental conditions. Effective-connectivity analyses showed enhanced coupling of sensory-specific thalamic bodies with the affected cortical sites during enhanced detection of lower-intensity visual events paired with sounds. Coupling strength between visual and auditory thalamus with cortical regions, including STS, co-varied parametrically with the psychophysical benefit for this specific multisensory context. Our results indicate that multisensory enhancement of detection sensitivity for low-contrast visual stimuli by co-occurring sounds reflects a brain network involving not only established multisensory STS and sensory-specific cortex, but also visual and auditory thalamus.
There is a growing literature on how combining information from different senses may enhance perceptual performance. The principle of “inverse effectiveness” (PoIE) originally introduced by Stein and colleagues for cell recordings (overview in Stein and Meredith, 1993) suggests that co-occurrence of stimulation in two modalities may lead to enhanced neural responses, particularly for stimuli that produce a weak response in isolation (though see Holmes (2009) for critique; Angelaki et al. (2009) for reconsideration within a Bayesian framework). It has been suggested that one behavioral consequence of a putative PoIE might be enhanced detection sensitivity for near-threshold stimuli in one sense when co-occurring with an event in another sense (e.g. Stein and Meredith, 1993; Frassinetti et al., 2002).
Many (but not all) behavioral studies of multisensory integration have used relatively intense suprathreshold stimuli, hence could not test for near-threshold detection sensitivity (though see McDonald et al., 2000; Frassinetti et al., 2002). Some audio-visual studies did assess crossmodal effects in relation to stimulus intensity, but studied intensity matching (Marks et al., 1986), audio-visual changes (Andersen and Mamassian, 2008), or reaction time (Doyle and Snowden, 2001), rather than unimodal detection sensitivity (d’). Given PoIE proposals that multisensory benefits should arise particularly in near-threshold detection for low-intensity stimuli, we focused on d’ for lower- (versus higher-) intensity visual stimuli when co-occurring with a sound or alone.
Recent results on the neural basis of audiovisual interactions indicate that these may affect not only brain areas traditionally considered as multisensory convergence-zones, such as cortically the superior temporal sulcus (STS, Bruce et al., 1981; Beauchamp et al., 2004; Barraclough et al., 2005), but also areas traditionally considered as modality-specific, e.g. visual striate/extrastriate cortex (Calvert, 2001; Noesselt et al., 2007; Watkins et al., 2007), plus core/belt auditory cortex (Brosch et al., 2005; Kayser et al., 2007; Noesselt et al., 2007; see Ghazanfar and Schroeder (2006) plus Driver and Noesselt (2008) for reviews on multisensory integration and Falchier et al. (2002), Rockland and Ojima (2003), Cappe and Barone (2005), Budinger and Scheich (2009) for possible anatomical pathways). Subcortical structures have also long been implicated in multisensory integration (e.g. Stein and Arigbede, 1972). Recent work indicates possible thalamic involvement in some multisensory effects (Baier et al., 2006; Cappe et al., 2009), including audiovisual speech processing (Musacchia et al., 2006) or training-induced plastic changes in speech processing (Musacchia et al., 2007). But here we focus on nonsemantic stimuli (cf. Sadaghiani et al., 2009), with particular interest in whether sensory-specific thalamus can be implicated.
We studied an audio-visual situation where co-occurring sound bursts might enhance detection sensitivity (d’) for visual Gabor patches. Behaviorally, given PoIE proposals, we predicted d’ benefits due to a co-occurring sound for lower- but not higher-intensity visual targets. We sought to identify the neural basis of any such behavioral pattern in the human brain using event-related fMRI. We tested within localized sensory-specific or heteromodal regions for patterns of BOLD activation (or inter-regional coupling) during our audio-visual paradigm that corresponded to the behavioral pattern for visual detection sensitivity.
Fourteen subjects (eight female, age range 19-33 years) participated in the initial psychophysical experiment. A further twelve participated in the fMRI study (see below) and also yielded behavioral data. Subjects provided informed consent in accord with local ethics.
Outside the scanner, visual stimuli were presented on a monitor using Presentation 9.13 (Neurobehavioral Systems, Albany, USA). Visual targets comprised rectified Gabor patches that differed in luminance. These stimuli subtended 1.5° visual angle, had a duration of 16.6 ms, a spatial frequency of 3 cycles per degree and were presented at 5° horizontal and 1° vertical eccentricity in the upper right quadrant. Auditory stimuli were presented from a loudspeaker located just above the visual stimulus position, and comprised a 3 kHz sound burst (70 dB, duration 16 ms); see below for sounds during scanning.
On each trial, subjects performed a signal detection task for visual targets, indicating the presence or absence of a visual target stimulus by pressing one of two buttons, irrespective of whether a co-occurring sound was presented or not. They had to maintain central fixation and respond as accurately and quickly as possible (we collected reaction times, RTs, for completeness, but they showed a different outcome compared to the critical signal-detection measure of visual sensitivity, namely d’; see below). A visible outline square on the monitor (1.7 × 1.7°, 13.05 cd/m2, 16.6 ms duration), surrounding the possible target position, always appeared to signal when a response was required (see Fig.1a). This visual square was present in all stimulus conditions, so it was not predictive of target presence and is subtracted out by our contrasts of the different conditions in the later fMRI-experiment (see below). We introduced it to signal when a response was required, and thereby to serve also as an onset-marker for the no-sound no-target condition, which otherwise could not have been estimated straightforwardly for the fMRI response. Likewise, no RTs for the no-sound no-target condition could have been collected without the square frame to indicate that a response was required.
The very brief target duration (16.6 ms) was chosen to obtain hit rates in the range of 60-80% without a mask, and also because similarly short stimulus durations have recently been used in several other studies of multisensory integration (e.g. Bonath et al., 2007; Lakatos et al., 2007). We acknowledge that longer presentation durations might in principle yield different results (e.g. see Meredith et al., 1987; Boenke et al., 2009, for some neural and behavioral effects of manipulating audiovisual stimulus durations). The possible role of stimulus duration might be studied in future variants of the paradigm introduced here.
Visual targets were presented on a random half of trials, and on a random half of these target-present trials the auditory soundburst also occurred concurrently. But the sound was equally likely to appear on target-absent trials also (see Fig.1c), thus conveying no information about the visual nature of the trial. Initially there were three visual-threshold-determination runs, in which targets at eight intensity levels (12.30, 8.50, 8.12, 7.93, 7.71, 7.31, 7.09, or 6.91 cd/m2, against a 6.01 cd/m2 grey background) could occur. In each run there were 30 trials per intensity condition, plus 240 non-target trials. These runs were used to determine the two intensity levels that yielded 55-65% correct detection and 85-95% correct detection for each subject (note that even the latter value was consistently below 100% ceiling). The two selected intensity values were then used in the main experiment, with half the target presentations at the higher luminance and half at the lower, in random order. Across subjects, the two mean luminances used were 7.09 cd/m2 and 7.71 cd/m2 (see Fig.1b).
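The level-selection step described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name, the rule of taking the level closest to the centre of each band, and the hit rates in the example are hypothetical; the text specifies only the two target bands (55-65% and 85-95% correct detection).

```python
def pick_level(hit_rates, low, high):
    """Return the luminance whose hit rate lies in [low, high], closest to the band centre.

    hit_rates maps luminance (cd/m2) to the proportion of detected targets.
    Returns None if no level falls inside the band.
    """
    centre = (low + high) / 2.0
    in_band = [(abs(rate - centre), lum)
               for lum, rate in hit_rates.items()
               if low <= rate <= high]
    if not in_band:
        return None
    return min(in_band)[1]


# Hypothetical hit rates for the eight luminance levels (not data from the study).
example_rates = {12.30: 1.00, 8.50: 0.98, 8.12: 0.96, 7.93: 0.94,
                 7.71: 0.90, 7.31: 0.72, 7.09: 0.61, 6.91: 0.40}

lower_level = pick_level(example_rates, 0.55, 0.65)
higher_level = pick_level(example_rates, 0.85, 0.95)
```

Returning None when no level falls in a band makes it explicit that the threshold runs would need repeating or extending for that subject, rather than silently choosing an out-of-band level.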
The main experiment had a 3×2 factorial event-related design, with factors of visual condition (no target, lower-intensity target, or higher-intensity target) and sound condition (present/absent). The same number of non-target trials and target trials were presented in three runs (90 higher intensity target trials, 90 lower intensity target trials, and 180 non-target trials per subject) with a mean inter-trial interval (ITI) of 1500 ms; range 1200-1800 ms randomized. We used standard signal-detection analyses (Stanislaw and Todorov, 1999) to separate perceptual sensitivity (d’) from criterion (c) for detection of the visual target. This separation is important; for example, some recent studies attributed behavioral effects of an uninformative sound on visual judgements to a response bias rather than increased sensitivity (Marks et al., 2003; Odgaard et al., 2003; Lippert et al., 2007); whereas other groups have reported enhanced perceptual sensitivity instead (McDonald et al., 2000; Vroomen and de Gelder, 2000; Frassinetti et al., 2002; Noesselt et al., 2008), but sometimes only with informative sounds (Lippert et al., 2007), unlike here. We treated correct detections of targets as hits; reports of target absence when present as misses; reports of target absence when absent as correct rejections; and reports of target presence when absent as false alarms. Any responses later than 1500 ms after stimulus onset were discarded. We then used a z-transformed ratio to compute sensitivity (d’) separately from criterion (c); see Stanislaw and Todorov (1999). Note that all of the visual signal-detection scores were calculated separately for conditions with the (task-irrelevant) sound present, versus absent.
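For readers unfamiliar with these measures, the following Python sketch computes d’ and c from the four response counts using the standard formulas given by Stanislaw and Todorov (1999): d’ = z(H) − z(F) and c = −0.5 · [z(H) + z(F)], where H and F are the hit and false-alarm rates. The log-linear correction for rates of 0 or 1 is an assumption of this sketch (the text does not say how extreme rates were handled), and the counts in the example are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from trial counts for one condition."""
    # Assumption: log-linear correction keeps rates away from 0 and 1,
    # where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for lower-intensity targets, with and without a sound.
d_sound, c_sound = sdt_measures(34, 11, 9, 81)
d_alone, c_alone = sdt_measures(27, 18, 9, 81)
```

Because the scores are computed separately for sound-present and sound-absent trials, each call would take its false alarms and correct rejections from the non-target trials of the matching sound condition.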
Behavioral results for all measures (d’; response criterion c; accuracy; and for completeness RT) were analyzed with mixed ANOVAs, having the within-subject orthogonal factors of visual condition and of sound, plus the between-subject factor of inside/outside scanner (for unscanned versus scanned groups of subjects, respectively). The latter factor had no significant impact, so behavioral data from inside and outside scanner could be pooled. Nevertheless for completeness, we also plot the behavioral results separately for inside and outside the scanner, to illustrate the replicability of the pattern (see supplementary Fig. S1a).
fMRI data were acquired for the MRI group (n=12; 6 female; aged 21-29 years) using a circular-polarized whole-head coil (BrukerBioSpin, Ettlingen, Germany) on a Siemens whole body 3T MRI Trio scanner (Siemens, Erlangen, Germany). The procedure was as for the behavioral measures in the unscanned group, except as follows:
After discarding the first 5 volumes of each imaging run, data were slice-acquisition-time corrected, realigned, normalized and smoothed (6 mm FWHM) using SPM2 (www.fil.ion.ucl.ac.uk/spm). After pre-processing, the data from the localizer runs were analyzed with a model comprising the two blocked stimulus conditions (high-intensity visual or auditory) plus the interleaved no-stimulus baseline blocks, for each subject. Subject-specific local maxima within lateral and medial geniculate bodies (LGB, MGB), plus primary visual or auditory cortex, were identified by a combination of functional and anatomical criteria. Functional criteria comprised a preference (at p<.01 or better) for visual stimulation in case of LGB and V1; or for auditory stimulation in case of MGB and A1. Structural criteria comprised posterior-ventral thalamic site for the relevant maxima in case of LGB or MGB; or the calcarine fissure for V1; or the anterior medial part of Heschl’s gyrus for A1/core region in auditory cortex. Anatomical structures were identified in subject-specific inversion-recovery EPIs which have the same distortions as functional data. Averaged subject-specific local maxima within A1 and V1 were also compared with probability maps of primary visual and auditory cortex (www.fz-juelich.de/ime/spm_anatomy_toolbox) to further corroborate these localizations (see supplementary Figures S2-3 for identified local maxima in LGB, MGB, A1 and V1 for all subjects). Finally, unisensory and candidate multisensory regions at the group voxelwise stereotactic level (as a supplement to individual anatomical criteria) were further identified using a random-effects group ANOVA, with comparisons of auditory and visual stimulation plus their respective baselines (thresholded at p< 0.01; corrected for cluster-level; see supplementary table S1), to define ‘inclusive masks’ for use in the SPM analysis of the main experiment, as described below.
For the main multisensory experiment (i.e. the visual detection task, with or without co-occurring auditory events on each trial), all trials with correct responses for the six experimental conditions (and separately all those trials with incorrect responses, regardless of condition) were modelled for each participant using the canonical hemodynamic response function (HRF) in SPM2. The incorrect responses were not further split by condition due to their infrequent occurrence (see below). Since only correct trials were considered for each condition in the fMRI results we present, the observed modulations of BOLD-response by condition cannot be readily explained by different error rates in specific conditions.
Voxel-based group results were assessed for the 3 × 2 factorial design of the main experiment with SPM, using a within-subject ANOVA with all six experimental conditions as provided by SPM2, at the random-effects group level. In order to focus initially upon sensory-specific (visual or auditory) or candidate heteromodal brain areas (responding to both vision and audition), results from the main event-related fMRI experiment were initially masked with the visual, auditory or heteromodal SPM masks as identified by the separate (and thus independent) blocked localizer runs (masked at p<0.01, cluster-level corrected for cortical structures). But whole-brain results are also reported where appropriate (see supplementary material and tables). Significance levels in the main experiment were set to p<0.01 (small volume corrected (FDR) using the localizer masks) unless otherwise mentioned. Cluster levels for cortical regions were set to k>20. The cluster level criterion for MGB/LGB was lowered to k>5 due to the smaller size of these subcortical structures (D’Ardenne et al., 2008).
Finally, in addition to the group voxel-wise SPM analysis in normalized stereotactic space, we also ran corroborating further analyses on subject-specific regions of interest (ROIs for LGB, MGB, V1 and A1), as defined by the localizer runs but now separately for each individual, via the combination of functional and structural criteria described earlier, independent of the main experiment. As will be seen, these provide an important form of corroborating analysis that is not subject to some of the selection issues that can inevitably arise in normalised SPM group contrasts (for which interaction contrasts inevitably select the most significant peak voxel for an interaction term, which might therefore show an exaggerated pattern compared with an independently defined ROI). ROIs were centred on subject-specific local maxima and each had a radius of 2 mm. We then tested the extracted BOLD signals per condition across the group, but now for individually-defined ROIs, which could thus correspond to somewhat different voxels in different subjects, albeit for the same defined region, unlike the voxel-wise normalized group analysis with the normalized inclusive group-masks.
For effective-connectivity analyses, time-courses from the LGB and MGB sites identified in each individual subject’s thalamic ROIs (via their individual combined structural and blocked-functional criteria, see above) were extracted and analysed in extended models with single-subject seeds. The regressors for analysing each subject comprised 6 experimental conditions, plus (separately) all incorrect trials as above, now plus LGB or MGB time-courses also (see below) and the derived ‘psychophysiological interaction’ (PPI), within the standard PPI approach (Friston et al., 1997) for assessing condition-dependent inter-regional coupling. The PPI approach is a well-established and relatively assumption-free approach which measures covariation of residual variance across regions as the index of effective connectivity, in the context of an experimental factor or factors. More specifically, PPIs test for changes in inter-regional effective connectivity (i.e. for higher or lower covariation in residuals) between a given ‘seed’ region and other brain regions, for a particular context relative to others. Here we used this standard approach to test specifically for higher inter-regional coupling when a sound (minus no sound) co-occurred with a lower-intensity (versus higher-intensity) visual target, using seeds as described below. In particular, our single-subject SPM-model now included all seven experimental conditions plus 5 additional regressors for the physiological response from the individually defined seed within the thalamus (LGB or MGB, in separately-seeded analyses) and the products of this physiological response with the four conditions yielded by crossing high/low visual intensity with the presence or absence of sounds. 
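The five additional regressors can be pictured with the following Python sketch, which forms one physiological (seed) regressor and four seed-by-condition product terms. It is a simplified illustration, not the SPM2 implementation: it multiplies the measured seed signal directly by each condition regressor, which may differ from what the analysis software does internally. All function and variable names, and the toy data, are hypothetical.

```python
def build_ppi_regressors(seed, conditions):
    """Return the seed regressor plus one seed-by-condition product per condition.

    seed: list of floats, the seed time course (one value per volume).
    conditions: dict mapping condition name to a list of per-volume weights
                (e.g. HRF-convolved onsets) of the same length as seed.
    """
    n = len(seed)
    mean = sum(seed) / n
    centred = [value - mean for value in seed]  # mean-centre the physiological term
    regressors = {"seed": centred}
    for name, weights in conditions.items():
        if len(weights) != n:
            raise ValueError("condition %r has wrong length" % name)
        regressors["ppi_" + name] = [s * w for s, w in zip(centred, weights)]
    return regressors


# Contrast over the four PPI terms:
# (low + sound minus low alone) minus (high + sound minus high alone).
PPI_CONTRAST = {"ppi_low_sound": 1, "ppi_low_alone": -1,
                "ppi_high_sound": -1, "ppi_high_alone": 1}

# Toy example: 4 volumes, one condition active per volume.
seed = [1.0, 3.0, 2.0, 2.0]
conditions = {"low_sound": [1, 0, 0, 0], "low_alone": [0, 1, 0, 0],
              "high_sound": [0, 0, 1, 0], "high_alone": [0, 0, 0, 1]}
regressors = build_ppi_regressors(seed, conditions)
```

In a full model these five columns would sit alongside the seven task regressors, so that the PPI terms capture condition-dependent covariation with the seed over and above the task effects themselves.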
We then took the individually-computed PPI results to the second random-effects, group level (see also Noesselt et al., 2007) to assess common PPI effects emerging from individual seeds within MGB or LGB, again using within-subject ANOVAs with 4 conditions (connection strength for higher- or lower-intensity stimuli with versus without sounds), at the random effects level.
Finally, multiple regressions were used to assess any relations between the subject-by-subject size of the critical behavioral interaction effects, in relation to BOLD activations; or separately in relation to the strength of inter-regional effective-connectivity (individually-seeded PPI) effects in the fMRI data. Here, one regressor was defined for the main effect of fMRI activation (or coupling strength) for higher- or lower-intensity targets with vs. without sounds (interaction term); while another regressor was defined for the subject-specific behavioral interaction effect. Any brain-behavior relations were computed for the whole brain with SPM, but we focused on significant regression coefficients within regions that had been independently defined by the sensory-specific blocked localizers and their conjunction, to avoid the potential problem of ‘double dipping’ (Kriegeskorte et al., 2009). We also tested whether outliers potentially influenced the regression analysis (see Vul et al., 2009; Nichols and Poline, 2009).
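As an illustration of the logic of such a brain-behavior regression at a single region, the Python sketch below regresses a per-subject fMRI interaction score on the per-subject behavioral interaction score and then re-estimates the slope with each subject left out in turn. The leave-one-out check is an assumption of this sketch, one simple way of probing outlier influence, and is not necessarily the procedure used in the study; the values are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def leave_one_out_slopes(x, y):
    """Slope re-estimated with each subject removed in turn."""
    return [ols_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]


# Hypothetical per-subject values (not data from the study).
behavior = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]  # behavioral d' interaction
bold = [0.2, 0.5, 0.3, 0.9, 0.8, 1.1]      # fMRI interaction at one region
slope = ols_slope(behavior, bold)
loo = leave_one_out_slopes(behavior, bold)
sign_is_stable = all(s > 0 for s in loo) or all(s < 0 for s in loo)
```

If the sign or size of the slope changed markedly when a single subject was dropped, that subject would be a candidate outlier driving the relation.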
Behaviorally, we tested whether a co-occurring sound could enhance visual target detection sensitivity (d’), even though the presence of a sound carried no information about whether a visual target was present or absent (since sounds were equally likely on target and non-target trials). Given some prior multisensory research (e.g. McDonald et al., 2000; Vroomen and de Gelder, 2000; Frassinetti et al., 2002; Noesselt et al., 2008), and past proposals associated with the possible function of the putative principle of inverse-effectiveness (e.g. Stein et al., 1988; Stein and Meredith, 1993; Kayser et al., 2008), we predicted that co-occurrence of a sound should benefit detection sensitivity (d-prime, d’) for the lower-intensity but not higher-intensity visual targets, as compared with their respective no-sound conditions.
Fig.1e plots the critical visual sensitivity scores, in formal signal-detection terms (i.e., d’ scores). As predicted, co-occurrence of a sound enhanced visual detection sensitivity, but only for lower-intensity not higher-intensity visual targets. This led to a significant interaction between visual intensity-level and sound presence (F(1,25) = 8.97; p=0.006) in a repeated-measures ANOVA. Visual detection sensitivity (d’) was affected by sound-presence only for the lower-intensity visual targets (t(25) = 5.49; p<0.001).
Fig. 1d shows a comparable outcome for raw accuracy data, rather than signal detection d’. Supplementary Fig. S1a also shows the accuracy data when separated for subjects tested inside or outside the scanner, who did not differ (as further confirmed by a mixed-effects ANOVA that found no impact of the between-subject inside/outside scanner factor, p>.1; and likewise for all our other behavioral measures). As with visual d’, accuracy in the visual detection task revealed that the co-occurrence of a sound enhanced detection for the lower-intensity but not for the higher-intensity visual targets (even though the latter were not completely at ceiling). This led once again to a significant interaction between visual intensity-level and sound presence (F(1,25) = 20.79; p<0.001) in a repeated-measures ANOVA. Accuracy increased only when a sound was paired with a lower-intensity visual target (t(25)=7.16, p<.001).
For completeness, we also analyzed criterion and reaction time measures (see supplementary material, Fig. S1b-c). Interestingly these showed a very different pattern to our more critical measures of sensitivity (d’) and hit-rate. Co-occurrence of a sound merely speeded responses overall (see supplementary Fig. S1b) regardless of visual intensity and even of the presence/absence of a visual target (see supplementary Fig. S1b legend). Thus RTs were simply faster in the presence of a potentially alerting sound, regardless of visual condition. Hence this particular RT result need not, strictly speaking, be considered multisensory in nature, as only the auditory factor influenced the RT pattern. In terms of possible fMRI analogues, the RT pattern would therefore correspond simply to the main effect of sound presence (which we found, as reported below, to activate auditory cortex, STS, and MGB, as would be expected). Hence we did not explore the RT effect any further. Nevertheless we note that the overall speeding due to a sound is broadly consistent with a wide literature showing that a range of visual tasks, both manual and saccadic, can be speeded by sound occurrence (e.g. Hughes et al., 1994; Doyle and Snowden, 2001). In the past such overall speeding by a sound has often been discussed in the context of possible non-specific alerting effects (Posner, 1978). Some behavioral studies have found more complex RT-patterns when varying both visual and auditory stimulus intensities (Marks et al., 1986; Marks, 1987), but here with a constant (relatively high) auditory intensity, we found only overall speeding of manual RTs in the visual task, regardless of visual condition (see supplementary Fig. S1b).
Turning to the final behavioral measure of criterion, we found that participants adopted a higher criterion for reporting low-intensity target-presence (see supplementary Fig. S1c). But since within the signal-detection framework criterion is strictly independent of sensitivity, d’, this criterion effect cannot contaminate our critical d’ results. Moreover the criterion effect as a function of visual intensity applied here regardless of auditory condition (see supplementary Fig. S1c), so unlike the d’ and accuracy results need not be interpreted as reflecting any multisensory phenomenon.
Thus, only the critical behavioral measures of d’ and accuracy showed differential multisensory effects (i.e. that depended on both auditory and visual conditions), with co-occurrence of a sound genuinely enhancing perceptual sensitivity (d’) and accuracy for lower-intensity but not higher-intensity visual targets. This pattern of multisensory outcome for detection sensitivity and accuracy appears compatible with the idea, long associated with the putative PoIE for multisensory integration, that co-occurrence of events in multiple modalities might particularly benefit near-threshold detection (as for the lower-intensity, but not higher-intensity, visual targets here). Our analyses of fMRI data below test for neural consequences of the co-occurring sounds, for visual targets of lower- versus higher-intensity.
We used separate passive blocked fMRI localizers to pre-determine potential candidate ‘sensory-specific’ brain regions (responding to our high-intensity visual stimuli more than our auditory, or vice-versa); and for determining potential candidate ‘heteromodal’ regions (those areas responding significantly to both our auditory and our high-intensity visual stimuli, on a conjunction test). As expected, passive viewing of our high-intensity visual gratings activated left occipital cortex contralateral to the (right) visually stimulated hemifield, plus the contralateral LGB (see supplementary table S1a); with bilateral parietal, frontal and temporal regions also activated. Activation caused by passive listening to our auditory tones arose in bilateral MGB, plus middle temporal cortical areas including the planum temporale, Heschl’s gyrus, planum polare, and extending ventrally into medial STS (see supplementary table S1b). Finally, candidate heteromodal regions that responded significantly both to visual stimuli and also to sounds included posterior STS, plus parietal and dorsolateral prefrontal regions, all in accord with previous studies (e.g. Beauchamp et al., 2004; Noesselt et al., 2007; see supplementary table S1c).
Below we present results from our main event-related fMRI experiment divided into three sections. First we present results from a conventional group voxel-wise SPM analysis of BOLD activations, supplemented by results from individually defined ROIs. Second we present results from inter-regional effective-connectivity analyses of functional coupling between brain areas as a function of experimental condition. Third we present brain-behaviour regression analyses testing whether local BOLD signals in the implicated areas, or inter-regional coupling strength, co-vary with subject-by-subject psychophysical benefits specific to combining a sound with a lower-intensity visual target.
For the event-related results from the main fMRI experiment, the most important contrast concerns a greater enhancing impact of the sound on lower-intensity than higher-intensity visual targets, analogous to the behavioral effect on visual detection d’ and accuracy. The critical interaction contrast is as follows:
(Lower Intensity Light with Sound minus Lower Intensity Light alone) >
(Higher Intensity Light with Sound minus Higher Intensity Light alone)
This two-way interaction-contrast subtracts out any trivial effects due to visual-intensity per se, due to visual-frame presentation that signalled when a response was required on every trial, or due to sound-presence per se. See supplementary table S2 for details of the outcome for the visual-intensity or sound-presence contrasts, which all turned out as expected, i.e. higher activation of visual cortex due to increased visual intensity (main effect of high intensity > low intensity; supplementary table S2a); and of auditory cortex due to sound presence (main effect of sound > no sound conditions; see supplementary table S2b).
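The cancellation argument can be checked numerically with a small Python sketch. The condition order, the contrast vector laid out over six conditions, and the response values are all hypothetical and chosen for illustration; the point is only that purely additive frame, intensity and sound effects yield an interaction of zero, while an extra response specific to the low-intensity-plus-sound condition does not.

```python
# Assumed condition order (hypothetical; any fixed order works).
CONDITIONS = ["none_alone", "none_sound", "low_alone",
              "low_sound", "high_alone", "high_sound"]

# (low + sound minus low alone) minus (high + sound minus high alone).
INTERACTION = [0, 0, -1, 1, 1, -1]


def apply_contrast(weights, responses):
    """Weighted sum of per-condition responses."""
    return sum(w * r for w, r in zip(weights, responses))


def additive_responses(frame, intensity_gain, sound_gain):
    """Responses built only from additive frame, intensity and sound effects."""
    visual = {"none": 0.0, "low": 1.0 * intensity_gain, "high": 2.0 * intensity_gain}
    out = []
    for name in CONDITIONS:
        level, sound = name.split("_")
        out.append(frame + visual[level] + (sound_gain if sound == "sound" else 0.0))
    return out


purely_additive = additive_responses(frame=0.5, intensity_gain=0.3, sound_gain=0.7)
with_boost = list(purely_additive)
with_boost[CONDITIONS.index("low_sound")] += 0.4  # extra response for low + sound only

additive_result = apply_contrast(INTERACTION, purely_additive)
boost_result = apply_contrast(INTERACTION, with_boost)
```

Here additive_result is zero up to floating-point rounding, whereas boost_result recovers the extra 0.4 added to the low-intensity-plus-sound condition.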
The critical two-way interaction contrast (as per the formula above) is analogous to the critical behavioral interaction that affected accuracy and d’ (Fig. 1d and 1e), so may reveal the neural analogue of the sound-induced boosting of visual processing. We interrogated the visual, auditory, and candidate heteromodal audio-visual regions (via SPM inclusive masking) that had been defined independently of the main experiment by the separate localizers. We found the critical interaction to be significant not only in STS (Fig. 2 top right; and table 1c), a known multisensory brain region; but also in extrastriate visual regions contralateral to the visual target (see Fig. 2, top left; and table 1a), plus in posterior insula/Heschl’s Gyrus, i.e. likely to correspond with low-level auditory cortex (Fig. 2 top middle; see also table 1b). Note that while showing the critical interaction effect, the response patterns within visual and auditory cortex also show the overall modality-preferences one would expect for high-intensity visual or auditory stimuli, as confirmed also by the independent localizers. Table 1d lists further areas showing an interaction outside the visual, auditory and heteromodal regions of main interest, for completeness.
The plot for the group interaction contrast within voxel-wise normalized space in Fig. 2 (top middle plot) shows for insula/Heschl’s Gyrus not only the anticipated (PoIE-like) increase in response when the sound co-occurs with a low-intensity visual target, but also an apparent lack of auditory response when the same sound is paired with a high-intensity visual target (although please note that zero on the y-axis in the plots of Fig. 2 represents the session mean, rather than absolute zero). While in principle the latter unexpected outcome might potentially reflect sub-additive responses for high-intensity pairings (cf. Angelaki et al., 2009; Sadaghiani et al., 2009; Stevenson and James, 2009; Kayser et al., 2010), alternatively it might reflect the inevitable tendency for SPM interaction-contrasts to highlight the most significant voxels showing the strongest interaction pattern (so in the present context, not only an enhanced response to the sound when paired with a lower-intensity visual target; but also some reduction in this response when paired with a higher-intensity visual target, at the peak interaction voxel in SPM).
Accordingly we next extracted the beta weights from the main experiment for subject-specific, individually-defined A1 regions of interest (ROI), as derived from the independent localizer runs (see Fig. 3, bottom), thereby circumventing any selection bias. This individual ROI analysis confirmed the interaction pattern for A1, showing significantly increased BOLD signal for low-intensity visual targets when paired with sounds versus without sounds there (F=8.33, p<0.05 for the interaction; post-hoc t =2.3, p<0.05, for the pairwise contrast; see Fig. 3, bottom). In this more sensitive individual ROI analysis of A1, free from any voxelwise selection biases, the ROI results showed robust auditory responses from primary auditory cortex, even for the significantly reduced response found when paired with a higher-intensity visual stimulus (Fig. 3, bottom).
Importantly, the critical sound-induced enhancement of visual responses was also found subcortically in the LGB (main Fig. 2 bottom left; and table 1a) and in the MGB (Fig. 2 bottom right; and table 1b) in the group voxel-wise SPM analysis. Thus, in addition to multisensory STS, not only did visual fusiform and auditory A1 / Heschl’s gyrus show the critical interaction pattern cortically, but so did subcortical thalamic stages of the visual and auditory pathways.
As shown in the plots of Fig. 2 from the group voxel-wise SPM analysis, for all of the affected areas (i.e. STS, visual cortex, auditory cortex, LGB and MGB) the co-occurrence of a sound enhanced the BOLD response for the lower-intensity visual-target condition more than for the higher-intensity visual target condition. In principle, one must consider whether the latter outcome could reflect some ‘ceiling’ effect for the BOLD signal in the high-intensity condition. However, the bar graphs in Figure 2 show that the BOLD-responses in the affected regions (with the exception of the fusiform gyrus) were typically higher for low-intensity stimuli paired with sounds, than for any of the high-intensity conditions, which thus argues against any ‘ceiling’ concerns in terms of BOLD level per se. Moreover, even our higher-intensity visual stimuli had modest absolute intensities. Other work (e.g. Buracas et al., 2005) suggests that visual BOLD signals typically saturate only for much higher luminance levels than those used here. Nonetheless, when stimuli were presented without sounds we found the expected pattern of enhanced BOLD-responses for higher- vs. lower-intensity visual stimuli in fusiform, LGB, plus further visual regions (see Table S2).
One aspect of the specific pattern of BOLD responses in group-normalized LGB (bottom left of main Fig. 2) may appear somewhat counterintuitive, with an apparent decrease for high-intensity visual stimuli when paired with sounds (as had been found in group-results for the interaction pattern in A1 above).
To address this point and also provide further validation of the novel results at the thalamic level, we again supplemented the group voxel-wise analyses by identifying visual and auditory thalamic body ROIs in each individual subject (supplementary Fig. S2; see above for the rationale of using complementary ROI-analyses). We then assessed the experimental effects in these individual thalamic ROIs (i.e. which could now correspond to somewhat different voxels in different subjects, but for the same defined area). Since these ROIs were defined independently of the main experiment, via the separate localizers, this again circumvents any selection bias for the interaction contrast. This individual approach corroborated our group voxel-wise SPM results, while also removing the one unexpected outcome (for the LGB) with this more unbiased ROI approach. Again, we found enhanced BOLD signals when a sound is added to a lower-intensity visual target; but no significant change in response when the same sound is added to a higher intensity visual target (see Fig. 3, top left, for LGB ROI results from all subjects; see also supplementary Fig. S5, its upper plot, for a confirmatory analysis on the subset of subjects who showed the most unequivocal LGB and MGB localization).
The pattern of activity for the fusiform interaction peak in the voxel-wise SPM analysis (top left of main Fig 2) showed one unexpected trend, namely a tendency for lower activation in the absence of a sound for a low-intensity visual target versus none. But this trend was nonsignificant (p>.1) so need not be considered further. In any case it may again simply reflect a selection bias for peaks of SPM interaction contrasts to highlight voxels that show apparent ‘crossover’ patterns, as explained above for other regions.
To summarize the BOLD activation results so far (Figs 2 and 3), predefined (inclusively masked) heteromodal cortex in STS showed a pattern of enhanced BOLD signal by co-occurring sounds only for lower-intensity but not higher-intensity visual targets (thus analogous to the impact on visual detection d’ found behaviourally, cf. Fig 1e). Similar patterns were found in sensory-specific visual cortex, in sensory-specific insula/Heschl’s gyrus, and even in sensory-specific thalamus (LGB and MGB). Further analysis of individually-defined ROIs confirmed the interaction pattern in primary auditory cortex, MGB, and LGB (see Figure 3, plus supplementary Figure S5). These ROI analyses also confirmed that the few unexpected aspects of the interaction results from group-normalized space (i.e. apparent crossover interaction pattern for LGB; apparent loss of auditory response in presence of higher-intensity visual targets for insula/Heschl’s gyrus) were no longer evident for the more sensitive individual analyses of independently-defined ROIs. Those unexpected aspects of the group-normalized results should thus be treated with caution. By contrast all of our critical activations were found robustly in the individual ROIs, as well as for the voxel-wise group-normalized analysis.
While the observed pattern for MGB and A1 ROIs (Fig 3, top right and bottom) was not identical, both showed the critical interaction, with strongest responses when the sound was paired with a low-intensity visual target. It appears that the MGB ROI tended to be somewhat more responsive to high-intensity visual targets in the absence of sound (albeit only as a nonsignificant trend) than for A1. This may reflect the fact that some subnuclei within the MGB receive visual inputs (Linke et al., 2000), given that the BOLD signal will be aggregated across different subnuclei; and/or it could reflect possible feedback signals from heteromodal STS.
Mechanistically, on the level of neuronal firing rates audio-visual integration may be rather complex, as different frequency bands of neural response can be differentially modulated by audiovisual stimuli (e.g. in the STS; see Chandrasekaran and Ghazanfar, 2009). More generally, one mechanism potentially underlying multisensory integration in the time-frequency domain was proposed by Schroeder/Lakatos and colleagues in recent influential work (e.g. Lakatos et al., 2007; Lakatos et al., 2008; Schroeder et al., 2008; Schroeder and Lakatos, 2009) that primarily concerned tactile-auditory situations, rather than audio-visual as here. Tactile stimulation can phase-reset neural signalling in auditory cortex, thereby enhancing response to synchronous auditory inputs. Moreover, some overlapping audiotactile representations in the thalamus (Cappe et al., 2009) have now been reported, as have learning-induced plastic changes of auditory and tactile processing due to (musical) training (Schulz et al., 2003; Musacchia et al., 2007). Related effects might conceivably impact on an audio-visual situation like our own, but potential phase-resetting would seem to require visual signals to precede auditory signals sufficiently to overcome the different transduction times (Musacchia et al., 2006; Schroeder et al., 2008; Schroeder and Lakatos, 2009). This seems somewhat unlikely for the present concurrent audio-visual pairings. To our knowledge, the earliest impact of concurrent visual stimuli on auditory ERP components has been found to emerge at ~50 ms poststimulus, well beyond the initial phase of auditory processing (Giard and Peronnet, 1999; Molholm et al., 2004). We note that Lakatos et al. (2009) report phase-resets in macaque auditory cortex due to visual stimulation only after the initial activation.
The modulations we observe in visual cortex (and LGB) might in principle reflect phase-resetting there (cf. Lakatos et al., 2008 for attention-related phase-resetting of visual cortex), and/or involve projections from auditory or multisensory cortex, that serve to increase the signal-to-noise ratio for the trials pairing a concurrent sound with low-intensity visual targets. In accord, Romei and colleagues recently reported an enhancement of TMS-induced ‘phosphene perception’ when sounds were combined with near-threshold TMS over visual cortex (Romei et al., 2007). This phase-resetting could potentially be the underlying mechanism of our regional fMRI-effects and may reflect the functional coupling of distant brain regions.
To assess functional coupling between brain regions, we next tested for potential condition-dependent changes in ‘effective connectivity’ between areas (i.e. inter-regional coupling), for the affected thalamic bodies with cortical sensory-specific and heteromodal structures. Note that possible changes in inter-regional coupling are logically distinct from effects on local BOLD activations as described above, so can produce a different outcome. We tested for inter-regional coupling using the relatively assumption-free ‘psychophysiological interaction’ (PPI) approach (Friston et al., 1997). We seeded the PPI analyses in (individually defined) left LGB or left MGB, and tested for enhanced ‘coupling’ with other regions, that arose specifically in the context of a lower- rather than higher-intensity visual target being paired with a sound (i.e. analogous interaction pattern to that found for behavioral sensitivity, d’; and for local BOLD activations above; but now testing for analogously condition-dependent changes in the strength of functional inter-regional coupling, rather than for local activations as in the preceding fMRI results section).
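For readers unfamiliar with PPI, the core idea can be sketched as follows. This is a deliberately simplified illustration under our own assumptions, not the SPM implementation used here: it ignores hemodynamic modelling, nuisance regressors, and the full four-condition interaction contrast, and reduces the psychological factor to two contexts coded +1 and -1. All function names are hypothetical.

```python
from statistics import mean


def center(xs):
    """Subtract the mean from every sample."""
    m = mean(xs)
    return [x - m for x in xs]


def ppi_regressor(seed, psych):
    """Element-wise product of the mean-centered seed time course and
    the psychological context vector (+1 / -1 per scan in this sketch)."""
    return [s * p for s, p in zip(center(seed), psych)]


def slope(x, y):
    """Least-squares slope of y on x."""
    xc, yc = center(x), center(y)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)


def coupling_difference(seed, target, psych):
    """Seed-to-target slope in context +1 minus the slope in context -1.

    A positive value is what a positive PPI effect expresses in this
    two-context simplification.
    """
    pos = [(s, t) for s, t, p in zip(seed, target, psych) if p > 0]
    neg = [(s, t) for s, t, p in zip(seed, target, psych) if p < 0]
    return slope(*zip(*pos)) - slope(*zip(*neg))
```

In a full analysis the interaction regressor would enter a general linear model alongside the seed time course and the psychological regressor, so that the PPI effect reflects context-dependent coupling over and above either main effect.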
We found such enhanced coupling for the critical interaction effect (see Fig. 4 and table 2.1), between left LGB with ipsilateral occipital areas including primary visual cortex (consistent with the visual nature of the LGB, and thus providing further confirmation of that functional localization). Analogously, we also found such condition-dependent enhanced coupling of left MGB with ipsilateral Heschl’s gyrus (consistent with the auditory nature of MGB, see Fig. 4 and Table 2.2). Beyond these sensory-specific coupling results for LGB or MGB seeds, we also found enhanced coupling of both MGB and LGB with STS and putative MT+ (Campana et al., 2006; Eckert et al., 2008; see Fig. 4 and table 2.3). This enhanced coupling (i.e. higher covariation in residuals) with STS and putative MT+ was again specific to the context of a sound being paired with a lower-intensity (rather than higher-intensity) visual target, i.e. to the very condition that had led to the critical behavioral enhancements of d’ and accuracy.
These new effects for MGB are in line with anatomical and electrophysiological studies reporting that subnuclei within the MGB receive some visual inputs (Linke et al., 2000) and can respond to visual stimulation (Wepsic, 1966; Benedek et al., 1997; Komura et al., 2005); plus demonstrations that the MGB is connected with STS (Burton and Jones, 1976; Yeterian and Pandya, 1989). To our knowledge, no direct connections of auditory regions nor of STS with LGB have been reported to date, though there is some evidence for direct connections of LGB with extrastriate regions (Yukie and Iwai, 1981). Alternatively, the observed modulations in LGB and its condition-dependent coupling with other areas might in principle potentially involve early visual cortex which is anatomically linked with posterior STS (Falchier et al., 2002; Ghazanfar et al., 2005; Kayser and Logothetis, 2009) and reciprocally connected with LGB.
Thus far, we have shown that: (i) co-occurrence of a sound significantly enhances perceptual sensitivity (d’) and detection accuracy for a lower-intensity but not a higher-intensity visual target, in apparent accord with the principle of inverse effectiveness; (ii) that a related interaction pattern is observed for BOLD activations in STS, visual cortex (plus LGB), and auditory cortex (plus MGB); (iii) we also find a logically analogous interaction pattern for inter-regional coupling. Specifically on this latter point we found enhanced coupling of the two thalamic sites (LGB or MGB) with their respective sensory-specific cortices, and also between both of these thalamic sites and STS (plus lateral occipital cortex possibly corresponding to MT+), for the particular context that led to enhanced behavioral sensitivity, i.e. with this effective connectivity being most pronounced when a lower-intensity visual target is paired with a sound.
To test for an even closer link between brain activity and behavior, we next assessed whether our independently localized brain regions (i.e. the visually-responsive, auditorily-responsive, and candidate heteromodal areas identified by the separate blocked localizers) showed BOLD signals for which the critical interaction pattern correlated with subject-by-subject behavioral benefits for the impact of adding sound to a lower- rather than higher-intensity visual target. We first tested for subject-by-subject brain-behaviour relations for the regional BOLD activations (i.e. for the basic contrasts of conditions). We regressed the subject-specific BOLD-interaction differences against the analogous behavioural difference. This revealed significant subject-by-subject brain-behavior regression coefficients in left visual cortex, contralateral to the visual target (Fig. 5a and table 3a); plus left auditory cortex (Fig. 5b and table 3b) and heteromodal STS (Fig. 5c and table 3c). See also supplementary Figure S4 for a confirmatory brain-behavior regression with behavioral outliers removed from the analysis.
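The form of such a subject-by-subject regression can be sketched as below, purely as an illustration and assuming one BOLD interaction score and one behavioral interaction score per subject. The function name and argument order are hypothetical, and the voxel-wise regression reported here was run within SPM rather than with code like this.

```python
from math import sqrt
from statistics import mean


def brain_behavior_regression(behavior, bold):
    """Simple linear regression of per-subject BOLD interaction scores
    on per-subject behavioral interaction scores.

    Returns (slope, intercept, pearson_r).
    """
    mx, my = mean(behavior), mean(bold)
    sxx = sum((x - mx) ** 2 for x in behavior)
    syy = sum((y - my) ** 2 for y in bold)
    sxy = sum((x - mx) * (y - my) for x, y in zip(behavior, bold))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy / sqrt(sxx * syy)
```

A positive slope means that subjects with a larger sound-induced behavioral benefit for lower-intensity targets also show a larger corresponding BOLD interaction.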
Next we tested whether changes in the inter-regional ‘coupling’ of these thalamic structures with cortical areas (analogous to Fig. 4) might relate to the subject-by-subject behavioral interaction outcome. When weighting the PPI analyses (seeded either in left LGB or left MGB, as individually defined) by the parametric, subject-by-subject size of the critical behavioral interaction, we found that both the LGB and the MGB independently showed stronger enhancement of coupling with bilateral STS (see Fig. 5d and table 4) in the specific context of the sound-plus-lower-intensity-light condition, in relation to the impact on performance. The outcome of this weighted PPI analysis reveals that these thalamic-cortical neural coupling effects (for LGB-STS, and also separately replicated for MGB-STS) have some parametric relation to the corresponding behavioral effect in psychophysics.
When taken together the different aspects of our fMRI results clearly identify a functional corticothalamic network of visual, auditory, and multisensory regions. These regions are activated more strongly, and become more functionally integrated as shown by the inter-regional coupling data, when a task-irrelevant sound co-occurs with a lower-intensity visual target; the very same condition that led behaviorally to enhanced d’ and hit-rate in the visual detection task. This link is further strengthened by the brain-behavior relations we observed.
Behaviorally we found that co-occurrence of a sound increased accuracy and enhanced sensitivity (d’) for detection of lower-intensity but not higher-intensity visual targets. This psychophysical outcome provides some new evidence apparently consistent with ‘inverse effectiveness’ proposals for multisensory integration (Meredith and Stein, 1983; Stein et al., 1988; Stein and Meredith, 1993; Calvert and Thesen, 2004). While other recent studies have shown some auditory influences on visual performance or d’ (e.g. McDonald et al., 2000; Frassinetti et al., 2002; Noesselt et al., 2008), following on from pioneering audio-visual studies that did not study sensitivity per se (e.g. Marks et al., 1986; Marks et al., 2003), here we specifically showed that only lower-intensity targets benefited in visual-detection d’ from co-occurring sounds, while higher-intensity did not, thereby setting the stage for our fMRI study.
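For reference, the standard signal-detection definition of sensitivity is d’ = z(hit rate) - z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution. The sketch below illustrates this definition and the behavioral interaction score discussed above. It is a minimal example under our own naming assumptions, and it does not handle hit or false-alarm rates of exactly 0 or 1, which require a correction before the z-transform.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Detection sensitivity: z(hit rate) minus z(false-alarm rate).

    Rates must lie strictly between 0 and 1; inv_cdf raises an error otherwise.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def sound_benefit(dp_low_sound, dp_low_alone, dp_high_sound, dp_high_alone):
    """Behavioral interaction: sound-induced change in d' for lower-intensity
    targets minus the corresponding change for higher-intensity targets."""
    return (dp_low_sound - dp_low_alone) - (dp_high_sound - dp_high_alone)
```

A positive `sound_benefit` corresponds to the pattern reported here, in which the sound improves d’ for lower-intensity but not higher-intensity targets.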
Our fMRI data provide evidence that both sensory-specific and heteromodal brain regions (as defined by independent fMRI ‘localizers’) showed the same critical interaction in the main event-related fMRI experiment. Thus, STS, visual cortex (and LGB), auditory cortex (and MGB), all showed enhanced BOLD responses when a sound was added to a lower-intensity visual target; but not when the same sound was added to a higher-intensity visual target instead (for which any trends were if anything suppressive instead). The present findings of audio-visual multisensory effects that influence not only heteromodal STS but also auditory and visual cortex accord well with several other recent studies (e.g. Ghazanfar and Schroeder, 2006; Kayser and Logothetis, 2007; Kayser et al., 2007; Noesselt et al., 2007; Watkins et al., 2007). Moreover, recent human fMRI studies and recordings in macaque STS and low-level auditory cortex also suggest that the multisensory principle of inverse effectiveness may apply there for some audio-visual situations (e.g. Ghazanfar et al., 2008; Stevenson and James, 2009). Here we extend the possible remit of this principle to implicate sensory-specific visual and auditory thalamus (LGB and MGB), in multisensory effects that can evidently relate to the conditions determining subject-specific psychophysical detection sensitivity.
This expands the recently uncovered principle that crossmodal interplay can affect sensory-specific cortices (e.g. Calvert et al., 2000; Macaluso et al., 2000; McDonald et al., 2003; Macaluso and Driver, 2005; Wang et al., 2008; Driver and Noesselt, 2008; Fuhrmann Alpert et al., 2008), to encompass thalamic levels of sensory-specific pathways also (see also Musacchia et al., 2006, albeit while noting that the V-brainstem potentials they measured in a speech task cannot be unequivocally attributed to specific thalamic structures; plus Baier et al. (2006) for some non-specific thalamic modulations). Here we find that audio-visual interplay can be observed even at the level of sensory-specific thalamic LGB and MGB. This might potentially arise due to feedback influences (see below), and/or through feedforward thalamic interactions that may guide some of the earlier audiovisual interaction effects (cf. Giard and Peronnet, 1999; Molholm et al., 2002; Brosch et al., 2005). While subcortically we did not find any significant impact upon the human superior colliculus, this might reflect some of the known fMRI limitations for that particular structure (Sylvester et al., 2007).
We also performed analyses of “effective connectivity”, i.e. inter-regional functional coupling (as distinct from local regional activation), for the fMRI data. We found that LGB and MGB showed stronger inter-regional coupling (residual covariation) with their associated sensory-specific cortices (visual or auditory, respectively), for the particular context of a sound paired with a low-intensity visual target. This enhanced coupling between visual or auditory thalamus and their respective sensory cortices further confirms our ability to separate and dissociate those visual and auditory thalamic structures (as also indicated by the blocked localizers, and by our individual analyses). This aspect of the effective-connectivity pattern goes beyond other recent results showing crossmodal influences that involve sensory-specific auditory cortex (e.g. Brosch et al., 2005; Ghazanfar et al., 2005; Noesselt et al., 2007; Ghazanfar et al., 2008; Kayser et al., 2008) or visual cortex (e.g. Noesselt et al., 2007). A further notable finding from our coupling results was that the separate analyses seeded in either individually-defined LGB or MGB both independently revealed enhanced coupling with the heteromodal STS for the same particular context (i.e. stronger coupling when a sound was paired with a lower-intensity visual target, and thus when sensory detection was enhanced). The observed effective-connectivity patterns might potentially serve to enhance the unisensory features of a bound multisensory object, enhancing an otherwise weak representation in one modality by means of the co-occurrence with a bound strong event in another modality, consistent with the impact on unimodal visual detection sensitivity here from the co-occurring sound.
To demonstrate an even closer link between the audio-visual effect found psychophysically (i.e. enhanced detection for lower-intensity visual targets when paired with a sound) and the fMRI data, we tested for regions showing subject-by-subject brain-behaviour relations. Within our independently defined visually-selective, auditorily-selective, or heteromodal areas, brain-behaviour relations for BOLD activation were found for STS, auditory cortex, and for visual cortex contralateral to the targets. Moreover, the functional coupling of LGB and MGB with heteromodal STS varied parametrically with the subject-by-subject size of the critical crossmodal behavioral pattern (Fig. 5d).
The visual cortical effects here were consistently contralateral to the low-intensity visual target, and consistently highlighted the fusiform gyrus. Additionally, our connectivity analyses revealed enhanced connectivity of LGB with V1 and a lateral occipital region (Fig. 4, bottom left panel) whose MNI coordinates correspond reasonably well with putative V5/MT+, as described in many previous purely visual studies (Campana et al., 2006). Although our main conclusions do not specifically depend on identifying this region as true MT+, that appears consistent with reports that MT+ may be particularly involved in detection of low-contrast visual stimuli (Tootell et al., 1995), as for the lower-intensity visual targets here. It might also relate to reports that MT+ may show some auditory modulations of its visual response (Calvert et al., 1999; Amedi et al., 2005; Beauchamp, 2005; Ben-Shachar et al., 2007; Eckert et al., 2008).
Our crossmodal findings arose even though only the visual modality had to be judged here (cf. McDonald et al., 2000; Busse et al., 2005; Stormer et al., 2009). Future extensions of our paradigm could test whether these crossmodal effects are modulated when attention to modality is varied (cf. Busse et al., 2005). The possible impact of attentional load could also be of interest (cf. Lavie, 2005).
The new effects we found for human sensory-specific visual and auditory thalamus highlight the importance of thalamo-cortical connectivity. A role for non-specific thalamo-cortical loops in multisensory integration had been hypothesized recently, with some evidence from animal studies (Fu et al., 2004; Hackett et al., 2007; Lakatos et al., 2007; Reches and Gutfreund, 2009) or human fMRI (Baier et al., 2006). Some early interactions have recently been uncovered for rat MGB with audiovisual stimuli (Komura et al., 2005), as have effects for the tectothalamic pathway in owls (Reches and Gutfreund, 2009); while other physiological evidence in rodents points at the potentially multisensory nature of subnuclei within the MGB (Wepsic, 1966; Bordi and LeDoux, 1994; Benedek et al., 1997; see Budinger et al., 2006, 2008, for reviews). At a very different level to human fMRI data, one possible mechanism for audiovisual interplay between thalamic and cortical structures might involve calbindin-positive neurons within the thalamus, which could directly link thalamic with cortical structures (Jones, 2001; Lakatos et al., 2007). Alternatively (or in addition), the present thalamic modulations could potentially reflect feedback mechanisms from cortical upon thalamic regions (O’Connor et al., 2002), to subserve the audio-visual interactions found. Feedback signals may selectively enhance visual stimulus representations even within LGB (O’Connor et al., 2002). Future invasive animal studies are needed to shed light on the temporal dynamics of multisensory interplay within this thalamo-cortical network, while electrophysiological measures in humans might also prove useful (see Musacchia et al., 2006). But the present data already indicate a link between multisensory behavioral benefits in visual detection, found specifically for lower-intensity visual targets paired with a sound, and modulation of sensory-specific auditory and visual thalamus.
Moreover, we found enhanced inter-regional coupling of LGB and MGB with cortical STS, which scaled with behavioral performance, emphasizing the close interplay of subcortical and cortical processing in multisensory integration (Jiang et al., 2007) and the relevance of this for behavioral impacts upon perceptual sensitivity.
In conclusion, we uncovered a network of sensory-specific and multisensory regions in the human brain in relation to auditory-visual interactions. We were able to link this network closely to the psychophysical phenomenon of enhanced detection sensitivity for lower- but not higher-intensity visual targets when co-occurring with a sound. Visual and auditory thalamic nuclei were more activated when a sound co-occurred with a lower-intensity (but not a higher-intensity) visual target, and also coupled more strongly with the affected cortical regions in this context. Together these results implicate thalamo-cortical interplay in enhanced perceptual sensitivity (d’) due to multisensory interactions. They demonstrate in humans that crossmodal influences upon sensory-specific brain structures can extend even to sensory-specific visual and auditory thalamus, with these influences relating to psychophysical performance.
This work was funded by DFG-SFB-TR31/TPA8 and TP11, plus by the Medical Research Council (UK), the Wellcome Trust, and a Royal Society Anniversary Research Professorship to JD.