Conceived and designed the experiments: JA JWB SH. Performed the experiments: JA SH CT. Analyzed the data: JA SH CT. Contributed reagents/materials/analysis tools: JWB. Wrote the paper: JA SH.
In everyday life, we need a capacity to flexibly shift attention between alternative sound sources. However, relatively little work has been done to elucidate the mechanisms of attention shifting in the auditory domain. Here, we used a mixed event-related/sparse-sampling fMRI approach to investigate this essential cognitive function. In each 10-sec trial, subjects were instructed to wait for an auditory “cue” signaling the location where a subsequent “target” sound was likely to be presented. The target was occasionally replaced by an unexpected “novel” sound in the uncued ear, to trigger involuntary attention shifting. To maximize the attention effects, cues, targets, and novels were embedded within dichotic 800-Hz vs. 1500-Hz pure-tone “standard” trains. The sound of clustered fMRI acquisition (starting at t=7.82 sec) served as a controlled trial-end signal. Our approach revealed notable activation differences between the conditions. Cued voluntary attention shifting activated the superior intraparietal sulcus (IPS), whereas novelty-triggered involuntary orienting activated the inferior IPS and certain subareas of the precuneus. Clearly more widespread activations were observed during voluntary than involuntary orienting in the premotor cortex, including the frontal eye fields. Moreover, we found evidence for a frontoinsular-cingular attentional control network, consisting of the anterior insula, inferior frontal cortex, and medial frontal cortices, which were activated during both target discrimination and voluntary attention shifting. Finally, novels and targets activated much wider areas of superior temporal auditory cortices than shifting cues.
The human brain can process only a limited amount of auditory information at a time. Attention shifting is constantly needed to allow redirecting of our focus to detect the most relevant sounds amongst noise. Such shifts can be triggered top-down, for example, to voluntarily shift the focus based on our goals and interests (an endogenous process), or bottom-up, when a potentially interesting unexpected sound involuntarily captures our attention (an exogenous process). The exact neuronal mechanisms controlling these two modes of auditory attention shifting are not, however, fully clear.
Previous neuroimaging studies in this field, which have mainly concentrated on the visuo-spatial domain, suggest that shifting of attention activates a network of brain areas including dorsolateral prefrontal cortex, premotor, medial frontal areas, and the posterior parietal cortex. On the basis of these studies, it has been proposed that separate dorsal (superior parietal lobule, SPL, intraparietal sulcus, IPS, frontal eye fields, FEF) and ventral (right temporal-parietal junction, ventral frontal cortex/anterior insula) attention systems underlie voluntary vs. involuntary attention shifting processes, respectively –. However, the distinction between dorsal and ventral attention systems is still under debate, as a number of visual – fMRI studies have failed to find fully segregated neural systems subserving endogenous and exogenous spatial orienting.
Despite the critical role that auditory information processing plays in human communication, a much smaller number of fMRI studies have been conducted to investigate voluntary attention shifting in the auditory, compared to the visual modality. The results obtained in different studies have not been fully consistent either. For example, a recent study  suggested that automatic orienting, compared to controlled orienting, is associated with greater activations in several frontal and parietal regions, while others ,  have reported increased activations in the posterior parietal cortex associated with top-down control of attention shifting. Inconsistencies like this are, obviously, in part related to differences in the experimental designs. At the same time, previous studies on auditory attention shifting have seldom controlled for potential biases caused by the acoustical scanner noise, which can mask the auditory stimuli and modulate the BOLD response in auditory  or even non-auditory cortices .
Resolving trade-offs related to acoustical scanner noise might be particularly essential for studies on involuntary attention shifting, an area of research that has been much more intensively investigated – than voluntary auditory orienting. Notably, this line of research has been, almost exclusively, based on methods such as MEG and EEG that are not biased by factors such as scanner noise. According to these studies, involuntary attention is triggered by an automatic change-detection process in superior temporal auditory cortices, as reflected by the mismatch negativity (MMN) response. This mismatch detection process is then followed by a sequence of brain events associated with attentional orienting and conscious detection of the sound change in extra-auditory association areas, which potentially involve dorsolateral prefrontal cortices –, anterior cingulate regions , and/or inferior frontal gyrus –. However, the relative contributions of auditory areas and other regions contributing to automatic change detection and involuntary orienting are not yet fully clear.
Another factor that has received relatively little attention in classic orienting studies is the role of the anterior insula in attention shifting. Accumulating evidence suggests that the anterior insula, one of the structures originally proposed to be associated with the ventral attention system , may actually play an important role in voluntary cognitive control – and perceptual decision making . A number of recent imaging studies on visual and auditory task switching, auditory working memory, and auditory attention have reported activations in the anterior insula –. The anterior insula has, consequently, been conjectured to contribute to the switching of attention  and to the related top-down interference resolution processes . It has also been recently suggested that the anterior insula constitutes a supramodal region that controls the orienting of attention . However, the exact role of this region in top-down aspects of auditory attention shifting needs further investigation.
In the present study, we investigated voluntary and involuntary attention shifting using a paradigm modified from classic visuospatial cued orienting , auditory-spatial selective attention , , and auditory involuntary attention-shifting ,  designs. Stimulus-driven orienting was triggered by unexpected novel sounds, a strategy that has been well documented to produce strong event-related potential responses and behavioral distraction effects related to involuntary attention shifting , . Biases related to acoustical scanner noises were controlled by using a mixed design, which combined event-related and sparse sampling approaches.
The present auditory task design (Figure 1) was modified from classic visual attention shifting  and auditory selective attention  paradigms (see Materials and Methods). During fMRI acquisition, subjects were instructed to detect a monaural harmonic target sound, which was embedded within trains of high- and low-pitch pure tones presented asynchronously to their left and right ears, respectively. Outliers were defined as responses faster or slower than the average reaction time within each run by more than two standard deviations, and they were counted as misses in the final behavioral data. One subject was excluded because of an inability to perform the task. In the final dataset (N=18, 11 females, age 19–28 years), the average hit rate was 90.2±7.9% and the reaction time was 495±48 ms. The mean±SD false alarm rate, as calculated from Cue+Standards and Cue+Novel+Standards trials, was 1.2±1.5%.
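The outlier rule described above can be illustrated with a minimal sketch. It assumes the rule is two-sided (the wording could also be read as applying only to slow responses) and that the standard deviation is the sample SD of the responded trials in a run. The function name `score_run` and the example values are hypothetical and are not taken from the study's data.

```python
from statistics import mean, stdev

def score_run(reaction_times_ms, n_targets, sd_limit=2.0):
    """Score one run: drop outlier responses and count them as misses.

    reaction_times_ms: reaction times of the targets that got a response.
    n_targets: number of targets presented in the run.
    sd_limit: exclusion limit in SD units (2.0 follows the text; the
              two-sided reading is an assumption of this sketch).
    Returns (hit_rate, kept_reaction_times).
    """
    rts = list(reaction_times_ms)
    if len(rts) < 2:
        # SD is undefined for fewer than two responses; keep what is there.
        kept = rts
    else:
        run_mean = mean(rts)
        run_sd = stdev(rts)  # sample SD within the run
        kept = [rt for rt in rts if abs(rt - run_mean) <= sd_limit * run_sd]
    # Excluded responses no longer count as hits, i.e. they become misses.
    hit_rate = len(kept) / n_targets
    return hit_rate, kept

# Hypothetical run: 10 targets, 8 responses, one very slow response.
hit_rate, kept = score_run([480, 500, 510, 490, 505, 495, 500, 1200], 10)
```

In this toy run the 1200 ms response lies more than two SDs from the run mean, so it is removed and the hit rate is computed from the remaining seven responses.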
To verify the beneficial effect of cues in directing attention to subsequent targets, we conducted a separate behavioral control analysis (N=10, 4 females, age 22–43 years). The result demonstrated that spatial cueing significantly (t(9)=−4.17, P<0.01) sped up target discrimination, as compared to “invalidly cued” trials where the target occurred in the ear opposite to the cue (mean±SD reaction times 463±68 vs. 555±105 ms to validly vs. invalidly cued targets, respectively). To make tentative inferences about cueing benefits during fMRI, the data from this behavioral group were also compared to the main fMRI group’s performance during the fMRI session. There were no significant differences in the reaction times to validly cued targets between the control and main experiments. The reaction times to the invalidly cued targets during the behavioral control experiment were, however, significantly longer (t(26)=2.24, P<0.05) than the reaction times during fMRI to validly cued targets, suggesting that subjects may have also been benefiting from the spatial cueing during the fMRI experiment.
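The degrees of freedom reported above are consistent with a paired t-test across the 10 control subjects (df = 10 − 1 = 9) and a pooled-variance two-sample t-test between the control and fMRI groups (df = 10 + 18 − 2 = 26). The text does not name the tests, so this reading is an assumption. The sketch below shows how the two statistics would be computed under it; the function names and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(a, b):
    """Paired t-test on per-subject values a[i] vs. b[i]. Returns (t, df)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

def pooled_t(x, y):
    """Two-sample t-test with pooled variance. Returns (t, df)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Toy values only, chosen to make the arithmetic easy to follow.
t_paired, df_paired = paired_t([2, 4], [1, 1])
t_groups, df_groups = pooled_t([1, 2, 3], [2, 3, 4])
```

With per-subject mean reaction times for validly and invalidly cued targets as `a` and `b`, `paired_t` would yield a negative t when valid cues speed up responses, matching the sign of the reported statistic.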
Figure 2 shows activations associated with the main contrasts, presumed to reflect cued attention shifting, novelty-triggered attention shifting, and target discrimination. The anatomical areas associated with these activations have been identified in Tables 1, 2, and 3 based on the parcellation included in the FreeSurfer package . Our approach revealed notable activation differences between cued attention shifting, novelty-triggered attention shifting, and target discrimination. The specific contrasts that were utilized to determine these effects are described below.
We first compared activations between the condition where the cue occurred in one of the ears (atop dichotic standard tones) but no target followed, and the condition consisting of standard tones only (Figure 2A). This contrast, presumably reflecting cued voluntary attention shifting, was associated with significantly (P<0.05, cluster threshold corrected for the family-wise error based on the theory of Gaussian random fields, GRF) increased activations in the bilateral precentral areas (including premotor cortex, PMC and FEF), anterior insula, medial superior frontal cortex (mSFC) including pre-SMA extending to paracingulate and anterior mid-cingulate cortex (aMCC), dorsal posterior cingulate (dPCC), posterior superior temporal gyrus (pSTG), planum temporale (PT), superior temporal sulcus (STS), angular gyrus (AG), and IPS. Lateralized activations were found in the right inferior parietal region (including supramarginal gyrus, SMG, and the sulcus intermedius primus of Jensen) and left cerebellum. Several subcortical structures including the thalamus, putamen, and caudate were also activated bilaterally.
In the second comparison, we contrasted the condition where an unexpected “novel” sound occurred opposite to the cued ear with the condition consisting of the cue and standard tones but no target (Figure 2B). This contrast, presumably reflecting novelty-triggered involuntary attention shifting, was associated with significant (P<0.05, cluster threshold corrected for the family-wise error based on the GRF theory) activations in several frontal and cingular cortex regions, including the right PMC/FEF, middle frontal cortex (MFC), and pars triangularis of IFC, as well as in the orbital regions, pregenual ACC, posterior MCC (pMCC), dPCC, and subparietal sulcus regions (i.e., parietal continuum of cingulate sulcus). Activations associated with novelty-triggered involuntary attention shifting were also found bilaterally in the posterior insula, the temporo-parietal junction (TPJ), in the SMG and the AG of inferior parietal regions, IPS, and precuneus. In the temporal lobe, activations associated with this contrast extended to primary (medial 2/3 of Heschl’s gyrus) and non-primary (anterior and posterior STG, PT, lateral 1/3 of Heschl’s gyrus) auditory cortex areas, as well as to the STS and middle and inferior temporal areas. Finally, in this contrast, we also observed activations in the visual cortex (left cuneus) and in several subcortical regions, including the bilateral thalamus and putamen, as well as in the right cerebellum.
Figure 2C shows data from the contrast that compared activations in the condition where the target occurred in the cued ear, to the condition consisting of the cue and standard tones only. In this contrast, presumably reflecting target discrimination, we observed significantly (P<0.05, cluster threshold corrected for the family-wise error based on the GRF theory) increased activations in several frontal, parietal, temporal, and occipital regions. Frontocingularly, activations were found in the bilateral superior frontal cortex, the dorsolateral prefrontal cortex (DLPFC), PMC, IFC, ACC (orbital, subgenual, pregenual), aMCC, pMCC, dPCC, and the pars marginalis. In and near the parietal cortex, the activations extended to TPJ, SPL, the SMG and AG of inferior parietal regions, IPS, the subparietal area, the precuneus, and to the parieto-occipital sulcus. Increased activations to target discrimination emerged also in visual cortices including the cuneus, calcarine sulcus, and lingual gyrus. In temporal areas, in addition to the primary and non-primary (anterior and posterior STG, PT) auditory cortex, target discrimination also activated STS, and the middle and inferior temporal areas. Activations lateralized only to one hemisphere were found in the left central sulcus, postcentral gyrus, and SMA of the mSFC (extending to the paracentral lobule), as well as in right FEF and pre-SMA of the mSFC. Finally, bilateral activations were observed in subcortical regions including the thalamus, putamen, caudate, and pallidum, as well as the cerebellum.
To examine the demands of standard sounds on attention, we contrasted the condition in which only standard sounds occurred to the condition of fixation (Figure 2D). Significantly (P<0.05, cluster threshold corrected for the family-wise error based on the GRF theory) increased activations were observed in several frontal, temporal and parietal regions, including bilateral SFC, primary auditory cortex, posterior insula, STG, paracentral region, and suborbital sulcus. Left lateralized activations were found in MFC, STS, inferior parietal region (AG), while right lateralized activations were found in the central sulcus extending to pre-central and post-central gyrus.
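The four contrasts described above (Figure 2A–D) can be summarized as weight vectors over the trial conditions. The following is a minimal sketch, not the authors' analysis code. The labels "Cue+Standards" and "Cue+Novel+Standards" follow the text; "Cue+Target+Standards", the ordering of conditions, and the treatment of fixation as an explicit condition are assumptions made for illustration, and the response estimates are invented.

```python
# Condition order is arbitrary; it only has to match the weight vectors.
CONDITIONS = [
    "Fixation",
    "Standards",
    "Cue+Standards",
    "Cue+Novel+Standards",
    "Cue+Target+Standards",  # hypothetical label for validly cued target trials
]

# Each contrast is (condition weighted +1, condition weighted -1).
CONTRASTS = {
    "cued_shifting": ("Cue+Standards", "Standards"),                    # Figure 2A
    "novelty_shifting": ("Cue+Novel+Standards", "Cue+Standards"),       # Figure 2B
    "target_discrimination": ("Cue+Target+Standards", "Cue+Standards"), # Figure 2C
    "standards_vs_fixation": ("Standards", "Fixation"),                 # Figure 2D
}

def contrast_weights(name):
    """Return the +1/-1/0 weight vector for a named contrast."""
    plus, minus = CONTRASTS[name]
    return [1 if c == plus else -1 if c == minus else 0 for c in CONDITIONS]

def contrast_value(name, betas):
    """Weighted sum of per-condition response estimates for one location."""
    weights = contrast_weights(name)
    return sum(w * betas[c] for w, c in zip(weights, CONDITIONS))

# Invented response estimates for a single location.
example_betas = {
    "Fixation": 0.0,
    "Standards": 1.0,
    "Cue+Standards": 1.5,
    "Cue+Novel+Standards": 3.0,
    "Cue+Target+Standards": 2.5,
}
novelty_effect = contrast_value("novelty_shifting", example_betas)
```

Because three of the four contrasts share Cue+Standards or Standards as the subtracted condition, each map isolates the event added on top of that baseline (cue, novel, or target) rather than the standard trains themselves.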
This analysis was conducted to illustrate and compare areas specifically associated with cued vs. novelty-triggered attention shifting, and target discrimination (Figure 3). The more limited comparison in Figure 3A shows the areas activated significantly in the cued attention shifting (originally shown in Figure 2A) and novelty-triggered attention shifting (originally shown in Figure 2B) conditions. This comparison, which did not consider the target-discrimination related activations, showed overall distribution differences between areas activated during cued (red color) vs. novelty-triggered (green) attention shifting that are principally consistent with previous models – distinguishing between separate dorsal and ventral attention systems (for anatomical details, see Tables 1 and 2). However, the anterior insula, an area that has been previously often associated with the ventral stimulus-driven/salience detection network, was activated bilaterally during cued attention shifting, while certain areas in the right IFC were activated during novelty-triggered but not cued attention shifting.
However, when activations during auditory target discrimination were also considered (dark blue in Figure 3B), the presumed dorsal vs. ventral distinction between goal-driven (cued shifting, target discrimination) and stimulus-driven activation networks became slightly less obvious. That is, particularly in the right hemisphere, many of the more “ventral” areas, specifically near the superior temporal auditory areas and in the lower parts of the lateral temporal cortex (STS), which were significantly activated during novelty-triggered (but not cued attention shifting) were also strongly activated during detection of auditory targets. Nonetheless, the more posterior aspects of STS, more extensively in the left hemisphere, seemed to be quite selectively related to stimulus-driven processes (green in Figure 3B).
Note that there were also overlaps between the two more goal-driven auditory attention conditions (cued attention shifting and target discrimination) in areas not activated by novelty-triggered attention shifting (pink in Figure 3B). One such area is, interestingly, the anterior insula. Overlapping activations between the two more goal-driven auditory attention conditions were also observed in parts of the PMC/FEF and cingulate cortex.
Despite the complex pattern of overlapping activations to the three major contrasts of interest, we also found areas that were significantly associated with only one of the three processes. An interesting distribution of activations was observed particularly in the right posterior parietal areas: cued attention shifting activated the superior and anterior region of IPS, novelty-triggered attention shifting activated the posterior and inferior IPS, while target discrimination activated more anterior/superior aspects of IPS. Finally, areas activated selectively by voluntary attention shifting were also found in the right and left precentral areas, in the vicinity of FEF.
Figure 3B shows activations in a widespread array of regions related selectively to target discrimination more dorsally and also medially in the neocortex. For example, the superior lateral PFC areas (including DLPFC), the medial PFC, and cingulate cortex areas were activated by target discrimination only (see Figure 2C and Table 3 for detailed anatomical descriptions). Similarly, activations of visual cortex areas, including the calcarine sulcus, cuneus, and lingual gyrus, were almost specific to target discrimination, with only a few activation points found more dorsally near the parietal-occipital junction and cuneus during novelty-triggered attention shifting.
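The overlap comparison in Figure 3 amounts to labeling each cortical location by which of the three thresholded maps is significant there. A minimal sketch of that bookkeeping follows, assuming each map has already been reduced to one boolean per location. The function name and the toy masks are hypothetical, and the sketch says nothing about how the published maps were thresholded or rendered.

```python
from collections import Counter

def label_overlap(cue, novel, target):
    """Label each location by the set of contrasts that are significant there.

    cue, novel, target: equal-length sequences of booleans, one entry per
    location (for example, per surface vertex).
    Returns labels such as "cue", "novel", "cue+target", or "none".
    """
    labels = []
    for c, n, t in zip(cue, novel, target):
        active = [name for name, on in (("cue", c), ("novel", n), ("target", t)) if on]
        labels.append("+".join(active) if active else "none")
    return labels

# Toy masks for four locations.
cue_mask = [True, False, True, False]
novel_mask = [False, True, False, False]
target_mask = [False, False, True, True]

labels = label_overlap(cue_mask, novel_mask, target_mask)
counts = Counter(labels)  # number of locations per overlap category
```

In this scheme, a location labeled "cue+target" corresponds to the overlap between the two goal-driven conditions (shown in pink in Figure 3B), whereas "cue", "novel", or "target" alone marks a location selective to a single process.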
We also directly compared cue vs. novel and novel vs. cue contrasts using second-level random-effects group analysis thresholded at P<0.01 (Figure 4). Activations associated with the Cue vs. Novel contrast were significantly higher in bilateral mSFC/aMCC (both more prominently on the right), anterior insula, and IPS, as well as right FEF, PMC, and IFC (pars opercularis). Activations associated with the Novel vs. Cue contrast were significantly higher in bilateral primary and non-primary (anterior and posterior STG, PT) auditory cortex, posterior insula, STS, MTG, ITG, TPJ, inferior parietal (SMG, AG), and precuneus, as well as right IFC (pars triangularis).
While a small number of fMRI studies , ,  utilizing continuous fMRI scanning on cued auditory attention shifting have been published, the current study was specifically designed to compare auditory fMRI activations between attention shifting to predictable cues and stimulus-driven orienting to unexpected novel sounds. Noting the essential role of sensory areas in automatic change detection processes triggering involuntary attention in the auditory domain, we used a “mixed” event-related/sparse sampling approach to mitigate both sensory and attentional confounds caused by acoustical scanner noise to achieve this goal. In addition to similarities between activations to putative endogenous and exogenous processes consistent with previous studies , –, our approach also revealed some notable activation differences between cued attention shifting, novelty-triggered attention shifting, and target discrimination.
Areas activated selectively by cued attention shifting were found in the bilateral precentral/FEF regions, and in posterior parietal areas. In these areas, the foci of activations were clearly different between cued attention shifting (more posteriorly/superiorly in IPS), novelty-triggered attention shifting (inferior/posterior to IPS), and target discrimination (more anteriorly in IPS/SMG). In line with the theory – of distinct dorsal vs. ventral attention systems, novelty-triggered attention shifting activated selectively posterior aspects of the STS/medial lateral temporal cortex, inferiorly to the activations of the two other more goal-directed attention conditions. Interestingly, the anterior insula, which has often been associated with more fundamental processes of stimulus-driven change  and salience detection , , was activated during cued attention shifting and target discrimination, but not during attention shifting to sound novelty. In the prefrontal cortices, activations associated with target discrimination were widespread and found in regions more anterior and superior to the frontal areas activated by other conditions.
In the precentral areas including the FEF and lateral PMC areas (BA 6 and 44), most widespread activations were observed for the contrast associated with cued attention shifting. Within these areas, the region probably most closely corresponding to FEF was significantly activated also during novelty-triggered attention shifting, but only very weakly during target discrimination. Our results, thus, would suggest that these areas are related to the orienting of auditory attention, and most strongly, during cued attention shifting. This interpretation is consistent with the long-held view that FEF constitutes a critical locus for the control of spatial attention , as it is presumably interconnected with other frontoparietal regions, such as IPS, and because it may also be involved in multisensory attention  and orienting , . The increased FEF activations associated with auditory attention shifting, during a condition in which subjects were instructed to fixate on a cross in the center of the screen for all tasks, is also in line with the notion ,  that this area is involved in more than just the control of eye movements and overt gaze orienting. However, similar to previous observations , , , our results suggest that the right FEF is activated by distracting events that catch attention in a bottom-up manner as well. Indeed, it was recently suggested that regions of human FEF and IPS may reflect the representation or integration of attentional priority , , instead of constituting strictly a “voluntary” attentional system. In other words, the function of FEF may, instead of voluntary control only, be more essentially associated with orienting of spatial attention. Intriguingly, the lateralization of present FEF effects for novel and distracting auditory events is also in line with the traditional view that the triggering of involuntary auditory attention is specifically lateralized to the right frontal cortex . 
Meanwhile, the voluntary shifting condition seemed to activate precentral areas including the FEF more bilaterally.
Our results on precentral areas (beyond FEF) may be interesting in light of the recent debate on the attentional role of lateral PMC. Some studies ,  suggest that lateral PMC is involved in the detection of salient and behaviorally relevant stimuli, especially in unattended and task-irrelevant locations (stimulus-driven attention). This finding has led to a proposition that these regions constitute a part of the same ventral fronto-parietal network that also includes the anterior insula and TPJ , . However, the present evidence of more widespread activations in the lateral PMC during cued rather than novelty-triggered attention shifting suggests that these regions are involved in top-down/voluntary attentional control.
The posterior parietal cortex, especially IPS, is essential for spatial attention , ,  supported by converging evidence from monkey physiology and human neuroimaging studies. There is even some evidence that further suggests a topographic organization of spatial attention signals within IPS . Our findings that cued attention shifting activates superior IPS, novelty-triggered attention shifting activates inferior IPS, and target discrimination activates more anterior/superior IPS areas suggest possible functional differentiation within these posterior parietal areas. These findings are also principally in line with the proposed dorsal vs. ventral distribution of networks dedicated to goal-driven (cued attention shifting, target discrimination) and stimulus-driven (novelty-triggered attention shifting) attentional processes , , .
Another intriguing finding in our study is that the anterior insula was, in both hemispheres, more significantly activated in cued attention shifting and target discrimination than novelty-triggered attention shifting. Previous studies on executive control of auditory spatial attention have reported activations in the bilateral anterior insula . However, the anterior insula has not, traditionally, been viewed as a task-control region, and its activations have typically been considered subsidiary to IFC , ,  or they have not been extensively discussed , . Nevertheless, our data are in line with the accumulating human and non-human primate evidence –, ,  suggesting that the anterior insula, as a part of a cingulo-opercular system, might play a more significant role in voluntary cognitive control than previously assumed. The notion has been further supported by animal  and human  evidence on anatomical connectivity between the anterior insula and mSFC areas, as well as by histological evidence , . It has been, however, also debated whether the anterior insula has a more executive role in maintaining a sustained task mode and strategy , or whether it is merely a transient saliency detector that initiates attentional control signals in other higher-order areas . The present lack of anterior insula activations to the most salient sounds of the present design, the novel sounds, would seem to be clearly at odds with the latter idea. Our findings would, instead, seem to be more consistent with an alternative theory  that the anterior insula activity does not express perceptual salience, per se, but rather the recruitment of processing resources when faced with a given sensory event, whatever the source of that recruitment, bottom-up or top-down. 
Finally, it is also noteworthy that, in addition to the actual redirection of attention, attention shifting presumably involves endogenous processes that allow us to disengage from previous activity and to maintain heightened top-down control on the new task . Consequently, noting the recent evidence by Wu et al.  and Alain et al.  showing anterior insula activations during working memory processing and goal-directed actions, it is also possible that activations in the anterior insula during cued attention shifting and target discrimination are most essentially related to engagement of attention control.
It has been proposed that the anterior insula and the cingulate gyrus may belong to the same cingulo-insular system involved in top-down cognitive control. Our data are consistent with this proposal, as the mSFC regions (including bilateral pre-SMA and extending to aMCC, and rostral cingulate zones) and the anterior insula were both activated during cued attention shifting and target discrimination, but not in novelty-triggered attention shifting. The cingulate gyrus has also been identified as a major component in a distributed network subserving the dynamic relocation of spatial attention , . Previous studies also show that aMCC is associated with conflict resolution ,  and decision making . Here we provide data suggesting that aMCC is specifically involved in top-down spatial attention control.
In general, our data are consistent with previous studies on auditory attention shifting. However, we also observed certain discrepancies. Such discrepancies may, obviously, be in part explainable by differences in paradigms and methods between the studies. For example, our results differ slightly from a previous event-related fMRI study by Mayer et al.  where subjects were instructed to localize targets following informative (75% valid) or uninformative (50% valid) cues. In contrast to present findings, as well as those reported in several recent auditory studies , , the authors found that automatic orienting elicited by the uninformative cue condition increased activations in the precentral areas and the insula, both of which in the present study were associated with voluntary instead of stimulus driven processes. However, the study of Mayer et al. apparently did not aim at separating the processes of auditory cued attention shifting and target identification. Moreover, based on the behavioral data, it is not entirely clear that uninformative cues used in Mayer et al. were, in fact, followed by less intensive top-down processing than informative cues.
At the same time, the present results differ slightly from the event-related fMRI study of Salmi et al. , in which the authors used a dichotic target-discrimination design that was in many ways analogous to the present study. Specifically, Salmi and colleagues asked their subjects to detect occasional targets in the attended stream. Instead of novel sounds presented only to the uncued ear, involuntary attention shifts were triggered by unexpected loudness deviations presented to either ear. Finally, unlike in the present study, attention shifts were guided using central visual cues. Their results were quite similar to the present findings with regard to top-down controlled attention shifting. Salmi et al. did not, however, observe top-down related activations in the anterior insula. At the same time, using the visually presented shifting cues, Salmi et al. observed visual-cortex activations that were absent during cued shifting in the present study. For the bottom-up driven attention shifting, the present study showed more extensive activations in bilateral auditory cortex (possibly due to auditory stimulation and scanning parameter differences explicated above), posterior insula, IPS, and posterior cingulate than the study of Salmi and colleagues. In addition to the aforementioned differences in stimulation and scanning parameters (continuous vs. sparse sampling), some of these discrepancies may be explainable by differences in anatomical interpretation approaches. For example, the present surface-based approach may produce different results in terms of the exact anatomical boundaries between the anterior insula and IFC than the fully volumetric atlas that was used by Salmi and colleagues.
The present sparse sampling design may help make the results more easily comparable to cognitive neuroscience studies conducted using other methods that are not confounded by factors such as acoustical scanner noise. That is, in comparison to the relatively small number of auditory studies on voluntary attention shifting, there has been a profusion of MEG and EEG research on involuntary attention shifting to unattended sound changes –, –. In these studies, involuntary auditory attention shifting has been proposed to be triggered by an automatic change-detection process, reflected by the MMN response , followed by a sequence of brain events associated with attentional orienting and conscious detection of this change (however, see also ). Indeed, in the present study, quite remarkable differences between conditions emerged in the superior temporal auditory areas. In these areas, the activations during novel sound processing extended all over the superior temporal plane, while activations to attention shifting cues were only significant in posterior aspects of auditory cortex (pSTG, PT). This distribution difference of effects could, in principle, be interpreted to be in line with the suggestion  that automatic deviance detection (reflected by the MMN process) originates more anteriorly in the auditory cortex than responses to more predictable shifting cues. At the same time, previous studies also suggest that non-primary auditory cortex processes sound identity and location in parallel, through the anterior “what” and posterior “where” pathways , –. In the current study, the novels contained much richer identity features than the cues used to trigger voluntary attention shifting. Hence, the enhanced spreading of auditory cortex activations to the putative “what” regions might reflect stimulus-driven activations in the sound-object identification system. 
A sound-identification process might also explain some of the IFC activations during novelty-triggered attention shifting, given the theory that the “what” streams extend to ventral frontal cortex areas. Meanwhile, the broader activations associated with auditory target discrimination, compared to cued attention shifting, could be partially explained by the stronger top-down influences needed for the more difficult process of discriminating the targets from the repetitive standards, as established by numerous imaging studies. Our previous work on auditory attention has also demonstrated correlations between attentional modulation of auditory cortex activation and behavioral discrimination of target tones (as measured from the difference in the hit rate between easier vs. more difficult targets delivered to the ear).
We observed extensive activations in the visual cortices associated with target discrimination, similar to Wu et al. (note, however, that Wu and colleagues asked their subjects to keep their eyes shut throughout the study). This may be the result of cross-modal influences between the auditory and visual cortices. That is, previous studies have shown that the visual cortex can be activated by auditory input and that there are direct anatomical connections between the superior temporal and occipital regions in primates and humans. Meanwhile, a recent fMRI study also showed that auditory occipital activations depend strictly on the sustained engagement of auditory attention and are enhanced in more difficult listening conditions.
The “standards only” condition, during which subjects were instructed to listen carefully and wait for the cue, revealed significantly increased activations in several frontal, temporal, and parietal regions. Interestingly, the activated areas included the paracentral region, which according to recent studies is activated during maintenance of attention. These paracentral activations might also have overlapped with a supplementary eye field region, which has previously been proposed to be involved in visuospatial control processes and performance. The areas activated during the “standards only” condition also included the SFC, which has previously been associated with high-level cognitive control processes such as monitoring and anticipatory spatial attention. However, it has to be noted that, in contrast to the other comparisons, which were conducted across active task/stimulation conditions, the “standards only” condition was contrasted with the fixation condition, in which no explicit task was included.
Here, we utilized sparse sampling to control for biases caused by acoustical scanner noise. Such noise is a potentially problematic variable in all fMRI experiments, but it is of particular concern in studies of audition and language processing. First, as discussed above, scanner noise is likely to modulate stimulus-driven orienting, which presumably receives a major contribution from the auditory cortex. Although scanner noise does not necessarily abolish the change-detection activations that trigger involuntary orienting, the benefits of sparse sampling in studies of stimulus-driven auditory cortex activity have been well documented. Second, the ongoing acoustic and somatosensory stimulation associated with continuous scanning may also confound attentional top-down effects, both in auditory cortices and in higher-order association areas. The longer-term effects of continuous environmental noise on our ability to concentrate have been very well documented. Not surprisingly, fMRI studies have shown that increasing the intensity of acoustical scanner noise modulates extra-auditory activations during working memory performance, resulting in activation increases in certain areas (inferior, medial, and superior frontal gyri) and decreases in others (e.g., anterior cingulate). Using PET, it has further been shown that recorded scanner noise may increase regional blood flow in the anterior cingulate and Wernicke’s areas during visual imagery. Event-related potential (ERP) studies also suggest that continuous fMRI scanner noise may reduce and delay certain “endogenous” components that are related to auditory attention. Consistent with these results, active behavioral auditory performance may be improved during sparse vs. continuous fMRI scanning. Finally, recent sparse-sampling fMRI studies also suggest that top-down attention effects may produce detectable modulations in the auditory cortex even in the absence of any acoustic stimuli.
Such endogenous feedback activations could easily be masked or confounded by acoustical scanner noise. These kinds of confounds have also been discussed, for example, in the context of interpreting top-down modulations of auditory cortex activity by visual stimuli during continuous scanning.
Based on the above notions, it might seem quite obvious that sparse sampling is the best approach for any study involving auditory functions. However, it also has to be noted that with sparse sampling designs, a much smaller number of volumes can be acquired in a given experiment, which may reduce the signal-to-noise ratio in comparison to continuous scanning. Indeed, a recent auditory cortex mapping study (which, however, also utilized 70 dB noise masking in the background in both fMRI scanning conditions) showed relatively small differences between sparse and continuous scan experiments. In terms of more complex designs, a disadvantage of sparse sampling is the reduced temporal resolution, which makes it difficult to extract stimulus-specific BOLD time courses. Finally, a trade-off in sparse sampling is that the clustered scan noise may itself become a “rare sound” that triggers strong activations of the alerting and orienting networks. Although the BOLD responses of such activations will not necessarily be captured by fMRI when the TR is long enough, the cognitive significance and relative saliency of the subsequent stimuli of interest may still be modulated. However, a novel feature of the present orienting design was that these biases were controlled by using the noise stimulus produced by each fMRI volume acquisition as a part of the task design.
It is noteworthy that in experimental conditions, it is difficult to produce and document activations that are purely stimulus-driven vs. endogenous. For example, novelty-triggered attention shifting may involve a number of top-down processes associated with the suppression of involuntary attention shifting, reorienting to the relevant task (if this is part of the instruction), and conflict resolution processes to “evaluate the situation” after the automatic orienting response (see, e.g., Escera et al.; Schröger and Wolff). Cued attention shifting is, in turn, contaminated by stimulus-driven processes triggered by the cue itself. At the same time, voluntary attention shifting may involve active disengagement from the previous strategy, as well as engagement in the new attentional task (termed “cued attention” in Petkov et al.). In other words, although the process is collectively referred to as “attention shifting”, the parts that are actually the most “voluntary” or “goal-directed/endogenous” might not be related to the orienting per se.
It is also possible that differences in auditory cortex activities during the triggering of involuntary and voluntary attention are related to the context and predictability of the stimulation. Strong unpredictable stimuli, such as novel sounds, tend to result in widespread bottom-up sensory responses, which then trigger an involuntary orienting process that, according to previous ERP studies, is related to the strength of the auditory cortex response. A more predictable and repeated stimulus, such as the cue, may trigger less prominent responses, but even in this case the cue plays a role in orienting. However, proportionally speaking, the bottom-up influence is smaller than in the case of novelty-triggered attention (consistent with our predictions and conclusions). At the same time, these processes are essentially modulated by top-down attention, especially when the discrimination task is difficult (such as in the case of targets, which result in a stronger auditory cortex response than the cues).
Also note that in most visual studies, the cue that triggers voluntary shifting is an arrow, which is physically different from the target. A related consideration is whether the present cues were more prone to induce stimulus-driven activations than the symbolic arrows that have been utilized in many visual and also auditory attention shifting studies. It has been thought that because arrow symbols do not occur in the physical location of the target, they might trigger purely goal-driven processes. However, it is worth noting that the processing of any simple cue probably becomes rapidly automatized during the course of an experiment, and as the simple symbol is repeated, the contribution of stimulus-driven processes increases. Most importantly, the present study showed notable differences between activations during cued and novelty-triggered auditory attention shifting, clearly beyond the sensory cortical areas that process the physical properties of the sounds. Further, activations in these sensory areas, where one might expect particularly strong stimulus-driven activations, were clearly weaker during attention-shifting cues than during novel-sound or target-sound detection.
In conclusion, our study revealed distinct activations during cued auditory attention shifting, involuntary orienting to novel sounds, and auditory target discrimination. Areas most selectively involved with cued voluntary shifting included the superior/posterior IPS and precentral areas (including FEF and PMC), which provides important evidence supporting these regions’ involvement in top-down/voluntary attentional control. Activations specific to involuntary attention shifting to novel sounds were found in the posterior STS, inferior IPS, and TPJ, which is principally consistent with models suggesting a more ventral distribution of stimulus-driven attention, and also in the right IFC. Interestingly, our results also revealed marked differences in anterior insula and IFC activations between goal-driven attentional processing (cued attention shifting and target discrimination) and novelty-triggered involuntary attention shifting, suggesting that the anterior insula may play a more executive role in auditory attention than previously thought.
Potential subjects were first screened with a phone interview to ensure that they had normal hearing and had not been exposed regularly to environments with excessively loud noise. Nineteen right-handed, college-educated adults with normal hearing and no neurological disorders, psychiatric conditions, or learning disabilities gave written informed consent prior to testing, in accordance with the experimental protocol approved by the MGH IRB. One subject was excluded from the final sample due to an inability to perform the task (hit rate below 50%), leaving a total of eighteen subjects (N=18, 11 females, age range 19–28).
In all trials, brief pure tones (duration 50 ms, 5-ms ramps) were presented in the background, randomly to the right (800 Hz) or left ear (1500 Hz), similar to a classic study. Because the standard sounds merely offered a context to other sounds (which were consistent across ears), the ear/frequency order of the background standard stream was held constant across subjects. Subjects were told to wait for a cue (250-ms buzzer sound) that occurred in the ear where a subsequent target (50-ms tone with 800- and 1500-Hz harmonics) was likely to occur. The average interval between the cue and the target was ~1.7 sec. Upon hearing the cue, the subjects were advised to shift their attention to the designated ear (with eyes remaining fixated), pay close attention to the tones presented in that ear, and press a button with the right index finger as rapidly as possible after hearing the target. Specifically, subjects were instructed to pay attention to a change in relation to the ongoing stimulation (a “thickening” of the sound), and they were kept naïve to the fact that the targets were actually similar in both ear streams.
Previous event-related MEG/EEG studies suggest that strong responses (e.g., the P3a component) associated with involuntary auditory orienting can be evoked by physically varying “novel” sounds. In 20% of the trials, the target was therefore replaced by a task-irrelevant novel sound presented opposite to the cued ear. These novel sounds consisted of eight spectrotemporally complex environmental and synthetic sounds whose peak intensities, onset rise times, and perceived loudness, as well as their grand-average time envelope, were made as close to the cues as possible. Pure tones only (no cue, novel, or target) were presented in 20% of trials. At 7.82 sec after the trial onset, subjects heard the sound of the 2.18-sec fMRI volume acquisition, signaling that the trial had ended. In other words, the confounding effects of fMRI acquisition noise were controlled by using it as a task stimulus. Tonal stimulation started 2.3 sec after the onset of the preceding scan/simulation, at a 1.1-sec average stimulus-onset asynchrony (SOA) in each ear, and ended on average 1.3 sec before the next scan. The SOA was jittered within each trial to avoid omission-response confounds. During fMRI, three silent baseline trials occurred after every 6 active trials (i.e., a mixed blocked/event-related design was utilized). In subsequent analyses, individual trials with target-detection reaction times beyond the subject’s mean ± 2 SD were considered outliers. Finally, in an additional ten-minute behavioral experiment testing whether the spatial cueing indeed produced significant performance benefits, we replaced 50% of the novel sounds with a target sound opposite to the cued ear (“invalidly cued target”).
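The reaction-time screening described above (trials beyond the subject’s mean ± 2 SD) can be illustrated with a minimal Python sketch. It assumes the sample standard deviation, since the text does not specify sample vs. population SD; the function name `flag_rt_outliers` and the example reaction times are hypothetical and not taken from the study.

```python
from statistics import mean, stdev


def flag_rt_outliers(rts, n_sd=2.0):
    """Return one boolean per trial: True if the reaction time lies
    beyond mean +/- n_sd * SD of this subject's reaction times."""
    m = mean(rts)
    sd = stdev(rts)  # assumption: sample SD (n - 1 in the denominator)
    return [abs(rt - m) > n_sd * sd for rt in rts]


# Hypothetical reaction times (seconds) for one subject
example_rts = [0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51, 0.54, 1.40]
flags = flag_rt_outliers(example_rts)

# Keep only the trials that were not flagged
kept_rts = [rt for rt, is_outlier in zip(example_rts, flags) if not is_outlier]
```

In this made-up example, only the last trial (1.40 sec) is flagged, and the remaining nine trials would enter the behavioral analysis.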
A standardized computerized approach taking about 5 minutes was utilized to teach the task to the subjects before scanning. During fMRI sessions, subjects were presented with randomly ordered 10-sec trials. Sound stimuli were presented at 55 dB Sensation Level, as tested individually at the beginning of each session, and delivered through MRI-compatible insert earphones (Sensimetrics, Malden, MA). The insert included an eartip to protect the subjects’ ears during the scan acquisitions. A cross (fixation mark) was projected on the center of an MRI-compatible video display. Subjects were instructed to look at the fixation mark throughout the whole study. Each scan session contained three runs, and there was a brief break after each run to restart the stimulation and communicate with the subject. Each task run comprised 136 trials/blocks and lasted 22 minutes and 40 seconds. Subjects were instructed to respond with their right index finger.
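As a quick arithmetic check of the run length stated above, the following Python sketch multiplies the number of trials by the trial duration. The constant and variable names are illustrative only.

```python
TRIAL_DURATION_S = 10   # each trial lasted 10 sec
TRIALS_PER_RUN = 136    # trials/blocks per task run
RUNS_PER_SESSION = 3    # runs per scan session

# Duration of one run, in seconds and as minutes + seconds
run_duration_s = TRIAL_DURATION_S * TRIALS_PER_RUN
run_minutes, run_seconds = divmod(run_duration_s, 60)

# Total task time per session, excluding the breaks between runs
session_task_time_s = run_duration_s * RUNS_PER_SESSION

print(f"{run_minutes} min {run_seconds} s per run")
```

The result, 1360 sec or 22 min 40 sec per run, agrees with the duration given in the text.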
Whole-head fMRI was acquired at 3T using a 32-channel coil (Siemens TimTrio, Erlangen, Germany) and an interleaved echo planar imaging (EPI) method. To circumvent response contamination by scanner noise, we used a sparse-sampling gradient-echo blood oxygen level dependent (BOLD) sequence (TR=10 sec, TE=30 ms, 7.82 sec silent period between acquisitions, flip angle 90°, FOV 192 mm) with 36 axial slices aligned along the anterior-posterior commissure line (3-mm slices, 0.75-mm gap, 3×3 mm² in-plane resolution), with the coolant pump switched off. A field mapping sequence (TR=500 ms, flip angle 55°; TE1=2.83 ms, TE2=5.29 ms) with the same number of slices, voxel size, and slice orientation as the EPI sequence was applied to obtain phase and magnitude maps utilized for unwarping of B0 distortions of the functional data. T1-weighted anatomical images were obtained for combining anatomical and functional data using a multi-echo MPRAGE pulse sequence (TR=2510 ms; 4 echoes with TEs=1.64 ms, 3.5 ms, 5.36 ms, 7.22 ms; 176 sagittal slices with 1×1×1 mm³ voxels, 256×256 mm² matrix; flip angle=7°).
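The timing of the sparse-sampling sequence can be sketched as follows. This is a minimal illustration derived from the stated parameters (TR=10 sec, 7.82-sec silent period, 36 slices of 3 mm with a 0.75-mm gap); the function name `acquisition_windows` is hypothetical, and the sketch assumes that each TR begins with the silent period, consistent with the acquisition sound starting 7.82 sec after trial onset.

```python
TR_MS = 10_000                       # repetition time (10 sec), in ms
SILENT_MS = 7_820                    # silent period between acquisitions
ACQUISITION_MS = TR_MS - SILENT_MS   # time left for the clustered volume


def acquisition_windows(n_trials):
    """Return (start_ms, end_ms) of the volume acquisition in each trial,
    relative to run onset, assuming each TR starts with the silent period."""
    return [(i * TR_MS + SILENT_MS, (i + 1) * TR_MS) for i in range(n_trials)]


# Slab thickness along the slice direction: slices plus the gaps between them
N_SLICES = 36
SLICE_MM = 3.0
GAP_MM = 0.75
slab_mm = N_SLICES * SLICE_MM + (N_SLICES - 1) * GAP_MM

print(ACQUISITION_MS, acquisition_windows(2), slab_mm)
```

The remaining 2180 ms per TR matches the 2.18-sec volume acquisition that served as the trial-end signal, and the slices span roughly 134 mm under these assumptions.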
The fMRI data were preprocessed using tools from FEAT Version 5.98, a part of the FSL package (www.fmrib.ox.ac.uk/fsl). Skull stripping was performed with BET, B0 unwarping with FUGUE, and motion correction with MCFLIRT. The data were smoothed with a Gaussian kernel (5-mm FWHM) and registered to the Montreal Neurological Institute (MNI) space using FLIRT. The intensity-normalized fMRI time series were then entered into a general linear model (GLM) with the task conditions as explanatory variables. At the second stage, individual experimental runs were combined within each subject by using a fixed-effect model. Finally, contrasts pertaining to the main effects of the factorial design constituted the data for the third-stage (mixed-effect) analysis with automatic outlier detection, where the significance of observations was determined across the group of 18 subjects using FMRIB’s Local Analysis of Mixed Effects (FLAME) 1 and 2.
Group analysis was performed in MNI space. Gray-matter partial volume information obtained from each subject using FreeSurfer 5.0 anatomical segmentation results was entered as a voxel-dependent anatomical covariate in the group statistics. The Z-statistic images were corrected for multiple comparisons using whole-brain cluster correction based on GRF theory, with an initial cluster threshold of Z>2.3 and a post-hoc corrected threshold of P<0.05. To interpret the anatomical results, the results were coregistered to the FreeSurfer brain template (“fsaverage”) and shown in the surface space. The contrasts presumed to reflect cued attention shifting, novelty-triggered reorienting, and target discrimination processes were defined as “cue + standards vs. standards only,” “cue + novel + standards vs. cue + standards,” and “cue + target + standards vs. cue + standards,” respectively. Additionally, the “baseline” contrast, i.e., “standards vs. fixation,” was calculated to examine the effect of standard sounds on attention. Cued attention shifting and novelty-triggered attention shifting were also directly compared by defining the “cue vs. novel” and “novel vs. cue” contrasts at the second level using a random-effects model of the group analysis with a threshold at P<0.01. Finally, behavioral results were analyzed using paired and independent-samples t-tests as appropriate.
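The contrast definitions above can be written as weight vectors over the task regressors. The Python sketch below is only an illustration of that logic: the condition labels, the regressor order, and the helper `contrast` are hypothetical, fixation is assumed to be the implicit (unmodeled) baseline, and nothing here reproduces the actual FSL design files. The second part converts the cluster-forming threshold of Z>2.3 to its one-tailed p-value under a standard normal distribution.

```python
import math

# Hypothetical regressor order; fixation is assumed to be the implicit baseline
CONDITIONS = [
    "standards_only",
    "cue_standards",
    "cue_novel_standards",
    "cue_target_standards",
]


def contrast(plus, minus=None):
    """Build a weight vector: +1 for `plus`, -1 for `minus` (if given)."""
    weights = [0] * len(CONDITIONS)
    weights[CONDITIONS.index(plus)] = 1
    if minus is not None:
        weights[CONDITIONS.index(minus)] = -1
    return weights


CONTRASTS = {
    # cue + standards vs. standards only
    "cued_shifting": contrast("cue_standards", "standards_only"),
    # cue + novel + standards vs. cue + standards
    "novelty_reorienting": contrast("cue_novel_standards", "cue_standards"),
    # cue + target + standards vs. cue + standards
    "target_discrimination": contrast("cue_target_standards", "cue_standards"),
    # standards vs. fixation (implicit baseline)
    "baseline": contrast("standards_only"),
}


def one_tailed_p(z):
    """Upper-tail probability of a standard normal variate exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


cluster_forming_p = one_tailed_p(2.3)
```

Under these assumptions, Z>2.3 corresponds to an uncorrected one-tailed p of about 0.011 per voxel, before the cluster-level correction at P<0.05 is applied.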
We thank Dr. Wei-Tang Chang, Dr. Sharon Furtak, An-Yi Hung, Stephanie Rossi, Mary O’Hara, and Lawrence White for their help.
This work was supported by National Institutes of Health awards R01MH083744, R21DC010060, R01HD040712, R01NS037462, and P41RR14075. The research environment was supported by Shared Instrumentation grants S10RR014978, S10RR021110, S10RR019307, and S10RR023401. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.