In everyday listening situations, we need to constantly switch between alternative sound sources and engage attention according to cues that match our goals and expectations. The exact neuronal bases of these processes are poorly understood. We investigated oscillatory brain networks controlling auditory attention using cortically constrained fMRI-weighted magnetoencephalography/ electroencephalography (MEG/EEG) source estimates. During consecutive trials, subjects were instructed to shift attention based on a cue, presented in the ear where a target was likely to follow. To promote audiospatial attention effects, the targets were embedded in streams of dichotically presented standard tones. Occasionally, an unexpected novel sound occurred opposite to the cued ear, to trigger involuntary orienting. According to our cortical power correlation analyses, increased frontoparietal/temporal 30–100 Hz gamma activity at 200–1400 ms after cued orienting predicted fast and accurate discrimination of subsequent targets. This sustained correlation effect, possibly reflecting voluntary engagement of attention after the initial cue-driven orienting, spread from the temporoparietal junction, anterior insula, and inferior frontal (IFC) cortices to the right frontal eye fields. Engagement of attention to one ear resulted in a significantly stronger increase of 7.5–15 Hz alpha in the ipsilateral than contralateral parieto-occipital cortices 200–600 ms after the cue onset, possibly reflecting crossmodal modulation of the dorsal visual pathway during audiospatial attention. Comparisons of cortical power patterns also revealed significant increases of sustained right medial frontal cortex theta power, right dorsolateral prefrontal cortex and anterior insula/IFC beta power, and medial parietal cortex and posterior cingulate cortex gamma activity after cued vs. novelty-triggered orienting (600–1400 ms). 
Our results reveal sustained oscillatory patterns associated with voluntary engagement of auditory spatial attention, with the frontoparietal and temporal gamma increases being best predictors of subsequent behavioral performance.
Human behavior and communication depend on our ability to flexibly shift the focus between alternative locations of acoustic space, and to engage voluntary auditory attention to the most relevant sources of information, such as when one tries to follow a conversation in a multitalker environment. In such situations, attention may also be involuntarily, or exogenously, captured by novel sounds that occur beyond our immediate perceptual field. Distinguishing between these voluntary vs. involuntary modes of orienting has been a major goal in previous attention studies (for a review, see Corbetta & Shulman, 2002). Yet, many previous human neuroimaging studies, including both visual (Kim et al., 1999; Mayer, Harrington, Adair, & Lee, 2006; Peelen, Heslenfeld, & Theeuwes, 2004; Rosen et al., 1999; Serences & Yantis, 2007) and auditory efforts (Salmi, Rinne, Koistinen, Salonen, & Alho, 2009), suggest that largely overlapping networks are activated during voluntary and involuntary orienting. Notably, the majority of this evidence has been obtained with methods that provide only indirect indices of neuronal activities and have poor spectral and temporal resolution. This might be a particularly relevant limitation in the auditory domain: Unlike visual objects that occur in parallel across the visual field, sounds consist of complex spectrotemporal signals that are distributed across time. In the auditory domain, voluntary and involuntary attention processes may therefore be more readily distinguishable based on their distinct temporal behaviors.
Classic models developed based on visuospatial studies suggest that attention shifting can be divided into distinct stages, involving the disengagement of previous activity, redirection of attention, and finally the engagement of attention to a new location (Posner & Petersen, 1990). Cognitive control of auditory attention can, thus, be presumed to involve processes occurring at different time scales. Redirection of attention from one sound source to another may occur almost instantaneously, as suggested by psychoacoustic (Mondor & Zatorre, 1995) and EEG (Näätänen, 1992) studies in humans. The rapid orienting process is presumably accompanied by sustained control activities that facilitate longer-term engagement of processing resources after the redirection of attention has occurred (Posner & Rothbart, 2007). Evidence for such executive processes, beyond transient activities related to the shifting operations, has been found in visuospatial (Liu, Slotnick, Serences, & Yantis, 2003; Yantis et al., 2002) and visual abstract task-switching fMRI studies (Braver, Reynolds, & Donaldson, 2003). Behavioral visuospatial studies, in turn, suggest that, whereas the facilitating effects of stimulus-driven orienting occur in less than 100 ms (Klein, 2000; Posner & Cohen, 1984), the effects of endogenous attention shifting may evolve more gradually, with the focus of attention sharpening and detection performance improving within the first second after orienting (Cheal & Lyon, 1991; Shepherd & Muller, 1989). In the auditory system, neurophysiological animal models have shown analogous top-down effects that develop up to several seconds after orienting (Fritz, Elhilali, & Shamma, 2005). However, despite the importance of temporal information in sound processing, the distinction between transient and sustained attention processes is not yet fully understood in the auditory domain.
In humans, non-invasive estimates of transient and sustained control processes could be achieved with magnetoencephalography (MEG) and EEG, which as direct measures of neural activity provide much better temporal resolution than fMRI, the prevailing technology in contemporary attention studies. Pioneering advances have been achieved using trial-averaged MEG/EEG event-related fields/potentials (ERF/ERP). ERF/ERPs have been utilized to investigate attention shifting with visual (Hopf & Mangun, 2000; Nobre, Sebestyen, & Miniussi, 2000; Rushworth, Passingham, & Nobre, 2002) and auditory paradigms (Alho et al., 1998; Escera, Alho, Winkler, & Näätänen, 1998; Näätänen, 1992; Salmi, Rinne, Degerman, & Alho, 2007; Schröger & Wolff, 1998). One important limitation of ERP/ERF analyses is, however, that endogenous processes are often relatively loosely phase locked to external triggers and, thus, particularly higher-frequency oscillations related to such processes are easily canceled out from trial-averaged responses (Tallon-Baudry & Bertrand, 1999). Additional information of such endogenous processes could be obtained by analyses of neuronal oscillations, by analyzing either transient oscillatory processes related to specific stimuli/events or sustained oscillatory modulations occurring over longer periods of time. Further, examining oscillations occurring at distinct frequency ranges, such as the theta (4–7 Hz), alpha (8–15 Hz), beta (15–30 Hz), and gamma bands (30–100 Hz), may help distinguish between different types of attention effects within anatomically overlapping networks (Buschman & Miller, 2007; Fan et al., 2007; Schroeder & Lakatos, 2009).
In previous studies (Fries, Reynolds, Rorie, & Desimone, 2001; Jensen, Kaiser, & Lachaux, 2007), tasks requiring endogenous effort, including voluntary orienting (Fan et al., 2007; Landau, Esterman, Robertson, Bentin, & Prinzmetal, 2007), have been most consistently associated with oscillations at the gamma band. Neurophysiological animal models suggest that gamma synchronization increases in local neuronal networks representing attended features (Bichot, Rossi, & Desimone, 2005; Fries et al., 2001). In non-invasive MEG or EEG measurements, this is detectable as increases of transient gamma-band responses to attended auditory stimuli (Kaiser, Hertrich, Ackermann, & Lutzenberger, 2006; Mulert et al., 2007; Tiitinen et al., 1993). Sustained increases of frontotemporal MEG gamma band activity, as potentially reflecting ongoing endogenous cognitive processing, have, in turn, been found during auditory pattern memory maintenance (Kaiser, Ripper, Birbaumer, & Lutzenberger, 2003), analogously to similar effects identified in the visual system (Tallon-Baudry, Bertrand, Peronnet, & Pernier, 1998). Interestingly, recent intracranial EEG studies in humans have suggested that endogenous attention is reflected by sustained broadband gamma-band activity in the dorsolateral prefrontal cortex (DLPFC), medial frontal cortex, superior parietal cortex, and in the frontal eye fields (FEF) (Ossandon et al., 2012). Intracranial EEG studies in human visual cortices have provided evidence for sustained broadband gamma activities that are modulated by selective attention (Tallon-Baudry, Bertrand, Henaff, Isnard, & Fischer, 2005), consistent with similar results obtained in non-invasive MEG measurements (Kahlbrock, Butz, May, & Schnitzler, 2012). Based on these findings, one might expect stronger sustained gamma patterns after voluntary engagement of attention than transient involuntary orienting.
The associations between attention and beta-band oscillations are not yet as clear as those related to gamma oscillations. It has been proposed that beta-band activities reflect less demanding control processes than gamma-band responses (Engel & Fries, 2010). However, there is human EEG evidence that tasks requiring enhanced top-down processing are coupled with increased beta activity (Kaminski, Brzezicka, Gola, & Wrobel, 2012). Neurophysiological studies in non-human primates and cats have shown sustained beta synchronization patterns during attentional expectancy of stimuli (de Oliveira, Thiele, & Hoffmann, 1997; Montaron, Bouyer, & Rougeul-Buser, 1979; Roelfsema, Engel, König, & Singer, 1997). Beta-band modulations have been linked to increased endogenous processing in the human auditory system as well (Iversen, Repp, & Patel, 2009), but their exact role in auditory attention is still relatively poorly known.
While the higher-frequency beta and gamma oscillations may relate to increased processing, it is generally thought that enhancement of alpha rhythms reflects disengagement of task-irrelevant cortex areas (Adrian & Matthews, 1934; Cooper, Croft, Dominey, Burgess, & Gruzelier, 2003; Klimesch, Sauseng, & Hanslmayr, 2007; Pfurtscheller, 2003). Visuospatial attention studies have shown that alpha activity increases in cortical areas representing task-irrelevant aspects of the visual field (Worden, Foxe, Wang, & Simpson, 2000) and that such effects correlate with the subjects’ ability to ignore irrelevant stimuli (Händel, Haarmeier, & Jensen, 2011). Parieto-occipital visual alpha oscillations are also strongly affected by crossmodal effects of auditory attention (Ahveninen et al., 2012; Fu et al., 2001). These crossmodal effects include lateralized alpha increases in parieto-occipital areas ipsilateral to the attended auditory field (Banerjee, Snyder, Molholm, & Foxe, 2011; Thorpe, D’Zmura, & Srinivasan, 2012). However, identifying lateralized alpha-inhibition effects in auditory cortices, per se, may be more difficult because, although sounds presented to one ear elicit strongest activations in the contralateral auditory pathway (Langers, van Dijk, & Backes, 2005; Virtanen, Ahveninen, Ilmoniemi, Näätänen, & Pekkonen, 1998), a significant amount of information also ascends directly to the ipsilateral hemisphere (see, however, also Gomez-Ramirez et al., 2011; Müller & Weisz, 2012).
In the human cortex, attentional oscillatory activities may also occur at the theta range. It has been proposed that, while higher-frequency synchronization reflects local integration, inter-regional phase locking at the theta range supports longer-distance functional coupling in the brain (Canolty et al., 2006; Doesburg, Green, McDonald, & Ward, 2012; von Stein, Chiang, & König, 2000). The power of frontocentral cortical theta, presumably originating from MFC regions (Wang, 2010), strengthens with increased allocation of attentional resources (Sauseng, Hoppe, Klimesch, Gerloff, & Hummel, 2007) and enlarged working memory load (Gevins, Smith, McEvoy, & Yu, 1997; Jensen & Tesche, 2002). Based on these findings, one might expect increased MFC theta during engagement of auditory attention.
Going beyond comparisons of activation-magnitude differences provided by methods such as fMRI, examining neuronal oscillatory activities at different frequency bands can help make more specific inferences about the role of activated areas. However, achieving anatomically accurate non-invasive MEG/EEG estimates of oscillatory activities during auditory attention has been limited by the ill-posed electromagnetic inverse problem. Employing constraints that reduce the number of potential solutions mitigates this problem. Because MEG and EEG activities are mainly generated within cerebral gray matter, the source locations can be restricted to the cortical mantle derived from anatomical MRI (Dale & Sereno, 1993). Additional improvements can be achieved by combining the complementary information provided by simultaneously measured MEG and EEG (Sharon, Hämäläinen, Tootell, Halgren, & Belliveau, 2007). MEG/EEG inverse solution can be constrained even further by fMRI information on blood-oxygen-level dependent (BOLD) changes within the gray matter (Dale et al., 2000), which have been shown to correlate with the post-synaptic neuronal events that also generate the MEG/EEG signals (Logothetis, 2002).
Here, we therefore utilized a combination of MEG/EEG and fMRI to investigate orienting and engagement of auditory attention, using a cued dichotic-listening paradigm modified from classic visuospatial (Posner, 1980), auditory-spatial selective attention (Hillyard, Hink, Schwent, & Picton, 1973; Näätänen, 1992), and auditory involuntary attention-shifting studies (Escera et al., 1998; Schröger & Wolff, 1998). We presumed that exogenous and endogenous stages of orienting and engagement of attention would be reflected by differential transient and sustained oscillatory power changes. We specifically hypothesized that voluntary allocation of attention, subsequent to cued orienting, would result in power increases of high-frequency oscillations in frontoparietal regions, while inhibition of task-irrelevant processes was expected to result in lateralized alpha modulations in posterior cortex.
During separate fMRI and MEG/EEG sessions, subjects (N=16, mean±SD age = 23±5 years, 8 females) were presented with randomly ordered 10-s dichotic-listening trials (Figure 1). During each trial, subjects were instructed to look at a fixation mark at the center of an MRI- or MEG-compatible video display, and wait for a cue (250-ms buzzer sound) delivered to the ear where a subsequent target (50-ms tone with 800 and 1500-Hz harmonics) was likely to occur. Upon hearing the cue, the subjects were advised to shift their attention accordingly (without shifting their gaze), pay close attention to the tones presented to the designated ear, and press a button with the right index finger as rapidly as possible after hearing the target. To maximize the attention effects (Hillyard et al., 1973; Näätänen, 1992; Woldorff, Hackley, & Hillyard, 1991), we increased the serial load by embedding the cues and targets amongst pure-tone “standards” (duration 50 ms, 5-ms ramps) presented randomly to the right (800 Hz) or left ear (1500 Hz). The subjects were instructed to discriminate changes in the timbre (a “thickening” of the sound) in the task-relevant standard-sound sequence, but they were not told that the targets were actually similar in both ear sequences. According to previous studies (Alho et al., 1998; Knight, 1984), strong event-related MEG/EEG responses related to involuntary auditory orienting can be evoked by physically varying “novel” sounds. To investigate oscillatory networks underlying involuntary attention shifting, in 20% of the trials, the target was therefore replaced by a task-irrelevant novel sound presented opposite to the cued ear. The novel sounds consisted of eight spectrotemporally complex environmental and synthetic sounds whose peak intensities, onset rise times, and perceived loudness, as well as their grand-average time envelope, were made as close to the cues as possible.
The design also included 20% of “catch trials” with the cue and pure-tone standards, but no target. Only pure tones (no cue, novel, or target) were presented in 20% of trials. The MEG/EEG session, consisting of two 37-minute runs (220 trials/run), lasted approximately 2.5–3 hours. The fMRI session, including three 23-minute runs (136 trials/run) lasted 2 hours with preparations and training. During each task session, there were an equal number of right and left ear events (cues, targets, novels, and standards). The order of sessions was randomized. Additionally, a separate ten-minute behavioral experiment (N=10, 4 females, age 22–43 years) was conducted to test whether the spatial cueing indeed produced significant performance benefits. In this control experiment, we replaced 50% of the novel sounds with a target sound opposite to the cued ear (“invalidly cued target”).
In all sessions, sound stimuli were presented at 55 dB over the subjective hearing threshold, tested individually at the beginning of each session for each ear. At 7.82 s after the trial onset, subjects heard the sound of 2.18-s fMRI volume acquisition, or during MEG/EEG a 2.18-s binaural scan-sound recording, signaling that the trial had ended. In other words, using it as a controlled alerting stimulus minimized the confounding effects of fMRI acquisition noise. The fMRI scanner sound (along with the buzzer cues and novel sounds) also acted as an interruption signal, to prevent the buildup of frequency-based “streaming” (which according to previous studies typically takes 5–10 s) (Bregman, 1978). In each trial, a total of 13 auditory stimuli were presented starting 2.3 s after the onset of preceding scan/simulation and ending on average 1.3 s before the next scan. Within the sequence of 13 sounds, the stimulus onset asynchrony (SOA) was 350–750 ms, varied quasi randomly such that there was always at least 650 ms silence after each cue, target, and novel. The jittering of SOA was presumed to help avoid omission-response confounds. During fMRI, three silent baseline trials occurred after every 6 active trials (i.e., a mixed blocked/event-related design was utilized). A standardized computerized approach taking about 5 min was utilized to teach the task to the subjects before the scanning. In subsequent analyses, individual trials with target-detection responses beyond the subject’s mean±2SD reaction time were considered outliers. Of an initial cohort of 20 subjects, one subject was excluded from the final MEG/EEG/fMRI analyses because of an incapability to perform the tasks and three other subjects for technical reasons.
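The outlier criterion described above (responses beyond each subject's mean±2SD reaction time) can be illustrated with a minimal sketch. The function name `reject_rt_outliers`, the use of the sample SD, and the reaction-time values are assumptions made for illustration only; the text does not specify these details.

```python
from statistics import mean, stdev

def reject_rt_outliers(rts_ms, n_sd=2.0):
    """Keep reaction times within mean +/- n_sd * SD of one subject's trials."""
    m = mean(rts_ms)
    sd = stdev(rts_ms)  # sample SD (n - 1); which SD was used is an assumption
    lo, hi = m - n_sd * sd, m + n_sd * sd
    return [rt for rt in rts_ms if lo <= rt <= hi]

# Hypothetical reaction times (ms) for one subject; values are invented.
rts = [450, 470, 430, 480, 460, 440, 455, 465, 475, 1200]
kept = reject_rt_outliers(rts)
print(len(rts) - len(kept), "trial(s) rejected")
```

In this toy example only the 1200-ms response falls outside the accepted range.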
Human subjects’ approval was obtained and voluntary consents were signed before each measurement. 306-channel MEG (Elekta-Neuromag, Helsinki, Finland) and 74-channel EEG data were recorded simultaneously (600 samples/s, passband 0.01–192 Hz) in a magnetically shielded room. Common-average reference was utilized for all analyses of EEG data. The position of the head relative to the sensor array was monitored continuously using four Head-Position Indicator (HPI) coils attached to the scalp. Electro-oculogram (EOG) was also recorded to monitor eye artifacts. Whole-head 3T fMRI was acquired in a separate session using a 32-channel coil (Siemens TimTrio, Erlangen, Germany). To circumvent response contamination by scanner noise, we used a sparse-sampling gradient-echo BOLD sequence (TR/TE = 10,000/30 ms, 7.82 s silent period between acquisitions, flip angle 90°, FOV 192 mm) with 36 axial slices aligned along the anterior-posterior commissure line (3-mm slices, including 0.75-mm gap, 3×3 mm² in-plane resolution), with the coolant pump switched off. T1-weighted anatomical images were obtained for combining anatomical and functional data using a multi-echo MPRAGE pulse sequence (TR=2510 ms; 4 echoes with TEs = 1.64 ms, 3.5 ms, 5.36 ms, 7.22 ms; 176 sagittal slices with 1×1×1 mm³ voxels, 256×256 mm² matrix; flip angle = 7°). A field mapping sequence (TR= 500 ms, flip angle 55°; TE1=2.83 ms, TE2=5.29 ms) with similar slice and voxel parameters to the EPI sequence was utilized to obtain phase and magnitude maps utilized for unwarping of B0 distortions of the functional data.
Neuronal bases of auditory attention shifting were studied using an MEG/EEG/fMRI approach, analogous to, e.g., (Ahveninen et al., 2011). External MEG noise was suppressed and subject movements, estimated continuously at 200-ms intervals, were compensated for using the signal-space separation method (Taulu, Simola, & Kajola, 2005) (Maxfilter, Elekta-Neuromag, Helsinki, Finland). The MEG/EEG data were then downsampled (300 samples/s, passband 0.5-100 Hz). Epochs coinciding with over 150 μV EOG, 100 μV EEG, or 2000 fT/cm MEG sensor-data changes were excluded from further analyses. Some studies suggest that ocular muscle activity may become time locked to sound presentations (Yuval-Greenberg & Deouell, 2011). We therefore used the signal-space projection (SSP), calculated around the time points of artifacts, for removing MEG/EEG field patterns originating from the eyes.
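The amplitude-based epoch rejection can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: "changes" is read here as peak-to-peak amplitude within an epoch, and the names `LIMITS`, `peak_to_peak`, and `epoch_is_clean`, as well as the toy epochs, are hypothetical.

```python
# Rejection limits quoted in the text, kept in the units used there.
LIMITS = {"eog_uV": 150.0, "eeg_uV": 100.0, "meg_fT_per_cm": 2000.0}

def peak_to_peak(samples):
    """Largest minus smallest value of one channel within an epoch."""
    return max(samples) - min(samples)

def epoch_is_clean(epoch, limits=LIMITS):
    """Return True if no channel exceeds the limit for its channel type.

    `epoch` maps a channel type to a list of channels (each a list of samples).
    Assumption: "changes" in the text is read as peak-to-peak amplitude.
    """
    for ch_type, channels in epoch.items():
        if any(peak_to_peak(ch) > limits[ch_type] for ch in channels):
            return False
    return True

# Hypothetical toy epochs (values invented for illustration).
quiet = {"eog_uV": [[0.0, 20.0, -10.0]],
         "eeg_uV": [[5.0, -5.0, 10.0]],
         "meg_fT_per_cm": [[100.0, -300.0, 50.0]]}
blink = {"eog_uV": [[0.0, 180.0, -10.0]],
         "eeg_uV": [[5.0, -5.0, 10.0]],
         "meg_fT_per_cm": [[100.0, -300.0, 50.0]]}

print(epoch_is_clean(quiet), epoch_is_clean(blink))
```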
To calculate fMRI-guided depth-weighted $\ell_2$ minimum-norm estimates (MNE) (Hämäläinen, Hari, Ilmoniemi, Knuutila, & Lounasmaa, 1993; Lin, Belliveau, Dale, & Hämäläinen, 2006), the information from structural segmentation of the individual MRIs and the MEG sensor and EEG electrode locations were used to compute the forward solutions for all putative source locations in the cortex using a three-compartment boundary element model (Hämäläinen et al., 1993). The shapes of the surfaces separating the scalp, skull, and brain compartments were determined from the anatomical MRI data using FreeSurfer 5.0 (http://surfer.nmr.mgh.harvard.edu/). For whole-brain inverse computations, cortical surfaces extracted with FreeSurfer were decimated to ~1,000 vertices per hemisphere. The individual forward solutions for current dipoles placed at these vertices comprised the columns of the gain matrix (A). A noise covariance matrix (C) was estimated from the raw MEG/EEG data during a 20–200 ms pre-stimulus baseline. These two matrices, along with the source covariance matrix R, were used to calculate the MNE inverse operator $W = RA^{T}(ARA^{T} + C)^{-1}$.
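For orientation, the linear model behind this operator can be written out as below. The symbols for the measured data $\mathbf{x}(t)$, the source currents $\mathbf{j}(t)$, and the sensor noise $\mathbf{n}(t)$ are notation assumed here for illustration; only A, C, R, and W are named in the text.

```latex
\begin{aligned}
\mathbf{x}(t) &= \mathbf{A}\,\mathbf{j}(t) + \mathbf{n}(t), \\
\mathbf{W} &= \mathbf{R}\mathbf{A}^{\mathsf{T}}
  \left(\mathbf{A}\mathbf{R}\mathbf{A}^{\mathsf{T}} + \mathbf{C}\right)^{-1}, \\
\hat{\mathbf{j}}(t) &= \mathbf{W}\,\mathbf{x}(t).
\end{aligned}
```

Under this reading, the prior source covariance R is where anatomical, depth, and fMRI constraints enter the estimate, while C accounts for the sensor noise.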
To obtain an fMRI prior, i.e., an fMRI-weighted source covariance matrix, each vertex point in the cortical surface was assigned an fMRI significance value using FreeSurfer-FSFAST 5.0. Individual functional volumes were motion corrected, unwarped, coregistered with each subject’s structural MRI, intensity normalized, resampled into cortical surface space, smoothed using a 2-dimensional Gaussian kernel with an FWHM of 5 mm, and entered into a general-linear model (GLM) with the task conditions as explanatory variables. The fMRI weighting was set to 90%. That is, diagonal elements in R corresponding to vertices with below-threshold (P < 0.01, all conditions vs. baseline) significance values were multiplied by 0.1 (group fMRI result was used as a prior in 3 subjects). The entire MEG/EEG raw data time series at each time point were multiplied by the inverse operator W and noise normalized to yield the estimated source activity as a function of time (Lin et al., 2006).
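A minimal sketch of the 90% fMRI weighting of the diagonal of R, assuming that "below-threshold significance" means a vertex-wise P value at or above 0.01. The function name `fmri_weighted_diagonal` and the example P values are hypothetical, and depth weighting, which would further scale these entries, is omitted.

```python
def fmri_weighted_diagonal(p_values, p_threshold=0.01, weighting=0.9):
    """Diagonal entries of the source covariance R under an fMRI prior.

    Vertices with significant fMRI activity (p < p_threshold) keep a prior
    variance of 1.0; all other vertices are scaled by (1 - weighting),
    i.e. by 0.1 for the 90% weighting described in the text.
    """
    down_weight = 1.0 - weighting
    return [1.0 if p < p_threshold else down_weight for p in p_values]

# Hypothetical vertex-wise P values (all conditions vs. baseline).
p_vals = [0.001, 0.2, 0.009, 0.05]
r_diag = fmri_weighted_diagonal(p_vals)
print([round(v, 3) for v in r_diag])
```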
Accepted MEG/EEG/fMRI trial epochs were analyzed using the FieldTrip toolbox (http://www.ru.nl/fcdonders/fieldtrip) in Matlab 7.11 (Mathworks, Natick, MA), for each vertex of the cortical surface. Power analyses were performed using an FFT taper approach with sliding time windows. At 4–30 Hz, an adaptive time-window of 3 cycles and a Hanning taper was used. At 30–100 Hz, a fixed 0.2-s time window and 10-Hz frequency smoothing were used, resulting in three orthogonal Slepian tapers being applied to the sliding time window (Percival & Walden, 1993). Dynamic power estimates were then calculated within a 2-s period (including a 500 ms pre-stimulus baseline) relative to the stimulus onset at each vertex location with neighboring time points temporally segregated by 0.05 s. The resulting time-frequency estimates, particularly when converted to a standard brain representation, would have yielded extremely large data sets. Therefore, the whole-cortex power estimates were pooled to a smaller number of time-frequency regions of interest. The available frequency band (4–100 Hz) was divided into consecutive one-octave wide subranges, which corresponded to theta (4–7.5 Hz), alpha (7.5–15 Hz), beta (15–30 Hz), lower gamma (30–60 Hz), and higher gamma (60–100 Hz) bands (i.e., the highest band was less than one octave). For each band, a power estimate, divided by the pre-stimulus baseline and base-10 logarithm normalized, was calculated within three geometrically increasing post-stimulus time windows of 0–200, 200–600, and 600–1400 ms, separately for the cues and novels. The resulting cortical power estimates were then normalized to the FreeSurfer standard brain representation (Fischl, Sereno, Tootell, & Dale, 1999). Note that estimates of responses to the targets, per se, were analyzed only cursorily because of the presence of motor activity.
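The time-frequency pooling can be illustrated with a short sketch. The band and window limits are taken from the text; the function names, the order of averaging (mean power in the window divided by mean baseline power, then log10), and the toy power values are assumptions. The taper count uses the standard multitaper relation K = 2TW − 1, which reproduces the three tapers mentioned above for T = 0.2 s and W = 10 Hz.

```python
import math

BANDS_HZ = {"theta": (4.0, 7.5), "alpha": (7.5, 15.0), "beta": (15.0, 30.0),
            "low_gamma": (30.0, 60.0), "high_gamma": (60.0, 100.0)}
WINDOWS_S = [(0.0, 0.2), (0.2, 0.6), (0.6, 1.4)]

def n_slepian_tapers(window_s, smoothing_hz):
    """Number of Slepian tapers, K = 2*T*W - 1."""
    return int(round(2 * window_s * smoothing_hz)) - 1

def adaptive_window_s(freq_hz, n_cycles=3):
    """Length of a sliding window spanning n_cycles at freq_hz."""
    return n_cycles / freq_hz

def pooled_log_power(power, freqs, times, band, window, baseline=(-0.5, 0.0)):
    """log10 of mean band power in `window` relative to the baseline.

    power[i][j] is the power at freqs[i] (Hz) and times[j] (s) for one vertex.
    """
    f_lo, f_hi = band
    rows = [power[i] for i, f in enumerate(freqs) if f_lo <= f < f_hi]

    def mean_in(t0, t1):
        vals = [row[j] for row in rows for j, t in enumerate(times) if t0 <= t < t1]
        return sum(vals) / len(vals)

    return math.log10(mean_in(*window) / mean_in(*baseline))

# Hypothetical single-vertex example: one 10-Hz bin, three time points.
freqs, times = [10.0], [-0.25, 0.25, 0.75]
power = [[1.0, 10.0, 100.0]]
alpha_mid = pooled_log_power(power, freqs, times, BANDS_HZ["alpha"], WINDOWS_S[1])
print(n_slepian_tapers(0.2, 10.0), adaptive_window_s(10.0), alpha_mid)
```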
Analyses of cortical MEG/EEG power estimates were conducted at each vertex point of the cortical surface, in both hemispheres, using a GLM masked at locations where the group fMRI omnibus F-test contrast (i.e., including both activations and deactivations related to any individual regressor) was significant at P<0.05 (Freesurfer/FSFAST 5.0). To control for multiple comparisons, the resulting statistical estimates were tested against an empirical null distribution of maximum cluster size across 10,000 iterations with a vertex-wise threshold of P<0.05 and cluster-forming threshold of P<0.05, yielding clusters corrected for multiple comparisons across the surface.
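The logic of a maximum-cluster-size correction can be sketched in a simplified form. This toy version replaces the cortical surface with a one-dimensional chain of independent vertices and ignores spatial smoothness, so it is only an illustration of the principle, not the FreeSurfer simulation used here. All function names are hypothetical, and the number of iterations is reduced from 10,000 to keep the example fast.

```python
import random

def max_cluster_size(significant):
    """Length of the longest run of True values in a 1-D sequence."""
    best = run = 0
    for flag in significant:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def simulate_null(n_vertices, vertex_p, n_iter, seed=0):
    """Null distribution of the maximum cluster size for thresholded noise."""
    rng = random.Random(seed)
    return [max_cluster_size(rng.random() < vertex_p for _ in range(n_vertices))
            for _ in range(n_iter)]

def cluster_p_value(observed_size, null_sizes):
    """Share of null maxima at least as large as the observed cluster."""
    exceed = sum(1 for s in null_sizes if s >= observed_size)
    return (exceed + 1) / (len(null_sizes) + 1)

# Toy example: 500 vertices, vertex-wise threshold P < 0.05, 200 iterations.
null = simulate_null(n_vertices=500, vertex_p=0.05, n_iter=200)
print(cluster_p_value(observed_size=6, null_sizes=null))
```

An observed cluster is then considered significant when few simulated noise maps contain a cluster of equal or larger extent.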
Three kinds of statistical comparisons were conducted. (a) We examined correlations between cortical oscillatory power changes after cued orienting, i.e., allocation of attention before target presentation, and subsequent behavioral performance. To reduce the dimensionality of this analysis, post-cue cortical power estimates were correlated with an inverse efficiency score (IES), which reflects the RT divided by the hit rate (Townsend & Ashby, 1978). That is, a higher value indicates worse performance, similarly to RT. The IES has been previously utilized in behavioral and ERP studies of visual orienting (Akhtar & Enns, 1989; Kennett, Eimer, Spence, & Driver, 2001) as a measure of processing efficiency that discounts possible criterion shifts or speed/accuracy tradeoffs. Before the analysis, the IES values were logarithm normalized; according to a Jarque-Bera test, the resulting behavioral correlate was normally distributed. (b) To explore lateralization of power changes during audiospatial attention, we compared cortical power estimates to the left-ear vs. right-ear cues. (c) Finally, we compared power estimates to the cues vs. novels, with the ear-specific responses pooled together. To control for possible gradual baseline power shifts within trials, comparisons across cues and novels were normalized based on standards occurring in catch trials in the same sequential positions as the events of interest.
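The inverse efficiency score can be computed as in the following sketch. The function names are hypothetical, and the natural logarithm is an assumption, because the text does not state the base used for the normalization. The example values are the group-mean RT and hit rate reported for the MEG/EEG session, used here only to show the arithmetic.

```python
import math

def inverse_efficiency(mean_rt_ms, hit_rate):
    """IES = mean reaction time divided by hit rate (proportion in (0, 1])."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be a proportion in (0, 1]")
    return mean_rt_ms / hit_rate

def log_ies(mean_rt_ms, hit_rate):
    """Log-normalized IES; natural log assumed (base not given in the text)."""
    return math.log(inverse_efficiency(mean_rt_ms, hit_rate))

# Group-mean values from the MEG/EEG session, for illustration only.
ies = inverse_efficiency(466.0, 0.90)
print(round(ies, 1), round(log_ies(466.0, 0.90), 3))
```

A subject with the same RT but a lower hit rate receives a larger IES, which is why higher values indicate worse performance.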
Behaviorally, the subjects’ mean±SD reaction times (RT) were 466±100 ms and hit rates (HR) 90±10% during MEG/EEG. These results did not differ significantly from behavioral observations made during fMRI measurements (mean±SD: RT=495±48 ms, HR=90±8% during fMRI). To verify the beneficial effect of cues in directing attention to subsequent targets, we conducted a separate behavioral control analysis (N=10, 4 females, age 22–43 years) where a portion of targets were presented in the location opposite to the cued ear. The result demonstrated that spatial cueing significantly (t(9)=−4.17, P<0.01, paired t test) sped up target discrimination, as compared to trials where the target occurred in the ear opposite to the cue (mean±SD reaction times 463±68 vs. 555±105 ms to validly vs. invalidly cued targets, respectively). Notably, the reaction times to invalidly cued targets of these subjects were also significantly longer (t(26)=2.21, P<0.05, independent-samples t test) than the reaction times to validly cued targets during MEG/EEG, suggesting that subjects complied with the task instruction and also benefited behaviorally from the cueing.
To examine engagement of auditory attention after cued orienting, oscillatory activities following the cues, and preceding the target presentation, were correlated with the behavioral measure IES, which represents RT normalized by HR (i.e., the unit is ms, smaller IES values represent fast and accurate performance) (Figure 2). These analyses showed significant (cluster-based Monte Carlo simulation test) negative correlations, with higher power predicting better behavioral performance, between gamma power and IES at 200–600 ms and 600-1400 ms post-cue time windows. These correlations started earlier in the right hemisphere, spreading from temporoparietal junction (TPJ), auditory cortices, anterior insula, and inferior frontal cortex (IFC) areas to the right FEF. In the left hemisphere, the cluster of significant correlations also extended to postcentral and posterior parietal regions, as well as to the anterior temporal cortex areas. The details of cluster statistics are shown in Table 1.
To examine attentional lateralization of oscillatory power changes, we compared estimates to the cues presented to the left and right ears (Figure 3). A double dissociation of cortical power changes as a function of the direction of attention emerged at the alpha range, at 200–600 ms after the cue onset. In the left posterior parietal and lateral occipital regions, a significant (cluster-based Monte Carlo simulation test) increase of alpha power was observed for the left vs. right cues. In contrast, in the corresponding right parietal occipital areas, the alpha power was significantly decreased after left vs. right ear cues. These effects suggest that, in both hemispheres, the parieto-occipital visual cortex alpha power is significantly stronger when attention is directed to the ipsilateral than contralateral ear. As shown by the data in Figure 3, the ipsilateral power enhancement effect was stronger, longer lasting, and anatomically more widespread in the right hemisphere, with significant effects observed also in the medial parieto-occipital regions (the precuneus, retrosplenial cortex) and in lateral inferior parietal cortices, and TPJ. Further, in the right hemisphere, the effect also spread into the surrounding frequency ranges, consistent with, e.g., recent macaque studies that showed occipital oscillatory modulations that were similar at the alpha and low beta ranges (Zhang, Wang, Bressler, Chen, & Ding, 2008). Finally, a significant increase of gamma power was observed in the right inferior temporal regions at 600-1400 ms after left vs. right ear cues.
In addition to the lateralized alpha power changes, there was also evidence of a significant modulation of theta power at 0–200 ms after cues presented to the left vs. right ears in the left sensorimotor regions, near the representation of the right hand, and in the left retrosplenial complex and left inferior temporal cortex. In other words, in these areas, the theta power was significantly increased when attention was directed to the right, that is, to the contralateral side (or, alternatively, decreased when attention was directed to the left). The details of cluster statistics are shown in Table 2.
There was a significant decrease of alpha power in the right auditory cortex and TPJ regions at 0–200 ms after cues vs. novels (Figure 4). This effect was followed by an increase of alpha power in the right inferior parietal cortex, right medial parieto-occipital cortices (precuneus, retrosplenial cortex), left retrosplenial cortex, bilateral inferior temporal cortices, and in the right auditory cortex.
At 600–1400 ms after the onsets of cues vs. novels, there was a significant increase of theta activity in the right MFC region, including the anterior cingulate cortex (ACC), which coincided with an increase of beta power in the right DLPFC and anterior insula, and a bilateral increase of lower gamma band activity in medial parieto-occipital and posterior cingulate cortices. These more sustained oscillatory differences after cues vs. novels might be associated with voluntary engagement of auditory attention, after the initial reorienting triggered by the cue. There was also evidence of a sustained decrease of mu-rhythm power (alpha/beta ranges) 600–1400 ms after cues vs. novels in the left sensorimotor cortices (~right hand area), possibly reflecting differences in motor preparation. The details of cluster statistics are shown in Table 3.
We utilized fMRI-weighted MEG/EEG source estimates to examine changes in cortical oscillations during cued orienting and subsequent goal-directed engagement of auditory attention. We specifically focused on effects occurring beyond the transient stimulus-driven processes, presumed to reflect endogenous attention. Gamma power increases after cued attention, emerging within auditory cortices, TPJ, anterior insula, IFC, and the right FEF, were associated with fast and accurate discrimination of subsequent targets. These correlation patterns evolved 200–1400 ms after cued orienting, which corresponds to the hypothesized time course of endogenous engagement of attention after the initial reorienting process. Engagement of attention to one ear resulted in a stronger increase of alpha activity in the ipsilateral than contralateral parieto-occipital cortex areas at 200–600 ms after cue onset, suggesting crossmodal modulation of dorsal visual pathways by audiospatial attention. Our results also suggest sustained increases of MFC theta, DLPFC and IFC/anterior insula beta, and medial parieto-occipital and posterior cingulate cortex gamma power (30–60 Hz) during cued vs. novelty-triggered auditory attention.
In addition to the initial redirection of focus, cognitive control of auditory attention presumably involves a variety of endogenous processes associated with disengagement from previous tasks and engagement of heightened top-down control on the new task-relevant activity (Braver et al., 2003; Mayr & Keele, 2000; Rushworth et al., 2002). In current neurophysiological literature (Engel & Fries, 2010; Jensen et al., 2007), these kinds of control processes have been associated with gamma band activities. Consistent with this prediction, we found significant correlations between post-cue/pre-target gamma activities, which followed the initial cue-triggered orienting, and subsequent target discrimination performance in frontoparietal and temporal cortices. The significant correlations emerged > 200 ms after the shifting cue, which is in line with visuospatial behavioral results (Cheal & Lyon, 1991; Shepherd & Muller, 1989) suggesting gradual emergence of attention within the first second after reorienting to a new focus. The spectral distribution of the present correlation patterns, extending across the lower and higher gamma bands, is consistent with recent human intracranial EEG evidence of sustained power increases that occur very broadly across the gamma band during voluntary endogenous attention (Ossandon et al., 2012). The gamma correlations in the auditory cortex are, in turn, in agreement with sustained broadband gamma increases by selective attention that have been reported in intracranial EEG (Tallon-Baudry et al., 2005) and non-invasive MEG measurements (Kahlbrock et al., 2012) of human visual cortex activity.
Significant gamma correlation effects were found in frontoinsular (IFC/anterior insula, FEF) regions previously associated with shifting or controlling spatial attention. The behavioral correlations at the higher gamma range in the right FEF are consistent with previous literature associating this area with control of spatial attention (Mesulam, 1981), voluntary shifts of attention (Kincade, Abrams, Astafiev, Shulman, & Corbetta, 2005), and preparation for motor activity after cued attention shifting (Wise, Weinrich, & Mauritz, 1983). Similarly, the recent intracranial EEG study of Ossandon et al. (2012) reported broadband gamma power increases in FEF during endogenous engagement of attention in humans. Areas roughly corresponding to IFC/anterior insula have, in turn, been linked to voluntary task shifting (Wager, Jonides, & Reading, 2004) and sustained effort-related control of auditory processing (Altmann, Henning, Doring, & Kaiser, 2008; Falkenberg, Specht, & Westerhausen, 2011). The behavioral correlations in auditory cortices, in turn, support previous findings that increased gamma power in sensory cortices correlates with improved behavioral response times (Rose, Sommer, & Buchel, 2006) and accuracy (Kaiser et al., 2006). During the last time window of 600–1400 ms, the gamma correlation effect might also have been partially contributed by attentional modulations of oscillatory responses elicited by standard stimuli, given that transient auditory-cortex gamma activities may be enhanced by selective attention (Ahveninen et al., 2000; Kaiser et al., 2006; Mulert et al., 2007; Tiitinen et al., 1993). In both hemispheres, significant correlations between increased gamma power and enhanced behavioral performance emerged also in areas near TPJ. This effect somewhat differs from the predictions of the influential model that distinguishes between dorsal voluntary and ventral stimulus-driven attention networks (Corbetta & Shulman, 2002). 
However, there are also several studies that have associated TPJ with top-down auditory processes (Alain, He, & Grady, 2008; Salmi et al., 2009). For example, a recent time-domain MEG study (Larson & Lee, 2013) suggested correlations between activations of right TPJ at 300–600 ms after visually presented shifting cues and behavioral indices of auditory attention. Taken together, the present correlation patterns between post-cue/pre-target gamma-power increases and behavioral discrimination performance could therefore reflect voluntary engagement of auditory attention, which follows the initial cue-triggered orienting process.
The main effects in comparisons between cued and novelty-triggered dynamic oscillatory responses with ear-specific attention conditions pooled together, as well as the gamma-band correlation patterns, suggest stronger effects in the right than left frontoparietal cortex regions. Previous PET studies have shown that, irrespective of the attended ear, audiospatial attention is dominated by a network of right-hemispheric frontoparietal areas (Zatorre, Mondor, & Evans, 1999). Similarly, a recent fMRI study comparing exogenous and endogenous audiospatial modulations suggested that highest-order auditory attention is most likely controlled by right-hemispheric frontoparietal networks (Teshiba et al., 2012). Our result is also consistent with previous studies suggesting that the right hemisphere dominates higher-order processing of auditory space (At, Spierer, & Clarke, 2011; Spierer, Bellmann-Thiran, Maeder, Murray, & Clarke, 2009).
Here, we also found evidence for lateralized modulations of posterior alpha activities as a function of the direction of auditory attention. Alpha power increased in parieto-occipital cortices ipsilateral to the cued ear. This effect, which resembles inter-hemispheric alpha-power modulations during visuospatial attention (Worden et al., 2000), is consistent with lateralized parietal-occipital alpha modulations found in other recent auditory attention studies (Banerjee et al., 2011; Thorpe et al., 2012). The modulation of alpha activities by auditory attention was centered in the superior parietal cortex, near the border of parietal and occipital lobes (Desikan et al., 2006). These modulations, further, overlapped with the areas V3 and V7 in both hemispheres and extended to the precuneus, inferior parietal cortex, and intraparietal sulcus in the right hemisphere. From the perspective of the alpha-inhibition theory, the present lateralized alpha modulations could, thus, be interpreted to reflect crossmodal suppression of dorsal visual “where” pathway representing the task-irrelevant spatial hemifield.
Consistent with the overall pattern suggesting more enhanced attention effects in the right than left hemisphere, the alpha suppression effect was stronger, longer lasting, and anatomically more widespread in the right than in the left hemisphere. At the same time, in the right hemisphere, the contralateral suppression effects spread also to the adjacent frequency bands. Evidence for analogous broader-band power-suppression effects has been found in a human auditory EEG study (Thorpe et al., 2012), which suggested lateralized theta, alpha, and beta power increases in the hemisphere ipsilateral to the attended auditory direction. The spreading of the present contralateral power suppression effect from alpha to beta bands is also consistent with a recent local-field potential measurement in macaque visual cortex (Zhang et al., 2008). Finally, there was also evidence for lateralized increases of lower gamma band power in inferior temporal visual cortices contralaterally to the attended ear, 600–1400 ms after the cue. Although this effect is, in principle, consistent with the alpha modulations (i.e., alpha decrease coupled with gamma increase in the same hemisphere), further studies are needed to verify the exact interpretation of this effect, as no corresponding modulations were observed in the opposite hemisphere.
There was no clear evidence of lateralized attention effects at the level of auditory cortex. The lack of these effects may be explained by the fact that, although strongest stimulus-evoked responses to monaural sounds are observed in the hemisphere contralateral to the stimulated ear, human auditory cortices also receive significant inputs from the ipsilateral ear. Previous MEG studies have shown that selective attention increases auditory responses to monaural sounds similarly in the hemisphere ipsilateral and contralateral to the task-relevant ear (Fujiwara, Nagamine, Imai, Tanaka, & Shibasaki, 1998), which contradicts a hypothesis of widespread inhibition of the ipsilateral auditory cortex by ear-specific selective attention. Previous studies have also shown that the contralateral dominance of ascending auditory pathways is much clearer in the primary auditory cortex areas than in non-primary areas most sensitive to attentional modulations (Woods et al., 2009). It should be also noted that, although certain positron emission tomography (PET) studies have provided evidence of lateralized auditory-cortex attention effects (Alho et al., 2003), many previous fMRI (Salmi et al., 2009; Shomstein & Yantis, 2006) and PET (Zatorre et al., 1999) studies have failed to support this finding.
Finally, there was also evidence of an early modulation of theta power in the sensorimotor areas representing the right hand to right vs. left ear cues. One potential explanation is a slight enhancement of auditory crossmodal effect within the left hemisphere, given the spatial congruency of the cue and responding hand (or, alternatively, a decrease of theta when the laterality of attention and responding hand is incongruent). Further studies are needed to verify the functional significance of this phenomenon.
We also compared the estimates to cues and novel sounds, assuming that this comparison would reveal differences related to voluntary vs. stimulus-driven attention, particularly after the time window of initial stimulus processing (0–200 ms). These comparisons demonstrated a sustained increase of right MFC theta power at 600–1400 ms after cues vs. novels. This result is in line with previous findings that theta power strengthens with increased need for top-down auditory processing (Gevins et al., 1997; Jensen & Tesche, 2002; Sauseng et al., 2007). The present theta increase was centered in the right ACC, an area that has previously been associated with attention and cognitive control (Wager et al., 2005; Weissman, Gopalakrishnan, Hazlett, & Woldorff, 2005). It is thus possible that this theta effect is related to endogenous allocation of auditory attention after cued orienting.
At the alpha range, we observed power decreases in the right auditory cortex and TPJ at 0–200 ms after cues vs. novels, which were at 200–600 ms followed by power increases in the medial parieto-occipital cortices (right precuneus, bilateral retrosplenial cortex), bilateral inferior temporal cortices, and areas located inferiorly/posteriorly to the right auditory cortex areas (superior temporal sulcus, STS; right inferior parietal cortex). Based on the assumptions of alpha-inhibition theory, these effects could reflect increased activation of the right auditory cortex and TPJ after cued vs. novelty-triggered attention, followed by crossmodal inhibition of posterior and medial parietal cortices. The latter effect is consistent with previously reported increases of posterior “visual” alpha power during engagement of auditory attention (Ahveninen et al., 2012; Foxe, Simpson, & Ahlfors, 1998; Fu et al., 2001). Taken together, it thus seems that engagement of auditory attention increases alpha power in visual cortex areas. However, based on the lateralization effects discussed above, this crossmodal inhibition effect could be even more prominent in regions ipsilateral than contralateral to the attended ear.
The comparisons between cued and novelty-triggered attention effects also revealed significant differences at the beta range that could be potentially related to longer-term engagement of endogenous auditory attention. Significant sustained increases of beta power were observed in the right DLPFC, in the right anterior insula, and in the right IFC 600–1400 ms after cues vs. novels. In comparison to the well-documented links between gamma oscillations and top-down processing, the association of beta oscillations with attention and cognitive control has remained somewhat elusive (Engel & Fries, 2010). However, beta power increases have been previously observed during attentional expectancy of stimuli (de Oliveira et al., 1997; Montaron et al., 1979; Roelfsema et al., 1997), and also during increased endogenous processing of auditory stimuli (Iversen et al., 2009). It should be also noted that the right DLPFC and anterior insula/IFC, where the present sustained beta increases during cued vs. novelty-triggered attention originated, have previously been reported to be activated during top-down auditory attention tasks (Alain, Shen, Yu, & Grady, 2010; Huang, Belliveau, Tengshe, & Ahveninen, 2012; Rinne, Koistinen, Salonen, & Alho, 2009; Wu, Weissman, Roberts, & Woldorff, 2007; Zatorre et al., 1999).
Sustained power increases at 600–1400 ms to cues vs. novels were also observed at the lower gamma range (30–60 Hz), centered in the precuneus, retrosplenial cortex, and posterior cingulate cortices. These areas have been previously linked to spatial attention (Corbetta & Shulman, 2002; Grent-’t-Jong & Woldorff, 2007; Shomstein & Yantis, 2006; Small et al., 2003). Previous fMRI studies have also shown that, of these areas, the posterior cingulate cortex is activated during anticipatory allocation of spatial attention (Small et al., 2003). Medial parietal cortex areas including the precuneus are, in turn, activated during both visual (Astafiev et al., 2003) and auditory (Shomstein & Yantis, 2006; Smith et al., 2010; Wu et al., 2007) spatial attention tasks. The significant increase of sustained gamma activity at 600–1400 ms after cues, as compared to novels, could thus reflect increased engagement of audiospatial attention. However, in the present study, correlations between gamma increases and behavioral performance measures were focused in lateral frontoparietal and temporal regions, while the correlations in medial parietal/posterior cingulate regions did not survive the cluster-based post-hoc correction.
Finally, the power comparisons between cued and novelty-triggered attention suggest sustained decreases of sensorimotor mu rhythms near the left hemisphere representation of the right hand, which was utilized for responding in the present study. These effects are consistent with previous human MEG observations obtained in tactile attention studies (van Ede, de Lange, & Maris, 2012), as well as motor MEG studies (Donner, Siegel, Fries, & Engel, 2009) that have suggested lateralized motor/premotor cortex beta-power decreases during preparation for contralateral hand movements.
A central question is whether our subjects indeed utilized spatial attention to perform the task. That is, analogously to previous dichotic listening studies, the non-target sound sequences were also separable based on their frequency (Hillyard et al., 1973; Näätänen, 1992; Woldorff et al., 1991). This may have, in theory, provided an alternative non-spatial streaming cue. However, according to the present behavioral control experiment, spatially valid vs. invalid cues improved performance, suggesting that spatial cueing (and attention) was beneficial. The interpretation that our subjects utilized spatial attention is also supported by the lateralized alpha effects to the left vs. right ear cues (Figure 3), which are consistent with previous auditory-spatial attention studies (Banerjee et al., 2011; Thorpe et al., 2012). It is also worth noting that there were a number of factors that likely counteracted frequency (or pitch) based streaming in the present paradigm. First, frequency-based streaming does not occur instantly (Bregman, 1978), but can take several seconds to evolve, depending on a variety of stimulus parameters. Here, the task was divided into 10-s trials, interrupted by the scanner noise stimulus, which according to previous studies results in a washout of streaming (Bregman, 1978). Further, in contrast to typical studies on binaural auditory grouping where subjects are presented with regularly alternating patterns or temporally consistent melody components (Darwin, 1997; Deutsch, 1979), here, the SOA was jittered and the order of dichotic presentation randomized. Within each trial, there were additional interruptions caused by the cues, targets, and novel sounds. It is therefore very unlikely that frequency-based streaming could have evolved during these trials, at least to a degree that would have made it a stronger cue than the spatial separation across the ears (for further details on comparisons between dichotic vs. frequency/pitch cues, see Näätänen, Porkka, Merisalo, & Ahtola, 1980; Woods, Alain, Rhodes, & Ogawa, 2001).
A potential limitation to the present study needing further experimental attention is that, because of technical challenges, it was not possible to include several control conditions in the present MEG/EEG paradigm. For example, it would have been beneficial to have the novel sound appear also in the task relevant channel, to make it possible to better control for the effect of the saliency of the novel sounds vs. the involuntary attention shifting effect. At the same time, due to the nature of novelty detection, there was no behavioral response associated with sounds following novels, to confirm that attention had indeed shifted to the stimuli. However, this concern is alleviated by numerous previous studies that demonstrate by using both psychophysical and neurophysiological measures that stimuli similar to the present novel sounds trigger very strong involuntary auditory attention shifting (see, e.g., Escera et al., 1998).
Although the present MEG/EEG source localization methods can separate functionally distinct anatomical areas at a sufficient accuracy (Sharon et al., 2007), their spatial resolution is not as good as that provided, for example, by fMRI. Even with current anatomical and functional constraints, the estimates are spatially distributed, which is why the present discussion has been concentrated on the overall statistical patterns. A further limitation of MEG/EEG is their limited capability to detect simultaneous synchronous activity on the opposite banks of sulci due to cancellation effects (Ahlfors et al., 2009). This limitation might, for example, explain why the present alpha lateralization effects were more prominent in the lateral than medial parieto-occipital cortices. At the same time, it is important to note that the time resolution of oscillatory analyses is limited. For example, at the lowest frequency of 4 Hz, oscillatory power is estimated with a three-cycle, i.e., 750-ms, sliding window. In this case, a power estimate at any time point is affected by activities up to 375 ms before and after the center point of the analysis window. However, because the sliding window is tapered, the results are strongly weighted towards the midpoint of the window.
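The window arithmetic above can be reproduced with a short sketch. The three-cycle window at 4 Hz comes from the text; applying the same cycle count to other frequencies is an assumption made here purely for illustration, and the function names are hypothetical.

```python
def window_length_ms(freq_hz: float, n_cycles: float = 3.0) -> float:
    """Duration in ms of an analysis window spanning n_cycles at freq_hz."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    return 1000.0 * n_cycles / freq_hz

def half_window_ms(freq_hz: float, n_cycles: float = 3.0) -> float:
    """Reach of the window on either side of its center point, in ms."""
    return window_length_ms(freq_hz, n_cycles) / 2.0

# 4 Hz is the case described in the text; 10 and 40 Hz assume that the
# same three-cycle window is used at higher frequencies.
for freq in (4.0, 10.0, 40.0):
    print(f"{freq:5.1f} Hz: window {window_length_ms(freq):6.1f} ms, "
          f"+/- {half_window_ms(freq):5.1f} ms around the center")
```

With a fixed number of cycles, the temporal smearing shrinks in proportion to frequency, so the 200-ms boundaries between the analysis windows are blurred most at the theta range and least at the gamma range.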
Finally, the scope of the present study is limited to the power estimates. Future studies examining the functional connectivity patterns using cortico-cortical phase-locking analyses are needed to clarify the role of the different areas showing effects related to cued and novelty-triggered attention. For example, Granger causality estimates (Lin et al., 2009) could be highly informative in elucidating the internal driving relationships across the areas showing sustained gamma correlation patterns during post-cue engagement of attention.
Our results suggest distinct modulations of neuronal oscillations during cued auditory spatial attention and novelty-triggered orienting. Endogenous engagement of attention, subsequent to cue-triggered orienting, may be associated with sustained higher-frequency gamma activity in frontoparietal and temporal cortices. The brain regions associated with endogenous engagement of auditory attention include the bilateral TPJ, anterior insula, IFC, and the right FEF. As shown by the lateralized alpha power modulations, crossmodal inhibition of parieto-occipital visual cortices by auditory attention is stronger in the hemisphere ipsilateral than contralateral to the attended ear. Comparisons of dynamic power estimates during cued and novelty-triggered attention also suggested increased theta activity in medial frontal cortices, increased alpha in the medial and lateral parieto-occipital regions, and increased beta activity in the right DLPFC and IFC/anterior insula.
We thank An-Yi Hung, Natsuko Mori, Stephanie Rossi, Chinmayi Tengshe, and Nao Suzuki for their help. This work was supported by National Institutes of Health Awards R01MH083744, R21DC010060, R01HD040712, R01NS037462, 5R01EB009048, and P41RR14075. The research environment was supported by National Center for Research Resources Shared Instrumentation Grants S10RR014978, S10RR021110, S10RR019307, S10RR014798, and S10RR023401.