|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: JK IJ MS. Performed the experiments: JK. Analyzed the data: JK. Wrote the paper: JK IJ MS.
An experienced car mechanic can often deduce what's wrong with a car by carefully listening to the sound of the ailing engine, despite the presence of multiple sources of noise. Indeed, the ability to select task-relevant sounds for awareness, whilst ignoring irrelevant ones, constitutes one of the most fundamental of human faculties, but the underlying neural mechanisms have remained elusive. While most of the literature explains the neural basis of selective attention by means of an increase in neural gain, a number of papers propose enhancement in neural selectivity as an alternative or a complementary mechanism.
Here, to address the question whether pure gain increase alone can explain auditory selective attention in humans, we quantified the auditory cortex frequency selectivity in 20 healthy subjects by masking 1000-Hz tones by continuous noise masker with parametrically varying frequency notches around the tone frequency (i.e., a notched-noise masker). The task of the subjects was, in different conditions, to selectively attend to either occasionally occurring slight increments in tone frequency (1020 Hz), tones of slightly longer duration, or ignore the sounds. In line with previous studies, in the ignore condition, the global field power (GFP) of event-related brain responses at 100 ms from the stimulus onset to the 1000-Hz tones was suppressed as a function of the narrowing of the notch width. During the selective attention conditions, the suppressant effect of the noise notch width on GFP was decreased, but as a function significantly different from a multiplicative one expected on the basis of simple gain model of selective attention.
Our results suggest that auditory selective attention in humans cannot be explained by a gain model, where only the neural activity level is increased, but rather that selective attention additionally enhances auditory cortex frequency selectivity.
The neural basis of selective attention constitutes one of the most fundamental questions in cognitive neuroscience and psychology. Based on previous studies in sensory systems, two alternative neural mechanisms, multiplicative increase in neural gain – vs. enhanced feature selectivity of the neurons in sensory cortices –, have been suggested to underlie selective attention. Additionally, there are theories according to which selective attention activates neural populations separate from those processing the stimuli , .
When referring to increase in response gain in the literature, terms like ‘higher’ or ‘increased’ activity are typically used to describe the observed attentional enhancement in the modality-specific sensory cortices in human neuroimaging studies. Few studies so far have addressed the issue of whether this is simple gain change (amplification) in respective sensory areas  or whether there are some selective changes as well. For example, increase in activity level has been reported in functional Magnetic Resonance Imaging (fMRI) , , , , positron emission tomography (PET) , , , , electroencephalography (EEG) , , , , and magnetoencephalography (MEG) , , ,  studies in humans, but the issue of whether there is a ‘bias’ mechanism in addition to the attentional gain control has been only recently raised .
Typically auditory selective attention studies have employed a dichotic listening task, instructing subjects to listen to one ear and ignore the sounds coming from the other (‘attend towards’ vs. ‘attend away’). This way, selective attention has been found to modulate neurophysiological responses to tones 70–80 ms after the stimulus onset using EEG ,  and electrocorticography (ECoG) , and at later latency of 150 ms using MEG . Even earlier attentional modulation between 20–50 ms has been observed both in EEG and MEG , . Notably, the N100 response (N100m in MEG) at ~100 ms from stimulus onset, generated at least partly in the primary auditory cortex , , was increased in amplitude during selective attention in all of these studies. Similarly with PET, selectively attending to left and right-ear tones was found to increase the overall activity in the contralateral auditory cortices and in frontal cortex , while the visual control task increased signal measured from the occipital visual areas. However, the dichotic listening task has some features making inference about the neurophysiological attentional mechanisms problematic, for instance, because of interaction between left and right side of the brain during binaural vs. monaural listening (see, e.g., ).
When subjects are led to expect tones of a certain frequency, they tend to detect the expected frequency tones better than the ones with an unexpected frequency when using a continuous noise masker , . The frequency range where perception is enhanced, ‘attentional band’, resembles the critical band and auditory filter measures obtained psychophysically . However, when a bandpass-filtered noise burst is presented just before the target (‘gated noise masker’) the detection performance of unexpected tones is actually improved while the expected tone detection is worsened . This would suggest that priming the cue with a gated noise masker expands the detection template, whereas a continuous masker masks the frequency areas adjacent to the target tone frequency without any special interaction with the target tone detection. When using a continuous notched-noise masker, the detection threshold is lower at the frequencies within the notch, near the notch center , and again higher at frequencies residing on the outer edges of the noise masker, close to notch corners .
In contrast to the evidence favoring a gain-based model of selective attention, increase in neuronal selectivity  and shifts in receptive fields towards attended location have been reported in the visual cortex . Within the auditory modality, a recent combined fMRI and MEG study in humans suggested feature-specific enhancement of neural selectivity to phonetic vs. spatial features in the anterior secondary auditory cortex “what” vs. posterior secondary auditory cortex “where” processing pathways . Furthermore, recent studies in ferrets have suggested task-dependent modulation of spectrotemporal receptive fields (STRFs) of primary auditory cortex neurons during classical conditioning . Specifically, STRFs were characterized by playing the animals a number of spectrally complex auditory stimuli (‘temporally orthogonal ripple combinations’, TORCs) for 2.5 minutes and reverse-correlating the neural response with the TORC parameters. During an active tone detection task, the majority of STRFs were modulated in comparison to the passive STRF, enhancing the target frequency representation. The most common type of plastic receptive field modulation was excitatory field enhancement at the target frequency and lateral suppression that correlated with improved task performance , . Further, in one third of the cells showing facilitation the receptive field modification was found to be quickly reversible (i.e., the post-behavioral task passive STRF was highly similar to the pre-behavioral one, see also ), suggesting that these effects could be due to transient selective attention type of phenomena.
Here, we quantified auditory cortex frequency tuning during selective auditory attention to frequency vs. duration vs. an ignore sounds task in 20 healthy subjects by masking 1000-Hz tones with continuous white noise band-stop filtered to contain frequency notches  of parametrically varying width around 1000 Hz (Figure 1). When the notch is narrower, the masker edges are closer to the 1-kHz standard tone in the frequency axis. Due to the tonotopic organization of the human auditory system, starting already from the cochlea, this induces stronger frequency masking and makes the detection of both standard and target tones more difficult by decreasing the effective signal-to-noise ratio. Therefore, we expected gradually smaller neural responses with narrower notches together with poorer task performance. We were specifically interested in estimating how attention influences the shape of the function describing how neural response diminishes with narrower notches. We hypothesized that an increase in neuronal gain would result in multiplicative increase of auditory-cortex response amplitude as a function of the width of the notch, still keeping the basic shape of amplitude reduction function the same, whereas enhanced selectivity would result in a more level amplitude reduction as a function of the notch width (Figure 2). This latter hypothesis is based on assumption of enhanced frequency selectivity in the auditory system narrowing, in our experiment, the receptive field near the 1 kHz. Thus, frequency masking would have less effect on the neural level, requiring narrower frequency notches to mask out the same amount of the 1-kHz tone than with broader receptive fields. A combination of these two effects was hypothesized to occur in the case that increased neural gain and enhanced feature selectivity together explain selective attention.
Due to frequency masking, the amplitude of the peak in global field power  (GFP) of event-related brain responses at 100 ms from stimulus onset (N100) decreased as an inverse function of the noise gap width (F(7,63)=10.69, P=0.0020, ε=0.24; see Figure 3B). The N100 event-related potential (ERP) was most prominent at the central and frontal electrode positions (Fz, Cz) due to our choice of reference electrode and binaural stimuli (ERP waveforms are shown in Figure 4 and the resulting GFP waveforms in Figure 5). Together with peak amplitude decrease the latencies increased as a function of the narrowing notch (F(7,63)=4.22, P=0.013, ε=0.45)
Selective attention significantly increased the GFP amplitude at the N100 peak latency (main effect: F(2,18)=49.20; P<0.00001, ε=0.85; Attend vs. Ignore contrast: F(1,9)=93.76, P<0.00001), but as a non-multiplicative function of the width of the noise masker notch (see Figure 3B). Specifically, after fixing the grand average GFP amplitude reduction function during Ignore task to the end-point stimuli types (no masker, white noise) of Attend conditions with a simple linear scaling and offset, a comparison was made between the predicted GFP amplitudes for the notched-noise stimuli types (±500, …, ±40 Hz) and the observed GFP amplitudes, revealing a significant Datatype (i.e., predicted vs. observed)×Stimulus interaction effect (F(2,18)=5.14; P=0.014, ε=0.56) and nearly significant Datatype×Condition×Stimulus interaction (F(10,90)=39.00; P=0.129, ε=0.42), which both reached significance by taking the Attend vs. Ignore contrast (F(1,9)=17.18; P=0.0025 for the first interaction, F(1,9)=5.57; P=0.043 for the second). To further assess this effect, the observed GFP peak amplitudes were pairwise compared to the predicted ones. With noise gap ±500 Hz, the observed value was smaller than predicted, and with noise gaps ±150 and ±40 Hz larger than predicted (Figure 6). These findings suggest that increased gain and enhanced selectivity together explain selective attention (as hypothesized in Figure 2). The peak latencies were also significantly longer during selective attention (main effect: F(2,18)=115.35, P<0.00001, ε=0.76; Attend vs. Ignore contrast: F(1,9)=134.47; P<0.00001), tentatively suggesting that the enhanced feature selectivity could be accomplished through inhibitory mechanisms .
There were significantly fewer correct responses (F(7,63)=30.30, P<0.00001, ε=0.33, Figure 7A) and longer reaction times (F(7,63)=8.82; P<0.001, ε=0.42, Figure 7B) with narrower notches. Hit rates in the Attend Frequency vs. Attend Duration conditions did not differ significantly, indicating that the two discrimination tasks were of similar difficulty. Hit rates and reaction times correlated significantly with the GFP amplitude at the N100 peak latency (Table 1).
Our experimental data corroborate the previous findings of attentional augmentation of the N100 response. Our novel procedure was to focus on the general shape in which the N100 amplitude changes as a function of parametrically varying, continuous notched-noise masker. The shape of this function (Figure 2 vs. 3B, see also Figure 6) suggests that multiplicative gain increase alone cannot explain the present findings, but rather that a combination of increased gain and enhanced selectivity underlies auditory selective attention, similarly to what has been recently suggested to be the case in the human visual system , . This is analogous to the monkey visual system, where selective attention increases response gain of orientation-selective neurons in the early areas MT, V1 and V4 –. Some of these animal studies report only the gain increase, that is, no difference was found in the preferred orientation that the neurons responded to , . Still, more recent studies propose a combination of increased gain and selectivity to explain the observed modulation in areas V4 and MT ,  or an additional ‘response bias’ towards higher-contrast stimuli in case of multiple stimuli inside the neurons’ receptive field .
In the present study, attentional modulation was strongest with the intermediate notch widths (e.g., ±150 Hz and ±80 Hz; see Figures 3 and and4),4), again decreasing in amplitude with ±40 Hz notch and plain white noise. As the task difficulty increased with decreasing notch, resulting in poorer performance, we could tentatively say that task difficulty (or attentional load) did increase the proportional attentional effect or sensory gain, but only to a certain extent, which could even be related to the physiological limits of the human auditory system (the critical band). The increase in attentional effect together with task load is in line with previous studies , however, it should be noted that in our study the presented tones were deliberately adjusted to be perceptually very weak, and the use of random, long inter-stimuli interval (ISI) made the task even more difficult.
Here, we did not specifically address the question of where these attentional modulations took place. Some previous studies have suggested that attention-related modulations in humans are confined to secondary auditory cortex , while others have suggested involvement of the primary auditory cortex , , . A recent study in awake monkeys employing a masking paradigm similar to the present study found neuronal population responses in A1  that were clearly more sharply tuned than in noninvasive measurements of the auditory cortex activity ,  (including the present study). This suggests that noninvasive human measures of tuning also encompass activity of less-sharply tuned neurons in non-primary cortical areas. An interesting lesion study shows that auditory attention strongly modulates non-primary auditory areas and can lead to perception of sound onset and offset, termed ‘deaf-hearing’, even with an absence of primary auditory areas . This could be explained by by top-down attentional control signals, projecting from the prefrontal cortex to the remaing non-primary auditory areas, leading to conscious perception. Therefore, caution should be exercised when drawing conclusions in which specific auditory cortical areas the present attention effects are generated.
We quantified the EEG response by a well-established technique of using the tone-evoked N100 peak amplitude. This, of course, does not completely define the neural activity during perception, because the evoked potential is a product of massively synchronized neural populations, and makes hypotheses about underlying neural mechanisms only speculative. Due to this, the increase in the observed EEG amplitude during Attention could be accounted for by an increase in synchrony of the neural activity (i.e., less jitter in single epochs). Still, this type of enhancement alone would produce results roughly identical to the ones with multiplicative gain increase (see Figure 2). The question of linearity between the EEG response and stimulus level might also need clarification. The N100m response (MEG counterpart of N100) amplitude to a simple tone increases approximately linearly with level only very near the psychophysical threshold, showing saturation at higher levels . The stimulus level dependence of the evoked potential amplitude can be even more complicated with spectrally complex stimuli as the ones used here, where we employed frequency masking to modify the perceived stimuli level. However, as we used identical auditory stimuli across all conditions, we did not need to assume linearity between the response amplitude and perceived stimulus level when analyzing the main results (although the assumption of linearity was made in the simulations, so direct comparisons between the curves shown in Figure 2 and Figure 3B should be done cautiously). Selective attention can naturally also alter this linear or nonlinear relationship between the EEG amplitude and the stimulus level, but we did not specifically address this question here.
According to some theories, selective attention modulates neural populations separate from those processing the stimulus per se  so that there are in fact two distinct auditory pathways: one faithfully transmitting the incoming sensory information and another attention-modulated pathway analyzing the task-relevant acoustic features . However, given that the current source density maps disclosing left and right hemisphere auditory cortex sources were observed to be highly similar in the present study across conditions (see Figure 8), our findings are in favor of models and previous results suggesting a direct modulation of feature-specific neurons and their receptive fields , , , , .
Interestingly, attending to sound duration produced effects at the N100 latency similar to attending to sound frequency. Psychophysical studies indeed demonstrate that the template used in detecting an auditory stimulus stores both its frequency and duration – (i.e., duration is not represented independently of frequency).
While our results indicate that increased sensory gain alone cannot explain selective attention in the auditory cortex, but that enhanced frequency selectivity in the underlying tonotopically organized neurons additionally contributes to filtering of task-relevant stimuli from noise, there are number of alternative mechanisms that could explain enhanced selectivity during selective attention. One possible mechanism could be the receptive field modulations, evidenced by neurons' best frequency (BF) shifts towards the target sounds in lower animals , . These effects have been observed to take place on the timescale of seconds or minutes when induced by electric stimulation ,  (however, see also ). It is also possible that such receptive field modulations could be explained by modulation/tuning of sound processing at subcortical , or even cochlear , , level, driven by corticofugal connections from the primary auditory cortex , .
Alternatively, the increased selectivity can be explained by lateral interaction mechanisms, which have been shown to occur in early vision due to attention – and in human audition following exposure to notched-noise stimuli , . The longer response peak latencies in the present study during the selective attention conditions provide tentative support for inhibitory mechanisms, but as previous studies have found opposite latency effects during attention (e.g., ), this finding by itself should be interpreted carefully. Importantly, the changes in the balance of excitation and inhibition could explain the observed relatively higher amplitude with ±500 Hz notch during the Ignore task (see Figure 3B). This observation should not be considered as a measurement artifact, since in a previous study with similar symmetrical notches around the 1-kHz tone and passive reading task, the MEG dipole amplitude was found to be largest with about ±400–500 Hz notches, and decreased in amplitude with wider notches (±600–1000 Hz) . It is, for instance, possible that when increasing the notch width around the test tone frequency, the lateral inhibitory band of the sharp notched-noise edge at a certain notch width reaches neurons which normally inhibit neurons whose CF is near the target tone frequency, consequently allowing more neurons to respond to the test tone. Further neurophysiological studies are, however, needed to address the question of exactly which neural mechanism underlies the enhancements in frequency selectivity.
The current experiment shows that auditory selective attention in humans cannot be explained by a gain model, where only the neural activity level is increased, but rather that selective attention additionally enhances auditory cortex frequency selectivity. Selective attention increased the amplitude of the evoked brain responses with all of the used notch widths, but as a non-multiplicative function of the width of the notch when compared to the Ignore condition. The question of exactly which neural mechanism increases the selectivity still requires further studies, but we hypothesize that a combination of both increase in cortical gain and enhanced selectivity could best explain the results. Additionally, based on the current data, lateral inhibition likely plays a role in enhancing the auditory cortex response.
20 healthy right-handed volunteers (13 males and 7 females, age 18–28 years) with normal hearing participated in the study. The studies were carried out in accordance with the Helsinki Declaration, ethical approval was obtained from the Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa and a written voluntary consent was obtained from each subject prior to participation. The stimulus paradigm used was highly similar to one that we have previously used to quantify auditory-cortex sound frequency tuning . Within a given block, repetitive 100 ms 1000 Hz sine wave tones (65 dB SPL, inter-stimulus interval (ISI) 1.5–2.5 s) were presented either in silence, during continuous white noise, or during white noise band-stop filtered (120 dB/Hz) with one of six different notch widths ±ΔF (±500, ±400, ±250, ±150, ±80, and ±40 Hz) centered at 1000 Hz. The SPL of the noise maskers was adjusted before the experiment in 1-dB steps  so that the 1000-Hz tones were at 50% hearing threshold with the white noise masker (range: 47–53 dB SPL). To keep the length of the sessions within reasonable limits, half of the notched-noise maskers (±500, ±150, and ±40 Hz) were presented to 10 of the subjects, and the rest of the noise maskers (±400, ±250 and ±80) to the remaining 10 subjects. Occasionally, the 1000 Hz tones were replaced by infrequent target tones either higher in frequency (5%, 1020 Hz), longer in duration (5%, 150 ms), or both higher in frequency and of longer duration (5%). All tones had 5-ms linear rise and fall times. The auditory stimuli were presented using a PC and Presentation Software (Neurobehavioral Systems Inc., Albany, CA, USA). All of the sound files used in the experiment were created using Matlab (R12.1, MathWorks Inc., Natick, MA, USA) with 44.1-kHz sampling frequency and 16-bit precision. The sounds were presented binaurally through two high-quality speakers (Roland Stereo Micro Monitor MA-8), which were located symmetrically on both sides of the computer monitor (distance from subjects' head one meter).
In addition to the baseline condition, where the subjects were instructed to watch a silent movie and ignore the auditory stimuli, there were two discrimination conditions, where the subjects had to press a button with their right index finger in response to detection of frequency deviants (Attend Frequency), and duration deviants (Attend Duration). During these tasks, a small fixation cross was displayed on the computer screen. Identical auditory stimuli were presented during all three tasks. A hit was defined as a button press before the onset of the next stimulus. Late responses and responses to non-target stimuli were regarded as false alarms.
The 32-channel EEG was recorded (0.1–225 Hz passband, 500 Hz sampling rate; Brain Products GmbH, Germany) in a sound-attenuated and electrically shielded room. 800-ms epochs with 200-ms pre-stimulus baseline were filtered with a 0.5–40 Hz passband and averaged relative to the onset of the 1000-Hz sounds, with responses exceeding ±75 µV at any channel rejected due to possible extra-cerebral artifacts. The number of accepted epochs per stimulus type and condition (e.g., a given noise-masker type during Attend Frequency condition) was on average 108 (out of 119).
After averaging and obtaining the event-related potential (ERP), the Global Field Power (GFP) measure  was calculated for each epoch with a formula
where ui,uj are the evoked potentials from channel numbers i,j at each time point tk, and n is the number of electrodes used. The GFP peak latencies for N100 response at around 100 ms were determined for each stimulus type and condition from the grand averaged GFP. Mean GFP amplitude of ±20-ms time window around this grand average peak latency was then used as the N100 peak value for each subject. The mean GFP value during pre-stimulus period (100–0 ms) did not differ significantly across stimuli or conditions, so it was subtracted from the N100 peak; the resulting value was used as the individual GFP value in further analysis. Scalp current densities were also calculated for each stimulus type and condition, to reveal possible changes in source configuration across conditions. Behavioral and neurophysiological data were analyzed using a two-way, repeated measures analysis of variance (ANOVA), with Condition (Attend frequency, Attend duration, Ignore sounds) and Stimulus (no masker, ±500, …, 0) as within-subjects factors. To test the hypothesis that function of amplitude reduction was modified during attention, an additional Datatype (observed data vs. predicted data) within-subject factor was used. The reported p-values from the ANOVA were corrected using Greenhouse-Geisser correction for degrees of freedom, but are reported with original, uncorrected degrees of freedom.
The simulation of different effects of attention to the neural response amplitude (depicted in Figure 2) was done in Matlab (R12.1, MathWorks Inc., Natick, MA, USA). There, an array of neuron-like elements was created, each with own characteristic frequency (CF) and tuning curve around it. The basic shape of the tuning curve was a single-parameter rounded-exponential Roex(p) filter
where p is the steepness of the filter and g the relative width of the frequency notch. Here, the Roex(p) filter shape was chosen only for simplicity and g was used as a factor of difference between each neurons CF and the 1-kHz probe sound. Neurons' firing probability at each time point during stimulus presentation was inversely proportional to the tuning curve shape. The time course of this hypothetical ‘neural population’ was calculated in 1-ms steps, during which each neuron was either ‘on’ or ‘off’, and the population response amplitude was defined as number of active neurons during stimulus presentation vs. baseline.
The effect of multiplicative gain was simulated by increasing the firing probability for each neuron (i.e., lower threshold of activation for all the simulated neurons). The tuning curve sharpness change was simulated by changing the p parameter in Roex(p) auditory filter, uniformly across all simulated CFs (higher p, sharper tuning curve). In addition to the isolated changes, a combination of effects was simulated by both increasing the gain and sharpness.
We thank M. Andermann for comments.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This study was financially supported by the Academy of Finland (grant nos. 206368, 212261, 213470) and by the National Institutes of Health (grant no. RO1-HD040712).