Attending to a stimulus enhances its neuronal representation, even at the level of primary sensory cortex. Cross-modal modulation can similarly enhance a neuronal representation, and this process can also operate at the primary cortical level. Phase reset of ongoing neuronal oscillatory activity has been shown to be an important element of the underlying modulation of local cortical excitability in both cases. We investigated the influence of attention on oscillatory phase reset in primary auditory and visual cortices of macaques performing an intermodal selective attention task. In addition to responses “driven” by preferred modality stimuli, we noted that both preferred and non-preferred modality stimuli could “modulate” local cortical excitability by phase reset of ongoing oscillatory activity, and that this effect was linked to their being attended. These findings outline a supramodal mechanism by which attention can control neurophysiological context, thus determining the representation of specific sensory content in primary sensory cortex.
Imagine a couple dancing: one of the dancers has to lead, and set the rhythm for the other, to avoid confusion. Similar to this scenario, our hypothesis is that at any given moment there also tends to be a leading sense orchestrating the oscillatory dynamics that form the electrophysiological context of neural processing. This would enable the concerted control of excitability in cortical areas processing modality specific properties of a multisensory object, thereby resulting in low-level multisensory interactions and an efficient merging of information at higher levels of processing. Several lines of research provide the backbone of this hypothesis.
It has long been known that field potentials arise from the summation of postsynaptic potentials, spontaneous neuronal membrane potential fluctuations, and spike afterpotentials in the brain (e.g. Mitzdorf, 1985; Schroeder et al., 1995; Kamondi et al., 1998; Buzsaki et al., 2003); however, this activity was long considered to be more or less random background activity with no, or minimal, effect on perceptual processes. By now, ample evidence supports the proposition, which is almost as old as the electroencephalogram (EEG) itself (Bishop, 1933), that cortical excitability is “slave to the rhythm” of ongoing neuronal oscillations, meaning that rhythmically fluctuating cortical excitability is controlled by oscillations that are detectable in local field potentials or the EEG (Steriade et al., 1993; Azouz and Gray, 1999; Fiser et al., 2004; Lakatos et al., 2005; 2007; 2008). These studies support the hypothesis first proposed by Buzsaki and Chrobak (1995) that background, or ongoing, neuronal network oscillations constitute the “context” that affects processing of the “content” conveyed by specific sensory inputs. However, if the neurophysiological context were independent of what is happening in the world around us, it could still be considered neatly organized noise that randomly amplifies or attenuates neuronal responses, leading to the formation of unstable sensory representations. A number of studies show that this is not the case. Rather, there is a two-way interaction between the neurophysiological context and the sensory inputs that assures the most effective sampling of our environment (Lakatos et al., 2007, 2008; Kayser et al., 2009). One side of this interaction is the effect of the context on sensory inputs, while the other is the effect of sensory inputs on the neurophysiological context. The ability of sensory inputs to modulate the context provides a link between the temporal structure (e.g. speech) or rhythmical sampling (e.g. sniffing or saccades) of sensory information and the rhythmically changing excitability of the neuronal ensembles involved in the processing of this information. This alignment of the neurophysiological context with key external events ensures the most effective processing of sensory inputs or input patterns that modulate the context (Schroeder et al., 2008; Kayser et al., 2009).
The mechanism by which sensory inputs can interact with the context is the phase reset of ongoing neuronal oscillations (Sayers et al., 1974; Basar, 1980; Makeig et al., 2004). Results from multisensory studies demonstrate that besides preferred modality stimuli, the context can be modulated by cross-modal inputs related to non-preferred modality stimuli already at the level of primary cortical areas (Lakatos et al., 2007, 2008; Kayser et al., 2008). If all sensory inputs had equal access and ability to reset ongoing neuronal oscillations, the context would be degraded to noise by stimuli that continuously bombard our senses. This means that there would be no prevalent rhythm in the EEG, and the neurophysiological context would lose its functionality. However, it is clear that the EEG retains an intricately organized oscillatory structure under a wide range of conditions (e.g. Canolty et al., 2006; Lakatos et al., 2008), thus the question is: how are inputs that influence ongoing rhythms selected? Sensory inputs must have controlled access to the context to ensure that while some can modulate it, others have no effect. Our hypothesis is that there is a leading sense in perceptual processes that sets the rhythm across ensembles of oscillating neurons, meaning that it has access to, and can shape the neurophysiological context by means of oscillatory phase reset. We propose that the modality of the leading sense is dynamically changing, and is determined based on the relative salience of stimuli, as influenced by their physical properties and by attention. Given the profound effect of context on the processing of specific sensory content, control of context by one of the senses can drive facilitation (Nickerson, 1973), suppression (Colavita, 1974) or qualitative change (McGurk and MacDonald, 1976) in the perception of co-occurring events in other modalities. 
The hypothesis of a leading sense is strongly supported by the common finding that attended stimuli in one sensory modality can affect processing of inputs in another modality, while ignored stimuli do not result in similar cross-modal effects (Busse et al., 2005; Talsma et al., 2007).
To test the specific hypothesis that ongoing oscillatory activity is differentially modulated by attended and ignored stimuli, we analyzed responses to pure tones and light flashes in primary auditory (A1) and visual (V1) cortical areas of macaques, who were performing an intermodal selection task. The monkeys were trained to either attend to a stream of sounds and identify a target while ignoring the simultaneously presented but temporally offset stream of visual stimuli, or to attend to visual stimuli while ignoring tones in alternate blocks of trials. Our findings support the hypothesis that the neurophysiological context consisting of a characteristic pattern of ongoing oscillations is accessible to inputs independent of stimulus modality at even the first stage of cortical processing, and that this process is regulated by attention.
In the present study we analyzed responses to auditory and visual stimuli in 19 penetrations of area A1 of the auditory cortex and 25 penetrations of area V1 of the visual cortex in 3 macaques. On alternate trial blocks, the monkeys were attending either to a stream of tones or to a stream of light flashes trying to identify a target, while they were ignoring the simultaneously presented temporally offset stream of stimuli in the other modality. The effect of attention on oscillatory phase reset was evaluated by comparing responses to attended and ignored standard stimuli. Intracortical field potentials and multiunit activity (MUA) profiles were recorded concurrently with linear array multielectrodes. Instead of analyzing the field potentials we calculated current source density (CSD) profiles, which allow better localization and more direct physiological interpretation of transmembrane currents underlying sub- and supra-threshold excitability changes in a neuronal ensemble.
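To make the CSD step concrete, the sketch below uses the common second-spatial-difference estimate across equally spaced contacts. This is an assumption for illustration: the text above does not state which CSD estimator was used, and the function name, the 0.1 mm spacing and the unit conductivity are hypothetical.

```python
def csd_second_difference(lfp, spacing_mm=0.1, conductivity=1.0):
    """Estimate CSD at interior contacts of a laminar field-potential profile.

    lfp: list of channels ordered by depth; each channel is a list of samples.
    Returns len(lfp) - 2 channels. Sign convention used here (an assumption):
    negative values are sinks (net inward current), positive values are sources.
    """
    if len(lfp) < 3:
        raise ValueError("need at least three contacts")
    h2 = spacing_mm ** 2
    csd = []
    for ch in range(1, len(lfp) - 1):
        above, here, below = lfp[ch - 1], lfp[ch], lfp[ch + 1]
        # Discrete second spatial derivative, scaled by -conductivity.
        csd.append([
            -conductivity * (a - 2.0 * c + b) / h2
            for a, c, b in zip(above, here, below)
        ])
    return csd
```

A potential profile that varies linearly with depth yields zero CSD, whereas a local negativity flanked by less negative potentials yields a sink at the middle contact.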
Figure 1A illustrates the characteristic difference between responses to attended preferred modality and non-preferred modality stimuli - a 2 kHz tone and light flash respectively - in primary auditory cortex. The upper left color-map shows the laminar CSD profile of a typical response to a best frequency or close to best frequency tone. Characteristic of this type of response is the initial sink in lamina 4 signaling the excitatory response of granular layer cells in response to specific “lemniscal” thalamic input, which is followed by later sink-source pairs above and below, signaling di- and trisynaptic activation of supragranular (S) and infragranular (I) neuron populations. This laminar pattern of activation can be regarded as a signature of feedforward type activation (Schroeder et al., 1998; Lakatos et al., 2007). The color map below shows the laminar profile of a visual stimulus related response in A1. The high amplitude CSD activity before stimulus onset is partly the result of responses to auditory stimuli that occurred in the −250–−450 ms time frame before the visual stimulus in the mixed train of auditory-visual stimuli (see methods), and partly related to entrainment of neuronal oscillations to the temporal structure of the attended (in this case visual) stimulus stream (Lakatos et al., 2008). Despite this ongoing CSD pattern, there is a clear stimulus related response that is very different from the feedforward type activation related to tones. First of all, the amplitude of the response to the visual stimulus is much lower (note that the auditory and visual stimulus related laminar CSD profiles are on a different scale), and it is clearly weighted towards the supragranular layers. Also, the initial stimulus related activity occurs in these layers, meaning that it is driven by inputs that target supragranular neuronal populations in contrast to specific thalamic inputs (Jones 1998a). 
Another important difference between responses to preferred and non-preferred modality stimuli is that while auditory stimuli result in a significant increase of post-stimulus firing, there is no apparent MUA change related to flashes despite the stimulus related CSD response (traces below the laminar profiles in Fig. 1A). To determine whether the pattern of MUA responses apparent in Figure 1A, that is, supra- versus sub-threshold response to preferred versus non-preferred modality stimuli, was characteristic of the whole auditory dataset, we statistically compared pre-stimulus (−50–0 ms) and post-stimulus (10–110 ms) MUA amplitudes in our 19 A1 recordings. In response to attended tones, all A1 sites had significantly different post-stimulus MUA (dependent t-test, p<0.01). Attending to 2 kHz tones resulted in significantly increased neuronal population firing in 10 sites, while the other 9 A1 recording regions showed significantly decreased firing. In contrast, attending to visual stimuli resulted in significantly different MUA responses in only 2 A1 sites with increased post-stimulus firing. In these cases, rather than having a sharp post-stimulus onset, MUA had the shape of a gradually rising slope around the time when visual stimuli occurred. The fact that there is no significant change in MUA coupled with the supragranularly-weighted post-stimulus CSD pattern (Fig. 1A) observed in all our recordings means that while the visual response does modulate net local neuronal excitability indexed by the CSD, in the absence of appropriate auditory input, this does not result in significant changes of neuronal ensemble firing. This is characteristic of a “modulatory” rather than a “driving” type of response.
Based on earlier findings, the underlying mechanism of the modulatory response is oscillatory phase reset in the supragranular layers (Lakatos et al., 2007). Therefore, a modulatory response should be characterized by an increase in phase coherence across trials (indexed by intertrial coherence, ITC), in the absence of a pre- to post-stimulus increase in CSD amplitude or MUA. To examine this, we calculated laminar CSD amplitude profiles by averaging single trial analytic CSD amplitudes (Fig. 1A, middle). Characteristic of an evoked type response (Makeig et al., 2004; Shah et al., 2004; Lakatos et al., 2007), there is an amplitude increase in all cortical layers in response to auditory stimuli. In contrast, visual stimuli result in no obvious CSD amplitude increase, meaning that although visual stimuli do modulate local net transmembrane current flow in a predictable way (this is why stimulus related activity is evident in the averaged CSD profiles on the left), they do not significantly increase the amplitude of membrane potential fluctuations of the local neuronal ensemble, which would result in increased net transmembrane current flow and hence increased CSD amplitude. The same pattern was observed in all A1 recording sites. When we compared visual stimulus related CSD amplitudes (averaged across all layers and in the 10–110 ms time interval) to baseline, we found no significant pre- to post-stimulus increase in CSD amplitude in any case (dependent t-test, p<0.01). The opposite was true for auditory responses: 15 out of 19 A1 sites showed a significant CSD amplitude increase in response to attended 2 kHz tones. The sites that did not show an increase exhibited significant MUA decreases (see above). This finding probably reflects the fact that a preferred modality stimulus is not necessarily an optimal stimulus, and is a consequence of recording from A1 regions which were not tuned to the 2 kHz tone used in the intermodal attention paradigm. 
In this study we did not differentiate between the two types of responses, since our main results were not different for the 2 (excitatory and inhibitory) groups.
Color-maps to the right of CSD amplitude maps (Fig. 1A) show laminar intertrial coherence (ITC) values - indexing phase similarity across trials - averaged across frequencies from 4 to 100 Hz. While the auditory evoked response is characterized by high ITC values in all cortical layers, “phase locking” of neuronal activity to visual events is only evident in the supragranular layers, which was the case in all of our A1 recordings. Traces below illustrate that the peak of the supragranular ITC is later in the case of visual stimuli, which is to be expected since visual inputs reach cortical areas ~25 ms later than auditory inputs in macaques. There are 2 traditional explanations for event related inter-trial coherence, or, put differently, for a predictable CSD pattern across trials: 1) a “stimulus-evoked” response that signals a significantly changed activation of the local neuronal ensemble, and is reflected by a waveform that is added to the ongoing activity (e.g. Shah et al., 2004), or 2) the reorganization of ongoing neuronal activity by means of oscillatory phase reset (Sayers et al., 1974; Basar, 1980; Makeig et al., 2004). The CSD amplitude increase that appears in the same locations as the increased ITC in the case of auditory stimuli argues for the former, while the lack of amplitude change coupled with a visual stimulus related increase in ITC argues for the latter mechanism. We will show that while attended non-preferred modality stimuli result in pure phase reset, modality specific responses are mixed evoked/reset type.
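As a minimal illustration of what ITC measures, the sketch below computes it as the length of the mean resultant vector of single-trial phases at one time-frequency point. This is the standard definition and is assumed here rather than taken from the text; the function name is hypothetical.

```python
import cmath
import math


def intertrial_coherence(phases):
    """Mean resultant vector length of single-trial phases (in radians).

    Returns a value between 0 (phases spread evenly around the circle)
    and 1 (identical phase on every trial).
    """
    if not phases:
        raise ValueError("need at least one trial")
    resultant = sum(cmath.exp(1j * p) for p in phases)
    return abs(resultant) / len(phases)


# Identical phases on all trials: perfectly phase-locked.
locked = intertrial_coherence([0.3] * 50)
# Eight phases evenly spaced around the circle: no phase locking.
spread = intertrial_coherence([2.0 * math.pi * k / 8 for k in range(8)])
```

Because each trial contributes a unit-length vector, this index depends only on phase and not on amplitude, which is why it has to be read together with the CSD amplitude and MUA measures when separating evoked from phase reset activity.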
Figure 1B shows laminar CSD, CSD amplitude, and ITC profiles from a representative V1 site. As in auditory cortex, preferred modality stimuli in V1 (light flashes) result in a typical feedforward type response with earliest onset in the granular layers, and later activation of the extragranular layers. Although admittedly not optimal for stimulating visual cortical neurons, light flashes always resulted in an excitatory response in V1, signaled by significantly increased cell firing in the 10–110 ms post-stimulus time interval compared to baseline in all V1 sites analyzed (dependent t-test, p<0.01). In contrast, only 2 out of 25 sites showed significantly different MUA following auditory stimuli, possibly due to a preceding visual response, since visual stimuli could occur 200–400 ms earlier in the stimulus train (see methods).
The CSD amplitude profiles in visual cortex (Fig. 1B, middle) show a similar pattern to that observed in auditory cortex. There is an amplitude increase throughout the laminar CSD profile in response to preferred modality stimuli, while despite an apparent sink-source pattern, there is no post-stimulus CSD amplitude increase in response to the non-preferred modality auditory stimuli. Statistical comparison of pre- to post-stimulus CSD amplitudes showed a significant increase in the case of visual stimuli in all V1 locations (dependent t-test, p<0.01), but only in 1 out of 25 sites in the case of auditory stimuli.
Laminar ITC profiles (Fig. 1B right) illustrate that as in auditory cortex, preferred modality flashes result in post-stimulus ITC increase across all cortical layers while non-preferred stimuli result in higher post-stimulus ITC values only in the supragranular layers in the attend condition; this was true for all V1 sites.
To summarize, the characteristic differences in responses to preferred and non-preferred modality stimuli in A1 and V1 are strikingly similar. Most of the time, preferred modality stimuli drive a feedforward type response. This type of response is coupled with significant changes in the amplitude of net transmembrane currents and as a consequence, the population firing rate, which is by definition an evoked type response (Makeig et al., 2004; Shah et al., 2004), also known as a “driving” response (Lakatos et al., 2007). In contrast to evoked type responses, those related to attended non-preferred modality stimuli cause a predictable (similar across trials) change in neuronal ensemble excitability reflected by a change in the pattern of net transmembrane current flow without any significant associated changes in its amplitude or the ensemble firing rate. This is typical of oscillatory phase reset (Makeig et al., 2004; Shah et al., 2004), which is the mechanism of “modulatory” type responses (Lakatos et al., 2007). In the following we will present data showing that both preferred and non-preferred modality stimuli reset ongoing oscillatory activity, but only if stimuli are attended.
Figure 1C–D show the same electrophysiological variables as Figure 1A–B discussed above, but for ignored standard tones and flashes in areas A1 and V1. Preferred modality stimuli that are ignored evoke smaller amplitude CSD and MUA responses than attended ones, accompanied by smaller ITC in all cortical layers, but especially in the supragranular layers. In the case of non-preferred modality stimuli, there seems to be no organized post-stimulus CSD pattern (color maps on the left), which is supported by the laminar ITC profiles that show no apparent post-stimulus ITC increase in any of the cortical layers. To investigate the effect of attention on the ITC, in each A1 and V1 recording session we selected the electrode site with the largest post-stimulus (10–110 ms) ITC calculated using the pooled responses to non-preferred modality stimuli (all attended and ignored trials). As mentioned earlier, this electrode site always corresponded to the supragranular layers. Time-frequency plots in Figure 2 show pooled ITC values from these supragranular sites in A1 (n=19) and V1 (n=25) related to preferred (upper) and non-preferred modality (lower) stimuli. To compensate for the effect of sample size (number of trials) on ITC, for this figure and for the analysis described below an equal number (200) of randomly selected epochs was used to calculate the ITC for each of our experiments. While preferred modality stimuli result in coherent phase across trials in a wide range of frequencies, typical of an evoked type complex waveform (Lakatos et al., 2007), non-preferred stimuli result in high ITC values in two distinct bands in the theta (4–10 Hz) and low gamma (25–55 Hz) frequency ranges. It is apparent that ITC is smaller in the case of ignored stimuli, independent of their modality in both A1 and V1. 
The fact that there is no significant phase coherence across trials in the case of ignored non-preferred modality stimuli means that these stimuli do not reset oscillatory activity, since this would result in increased post-stimulus ITC values. Traces to the right of the time-frequency maps show pooled ITC values at the time of maximal post-stimulus gamma frequency ITC related to attended stimuli (grey arrows in time-frequency plots). Red lines along the frequency (y) axis denote frequency regions where attended and ignored stimuli result in significantly different ITC (dependent t-test, p<0.01). This significant attention effect seems to be independent of stimulus type or cortical area and is restricted to the theta and gamma bands for both preferred modality and non-preferred modality stimuli. To verify the effect of attention on the level of individual experiments, first we determined the frequency and latency of maximal gamma (25–55 Hz) and theta (4–10 Hz) ITC values related to attended stimuli in the 0–200 ms time interval (Fig. 3). Although not apparent in the pooled data (Fig. 2), in the case of most ignored stimuli, an “ITC peak” could be identified around the same time and frequency as the gamma and theta peaks related to attended stimuli (see methods). Statistical testing revealed no significant difference between the frequency or latency of the peaks in attended versus ignored conditions (dependent t-test, p>0.01). We also did not find any significant differences in the frequency or timing of ITC peaks related to preferred versus non-preferred modality stimuli (dependent t-test, p>0.01). We did however find significant differences in the frequency of the gamma band ITC dependent on the modality of primary cortex we recorded from (A1 vs. V1, Fig. 3A). There was also a significant difference in the timing of the highest phase coherence across trials within the theta and gamma bands (Fig. 3B), which was dependent on stimulus modality independent of cortical area (Tukey’s test, p<0.01). Next we determined the significance of peak ITC values using the Rayleigh statistic (p<0.01). In auditory cortex, the peak ITC values related to attended preferred and non-preferred modality stimuli signaled a significantly non-random phase distribution both in the gamma and theta range. In contrast, while ITC peaks related to ignored preferred modality stimuli were significant in most cases (15 out of 19 for both gamma and theta, Rayleigh, p<0.01), they did not signal significantly non-random phase distribution in the cases of ignored non-preferred modality stimuli in 16 out of 19 experiments. Similar results were found in the visual cortex: in most cases (25/25 related to preferred and 23/25 related to non-preferred stimuli) peak ITC values related to attended stimuli signaled a significantly non-random phase distribution both in the gamma and theta range independent of their modality. Also, while ignored preferred modality stimuli still resulted in significant ITC in most cases (18/25 for gamma and 24/25 for theta, Rayleigh, p<0.01), post-stimulus gamma and theta phase distribution across trials in V1 was not significantly different from random in the case of ignored auditory stimuli (24/25 for gamma and 19/25 for theta). The above data indicate that inputs related to non-preferred modality stimuli do not reset ongoing oscillatory activity if they are ignored. The reason for significant ITC related to ignored preferred modality stimuli is not explicitly verifiable, but is probably related to the complex waveform of the evoked type response, which is present in both attention conditions (Fig. 1) and has a broadband frequency spectrum (e.g. Lakatos et al., 2007). 
Since the latency jitter and variation in the shape of the early response component is relatively minimal, this results in a “baseline ITC” that contiguously spans multiple frequency bands, with the post-stimulus ITC increase related to the oscillatory phase reset of ongoing oscillations superimposed on this (traces in Fig. 2).
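The two steps described above, drawing an equal number of epochs per experiment and testing the peak phase distribution for non-randomness, can be sketched as follows. This is an illustrative outline under stated assumptions: the function names are hypothetical, and the p-value uses the first-order large-sample approximation exp(-Z) to the Rayleigh test, which may differ from the exact procedure used in the study.

```python
import cmath
import math
import random


def subsample_trials(phases, n_keep=200, seed=0):
    """Randomly pick n_keep trials so every experiment uses the same count."""
    if len(phases) < n_keep:
        raise ValueError("not enough trials to subsample")
    return random.Random(seed).sample(list(phases), n_keep)


def rayleigh_test(phases):
    """Rayleigh test for non-uniformity of circular data.

    Returns (Z, p) with Z = n * R**2, where R is the mean resultant length.
    p is approximated as exp(-Z), which is reasonable for large n.
    """
    n = len(phases)
    r = abs(sum(cmath.exp(1j * p) for p in phases)) / n
    z = n * r * r
    return z, min(1.0, math.exp(-z))


# Hypothetical data: 300 trials with phases clustered around 1 rad.
rng = random.Random(42)
clustered = [rng.gauss(1.0, 0.3) for _ in range(300)]
z_locked, p_locked = rayleigh_test(subsample_trials(clustered))

# Hypothetical data: 200 phases spread evenly around the circle.
uniform = [2.0 * math.pi * k / 200 for k in range(200)]
z_flat, p_flat = rayleigh_test(uniform)
```

Fixing the trial count before computing ITC matters because the expected ITC of random phases shrinks as the number of trials grows, so unequal counts would bias a direct comparison of attended and ignored conditions.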
To statistically test for a significant attention related gamma and theta ITC difference within experiments, we compared single trial phases at the frequency and latency of attended and non-attended post-stimulus ITC peaks using a non-parametric statistical method which is - unlike the ITC - independent of unequal sample sizes (Maris et al., 2007). In auditory cortex, auditory event-related ITC proved to be significantly greater in the attend compared to the ignore condition in 16/19 cases for gamma and in 18/19 cases for theta. The same was true for visual event-related ITC in 19/19 cases for gamma, and 17/19 cases for theta (Monte Carlo p < 0.05). In visual cortex, visual event-related ITC was significantly greater in the attend relative to the ignore condition in 19/25 cases for gamma and in 21/25 cases for theta, and the same was true for auditory event-related ITC in 18/25 cases for gamma and in 17/25 cases for theta (Monte Carlo p < 0.05). The reciprocity of the present results between A1 and V1 addresses any concerns that variations in the previously reported attention related oscillatory entrainment in V1 (Lakatos et al., 2008) could be related to different difficulties of the auditory and visual tasks.
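A generic Monte Carlo label-shuffling test gives a feel for how an attention related ITC difference can be assessed within one experiment when trial counts differ. The sketch below is not the procedure of Maris et al. (2007); it is a simplified one-sided permutation test, and all names and the simulated phases are hypothetical.

```python
import cmath
import random


def itc_from_vectors(unit_vectors):
    """ITC from precomputed unit phase vectors exp(i * phase)."""
    return abs(sum(unit_vectors)) / len(unit_vectors)


def permutation_itc_test(attended, ignored, n_perm=1000, seed=0):
    """One-sided Monte Carlo test of ITC(attended) > ITC(ignored).

    Condition labels are shuffled while group sizes are kept fixed, so any
    sample-size bias in ITC is also present in the null distribution.
    Returns (observed difference, Monte Carlo p-value).
    """
    rng = random.Random(seed)
    att = [cmath.exp(1j * p) for p in attended]
    ign = [cmath.exp(1j * p) for p in ignored]
    observed = itc_from_vectors(att) - itc_from_vectors(ign)
    pooled = att + ign
    n_att = len(att)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = itc_from_vectors(pooled[:n_att]) - itc_from_vectors(pooled[n_att:])
        if diff >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

With phase-locked attended trials and random-phase ignored trials, the observed difference falls far outside the shuffled distribution and the returned p-value approaches its minimum of 1 / (n_perm + 1).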
Figure 4 shows the mean phase of post-stimulus gamma and theta oscillations related to preferred and non-preferred stimuli in A1 and V1. In most cases, the distribution of mean phases is not random (Rayleigh, p<0.05). This means that stimuli reset ongoing oscillations in a predictable manner. Although this was not tested in the present study, it follows that we can make specific predictions about the effect of reset oscillations on the processing of subsequently appearing preferred modality stimuli. Interestingly, the mean phases of gamma and theta oscillations in auditory cortex related to auditory (preferred modality) stimuli appear to be more random than in other cases. A possible explanation for this is that since the auditory stimuli used here resulted in both excitation and inhibition (see above), the related phase reset in these two groups results in different post-stimulus mean phases, i.e. the phase ongoing oscillations are reset to (high or low excitability) is frequency dependent in A1. This possibility is a target of ongoing studies in our laboratory.
We also compared pre- and post-stimulus (−50–0 ms and 10–110 ms) oscillatory amplitudes of the wavelet transformed single trials averaged in the 4–100 Hz frequency band, to test for any post-stimulus amplitude increase related to attended non-preferred modality stimuli that we might have missed by averaging analytic CSD amplitudes across layers (see above). Consistent with those results, we found that in most A1 (18/19) and V1 (23/25) sites there was no significant event related change in supragranular oscillatory amplitudes. To ensure that decreased post-stimulus oscillatory amplitude in one band - like alpha desynchronization - is not “masking” an amplitude increase related to an evoked process in other ones (Sauseng et al., 2007), we compared pooled (n=19 for A1 and n=25 for V1) pre- and post-stimulus oscillatory amplitudes in 4 different frequency bands that were selected based on the inspection of baseline spectrograms (Fig. 5A) and results of previous studies (Lakatos et al., 2005, 2007). We found that there was no significant amplitude change in either A1 or V1 in the gamma (25–55 Hz), beta (13–25 Hz), alpha (10–13 Hz) or theta (4–10 Hz) frequency bands (dependent t-test, p<0.01). This pattern of findings is consistent with the hypothesis that both in A1 and V1 the event-related change in oscillatory phase distribution, which can often appear to be an event related “response” to non-preferred or “modality-inappropriate” stimuli, results purely from the phase reset of ongoing oscillations.
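Single-trial oscillatory amplitude and phase at a given frequency and latency can be obtained from a wavelet coefficient. The sketch below assumes a complex Morlet wavelet with 6 cycles and a window truncated at 3 standard deviations; the text does not specify the wavelet family or its parameters, so these choices and the function name are illustrative only.

```python
import cmath
import math


def morlet_amplitude_phase(signal, fs, freq, t_index, n_cycles=6.0):
    """Amplitude and phase of `signal` at `freq` (Hz) around sample `t_index`.

    Uses a Gaussian-windowed complex exponential (Morlet-type wavelet),
    normalized so that a pure sinusoid of amplitude A returns roughly A.
    """
    if not 0 <= t_index < len(signal):
        raise IndexError("t_index outside the signal")
    sigma_t = n_cycles / (2.0 * math.pi * freq)   # temporal width in seconds
    half = int(round(3.0 * sigma_t * fs))         # truncate at 3 sigma
    acc = 0j
    weight_sum = 0.0
    for k in range(-half, half + 1):
        i = t_index + k
        if 0 <= i < len(signal):
            t = k / fs
            w = math.exp(-t * t / (2.0 * sigma_t ** 2))
            acc += signal[i] * w * cmath.exp(-2j * math.pi * freq * t)
            weight_sum += w
    coef = acc / weight_sum
    return 2.0 * abs(coef), cmath.phase(coef)


# Hypothetical 40 Hz test signal, unit amplitude, sampled at 1 kHz for 1 s.
fs = 1000.0
trial = [math.cos(2.0 * math.pi * 40.0 * n / fs + 0.5) for n in range(1000)]
amp, phase = morlet_amplitude_phase(trial, fs, 40.0, t_index=500)
```

Averaging the amplitudes across trials in pre- and post-stimulus windows gives the quantities compared above, while the per-trial phases are the input to ITC.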
Our analysis determined that independent of stimulus modality, the frequency of gamma oscillations that are “phase locked” to attended stimuli (Fig. 3A) is significantly different in A1 versus V1. If, as our hypothesis suggests, these phase locked oscillations are the result of the phase reset of ongoing oscillatory activity, we should find a significant difference in the frequency of ongoing gamma oscillations between A1 and V1. To analyze this prediction, we calculated the baseline oscillatory spectrum of each site from the data recorded during cueing trial blocks, where only non-preferred modality stimuli were presented. Since our present data and earlier studies show that non-preferred modality stimuli do not result in evoked type responses in primary cortices, we assume that the EEG recorded during these trial blocks is representative of the ongoing oscillations in an alert state. Figure 5 shows normalized averaged spectrograms from A1 and V1. It is clear that the different pattern in phase reset - i.e. higher gamma frequency in V1 - corresponds to different frequency dominant ongoing oscillations in areas A1 and V1. While gamma frequency is significantly different between A1 and V1 (independent two sample t-test, p<0.01), there is no significant difference in the frequency of theta oscillations, just like there was no significant difference in the frequency of theta ITC. When we directly compared the frequency of dominant ongoing oscillations and the frequency of peak ITCs related to preferred and non-preferred modality stimuli within cortical areas in any of the frequency bands, we found no significant difference (Kruskal-Wallis test, p>0.01) indicating that ongoing gamma and theta oscillations are being reset by inputs related to attended stimuli independent of whether a given primary cortical area is primarily involved in processing specific attributes of the stimulus. 
Although in the present study we found no difference between the frequency of ongoing and phase reset gamma oscillations, it is possible that alerting, or behaviorally relevant stimuli instantly change the frequency of ongoing gamma oscillations. The exact role of oscillatory frequency in the gamma band remains to be tested.
It is important to note that while there is a clear peak in the frequency range above theta in the baseline spectrogram at ~11 Hz in both A1 and V1, we did not find a corresponding peak signaling non-random post-stimulus phase distribution across trials in the ITC values, nor did we find a significant attention related effect on ITC in this frequency band (Fig. 2). Although - since they do not appear to be reset by sensory inputs - the detailed analysis of alpha band oscillations is beyond the scope of the present study, in agreement with earlier studies (Bollimunta et al., 2008), an inspection of ongoing activity spectrograms revealed a different laminar amplitude distribution than that of theta or delta band oscillations. This further suggests a different role for these oscillations in sensory processing than that played by the supragranularly-weighted oscillatory hierarchy of delta-theta-gamma rhythms (Lakatos et al., 2005, 2007, 2008).
In this study, we compared responses to preferred and non-preferred modality stimuli in attended and ignored conditions in primary auditory and visual cortices. We have shown that non-preferred modality stimuli - specifically visual stimuli in A1 and auditory stimuli in V1 - result in stimulus related responses, but only when they are attended. The responses to these stimuli were concentrated in the supragranular layers, and were characterized by increased phase locking accompanied by no significant CSD amplitude or MUA change, indicating that these responses reflect pure phase reset, as opposed to the evoked type responses related to most preferred modality stimuli. This notion was further supported by the fact that significant phase locking occurred only in frequency bands with prominent peaks in the baseline spectrogram (dominant ongoing oscillations) of both A1 and V1. Preferred modality stimuli resulted in evoked type responses, and we found that there was a significant difference in phase locking in the dominant ongoing oscillatory frequency bands between the two attentional conditions. This provides indirect evidence that inputs related to these stimuli reset ongoing oscillations as well, but only in the case when they are attended. Overall, our findings suggest that stimuli reset the ongoing electrophysiological context independent of their modality, and that their ability to cause phase reset can be controlled by attention.
It now seems clear that two types of neuronal responses contribute to “traditional” event related potential (ERP) waveforms obtained by averaging across signals related to repetitions of events: the “evoked” type and the “phase reset” type (for reviews see Makeig et al., 2004; Shah et al., 2004; Klimesch et al., 2007; Sauseng et al., 2007). Evoked type electrophysiological activity is additive in nature: stimulus processing results in a phasic increase of postsynaptic activity in neuronal ensembles, which presents as a complex waveform that is absent in the ongoing neuronal activity preceding stimulation. Although at one point it was widely believed that this evoked waveform is independent of, and simply added to, the ongoing activity, we now know that ongoing neuronal oscillations have an important effect on evoked responses even in primary cortical areas (Steriade et al., 1993; Azouz and Gray, 1999; Fiser et al., 2004; Lakatos et al., 2005; 2007; 2008). In contrast to evoked type neuronal activity, phase reset simply reorganizes the phase of ongoing neuronal oscillations, without increasing their amplitude (Makeig et al., 2004; Shah et al., 2004). This results in organized post-stimulus activity, or phase locked oscillations, which can be detected in stimulus locked averages of the electrophysiological signal.
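The signature of pure phase reset described above can be illustrated with a toy simulation. The sketch below is not the analysis pipeline of this study: the trial count, the 5 Hz oscillation, the time axis and all variable names are hypothetical. It also assumes the common definition of inter-trial coherence (ITC) as the length of the mean unit phase vector across trials. Every simulated trial has unit amplitude before and after the stimulus, and only the phase is realigned at stimulus onset.

```python
import numpy as np

def itc(phases):
    """ITC: length of the mean unit phase vector across trials (axis 0)."""
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

rng = np.random.default_rng(0)
n_trials, freq_hz = 200, 5.0           # hypothetical values, for illustration only
t = np.arange(-200, 400) / 1000.0      # time in seconds, stimulus onset at t = 0

# Ongoing oscillation: unit amplitude, random starting phase on every trial.
phi0 = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, 1))
ongoing_phase = phi0 + 2.0 * np.pi * freq_hz * t

# Pure phase reset: from stimulus onset, all trials continue from the same phase.
phase = np.where(t >= 0, 2.0 * np.pi * freq_hz * t, ongoing_phase)
signal = np.cos(phase)                 # amplitude is never changed

pre, post = t < 0, t >= 0
itc_pre = itc(phase[:, pre]).mean()    # low: phases are random across trials
itc_post = itc(phase[:, post]).mean()  # 1 by construction: phases are aligned

# Single-trial power is the same before and after the reset ...
power_pre = np.mean(signal[:, pre] ** 2)
power_post = np.mean(signal[:, post] ** 2)

# ... yet the stimulus locked average shows an oscillation only after onset.
erp = signal.mean(axis=0)

print(f"ITC pre {itc_pre:.2f}, post {itc_post:.2f}")
print(f"power pre {power_pre:.2f}, post {power_post:.2f}")
print(f"peak |average| pre {np.abs(erp[pre]).max():.2f}, post {np.abs(erp[post]).max():.2f}")
```

In this sketch the averaged waveform appears after stimulus onset even though no activity was added to any single trial, which is the property that separates phase reset from an additive evoked response.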
Our recording (intracortical electrodes) and signal analysis (CSD) techniques enable the investigation of evoked and phase reset activity with minimal volume conduction on the small neuronal ensemble level (Mitzdorf, 1985; Schroeder et al., 1998). If the firing of the neuronal ensemble is recorded concurrently, as in this study in the form of MUA, the two mechanisms that generate the ERP appear to be functionally different: while evoked responses lead to increased firing, phase reset does not lead to an apparent increase of firing in the neuronal ensemble. This is because phase reset only reorganizes ongoing oscillations - which are mainly subthreshold membrane potential fluctuations - without increasing their amplitude.
The data presented here show that in primary auditory and visual cortices, attended preferred modality stimuli result in mixed evoked - phase reset type responses, whereas attended non-preferred modality stimuli result primarily in phase reset type responses. While it seems that multisensory influences on higher order cortical areas can be conveyed by both evoked type mechanisms (for a review see Ghazanfar and Schroeder 2006) and subthreshold membrane potential changes (Allman and Meredith, 2007; Meredith et al., 2009), primary cortices seem to be modulated primarily at a subthreshold level by inputs related to non-preferred modality stimuli (Lakatos et al., 2007; 2008; Ghazanfar et al., 2008; Kayser et al., 2008). This means that although primary cortices can be influenced by “low level” behaviorally relevant properties of non-preferred modality stimuli (e.g., timing and rudimentary spatial information), this effect simply influences the probability that a preferred modality input will drive action potentials. Thus, the phase reset of ongoing oscillations is an important modulatory mechanism in the neural system, but it would not lead to percepts on its own, except in extreme cases, such as that in which subjects are highly trained on a specific set of discriminative stimuli (e.g. Brosch et al., 2005). To summarize, our findings suggest that evoked and oscillatory phase reset type mechanisms of event related potential generation are functionally different, with the former related to driving inputs transmitting sensory specific information and the latter related to modulatory influences.
Using the same paradigm as in the present study, we have previously shown that low frequency (delta band) oscillations in V1 can entrain to streams of rhythmically presented visual or auditory stimuli independent of stimulus modality, and that entrainment is controlled by attention (Lakatos et al., 2008). A prerequisite for entrainment is phase reset, which ensures that ongoing oscillations can be realigned to match the temporal structure of the attended stimulus stream to allow for a predictive rather than reactive processing of rhythmic input patterns (Large and Jones, 1999; Nobre et al., 2007; Lakatos et al., 2008; Schroeder and Lakatos 2009). Since there are oscillations in multiple frequency bands, one of our main questions in the present study was whether these oscillations are similarly affected by attended stimuli independent of modality, or whether the effect we described in V1 depends solely on stimulus rhythm. Additionally, we wanted to know whether we would find reciprocal effects in primary auditory cortex. Based on our results the answer to both questions is yes: we found that attended stimuli reset ongoing oscillations in multiple (dominant) frequency bands in both V1 and A1, while ignored stimuli do not seem to reset ongoing oscillations even if they are of preferred modality. This suggests that cross-modal timing influences can be exerted without the presence of any rhythm, and thus even randomly appearing stimuli can modulate the oscillatory electrophysiological context. The fact that phase reset occurs in multiple frequency bands covering a wide range of frequencies indicates that phase reset affects ongoing oscillations unrelated to the presentation frequency. At the same time, it implies that entrainment is possible on multiple time scales if attended stimuli are rhythmic.
We know that the impact of a sensory stimulus related input on a neuronal population depends on the phase of its dominant ongoing oscillations (Steriade et al., 1993; Azouz and Gray, 1999; Fiser et al., 2004; Lakatos et al., 2005; 2008). What enables the use of these ongoing oscillations as instruments in perceptual processes is phase reset, which takes the guesswork out of the relationship between oscillatory phase and sensory input. If we identify the frequency of an oscillation and the phase it is reset to by some external (stimulus related) or internal (motor/attention related) event, we can predict when temporal windows of high and low excitability will occur, and thus the effect of reset oscillations on sensory inputs occurring at specific times relative to the reset. Although we cannot be certain with our methods, the onsets of phase reset and evoked responses in the supragranular layers probably overlap in the case of preferred modality stimuli, meaning that the effect of reset phase on the evoked activity would be instantaneous. Evaluating this issue will require additional experimentation. In the case of multisensory stimuli, the temporal relationship of different modality inputs is key in determining whether they facilitate or impede each other (Lakatos et al., 2007). We speculate that if attention is directed towards a multisensory object, the brain is able to modulate ongoing neuronal oscillations so that their reset results in the most beneficial multisensory interaction. This dynamic mechanism could explain the adaptive recalibration of perceptual simultaneity that has been shown to occur in human behavioral experiments (Fujisaki et al., 2004; Vroomen et al., 2004). It is important to note that the temporal window(s) of integration can be very diverse in different cortical structures (compare e.g. Lakatos et al., 2007 and Chandrasekaran and Ghazanfar 2009). This suggests that the mechanisms of multisensory interactions in lower and higher order sensory cortical areas may operate via different principles, a possibility which remains to be tested.
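The prediction of excitability windows from a known reset phase and frequency can be sketched in a few lines. This is an idealized illustration, not a method used in this study. It assumes a perfectly sinusoidal oscillation of fixed frequency, and the function name, the 2 Hz frequency and the choice of π as the "high excitability" phase are all hypothetical.

```python
import numpy as np

def excitability_peak_times(reset_phase, freq_hz, target_phase, t_max):
    """Times (s) after a reset at which the oscillation reaches target_phase.

    Assumes an ideal oscillation whose phase advances as
    reset_phase + 2*pi*freq_hz*t; real oscillations drift in frequency.
    """
    period = 1.0 / freq_hz
    # Fraction of a cycle between the reset phase and the target phase.
    lag_cycles = ((target_phase - reset_phase) % (2.0 * np.pi)) / (2.0 * np.pi)
    return np.arange(lag_cycles * period, t_max, period)

# Hypothetical example: a 2 Hz (delta band) oscillation reset to phase 0,
# with the high excitability phase placed at pi purely as an assumption.
peaks = excitability_peak_times(reset_phase=0.0, freq_hz=2.0,
                                target_phase=np.pi, t_max=1.2)
print(peaks)  # [0.25 0.75]
```

Under these assumptions, an input arriving 250 ms or 750 ms after the resetting event would fall in a high excitability window, whereas one arriving at 500 ms would fall in the opposite phase.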
The effect of oscillatory phase reset on perception will naturally be a summation of outputs from neuronal ensembles that are differentially modulated by oscillatory phase reset. As an example, it has been shown that while contralateral somatosensory stimuli reset ongoing oscillatory activity of the auditory cortex to its high excitability phase, ipsilateral stimuli cause reset to the low excitability phase (Lakatos et al., 2007). Thus, a somatosensory stimulus applied to the right hand will enhance auditory responses in the left and suppress responses in the right primary auditory cortex. Since auditory location seems to be coded at least in part by firing rate in A1 (Werner-Reiss and Groh 2008), this can modulate the perceived location or apparent motion of auditory stimuli (Soto Faraco et al., 2004; Sanabria et al., 2005). Similar mechanisms could explain the tactile (Caclin et al., 2002) and visual capture of attention, the so-called ‘ventriloquism effect’ (Bertelson and Radeau, 1981).
Besides enhancement and suppression of responses, another obvious role of oscillatory phase reset could be that of adjusting the timing of neuronal activity (Fries et al., 2005) related to specific inputs. Slightly offset stimuli might appear simultaneous if neurons from both senses fire in the same “excitatory” phase of an oscillatory cycle set by one of the stimuli. This “excitatory window” could correspond to the temporal window of perceived simultaneity (Stone et al., 2001). Since oscillations consist of rhythmically recurring high excitability (and low excitability) phases, timing of neural activity also results in its segmentation. In fact, there are studies suggesting that perception might be a discrete, periodic process that works based on snapshots taken at regular intervals (e.g. Van Rullen et al., 2007). Segmentation of modality specific evoked neuronal activity by non-preferred modality inputs via resetting of neuronal oscillations in visual cortex could be responsible for the so-called illusory flash effect: if a single brief visual flash (resulting in specific evoked activity) is accompanied by two auditory beeps (non-preferred modality inputs), the single flash is perceived as two flashes (Shams et al., 2000).
Concurrent phase reset of A1 and V1 by attended stimuli also has to result in coherent oscillations if it occurs in similar frequency ranges as shown here, which is the proposed mechanism for the dynamic cross-modal linking of distant neuronal ensembles (Fries et al., 2005; Senkowski et al., 2008a).
Our data show that non-preferred modality stimuli that are attended result in the phase reset of ongoing neuronal oscillations in primary cortical areas, and that they produce little or no modulation of the neuronal activity of primary cortices if they are ignored. We also provided indirect evidence that the same rule governs the phase reset of ongoing oscillations in the case of preferred modality stimuli. These findings support the hypothesis of the leading sense, stating that only attended - or otherwise salient - stimuli modulate the neurophysiological context. This mechanism would be of tremendous importance in crowded multisensory scenes, like a cocktail party, where we selectively focus our attention on a speaker and at the same time largely ignore other visual and auditory inputs. It has been shown that visual inputs support audiovisual speech processing in noisy environmental conditions (Sumby and Pollack 1954), and that attention to the visual modality plays an important role in this process (Senkowski et al., 2008b). In such scenarios, our visual attention is able to use oscillatory phase reset to help control the auditory neurophysiological context and shape it so that high excitability phases of oscillatory activity correspond to times when auditory stimuli from the attended source - for example an attended speaker - are likely to arrive. As opposed to this type of top-down control, a bottom-up mechanism is probably responsible for the “pip and pop” effect, where sounds make synchronously presented visual stimuli pop out from dynamically changing cluttered visual environments (Van der Burg et al., 2008a). We hypothesize that in this case the auditory stimulus resets ongoing oscillations in visual cortex, thereby boosting the processing of simultaneously occurring visual stimuli. Although in these experiments subjects were not instructed to attend to the sounds, in a follow-up study Van der Burg and colleagues found that there is an exogenous attention component related to the auditory stimuli presented in synchrony with visual ones (Van der Burg et al., 2008b), meaning that these stimuli automatically draw attention. Thus, although we did not specifically test this in the present experiments, we propose that it is stimulus salience rather than attention per se that drives the phase reset of ongoing oscillations and grants access to the neurophysiological context. This proposition is supported by multisensory studies showing that in the case of non-attended stimuli, stimulus intensity plays a crucial role in whether these stimuli can influence processing of attended modality stimuli (Bolognini et al., 2007; Diederich and Colonius 2008; Occelli et al., 2009).
In multisensory studies where attention is not controlled (e.g., Lakatos et al., 2007), it is conceivable that attention to the stimuli presented in an artificially isolated environment (recording chamber) contributes to phase reset effects. It is worth noting that some stimuli are inherently salient (e.g., looming or rapid movement in the periphery, and electrical stimuli applied to a skin surface), and thus, while attentional orienting is triggered, it is not necessarily causal in phase resetting. The situation is quite different in experiments like the present one, where subjects are easily capable of actively ignoring non-relevant stimuli that are physically homogeneous (standards) and presented repetitively, with high attentional demand placed on a different sensory channel. In fact, recent studies show that the classic McGurk illusion (McGurk and MacDonald, 1976) is severely reduced when subjects are performing an unrelated demanding visual, auditory or tactile task (Alsius et al., 2005, 2007). Our present findings strongly support the growing recognition that supramodal top-down influences do play an important role in most forms of multisensory integration. Further electrophysiological exploration is needed to determine which of the above mentioned multisensory effects can be explained solely by stimulus salience driven context-content interactions, and which are due to evoked type mechanisms of neuronal populations in multisensory areas (Stein and Stanford, 2008) that might be less susceptible to attentional effects.
Anatomical studies in monkeys (reviewed by Schroeder et al., 2003; Smiley et al., 2007; Hackett et al., 2007) outline three main routes by which non-modality specific inputs may access low level auditory and visual cortices: 1) feed-forward projections from “nonspecific” thalamic afferents or multisensory nuclei, 2) direct lateral projections from low level cortices, and 3) feedback projections from higher order multisensory regions of neocortex. Our finding that maximal phase locking in the gamma band occurs at approximately the same time in auditory and visual primary cortices for stimuli in a given modality (Fig. 3B) favors the first alternative, since both lateral and feedback routes would result in a significant delay of non-preferred modality inputs compared to preferred modality ones. Several anatomical studies separate specific and nonspecific thalamocortical pathways based on calcium binding protein expression and cortical termination pattern. Parvalbumin-expressing projections transmitting retinotopically or tonotopically specific sensory information from thalamic nuclei such as LGNd and MGNd target the middle layers of the cortex in a topographically organized manner. Non-specific inputs, that is, those that do not convey a precise, focal representation of the receptor surface, are transmitted by the calbindin-expressing neurons of the so-called “thalamic matrix,” and project more widely to the supragranular layers of cortex (Jones 1998a, 1998b). The different laminar activation patterns of evoked and phase reset responses (Fig. 1; see also Lakatos et al., 2007) link these to the specific and nonspecific input pathways, respectively. Their differing functional effects support the conclusion that evoked responses are related to driving, and phase reset is related to modulatory thalamocortical inputs.
The control of modulatory thalamocortical inputs by attention or stimulus salience must involve some form of inhibition, and since inputs from the thalamic reticular nucleus (TRN) are the most prevalent inhibitory inputs to thalamocortically projecting neurons (for a review see Guillery et al., 1998), it is reasonable to propose that the TRN plays an important role in the inhibition/disinhibition of modulatory thalamocortical projections that reset ongoing oscillatory cortical activity. The circuitry between the thalamus, TRN and cortex is ideally suited to enhance inputs related to salient stimuli while simultaneously suppressing others (for a review see Zikopoulos and Barbas 2007). In addition, the TRN receives extensive inputs from prefrontal cortical areas (Zikopoulos and Barbas 2006), which have been implicated in selective attention (for a review see Miller and Cohen, 2001), thus prefrontal pathways could convey top-down influences on the modulatory thalamocortical projections through the TRN. The TRN also links different modality nuclei of the dorsal thalamus (Crabtree et al., 1998; Crabtree and Isaac, 2002), which is a likely substrate for the non-modality specific characteristics of modulatory thalamocortical projections. This sort of connectivity could provide a mechanism for the interaction of competing transmissions from modality specific thalamic regions in a natural multisensory context, thus for the dynamic selection of the leading sense.
Our findings outline a mechanism for the control of ongoing oscillatory activity, or neurophysiological context, by phase reset related to attended inputs. This mechanism appears to be supramodal, meaning that even in primary cortical areas, the ability to cause phase reset is not constrained by the preferred or primary modality. Our results show that attention plays a key role in determining which inputs can reset ongoing oscillations. However, we propose that the dominant sensory modality, or the leading sense, is dynamically changing and is determined based on the relative salience of stimuli across modalities. By orchestrating the interplay between context and content, the phase reset of ongoing oscillations by the leading sense can enhance the perception, and in theory, can change the perceived temporal and spatial characteristics of accompanying stimuli in other modalities leading to many of the widely observed multisensory illusions.
Electrophysiological data analyzed in this study were recorded in 19 penetrations of area A1 and 25 penetrations of area V1 in 3 male macaques using linear array multi-contact electrodes (150 or 200 μm intercontact spacing). During the experiments the monkeys were performing an intermodal selective attention task, which required them to attend to and discriminate stimuli within one modality while ignoring stimuli in the other modality. The standard visual stimulus (86%), presented centrally, consisted of a 10 μs long, red light flash subtending 12 retinal degrees. Deviant stimuli (14%) differed slightly in intensity. The auditory standard stimulus (86%) was a 100 ms long 2 kHz tone (70 dB SPL); deviants (14%) differed slightly in frequency. The stimulus onset asynchrony within one modality was on average 650 ms, while that between auditory and visual stimuli was 300 ms. Before each selective attention trial block, a cueing trial block consisting of stimuli in one modality alone instructed the monkey which modality to attend to in the upcoming block.
During the experiments we continuously recorded laminar profiles of field potentials and MUA. Using the field potentials we calculated one-dimensional CSD profiles to minimize the effects of volume conduction and to estimate the net transmembrane current flow. In the present study we only analyzed electrophysiological activity related to standard stimuli, because from trial block to trial block deviants were varied in intensity (visual)/frequency (auditory), meaning that we could not pool them over trial blocks for analysis. Future studies are needed to determine whether ignored deviants result in greater phase resetting due to attentional capture. The first standard of each stimulus train was also excluded from the analysis because these stimuli are inherently salient. Details of the surgery, behavioral task, electrophysiology and data analysis are described in the Supplemental Information.
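For readers unfamiliar with one-dimensional CSD analysis, the sketch below shows a widely used three-point second spatial difference estimate. Whether this exact formula, with or without spatial smoothing, was used here is described in the Supplemental Information and is not assumed. The function name, the unit conductivity and the sign convention (sinks negative, sources positive) are assumptions made for illustration.

```python
import numpy as np

def csd_second_difference(lfp, spacing_um, sigma=1.0):
    """Estimate a 1-D CSD profile from a laminar field potential profile.

    lfp        : array (n_channels, n_samples), ordered by cortical depth
    spacing_um : intercontact spacing in micrometers (e.g. 150 or 200)
    sigma      : tissue conductivity; 1.0 gives arbitrary units (assumption)

    Returns an array (n_channels - 2, n_samples); the outermost contacts
    are lost because each estimate needs both neighbors.
    """
    lfp = np.asarray(lfp, dtype=float)
    h = spacing_um * 1e-3  # spacing in mm
    second_diff = lfp[2:] - 2.0 * lfp[1:-1] + lfp[:-2]
    return -sigma * second_diff / h ** 2

# A potential shared by all contacts (pure volume conduction) cancels out.
common = np.ones((8, 5)) * 3.0
csd_common = csd_second_difference(common, spacing_um=150)

# A potential that is quadratic in depth yields a constant CSD of -2 * sigma.
depth_mm = np.arange(8) * 0.15
quadratic = (depth_mm ** 2)[:, None] * np.ones((1, 5))
csd_quadratic = csd_second_difference(quadratic, spacing_um=150)
```

The first example shows why this estimate reduces the effects of volume conduction: any potential that is constant or changes linearly across the contacts has a second spatial difference of zero.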
We thank Dr. Ashesh D. Mehta and Dr. Istvan Ulbert for their invaluable assistance in collecting the data. Support for this work was provided by NIMH grants MH061989 and MH060358.