The effect of stimulus modulation rate on the underlying neural activity in human auditory cortex is not clear. Human studies (using both invasive and noninvasive techniques) have demonstrated that at the population level, auditory cortex follows the stimulus envelope. Here we examined the effect of stimulus modulation rate by using a rare opportunity to record both spiking activity and local field potentials (LFP) in auditory cortex of patients during repeated presentations of an audio-visual movie clip presented at normal, double, and quadruple speeds. Mean firing rate during evoked activity remained the same across speeds, and the temporal response profile of firing rate modulations at increased stimulus speeds was a linearly scaled version of the response during slower speeds. Additionally, stimulus-induced power modulation of local field potentials in the high gamma band (64–128 Hz) exhibited temporal scaling similar to that of the neuronal firing rate modulations. Our data confirm and extend previous studies in humans and anesthetized animals, supporting a model in which both firing rate and high-gamma LFP power modulations in auditory cortex follow the temporal envelope of the stimulus across different modulation rates.
An important clue to understanding the nature of auditory representations concerns the effect of stimulus duration and stimulus presentation rates on neuronal activity. Behaviorally, the perception of speech and music is relatively robust to such changes. Thus, within certain limits, human listeners can comprehend others whether they are speaking quickly or slowly. Similarly, humans can recognize a melody even when it is played at a different tempo. In such cases, both the duration of individual stimuli (e.g., the duration of each note) and the stimulus presentation rate (e.g., the number of notes per second) are varied. The effect of such changes on neuronal activity in human auditory cortex is not known.
Previous animal studies have examined alterations of the temporal envelope of species-specific vocalizations on the neural activity in primary auditory cortex of anesthetized cats and marmoset monkeys [Gehr et al., 2000; Gourevitch and Eggermont, 2007; Wang et al., 1995]. Firing rate in cat auditory cortex did not dramatically change when a 1.5 time-expanded or 0.75 time-compressed meow was presented [Gehr et al., 2000; Gourevitch and Eggermont, 2007]. In monkeys on the other hand, firing rates evoked by time-compressed and time-expanded twitter calls were lower compared with those evoked by the natural call [Wang et al., 1995].
In humans, noninvasive techniques such as functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) have been used to probe the effect of stimulus rate on activity in auditory cortex. Ahissar et al. presented subjects with speech stimuli (sentences) at different compression rates while measuring the MEG signal in auditory cortex [Ahissar et al., 2001]. In their study, speech comprehension correlated with the degree of frequency-matching and phase locking between the power spectra of the cortical MEG signal and the power spectra of the stimulus envelope. In another study, Peelle et al. reported that the fMRI signal in lateral temporal and superior temporal cortex increased when subjects listened to sentences increasingly compressed in time [Peelle et al., 2004]. Other neuroimaging studies have also demonstrated a correlation between signal strength in Heschl's gyrus and stimulus presentation rate [Binder et al., 1994; Dhankhar et al., 1997; Price et al., 1992; Zatorre and Belin, 2001]. Invasive intracranial EEG recordings in neurosurgical patients have shown that, at the population level, auditory cortex can time lock to the stimulus envelope [Brugge et al., 2008, 2009; Nourski et al., 2009]. The effect of stimulus modulation rate in human auditory cortex at the level of single cells is not yet known.
In the current study, our aim was to examine the effect of stimulus modulation rate in human auditory cortex at the neuronal and population level. To that end, we simultaneously recorded the spiking activity of single cells and the local field potentials in neurosurgical patients while they were exposed to audio-visual video clips presented at various temporal modulation rates. Our results demonstrate that spiking activity and local field potentials in the high-gamma band (64–128 Hz) track the stimulus envelope and that firing rate is preserved across all stimulus modulation rates.
We created a 9-min audio-visual clip starting with a 14-s silent period, followed by 8:40 min of an unedited audio-visual segment containing speech, music, explosions, and background noise, taken from the movie “The Good, The Bad, and The Ugly” (starting from minute 38:25 in the original film). The clip ended with 6 s of a silent grey screen. A clip of the same audio-visual sequence but at double speed was created by dropping every second frame of the movie (Adobe Premiere, Adobe Systems). The duration of this clip was 4:30 min. The sound wave of the clip was compressed in time using GoldWave's time warp filter with a search range of 50 ms and a window size of 25 ms (Goldwave, Goldwave Inc.). This filter is based on the WSOLA algorithm, which minimally affects perceived aspects of the sound such as timbre and pitch [Verhelst Wener, 1993]. In a similar fashion, a clip at quadruple speed lasting 2:15 min was prepared by applying the same scheme to the double speed clip. For the shape and power of the stimulus soundwave see Supporting Information Figures S1 and S2. For the soundwave envelope and its auto-correlation see Supporting Information Figure S3. While the speech in the normal and double speed clips was intelligible, in the quadruple speed clip it was incomprehensible. During each experimental session, each clip was presented twice in the following order: quadruple speed, double speed, and normal speed, amounting to a total of six clip presentations. This order was chosen so that the most comprehensible condition (the normal speed) was presented last and would thus minimally affect comprehension during the earlier (and faster) presentation rates. Between clip presentations there was a rest period of a few minutes. The patients’ task was to follow the plot. The normal speed data from these patients have been described in previous publications [Bitterman et al., 2008; Mukamel et al., 2005].
In three sessions with Patient 1 and another session with Patient 2, we presented various movie stimuli at normal speed either with or without the audio content. Thus, we first presented the movie without the audio (visual-only run, by turning off the volume of the speakers), and then we presented the audio-visual content of the same movie twice (by turning the volume back on). In two sessions (one with Patient 1 and one with Patient 2) we managed to record an additional visual-only run at the end. With Patient 2, the movie we used was “The Good, the Bad, and the Ugly” (same as in the Multiple Speeds experiment described above). In the first session with Patient 1 we used a 4:54 min long segment taken from the movie “Pretty Woman” (starting at time 01:02:00 h in the original film), and in the two other sessions we used a 6:00 min segment taken from the movie “Sister Act 2” (starting at time 58:30 min in the original film). To keep the patients engaged in the experiment after multiple presentations of the GBU stimulus, the movies in the control sessions were chosen by the patients.
Extracellular activity of single and multi units was obtained from two patients (patient 1: 39-year-old female; patient 2: 21-year-old male) with pharmacologically intractable epilepsy, implanted with intracranial depth electrodes to identify the seizure focus for potential surgical treatment. In both patients, the left hemisphere was dominant for language. Electrode location was based solely on clinical criteria. Electrodes were positioned bilaterally in Heschl's gyrus (see Fig. 1 for electrode location on 2D structural MRI; Table I provides Talairach coordinates). Each electrode terminated in a set of nine 40-μm platinum-iridium microwires [Fried et al., 1999]: eight active recording wires, referenced to the ninth. Signals from these microwires were recorded at 28 kHz. The raw signal was band-pass filtered between 1 Hz and 9 kHz and recorded using a 64-channel acquisition system (Neuralynx, Tucson, AZ). To detect spiking activity, the data were band-pass filtered offline between 300 and 3,000 Hz and thresholded using a cut-off of five standard deviations above the median of the filtered signal. All events passing this threshold were grouped into clusters using the superparamagnetic clustering algorithm [Quiroga et al., 2004]. The different groups were labeled as noise, single units, or multi units. Similar to [Quiroga et al., 2005], the classification between single units and multi units was done visually based on the following: (1) the average spike shape and its variance; (2) the ratio between the spike peak value and the noise level; (3) the inter-spike interval distribution of each cluster; and (4) the presence of a refractory period for the single units (that is, less than 1% of spikes with an inter-spike interval shorter than 3 ms).
To verify electrode position, CT scans following electrode implantation were coregistered to the preoperative MRI using Vitrea® (Vital Images). Patients provided written informed consent to participate in the experiments. The study was approved by and conformed to the guidelines of the Medical Institutional Review Board at UCLA. For further methodological details the reader is referred to previous publications [Fried et al., 1999; Mukamel et al., 2005; Quiroga et al., 2008].
The raw data were extracted and 60-Hz electrical noise was removed by applying a notch filter (4th-order Butterworth filter between 59.5 and 60.5 Hz; Matlab, Mathworks). No significant power at higher harmonics was observed (e.g., at 120 Hz). Artifacts 5 standard deviations above or below the signal median (e.g., due to amplifier saturation as a result of excessive movement of the subject) were detected and removed from further analysis. Such artifacts constituted on average less than 0.5% of the data in a given recording session (range 0–2% across individual recording sessions). Next, the signals were downsampled from the original 28 kHz to 1 kHz. The resulting signal was band-pass filtered into the different frequency bands mentioned in the text using a 4th-order Butterworth filter (Matlab, Mathworks).
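The artifact-rejection step can be illustrated with a minimal sketch. It assumes the threshold is applied to the absolute deviation from the median and that the population standard deviation is used; the text does not specify either detail, and the function name `artifact_mask` is hypothetical.

```python
import statistics

def artifact_mask(signal, n_sd=5.0):
    """Flag samples lying more than n_sd standard deviations from the median.

    Assumption: population standard deviation of the whole signal.
    """
    median = statistics.median(signal)
    sd = statistics.pstdev(signal)
    return [abs(value - median) > n_sd * sd for value in signal]

# Toy signal: a flat baseline with one saturated sample (hypothetical values).
toy_signal = [0.0] * 99 + [100.0]
mask = artifact_mask(toy_signal)
clean = [v for v, is_artifact in zip(toy_signal, mask) if not is_artifact]
```

Flagged samples are simply dropped here; in a real pipeline one might instead mark them as missing so that time alignment with the stimulus is preserved.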
For each cell, spike trains from the two movie presentations at normal speed were binned in windows of 200 ms and the Pearson correlation coefficient between the two binned spike trains was computed. Cells exhibiting correlation coefficients greater than 0.1 were considered responsive to the movie and taken for further analysis. This criterion was chosen as a conservative measure since even for the quadruple speed data (2:15 min = 135,000 ms) and 200-ms bins (135,000/200 = 675 independent measures), a correlation coefficient of 0.1 is still significant at a P = 0.01 level.
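The responsiveness criterion can be sketched as follows. This is an illustrative reconstruction, not the original analysis code: the function names and the toy spike times are hypothetical. The last function computes the standard t statistic for a correlation coefficient; for r = 0.1 and 675 bins it gives roughly 2.6, slightly above the large-sample two-tailed critical value of about 2.58 for P = 0.01, consistent with the statement above.

```python
import math

def bin_spike_counts(spike_times_ms, duration_ms, bin_ms=200):
    """Count spikes in consecutive, non-overlapping bins of bin_ms."""
    n_bins = duration_ms // bin_ms
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = int(t // bin_ms)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero with n independent samples."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical spike times (ms) from two repetitions of a 1-s stimulus.
run1 = [12, 50, 230, 260, 610, 830, 845, 990]
run2 = [30, 240, 255, 270, 640, 810, 870]
counts1 = bin_spike_counts(run1, 1000)   # [2, 2, 0, 1, 3]
counts2 = bin_spike_counts(run2, 1000)   # [1, 3, 0, 1, 2]
inter_run_r = pearson_r(counts1, counts2)
responsive = inter_run_r > 0.1
```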
For each neuron and each presentation rate, we recorded two spike trains corresponding to the two movie repetitions (Fig. 2A,C). Firing rates in each spike train were smoothed using a sliding square bin (steps of 1 ms) and the correlation coefficient between the two temporally smoothed spike trains was computed [Schreiber et al., 2003]. The size of the square bin ranged from 20 to 4,000 ms. Since the correlation level reached a plateau around 500 ms, we used a bin size of 500 ms for all analyses based on correlations (see below). In Figure 2C, the same procedure was applied to the rectified (absolute) value of the high-gamma band LFPs.
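A sliding square bin amounts to a moving average with a boxcar window advanced one sample at a time. The sketch below is one possible implementation; the centered window and the truncation of the window at the edges of the signal are assumptions, since the text does not describe edge handling, and the name `smooth_square` is hypothetical.

```python
def smooth_square(signal, width):
    """Moving average with a centered square window, in steps of one sample.

    Assumption: near the edges, only the part of the window that overlaps
    the signal is averaged.
    """
    n = len(signal)
    half = width // 2
    # Prefix sums let each window be evaluated in constant time.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + width)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed

# A spike train at 1-ms resolution would be smoothed with width=500;
# a 3-sample window is used here so the result is easy to inspect.
toy = smooth_square([0, 0, 3, 0, 0], 3)
```

Repeating the inter-run correlation for widths between 20 and 4,000 samples would reproduce the kind of sweep summarized in Figure 2A.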
The LFP was extracted from each electrode from which single neurons were detected (N = 5 and 7 for the first and second sessions with Patient 1, and N = 9 for one session with Patient 2; Fig. 2B). Next, the LFP was band-pass filtered into six bands: 1–4 Hz, 4–8 Hz, 8–16 Hz, 16–32 Hz, 32–64 Hz, and 64–128 Hz. For each frequency band, we took the absolute value of the LFPs and, as for the spiking activity, binned them using consecutive 200-ms windows. Finally, we computed the correlation between the first and second runs of the normal speed experiment for each LFP band.
For each neuron (N = 25 cells), we averaged the total number of spikes recorded during each stimulus presentation speed (N = 2 stimulus presentations for each speed) and divided by the average number of spikes recorded during the normal speed presentation (Fig. 3A).
The normal speed spike-trains of each neuron were divided into 20-s segments (Fig. 3E). The double speed spike trains of the same neuron were divided into the corresponding 10-s segments. Firing rate ratio for each time segment was computed by dividing the average firing rate during the two double speed spike trains by the average firing rate of the two normal speed spike trains. Repeatability across runs was measured using the Pearson correlation coefficient between the two normal speed spike trains smoothed with a 500-ms square bin (see above).
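The segment-wise firing rate ratio can be sketched as below. The function names and the toy spike times are hypothetical, and returning `None` for segments without normal speed spikes is an assumption made only to avoid division by zero.

```python
def segment_rates(spike_times_s, duration_s, segment_s):
    """Firing rate (Hz) in consecutive segments of segment_s seconds."""
    n_segments = int(duration_s // segment_s)
    counts = [0] * n_segments
    for t in spike_times_s:
        idx = int(t // segment_s)
        if 0 <= idx < n_segments:
            counts[idx] += 1
    return [c / segment_s for c in counts]

def firing_rate_ratio(normal_runs, double_runs, normal_duration_s, segment_s=20):
    """Per-segment ratio of mean double speed rate to mean normal speed rate.

    Double speed runs are cut into segments of half the length, so that
    segment k covers the same movie content at both speeds.
    """
    normal = [segment_rates(r, normal_duration_s, segment_s) for r in normal_runs]
    double = [segment_rates(r, normal_duration_s / 2, segment_s / 2) for r in double_runs]
    ratios = []
    for seg in range(len(normal[0])):
        mean_normal = sum(run[seg] for run in normal) / len(normal)
        mean_double = sum(run[seg] for run in double) / len(double)
        ratios.append(mean_double / mean_normal if mean_normal > 0 else None)
    return ratios

# Hypothetical spike times (s): two runs of a 40-s clip and its 20-s double speed version.
ratios = firing_rate_ratio(
    normal_runs=[[1, 5, 25], [2, 6, 30, 35]],
    double_runs=[[0.5, 12], [3, 15, 17]],
    normal_duration_s=40,
)
```

A ratio near 1 in a segment indicates that the firing rate, as opposed to the spike count, was the same at both speeds.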
The spike trains of double and quadruple speeds were stretched in the following manner. The original spike trains were represented as vectors of 0's and 1's at a resolution of 1 ms (1's representing the times at which a spike occurred). Each bin was duplicated in time so that both the duration and the spike count in the stretched signal are double relative to the original signal. Before calculating the correlation between signals (within speeds or across speeds), we smoothed both signals using a sliding square bin of 500 ms (see above). Since we recorded two runs for each stimulation speed, the average correlation within speeds was computed by averaging the Pearson correlation coefficient between the two runs across all cells (N = 25). On the other hand, the average correlation across speeds (e.g., speed 1 × stretched speed 2) was computed by averaging the correlations between the two speeds and two runs (1st run normal speed with 1st run stretched double speed, 1st run normal speed with 2nd run stretched double speed, 2nd run normal speed with 1st run stretched double speed, and 2nd run normal speed with 2nd run stretched double speed; N = 4 correlation values per cell × 25 cells). The same correlations (within speeds and across speeds) were computed also using the average population signal.
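The stretching procedure is simple enough to state as code. In this minimal sketch the function name `stretch_spike_train` and the toy train are hypothetical; the operation itself (repeating every 1-ms bin) follows the description above.

```python
def stretch_spike_train(train, factor=2):
    """Repeat every 1-ms bin `factor` times.

    Both the duration and the spike count of the result are `factor`
    times those of the input, as described in the text.
    """
    stretched = []
    for value in train:
        stretched.extend([value] * factor)
    return stretched

double_speed = [0, 1, 0, 0, 1]                  # toy 5-ms spike train
as_normal = stretch_spike_train(double_speed)   # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1]

# A quadruple speed train is brought to normal speed by stretching twice.
quadruple_speed = [1, 0, 0]
as_normal_from_quad = stretch_spike_train(stretch_spike_train(quadruple_speed))
```

Because the spike count doubles along with the duration, the mean firing rate of the stretched train equals that of the original, so the subsequent correlation compares temporal profiles and not overall rates.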
To compute the envelope of the stimulus sound wave and its relationship with the spiking/LFP activity, we took the absolute value of the raw sound wave and downsampled it from 41.1 to 1 kHz by taking the sum of all values in 1-ms bins.
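The envelope computation (rectify, then sum within 1-ms bins) might look like the sketch below. The function name is hypothetical, and assigning each sample to a bin by integer division of its time stamp is an assumption; when the sample rate is not a multiple of 1 kHz, bins then contain slightly different numbers of samples.

```python
def envelope_1ms(samples, sample_rate_hz):
    """Rectify the waveform and sum the values falling in each 1-ms bin.

    sample_rate_hz must be an integer number of samples per second.
    Trailing samples that do not fill a complete bin are discarded.
    """
    n_bins = (len(samples) * 1000) // sample_rate_hz
    envelope = [0.0] * n_bins
    for i, sample in enumerate(samples):
        bin_index = (i * 1000) // sample_rate_hz
        if bin_index < n_bins:
            envelope[bin_index] += abs(sample)
    return envelope

# Toy waveform sampled at 4 kHz, i.e., four samples per millisecond.
toy_envelope = envelope_1ms([1, -1, 2, -2, 3, -3, 4, -4], 4000)
```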
Soundwave envelope and the average population spike-train were smoothed with a small bin size (20 ms) and the cross-correlation between the two signals was computed. The time lag at which the cross-correlation was maximal was taken as the time latency between the population spiking activity and stimulus soundwave.
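A latency estimate of this kind can be sketched as a search for the lag that maximizes the correlation between the envelope and the delayed response. Using the Pearson coefficient at each lag (a normalized cross-correlation) and searching only non-negative lags are assumptions of this sketch; the function names and toy signals are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def best_lag(stimulus, response, max_lag):
    """Return (lag, r) maximizing the correlation of stimulus[t] with response[t + lag].

    Both signals must have the same length and the same sampling step
    (1 ms in the text), so the lag is expressed in samples.
    """
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        x = stimulus[:len(stimulus) - lag]
        y = response[lag:]
        r = pearson_r(x, y)
        if r > best[1]:
            best = (lag, r)
    return best

# Toy envelope and a response that is the same signal delayed by 2 samples.
stimulus = [0, 0, 1, 3, 1, 0, 0, 2, 0, 0, 0, 0]
response = [0, 0] + stimulus[:-2]
lag, r = best_lag(stimulus, response, max_lag=4)
```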
We took the absolute value of the normal speed high Gamma-band LFP and smoothed it with a sliding square bin of 500 ms. The double speed Gamma-band LFP was “stretched” by duplicating the value in each 1-ms bin and then smoothed in a similar fashion.
Figure 1 displays the location of electrodes on the 2D MRI images for both patients while Talairach coordinates are provided in Table I. In Patient 1, the electrodes in the left hemisphere were on the posterior-medial border between Heschl's gyrus and Planum Temporale. The electrodes in the right hemisphere were in the posterior medial portion of Heschl's gyrus, most probably in primary auditory cortex as indicated by probabilistic histological mapping [Rademacher et al., 2001]. In Patient 2, the electrode in the left hemisphere was in the anterior lateral part of Heschl's gyrus and the electrode in the right was in the anterior medial part of Heschl's gyrus. These regions are slightly anterior to where primary auditory cortex is typically located [Rademacher et al., 2001].
First, we assessed to what extent the recorded unit activity was evoked by our stimulus (see Methods). We recorded from a total of 30 cells (20 single units and 10 multi units) and 25 were responsive to the movie sound-track with an average inter-run correlation of 0.25 ± 0.14 (mean ± S.D.; Table II).
The first question we addressed was what temporal resolution captures best the sensory evoked responses given the naturalistic nature of our stimuli. To that end, we calculated for each neuron the correlation level between two repeats of the same stimulus under different levels of temporal smoothing. This was conducted separately for the normal, double, and quadruple speeds. Figure 2A shows the results for the entire population of responsive neurons. We found a highly consistent result for all three presentation speeds tested, where the correlation reached a plateau at bin sizes of ~500 ms (for complete set of spiking time courses see Supporting Information Figs. S4–S5). At the population level, responses seemed to be phase-locked to the stimulus envelope across all stimulus modulation rates used in our experiment (Supporting Information Figs. S4–S5). The Pearson correlation coefficient (r value) between the smoothed population spiking activity and stimulus envelope was 0.68 (df = 1,080, P < 10⁻⁴), 0.66 (df = 540, P < 10⁻⁴), and 0.68 (df = 270, P < 10⁻⁴) for the normal, double, and quadruple speeds, respectively. For individual neurons, the correlations were 0.19 ± 0.11, 0.21 ± 0.11, and 0.23 ± 0.12 (mean ± S.D.) and significant at the P < 10⁻⁴ level. In addition, we calculated the latency between population spiking activity and stimulus soundwave envelope. To that end, we computed the time-lag at which maximal cross-correlation between the minimally-smoothed population spiking activity and the stimulus soundwave was achieved (see Methods). The latencies and correlations across the different modulation rates were 40 ms (r = 0.17), 41 ms (r = 0.18), and 40 ms (r = 0.21) for the normal, double, and quadruple speeds, respectively.
In addition to the spiking activity, we tested what aspect of the LFP signals was evoked by our stimulus. We first filtered the LFP signals into six different frequency bands (1–4 Hz, 4–8 Hz, 8–16 Hz, 16–32 Hz, 32–64 Hz, and 64–128 Hz) and extracted the power changes in each frequency band by rectifying the filtered signals. We then assessed the level of correlation across the two normal speed runs (Fig. 2B). As shown in the graph, only LFPs at the high gamma band frequencies (64–128 Hz) displayed significant correlation across the two normal speed runs (r = 0.26 ± 0.10, P < 10⁻⁴; for complete set of gamma band LFP time courses see Supporting Information Figs. S6–S7). Next, similar to the spiking activity, we examined how the level of correlation between runs depends on the bin size of temporal smoothing (Fig. 2C). As was the case for spiking activity, sensory-evoked correlations increased with increasing temporal smoothing and reached a plateau around 500 ms. Consequently, we used 500 ms as our temporal bin size for further analysis of both spikes and the power changes in the high gamma band LFPs. The correlation between the soundwave envelope and high gamma LFP power modulations was r = 0.52, 0.55, and 0.49 for the normal, double, and quadruple speeds, respectively. The latencies and correlations between the minimally smoothed LFPs and soundwave envelope across the modulation rates were 62 ms (r = 0.26), 68 ms (r = 0.28), and 69 ms (r = 0.26).
We next examined whether neuronal firing rates were affected by the modulation rate of the stimulus. Overall the total number of spikes emitted during the entire stimulus presentation was proportional to the duration of the stimulus (Fig. 3A), indicating that firing rates were preserved and invariant to modulation rate. Thus firing rates for normal, double, and quadruple speeds were 3.46 ± 3.1, 3.45 ± 2.80, and 3.62 ± 3.11 Hz, respectively (mean ± S.D.). These differences were not significant (one-way ANOVA, F(2, 147) = 0.05, P = 0.94), suggesting that firing rate is invariant to stimulus modulation rate.
A trivial explanation for preservation of firing rates across the different stimulus modulation rates is that during most of the stimulation period the neurons simply did not respond to the stimulus and hence neuronal firing was dominated by spontaneous activity. To examine this possibility, we checked whether firing rate invariance consistently holds throughout the entire experiment. To this end, we first chunked the 9-min (4.5-min) spike-trains emitted during the normal (double) speed stimulation into shorter blocks of 20 s (10 s). Next, for each neuron, and each time segment, we compared the firing rate during normal speed stimulation with the firing rate during the corresponding double speed stimulation (Figs. 3B–D). As shown in the graphs, there was a linear correspondence (with slopes close to 1) between the firing rates during different speeds regardless of the absolute firing rate level. Most importantly, even during high levels of firing rate in both speeds, presumably evoked by the stimulus, the firing rate across stimulation speeds remained the same, indicating that firing rate invariance was sensory-driven rather than driven by spontaneous activity.
While high firing rates imply evoked activity, a more demanding test for such sensory evoked responses is the degree of reproducibility across repeated stimulation. We therefore performed a second analysis as follows: first, the spike-trains of each neuron evoked during normal (double) speed stimulation were chunked into 20 s (10 s). For each time segment we calculated (a) the degree of correlation between the two runs of the normal speed stimulation, and (b) the firing rate ratio across speeds (average firing rates during double speed stimulation divided by average firing rates during normal speed stimulation). If the firing rate invariance across speeds was merely due to noise, then we would expect it to depend on the degree of reproducibility, i.e., that during time segments with reproducible large fluctuations in the sensory-driven responses (high correlation values across the two normal speed runs), the firing rate ratio across speeds would be different than 1. Alternatively, if the firing rate during sensory-evoked responses was invariant to stimulus modulation rate, we should expect to see similar firing rates across speeds (ratio of 1) during highly reproducible time segments. Figure 3E displays the degree of firing rate invariance as a function of the reproducibility across repeated presentations. The results indicate that firing rate invariance was maintained even for highly reproducible responses, providing further support to the notion that a linear reduction in spike count across stimulation speeds was not due to spontaneous activity.
Next, we compared the distribution of inter spike intervals (ISIs) across the different speeds (Fig. 4A). An increase in the proportion of spikes with short ISIs during faster stimulation would imply an increase in the evoked instantaneous firing rates. The overall distribution of ISI across stimulus modulation rates was largely similar. Moreover, neurons tended to fire in bursts where very short ISIs (between 4 and 6 ms) were most frequent. Even when focusing on these very short ISIs we could not reveal a significant effect of speed (one-way ANOVA on the percent of spikes with ISI between 4 and 6 ms, F(2, 147) = 0.24, P = 0.78). In addition, we computed the autocorrelation of the population spike trains recorded during the different speeds (Fig. 4B). Evoked spike trains of similar duration should result in auto-correlation functions with similar width while a decrease in duration of evoked activity should result in a narrower auto-correlation function. We found a clear decrease in the width of the autocorrelation function as a function of stimulus modulation rate, suggesting that the duration of evoked responses matched the duration of the stimulus.
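The statistic entered into the ANOVA, the proportion of very short ISIs, can be computed as in this minimal sketch. The function name and the toy spike times are hypothetical, and treating both interval bounds as inclusive is an assumption.

```python
def short_isi_fraction(spike_times_ms, lo_ms=4.0, hi_ms=6.0):
    """Fraction of inter-spike intervals between lo_ms and hi_ms (inclusive).

    spike_times_ms must be sorted in ascending order.
    """
    isis = [later - earlier
            for earlier, later in zip(spike_times_ms, spike_times_ms[1:])]
    if not isis:
        return 0.0
    return sum(lo_ms <= isi <= hi_ms for isi in isis) / len(isis)

# Toy spike train (ms): two short bursts separated by longer pauses.
fraction = short_isi_fraction([0, 5, 10, 30, 34, 100])
```

Computing this fraction per cell and per run at each speed would yield the values compared across speeds.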
To further explore the notion that the duration of neural responses matched the duration of the auditory stimuli, we directly examined whether the responses to faster stimulus modulation rates could be modeled as the responses to normal speed stimulation “condensed” in time. To this end, the double speed spike-trains were first “stretched” in time so that they will be of the same length as the normal speed spike trains (i.e., from 4:30 min to 9:00 min). This procedure involved simply duplicating in time each millisecond time bin (containing 0 or 1 signifying spike occurrence) of the double speed spike-train. Thus, both the duration and the spike count of the original double speed spike train were doubled. Next, the spike trains were smoothed with a 500 ms square bin (see Fig. 2 above, and Methods) and the correlation between the smoothed normal speed and smoothed “stretched” double speed spike trains was computed.
Figure 5A depicts the smoothed population activity of the normal speed superimposed with the “stretched” population activity of the double speed. The graph presents a time segment of 120 s, while full-length time-courses are provided in Supporting Information Figure S8. As can be seen, there is a tight correspondence between the two signals (r = 0.66, df = 1,080, P < 0.01; Fig. 5B right panel). The data of the double speed and quadruple speed were compared in a similar manner and also exhibited high correspondence (r = 0.68, df = 540, P < 0.01). In addition, we compared the normal speed and quadruple speeds by applying the stretch procedure twice to the quadruple speed spike train (r = 0.59, df = 270, P < 0.01). In addition to examining population activity, we also conducted the above analysis on the spike trains of individual neurons (Fig. 5B left panel). The correlations of individual neurons were substantially reduced (although still highly significant) compared to the population responses (Fig. 5B right panel). Similarly, the high gamma LFP power modulations also exhibited strong correlation between the normal speed and double speed time courses after applying the same stretch procedure (Fig. 5C). The full-length high gamma LFP power modulation time courses for the entire experiment are provided in Supporting Information Fig. S9. The correlation and latency between the population spiking activity and the high gamma LFP power modulation were r = 0.52 and 6 ms for the normal speed, r = 0.56 and 9 ms for the double speed, and r = 0.6 and 6 ms for the quadruple speed (LFPs preceding the spikes). Finally, since neurons in auditory cortex and superior temporal gyrus have been shown to have audio-visual interactions [e.g., Ghazanfar et al., 2005; Kayser et al., 2008; Reale et al., 2007], we wanted to see whether the visual component of our stimulus could have driven our results.
Therefore, we recorded neural activity from these electrodes during movie presentation either with or without sound (i.e., audio-visual or visual-only stimulation respectively; see Methods). We recorded from a total of 34 cells in four sessions (Patient 1, three sessions; Patient 2, one session; Supporting Information Table S1). Critically, while 25/30 cells displayed significantly reproducible responses upon consecutive presentations of the audio-visual stimulus (see above), no cells displayed reproducibility between their activity during audio-visual and visual-only stimulation (r = 0.03 ± 0.03; see also Supporting Information Table S1 and Methods). This lack of reproducibility indicates that the recorded cells respond differently to audio-visual stimulation and visual-only stimulation. However it does not rule out the possibility that these cells respond in a consistent manner to the visual content of the stimulus and in a consistent, albeit different manner, when the visual content is presented simultaneously with the auditory content of the stimulus. To examine this possibility, we recorded two repetitions of the visual-only stimulation (as opposed to one repetition) in two sessions. The reproducibility between the first and second visual-only stimulation was again extremely weak (r = 0.03 ± 0.04, N = 10 cells; Supporting Information Table S1). Supporting Information Figure S10 displays the population spiking activity time courses during audio-visual and visual-only stimulation during one session.
Finally, we tested whether the LFP power exhibited consistent responses between audio-visual and visual-only stimulation. Although high gamma band LFP responses were found to be highly reproducible across repeated audio-visual stimulations (see above), correlations between audio-visual and visual-only stimulations were near zero for all frequencies including the high gamma band (Supporting Information Figs. S11–S12). Overall, both spiking activity and high gamma band LFP power indicate that, in the regions we recorded from, responses were predominantly evoked by the auditory content of our stimuli.
In the current study, we recorded extra-cellular neural activity and LFPs in human auditory cortex while presenting ecologically relevant auditory stimuli at varying modulation rates. Our main finding is that firing rate was invariant to the modulation rate of the stimulus (see Fig. 3). The instantaneous firing rate remained similar across speeds (Fig. 4A) despite the fact that the duration of auditory responses was shortened (Fig. 4B). Furthermore, the temporal profile of firing rate during increased stimulus modulation rate was a linearly scaled version of the normal speed spike trains (see Fig. 5). These results support a model in which firing rates are preserved with increasing modulation rate of the stimulus, entailing a proportional reduction in both the number of spikes and the duration of stimulus-evoked responses. Firing rate tracked the stimulus envelope (at least for the speeds tested in the current study). Furthermore, our analysis of LFPs suggests that only power modulations in the high gamma frequency band exhibited evoked responses to our stimulus (Fig. 2B). Similar to spike firing rate, the temporal profile of LFP activation during the faster stimulus presentations was also a linear scaling of the slower speeds (Fig. 5B, and Supporting Information Fig. S9). Both spike and LFP responses were predominantly evoked by the auditory content of our audio-visual stimulus (Supporting Information Table S1 and Figs. S10–S12).
There is a growing body of research pointing to multiplicity of auditory areas in the human brain [e.g., Formisano et al., 2008; Howard et al., 2000; Morosan et al., 2001]. It should be realized that single unit data from human auditory cortex are extremely rare and electrode placement is strictly according to clinical criteria. Therefore it was unfeasible to address this complexity in the current study. Furthermore, the correlation between macro-anatomical landmarks and functional boundaries in auditory cortex is limited [Leonard et al., 1998; Morosan et al., 2001]. Our estimates, based on comparison with probabilistic maps derived from postmortem human brains, indicate that while in Patient 1 the recordings were probably localized in primary auditory cortex (A1), in Patient 2 they were slightly anterior. Although differences in latencies, amplitude and phase tracking ability have been demonstrated between antero-lateral and postero-medial aspects of auditory cortex [Brugge et al., 2008, 2009; Liegeois-Chauvel et al., 2004], in our limited sample we did not see marked differences between the two patients. The response latencies in our dataset were a bit long (~40 ms) suggesting that our recordings were not in primary auditory cortex. Previous animal studies have demonstrated modulation of spiking activity and LFP in primary auditory cortex by visual [Ghazanfar et al., 2005; Kayser et al., 2008] and somatosensory [Lakatos et al., 2007] stimuli. Such multisensory modulations were more prevalent in belt regions compared with A1 [Ghazanfar et al., 2005; Kayser et al., 2008]. In the current dataset, the neural responses were predominantly evoked by the auditory content of our audio-visual stimulus (Supporting Information Table S1 and Figs. S10–S12) thus suggesting that our recordings were closer to A1 than belt regions.
Previous studies have demonstrated precise phase-locking of local fields to the stimulus envelope [Liegeois-Chauvel et al., 2004; Steinschneider et al., 1999], and of action potentials to tone stimuli [DeWeese et al., 2003]. The effect of increased modulation rate of auditory stimuli has previously been explored using speech and other species-specific vocalizations. Consistent with previous studies in anesthetized cats, we found that firing rates were invariant across different stimulus modulation rates [Gehr et al., 2000; Gourevitch and Eggermont, 2007]. Likewise, in anesthetized monkeys, firing rate has been reported to track the soundwave envelope of the stimulus [e.g., Nagarajan et al., 2002; Wang et al., 1995]. In humans, the MEG signal in auditory cortex has been reported to track the soundwave envelope of sentences, and this tracking ability correlated with sentence comprehension [Ahissar et al., 2001]. Recently, it has been shown that event-related band power modulations in the high frequencies (equivalent to our high gamma LFPs) and average evoked potentials in core auditory cortex can track the auditory speech envelope even at rates too fast for comprehension [Nourski et al., 2009]. These results suggest that envelope tracking ability in core auditory cortex is not a limiting factor in speech comprehension. Our LFP results support these findings by demonstrating stimulus envelope tracking even at quadruple speed, where speech comprehension was absent. Furthermore, we extend these findings to the single unit level.
While our study focused on firing rates and power modulations of the LFP signals, other electrophysiological measures in auditory cortex have been demonstrated to play an important role in auditory perception. The phase of MEG signals in human auditory cortex has been demonstrated to track pitch contour [Patel and Balaban, 2000]. Furthermore, it has been shown that the MEG signal phase in the theta band reliably tracks spoken sentences and can be used to discriminate between different sentences even after 50% compression [Luo and Poeppel, 2007]. More recently, it has been shown in alert monkeys that the phase of the LFP signal at low frequencies carries additional and complementary information about the stimulus compared with firing rates [Kayser et al., 2009]. An additional physiological measure that has been shown to correlate with auditory signals is the average evoked potential [AEP; Nourski et al., 2009]. Because our dataset contained only two repetitions of each stimulus, and both measures require averaging across many trials, it is difficult to robustly assess the degree to which the LFP signal phase or the AEP in our experiments tracked the auditory signal.
The current results confirm and extend previous reports pointing to the gamma band as an index of global firing rate comodulation in auditory cortex [Nir et al., 2007; Rasch et al., 2008; Steinschneider et al., 2008]. The fact that the temporal profile of gamma band modulation during the faster stimulus presentations was a linearly scaled version of the profile at slower speeds (Fig. 5 and Supporting Information S9) provides further support for the robustness of the correlation between gamma band LFP power modulations and population spiking activity. One potential concern is that gamma band power modulations are inevitably contaminated by individual spikes. However, we have previously shown that robust correlations between firing rates and gamma power modulation are preserved even upon removal of spikes from the LFP trace [Nir et al., 2007].
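For readers unfamiliar with band-limited power modulations, the following is a minimal sketch of one common way to obtain a high gamma (64–128 Hz) power envelope from a sampled trace. It is an assumption for illustration and not necessarily the procedure used in this study: it uses a brick-wall FFT band-pass combined with the analytic signal, a synthetic test signal, a hypothetical sampling rate `fs`, and a hypothetical function name `band_power_envelope`.

```python
import numpy as np

def band_power_envelope(lfp, fs, f_lo=64.0, f_hi=128.0):
    """Instantaneous power of `lfp` within [f_lo, f_hi] Hz.

    Keeping only the positive-frequency FFT bins inside the band (and
    doubling them) yields the analytic band-limited signal; its squared
    magnitude is the power envelope. Assumes f_lo > 0 and f_hi < fs / 2.
    """
    lfp = np.asarray(lfp, dtype=float)
    spectrum = np.fft.fft(lfp)
    freqs = np.fft.fftfreq(lfp.size, d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)   # positive frequencies only
    analytic = np.fft.ifft(2.0 * spectrum * mask)
    return np.abs(analytic) ** 2

# Synthetic trace (not patient data): a 90 Hz carrier whose amplitude is
# modulated at 2 Hz, plus a 10 Hz component outside the band.
fs = 1000.0                                    # hypothetical sampling rate, Hz
t = np.arange(2000) / fs
env = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)    # slow "stimulus-like" envelope
lfp = env * np.sin(2 * np.pi * 90 * t) + 2.0 * np.sin(2 * np.pi * 10 * t)

power = band_power_envelope(lfp, fs)           # follows env ** 2
```

In this idealized case the recovered power follows the squared modulation envelope and ignores the out-of-band 10 Hz component. A practical analysis would typically use a smoother filter and handle edge effects.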
It could be argued that the observed firing rate invariance to stimulus speed may be due simply to weak auditory responses, i.e., that spontaneous ongoing activity unrelated to the stimuli may have dominated our recorded spike trains. However, our results are inconsistent with such an account. First, we find high correlations between repeats of the same stimulus (e.g., Fig. 2 and Supporting Information Figs. 4 and 5), which clearly indicate robust and reliable sensory evoked responses to our stimuli over extended time periods. Second, at the population level, the responses to normal speed presentation are strikingly similar to the “stretched” responses to faster presentations (see Fig. 5), further demonstrating that the auditory activity is evoked by the stimuli to a large extent. Third, we found that firing rate invariance was maintained even in shorter time segments exhibiting high firing rates at both speeds (Fig. 3B–D). Fourth, firing rate invariance was also maintained for the most reproducible (evoked) time segments. Fifth, the normal speed data from one patient have previously been shown to correlate strongly with auditory evoked fMRI responses of healthy subjects [Mukamel et al., 2005]. Finally, careful analysis of the response properties of these neurons revealed an exquisite and highly complex auditory selectivity also in shorter responses [Bitterman et al., 2008]. Thus, the firing rate invariance we find in our data cannot be attributed simply to a low signal-to-noise ratio in the sensory responses. The average firing rate we recorded (~3.5 Hz) is compatible with previous reports in humans [~3.7 Hz; Howard et al., 1996]. Since the pitch of the stimulus was conserved by our compression algorithm, it could be that the invariance of firing rate and high gamma band LFP power modulations is related to coding this attribute of the stimulus across the different stimulus modulation rates. Indeed, the high gamma LFP band in the regions we recorded from has been shown to correlate with pitch salience [Griffiths et al., 2009].
Our results show that a large fraction of the response was dominated by modulations in firing rates over relatively large temporal bins in which the individual action potentials were not precisely time locked to the auditory stimulus. Such long temporal windows have been suggested to be relevant for encoding acoustic features such as syllables [Poeppel, 2003]. Whether word comprehension relies on the degree of phase-locking of these neurons to the stimulus remains an open question, since at quadruple speed the speech was not intelligible but the musical theme was. Additional experiments will be needed to address this issue.
The current data are compatible with a previous imaging study by Poldrack et al., who showed that the fMRI signal in superior temporal gyrus/planum temporale decreased linearly with speech compression, reflecting higher stimulus modulation rates. In that fMRI study, block duration and inter-stimulus interval were kept constant while individual sentences were compressed at different levels, and the fMRI signal decreased with increased stimulus compression. Similarly, in the current study we show at the single neuron level a linear decrease in spike count with increased stimulus modulation rate (higher compression level). This correspondence between the current dataset and the fMRI data described by Poldrack et al. supports our previous finding that the fMRI signal in auditory cortex correlates both with gamma band LFP power and with the underlying spiking activity [Mukamel et al., 2005; Nir et al., 2007].
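The link between rate invariance and spike count is simple arithmetic, sketched below under stated assumptions. If the mean firing rate is unchanged and the evoked response lasts only as long as the compressed stimulus, the expected spike count scales linearly with the compressed duration. The 60 s clip duration is a hypothetical value chosen for illustration, and `expected_spike_count` is a hypothetical helper, not part of the study's analysis.

```python
def expected_spike_count(rate_hz, duration_s, speed):
    """Expected spikes when the mean rate is unchanged and the response
    lasts duration_s / speed seconds (speed = 1, 2, or 4)."""
    return rate_hz * duration_s / speed

# Illustrative numbers: the ~3.5 Hz mean rate reported in the text and a
# hypothetical 60 s clip at normal speed.
counts = {s: expected_spike_count(3.5, 60.0, s) for s in (1, 2, 4)}
# counts == {1: 210.0, 2: 105.0, 4: 52.5}
```

Under these assumptions, doubling the speed halves the expected count while the rate (count divided by duration) stays at 3.5 Hz.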
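Our results demonstrate the following: (1) mean firing rate in auditory cortex was invariant to stimulus modulation rate; (2) the temporal profile of firing rate modulations at increased stimulus speeds was a linearly scaled version of the response at normal speed; and (3) power modulations of the LFP in the high gamma band exhibited temporal scaling similar to that of the firing rate modulations.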
These results suggest that in human auditory cortex, temporal modulations in neural firing rate and LFP power in the high gamma frequency band (64–128 Hz) scale in time to accommodate fluctuations in stimulus modulation rate during natural audition.
The authors thank the patients for their cooperation in participating in the experiments. They also thank I. Nelken for fruitful comments on the manuscript; E. Ho, E. Behnke, and T. Fields for technical assistance; K. Upchurch and N. Solomon for help with anatomical localization of electrodes; and I. Wainwright and B. Salaz for administrative help.
Contract grant sponsors: NINDS; Human Frontiers Science Program Organization (HFSPO)
Additional Supporting Information may be found in the online version of this article.