Just-noticeable differences of physical parameters are often limited by the resolution of the peripheral sensory apparatus. Thus, two-point discrimination in vision is limited by the size of individual photoreceptors. Frequency selectivity is a basic property of neurons in the mammalian auditory pathway1,2. However, just-noticeable differences of frequency are substantially smaller than the bandwidth of the peripheral sensors3. Here we report that frequency tuning in single neurons recorded from human auditory cortex in response to random-chord stimuli is far narrower than that typically described in any other mammalian species (besides bats), and substantially exceeds that attributed to the human auditory periphery. Interestingly, simple spectral filter models failed to predict the neuronal responses to natural stimuli, including speech and music. Thus, natural sounds engage additional processing mechanisms beyond the exquisite frequency tuning probed by the random-chord stimuli.
Sounds are decomposed into different frequency bands by the auditory periphery. Tonotopic (‘by frequency’) organization is preserved throughout the auditory pathway, at least up to and including primary auditory cortex. In vision and somatosensation, the resolution of the peripheral sensors to a large degree determines overall behavioural discrimination capabilities. However, in the auditory system, frequency just-noticeable differences in well-trained subjects may be 30 times smaller than the presumed bandwidth of the peripheral filters (‘critical bands’, typically about a sixth of an octave in humans, as measured in psychoacoustical tests). Electrophysiological correlates of critical bands have been suggested4–6, and frequency just-noticeable differences can be derived by integrating information over a large population of neurons7, but there are currently no reports of a significant population of single neurons the bandwidth of which corresponds to the behavioural just-noticeable differences. Does the fine frequency resolution expressed behaviourally have an explicit neural representation? If so, can this fine frequency resolution explain the response patterns to complex sounds?
Responses of neurons in human auditory cortex were recorded from four patients with intractable epilepsy monitored with intracranial depth electrodes to identify seizure foci for potential surgical treatment8. Using clinical criteria, electrodes were implanted bilaterally in the transverse gyri of Heschl, loci of the auditory cortex (see Methods). Patients were presented with artificial random-chord stimuli at a resolution of six tones per octave (two patients) or 18 tones per octave (one patient), and with segments from the popular English-language western film “The Good, the Bad and the Ugly” (three patients, see Methods). Thus, for many neurons, the stimulus ensemble included both artificial stimuli and more structured stimuli. The artificial stimuli were designed to sample evenly the spectral range of the movie soundtrack. Results are based on 95 units recorded in four patients.
Figure 1 displays raster responses of one unit to the different frequencies in the six-tones-per-octave random-chord stimulus. Each frequency appeared simultaneously with two other frequencies selected essentially randomly. Only one of the 41 possible frequencies elicited excitatory responses in this unit. Furthermore, when a tone burst of that frequency appeared in the stimulus, a sustained response outlasting tone duration was elicited with high reliability. The lack of excitatory response to the two adjacent frequencies implies that this unit was more selective than the frequency resolution of the stimulus (six tones per octave).
Of 31 units from the two patients presented with the six-tones-per-octave random-chord stimulus, 27 had a narrow, well-circumscribed frequency response area. About half (14/31) showed reliable responses to tone bursts at a single frequency, with no consistent excitatory response to any other frequency. Thirteen units responded to two to three adjacent frequencies. The rest (4/31) exhibited more complex responses. The resolution of six tones per octave was thus too coarse directly to measure the spectral bandwidth of most units.
A high-resolution random-chord stimulus with 18 tones per octave was presented to a third patient. Of 16 units recorded in this patient, 14 exhibited a highly elevated firing rate in response to a single frequency, with additional weaker, although significant, responses to only one or two adjacent frequencies. The average bandwidth of these units can be conservatively estimated at about a twelfth of an octave, in agreement with the results presented above (Fig. 2a). Figure 2b displays typical spectro-temporal receptive fields (called ‘artificial STRFs’ below) derived from responses to the random-chord stimuli by spike-triggered averaging. The best frequencies, defined as the frequency that elicited maximal response, ranged from 250 Hz to 2 kHz in this population (Fig. 2c). It is generally accepted that the frequency tuning curve of the auditory periphery in humans has a width of about a sixth of an octave3. Therefore, when presented with random chords, the great majority of auditory cortical neurons showed substantially better frequency selectivity than the auditory nerve.
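Spike-triggered averaging of the kind used for the artificial STRFs can be sketched as follows. This is a minimal illustration only, assuming the random-chord stimulus is coded as a binary time-by-frequency matrix and the response as spike counts per time bin. The function name, the bin layout and the simulated unit are hypothetical and are not taken from the recordings.

```python
import numpy as np

def spike_triggered_average(stim, spikes, n_lags):
    """Mean stimulus preceding each spike, relative to the overall stimulus mean.

    stim   : (n_bins, n_freqs) array, 1 where a tone of that frequency is on
    spikes : (n_bins,) spike counts per time bin
    Returns an (n_lags, n_freqs) array; row k is the stimulus k bins before a spike.
    """
    stim = np.asarray(stim, dtype=float)
    spikes = np.asarray(spikes, dtype=float)
    n_bins, n_freqs = stim.shape
    baseline = stim.mean(axis=0)
    strf = np.zeros((n_lags, n_freqs))
    for lag in range(n_lags):
        weights = spikes[lag:]          # responses from bin `lag` onwards
        total = weights.sum()
        if total > 0:
            # Spike-weighted mean of the stimulus `lag` bins earlier
            strf[lag] = weights @ stim[:n_bins - lag] / total - baseline
    return strf

# Simulated data (hypothetical): 41 frequencies, three tones per chord.
rng = np.random.default_rng(0)
n_bins, n_freqs = 2000, 41
stim = np.zeros((n_bins, n_freqs))
for t in range(n_bins):
    stim[t, rng.choice(n_freqs, size=3, replace=False)] = 1.0

# Hypothetical unit: fires one bin after frequency index 20,
# plus sparse background spikes.
spikes = rng.poisson(0.02, size=n_bins)
spikes[1:] += stim[:-1, 20].astype(int)

strf = spike_triggered_average(stim, spikes, n_lags=5)
best_lag, best_freq = np.unravel_index(np.argmax(strf), strf.shape)
```

In this simulation the peak of the estimated STRF falls at the frequency and latency built into the simulated unit, which is the sense in which a single narrow peak in an artificial STRF indicates sharp tuning.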
The frequency discrimination performance based on responses in single trials was estimated using receiver operating characteristic (ROC) analysis. We compared the empirical spike count distributions elicited by the different frequencies and determined the lowest discrimination threshold for each of the 47 units tested with the random-chord stimuli. Performance was quantified by the probability of correct decision in a two-interval, two-alternative forced choice test. Discrimination threshold was set at 70.7%, as typically done in auditory psychophysics. In more than 60% of the excitatory cells (25/42) discrimination was above threshold for the smallest possible frequency difference tested, the spectral resolution of the stimulus (20/27 units tested with six tones per octave and 5/15 units tested with 18 tones per octave; see for example Fig. 3).
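The link between the ROC analysis and forced-choice performance can be sketched as follows. The probability of a correct decision in a two-interval, two-alternative forced choice test equals the area under the ROC curve, which for empirical spike counts is the probability that a count drawn from one distribution exceeds a count drawn from the other, with ties split evenly. The function names and the example counts are hypothetical, and allowing the decision rule to favour either direction is an assumption of this sketch.

```python
import numpy as np

def two_afc_probability_correct(counts_a, counts_b):
    """Area under the ROC curve for two sets of single-trial spike counts.

    Equals P(a > b) + 0.5 * P(a == b) over all pairs of trials.
    """
    a = np.asarray(counts_a)[:, None]
    b = np.asarray(counts_b)[None, :]
    return float(np.mean(a > b) + 0.5 * np.mean(a == b))

def is_discriminable(counts_a, counts_b, criterion=0.707):
    """True if an ideal observer of this unit reaches the 70.7% criterion."""
    p = two_afc_probability_correct(counts_a, counts_b)
    # Assumption: the observer may use either direction of the difference.
    return max(p, 1.0 - p) >= criterion

# Hypothetical spike counts for the best frequency and its neighbour.
counts_best = [4, 5, 6, 5, 7, 6]
counts_adjacent = [0, 1, 0, 2, 1, 0]

p_distinct = two_afc_probability_correct(counts_best, counts_adjacent)  # 1.0
p_identical = two_afc_probability_correct([1, 2], [1, 2])               # 0.5
```

A unit whose count distributions do not overlap for two adjacent frequencies reaches perfect performance, whereas identical distributions give chance performance (50%).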
For these units, we linearly interpolated spike count distributions to simulate possible distributions at intermediate frequencies that were not actually tested (see Methods). Thresholds were again estimated by the smallest frequency interval that could be discriminated using these intermediate distributions. These thresholds are overestimates, because linear interpolation gives only a lower bound on the maximum slopes of the frequency response curves. Even so, this procedure revealed units that had discrimination thresholds that matched and even exceeded the behavioural performance of naive human subjects9 (Fig. 3e).
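One possible reading of the interpolation procedure is sketched below. It assumes that an intermediate distribution is a weighted mixture of the two measured spike count distributions. That assumption, the function names and the example distributions are all hypothetical; the procedure actually used is described in the Methods.

```python
import numpy as np

def auc_from_pmfs(p_a, p_b):
    """P(count_b > count_a) + 0.5 * P(tie) for two distributions over counts 0..K."""
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    joint = np.outer(p_a, p_b)              # joint[i, j] = P(a = i) * P(b = j)
    greater = np.triu(joint, k=1).sum()     # entries with j > i
    ties = np.trace(joint)                  # entries with j == i
    return float(greater + 0.5 * ties)

def interpolated_threshold(p_ref, p_next, step_octaves, criterion=0.707, n_steps=100):
    """Smallest interpolated interval (in octaves) that reaches the criterion.

    Assumption: the distribution a fraction alpha of the way between two
    tested frequencies is (1 - alpha) * p_ref + alpha * p_next.
    """
    p_ref = np.asarray(p_ref, dtype=float)
    p_next = np.asarray(p_next, dtype=float)
    for k in range(1, n_steps + 1):
        alpha = k / n_steps
        p_mid = (1 - alpha) * p_ref + alpha * p_next
        if auc_from_pmfs(p_ref, p_mid) >= criterion:
            return alpha * step_octaves
    return None  # not discriminable even at the full tested step

# Hypothetical spike count distributions (counts 0..3) at two adjacent
# frequencies of the six-tones-per-octave stimulus.
p_ref = [0.8, 0.2, 0.0, 0.0]
p_next = [0.0, 0.1, 0.4, 0.5]
threshold = interpolated_threshold(p_ref, p_next, step_octaves=1 / 6)
```

With these example distributions the criterion is reached less than halfway between the two tested frequencies, so the interpolated threshold is smaller than the spectral resolution of the stimulus.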
Do units also respond as narrow spectral filters when presented with natural sounds? We analysed responses elicited by nine-minute clips from the soundtrack of the feature film “The Good, the Bad and the Ugly”, shown twice in each recording session. The soundtrack contained approximately equal-duration segments of dialogue, music and background noise. The average firing rate was not significantly different between responses to the random-chord stimuli and responses to the soundtrack (paired t-test, t = 1.04, d.f. = 13, not significant), suggesting the soundtrack was, on average, as successful as random chords in driving neuronal responses, with comparable reproducibility (see Supplementary Information).
We estimated STRFs from responses to the soundtrack (called ‘natural STRFs’ below) using generalized reverse correlation techniques following ref. 10. The exquisite spectral filtering clearly apparent in the artificial STRFs was partially lost: natural STRFs were noisier and appeared to have richer structure (Fig. 4a). Nevertheless, there were similarities between natural and artificial STRFs estimated for the same unit. For the units recorded with both stimuli, the best frequency of the artificial STRF and the best frequency of the natural STRF were highly correlated (r = 0.7, d.f. = 16, P < 0.01; Fig. 4b). This agrees with the general finding that the best frequency is largely independent of auditory context11.
The first- and second-order statistics characterizing the soundtrack were fully sampled by the random-chord stimuli (verified by comparing the joint distribution of spectral and temporal modulations in the two stimulus ensembles) and the calculation of the natural STRFs corrected for second-order correlations in the stimulus10. Thus, if neurons linearly integrate their spectro-temporal input, natural and artificial STRFs should be essentially equivalent. However, the soundtrack also contained higher-order spectral correlations the effects of which on the STRFs could become apparent if the neurons had significant nonlinearities. These effects could be the reason for the additional structure in the natural STRFs.
We addressed this by comparing the predictive power of the STRFs within and across context (random-chord stimuli or film soundtrack). If artificial STRFs predict responses to the soundtrack as well as (or better than) natural STRFs, or vice versa, we can conclude that the potential nonlinear mechanisms that are not captured by the STRFs have only a small effect on the neuronal responses. Alternatively, if each STRF predicts the responses to new sounds from the ensemble used to estimate the STRF better than does the STRF derived from the other sound ensemble, then it can be inferred that there are significant nonlinearities in the responses, with the natural sounds possibly engaging processing mechanisms different from those engaged by the artificial sounds.
For units recorded with both stimuli, predicted responses to one-minute segments of the soundtrack were generated with both artificial and natural STRFs (the natural STRF was estimated without using the responses to the segment whose responses were predicted). Predictive power was quantified by the correlation coefficient between the prediction and the actual response of the unit. The expected maximum correlation (estimated as the average correlation between responses to two presentations of the soundtrack) was 0.3 (ref. 12). The predictive power of the artificial STRFs on the soundtrack was notably low: 0.136 ± 0.14 (mean ± s.d.), about 40% of the expected maximum. More importantly, correlations were significantly higher within context: a natural STRF typically predicted the actual responses to a soundtrack segment better than did an artificial STRF, with an average correlation coefficient of 0.25 ± 0.14 (Fig. 4c), over 80% of the expected maximum correlation. A three-way analysis of variance (ANOVA) on STRF type × predicted segment × neuron showed a highly significant main effect of STRF type, F(1,229) = 72, P < 0.01. The same general result was obtained when we used narrowband filters fitted to each unit instead of the artificial STRFs (see Supplementary Information). Similarly, natural STRFs were substantially less successful in predicting the responses to the random-chord stimuli than artificial STRFs (see Supplementary Information). Thus, stimulus encoding was not entirely determined by frequency selectivity. STRFs exhibited superior predictive power when tested with sounds that belong to the ensemble used to estimate them, suggesting that nonlinear mechanisms participate crucially in shaping the neuronal responses10.
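The measure of predictive power described above can be illustrated with a short sketch: a linear prediction is formed by summing the STRF-weighted stimulus over time lags, and the prediction is then correlated with the actual response. The data below are simulated and the function names are hypothetical. The sketch does not reproduce the STRFpak estimation or the cross-validation over soundtrack segments.

```python
import numpy as np

def predict_response(strf, stim):
    """Linear prediction: sum over lags of the STRF-weighted stimulus.

    strf : (n_lags, n_freqs) array; row k weights the stimulus k bins earlier
    stim : (n_bins, n_freqs) stimulus spectrogram
    """
    n_lags = strf.shape[0]
    n_bins = stim.shape[0]
    pred = np.zeros(n_bins)
    for lag in range(n_lags):
        pred[lag:] += stim[:n_bins - lag] @ strf[lag]
    return pred

def predictive_power(pred, actual):
    """Correlation coefficient between predicted and actual responses."""
    return float(np.corrcoef(pred, actual)[0, 1])

# Simulated stimulus and a hypothetical unit driven by one frequency channel.
rng = np.random.default_rng(1)
stim = rng.random((500, 10))
true_strf = np.zeros((3, 10))
true_strf[1, 5] = 1.0
actual = predict_response(true_strf, stim) + rng.normal(0.0, 0.1, size=500)

# An STRF that matches the unit, and one tuned to the wrong channel.
mismatched_strf = np.zeros((3, 10))
mismatched_strf[1, 6] = 1.0

r_matched = predictive_power(predict_response(true_strf, stim), actual)
r_mismatched = predictive_power(predict_response(mismatched_strf, stim), actual)
```

In this toy case the matched STRF yields a far higher correlation than the mismatched one. The comparison in the text has the same form, with artificial and natural STRFs taking the place of the two filters.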
Our results demonstrate that frequency tuning in the human auditory cortex is substantially narrower than that typically found in the auditory cortex of non-human mammals (except bats). Using pure tones under the commonly used barbiturate anaesthesia, the tuning width at suprathreshold levels was found to be about one octave in cats13 and about a third of an octave on average in rats14. Comparisons of tuning between awake and anaesthetized animals within the same species have repeatedly shown that bandwidths are wider in the awake preparation (cats15, see review16; rats14). Surveys of tuning in the auditory cortex of the awake macaque reported bandwidths that were typically half to one octave17, and either very narrowly tuned neurons were rare17 or bandwidths were wider than a seventh of an octave18. In the only other report of a unit in human auditory cortex1, the width at half-height was at least one octave. The frequency tuning derived from STRFs is typically somewhat narrower than that derived from pure tone responses, but seems to be wider than the data shown here. For example, in deeply anaesthetized cats, the STRF width was about half an octave19. Thus, in mammalian responses, the typical selectivity of cortical neurons was worse, not better, than that found on the periphery of the same species. With the caution required by the small sample reported here, we propose that in contrast with animal studies, the spectral selectivity of neurons in human auditory cortex is substantially better than that of the auditory periphery.
These results are relevant to the apparent paradox of frequency hyperacuity demonstrated repeatedly in human psychoacoustics. Subjects with normal hearing, even untrained, successfully detect spectral differences substantially narrower than the presumed bandwidth of single auditory nerve fibres. Our results demonstrate that frequency differences smaller than 3% could be reliably detected from single-trial responses of single units in human auditory cortex. This value is comparable to the minimum detection threshold reported in untrained subjects9. Thus, the responses of one of these cortical neurons could, in principle, underlie behavioural performance on a single-trial basis. Tramo et al.20 reported that bilateral lesions of human auditory cortex cause significant elevations in frequency discrimination thresholds, suggesting a functional role for the electrophysiological findings reported here. Remarkably, thresholds (frequency ratios) after the lesions were about 10–20%, matching the peripheral tuning in humans. We therefore suggest that the neural responses we observed in human auditory cortex reflect a readout of information available in the activity of large neuronal ensembles in subcortical stations, and that the auditory cortex is necessary for this readout to be performed, resulting in the behavioural hyperacuity of frequency discrimination in humans.
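The comparison between frequency ratios (in per cent) and bandwidths (in fractions of an octave) rests on a simple conversion, sketched here for the values quoted above. The function name is hypothetical; the sixth-of-an-octave figure is the peripheral bandwidth cited earlier in the text.

```python
import math

def percent_to_octaves(percent_difference):
    """Frequency difference expressed in octaves: log2(f2 / f1)."""
    return math.log2(1.0 + percent_difference / 100.0)

single_unit = percent_to_octaves(3)      # about 0.043 octave
lesion_low = percent_to_octaves(10)      # about 0.14 octave
lesion_high = percent_to_octaves(20)     # about 0.26 octave
peripheral_bandwidth = 1 / 6             # about 0.17 octave (critical band)
```

A 3% difference is thus roughly a quarter of the peripheral bandwidth, whereas the 10–20% post-lesion thresholds bracket it.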
Previous studies in alert human subjects have shown very selective responses in single neurons from other brain areas. Notably, Quiroga et al.21 reported highly specific responses to individual people or landmarks from a subset of medial temporal lobe neurons, suggesting an invariant, sparse code. The high selectivity reported here may be a counterpart of the same phenomenon, resulting in a sparse coding of frequency in auditory cortex. We can only speculate why a low-level cue such as frequency is represented so explicitly and predominantly in single neurons of human auditory cortex but not in the auditory cortex of other terrestrial mammalian species. There is evidence that frequency discrimination in humans is correlated with a number of cognitive skills, including language abilities22, working memory23 and learning capabilities24, but more research is needed to clarify this puzzle.
Extracellular single-unit recordings were obtained from four patients with pharmacologically intractable epilepsy, implanted with intracranial electrodes to identify the seizure focus for potential surgical treatment. Electrode location was based solely on clinical criteria. All patients had electrodes placed bilaterally in Heschl’s gyri. In each experimental session, patients 1 to 3 were presented twice in succession with 8:40 min of an audio-visual segment of the film “The Good, the Bad, and the Ugly”. Patients 2, 3 and 4 were presented with random-chord stimuli25,26 accompanied by random visual textures. Each chord had three pure-tone components, selected quasi-randomly out of a frequency table spanning the frequency range of the soundtrack. The tone duration was 100 ms (patients 2 and 3) or 50 ms (patient 4) with 10 ms linear onset and offset ramps. The frequencies were equally spaced along a logarithmic axis from 100 Hz to 10 kHz. The resolution was either a sixth of an octave (41 different frequencies, patients 2 and 3) or 1/18th of an octave (108 frequencies, patient 4). Sequence duration was 3.5 min (patients 2 and 3) or 5 min (patient 4). Data were acquired in ten sessions, all conducted at the patients’ quiet bedside using a standard laptop screen and the laptop’s built-in speakers (patients 1 and 2) or external speakers (patients 3 and 4). Sound intensity was set to a comfortable hearing level but absolute sound level was not measured. The free-field presentation was most probably accompanied by reverberation. Though unlikely to have influenced the results presented here, these factors represent differences from most studies in anaesthetized animals. The data consist of 95 units (20 units from patient 1, 21 from patient 2, 38 from patient 3 and 16 from patient 4). The linear approximation to the response function for each unit in response to the soundtrack was computed using the software package STRFpak27.
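The six-tones-per-octave random-chord stimulus described above can be sketched as follows. This is an illustration only, not the stimulus-generation code used in the study. The sampling rate, the uniform random selection of components (the text says only ‘quasi-randomly’), the equal tone amplitudes and all function names are assumptions.

```python
import numpy as np

FS = 44_100  # assumed sampling rate in Hz; not stated in the text

def frequency_table(f_min=100.0, f_max=10_000.0, n_freqs=41):
    """Frequencies equally spaced on a logarithmic axis."""
    return np.geomspace(f_min, f_max, n_freqs)

def chord(freqs_hz, dur_s=0.1, ramp_s=0.01, fs=FS):
    """Sum of equal-amplitude pure tones with linear onset and offset ramps."""
    n = int(round(dur_s * fs))
    t = np.arange(n) / fs
    wave = np.zeros(n)
    for f in freqs_hz:
        wave += np.sin(2 * np.pi * f * t)
    wave /= len(freqs_hz)                  # keep the peak amplitude within [-1, 1]
    n_ramp = int(round(ramp_s * fs))
    envelope = np.ones(n)
    ramp = np.linspace(0.0, 1.0, n_ramp)
    envelope[:n_ramp] = ramp               # linear onset
    envelope[n - n_ramp:] = ramp[::-1]     # linear offset
    return wave * envelope

def random_chord_indices(n_freqs, n_chords, tones_per_chord=3, seed=0):
    """For each chord, indices of distinct frequencies drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.choice(n_freqs, size=tones_per_chord, replace=False)
                     for _ in range(n_chords)])

table = frequency_table()
step_octaves = np.log2(table[1] / table[0])   # close to 1/6 octave
sequence = random_chord_indices(len(table), n_chords=10)
first_chord = chord(table[sequence[0]])       # 100 ms chord, 10 ms ramps
```

With 41 frequencies between 100 Hz and 10 kHz, the spacing between adjacent entries of the table comes out very close to a sixth of an octave, consistent with the resolution stated for patients 2 and 3.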
Discrimination thresholds were computed using ROC analysis based on empirical spike count distributions.
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
We thank the patients for their cooperation in participating in the experiments. We thank E. Behnke, T. A. Fields, E. Ho and C. Wilson for technical assistance. This work was supported by an ISF grant (to I.N.), a NINDS grant (to I.F.), the US-Israel BSF fund (R.M. and I.F.) and a European Molecular Biology Organization and Human Frontier Science Program fellowship (R.M.).
Author Information Reprints and permissions information is available at www.nature.com/reprints.