We measured the responses of neurons in auditory cortex of male and female ferrets to artificial vowels of varying fundamental frequency (f0), or periodicity, and compared these to the performance of animals trained to discriminate the periodicity of these sounds. Sensitivity to f0 was found in all five auditory cortical fields examined, with most of those neurons exhibiting either low-pass or high-pass response functions. Only rarely was the stimulus dependence of individual neuron discharges sufficient to account for the discrimination performance of the ferrets. In contrast, when analyzed with a simple classifier, responses of small ensembles, comprising 3-61 simultaneously recorded neurons, often discriminated periodicity changes as well as the animals did. We examined four potential strategies for decoding ensemble responses: spike counts, relative first-spike latencies, a binary “spike or no-spike” code and a spike-order code. All four codes represented stimulus periodicity effectively, and, surprisingly, the spike count and relative latency codes enabled an equally rapid readout, within 75 ms of stimulus onset. Thus, relative latency codes do not necessarily facilitate faster discrimination judgments. A joint spike count plus relative latency code was more informative than either code alone, indicating that the information captured by each measure was not wholly redundant. The responses of neural ensembles, but not of single neurons, reliably encoded f0 changes even when stimulus intensity was varied randomly over a 20 dB range. Because trained animals can discriminate stimulus periodicity across different sound levels, this implies that ensemble codes are better suited to account for behavioral performance.
Any sound will activate neurons throughout auditory cortex, but we know little about how the activity of neural populations is “read out”. In addition to spike rate, response latencies can encode sensory events (Jenison, 2000; Brugge et al., 2001; Nelken et al., 2005), but it is unclear what would serve as a temporal “reference”, given that the brain has no independent measure of stimulus onset. It has been suggested that single unit spike latencies might be referenced to the population onset response (Chase and Young, 2007), or that the order in which neurons fire might be important (Gautrais and Thorpe, 1998; Van Rullen and Thorpe, 2001). Relative latency codes are particularly attractive because they could potentially facilitate rapid behavioral judgments (Van Rullen and Thorpe, 2002; Johansson and Birznieks, 2004).
There have so far been few successful attempts to record from sufficiently large ensembles of neurons to assess the potential of different population codes. One criterion against which to judge a neural code is to ask whether it provides sufficient stimulus-related information to account for behavioral performance. Neurometric techniques have therefore been used to model sensory discrimination abilities from the firing behavior of single neurons (Parker and Newsome, 1998). Extending this approach to codes carried by activity patterns distributed across neuron ensembles may provide novel insights into the link between neural activity and perception (Nishikawa et al., 2008; Walker et al., 2008).
Natural sounds are often periodic, and this periodicity can evoke the percept of pitch. Therefore, examining how neurons encode the fundamental frequency, f0, of sounds may contribute to our understanding of the neural basis of pitch perception. The periodicity of a sound's waveform is also reflected in the harmonicity of its spectrum, and both temporal and spectral mechanisms appear to contribute to pitch extraction (Cedolin and Delgutte, 2005; De Cheveigne, 2005; McDermott and Oxenham, 2008).
Sound periodicity is represented by the time-locked discharges of auditory nerve fibers (Javel, 1980; Winter et al., 1993; Cariani and Delgutte, 1996) and cochlear nucleus neurons (Sayles and Winter, 2008). In the midbrain, periodicities up to a few hundred Hz are represented as temporal patterns of spikes, and faster periodicities are represented with rate codes (Langner and Schreiner, 1988; Schreiner and Langner, 1988; Rees and Palmer, 1989). Sensitivity to f0 is found throughout ferret auditory cortex (Bizley et al., 2009), whereas a specialized pitch center has been described in marmosets (Bendor and Wang, 2005). However, none of the proposed coding mechanisms has been shown to account for pitch judgments independent of changes in other parameters such as sound intensity or spectral timbre. Thus, it is unclear whether, or how, the responses of cortical neurons, which are typically broadly sensitive to multiple stimulus dimensions, could be decoded to explain behavioral performance.
We compared the responses of single auditory cortical neurons and ensembles of simultaneously-recorded neurons to the behavioral performance of ferrets trained on an f0 discrimination task. We examined different putative decoding strategies, and found that codes based on spike timing or count can discriminate stimulus periodicity about equally well.
All animal procedures were approved by the local ethical review committee and performed under license from the UK Home Office. Twelve pigmented ferrets were used in this study. Five of these animals participated in behavioral testing and electrophysiological recordings were obtained from the 7 others.
Recordings were performed in one awake, passively-listening animal and six anesthetized ferrets (see Table 1). For the acute recordings, anesthesia was induced with medetomidine hydrochloride (Domitor; 0.022 mg/kg/h) and ketamine (Ketaset; 5 mg/kg/h; Fort Dodge Animal Health), and maintained with an intravenous infusion (5 ml/h) of this mixture in physiological saline containing 5% glucose. The ferret also received a single subcutaneous dose of 0.06 mg/kg/h atropine sulphate (C-Vet Veterinary Products) and subcutaneous doses of 0.5 mg/kg dexamethasone (Dexadreson; Intervet UK Ltd.) every 12 h to reduce bronchial secretions and cerebral oedema, respectively. The animal was intubated, placed on a ventilator (7025 respirator, Ugo Basile) and supplemented with oxygen. Body temperature, end-tidal CO2, and the electrocardiogram were monitored throughout the experiment.
The animal was placed in a stereotaxic frame and the temporal muscles on both sides were retracted. A metal bar was attached to the right side of the skull, holding the head without further need of a stereotaxic frame. The left temporal muscle was largely removed and the auditory cortex exposed by a craniotomy. The dura was removed and the cortex covered with silicone oil. The animal was then transferred to a small table in an anechoic chamber (IAC Ltd.).
Recordings were made with silicon probe electrodes (Neuronexus Technologies). In 3 animals, we used electrodes with 8 active sites on 4 parallel probes, with a vertical spacing of 150 μm. In a few recordings in one of these animals, and for all recordings in a further 3 animals, we used electrodes with 16 active sites spaced at 100 μm intervals on each of two probes. The electrodes were positioned so that they entered the cortex approximately orthogonal to the surface. A photographic record was made of each electrode penetration to document their location relative to anatomical landmarks (surface blood vessels and sulcal patterns).
Extracellular recordings were also carried out in one ferret while it was awake and passively listening to stimuli. A cranial mount was surgically implanted a month prior to the first recording session. During this procedure, surgical anesthesia was induced with an i.m. injection of medetomidine and ketamine. The animal was intubated and anesthesia was maintained with 1-2% isoflurane in oxygen-enriched air. It was placed in a stereotaxic frame and the temporal muscles were retracted and partially removed. The auditory cortex was exposed by a craniotomy and a cranial mount of bone cement with a re-sealable metal well was attached to the skull above the craniotomy. The cranial mount also contained a metal fitting that allowed the head to be fixed to a solid recording frame. During the month following implant surgery, the animal was trained with positive food reinforcement to accept head restraint. Recordings were carried out with the head restrained using up to 5 quartz/platinum-tungsten electrodes (Thomas Recording) lowered through the dura using a Mini Matrix System microdrive (Thomas Recording).
Five cortical fields were investigated: two tonotopic primary fields, the primary auditory cortex (A1) and anterior auditory field (AAF); two tonotopic secondary fields on the posterior ectosylvian gyrus, the posterior suprasylvian and posterior pseudosylvian fields (PSF and PPF); and one non-tonotopic secondary area on the anterior ectosylvian gyrus, the anterior dorsal field (ADF) (Bizley et al., 2005). The numbers of units recorded in each cortical field are listed in Table 2.
Artificial vowel sounds were generated in MATLAB (The MathWorks), by using an algorithm adapted from Malcolm Slaney's Auditory Toolbox (http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/) to band-pass filter click trains. The click rate determined the value of f0 and therefore the evoked pitch. The band-pass filters determined the stimulus timbre, and, in both behavioral and neurophysiological testing, were kept constant at values corresponding to the vowel /i/ (formant frequencies centered at 430, 2132, 3070 and 4100 Hz). The vowel sounds (150 ms in duration, 5 ms onset/offset ramps) were normalized to have equal root-mean-square amplitudes, and calibrations were performed to ensure that changes in periodicity did not influence the overall sound pressure level. At each recording site a fixed range of f0 values was presented, from either 150 Hz or 200 Hz to ~1900 Hz.
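The stimulus construction described above can be illustrated with a short sketch. It is not the MATLAB code used in the study: the sample rate, the formant bandwidth, the cascade arrangement of two-pole resonators, the raised-cosine ramp shape, the target RMS value and the function name `artificial_vowel` are all assumptions made for illustration. Only the formant center frequencies, the 150 ms duration, the 5 ms ramps and the RMS normalization are taken from the text.

```python
import math

def artificial_vowel(f0, fs=48000, dur=0.150, ramp=0.005,
                     formants=(430, 2132, 3070, 4100), bandwidth=80.0,
                     target_rms=0.05):
    """Click train at rate f0 passed through a cascade of two-pole resonators."""
    n = int(round(dur * fs))
    period = fs / f0                        # samples between successive clicks
    signal = [0.0] * n
    k = 0
    while int(round(k * period)) < n:       # one unit click per period
        signal[int(round(k * period))] = 1.0
        k += 1
    r = math.exp(-math.pi * bandwidth / fs) # pole radius (assumed bandwidth)
    for fc in formants:                     # one resonator per formant
        a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
        a2 = -r * r
        y1 = y2 = 0.0
        out = []
        for x in signal:
            y = x + a1 * y1 + a2 * y2
            out.append(y)
            y2, y1 = y1, y
        signal = out
    n_ramp = int(round(ramp * fs))          # raised-cosine onset/offset ramps
    for i in range(n_ramp):
        g = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        signal[i] *= g
        signal[n - 1 - i] *= g
    rms = math.sqrt(sum(v * v for v in signal) / n)
    return [v * target_rms / rms for v in signal]  # equal RMS for every f0
```

Because the filters are fixed and only the click rate changes, vowels produced this way share a spectral envelope and differ in f0, and the final scaling gives every f0 the same RMS amplitude.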
Pure tone stimuli were used to obtain frequency response areas, both to characterize individual units and to determine tonotopic gradients. The latter were used to confirm the cortical field in which the recordings were made.
Stimuli were delivered using Tucker-Davis Technologies System 3 digital signal processors, and were presented to anesthetized animals through customized pairs of Panasonic RPHV297 earphones attached to plastic otoscope specula inserted into the ear canals. The earphones were closed-field calibrated using a 1/8-inch condenser microphone (Brüel and Kjær). In the awake ferret, stimuli were presented in the free field in the same anechoic room via an Audax TWO26M0 speaker (Audax Industries) located ~80 cm from the animal's head at 30° contralateral to the recording chamber. The speaker was calibrated using a ¼-inch condenser microphone (Brüel and Kjær) placed near the position of the ferret's head within the recording set-up. The speaker and earphone calibrations were used to create inverse filters to ensure that a flat (±5 dB) output was produced from 100-24000 Hz.
In the electrophysiological studies, vowel sounds were presented in isolation, at a rate of 1 s⁻¹. In the behavioral experiments vowels were presented in pairs, with a fixed “reference” f0 followed by a 50 ms silent interval and then a “target” f0. Presenting single vowels in the recording experiments allowed us to vary which f0 we chose as a reference sound in later analysis. However, for 152 neural units we instead collected responses to stimuli presented as reference-target pairs, matching precisely the stimulus configuration of the psychoacoustic experiments (see below). The responses recorded with vowel pairs were very similar to those recorded with vowels presented in isolation, i.e. response functions calculated for the second vowel in the pair (the “target” vowel) also commonly had either high-pass or low-pass f0 tuning characteristics. Two example responses are shown in Supplemental Figure 1.
The electrophysiological recordings were bandpass filtered (500-5000 Hz), amplified and digitized at 25 kHz. Data acquisition and stimulus generation were performed using BrainWare software (Tucker-Davis Technologies). Spike sorting was performed offline. Single units were isolated from the digitized signal either by manually clustering data according to spike features such as amplitude, width and area, or by using an automated k-means clustering algorithm in which the voltage potential at 7 points across the duration of the spike window served as variables. We also inspected auto-correlation histograms, and only responses with a clear refractory period in their inter-spike-interval histograms were classed as single neurons. In instances where we could not observe a clear refractory period, we classed the responses as multi-neuron clusters (i.e. spikes were assumed to be produced by more than one neuron).
The responses of 744 isolated single units (n = 409, 55%) and small multi-unit clusters (n = 335, 45%) were analyzed in MATLAB. We found f0-sensitive units in all 5 cortical fields that were sampled. All 744 units included in our analysis were significantly driven by the vowel sounds (paired t-test, p < 0.05). Multi-units were included in our population analysis as we did not implement decoding methods which relied upon the precise temporal response structure of single neurons.
Five animals participated in psychophysical experiments to determine f0 discrimination. Behavioral data from four of these animals have been previously reported (Walker et al., 2009). The stimuli consisted of the same artificial vowels used for electrophysiological recordings. The training and testing methods are described fully by Walker et al. (2009). Briefly, ferrets were trained to lick a central “start” spout. This triggered the presentation of a pair of artificial vowel sounds. The first vowel was a reference sound, and was held at a fixed f0 for each weekly testing run. The second vowel was a target and was either higher or lower in f0 than the reference. The animal's task was to respond to a spout located to the right of the central spout when the target vowel was higher in pitch, and to a spout located to the left of the start spout when the target had a lower pitch than the reference. Animals were water-restricted during the testing period and received a water reward when they responded correctly. If the animal made an incorrect choice, then this was signaled by a brief burst of broadband noise and a 14 s timeout. After an incorrect response, the ferret received a correction trial in which an easier target in the same pitch direction was presented. Correction trials served to prevent an animal from developing a bias toward a preferred response direction, and were repeated until an animal made the correct response, but were excluded from subsequent analyses. Plots of the proportion of trials in which the animal responded at the right spout as a function of the log of the target f0 were sigmoidal in shape and approximated a cumulative Gaussian function. Therefore, psychometric curves were estimated by fitting cumulative Gaussian distributions to the data using probit generalized linear models. The maximum slope of the psychometric function was used to quantify performance.
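As a rough illustration of the curve-fitting step, the sketch below fits a cumulative Gaussian to right-spout response counts and converts the fitted width into a maximum slope in % per octave. It is not the probit generalized linear model used in the study: a coarse grid-search maximum-likelihood fit is substituted for simplicity, and the function names, the grid limits and the expression of target f0 in octaves relative to the reference are assumptions made for illustration.

```python
import math

def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def fit_cum_gauss(octaves, n_right, n_trials):
    """Grid-search maximum-likelihood fit of (mu, sigma) to binomial counts."""
    best = (None, None, -math.inf)
    for i in range(-20, 21):                # mu from -1 to +1 octave
        mu = i * 0.05
        for j in range(1, 61):              # sigma from 0.05 to 3 octaves
            sigma = j * 0.05
            ll = 0.0
            for x, k, n in zip(octaves, n_right, n_trials):
                # Clip to keep the logarithms finite at the extremes.
                p = min(max(cum_gauss(x, mu, sigma), 1e-9), 1.0 - 1e-9)
                ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
            if ll > best[2]:
                best = (mu, sigma, ll)
    return best[0], best[1]

def max_slope_percent_per_octave(sigma):
    """Steepest slope of the fitted curve (at x = mu), in % per octave."""
    return 100.0 / (sigma * math.sqrt(2.0 * math.pi))
```

Here `octaves` would hold log2(target f0 / reference f0) for each target, `n_right` the number of right-spout responses and `n_trials` the number of trials at that target. A narrower fitted Gaussian (smaller sigma) gives a steeper maximum slope and therefore indicates finer discrimination.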
We recorded responses of ferret auditory cortical neurons to artificial vowel sounds that varied in fundamental frequency (f0). The power spectra of three example vowels are shown in Fig. 1A. We recorded from six anesthetized animals and one awake, passively listening animal (see Table 1). The responses of 744 single units and small multi-unit clusters were analyzed from two core and three belt areas of auditory cortex. Recordings were assigned to cortical fields on the basis of their pure-tone tuning characteristics and location on the ectosylvian gyrus (Bizley et al., 2005). All the data from the awake ferret were obtained from A1, whereas recordings were carried out from all five fields under anesthesia (Table 2). Fifty-five per cent of recordings were from single units, but as we observed no clear differences between the periodicity sensitivity of neurometric functions for the single units and multi-neuron clusters (Wilcoxon rank sum test for single neuron vs. multi-unit neurometric slope, p = 0.26, across all units; identical tests separating units into their respective cortical fields yielded p values of >0.1), nor between awake and anesthetized recordings (Wilcoxon rank sum test, p = 0.34), data were combined across these groups.
Sensitivity to f0 was tested using both regression analysis and ANOVA. A regression analysis of stimulus f0 on spike count (calculated over 150 ms beginning at stimulus onset) was performed for each unit, and, if significant (p<0.05), the sign of the regression slope was used to classify each as high-pass or low-pass for f0. Thirty-seven percent of recordings had significant slopes, of which half (49.7%) were classified as low-pass and half (50.3%) as high-pass. To determine whether other neural units might exhibit a non-monotonic sensitivity to f0, we also tested the relation between f0 and spike rates with a one-way ANOVA. Another 12% of neural units were found to have their spike rate modulated by f0 in this fashion, and two examples can be seen in Supplemental figure 2. This class included highly non-linear high-pass and low-pass units, as well as a small number whose responses were bandpass tuned or varied in a more complicated manner with f0. All driven units were included in subsequent analysis, regardless of whether they were classified as being significantly modulated by stimulus f0.
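The slope-based classification can be sketched as follows. This is an illustration only: the significance test here is a permutation test standing in for the parametric regression test, f0 is assumed to be expressed on a log2 axis, and the function names, the number of permutations and the fixed random seed are hypothetical choices.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def classify_f0_tuning(log2_f0, spike_counts, n_perm=2000, alpha=0.05, seed=0):
    """Label a unit 'high-pass', 'low-pass' or 'n.s.' from its regression slope."""
    rng = random.Random(seed)
    observed = ols_slope(log2_f0, spike_counts)
    shuffled = list(spike_counts)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)               # break any f0-to-count relation
        if abs(ols_slope(log2_f0, shuffled)) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    if p_value >= alpha:
        return "n.s."
    return "high-pass" if observed > 0 else "low-pass"
```

Each element of `log2_f0` and `spike_counts` corresponds to one trial, so a unit tested at 17 f0 values with 30 repeats would contribute 510 pairs. Units with non-monotonic tuning can return "n.s." here, which is why a separate ANOVA is needed to detect them.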
We assessed the sensitivity of our units to the periodicity of vowels and then compared this sensitivity to the performance of ferrets in discriminating the direction of f0 changes in the same sounds. Periodicity-sensitive neurons were located in all 5 sampled cortical fields and spanned the whole range of possible pure tone characteristic frequencies (see Supplemental Figure 3). Representative examples from two such units are shown in the raster plots in Fig. 1B and C. The first fired robustly in response to sounds with an f0 <~900 Hz, but its response declined rapidly and monotonically for f0 values >1000 Hz (Fig. 1B). In contrast, the unit in Fig. 1C increased its firing rate when f0 was >1200 Hz. The spike rates (±1 SEM) for these two units are shown in Fig. 1D and E.
Five ferrets were trained in a task (described above) in which they heard two vowel sounds and were required to report whether the second (target) sound was higher or lower in pitch than the first (reference) sound (Walker et al., 2009). Two observations indicate that the animals solved this task by judging the pitch of the target sound, rather than using other cues such as spectral or harmonic density. Firstly, these animals immediately transferred pitch discrimination learning from a task using artificial vowel sounds to one using pure-tones, which have very different spectral density (Walker et al., 2009). Secondly, two animals were also tested with artificial vowels generated from temporally “jittered” (i.e. randomized) click trains. Increasing temporal jitter disrupts the sound's periodicity, thus reducing pitch salience, while leaving the spectral density of the stimulus unchanged. Temporal jitter disrupted the ferrets' f0 discrimination performance, suggesting that they indeed responded to the (increasingly less salient) pitch rather than the (largely unchanged) spectral density of the stimuli (see Supplemental Figure 4). A typical psychometric function from one animal is shown in Fig. 2A. Data from five animals are pooled in Fig. 2B to illustrate performance across all reference pitches, and the slopes from each animal's individual testing runs are plotted in Fig. 2C. The slopes do not vary significantly as a function of the reference f0 within the tested range (Kruskal-Wallis test, p = 0.077). For the purpose of comparing behavioral and neural discrimination measurements, we defined the “normal” range of behavioral performance as the 2.5th to 97.5th percentiles of the range we observed in our trained ferrets. The median value was 34% per octave (Weber fraction = 0.24), with lower and upper bounds of 18% (Weber fraction = 0.52) and 64% (Weber fraction = 0.12) per octave, respectively.
Having established that auditory cortical neurons are modulated by the f0 of artificial vowels, we sought to examine whether these responses might provide the physiological signal upon which ferrets' pitch discrimination decisions are based. A simple algorithm was used to estimate the f0 discrimination “performance” supported by the responses of neural units (“neurometric” performance), in a manner that could be compared to the psychometric estimates of f0 discrimination measured for ferrets in our behavioral task. During behavioral testing, animals were presented with a range of f0 values spanning 2 octaves (±1 octave from the reference frequency). Neurometric curves were generated in a similar fashion from neural units' responses obtained over an f0 range of ±1 octave, around a particular reference f0. Because the f0 sensitivity of each unit was measured in 1/5th-octave steps over a ~3 octave range, we were able to calculate neurometrics over a 2-octave range for each of at least 6 different f0 reference values. For each reference f0, we calculated the median spike rate of the responses to 20-40 presentations of the corresponding artificial vowel stimulus.
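The number of usable reference values follows from simple arithmetic on the stimulus grid, as the sketch below illustrates. The exact grid is an assumption: it takes 17 f0 values starting at 200 Hz in 1/5-octave steps, which reaches roughly 1840 Hz. The function names are hypothetical.

```python
def f0_grid(lowest_f0=200.0, steps_per_octave=5, n_steps=17):
    """f0 values spaced in 1/5-octave steps upward from lowest_f0."""
    return [lowest_f0 * 2.0 ** (k / steps_per_octave) for k in range(n_steps)]

def usable_references(grid, steps_per_octave=5):
    """References that have targets a full octave above and below on the grid."""
    return grid[steps_per_octave:len(grid) - steps_per_octave]

grid = f0_grid()
refs = usable_references(grid)
print([round(f) for f in refs])  # reference f0 values in Hz
```

A reference is usable only if the grid extends a full octave (5 steps) on either side of it, so a grid spanning just over 3 octaves leaves a band of about 1 octave of candidate references in its middle.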
We then asked whether the number of spikes evoked by each individual response to all other vowels in the f0 range was greater than the median response to the reference. Thus, for each “target” f0 we obtained a proportion of trials wherein the spike rate response exceeded the median response to the reference f0, and this relative spike count was used by a classifier to “guess” whether the target was higher or lower than the reference. Responses with spike counts that equaled the median reference response were randomly assigned to the ‘target higher’ or ‘target lower’ class with equal probability. For high-pass units, such as the one shown in Fig. 1C, when the response to the target was stronger than the median reference response, the classifier guessed that the target f0 was higher than the reference. On the other hand, for low-pass units (e.g. Fig. 1B), a greater target firing rate indicated that the target was lower than the reference f0. Using this very simple firing rate comparison to decode each of the observed responses in turn, we obtained the percentage of “higher” responses our classifier made for each target sound, which can be directly compared to the percent correct scores obtained in our behavioral pitch discrimination experiments.
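The single-unit decision rule just described can be written compactly. The sketch below is illustrative rather than the analysis code used in the study; the function name, argument names and the fixed default random seed are assumptions.

```python
import random
import statistics

def proportion_judged_higher(ref_counts, target_counts, high_pass=True, rng=None):
    """Fraction of target trials the classifier labels 'target higher'."""
    rng = rng or random.Random(0)
    criterion = statistics.median(ref_counts)    # median reference response
    n_higher = 0
    for count in target_counts:
        if count == criterion:
            judged_higher = rng.random() < 0.5   # tie: guess at random
        elif high_pass:
            judged_higher = count > criterion    # more spikes means higher f0
        else:
            judged_higher = count < criterion    # more spikes means lower f0
        n_higher += judged_higher
    return n_higher / len(target_counts)
```

Calling this once per target f0, with `ref_counts` fixed, yields the set of proportions to which a neurometric curve is fitted. Passing `high_pass=False` reverses the mapping between firing rate and judged pitch direction for low-pass units.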
As with the psychometric curves, neurometric functions were obtained by fitting cumulative Gaussian curves to the observed proportions of “higher” responses for each target f0. In this manner, we calculated neurometric functions for each of the 634 neural units at all possible reference periodicities within the tested range. Our analysis did not assume a priori that units were either high-pass or low-pass. Instead, we computed neurometric functions for each unit twice, once assuming that a greater firing rate indicated an increase in f0, and once with the opposite assumption. The neurometric function with a positive slope was then selected for further analysis. Single neuron neurometric functions derived from spike count values were calculated over 30 ms, 75 ms, 150 ms and 300 ms response windows. A response duration of 150 ms (beginning at stimulus onset) was found to provide the best neurometric slopes for most neurons (Supplemental Figure 5), and therefore all neurometric functions reported here calculate spike counts over this time period.
Figure 2D shows the neurometric function obtained from the unit in Fig. 1B and D, for a reference f0 of 919 Hz and a response window of 150 ms post stimulus onset. The neurometric function obtained for this unit at this reference f0 had a slope of 64% per octave, which is within the normal range of ferrets' behavioral pitch discrimination performance. However, such high neurometric slopes were only rarely obtained from individual units. Also, in this instance, a reference f0 of 919 Hz lies close to the steepest, most informative part of the unit's f0 sensitivity curve (see Fig. 1D). Selecting reference periodicities increasingly further away from this value resulted in a pronounced and systematic decline in neurometric performance, as illustrated in Fig. 2E.
Only a small fraction (13 out of 634) of the units in our sample had neurometric functions whose slopes fell within the trained ferrets' normal behavioral sensitivity range (i.e. slope between 18% and 64% per octave), and none of our neural units could match behavioral sensitivity across the full range of reference f0 values tested. Thus, to distinguish upward from downward periodicity changes on the basis of the firing rate of individual cortical neurons, the ferret's brain would either need to select the ‘best neuron’ for the appropriate periodicity reference, or, more likely, combine information across multiple neurons in the population. Neurons throughout the central nervous system typically form divergent and convergent connections with many other neurons, an anatomical arrangement probably more conducive to integrating responses across sizeable ensembles of neurons than the targeted selection of a few. We therefore developed neurometric algorithms that combine information across ensembles of simultaneously-recorded units, to test their ability to inform sufficiently accurate periodicity judgments robustly over a range of reference periodicities.
Simultaneous recordings from multiple neural units (“ensembles”) were obtained by recording with multi-site electrodes (typically 32 channel, configured as 4 × 8 site linear arrays, or 2 × 16 channel linear arrays; see methods for details). Our sample of 744 units could therefore be grouped into 58 ensembles, each of which reflects the activity recorded with one 32 channel array. In some fortuitous cases we obtained acoustically driven activity at all 32 recording sites and it was possible to discriminate more than one spiking unit from many of the recording sites, while at other penetrations only a few of the recording sites yielded neural responses. Consequently the number of simultaneously-recorded units in these ensembles varied from one penetration to the next (max 61, mean ± SD = 15.3 ± 13.2). Thirty-nine of the 58 recorded ensembles were tested across a wide enough range of stimulus periodicities to analyze performance across multiple reference values and results from these 39 are reported here. Whilst the simultaneously recorded units could come from sites separated on the cortical surface by up to 600 μm, we ensured that these were always within a single cortical field.
We implemented a novel decoding method in which responses were classified based on the pattern of activity across each ensemble. We first calculated the number of spikes in each unit's response on each trial over a specified time window (30 ms, 75 ms, 150 ms or 300 ms). The array of spike counts across the ensemble on a single trial could then be described as a point in a high-dimensional space, where each dimension corresponded to the spiking response of each individual unit. Thought of in this way, the ensemble responses to repeated presentations of the same stimulus form a “cloud” in this high-dimensional “spike rate space”, and the average response to a particular sound can be thought of as the center of that cloud. If neural responses to repeated presentations of the same stimulus are highly variable, then they scatter widely throughout the space (the cloud is diffuse), but if responses are more reproducible, then the region occupied by the responses will be more compact. The regions of spike rate space occupied by the responses to stimuli of different f0 might be quite distinct, making classification and stimulus decoding relatively easy, or they could be partly or wholly overlapping, making classification more difficult.
We decoded each ensemble's response in a two-stage process. Firstly, we asked whether the ensemble activity could distinguish between the reference and target sounds. For each presentation of the target, we asked whether the response on that particular trial was closer to (i.e. had a smaller Euclidean distance from) the average response to the reference, or to the average response to target sounds of the same f0. If the response to the target was closer to that of the reference, the two sounds were deemed to be indistinguishable, and such trials were randomly classified as either “higher” or “lower” with equal probability. More commonly, however, the response on an individual trial was closer to other responses to the same target than to the average reference response, and the decoding algorithm then made a high/low periodicity judgment by determining if this target response was more similar to the average response to the highest target in the f0 range or the lowest f0 response (i.e. vowels one octave above or below the reference). The proportions of trials classified in this manner as higher than the reference f0 were then fitted with a cumulative Gaussian function to obtain the ensemble neurometric function.
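The two-stage procedure can be sketched as a nearest-template classifier. This is an illustrative reading of the description above, not the original implementation. In particular, the text does not state whether the trial being decoded was left out of its own target template; the sketch includes it. The function names and the fixed random seed are assumptions.

```python
import math
import random

def mean_vector(trials):
    """Element-wise mean of a list of equal-length response vectors."""
    return [sum(col) / len(trials) for col in zip(*trials)]

def decode_trial(trial, ref_mean, same_target_mean, top_mean, bottom_mean, rng):
    """Return True if the trial is judged 'higher' than the reference."""
    # Stage 1: is this response distinguishable from the reference?
    if math.dist(trial, ref_mean) < math.dist(trial, same_target_mean):
        return rng.random() < 0.5           # indistinguishable: guess
    # Stage 2: nearer to the highest-f0 or the lowest-f0 template?
    return math.dist(trial, top_mean) < math.dist(trial, bottom_mean)

def proportion_higher(target_trials, ref_trials, top_trials, bottom_trials, seed=0):
    """Fraction of target trials classified as higher than the reference."""
    rng = random.Random(seed)
    ref_mean = mean_vector(ref_trials)
    target_mean = mean_vector(target_trials)
    top_mean = mean_vector(top_trials)      # targets one octave above reference
    bottom_mean = mean_vector(bottom_trials)  # targets one octave below
    votes = [decode_trial(t, ref_mean, target_mean, top_mean, bottom_mean, rng)
             for t in target_trials]
    return sum(votes) / len(votes)
```

Each trial is a list with one entry per unit, so the same functions apply to any per-unit response measure, not only spike counts.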
Figure 3A illustrates schematically the mean spike rates from one ensemble, comprised of 26 neural units, in response to a single presentation of the reference sound and sounds with f0 values one octave above and below the reference. It is clear that some units are rather uninformative, given that their spike rate changes little over the 3 conditions shown, whereas others modulate their spike rate appreciably as a function of f0, as indicated by the change in color. The individual responses of these same 26 units to 30 presentations of the reference and the highest and lowest targets are shown in Fig. 3B where, for visualization purposes only, the spike rate vectors are plotted in principal components space. These data suggest that the responses should be relatively easily classifiable – the responses to the high (red) and the low (blue) periodicity sounds are well separated, and the responses to the reference f0 (green) lie between the two. Figure 3C shows, in black, the neurometric curve for this ensemble of 26 units, overlaid on the neurometrics obtained for each individual unit in the dataset (shown in red). The slope of the ensemble neurometric exceeds that of all of the individual neurometric functions (ensemble neurometric: 50.5% per octave; individual unit neurometrics: mean 14% per octave, max 37% per octave), and lies within the normal range of psychometric function slopes for this task.
Figure 4 shows the distribution of slopes computed using individual unit neurometrics (A) and population neurometrics (B). The mean individual unit neurometric performance is 4.6% per octave, with only 3.8% of units having slope values that exceeded the lower limit of performance we observed in trained ferrets (slopes >18% per octave). In contrast, the mean ensemble neurometric slope is 16.3% per octave, and approximately one third (35.2%) of all groups of simultaneously-recorded units yielded neurometric curves that fell within the behavioral range.
We also examined how well our ensembles performed relative to individual units across a range of reference f0 values. Figure 3D plots the ensemble neurometric functions across reference f0 for the same set of 26 units shown in Fig. 3A. This sample ensemble gave a particularly stable neurometric performance. In general, the ensemble neurometrics were less affected by changes in the reference f0 than individual unit neurometrics (compare Fig. 2E). To assess this, we took a robust encoding of stimulus f0 to mean that the neural sensitivity, as assessed by the slope of the neurometric curve, should reach or exceed the “minimal psychophysical performance criterion” for a wide range of reference f0 values. The minimum behavioral criterion here corresponded to a neurometric slope of 18% per octave, corresponding to the 2.5th percentile of the slopes of the observed psychoacoustic functions. Twenty-five of our neural ensembles yielded neurometrics which reached the behavioral criterion for at least some reference f0 values. For these 25 ensembles, we simply counted the number of reference f0 values for which either the ensemble neurometric or the neurometrics of any of the individual units exceeded the 18% per octave criterion. Ensemble neurometrics exceeded the behavioral criterion for a larger number of reference f0 values than single neuron neurometrics in 22 of these 25 ensembles. In the remaining 3 ensembles the count was equal. In no case did single units yield supra-criterion neurometrics over a wider range of f0 reference values than ensembles.
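The counting comparison amounts to the following, shown here as a minimal sketch. The data layout and function name are assumptions, and "reach or exceed" is taken to mean a slope greater than or equal to the criterion.

```python
CRITERION = 18.0  # % per octave, the 2.5th percentile of behavioral slopes

def count_supra_criterion(ensemble_slopes, unit_slopes):
    """Count reference f0 values at which each readout meets the criterion.

    ensemble_slopes: one neurometric slope per reference f0.
    unit_slopes: one list per unit, each holding one slope per reference f0.
    """
    n_ensemble = sum(s >= CRITERION for s in ensemble_slopes)
    # A reference counts for the single units if any one unit meets criterion.
    n_units = sum(
        any(unit[i] >= CRITERION for unit in unit_slopes)
        for i in range(len(ensemble_slopes))
    )
    return n_ensemble, n_units
```

Note that the single-unit count is generous: at each reference it credits whichever unit happens to perform best, so it represents an upper bound on what a "best neuron" readout could achieve.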
To further demonstrate that the neurometric discrimination indeed used the pattern of activity across units, and was not simply dominated by one or very few highly selective units, we recomputed the ensemble neurometrics repeatedly, on each occasion excluding a different unit. This had a negligible effect on the resulting neurometrics (i.e. the observed neurometric based upon the whole population always fell within the 95th percentile of the bootstrapped values based upon ensembles made by excluding one neuron). Two examples are shown in Supplemental Figure 8. As an additional control we also tested ensembles in which the best unit was excluded from the ensemble, then the top two units, the top three, and so on (Supplemental Figure 6). This revealed a gradual decrease in performance consistent with ensemble neurometric performance being based upon the distribution of activity across a number of units rather than merely reflecting the performance of just the best unit.
Potentially, several aspects of the neural discharge patterns could carry information. So far, we have only considered spike rate. Neural responses to different f0 values tended to show similar discharge patterns, but sometimes differed in their onset latencies (see Fig. 1B,C). We therefore investigated whether the relative timing of spikes across the ensemble carried information about stimulus f0. For each ensemble, we extracted the first time a spike occurred, across all units, after the stimulus onset. We then computed the first spike latency of all other units relative to this first ensemble spike. In this manner, we obtained vectors of relative spike latencies for the ensemble, which were then decoded using the same pattern matching algorithm described for spike counts. Spikes were considered across different response durations (30 ms, 75 ms, 150 ms or 300 ms), beginning at stimulus onset. Trials in which a unit failed to fire at any point during the response window were assigned a latency value that was 1 ms greater than the maximum response duration under consideration. We also considered a reduced spike count code, which simply asked whether or not a spike occurred during the response window. This was essentially a binary “spike or no-spike” code. A final, fourth code represented ensemble responses by the order in which the units fired. All of the spike latencies occurring within a trial were ranked, and these ranks were used as the input to the decoder.
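As a minimal sketch, the four response codes could be derived from one trial of ensemble spike times as follows. The function name `ensemble_codes`, the input layout (one list of spike times per unit, in ms from stimulus onset) and the treatment of tied ranks are assumptions made for illustration. Assigning silent units a relative latency of the window duration plus 1 ms is one reading of the rule described above.

```python
def ensemble_codes(spike_times, window_ms):
    """Summarize one trial of ensemble activity under four candidate codes.

    spike_times: one list per unit of spike times (ms after stimulus onset).
    window_ms:   response window duration, starting at stimulus onset.
    """
    # Keep only spikes that fall inside the response window.
    in_window = [[t for t in unit if 0 <= t < window_ms] for unit in spike_times]

    # Spike count code and reduced "spike or no-spike" code.
    counts = [len(unit) for unit in in_window]
    binary = [1 if c > 0 else 0 for c in counts]

    # First-spike time of each unit (None if the unit was silent).
    first = [min(unit) if unit else None for unit in in_window]
    fired = [t for t in first if t is not None]

    # Silent units get a value 1 ms beyond the window (assumed interpretation).
    silent_value = window_ms + 1
    if fired:
        t0 = min(fired)  # earliest spike across the whole ensemble
        relative = [t - t0 if t is not None else silent_value for t in first]
    else:
        relative = [silent_value] * len(first)

    # Spike-order code: rank of each unit's latency; ties share a rank (assumption).
    ranks = [1 + sum(1 for other in relative if other < r) for r in relative]

    return {
        "count": counts,
        "binary": binary,
        "relative_latency": relative,
        "rank_order": ranks,
    }


# Hypothetical trial: four units, 75 ms response window.
trial = [[12.0, 40.0], [], [20.0, 25.0], [80.0]]
codes = ensemble_codes(trial, 75)
print(codes)
```

Each of the four resulting vectors has one element per unit and could be passed to the same pattern-matching decoder.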
The choice of code made very little difference to the slope of the neurometric functions. Figure 5 shows the 4 alternative ensemble neurometrics along with their single-unit neurometrics from nine example ensembles. These nine represent three of the best (Fig. 5A-C), three average (Fig. 5D-F) and three of the worst ensembles (Fig. 5G-I) in our data set. Supplemental figure 7 shows, schematically (as in Fig. 3A) the behavior of three different ensembles (comprising 35, 3 and 26 units respectively) when decoded with each of the four neural codes.
The “best code” for each ensemble was defined as the response code that resulted in the steepest neurometric function. Across all penetrations, response durations and reference periodicities, the “best code” was distributed relatively equally between the four candidate codes (spike count: 298, relative latency: 289, binary count: 291, spike-order: 246), and there were, overall, no significant differences in the sensitivity (i.e. average neurometric slope) between these alternative codes (Kruskal-Wallis test, p = 0.077). However, if only the longer response durations were considered, then there were small but statistically significant differences in the performance of each of the decoding strategies (Kruskal-Wallis, p < 0.0001). For example, with a 150 ms wide response window, 32% of units performed best with the spike count code, 28% with relative latency, 24% with reduced spike count, and 16% with the spike-order code. When the slope values are compared at this duration (Supplemental Fig. 5B), those units which performed best with the relative latency code had significantly greater slopes than those performing best with the spike count code (post-hoc Tukey-Kramer, p < 0.05).
In order to examine the extent to which these coding strategies were redundant, we also considered a code which used both spike count and relative latency information. Joint count/latency ensemble decoding was performed by first normalizing the relative latency and spike count values separately to their maximum, and then using these values as inputs to the pattern classifier. Each ensemble response is therefore represented by a 2N-element vector, containing N normalized spike count and N normalized latency values (spike count and latency values were separately normalized to the maximum count or latency observed across all trials and units in the ensemble). This joint code produced a small but significant improvement in performance when compared to either the spike count or relative latency code alone, indicating that both codes carry independent information (Kruskal-Wallis test for slope values at 150 ms p<0.01, Tukey-Kramer post hoc comparisons p<0.05 for both joint-count and joint-latency comparisons). Figure 6A illustrates this by showing the average (and the 25th and 75th percentile) performance for ensemble spike count, relative latency and the joint count/latency codes. For all three (combined, count and relative latency) codes, performance initially increases equally rapidly as the response window is increased from 10 ms, but then reaches a maximum near 75 ms. The joint code provides ~10% better periodicity discrimination than either the spike count or relative latency codes by themselves. A two-way ANOVA showed significant effects of both response duration and code type (response window duration: F = 104, p < 0.0001; code choice: F = 18.0, p < 0.0001). Further analysis of the effect of response window length on all four codes is included in Supplemental Fig. 5C,D. The proportion of ensemble-reference combinations which exceeded the behavioral threshold was calculated for each of the four codes and the joint count-latency code (Fig. 6B).
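The construction of the joint input can be sketched as below, so that each trial of an N-unit ensemble yields 2N values. The function name `joint_code`, the trials-by-units list layout and the guard against an all-zero maximum are assumptions. The classifier itself is not shown.

```python
def joint_code(counts, latencies):
    """Build joint count/latency vectors for an ensemble of N units.

    counts, latencies: lists of trials, each trial a list of N per-unit values.
    Returns one 2N-element vector per trial: N normalized counts followed by
    N normalized latencies.
    """
    # Normalize each measure separately to its maximum over all trials and units.
    max_count = max(max(trial) for trial in counts) or 1      # avoid dividing by 0
    max_latency = max(max(trial) for trial in latencies) or 1

    joint = []
    for count_trial, latency_trial in zip(counts, latencies):
        normalized_counts = [c / max_count for c in count_trial]
        normalized_latencies = [l / max_latency for l in latency_trial]
        joint.append(normalized_counts + normalized_latencies)
    return joint


# Hypothetical two-unit ensemble, two trials.
counts = [[2, 0], [4, 1]]
latencies = [[0, 76], [10, 0]]
vectors = joint_code(counts, latencies)
print(vectors[0])
```

Normalizing the two measures separately keeps either one from dominating a distance-based classifier simply because of its units.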
Correlations in neural responses have previously been shown to limit neurometric performance in some cases (Zohary et al., 1994; Walker et al., 2008), and to improve it in others (Wang et al., 2007). In order to examine the effects of “noise correlations” (i.e. correlated background activity) on performance, we recomputed the ensemble neurometrics after randomizing (i.e. “shuffling”) the order of stimulus presentations independently for each unit in an ensemble, so that any correlations in neural responses that might be due to common fluctuations in background activity were removed. This procedure was repeated 100 times, and the 5th and 95th percentiles of the shuffled neurometric slopes were used to estimate significance limits. Figures 5A and B illustrate the results of performing this shuffling procedure on two ensembles in which the neurometric curves derived from shuffled data were steeper than from the synchronized data.
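A minimal sketch of this shuffling control follows, assuming trial order is permuted within each stimulus condition so that every unit keeps its own stimulus-specific responses. The function names, the dictionary layout and the nearest-rank percentile rule are illustrative assumptions, and recomputing the neurometric slope on each shuffle is left out.

```python
import random


def shuffle_trials_per_unit(responses, rng):
    """Destroy trial-by-trial correlations between units.

    responses: dict mapping stimulus -> list of trials, each trial a list of
               per-unit response values (e.g. spike counts).
    rng:       a random.Random instance.
    Returns a new dict in which, for every stimulus, the trial order has been
    permuted independently for each unit.
    """
    shuffled = {}
    for stimulus, trials in responses.items():
        n_units = len(trials[0])
        columns = []
        for u in range(n_units):
            column = [trial[u] for trial in trials]  # one unit, all trials
            rng.shuffle(column)                      # independent permutation
            columns.append(column)
        # Re-assemble pseudo-trials from the independently shuffled units.
        shuffled[stimulus] = [list(row) for row in zip(*columns)]
    return shuffled


def percentile_limits(values, lower=5, upper=95):
    """Nearest-rank percentiles of the shuffled slopes (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(p):
        rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
        return ordered[rank - 1]

    return nearest_rank(lower), nearest_rank(upper)


# Hypothetical data: one stimulus, three trials, two units.
data = {"f0_200": [[1, 10], [2, 20], [3, 30]]}
print(shuffle_trials_per_unit(data, random.Random(0)))
```

Calling `shuffle_trials_per_unit` 100 times, computing a slope each time and passing the slopes to `percentile_limits` would give significance limits of the kind described above.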
In total, 20.9% of comparisons (based on 39 penetrations and all reference f0 values tested) showed a significant limiting effect of correlated noise (i.e. the neurometric computed from the simultaneously recorded ensemble had a slope value that was smaller than the 5th percentile of the shuffled data). These shuffling effects were found in 21 of our 39 ensembles, and occurred across several reference periodicities. In contrast, only 5.8% of all shuffled versus synchronized neurometric comparisons yielded a significant decrease in their slopes when the responses were shuffled, and these changes were smaller on average. In comparisons where shuffling significantly impaired neurometric performance, the average difference between shuffled and synchronized neurometric slopes was 4.8% per octave, but ensembles that provided better neurometrics when shuffled showed an 11.9% per octave slope increase. An identical analysis performed on ensembles decoded with a relative latency code yielded quite different results; whilst 12% of ensemble-reference combinations showed a significant limiting effect of noise, a far greater proportion of ensembles (57%) showed a significant decrease in performance after shuffling compared to the synchronous case. Ensemble-reference combinations which had better spike count neurometric performance when shuffled always also showed better performance when their relative latencies were shuffled. Most of the cases which showed a limiting effect of noise correlations for relative latency decoders also showed this effect with spike count. The largest difference between the effects of shuffling on spike count and relative latency decoding was in the number of populations which decreased in slope when responses were decorrelated, and the magnitude of this effect, which was greater (−10.8% compared to 4.8%) for spike count. The effect of noise correlations on both codes for smaller sub-ensembles is considered in supplemental figure 8.
We considered how neurometric performance was affected by ensemble size. To do this, we re-calculated the neurometric slopes from increasingly large subsets of simultaneously-recorded units. The number of possible subsets of size n from an ensemble of size p is given by:
$$\binom{p}{n} = \frac{p!}{n!\,(p-n)!}$$
which can become very large for even modest ensemble and subset sizes. In cases where there were >500 possible combinations, analyses were limited to 500 randomly-selected subsets. We analyzed spike count and relative latency responses at each ensemble's “best” reference f0 and response duration. Our sample size was not sufficiently large to make quantitative conclusions about optimal ensemble sizes. However, in keeping with the finding that correlations between neurons mostly limited discrimination, for many ensembles, performance tended to increase with ensemble size and then asymptote (Fig. 8A,B for spike count and latency, respectively). Figure 8C,D shows the same data but focuses on the smaller subpopulation (n < 20), plotted relative to the best unit in the population, and in Fig 8E,F, plotted as improvement in slope, relative to the average unit performance. It is clear from these figures that even small subpopulations can substantially improve upon the average unit performance, and that the improvement with increasing population size is rapid.
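The subset selection can be sketched in Python as follows. The number of size-n subsets of p units is the binomial coefficient, available as `math.comb`. The function name `unit_subsets` and the choice to draw distinct subsets when sampling are assumptions; the 500-subset cap is taken from the text.

```python
import itertools
import math
import random


def unit_subsets(p, n, max_subsets=500, rng=None):
    """Return subsets (tuples of unit indices) of size n from p units.

    All subsets are enumerated when there are at most max_subsets of them;
    otherwise max_subsets distinct subsets are drawn at random.
    """
    rng = rng or random.Random()
    total = math.comb(p, n)  # p! / (n! (p - n)!)
    if total <= max_subsets:
        return list(itertools.combinations(range(p), n))

    chosen = set()
    while len(chosen) < max_subsets:
        # Sorting makes each subset order-independent, so duplicates collapse.
        chosen.add(tuple(sorted(rng.sample(range(p), n))))
    return sorted(chosen)


# A 26-unit ensemble already has more than ten million subsets of size 13.
print(math.comb(26, 13))                  # 10400600
print(len(unit_subsets(5, 2)))            # 10: small enough to enumerate
print(len(unit_subsets(26, 13, rng=random.Random(1))))  # 500: capped
```

Sampling avoids materializing the full list of combinations, which is impractical for the larger ensembles.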
Perceptual features of sound tend to vary little with sound level, and during our behavioural testing of periodicity discrimination, sound levels roved randomly over a 15 dB range. Any neural code capable of supporting this discrimination behaviour should therefore be capable of operating over a similar range of sound levels. In a subset of recordings (7 ensembles, 81 units) we presented vowel stimuli at 3 sound levels (65, 75 and 85 dB SPL), allowing us to examine how robust the neural representation of f0 is when sound levels change. For each of these 7 ensembles, we constructed neurometric functions at each ensemble's best reference periodicity and sound level (i.e. the combination of level and f0 that produced the steepest slope value). Six ensembles had slope values above the lower bound of the behavioural range (i.e. >18% per octave) and were therefore analysed further. For each of these ensembles, we then constructed neurometric functions from all of the responses, pooling across all three sound levels at each f0 value. We compared the slope of this neurometric function to that obtained with the best f0/level combination.
The two ensembles with the steepest neurometric functions (47.2% per octave and 49.0% per octave respectively) performed similarly when sound levels varied across this 20 dB sound level range (mixed sound level neurometrics were 46.9% per octave and 50.3% per octave respectively, i.e. performance was 99.4% and 102.7% of the best single level neurometric). Two ensembles showed only very modest decreases in performance when three sound levels were introduced (from 44.3% per octave to 37.1% per octave and from 42.7% per octave to 38.3% per octave, i.e. performance with randomised sound intensity was 83.8% and 89.7% of their best intensity slope value). Of the remaining 2 ensembles, one dropped from 26% per octave to 16% per octave, and the other from 32% per octave to 5% per octave. Very similar results were observed with the relative spike latency code: the mixed-level neurometric slope values were 86% ± 10% (mean ± SD) of the best periodicity-level combinations, across the 6 ensembles.
In contrast, single unit neurometrics were much more heavily affected by changes in sound level. Of the 72 units which comprised these 6 ensembles, 10 had slopes at their best sound level-periodicity combination which exceeded 20% per octave (mean 30% per octave ± 5%). For these units, the mixed-level neurometric slope value was on average only one third (33.5% ± 33.2%, mean ± SD) of that at the best periodicity/level combination. There were only two neurons whose slope value for the mixed sound level neurometrics was >75% of the best level neurometric, and both of these had very modest slopes to begin with (23% and 27% per octave – i.e. only just above our lower threshold). Figure 9 illustrates the neurometrics from one ensemble, and each of its units, at each of the three sound levels (9A-C), and calculated across the three sound levels (9D). Overall, the ensemble performance is substantially more robust to changes in sound level than single unit performance, whether ensembles are decoded using spike counts or relative latencies.
Figure 10A plots the slopes of the neurometric functions for all 39 ensembles, at each f0 tested, using the relative latency code and a response window of 75 ms. Also shown for comparison are the mean and range of psychometric performance from the five trained ferrets. Across the full range of reference f0 values, neural ensemble neurometrics are often within the range of the behavioral performance of the ferrets. At most reference frequencies, some ensemble neurometric functions were as steep as the best psychometric functions.
Ensembles from different cortical fields are plotted with different symbols in Fig. 10A, showing that all 5 fields contain ensembles that represent sufficient information about f0 to account for the behaviorally-assessed perceptual decisions. Figure 10B illustrates the distribution of neurometric slopes (across all reference f0 values) for penetrations in each of the cortical fields. Both Figs. 10A and B suggest a trend for field PSF to be the most informative area for periodicity-direction discrimination, but this trend was not statistically significant (p=0.058 Kruskal-Wallis test). However, a sample of 39 ensembles may not be large enough to reveal possible subtle inter-area differences. This issue is addressed in more detail in supplemental figure 10.
Because recordings were made with multi-site silicon electrodes, we often sampled across multiple cortical depths at the same time. Previously, we demonstrated that units recorded in superficial cortical layers had a higher pitch sensitivity than those recorded in deeper layers (Bizley et al., 2009). In order to examine whether the ability of neurons to encode the direction of a pitch change varied across cortical depth, recording sites were divided into “superficial” and “deep” locations according to whether they were within 800 μm of the cortical surface. This depth marks the approximate location of the ventral border of layer IV of the ferret auditory cortex (Dahmen et al., 2008). We compared the single unit neurometric slopes of superficial and deep units. There was no statistically significant difference between the neurometric slopes of deep and superficial layers (Wilcoxon rank sum test, p=0.1244). We then divided all ensembles into two sub-ensembles comprising only superficial/deep units and computed ensemble neurometric functions (with spike counts calculated over a 75 ms window). Again, the superficial and deep ensemble neurometrics were not significantly different (p=0.76, Wilcoxon rank sum test), and there was no difference between the slopes derived from either sub-group and the full ensemble slope (p=0.19, Kruskal-Wallis test). We repeated this procedure while decoding responses with the relative timing measure and this time found that the slopes differed significantly across layers (Kruskal-Wallis test for slope values from the full ensemble, superficial, deep, p=0.038). Post-hoc tests showed that this difference was based on the comparison between the full ensemble slope and the deep units' neurometric slopes, in which the latter were significantly shallower.
We tested a range of methods for decoding individual unit and ensemble responses in addition to those reported in the Results. Unit responses were also decoded with pattern recognition algorithms (using methods similar to those reported in Walker et al., 2008) but these did not differ quantitatively from simple spike count measures. Ensemble responses were also decoded using Linear Discriminant Analysis, a simple perceptron learning model, a dot-product classification (which classified responses along vectors that pointed from the responses to the reference to the target directions), and, finally, using the Euclidean pattern-matching algorithm described but using either transformed (cube-rooted) spike counts or spike counts summarized using Principal Components Analysis. These methods are conceptually similar to the algorithm we used to obtain the results described above, and they generally yielded very similar, although typically slightly worse, results.
We considered one last decoding strategy motivated by our finding that we were unable to distinguish between the possibilities that ferrets compared the target and reference sounds on each trial, or alternatively, built up an internal representation of either the reference, or the highest and lowest targets, and then compared individual target sounds to this internal representation (discussed in Walker et al., 2009). We therefore assessed neurometric performance for ensembles while excluding the first stage of the decoding strategy (i.e. we did not first ask whether the response to the target f0 was differentiable from the response to the reference sound). Omitting this step produced a significant decrease in performance (paired t-test, p < 0.01). However, the magnitude of these differences was very small, with only 5% of ensemble neurometrics having slope values for the single-step algorithm that were more than 8% per octave poorer than for the two-step algorithm. We reported findings based upon the two-step algorithm because it was slightly more effective than omitting this first stage, but our results do not substantially differ when we use the one-step method, or any of the other alternative neurometric algorithms mentioned above. For individual neurons and ensembles, there were often occasions where one of the alternative methods was at least as successful as those we ultimately used. Overall, however, the simple two-step neurometric approach described produced better results (i.e. steeper neurometric slopes) than the alternatives tested.
We recorded responses to artificial vowel sounds of varying periodicity in five auditory cortical fields, and compared them to the performance of ferrets trained to discriminate the pitch of these sounds. Temporal acuity decreases above the cochlear nucleus with a progressive increase in a rate-based representation of temporal information, including stimulus periodicity, at higher levels of the auditory pathway (Wang et al., 2008). The importance of auditory cortex in pitch extraction is illustrated by human imaging studies showing cortical activation by pitch-evoking sounds (Patterson et al., 2002; Penagos et al., 2004). Moreover, cats with bilateral auditory cortex lesions are impaired in a missing-fundamental pitch task (Whitfield, 1980).
We used neurometric techniques to compare the discrimination performance afforded by single units and ensembles of simultaneously-recorded units to that of trained ferrets in a periodicity discrimination task. The f0-spike-rate functions of a few individual neurons provided sufficient information to account for the animals' discrimination judgments, but only over a very limited range of periodicity values. However, the response patterns of small ensembles of simultaneously-recorded neurons typically discriminated stimulus periodicity with greater accuracy and over a wider range of values than those of individual neurons. Furthermore, ensemble codes for the vowel f0 were robust to changes in sound intensity, and could be read out in at least four alternative ways, based on either spike count or relative latencies.
To decode ensemble responses, we implemented a pattern classification algorithm that took, as its input, one of four different spike summary statistics: spike count, relative spike latency, a reduced “binary” response code, or a first-spike-order code. Similar Euclidean-distance-based metrics have previously been employed to decode activity patterns of individual neurons and ensembles recorded either non-simultaneously (e.g. Schnupp et al., 2006; Engineer et al., 2008) or simultaneously (Walker et al., 2008). We also tried other classification algorithms, most of which did not substantially alter the performance of the ensembles. Other studies have emphasized the potential role of spike latency codes (deCharms and Merzenich, 1996; Jenison, 2000; Thorpe et al., 2001; VanRullen and Thorpe, 2002; Johansson and Birznieks, 2004; Nelken et al., 2005; Gollisch and Meister, 2008). However, attempts to examine ensemble coding have either focused on pairs of simultaneously-recorded units, or simulated ensemble activity from single-unit recordings. Here we recorded simultaneously from sizeable ensembles of neurons, allowing us to test different population coding strategies in the presence of real neural correlations.
The neurometric performance of ensembles, whether based on spike counts or latencies, was often within the normal range of ferret psychometric performance in our pitch discrimination task. However, spike count and latency codes were not entirely redundant, because a joint count-and-latency code performed ~10% better than the codes based on either spike count or latency alone. Previous studies of auditory cortex have shown that optimal decoding might be achieved with a combined spike count and mean response latency code (Nelken et al., 2005), by patterns of spikes across an ensemble (Furukawa et al., 2000), or by combining information about the precise temporal pattern of spiking relative to the phase of ongoing oscillations (Kayser et al., 2009). We believe this to be the first time, however, that a relative spike latency code has been compared with psychophysical performance in the same species.
The finding that combining spike count and relative latency information provided the most accurate way of inferring stimulus periodicity from neuron ensembles is consistent with recent evidence for “multiplexing” of neural codes in monkey auditory cortex (Kayser et al., 2009). These authors found the combined information available from pairs of well-separated neurons was complementary and largely independent. Although we did not investigate low-frequency oscillations, it would be interesting to examine whether the relative timing of action potentials and the local field potential might provide further information. One potential problem for decoding relative latencies across neural ensembles is the possibility that spike times might be erroneously measured relative to spontaneous, rather than stimulus-evoked, events. Information about spike timing relative to the power and phase of the local field potential might allow such occasions to be disambiguated.
Relative timing codes could provide a biologically-plausible mechanism for reading out ensemble activity, enabling rapid response times (Van Rullen and Thorpe, 2002; Johansson and Birznieks, 2004; Masquelier et al., 2008), because there is no requirement to integrate or count spikes over a particular time window. However, we found that spike count and latency-based ensemble decoding allowed equally rapid and accurate periodicity judgments, reaching optimal performance within 75 ms of stimulus onset. These population response integration times are also shorter than the optimal response window required for single units, which was found to be 150 ms. Combining information across neurons therefore allowed information to be decoded more rapidly. The finding that the combined count-latency code only exceeds either code alone from 50-75 ms suggests that the information in each evolves over a similar timescale. Relative latency and rank-order coding could both be achieved physiologically via shunting inhibition (Thorpe et al., 2001). However, it is interesting to note that our spike-order code performed more poorly than the relative latency code, suggesting that the actual spike times are more informative than simply the order in which the neurons fired.
In both human listeners (Moore, 2004) and our trained ferrets (Walker et al., 2009), pitch discrimination judgments are unaffected by large trial-to-trial fluctuations in sound intensity. Any neural correlate of this discrimination task should therefore show a similar invariance across sound levels. However, both firing rates and spike latencies tend to change systematically with sound level (Heil, 1997a,b), implying that neural coding of f0 might not be level invariant. Nevertheless, we found that ensemble responses to sounds that vary in both f0 and intensity can be accurately decoded using either spike counts or relative latencies. The presence of approximately equal numbers of neurons that monotonically increase or decrease their firing rates with increasing f0 could help to generate an ensemble periodicity representation that is invariant across sound level. Within a neural population, a change in sound intensity will likely alter the absolute number of spikes, but not the pattern of activation across balanced populations of high-pass and low-pass neurons, allowing information about stimulus f0 to be disambiguated from changes in sound level.
Our finding that correlated noise can impair performance when ensemble responses are decoded with a spike count code, even in small ensembles of simultaneously-recorded neurons, supports previous work demonstrating that weak correlation between neurons can limit the coding capacity of a neuronal pool (Zohary et al., 1994). In contrast, when responses were decoded with a spike latency code, the predominant effect of decorrelating them was a decline in performance. This suggests that trial-to-trial correlations enhance this relative latency code. It is possible that selecting the optimal combination or differential weighting of neurons within the pool might improve performance even further, but such analyses were not attempted here.
Based on the location of neurons that were tuned to a particular periodicity even when the f0 was omitted from the sound's spectrum, Bendor and Wang (2005) described the presence of a pitch center in marmoset auditory cortex. We might therefore expect the optimal ensemble size for representing stimulus periodicity to vary across different fields in ferret auditory cortex. Nevertheless, we found that the general sensitivity to f0 observed in all five fields investigated is sufficient to support periodicity discrimination behavior. Stimulus periodicity is one of the key acoustic determinants of the perceptual quality of pitch. Many spectrally different sounds can elicit the same pitch percept and a neural substrate for pitch perception should show invariance not just to sound level but also across a range of spectrally different stimuli. Showing that neurons modulate their firing rates in response to changes in periodicity, and that these changes correlate with psychophysical performance, is not, by itself, sufficient to demonstrate pitch selectivity. Nevertheless, our data suggest that sensitivity to one of the key determinants of pitch is distributed throughout auditory cortex.
Simultaneous recordings of neural responses and behavioral measurements would show more directly that the neural coding strategies explored here are actually used by the brain as it solves perceptual tasks. Such studies have been carried out successfully in other sensory modalities. For example, Luna et al. (2005) found that vibrotactile stimuli could be discriminated on the basis of five different neural codes, but showed, using trial-to-trial correlations between neuronal responses and behavior, that only a spike count code could account for the animals' discrimination ability. Although there is evidence that behavioral context does not alter the tuning of auditory cortical neurons, suggesting that responses observed in passive listening conditions provide a valid measure of the representation of sound properties (Scott et al., 2007), other studies have shown that engaging in a task can suppress auditory responses (Otazu et al., 2009) and alter receptive field structure (Fritz et al., 2003). Examining neural coding and behavioral discrimination simultaneously in the same animal is therefore likely to be essential to our ultimate goal of identifying the neural basis of perceptual judgments.
This work was supported by the Biotechnology and Biological Sciences Research Council (grant BB/D009758/1 to J.W.H. Schnupp, A.J. King and J.K. Bizley), the Engineering and Physical Sciences Research Council (grant EP/C010841/1 to J.W.H. Schnupp), a Rothermere Fellowship and Hector Pilling Scholarship to K.M.M. Walker, and by a Wellcome Trust Principal Research Fellowship to A.J. King. We are grateful to Johannes Dahmen, Fernando Nodal and Andreas Schultz for assistance with data collection.