Because we can perceive the pitch, timbre and spatial location of a sound source independently, it seems natural to suppose that cortical processing of sounds might separate out spatial from non-spatial attributes. Indeed, recent studies support the existence of anatomically segregated ‘what’ and ‘where’ cortical processing streams. However, few attempts have been made to measure the responses of individual neurons in different cortical fields to sounds that vary simultaneously across spatial and non-spatial dimensions. We recorded responses to artificial vowels presented in virtual acoustic space to investigate the representations of pitch, timbre and sound source azimuth in both core and belt areas of ferret auditory cortex. A variance decomposition technique was used to quantify the way in which altering each parameter changed neural responses. Most units were sensitive to two or more of these stimulus attributes. Whilst indicating that neural encoding of pitch, location and timbre cues is distributed across auditory cortex, significant differences in average neuronal sensitivity were observed across cortical areas and depths, which could form the basis for the segregation of spatial and non-spatial cues at higher cortical levels. Some units exhibited significant non-linear interactions between particular combinations of pitch, timbre and azimuth. These interactions were most pronounced for pitch and timbre and were less commonly observed between spatial and non-spatial attributes. Such non-linearities were most prevalent in primary auditory cortex, although they tended to be small compared with stimulus main effects.
One of the most important functions of the auditory system is to identify and discriminate vocal calls, such as speech sounds. This task requires the listener to process several complex perceptual properties of a single auditory object, and so is likely to engage a number of functionally-distinct cortical areas in parallel. To identify a spoken vowel, for example, the auditory system must determine the positions of formant peaks in the spectral envelope of the vowel sound (Peterson and Barney, 1952). Vowel discrimination is therefore a timbre discrimination task. Meanwhile, in addition to its timbre, the pitch of a spoken vowel can convey information about the speaker's identity (Gelfer and Mikos, 2005) and emotional state (Fuller and Lloyd, 1992; Reissland et al., 2003), and so the periodicity of the vowel must also be analyzed. Finally, localization of the speaker requires processing binaural disparity cues and monaural spectral cues that are independent of the timbre and pitch of the vowel. Because many other species generate vocalizations in an entirely analogous fashion, processing the pitch, timbre and location of vowel-like sounds is an important task for the mammalian auditory system in general.
Building on earlier studies in the visual system (Mishkin and Ungerleider, 1982; Goodale and Milner, 1992), it is widely thought that a separation of function exists within higher-order auditory processing streams, such that more posterior, or dorsal, cortical areas mediate sound localization, whereas more anterior, or ventral, areas are responsible for object identification (Rauschecker et al., 1997; Romanski et al., 1999; Kaas and Hackett, 2000; Alain et al., 2001; Maeder et al., 2001; Tian et al., 2001; Warren and Griffiths, 2003; Barrett and Hall, 2006; Lomber and Malhotra, 2008). Consequently, we might expect the pitch and the timbre of a complex auditory stimulus to be represented in a separate region from its spatial location. On the other hand, it is not uncommon for listeners to find themselves in cluttered acoustic environments, where the pitch, timbre and spatial location of several sound sources may have to be tracked simultaneously. Separating the neural processing of different perceptual attributes could make this task harder, by creating a sort of ‘binding problem’.
Previous work has focused almost exclusively on differences across cortical areas in the representation of just one parameter, such as sound-source location (Recanzone, 2000; Stecker et al., 2005; Harrington et al., 2008) or pitch (Bendor and Wang, 2005). The extent to which these attributes are encoded independently has not previously been investigated. Here we used ‘artificial vowel’ sounds to investigate how pitch (as determined by the pulse rate), timbre (as determined by formant filter frequencies) and location are encoded within and across five identified areas of the auditory cortex of the ferret. Our aim was to determine the degree to which these perceptual attributes are represented in a mutually-independent fashion in both primary and secondary cortical fields, and to look for evidence for feature specialization across these fields.
All animal procedures were approved by the local ethical review committee and performed under licence from the UK Home Office in accordance with the Animal (Scientific Procedures) Act 1986. Five adult, female, pigmented ferrets (Mustela putorius) were used in this study. All animals received regular otoscopic examinations prior to the experiment, to ensure that both ears were clean and disease free.
Anesthesia was induced by a single dose of a mixture of medetomidine (Domitor; 0.022 mg/kg/h; Pfizer) and ketamine (Ketaset; 5 mg/kg/h; Fort Dodge Animal Health). The left radial vein was cannulated and a continuous infusion (5 ml/h) of a mixture of medetomidine and ketamine in physiological saline containing 5% glucose was provided throughout the experiment. The ferrets also received a single subcutaneous dose of 0.06 mg/kg/h atropine sulphate (C-Vet Veterinary Products, Leyland, UK) and, every 12 hours, subcutaneous doses of 0.5 mg/kg dexamethasone (Dexadreson; Intervet UK Ltd.) to reduce bronchial secretions and cerebral edema, respectively. The ferret was intubated, placed on a ventilator (7025 respirator, Ugo Basile) and supplemented with oxygen. Body temperature, end-tidal CO2, and the electrocardiogram (ECG) were monitored throughout the experiment. Experiments typically lasted between 36 and 60 hours.
The animal was placed in a stereotaxic frame and the temporal muscles on both sides were retracted to expose the dorsal and lateral parts of the skull. A metal bar was cemented and screwed into the right side of the skull, holding the head without further need of a stereotaxic frame. On the left side, the temporal muscle was largely removed, and a craniotomy was made over the suprasylvian and pseudosylvian sulci to expose auditory cortex (Fig. 1) (Kelly et al., 1986). The dura was removed and the cortex covered with silicone oil. The animal was then transferred to a small table in an anechoic chamber (IAC Ltd.).
Sounds were generated using TDT system 3 hardware (Tucker-Davis Technologies) and MATLAB (MathWorks Inc.), and presented through customized Panasonic RPHV297 headphone drivers. Closed-field calibrations were performed using an 1/8th inch condenser microphone (Brüel and Kjær), placed at the end of a model ferret ear canal, to create an inverse filter that ensured the driver produced a flat (<±5 dB) output.
Pure tone stimuli were used to obtain frequency response areas (FRAs), both to characterize individual units and to determine tonotopic gradients, so as to confirm the cortical field in which any given recording was made. The tones used ranged, in 1/3-octave steps, from 200 Hz to 24 kHz, and were 100 ms in duration (5 ms cosine ramped). Intensities ranged from 10 to 80 dB SPL in 10 dB increments. Each frequency-level combination was presented pseudorandomly at least 3 times, at a rate of one per second. Artificial vowel stimuli were created in MATLAB, using an algorithm adapted from Malcolm Slaney's Auditory Toolbox (http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/). Click trains with a duration of 150 ms and a repetition rate corresponding to the desired fundamental frequency were passed through a cascade of four bandpass filters to impart spectral peaks at the desired formant frequencies. The vowel sounds were normalized to have equal root-mean-square amplitudes, and calibrations were performed using an 1/8th inch condenser microphone (Brüel and Kjær) to ensure that changes in pitch or timbre did not influence the overall sound pressure level. Virtual acoustic space (VAS) techniques were then used to add sound-source direction cues to the artificial vowel sounds. A series of measurements including head size, sex, body weight and pinna size, were taken from each ferret in order to select the best match from our extensive library of ferret head-related transfer function recordings. We have shown previously that ferret spectral localization cue values scale with the size of the head and external ears (Schnupp et al., 2003). 
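The click-train-and-filter-cascade synthesis described above can be sketched in a few lines. This is an illustrative sketch only, not the Auditory Toolbox code used in the study: the sample rate, the formant bandwidth, the target RMS value and all function names are assumptions, and a simple two-pole resonator stands in for each bandpass filter.

```python
import math

def click_train(f0_hz, duration_s, fs_hz):
    """Unit impulses spaced one fundamental period apart."""
    n = int(round(duration_s * fs_hz))
    x = [0.0] * n
    period = fs_hz / f0_hz  # samples per period (may be fractional)
    k = 0
    while round(k * period) < n:
        x[round(k * period)] = 1.0
        k += 1
    return x

def resonator(x, centre_hz, bandwidth_hz, fs_hz):
    """Two-pole resonator used here as a simple formant bandpass filter."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / fs_hz)
    b2 = -r * r
    gain = 1.0 - b1 - b2  # normalizes the gain at 0 Hz to one
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = gain * sample + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def artificial_vowel(f0_hz, formants_hz, duration_s=0.150, fs_hz=48000,
                     bandwidth_hz=80.0, target_rms=0.05):
    """Click train -> cascade of formant filters -> RMS normalization.
    fs_hz, bandwidth_hz and target_rms are assumed values."""
    y = click_train(f0_hz, duration_s, fs_hz)
    for f in formants_hz:
        y = resonator(y, f, bandwidth_hz, fs_hz)
    scale = target_rms / rms(y)
    return [v * scale for v in y]

# /a/ at F0 = 200 Hz, using the formant frequencies given in the text
vowel_a = artificial_vowel(200, (936, 1551, 2815, 4290))
```

Because every waveform is rescaled to the same RMS amplitude, changing the pitch or the formant set in this sketch leaves the overall level unchanged, which mirrors the normalization step in the text.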
Sound-source direction cues were generated by convolving the artificial vowel sounds with minimum phase filters that imparted the appropriate interaural level differences and spectral cues corresponding to a particular direction in the horizontal plane, and which at the same time equalized out any differences in the headphone transfer functions that had been revealed during headphone calibration. Small delays were then introduced in the sound waveforms to generate appropriate interaural time differences.
We presented sounds from four virtual sound-source directions (−45°, −15°, 15° and 45° azimuth, at 0° elevation) and used four sound pitches with F0 equal to 200, 336, 565 and 951 Hz. Four timbres were chosen: /a/ with formant frequencies F1-F4 at 936, 1551, 2815 and 4290 Hz; /ε/ with formant frequencies at 730, 2058, 2979 and 4294 Hz; /u/ with formant frequencies at 460, 1105, 2735 and 4115 Hz; and /i/ with formant frequencies at 437, 2761, 3372 and 4352 Hz. The permutation of these 4 pitches by 4 timbres by 4 source directions gave us a stimulus set of 64 sounds.
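The full factorial stimulus set can be enumerated directly from the values listed above. The sketch below only builds the parameter table; the dictionary keys and the ASCII label "E" (standing in for /ε/) are naming assumptions.

```python
from itertools import product

AZIMUTHS_DEG = (-45, -15, 15, 45)
F0S_HZ = (200, 336, 565, 951)
FORMANTS_HZ = {          # F1-F4 for each timbre, as given in the text
    "a": (936, 1551, 2815, 4290),
    "E": (730, 2058, 2979, 4294),   # "E" is an ASCII label for /ε/
    "u": (460, 1105, 2735, 4115),
    "i": (437, 2761, 3372, 4352),
}

# One entry per azimuth x pitch x timbre combination
stimuli = [
    {"azimuth_deg": az, "f0_hz": f0, "vowel": v, "formants_hz": FORMANTS_HZ[v]}
    for az, f0, v in product(AZIMUTHS_DEG, F0S_HZ, FORMANTS_HZ)
]
```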
Recordings were made with silicon probe electrodes (Neuronexus Technologies). In two animals, we used electrodes with an 8 × 4 configuration (8 active sites on 4 parallel probes, with a vertical spacing of 150 μm). In a small number of recordings in one of these animals, and in another animal, we used electrodes with a 16 × 2 configuration (16 active sites spaced at 100 μm intervals on each of two probes). In the final two animals, electrodes with 4 × 4 and 16 × 1 configurations were used (100-150 μm spacing of active sites on each probe). The electrodes were positioned so that they entered the cortex approximately orthogonal to the surface of the ectosylvian gyrus. A photographic record was made of each electrode penetration to allow later reconstruction of the location of each recording site relative to anatomical landmarks (surface blood vessels, sulcal patterns), to allow us to construct functional maps of the auditory cortex.
The neuronal recordings were bandpass filtered (500 Hz - 5 kHz), amplified (up to 20,000 times), and digitized at 25 kHz. Data acquisition and stimulus generation were performed using BrainWare (Tucker-Davis Technologies).
Spike sorting was performed offline. Single units were isolated from the digitized signal either by manually clustering data according to spike features such as amplitude, width and area, or by using an automated k-means clustering algorithm, in which the voltage potential at 7 points across the duration of the spike window served as variables. We also inspected auto-correlation histograms, and only cases in which the inter-spike-interval histograms revealed a clear refractory period were classed as single units.
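As a rough illustration of the automated clustering step, the sketch below runs a plain k-means (Lloyd's algorithm) on spike waveforms sampled at 7 points. It is not the software used in the study: the waveform templates, noise level, initialization rule and function names are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assign each waveform to its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # recompute each centroid as the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical spike shapes: voltage at 7 points across the spike window
TEMPLATE_A = (0, -20, -60, -30, 10, 20, 5)
TEMPLATE_B = (0, -10, -25, -15, 5, 8, 2)
rng = random.Random(0)
waveforms = [tuple(v + rng.gauss(0, 3) for v in template)
             for template in (TEMPLATE_A,) * 20 + (TEMPLATE_B,) * 20]

labels, centroids = kmeans(waveforms, k=2)
```

In practice the resulting clusters would still need the refractory-period check described above before being accepted as single units.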
Analysis of responses to vowel stimuli was performed blind to animal number or the position of the electrode penetration relative to anatomical landmarks. Before examining how the responses to vowel stimuli varied as a result of their cortical location, each penetration was first assigned to a cortical field on the basis of the responses of units to simple stimuli. This was done by measuring pure-tone FRAs at all recording sites, and comparing these to previously-documented physiological criteria for each of the fields that have been characterized in ferret auditory cortex (Bizley et al., 2005). Penetrations were assigned to a given cortical field according to the characteristic frequency (CF) and tuning properties derived from the FRA and the latency and duration of the response, together with photographs recording the location of the electrode penetrations on the cortical surface and the overall frequency organization obtained for each animal.
Having established the recording locations within each individual animal and noted that there were consistent trends in the responses to vowel sounds between cortical fields across animals, we established a “composite” auditory cortex map. Field boundaries were determined on the basis of the responses to pure tones and noise bursts, but blind to the responses to vowel sounds. This approach has previously been used to investigate the representation of multisensory responses in ferret auditory cortex (Bizley and King, 2008). To create the composite map, the penetration locations and cortical field boundaries for each individual animal were projected onto a single animal frequency map derived using optical imaging of intrinsic signals. This map was a representative example taken from Nelken et al. (2004). This procedure was performed separately for each animal. Morphing each animal's cortical map onto a single example in this way allowed the data from each animal to be superimposed in a bias free fashion.
We used a set of 64 artificial vowel sounds, comprising all possible combinations of four spatial locations, four pitches and four timbres. The parameter values were chosen to be quite widely spaced along each of the three dimensions. VAS stimuli were presented at −45°, −15°, 15° and 45° azimuth at 0° elevation, with negative azimuths denoting locations to the animal's right, contralateral to the recording sites. Fundamental frequencies of 200, 336, 565 and 951 Hz were used, and the four timbres corresponded to the vowels: /a/, /ε/, /u/ and /i/. These parameter ranges were chosen to make the stimuli easily discriminable along each perceptual dimension, both for human listeners, and, as far as we know from available psychoacoustical data, also for our animal model species, the ferret. The azimuth spacing of 30° corresponds approximately to two to three ferret behavioural just noticeable difference (JND) limens (Parsons et al., 1999). The perceptual distance of the pitch steps used here (0.75 octaves) is similarly about twice as wide as the ferrets' JNDs (Walker et al., submitted). The ferrets' ability to discriminate the spectral envelopes associated with the four different vowels has not yet been formally investigated, but preliminary experiments have demonstrated that ferrets rapidly learn to discriminate the identity of these vowel sounds and do so across at least a two octave range of pitches (Bizley JK, Walker KM, King AJ, Schnupp JWH, unpublished observations).
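The spacing figures quoted above follow from the stimulus values themselves, as this short check shows (variable names are illustrative).

```python
import math

F0S_HZ = (200, 336, 565, 951)
AZIMUTHS_DEG = (-45, -15, 15, 45)

# Interval between neighbouring fundamentals, in octaves
pitch_steps_oct = [math.log2(hi / lo) for lo, hi in zip(F0S_HZ, F0S_HZ[1:])]

# Total pitch range covered by the stimulus set, in octaves
pitch_range_oct = math.log2(F0S_HZ[-1] / F0S_HZ[0])

# Angular separation between neighbouring virtual source directions
azimuth_steps_deg = [b - a for a, b in zip(AZIMUTHS_DEG, AZIMUTHS_DEG[1:])]
```

Each pitch step comes out within 0.01 of 0.75 octaves, and the four fundamentals together span roughly two and a quarter octaves.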
Each artificial vowel stimulus was 150 ms long. Figure 1A shows the frequency spectra for the 16 possible combinations of pitch and timbre. This illustrates that both timbre and pitch changes affect the spectral envelope of the sounds, and presenting these sounds from different virtual directions can introduce further changes in the spectral envelope. Yet while changes in location, pitch and timbre all affect the sound spectrum, the perceptual consequences of these changes are quite distinct. If the perceptual distinction between pitch, timbre and location is reflected at the level of neuronal discharges in auditory cortex, then this stimulus set ought to reveal this separation according to perceptual categories.
Extracellular recordings were performed using multi-site silicon electrodes in anesthetized ferrets. We sampled over 900 acoustically-sensitive recording sites. At 615 of these, we were able to obtain stable recordings of neural responses to 30-40 presentations of each of the 64 artificial vowel stimuli, which were presented in a randomly interleaved order. 324 recordings were from single units and 292 were small clusters of units. Because we were unable to find any systematic difference in the response properties of single units and small unit clusters, the term “unit” will be used to refer to both groups.
We observed a rich variety of response types across units, and responses were frequently clearly modulated by more than one, and often all three, stimulus dimensions. Figures 1B and C show the responses from two different units. In each case, the three panels show the same data plotted three times, with the 64 stimuli ordered into groups of 16 with a common azimuth, pitch or timbre (first, second and third panel, respectively). These examples illustrate a common finding: tuning to stimulus pitch, timbre or azimuth alone did not adequately describe the responses of these units and their responses could not be captured satisfactorily as a single spike count value. Rather, neurons often showed a degree of sensitivity to each parameter (e.g. Fig. 1B), and/or to particular combinations of parameters (Fig. 1C) in a time-dependent fashion. Further examples are illustrated in supplemental Figure 1.
In order to examine these stimulus effects, we constructed post stimulus time histogram (PSTH) matrices, in which data were sorted according to two of the three stimulus parameters and pooled across the third. Figures 2A and C show such PSTH matrices for the unit illustrated in Fig. 1B. The 16 panels to the top left of Figure 2A show the data arranged according to all 16 timbre × pitch combinations. The first four columns show the PSTH for the responses to timbres corresponding to /i/, /u/, /ε/ and /a/, while the top four rows show the data for pitches at 200, 336, 565 and 951 Hz, respectively. The rightmost column in Figure 2A shows the mean response for each pitch, averaged across all timbres and azimuths, while the bottom row shows the mean response for each timbre, and the bottom right PSTH illustrates the grand average response across all stimuli.
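The construction of a PSTH matrix, sorting by two stimulus parameters and pooling across the third, might look like the following sketch. The data layout (a dictionary keyed by azimuth, F0 and vowel, holding lists of spike times per trial), the 20-ms bin width over 300 ms, and the toy spike times are assumptions for illustration.

```python
from collections import defaultdict

BIN_S = 0.020   # bin width in seconds (assumed)
N_BINS = 15     # covers the first 300 ms

def psth(trials):
    """Mean firing rate (spikes/s) in each bin, averaged over trials."""
    counts = [0] * N_BINS
    for spike_times in trials:
        for t in spike_times:
            b = int(t // BIN_S)
            if 0 <= b < N_BINS:
                counts[b] += 1
    return [c / (len(trials) * BIN_S) for c in counts]

def psth_matrix(responses, row_index, col_index):
    """Group trials by two stimulus parameters, pooling across the third."""
    pooled = defaultdict(list)
    for stim, trials in responses.items():
        pooled[(stim[row_index], stim[col_index])].extend(trials)
    return {cell: psth(trials) for cell, trials in pooled.items()}

# Hypothetical toy data, keys are (azimuth_deg, f0_hz, vowel);
# each stimulus has one trial containing a single spike time in seconds.
responses = {
    (az, f0, v): [[0.051] if f0 == 951 else [0.185]]
    for az in (-45, 45) for f0 in (200, 951) for v in ("a", "i")
}

# Rows = pitch (index 1), columns = timbre (index 2), pooled over azimuth
pitch_by_timbre = psth_matrix(responses, row_index=1, col_index=2)
```

Marginal PSTHs such as those in the rightmost column and bottom row of the matrix can be obtained in the same way by pooling across two parameters instead of one.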
Displaying the data in this manner makes it easier to appreciate the effect of varying either stimulus pitch or timbre. To describe these effects, we shall adopt the terminology used in ANOVA-type linear statistical models, with mean spike rate during some small time interval as our ‘response variable’, while pitch, timbre, location and post-stimulus time serve as ‘explanatory variables’. We treat these as categorical variates, as we cannot assume the relationship between stimulus parameter value and spike rate to be linear or even monotonic. Within that conceptual framework, comparing the top four panels of the rightmost column with the bottom right panel therefore reveals the ‘main effect’ of varying pitch on the discharge pattern on this unit. To make it easier to visualize the ‘main effects’ of each stimulus parameter, we plot the individual PSTHs in the rightmost column and bottom row on top of a color scale that shows how each particular PSTH differs from the grand average at each time bin. Red means the neural firing rate is, at that time point, larger than average, blue indicates that it is below average, and the saturation of the color encodes the size of the difference.
The grand average PSTH in the bottom right panel in Figure 2A shows that the unit responded to artificial vowel sounds with an initial increase in firing rate, which peaked at a rate of ~50 Hz at ~50 ms post stimulus onset, followed by a smaller second peak at ~180 ms. The main effect of presenting a relatively low pitch (200 Hz, top panel of the last column) was to decrease the size of the first peak in the PSTH and to increase that of the second. Conversely, the main effect of high pitches (951 Hz, fourth panel in the last column) was to increase the size of the first response peak and to decrease the second. Similarly, the main effect of varying timbre can be appreciated by comparing the panels in the bottom row of the PSTH matrix in Figure 2A. A timbre corresponding to the vowel /u/ strongly enhanced the initial response peak (bottom row, second panel), whereas the timbre for /i/ suppressed it (bottom row, first panel), but timbre changes did not affect the later part of the response. This unit was therefore sensitive to both pitch and timbre, and the effects of changing pitch or timbre were manifest at different latencies after stimulus onset.
In the conceptual framework of an ANOVA-style analysis, the simplest assumption for responses to a particular pitch/timbre combination would be that the main effects of pitch and timbre might be additive. To look for non-linear interactions between the stimulus dimensions, we compare the PSTHs in the main body of the matrix against the values that would be predicted from the linear sum of the ‘main effects’, which are shown by the color scales in the rightmost column and bottom row of Figure 2A-D. The color scales in the main body (first four rows and columns) of the PSTH matrix show these ‘two-way interactions’. Therefore any deviation from white shows that the response observed was non-linear, with red colors indicating a supra-additive response, and blue colors indicating a sub-additive one. For example, in the unit shown in Figure 2A, the combination of /i/ and a 200 Hz F0 elicited a supra-linear response, while the combination of /i/ and 951 Hz F0 resulted in a response that was smaller than the linear prediction. However, examination of the absolute values for the interactions and main effects shows that the two-stimulus interactions were small relative to the ‘main effects’ of any one stimulus parameter; the interaction coefficients did not exceed ±33 spikes/s compared to ±86 spikes/s for the largest main effect.
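The additive prediction and the interaction terms can be made concrete for a single time bin. The sketch below decomposes a pitch × timbre table of mean firing rates into a grand mean, main effects and two-way interaction coefficients; the firing rates are invented for illustration and the function name is an assumption.

```python
def decompose(table):
    """Split a two-way table of mean rates into grand mean, row and column
    main effects, and interaction (observed minus additive prediction)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in table]
    col_eff = [sum(table[i][j] for i in range(n_rows)) / n_rows - grand
               for j in range(n_cols)]
    interaction = [[table[i][j] - (grand + row_eff[i] + col_eff[j])
                    for j in range(n_cols)] for i in range(n_rows)]
    return grand, row_eff, col_eff, interaction

# Hypothetical rates (spikes/s): rows = 4 pitches, columns = 4 timbres.
pitch_part = [10, 20, 30, 40]
timbre_part = [0, 5, 10, 15]
additive = [[p + t for t in timbre_part] for p in pitch_part]

# Same table with one pitch/timbre combination boosted by 16 spikes/s
boosted = [row[:] for row in additive]
boosted[0][0] += 16

_, _, _, inter_additive = decompose(additive)
_, _, _, inter_boosted = decompose(boosted)
```

For the purely additive table every interaction coefficient is zero, whereas the boosted cell yields a positive (supra-additive) coefficient of 9 spikes/s; the remainder of the 16 spikes/s boost is absorbed by the grand mean and the two main effects.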
Figure 2C shows the azimuth-by-pitch main effects and interactions for the same unit, while Figures 2B and D show the pitch-by-timbre and pitch-by-azimuth main effects and interactions, respectively, for a second sample unit (the same as that illustrated in Figure 1C). Although this second sample unit exhibited rather different temporal discharge patterns, like the first, it was clearly influenced by more than one stimulus dimension. As we shall see further below, the data shown in Figure 2 were fairly typical of many of the units recorded throughout all cortical areas characterized. Thus, most units were sensitive to more than one stimulus dimension, their firing patterns could change at various times post-stimulus onset, and non-additive interactions between stimuli were not uncommon.
To quantify the strength and significance of these main effects and interactions, we performed a 4-way ANOVA on the spike counts, averaged across the 30-40 repeat presentations for each of the 64 stimuli, in each 20-ms bin for the first 300 ms after stimulus onset. In this manner, each response was represented as a vector of 15 sequential spike counts. Our choice of 20 ms bin widths was based on previous studies of ferret auditory cortex, which indicate that this is likely to be a suitable temporal resolution for decoding neural responses (Schnupp et al., 2006; Walker et al., 2008). In this ANOVA, the 3 stimulus parameters (azimuth, pitch and timbre) plus the time bin served as factors. To quantify the relative strength with which one of the three stimulus dimensions influenced the firing of a particular unit, we calculated the proportion of variance explained by each of azimuth, pitch and timbre, Varstim, as:
$$\mathrm{Var}_{\mathrm{stim}} = \frac{SS_{\mathrm{stim}\cdot\mathrm{bin}} - SS_{\mathrm{error}}\cdot df_{\mathrm{stim}\cdot\mathrm{bin}}}{SS_{\mathrm{total}} - SS_{\mathrm{bin}}} \qquad (1)$$
where “stim” refers to the stimulus parameter of interest (pitch, timbre or azimuth), $SS_{\mathrm{stim}\cdot\mathrm{bin}}$ is the Sum of Squares for the interaction of the stimulus parameter and time bin, $SS_{\mathrm{error}}$ is the Sum of Squares of the error term, $df_{\mathrm{stim}\cdot\mathrm{bin}}$ refers to the degrees of freedom for the stimulus × time bin interaction, $SS_{\mathrm{total}}$ is the total Sum of Squares, and $SS_{\mathrm{bin}}$ is the Sum of Squares for the time bin factor. A significant $SS_{\mathrm{bin}}$ reflects the fact that the response rate was not flat over the duration of the 300 ms response window. This is in itself unsurprising, but by examining the stimulus-by-time-bin interactions, we were able to test the statistical significance of the influence a given stimulus parameter had, not just on the overall spike rate, but also on the temporal discharge pattern of the response. Stimulus-by-time-bin interactions were common, and revealed how a particular stimulus parameter influenced the shape of the PSTH. Subtracting the product $SS_{\mathrm{error}}\cdot df_{\mathrm{stim}\cdot\mathrm{bin}}$ from the $SS_{\mathrm{stim}\cdot\mathrm{bin}}$ term allows us to calculate the proportion of response variance attributable to each of the stimuli, taking into account the additional variance explained simply by adding extra parameters to the model. For the responses shown in Figure 2A and C, the percentage of variance explained by the stimulus main effects was 5% for azimuth, 17% for pitch and 56% for timbre, while for the unit shown in Figure 2B and D, the main effects of azimuth, pitch and timbre accounted for 3%, 19% and 7%, respectively, of the variance in the neural discharge patterns. Thus, while both units were significantly influenced by all 3 stimulus dimensions, one might justifiably describe the first as being ‘predominantly’ sensitive to timbre and the second to pitch.
Only 23% of units were significantly (p <0.001) modulated by either azimuth, pitch or timbre alone. By contrast, 36% of neural responses were dependent on two of the three stimulus dimensions and 29% of units were influenced by all three. The responses of the remaining 12% of units were not significantly modulated by any of the stimuli.
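The variance-explained measure referred to as Eqn. 1 can be written as a small helper. This sketch transcribes the verbal definition given above term by term, with the error term entering exactly as the text names it; the function name and the numerical inputs are hypothetical and do not come from the recorded data.

```python
def variance_explained(ss_stim_bin, ss_error, df_stim_bin, ss_total, ss_bin):
    """Proportion of response variance attributed to one stimulus parameter:
    (SS_stim.bin - SS_error * df_stim.bin) / (SS_total - SS_bin)."""
    return (ss_stim_bin - ss_error * df_stim_bin) / (ss_total - ss_bin)

# Degrees of freedom of a stimulus x time-bin interaction:
# 4 stimulus levels and 15 time bins give (4 - 1) * (15 - 1) = 42.
N_LEVELS, N_BINS = 4, 15
df_stim_bin = (N_LEVELS - 1) * (N_BINS - 1)

# Hypothetical sums of squares, chosen only to exercise the formula
example = variance_explained(ss_stim_bin=50.0, ss_error=0.5,
                             df_stim_bin=df_stim_bin,
                             ss_total=300.0, ss_bin=100.0)
```

With these made-up inputs the numerator is 50 − 0.5 × 42 = 29 and the denominator is 200, so the helper returns 0.145.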
As mentioned above, we combined data from single units and from small multi-unit clusters for most analyses. In order to verify that any joint sensitivity was not a result of recording from more than one neuron, we determined the proportion of units sensitive to combinations of two parameters for single units alone. Sensitivity to both pitch and timbre was observed in 56% of all recording sites and in 52% (169/324) of the single units. Thirty per cent of all recordings and 33% of all single units were sensitive to both pitch and azimuth, while 34% of all recordings and 36% of single units were sensitive to timbre and azimuth. Thus, we were equally likely to observe combination sensitivity in both multi-units and well separated single units.
To examine the importance of the temporal discharge pattern in the neural response, we also performed an ANOVA that was restricted to the overall spike counts calculated over the first 75 ms after stimulus onset. This ANOVA was performed again using pitch, azimuth and timbre as independent variables, but this time excluded post-stimulus time. It resulted in far fewer units exhibiting significant response modulation as a function of these stimulus attributes. For example, using a single spike count measure, only 11% of all units exhibited significant (p <0.05) sensitivity to sound azimuth, although 36% of units had shown a significant (p <0.001) time-bin*azimuth interaction. The same comparison for pitch and timbre yielded values of 18% as compared to 66% for pitch, and 29% compared to 73% for timbre. Over this single time window of 75 ms, the discharge rates of 60% of units no longer exhibited any significant stimulus main effects, and only 13% of units were modulated by more than one parameter. Analyzed in this manner, the unit shown in Figure 2A, for example, was found to be sensitive only to timbre, while that depicted in Figure 2B was no longer sensitive to any of the three stimulus dimensions.
When we performed the analysis using a longer response window (300 ms, data not shown) there were even fewer neurons whose responses were significantly modulated by these stimuli, and neither unit shown in Fig. 1 was found to be sensitive to pitch, azimuth or timbre. Figure 2A shows that, for one of these units, the early and the late part of the response varied in opposite ways with stimulus pitch. The resulting ‘cancellation’ of stimulus effects is likely to be the main reason why using an inappropriately wide analysis window failed to give a significant result despite clear stimulus dependence. These results clearly demonstrate that highly significant but transient stimulus effects can easily be missed in an analysis that is temporally too coarse-grained. We therefore adopted the response variance explained statistics described in Eqn. 1 as our preferred measure of neural stimulus sensitivity for the further analyses described below.
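The cancellation effect is easy to reproduce with made-up numbers. In the sketch below, two hypothetical responses differ in opposite directions in their early and late peaks, so a single spike count over the whole 300 ms window hides a difference that is obvious bin by bin. All counts and variable names are invented for illustration.

```python
# Hypothetical spike counts in 15 consecutive 20-ms bins (0-300 ms).
# Low pitch: small early peak, large late peak. High pitch: the reverse.
low_pitch  = [0, 1, 2, 1, 0, 0, 0, 1, 4, 6, 3, 1, 0, 0, 0]
high_pitch = [0, 3, 6, 4, 1, 0, 0, 0, 2, 2, 1, 0, 0, 0, 0]

# A single count over the full window cannot tell the two responses apart
whole_window_difference = sum(high_pitch) - sum(low_pitch)

# Bin-by-bin differences expose the opposite early and late effects
binwise_difference = [h - l for h, l in zip(high_pitch, low_pitch)]

# A shorter window (first 4 bins, 80 ms) recovers only the early effect
early_difference = sum(high_pitch[:4]) - sum(low_pitch[:4])
```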
To examine the distribution of stimulus sensitivity across auditory cortex, recordings were made in 5 of the 7 previously identified acoustically responsive areas in the ferret ectosylvian gyrus: the primary and anterior auditory fields (A1 and AAF), the tonotopically-organized posterior pseudosylvian and posterior suprasylvian fields (PPF and PSF), which are located on the posterior bank of the ectosylvian gyrus, and the non-tonotopic anterior dorsal field (ADF) on the anterior bank (Fig. 3A) (Kowalski et al., 1995; Nelken et al., 2004; Bizley et al., 2005). These 5 areas make up the auditory ‘core’ (A1 and AAF), and ‘belt’ (PPF, PSF, ADF) areas in this species. We have previously reported that ~60% of neurons in the anterior ventral field (AVF) are visually sensitive (Bizley et al., 2007). AVF and the ventroposterior (VP) area are likely to be ‘para-belt’ areas and were not included in the present study. In 4 out of 5 animals, recordings were made in all 5 cortical fields and, in the remaining animal, responses were recorded in 3 fields.
The locations of each field were determined for individual animals as described in the Materials and Methods. Because we used multi-site electrode arrays and commonly recorded several units at different depths, the range of CFs obtained in these cases was visualized by plotting one Voronoi tile for each unit recorded, arranged in a circular fashion around the penetration site, with the most deeply recorded unit shown rightmost and proceeding clockwise from deep to superficial. The composite frequency map obtained in this manner is shown in Figure 3B. The low frequency regions that mark the boundaries of fields AAF and ADF and of A1, PPF and PSF are readily apparent.
To investigate the anatomical distribution of sensitivity to pitch, timbre and location, we generated composite feature sensitivity maps using methods described further below. Figures 3C-E map the proportion of variance explained (Eqn 1) by the ‘main effects’ of azimuth, pitch and timbre, respectively, for every penetration made, onto the surface of the auditory cortex. Each ‘tile’ in the Voronoi tessellation shows the average value obtained for all units recorded at that site. The color scale indicates the proportion of variance explained with darker, red colors indicating low, and brighter, yellow colors indicating high values. These plots suggest that there are areas in which clusters of units have a higher sensitivity to each stimulus parameter. Highly azimuth sensitive units were particularly common in A1, as well as in an area encircling the tip of the pseudosylvian sulcus. Highly pitch sensitive units were most commonly found in the middle of auditory cortex, around the point at which the low frequency edges of the tonotopic core and belt areas converge, as shown in Figure 3B. Timbre sensitivity was highest in the primary fields and along the low frequency ridge that separates the two posterior fields, PPF and PSF. To illustrate the range of percentage variance explained in any one electrode penetration, Figures 3F-H plot the values for every unit recorded, with each tile representing a single recorded unit, and multiple units from a single penetration arranged in a circular fashion, as in Figure 3B, around the site of the penetration. Overall, Figure 3 suggests that there are ‘clusters’ of recording sites that are relatively more sensitive to stimulus azimuth, pitch or timbre, but these are not obviously restricted to a particular subset of the 5 tonotopic cortical fields investigated here, and, in each field, we observed considerable unit-to-unit variability in the sensitivity to pitch, timbre and location.
The distribution of parameter sensitivity for each cortical area was visualized in box-plot format (Figures 3I-K). Despite the very wide unit-to-unit variation, some degree of specialization is nevertheless apparent, as, for all three stimulus dimensions, there were significant differences in the proportion of variance explained by cortical field (Kruskal-Wallis test, χ2 = 27, 68, and 77 respectively, p <0.0001). Tukey-Kramer post-hoc comparisons (p <0.05) revealed that azimuth sensitivity was, on average, significantly higher in A1 and PPF than in AAF, PSF and ADF; pitch sensitivity tended to be more pronounced in A1 and the posterior fields PSF and PPF than in AAF and ADF; and AAF showed the highest average level of timbre sensitivity. These trends were seen consistently in each of the animals, and not just in the pooled data, and a jack-knife test was used to establish that no one animal contributed disproportionately to the small but significant differences reported here. The same statistical tests (i.e. Kruskal-Wallis and Tukey-Kramer post-hoc tests) were implemented 5 times, excluding each of the five animals in turn. In all cases, the same significant trends across cortical areas were preserved.
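The leave-one-animal-out check described above can be illustrated with a minimal sketch. It assumes that each unit is stored as a hypothetical `(animal, field, variance_explained)` tuple, and it computes only the Kruskal-Wallis H statistic, without tie correction and without the Tukey-Kramer post-hoc step. The function names and the data layout are illustrative assumptions, not the analysis code used in the study.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for non-empty samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def leave_one_animal_out(units):
    """Recompute H across cortical fields, excluding each animal in turn.

    units: list of (animal, field, variance_explained) tuples (assumed layout).
    Returns a dict mapping the excluded animal to the resulting H statistic.
    """
    results = {}
    for excluded in sorted({animal for animal, _, _ in units}):
        by_field = defaultdict(list)
        for animal, field, value in units:
            if animal != excluded:
                by_field[field].append(value)
        results[excluded] = kruskal_wallis_h(list(by_field.values()))
    return results
```

If H remains large whichever animal is excluded, no single animal is driving the difference between fields. A complete analysis would also convert H to a p-value using the χ2 distribution with (number of fields − 1) degrees of freedom.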
Our analysis method quantified the linear contribution of each of the three stimulus dimensions to the units' responses, as well as the non-linear effects of presenting particular stimulus combinations. This non-linear effect is quantified in our ANOVA by the three-way interaction of the time bin factor with two different stimulus parameters, and can be thought of as a measure of combination sensitivity. The interaction coefficients measure non-additive (“multiplicative”) interactions between categorical variates, and are analogous to a logical AND operation (by how much does the response differ from the purely linear, additive expectation for specific parameter combinations e.g. when timbre = /i/ AND azimuth = 45°?). Over two-thirds of units tested were sensitive to more than one stimulus dimension and 41% of all units showed significant (p <0.001) non-linear interactions for at least one combination of pitch × timbre, azimuth × pitch, or azimuth × timbre. We also investigated whether it would be possible to describe the nature of these non-linear interactions either as ‘predominantly expansive’ or facilitatory, or as ‘predominantly compressive’ or saturating. Expansive non-linearity would make the response of a neuron more selective for a particular stimulus parameter combination, while saturating non-linearities would have the opposite effect. Expansive non-linearities are supra-additive, and would result in positive interaction coefficients which grow systematically as main effect coefficients grow larger. Compressive, sub-additive non-linearities, would, by contrast, result in large, positive main effect coefficients being associated with negative interaction coefficients. We therefore compared the sums of main effect coefficients (the ‘predicted additive response’) with their corresponding interaction coefficients, but we found no significant systematic trends or relationships. 
The interaction effects thus appear to be too variable to be characterized as predominantly expansive or predominantly compressive.
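To make the notion of an interaction coefficient concrete, the following sketch decomposes a table of mean responses into a grand mean, two sets of main effects and the residual interaction terms. It is a simplified illustration: it collapses the response over time bins, whereas the ANOVA described above includes time bin as a factor, and the function name and example values are hypothetical.

```python
def additive_decomposition(cell_means):
    """Split a two-factor table of mean responses into additive parts.

    cell_means[i][j] is the mean response for level i of one stimulus
    parameter (e.g. timbre) and level j of another (e.g. azimuth).
    Returns (grand_mean, row_effects, col_effects, interaction), where
    interaction[i][j] is the deviation of the observed cell mean from the
    purely additive prediction grand_mean + row_effects[i] + col_effects[j].
    """
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in cell_means]
    col_eff = [
        sum(cell_means[i][j] for i in range(n_rows)) / n_rows - grand
        for j in range(n_cols)
    ]
    interaction = [
        [cell_means[i][j] - (grand + row_eff[i] + col_eff[j])
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return grand, row_eff, col_eff, interaction


# Hypothetical unit that responds strongly only to one specific combination.
grand, row_eff, col_eff, interaction = additive_decomposition([[1, 1], [1, 5]])
# Here interaction[1][1] is positive while the main effects for that cell are
# also positive, the pattern expected of an expansive (supra-additive) unit.
```

For a purely additive unit, such as one with cell means `[[1, 2], [3, 4]]`, every interaction term is zero. Comparing `row_eff[i] + col_eff[j]` with `interaction[i][j]` across cells corresponds to the comparison of the predicted additive response with the interaction coefficients described above.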
The distribution across the cortical surface of sensitivity to combinations of stimulus parameters is shown in Figure 4A-C. Non-linear interactions were most commonly observed in A1 and AAF, where 54% and 56% of units exhibited significant interactions between two or more stimuli, respectively. The most common interactions were between pitch and timbre, although azimuth × timbre sensitivity and azimuth × pitch sensitivity were also observed (Fig. 4D). The proportion of units exhibiting significant interactions fell to 36%, 30% and 15% in fields PPF, PSF and ADF, respectively. While the number of units showing interactions varied between cortical fields, the overall magnitude of the interaction term did not vary between cortical areas for either the azimuth × pitch or azimuth × timbre conditions. This is shown in Figure 4E by plotting the proportion of variance explained by the sum of the spatial and non-spatial interaction terms (azimuth × pitch and azimuth × timbre) for each of the five cortical areas (Kruskal-Wallis test, χ2 = 5.5, p = 0.24). In contrast, the proportion of response variance explained by the pitch × timbre interaction term did show a significant variation across cortical fields (Fig. 4F, Kruskal-Wallis test, χ2 = 48.2, p <0.001), with combination sensitivity for non-spatial parameters accounting for more of the variance in the primary areas A1 and AAF than in the posterior fields PSF and PPF, which in turn, had higher values than ADF. This distribution is very similar to that observed for the timbre main effects.
Both the number and magnitude of the pitch × timbre interactions were greater than for either the timbre × azimuth or pitch × azimuth interaction terms, supporting a separation of spatial and non-spatial attributes throughout auditory cortex. However, this separation is far from complete: of the 254 units in which there was a significant interaction, 225 were sensitive to pitch-timbre combinations and 60 of these (i.e. 26%) were also sensitive to pitch-azimuth or timbre-azimuth combinations. In summary, while interactions in the “what” domain were more common, a substantial minority of units were sensitive to combinations of both spatial and non-spatial stimulus features.
Our multi-site recording electrodes allowed us to make simultaneous recordings at 8-16 different depths throughout the cortex. We grouped recordings coarsely into ‘superficial’ or ‘deep’ according to whether the recording site was within 800 μm of the cortical surface or more than 800 μm below it. This division roughly separates units in the supra-granular layers from those in the infra-granular layers. The distributions of the proportion of variance explained are plotted in Figure 5. Azimuth sensitivity was found to be greatest in the deeper cortical layers (two-sample t-test, p = 0.01), whereas pitch and timbre sensitivity were greater in the superficial layers (p = 0.004 and p <0.001, respectively). Pitch-timbre interactions were also more common in the superficial layers (p = 0.002), while pitch-azimuth (p = 0.33) and timbre-azimuth (p = 0.35) interactions were found to be equally distributed in depth. Data were pooled across all cortical areas for these analyses, but similar trends were observed in each of the five cortical fields individually.
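The depth comparison can be sketched as follows, assuming each unit is a hypothetical `(depth_um, variance_explained)` pair and that the two-sample t-test uses a pooled variance estimate (the text does not state which variant was used). Only the t statistic is computed here; the names are illustrative.

```python
from statistics import mean, variance

DEPTH_BOUNDARY_UM = 800  # boundary between 'superficial' and 'deep' sites


def split_by_depth(units):
    """Split (depth_um, value) pairs into superficial and deep value lists."""
    superficial = [v for depth, v in units if depth <= DEPTH_BOUNDARY_UM]
    deep = [v for depth, v in units if depth > DEPTH_BOUNDARY_UM]
    return superficial, deep


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance (assumed test variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5
```

A negative statistic from `pooled_t_statistic(superficial, deep)` indicates that the deep group has the larger mean, as reported above for azimuth sensitivity; the p-value would follow from the t distribution with `len(a) + len(b) - 2` degrees of freedom.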
In Figure 3E we showed that clusters of units with high timbre sensitivity were commonly found in the low CF border region between A1, PPF and PSF. This raises the question of whether sensitivity to timbre, or indeed to other stimulus parameters, varies systematically with unit CF. However, scatter plots of unit CF against the proportion of variance explained (supplemental Fig. 2A-C) showed no systematic relationship between unit CF and azimuth, pitch or timbre sensitivity. Furthermore, pitch, timbre or azimuth sensitivity was just as common among units that were unresponsive or untuned to pure tones as in units with clearly defined CFs (supplemental Fig. 2D-F), and there were no CFs at which it was particularly easy or difficult to obtain vowel responses (supplemental Fig. 1G). We also found no significant correlations between CF and best pitch (defined as the pitch that elicited the most spikes per presentation) for pitch-sensitive units with a CF below 1 kHz.
It has been proposed that a division of labor exists across auditory cortical areas whereby the ability of humans and animals to recognize or localize auditory objects can be attributed to anatomically-separate processing streams. This concept is inspired by earlier studies that postulated distinct hierarchies for processing different visual features, such as color or motion, in extrastriate visual cortex (Ungerleider and Haxby, 1994). Parallel processing in the auditory system is supported by behavioral-deactivation (Lomber and Malhotra, 2008), functional imaging (Alain et al., 2001; Maeder et al., 2001; Warren and Griffiths, 2003; Barrett and Hall, 2006) and electrophysiological studies (Recanzone, 2000; Tian et al., 2001; Cohen et al., 2004), as well as by anatomical evidence showing differences in the connectivity of these regions (Hackett et al., 1999; Romanski et al., 2000; Bizley et al., 2007). Most of the physiological studies compared the sensitivity of different auditory cortical areas to only one stimulus parameter, such as spatial location or pitch. Here we adopted a different approach, based around a stimulus set in which several stimulus dimensions were systematically varied, to explore the relative sensitivity of neurons in different cortical fields to azimuth, pitch and timbre.
Because we examined neural responses to three stimulus attributes, the range of values selected in each case was necessarily limited, but nonetheless covered behaviorally pertinent and broadly comparable ranges. Ferrets can accurately localize low-frequency narrowband noise bursts (Kacelnik et al., 2006) and lateralize the same synthetic vowels that were used in the present experiment. While the range of azimuths used covered only the frontal quadrant of auditory space, behavioral and electrophysiological studies have shown particularly high sensitivities within this region to changes in sound-source direction (Mrsic-Flogel et al., 2005; Nodal et al., 2008). The separation of our stimuli in both azimuth (Parsons et al., 1999) and pitch (Walker et al. 2009, in press) was large compared to psychoacoustic difference limens. There have been few studies of timbre discrimination in animals, but chinchillas can discriminate the vowels /i/ and /a/ across a range of speakers and pitches (Burdick and Miller, 1975), and our own data demonstrate that ferrets can accurately discriminate the vowel timbres used in this study (Bizley JK, Walker KM, King AJ, Schnupp JWH, unpublished observation).
The bulk of behavioral and neurophysiological studies of auditory cortex have been carried out in cats. A1 is well conserved across species, and AAF appears to be equivalent in cat and ferret (Kowalski et al., 1995; Imaizumi et al., 2004; Bizley et al., 2005). The posterior fields, PPF and PSF, in the ferret share certain similarities with the posterior auditory field (PAF) in the cat, both being tonotopically organized and containing neurons with different temporal response properties from those in the primary fields (Phillips and Orman, 1984; Stecker et al., 2003; Bizley et al., 2005). Like the cat's secondary auditory field (Schreiner and Cynader, 1984), ferret ADF lacks tonotopic organization and contains neurons with broad frequency response areas (Bizley et al., 2005). Although strict homologies have yet to be established, auditory cortex appears to be organized in a similar fashion in these species.
We found that, as in cats (Harrington et al., 2008), sensitivity to changes in sound azimuth varies across auditory cortex, but is nonetheless a property of all areas examined. This contrasts with recent behavioral data in cats (Lomber et al., 2007), which suggest that certain cortical fields, such as A1 and PAF, are required for normal sound localization, whereas others, such as AAF, are not.
However, complex behaviors, like remembering to approach a sound source for a reward, are bound to require cognitive control from high-order cortical areas, most likely in frontal cortex. The profound differences observed in response to cooling different cortical fields may therefore have less to do with the physiological properties of neurons in those areas than with their projections to higher-order brain regions (Hackett et al., 1998; Romanski et al., 1999).
Our findings are also in broad agreement with many previous studies of cortical pitch sensitivity. Our ‘pitch sensitivity’ test was less stringent than that used by Bendor and Wang (2005), who reported a pitch-selective area in marmoset auditory cortex. However, in agreement with their observations, we found pitch-sensitive neurons to be more common in the superficial cortical layers and to occur frequently (although not exclusively) near the low-frequency border of A1. Nevertheless, overall, we did not find a correlation between unit CF and pitch sensitivity or a tendency for low CF neurons to be more sensitive to pitch. The lack of a single pitch center in ferret auditory cortex is supported by the results of imaging studies in ferrets (Nelken et al., 2008) and humans (Hall and Plack, 2008).
The neural basis for timbre processing has been much less widely studied. Previous studies have demonstrated that the response properties of neurons in A1 are well suited to detect spectral envelope cues (Calhoun and Schreiner, 1998; Versnel and Shamma, 1998). Moreover, the spectral integration properties of A1 neurons have been reported to be topographically organized in A1 in a manner that might support vowel discrimination based on the frequency relationship of the first and second formants (Ohl and Scheich, 1997). Consistent with these studies, we found that sensitivity to vowel timbre was greatest in the primary auditory cortical areas of the ferret. Nevertheless, as with azimuth and pitch sensitivity, the responses of neurons recorded in all five cortical fields were modulated by timbre. Lesions of the dorsal/rostral auditory association cortex, but not A1, in rats impaired performance on a multi-formant vowel discrimination task (Kudoh et al., 2006). Using fMRI, timbre sensitivity has been demonstrated in both posterior Heschl's gyrus and the superior temporal sulcus in humans (Menon et al., 2002; Kumar et al., 2007).
Previous studies have reported systematic differences in neural tuning properties along isofrequency laminae in A1. Changes in the representation of properties such as tone threshold and tuning bandwidth (Schreiner and Mendelson, 1990; Cheung et al., 2001; Read et al., 2001) and binaural response characteristics (Middlebrooks et al., 1980; Rutkowski et al., 2000) have been observed. However, the sampling density of recording sites in individual animals was insufficiently fine to investigate whether this was the case for any of the parameters investigated in the present study.
Our analysis revealed that the neural encoding of pitch, location and timbre cues is interwoven and distributed across auditory cortex. These methods were sensitive to changes in neural firing that occurred over time and were able to capture the effects apparent in the raw data in a way that a simple spike-count measure failed to. Previous studies have also shown that the timing of spikes in auditory cortex carries information useful for discriminating natural sounds (Schnupp et al., 2006; Gourevitch and Eggermont, 2007), as well as about a sound's pitch (Steinschneider et al., 1998) and location (Furukawa and Middlebrooks, 2002; Nelken et al., 2005).
Our stimuli spanned three independent perceptual dimensions, but very few neurons in any of the cortical fields examined were sensitive to changes in azimuth, pitch or timbre only. Nevertheless, small but significant differences in average neuronal sensitivity were observed across cortical areas and depths. These subtle regional differences could provide the basis for the subsequent anatomical segregation of spatial and non-spatial information in higher-order cortical areas. While it is possible that cortical areas other than those sampled here exhibit greater functional specialization, or that a clearer distinction might be apparent with other stimulus features, such as those that are temporally modulated, it is important to remember that spatial and non-spatial aspects of sounds often have to be considered together. For instance, in order to operate effectively in the presence of multiple sound sources, it is necessary to be able to track specific pitch, timbre and sound-source location combinations over time. Spatial sensitivity has been reported within auditory cortical and prefrontal areas thought to be concerned with sound identification (Cohen et al., 2004; Gifford and Cohen, 2005; Lewald et al., 2008). Moreover, a recent study (Recanzone, 2008) documenting sensitivity to monkey calls found that neurons throughout auditory cortex were equally selective in their responses. Interactions between spatial and non-spatial processing streams are known to occur in the visual cortex (Tolias et al., 2005). Such effects are likely to be particularly important in audition, where multiple sounds can be perceived simultaneously at several locations.
We found that cortical neurons often responded non-linearly to feature combinations. This was particularly the case for pitch and timbre in the primary fields, A1 and AAF. This apparent combination sensitivity could simply reflect intermixing of relatively low level sensitivity to multiple sound features. However, it has been argued that A1 might represent auditory objects by grouping together physical stimulus attributes from a common source, with higher-order cortical areas extracting perceptual features, such as the object's location, from this object-based representation (Nelken and Bar Yosef, 2008). Grouping stimulus attributes is essential for tracking a sound source through a potentially cluttered acoustic environment (Bregman, 1990), and the non-linear sensitivity that we observed in A1 may be ideally suited to achieving this. We observed a decrease in non-linear feature interactions away from A1, suggesting an increasingly independent representation of these perceptual dimensions in higher auditory cortex. Although still sensitive to different sound features, this sensitivity was well described by linear interactions, which help to preserve information for subsequent processing.
Ultimately, the question of interest is how this distributed network of neurons contributes to perception. In humans, selective attention has been shown to modulate putative localization and identification pathways independently (Ahveninen et al., 2006). The true extent to which a division of labor exists within auditory cortex may therefore become apparent only when animals use the activity in these areas to listen to different attributes of sound.
This work was supported by the Biotechnology and Biological Sciences Research Council (grants BB/D009758/1 to J.W.H. Schnupp, A.J. King and J.K. Bizley), the Engineering and Physical Sciences Research Council (grant EP/C010841/1 to J.W.H. Schnupp), a Rothermere Fellowship and Hector Pilling Scholarship to K.M.M. Walker, and by a Wellcome Trust Principal Research Fellowship to A.J. King. We are grateful to Israel Nelken for valuable discussion and comments on the manuscript.