Conceived and designed the experiments: KI NJP SWC CES. Performed the experiments: KI SWC CES. Analyzed the data: KI NJP TOS CES. Contributed reagents/materials/analysis tools: TOS. Wrote the paper: KI NJP TOS SWC CES.
A central goal in auditory neuroscience is to understand the neural coding of species-specific communication and human speech sounds. Low-rate repetitive sounds are elemental features of communication sounds, and core auditory cortical regions have been implicated in processing these information-bearing elements. Repetitive sounds could be encoded by at least three neural response properties: 1) the event-locked spike-timing precision, 2) the mean firing rate, and 3) the interspike interval (ISI). To determine how well these response aspects capture information about the repetition rate stimulus, we measured local group responses of cortical neurons in cat anterior auditory field (AAF) to click trains and calculated their mutual information based on these different codes. ISIs of the multiunit responses carried substantially higher information about low repetition rates than either spike-timing precision or firing rate. Combining firing rate and ISI codes was synergistic and captured modestly more repetition information. Spatial distribution analyses showed distinct local clustering properties for each encoding scheme for repetition information indicative of a place code. Diversity in local processing emphasis and distribution of different repetition rate codes across AAF may give rise to concurrent feed-forward processing streams that contribute differently to higher-order sound analysis.
An ultimate goal in auditory neuroscience is to understand the neural coding of species-specific communication and human speech sounds, but the complexity of such sounds renders this challenge difficult. A common approach is to reduce intractable experimental questions to tractable ones by studying key coding features using parametric techniques. Periodic amplitude modulations are ubiquitous temporal features of species-specific communication and human speech sounds , . The modulation envelope of vocalization and speech (e.g., phonemes) is dominated by low repetition rates (<40 Hz) – and most cortical neurons limit their timing-locked responses to that modulation range . Speech and vocalization decoding depends strongly on the integrity of the low rate repetition modulation envelope –. Lesion studies in monkeys and humans have suggested that auditory cortex (AC) is necessary to process communication or speech sounds , . It has been proposed that precise spike timing may code slow repetition sounds, while firing rate (FR) may code faster repetition sounds in AC , – but see Ref. . A recent study in marmoset monkeys proposed that FR may code a particular range of slow to medium repetition rates (~10 to 45 Hz) in the anterior field of AC . A growing number of studies suggest that interspike interval (ISI) profiles are a viable neural code for temporal processing –. However, ISI analysis of AC response patterns is not yet well advanced. A particular issue is that spike-timing precision and FR are not completely independent measures. Both bear on the potential efficacy of an interval code. We investigated stimulus-related neural information of spike-timing precision, FR, and ISIs for coding slow repetition rates and their topographic organization by high-resolution multi-unit mapping of a primary auditory field in the ketamine-anesthetized cat. 
This approach should be able to clarify the roles of timing and place codes in conveying information about low stimulus repetition rates.
Temporal information by spike timing and FR often appears to be spatially distributed in AC , , . Organized spatial distributions (‘maps’) of these properties may provide an opportunity to explore how temporal information is represented by a population of cortical neurons . In the cat, two tonotopic fields comprise the primary core areas at a hierarchically equivalent level, primary AC (AI) and anterior auditory field (AAF) , . They receive largely independent, concurrent inputs from the different thalamic divisions ,  resulting in different distributions of spectral receptive field parameters , . Behavioral experiments with reversible cryoloop lesions suggest that cat AAF contributes to temporal-pattern discrimination  but is not involved in other functional tasks, such as sound localization . This supports the notion that AAF is part of a stimulus identification or ‘what’ pathway .
Time-locking in AAF has been shown in several species to cover a wider frequency range than in other cortical fields , , , although the range is still dominated by modulation rates <~40 Hz. This provides a comparatively wide repetition rate range to compare properties of phase locking, FR, and interval encoding of temporal information. Click trains are used to explore the encoding of repetitive stimuli in AAF. In contrast to sinusoidally amplitude-modulated signals , , , changes in click train repetition rates are not confounded by changes in stimulus rise times . Here, we investigate different neural encoding schemes of slow repetition rate sounds and their spatially arranged expressions of stimulus-related mutual information.
To understand neural coding of slow repetitive sounds in AC, we obtained repetition rate transfer functions (RRTFs) to quantify responses to click trains. A population code is assumed and no distinction is made between local multi-unit and single-unit responses. We employed a high-resolution cortical mapping technique with extracellular recordings ,  and reconstructed spatial organization via Voronoi-Dirichlet tessellation maps.
RRTFs were examined for 276 multi-unit recordings in cat AAF of three hemispheres (two left and one right). AAF is located anterior to AI and usually flanked by suprasylvian and anterior ectosylvian sulci , . There was no clear evidence of a temporal coding difference between left and right hemispheres and they were treated equally in the population analyses.
For RRTFs, two different measures have been used to describe temporal tuning. Spike-timing precision is expressed as vector strength (VS) measuring how well spikes are synchronized to the click stimulus relative to the duration of the repetition period (see Materials and Methods). VS values range from zero (spikes evenly distributed throughout the stimulus period) to one (spikes are perfectly aligned to a particular phase of the stimulus period). The other measure is average FR magnitude. Stimuli with low- and high-repetition rates may be coded differently by VS and FR –. Multi-unit examples of post-stimulus time histograms (PSTHs) for clicks at different repetition rates (1–38 Hz for many recording sites but up to 250 Hz presented 15 times; see Materials and Methods) reveal different response behaviors for VS and FR measures (Figs. 1A, S1A, S1B). Phase locking to the stimuli is expressed to varying degrees, with maximal values predominantly seen at low repetition rates. The corresponding RRTFs for the two measures, VS (magenta lines) and FR (cyan lines), show band-pass behavior (Fig. 1B). For a majority of recording sites, VS peaked at low repetition rates and declined with increasing (mid to high) repetition rates (spikes losing synchronization to the stimulus). Similar behavior was seen for FR (Fig. 1B). However, for other sites, FR often peaked at higher repetition rates than VS (Fig. S1C, S1D). On average, FR peaked at 29.2±22.2 Hz (hereafter, expressed as mean ± standard deviation in the text), which was significantly higher than the peak for VS (12.8±8.1 Hz) (paired t-test; p<0.0001). This study was focused on encoding of low repetition rates (1, 6, 10, 14, 22, and 30 Hz), a range associated with the occurrence frequency of vocalization phrases, phonemes, or syllables (gray background in Figs. 1B, S1C, S1D) and with a high probability of encountering high temporal response fidelity.
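The VS measure can be illustrated with a minimal sketch based on the standard definition of vector strength (the length of the mean resultant vector of spike phases). The function name, the use of seconds for spike times, and the toy spike trains are assumptions made for this example; the study's own computation is the one given in its Materials and Methods.

```python
import math

def vector_strength(spike_times_s, rate_hz):
    """Length of the mean resultant vector of spike phases (0 to 1)."""
    if not spike_times_s:
        return 0.0
    # Phase of each spike relative to the stimulus period, in radians.
    phases = [2.0 * math.pi * rate_hz * t for t in spike_times_s]
    cos_sum = sum(math.cos(p) for p in phases)
    sin_sum = sum(math.sin(p) for p in phases)
    return math.hypot(cos_sum, sin_sum) / len(spike_times_s)

# Hypothetical 10 Hz click train: one spike 12 ms after each of 20 clicks.
locked = [0.1 * k + 0.012 for k in range(20)]
# Hypothetical unsynchronized train: phase advances by 1/20 cycle per spike,
# so the 20 spikes tile the stimulus period evenly.
spread = [0.105 * k for k in range(20)]

vs_locked = vector_strength(locked, 10.0)  # close to 1
vs_spread = vector_strength(spread, 10.0)  # close to 0
```

In this toy case the phase-locked train gives a value near one and the evenly spread train a value near zero, matching the two ends of the VS range described in the text.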
Unlike VS and FR, ISI behavior has not been extensively studied in AC. Recent work, however, demonstrated that ISIs can present a sensible neural code for temporal processing , . The ISI distribution for a cortical recording site in response to a range of low repetition rates is illustrated in Figure 1C. Some recording sites (<4%) only showed ISIs at short intervals of 1 to 3 ms, compatible with bursting, but not at the intervals corresponding to the period of the presented repetition rate (Fig. S2). More commonly, recording sites expressed ISI peaks corresponding to integer multiples of stimulus intervals (stimulus phase-locked spikes) (Fig. 1C). Unlike VS and FR, ISI is not directly characterized by a single value. The coefficient of variation (CV), an estimate of ISI variability computed by dividing the ISI standard deviation by the mean, is the main descriptor of the ISI distribution. Figure 1D illustrates population histograms of CV of ISIs for six different repetition rates for all recording sites in one hemisphere. The CV distributions of ISIs were quite similar across repetition rates. However, medium repetition rates differed slightly from 1 and 30 Hz (p<0.05; Fisher's protected least significant difference for multiple t-statistics), indicating higher ISI fidelity for that range. The CV of ISIs spanned from ~1 to >3, which was higher than values derived from visual cortex (<~1) , , although differences in the auditory and visual stimulus paradigms make a direct comparison of the values difficult.
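The CV of ISIs described above (standard deviation of the intervals divided by their mean) can be sketched as follows. The function name, the choice of the population standard deviation, and the toy spike trains are assumptions for illustration only; the text does not state which standard deviation estimator was used.

```python
import statistics

def isi_cv(spike_times_ms):
    """Coefficient of variation of interspike intervals (SD / mean)."""
    times = sorted(spike_times_ms)
    isis = [b - a for a, b in zip(times, times[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes to estimate ISI variability")
    # Assumption: population SD; a sample SD would give slightly larger values.
    return statistics.pstdev(isis) / statistics.fmean(isis)

# Hypothetical site firing a spike pair (2 ms apart) on every click at 10 Hz:
# intervals alternate between 2 ms and 98 ms.
paired = [0, 2, 100, 102, 200, 202, 300]
# Hypothetical site firing a short burst followed by one long pause.
bursty = [0, 1, 2, 3, 4, 5, 100]

cv_paired = isi_cv(paired)  # 48 / 50 = 0.96
cv_bursty = isi_cv(bursty)  # greater than 1
```

The bursty toy train illustrates how a mixture of very short and long intervals, as expected in multi-unit recordings, pushes the CV above one.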
The three response measures carried different amounts of information about the temporal stimulus properties. To quantify the information content, i.e., estimating the ability to discriminate between different repetition rates based on their cortical response, we considered the unit-basis mutual information (MI), as read out by an ideal observer (see Materials and Methods).
The MI for ISI was calculated for two conditions: ISI (1 ms) was based on all intervals ≥1 ms, whereas ISI (10 ms) was based only on intervals ≥10 ms, more closely matching the interval range contained in the presented repetition range. The different repetition rates can be distinguished only for ISI values ≥10 ms (cf. Fig. 1C). Shorter ISIs (<3 ms) occurred most often, but they did not help to distinguish between the presented repetition rates. For the example site in Figure 1C, when we included only ISI values ≥10 ms, we obtained an information value of ~2 bits, out of a maximum of ~2.58 bits (=log2(6)) for comparing six stimuli. This value is consistent with the visual inspection of Figure 1C, where four of six repetition rates (10 Hz, 14 Hz, 22 Hz, and 30 Hz) can be reliably distinguished. At an ISI resolution of 1 ms, the information for the example is somewhat reduced (~1.5 bits/stimulus).
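To make the log2(6) ceiling concrete, the following sketch computes a plug-in (histogram-based) estimate of mutual information from paired stimulus and response labels. This is a generic estimator chosen for illustration, not the ideal-observer procedure of the study, and it applies no bias correction; the function name, response labels, and the count of 15 presentations per rate in the toy data are assumptions.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Plug-in mutual information (bits) from (stimulus, response) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    stim_counts = Counter(s for s, _ in pairs)
    resp_counts = Counter(r for _, r in pairs)
    mi = 0.0
    for (s, r), c in joint.items():
        # p(s,r) * log2( p(s,r) / (p(s) * p(r)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (stim_counts[s] * resp_counts[r]))
    return mi

rates_hz = [1, 6, 10, 14, 22, 30]

# Idealized case 1: every rate evokes its own unique response label.
perfect = [(r, r) for r in rates_hz for _ in range(15)]
# Idealized case 2: 1 Hz and 6 Hz evoke the same response ("low");
# the other four rates remain perfectly distinguishable.
confused = [(r, "low" if r in (1, 6) else r) for r in rates_hz for _ in range(15)]

mi_perfect = mutual_information_bits(perfect)    # log2(6), about 2.585 bits
mi_confused = mutual_information_bits(confused)  # about 2.25 bits
```

The second case is an idealization in which four of six rates are discriminated without error; real responses with residual confusion among those four would yield lower values.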
MI contained in ISI (hereafter, ISI information) was significantly higher than that contained in either VS or FR (hereafter, VS and FR information, respectively) (p<0.001; paired t tests adjusted by the sequential Bonferroni correction for multiple comparisons; Fig. 2A). Across all hemispheres, ISI (1ms) information averaged to 0.63±0.40 bits/stimulus compared to 0.14±0.12 for VS information and 0.18±0.18 for FR information. ISI (10ms) information was almost twice as high (1.18±0.54), reflecting the reduction of short-interval noise. ISI (10ms) information was highly correlated with ISI (1ms) information (r2=0.66; p<0.001).
To assess whether combining the different encoding schemes can capture an increased amount of repetition information over each individual scheme, we tested combinations of encoding pairs for all recording sites that resulted in a significant amount of information for the joint schemes (Fig. 2B). Analysis of a combination including all three schemes failed due to an insufficient number of appropriate sites.
Joint information estimates equal to the sum of information by the individual schemes would indicate non-redundancy of the contributing information. Combining VS and FR information (Fig. 2B, white bars) was almost equal to the sum of information by both individual schemes (Fig. 2B, black/gray bars), suggesting that each carried non-redundant information. Furthermore, combining FR and ISI (1ms or 10ms) codes resulted in an increase of information beyond the linear sum for the individual schemes. This behavior is indicative of non-redundant contributions of each scheme for encoding repetition information with a cooperative, synergistic component for the FR and ISI combination. By contrast, VS combined with ISI information resulted in lower information than the sum, indicating that both schemes conveyed redundant information (Fig. 2B).
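The comparison between joint information and the sum of the individual estimates can be sketched with a toy example. The plug-in estimator, the function names, and the binary toy codes below are assumptions for illustration; they are not the study's data or its estimation procedure.

```python
import math
from collections import Counter
from itertools import product

def mi_bits(pairs):
    """Plug-in mutual information (bits) from (stimulus, response) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    stim_counts = Counter(s for s, _ in pairs)
    resp_counts = Counter(r for _, r in pairs)
    return sum((c / n) * math.log2(c * n / (stim_counts[s] * resp_counts[r]))
               for (s, r), c in joint.items())

def joint_minus_sum(stim, code_a, code_b):
    """Joint information minus the sum of the two single-code estimates.

    Positive: synergy. Near zero: non-redundant. Negative: redundancy.
    """
    i_a = mi_bits(list(zip(stim, code_a)))
    i_b = mi_bits(list(zip(stim, code_b)))
    i_ab = mi_bits(list(zip(stim, zip(code_a, code_b))))
    return i_ab - (i_a + i_b)

# Toy synergistic case: the stimulus is the XOR of two binary codes, so
# neither code is informative alone but together they identify the stimulus.
a, b = zip(*product([0, 1], repeat=2))
s_xor = [x ^ y for x, y in zip(a, b)]
synergy = joint_minus_sum(s_xor, a, b)        # +1 bit

# Toy redundant case: both codes simply copy the stimulus.
s_copy = [0, 1, 0, 1]
redundancy = joint_minus_sum(s_copy, s_copy, s_copy)  # -1 bit
```

Under this convention the FR and ISI combination reported above would correspond to a positive difference and the VS and ISI combination to a negative one.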
ISI information correlated with the magnitude of all three response measures, VS, FR, and CV of ISIs (Fig. 3). The rest of the analysis will focus on the ISI information at 1 ms resolution since it does not assume a priori knowledge of the stimulus periodicity range. The maximum value evoked by any of the tested repetition rates was used to represent overall spike-timing precision (VS max) and FR magnitude (FR max) for each recording site. ISI precision is represented by the minimum coefficient of variation (CV min) of ISI. The strongest correlation existed between FR max and ISI information: the lower the FR, the higher the ISI information (Fig. 3B). CV min was negatively correlated with ISI information: the lower the ISI variability, the higher the ISI information (Fig. 3C). VS max was weakly positively correlated with ISI information (Fig. 3A). All three measures contributed to ISI information, indicating that temporal coding is not dominated by a single response aspect. VS information was weakly but significantly correlated with VS max: sites with high VS max values carried more VS information (Fig. 3D). By contrast, FR magnitude was not significantly correlated with FR information (data not shown).
Although the higher ISI information relative to VS and FR information suggested that discrimination between low repetition rates may be ISI dominated in cat AAF, the other forms of stimulus encoding still may be useful, especially if the three codes are either independently distributed or provide non-redundant information as demonstrated above for the joint information. Principal component analysis jointly applied to the three information estimates, the underlying response measures (VS max, FR max and CV min), as well as four additional receptive field parameters (characteristic frequency (CF), Q40, threshold and minimum latency, see Materials and Methods) revealed three orthogonal components of temporal processing (Table 1). As expected from the high ISI information values, the strongest factor, representing 28.6% of the variance across all three hemispheres, captured ISI information as well as the covariants of FR max and CV min. The second factor (17.0% of variance) was aligned with VS information and VS magnitude. The fifth factor (9.0% of variance) was dominated by repetition rate information carried by FR. This analysis indicates that the three carriers of repetition rate information fall along orthogonal axes. However, it does not imply that they are completely independent from each other, as already shown by the joint information analysis. It does indicate that these schemes capture largely uncorrelated, non-redundant aspects of the repetition rates. The temporal factors were not correlated with the two factors comprising the spectral parameters, CF and Q40 (F3; 12.9%) and threshold and response latency (F4; 10.0%) (Table 1; see Ref. ). The temporal encoding schemes captured by the three orthogonal temporal factors provide alternative, though not completely independent, means of extraction, representation, and transmission of low repetition rate information.
Spatial differentiation in cortical functional organization can provide insights into principles of local and global information processing. As a primary auditory field, AAF expresses a tonotopic gradient (Figs. 4A, S3A, S3B). Repetitive click train stimuli revealed distinct and non-homogenous spatial distribution patterns for the different temporal response measures embedded in the tonotopic map. Voronoi-Dirichlet tessellation maps (see Materials and Methods) of VS, FR, and CV of ISI for six different repetition rates are shown for spatially smoothed values (weighted least-squares linear regression model; Fig. 4B, 4C, 4D, for raw values, see Fig. S4A, S4B, S4C). The majority of sites had only moderate VS values to low repetition stimuli (blue to green polygons in Fig. 4B). Distinct neuron clusters with high VS (yellow to red polygons) emerged in restricted tonotopic regions (Fig. 4B). These clusters showed persistent and precise spike timing over a fairly wide range of repetition rates (6–22 Hz for Fig. 4B), suggesting the existence of spatially restricted cortical networks with high temporal population fidelity interleaved with regions of low temporal fidelity in the local neuronal population.
The spatial pattern of FR differed markedly from VS (Figs. 4C, S4B). Throughout the entire CF range, high FRs (red polygons) were seen for a range of repetition rates. However, sites with the highest FR substantially shifted with changes in repetition rate. This effect was less apparent for VS. To illustrate these activation shifts more clearly, high FR loci in smoothed maps, categorized as >0.75 of normalized peak activity, are shown in red (gray: sites with <0.75 of normalized FR, Fig. 4C). Even small changes in repetition rate activated a different cluster of cortical sites, i.e., comparable with a place code for low repetition rates.
The CV of ISIs also showed spatial clustering (Figs. 4D, S4C) with distinct regions of high (blue polygons) and low (red polygons) ISI precision that shift spatially with repetition rate although less clearly than for FR. The spatial changes of FR with increasing repetition rate appeared to be more widespread than for either VS or CV of ISI. Unlike FR, the two temporal precision measures showed spatially restricted regions of either high or low values that appeared largely invariant with repetition rate changes.
Similarity analysis (spatial cross-correlation, see Materials and Methods) of the value distributions as a function of repetition rate difference (Fig. 5A, 5B, 5C) showed that similarity reduction is proportional to the repetition rate difference for all three measures. The steepest decline in FR similarity was seen for small linear rate differences (Fig. 5B) with no significant correlations (−0.15<r<0.15) remaining for rate differences above 10–12 Hz. For VS and CV of ISI, the decline is proportional to the logarithm of the rate difference and steepest for 0.5 to 1 octave repetition rate differences (Fig. 5A, 5C) with a loss of correlation (−0.15<r<0.15) for rate differences above 1.5 to 2 octaves. Large repetition rate differences (>20 Hz or >3.5 octaves, respectively) could show a weak anti-correlation between the spatial activation patterns. The logarithmic versus linear difference in the timing- and rate-based spatial pattern changes of click train responses suggests distinct divergence in the shapes of the corresponding RRTFs. This indicates differences among the encoding schemes and in what is captured about the repetitive sounds.
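One simple way to relate map similarity to rate difference is sketched below: for each pair of repetition rates, the site-by-site Pearson correlation between the two maps is tabulated against both the linear difference (Hz) and the logarithmic difference (octaves). This is an assumed simplification for illustration, not necessarily the spatial cross-correlation procedure of the Materials and Methods; the function names and the four-site toy maps are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def map_similarity(maps):
    """Rows of (rate difference in Hz, difference in octaves, correlation).

    `maps` is a dict from repetition rate (Hz) to a list of per-site values,
    with sites listed in the same order for every rate.
    """
    rows = []
    for r1, r2 in combinations(sorted(maps), 2):
        rows.append((r2 - r1, math.log2(r2 / r1), pearson(maps[r1], maps[r2])))
    return rows

# Hypothetical response values at four recording sites for three rates.
toy_maps = {
    1: [1.0, 2.0, 3.0, 4.0],
    6: [1.0, 2.0, 3.0, 5.0],
    30: [4.0, 3.0, 2.0, 1.0],
}
rows = map_similarity(toy_maps)
```

Plotting the third column against the first or second column would then show whether similarity falls off more regularly on a linear or a logarithmic rate axis.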
Assessment of spatial organization in cortical fields requires rigorous statistical testing. We applied two approaches to determine the presence of spatial clustering for ten temporal and three spectral measures determined for the three hemispheres (Table 2; for spectral and response latency measures, see Ref. ). Spatial analysis for the combined additive or multiplicative information analyses was not pursued due to the analysis-based decrease in the number of valid recording sites.
Spatial autocorrelation, validated with a Monte-Carlo analysis of randomized value assignments, was used to determine global trends of spatial organization (see Materials and Methods). Significant global organization was found for all three spectral parameters (CF, Q40 and response threshold; Table 2). Temporal response or information measures showed less reliable tendencies of global organization. Two hemispheres (111L and 073L) showed significant global organization for response latency and the temporal factor F2 (capturing VS information) and two hemispheres (111L and 073R) showed global spatial organization for FR information and F5 (also capturing FR information). No significant global organization was observed for ISI information, VS max, CV min, and F1 (aligned with ISI information and CV min). Temporal response parameters appeared to be less globally organized than spectral parameters.
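A global autocorrelation test of this kind can be sketched with Geary's C and a permutation (Monte-Carlo) test in which site values are randomly reassigned to map positions. The binary neighbor weights, the one-sided p-value, the number of permutations, and the toy one-dimensional "map" are assumptions for illustration; the study's weighting and test details are those in its Materials and Methods.

```python
import random

def gearys_c(values, edges):
    """Geary's C for site values and undirected neighbor pairs (i, j).

    Values below 1 indicate that neighbors are more similar than expected.
    """
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    # Each undirected edge counts twice (w_ij = w_ji = 1).
    num = 2.0 * sum((values[i] - values[j]) ** 2 for i, j in edges)
    total_weight = 2.0 * len(edges)
    return (n - 1) * num / (2.0 * total_weight * denom)

def permutation_test(values, edges, n_perm=999, seed=0):
    """Observed C and one-sided p-value from random value reassignments."""
    rng = random.Random(seed)
    observed = gearys_c(values, edges)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gearys_c(shuffled, edges) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy map: ten sites along a line with a smooth gradient (e.g., a CF-like axis).
site_values = [float(k) for k in range(10)]
neighbor_pairs = [(k, k + 1) for k in range(9)]
c_obs, p_val = permutation_test(site_values, neighbor_pairs)
```

For a tessellation map, the neighbor pairs would be the polygons sharing a border, and a smooth gradient such as the one in this toy example yields a C well below one with a small p-value.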
A non-significant Geary's C (see Materials and Methods) does not necessarily indicate absence of any spatial organization since spatial heterogeneity within a field leads to the possibility that global spatial auto-correlation may miss local organizations. Therefore, we applied a local analysis that tested the value similarity of each polygon with its direct spatial neighbors. The local similarity measure was validated by Monte-Carlo analysis (see Materials and Methods). The spatial distribution of nine temporal parameters for one hemisphere (111L) is shown in Figure 6. Only three maps showed global organization expressed by significant Geary's C (Table 2): FR information, and temporal factors F2 and F5. In contrast, local spatial organization was found for all 9 maps (Table 2), i.e., a statistically significant proportion of polygons had sufficient numbers of neighboring sites with similar values (Fig. 6, polygons with significant parameter clustering or neighborhood similarity, p<0.05, are marked by white and black dots). Each significant polygon could be considered as the center of a local, functionally similar cluster of neurons. The average spatial extent of such clusters, given the sampling density in these maps, was approximately 200–400 µm. Out of 27 maps constructed for temporal click rate parameters (3 hemispheres, 9 parameters), 20 maps exhibited significant local functional clustering. F1, the factor associated with ISI information, showed local clustering for all three hemispheres (Figs. 6C, 7), although global organization did not reach significance for any of them (Table 2). Both local and global organization were evident for F2 in two hemispheres (111L and 073L) and for F5 in one hemisphere (111L). The average proportion of polygons with statistically significant neighborhood similarity for repetition rate response was generally <20% (Table 2).
By contrast, spectral parameters showed a much higher proportion of polygons with functionally similar neighbors (Table 2). For CF, ~90% of polygons had similar neighbors, and an average of ~33% polygons showed highly similar neighbors for Q40 and response threshold.
A close relationship existed between the measures of local and global organization. Additionally, a hierarchy of the extent of spatial organization emerged across all tested parameters. The average global and spatial indicators of spectral and temporal spatial organization were highly correlated (r2=0.92; y=1.72x−1.65; Fig. 8). The highest degree of spatial organization in AAF was for CF, followed, in descending order, by spectral integration (Q40), and response sensitivity (threshold). Response latency and F2 (VS information) showed the most spatial organization among temporal parameters, both locally and globally, although clearly less than for the spectral parameters. F1 (ISI information) and F5 (FR information) revealed the least spatial organization with F1 showing only local clustering and F5 only global spatial trends. The main conclusion from the spatial analysis is that every investigated parameter can show some form of spatial organization, although the degree of spatial order can vary from a highly organized pattern, with only shallow gradients extending over several millimeters of cortical space (e.g., CF), to lower order with scattered functional clusters of a few hundred µm diameter (e.g., temporal F1 and F5, see Figs. 6, 7).
Natural signals, in particular those used for communication, are characterized by low repetition rate or low frequency modulation. In this study, we demonstrated that the cortical neurons use multiple strategies to robustly process low repetition rates –. Low modulation rates dominate temporally encoded auditory cortical activity . Neural coding of low repetition rates develops during AC maturation ,  and in adulthood can be improved by behavioral training –, experience, and hormonal manipulation . Therefore, it is important to understand how low repetition rate sounds are encoded. Past studies focused on either VS or FR as largely alternative means for cortical encoding of slow repetitive sounds , –. Here we considered the contribution of ISIs to the encoding of low repetition rate sounds in AC.
The best click repetition rate for FR (~29 Hz) in this study is in the same range as in a previous study using amplitude modulated signals in the barbiturate-anesthetized cat AAF (27 Hz, see Ref. ). For VS, the current study (~13 Hz) shows lower values than the previous study (31 Hz, see Ref. ). These values in the current study are substantially higher than in a previous study using clicks in ketamine-anesthetized cat AAF . The differences between the latter two studies may lie in the sampling methods, because similar anesthetic regimens were used.
Distributions of RRTF filter types can provide insight into temporal coding strategies. Several studies have described RRTF or modulation transfer function filter types , , , ,  with the majority of recording sites in cat AAF (70–90%) revealing band-pass filtering properties for VS and FR (see Ref. ). Despite differences in model systems (cat, old and new world monkeys), auditory cortical fields (AI, the anterior field, and others), recording conditions (anesthetized and awake), cortical layers (granular and supragranular layers), stimuli (click trains and amplitude modulated sounds), and classification criteria, the filter type distributions in the majority of those studies are in general agreement with the current study (70–80% were of band-pass type, data not shown).
Different RRTF filter shapes combined with different best repetition rate estimates may be suitable to code a wide range of repetition rates by utilizing temporal and/or rate coding strategies at the single neuron level , , , . However, spike-timing precision and FR are not the only spike-train parameters that can reflect the nature of the stimulus. For visual and somatosensory cortical neurons, it has been shown that they are well equipped to decode stimulus-related information on the basis of relative spike timing and ISI duration , . Our information analysis of auditory cortical neurons also demonstrates an advantage of interval timing over VS and FR in encoding and decoding of low stimulus repetition rates.
All three response measures (spike-time precision (VS), average FR, and ISI) provide information about the presented repetition rates. The amount of ISI information significantly exceeds that of either VS or FR alone. The highest ISI information values encountered here (Fig. 3) approach the theoretical value of ~2.58 bits/stimulus (for consideration of discrimination between 6 repetition rates). However, the average ISI information remains clearly below the maximal value partially due to the use of multi-unit responses with overlapping responses from several neurons and noise contributions (see also Ref. ) as indicated by an information analysis restricted to intervals ≥10 ms. It is clear, however, that, for the average site, a substantial amount of information about the stimulus is being conveyed by other means.
While VS and FR provide fairly little information on their own, both parameters contribute to the CV of ISI and, consequently, to the ISI information content. VS max is positively correlated with ISI information. A high absolute spike-timing precision is advantageous for precisely encoding stimulus-based relative interval durations. FR magnitude is negatively and exponentially correlated with the ISI information. This is not unexpected since higher FRs will result in a higher probability of shorter ISIs that are independent of the stimulus-driven interval statistics, especially in cases of multi-unit recordings as employed here. Such an inverse relationship between FR magnitude and the amount of MI is also found in single neurons of the cat mid-brain  and in visual neural transformation from retina to thalamus in the macaque monkey . In the former study, neurons with small FR magnitude also showed high information content per spike and high feature selectivity. Such feature selectivity by a small number of spikes (sparse coding) is found for odor coding in the insect mushroom body  and mammalian olfactory cortex , for song syllable sequence in the song bird premotor area , and for constructing an acoustic image by multiple delays in the echolocating bat AI . Therefore, a close relationship between FR magnitude and ISI information may be a basis of neural coding of communication calls at the primary cortical level. There are several potential benefits to maintaining a temporal code of repetition information at the level of AAF. Temporal information may be more easily transmitted to the following stations that can read out the information via converging projections and precise coincident inputs. Energy consumption may be lower for a low-rate interval code than for an average rate code.
Stimulus-locked temporal codes may provide useful information about a task or stimulus that may not be necessary for single discrimination or detection tasks and could be accounted for by rate measures alone as demonstrated in the detection of vibratory stimuli .
The finding that the information related to the three temporal codes projects onto orthogonal factors points to some non-redundancy in the different periodicity representation schemes. This is also expressed in the dissimilar spatial distribution patterns and their distinctions in repetition rate dependence. Availability of different encoding schemes may have advantages for signal processing under different conditions and adverse circumstances such as low signal-to-noise ratios, reverberation, variations in sound intensity, or the presence of multiple sound sources that may affect the three codes in different ways. The possibility that different codes, employed concurrently, can provide complementary information has already been demonstrated for natural sounds . A study of the neural ensemble code for stimulus periodicity in the range of the fundamental frequency of vocalizations also demonstrated that a joint code of rate and timing parameters provides more information than either code alone . The observation that the combined FR and ISI information reflects an increased amount of repetition information confirms the representation of non-redundant information by rate and temporal codes also for low repetition rates in AAF.
It should be noted that the three coding aspects discussed here for repetition rate discrimination do not provide a complete picture of low modulation frequency analysis. Recent studies have pointed out additional means to detect and discriminate the waveform shape of slow modulations, relying on more complete analyses of the evoked spike patterns, and their relationship to rhythmic activity , , , .
Stimulus information is distributed across a wide range of cortical neuron types, laminae, and areas. Knowledge of the spatial layout of information processing is important because it can provide crucial insights into the local functional tasks and algorithms , . Several aspects of spatial organization and variability emerged.
The two temporal response measures and FR show different kinds of spatial variations with repetition rate changes. Nearly stimulus-independent sub-regions were observed for VS and CV of ISI. These ‘modules’ with locally confined variations are in contrast to spatial FR patterns that shift over a wider area in a stimulus-dependent manner, more compatible with a rate/place code. This difference in spatial behavior for timing and rate codes is also expressed by a scaling difference. The largest changes in FR map similarity are observed for small, linear repetition rate differences in contrast to small changes on a logarithmic scale for temporal maps. These differences in type and stimulus dependence of the spatial distributions indicate a degree of independence of time and rate codes for periodicity analysis. The factor analysis supports the notion that the three stimulus repetition codes considered here operate somewhat, although not completely, separately. Differences in the spatial behaviors of spike-timing precision, rate, and interval codes as a function of repetition rate can be interpreted as evidence for multiple, concurrent processing streams (or streamlets) embedded within a cortical area.
Stimulus-tolerant spatial features (Figs. 4B, 4D, S4A, S4C) likely reflect specialized and confined anatomical networks , ,  that can support a stable connectional framework for task-specific processing. Separate neuron clusters for either precise or only moderately synchronized spike timing can be a consequence of convergent thalamocortical projections to AAF ,  and local cortical circuit properties  that may be expressions of structurally and functionally distinguishable components of larger, more generally definable processing and projection schemes such as the ‘what’ and ‘where’ streams. Reading out information from stimulus-dependent maps would require broad-range connections, while reading out information from a locally stimulus-independent map could be done through local connections alone.
A novel, quantitative spatial analysis of cortical maps revealed that local clustering of similar functional properties is a general feature of all parameters considered here. Local clustering exceeding the expectations from random parameter distributions was encountered for spectral and temporal parameters in ~80% of the maps. Failure to observe significant clustering in the remaining 20% may be a consequence of the sampling density and the ratio of circumference-to-area of the mapped region, with reduced statistical power in cases of high ratios (e.g., hemisphere 073R; Fig. 7).
The proportion of sites that are surrounded by sites with similar properties can vary over a wide range from <10%, for some temporal parameters, to >90% for frequency preference (i.e., CF). Maps with low clustering proportions often have only a few isolated sites with similar surroundings. As the overall clustering proportion increases, individual clusters merge into larger modules and, finally, large-scale aggregates, such as the tonotopic organization, are seen for maps with high clustering proportion. This range or hierarchy of spatial order in cortical maps is confirmed by the analysis of global spatial organization through spatial autocorrelation and can now be quantified (e.g., Geary's C) and compared across different areas and modalities. It should be noted that the current analysis methods do not require or rely on the notion of local functional gradients that in previous studies have been the dominant feature in assessing functional topography (e.g., see Ref. , ).
Anatomical studies of AC have revealed that all extrinsic areal connections, whether tonotopic, non-tonotopic, multisensory, or limbic, show a high degree of connectional topography , . Local topographies in convergent inputs create distinct conditions for functional processing and it is, thus, not surprising to see topographic principles expressed by essentially all considered functional aspects in AAF. Similar spatial order is conceivably present in areas outside the core areas although it is currently not clear where they fall along the continuum of a spatial order hierarchy and what the functional parameters are that may be organized in such a way.
Recording conditions used in this study influenced all three measures of VS, FR, and ISIs. Our data were predominantly based on multi-unit recordings since one of the goals was to elucidate the spatial distribution pattern of the different response measures. There are separate loci with either low or high VS. Neuronal clusters with low VS may arise from single neurons within a recorded cluster with precise spike timing at different phases of stimulus period (thus, resulting in only moderate spike timing in multi-units) and/or individually less precise spike timing. Neuronal clusters with high VS are loci of very precise and highly synchronized spike timing that reflect a tight network organization.
FR might also be influenced by the number of neurons in the recorded clusters. Because FR information is not related to FR magnitude, multi-unit recordings do not appear to strongly influence the obtained FR information. On the other hand, the amount of ISI information is negatively correlated with FR magnitude. Therefore, it cannot be determined whether low ISI information is associated with the number of neurons in a recorded cluster and/or the number of spikes.
Finally, high ISI variability may depend on the number of neurons within a cluster, synchronized spike timing among the neurons, and interval variability within single neurons. Low fidelity in any of these aspects may dominate sites carrying low ISI information. Comparisons of single- and multi-unit recordings made for periodic click trains revealed no systematic differences . Furthermore, temporal response properties are mostly independent of CF; thus, local disparity in frequency tuning is not likely to strongly affect the temporal response properties . However, multi-unit responses may not simply represent a cluster of single-unit properties. Therefore, single-unit recordings will be necessary in both acute and awake preparations to provide a fuller understanding. The current study provides a more general framework for such future investigations.
Recent studies using functional magnetic resonance imaging or positron emission tomography in humans and macaques suggested that the superior-temporal plane is specific to human speech or macaque species-specific calls over non-specific calls or other sounds –. These fields are located anterior to the primary core fields, and may be a part of an anterior auditory ‘what’ pathway .
The anterior field of AC is found in many different animal models (for review, see Ref. ). Several studies of neural processing of repetition rates or amplitude modulated sounds have indicated that AAF may show higher temporal fidelity than other cortical fields , , . Furthermore, behavioral experiments have suggested that AAF may be a suitable area to study the neural processing of temporal sound aspects and, more generally, may be part of a system focused on object-based or ‘what’ properties of the auditory environment . Recently, Bendor and Wang  proposed that the rostral field (R) of marmoset AC dominantly uses a rate code for a particular repetition-rate range (10–45 Hz). While anatomical locations (position relative to AI) of cat AAF and marmoset R are similar, it is not known whether these two fields share similar physiological and anatomical properties.
Overall, the findings suggest local processing specialization within an early cortical station of the ‘what’ pathway, suggesting the presence of subdivisions within more global processing streams. The observation that an interval code allows better discrimination of periodicity information than codes based solely on either temporal precision or mean FR may provide a convenient window to assess mechanisms and local tasks implemented in an anterior auditory pathway that emphasizes temporal aspects of sound processing. The observation that local spatial organization, in the form of functional mini-modules, may be ubiquitous can guide future attempts to reconcile functional and structural organizational principles  within and across different processing streams.
Experiments were conducted on three hemispheres (two left and one right) of two adult female cats. All protocols were approved by the University of California San Francisco Committee on Animal Research in accordance with federal guidelines for care and use of animals in research. Animals were sedated by intramuscular injections of a mixture of ketamine (22 mg/kg) and acepromazine (0.11 mg/kg). After venous cannulation, sodium pentobarbital (15–30 mg/kg) was administered and supplemented as needed throughout the surgical procedure. Following tracheotomy, a craniotomy was performed to expose the ectosylvian gyrus. The dura mater was partially removed, and the cortical surface was covered with viscous silicone oil. Before commencing the electrophysiological recordings, sodium pentobarbital anesthesia was replaced with a continuous intravenous infusion of a mixture of ketamine (2–10 mg/kg/h) and diazepam (0.05–0.2 mg/kg/h) in lactated Ringer's solution (1–3 ml/kg/h). To prevent edema and mucus secretion, dexamethasone (1.2 mg/kg, S.C.) and atropine sulfate (0.04 mg/kg, S.C.) were administered every 12 hours. Since recordings lasted for three to four days, an antibiotic (cephalosporin, 11 mg/kg, I.V.) was administered to prevent wound infection. Body temperature was monitored and maintained by a water heating pad at 37±1°C. Electrocardiogram and respiration rate were monitored continuously during surgery and recording procedures.
Experiments were conducted in a double-walled, anechoic chamber (Industrial Acoustics, Bronx, NY). Stimuli were delivered by a STAX-54 headphone through a sealed tube into the acoustic meatus contralateral to the studied hemisphere. The system frequency transfer function was flat (±6 dB) up to 14 kHz and rolled off 10 dB/octave at higher frequencies.
Two different stimuli, pure tone bursts and click trains, were presented for measuring frequency response areas (FRAs) and RRTFs, respectively. Pure tone stimuli of 50 ms duration (including 3-ms linear rise and fall time) were generated at intervals of 400–750 ms by a microprocessor (TMS32010, 16-bit resolution and 120 kHz digital-to-analog sampling rate). FRAs were mapped by presenting 675 pseudo-randomized tone bursts at 45 different frequencies (3–5 octave range) and 15 sound levels (70 dB range in 5 dB steps). For RRTFs, click trains (monopolar, rectangular pulses of 200 µs duration; 500 ms train duration) were systematically presented 15 times for repetition rates from 1 to 38 Hz (1, 6, 10, 14, 18, 22, 26, 30, 34, and 38 Hz) at sound levels of 82–102 dB SPL (peak equivalent). The relatively high levels were applied to enhance synchrony among the multi-unit responses. For sites with non-monotonic rate-level functions, values at the lower end of the range were used. For some recording sites, higher repetition rates were presented (up to 250 Hz).
Parylene- or epoxylite-coated tungsten microelectrodes (Micro Probes, Potomac, MD or Frederic Haer & Co., Bowdoinham, ME) with 0.5–4 MΩ impedance at 1 kHz were used for multi-unit recordings. Single or double microelectrodes were advanced perpendicular to the cortical surface with a hydraulic microdrive (David Kopf Instruments, Tujunga, CA). A video picture of the cortical surface was captured and digitized with a CCD digital camera (Cohu, San Diego, CA). Each recording site was marked on the digitized picture using Canvas software (Deneba, Miami, FL). The marked sites were used to reconstruct tessellation maps of the recording area (see below). Neuronal activity was obtained in thalamocortical recipient layers . Action potentials were amplified and band-pass filtered (0.3–10 kHz; World Precision Instruments, Sarasota, FL, and Axon Instruments, Union City, CA), fed to an oscilloscope, and isolated from background noise with a time/amplitude window discriminator (BAK Electronic, Mount Airy, MD). For FRAs and RRTFs, spikes occurring in the first 50 ms or 550 ms, respectively, after stimulus onset were recorded at 10 or 100 µs resolution for the analyses.
Data were analyzed using the MATLAB (MathWorks, Natick, MA) platform. StatView (SAS Institute, Cary, NC) was used for statistical analysis.
Spectral receptive field parameters such as CF, minimum threshold, quality factors, and response latency were measured . Threshold was defined as minimum excitatory SPL, and estimated at 5 dB resolution. CF was defined as the frequency at which a single neuron or neuron cluster produced sound-evoked spikes at threshold sound level. Spectral bandwidths were calculated as CF divided by excitatory bandwidth at 40 dB (Q40) above threshold; the higher the Q-value, the more sharply tuned are the neurons. Latency was determined as the minimum value in the averaged latency-level function at CF and the two adjacent test-frequencies (CF 1/15 to 1/9 octaves). Results for spectral receptive field parameter distributions in AAF were presented elsewhere .
For RRTFs, spike occurrence to the first click was discarded except for the 1 Hz stimulus since it does not contribute to repetition information. Spikes were counted from the second click onset to 550 ms after the first click onset (for 1 Hz stimulus, spikes occurring between the first click onset and 550 ms were used). VS and FR were used to measure temporal following activity , . VS measures how well spikes are synchronized to the clicks relative to the duration of the repetition period:
$$\mathrm{VS} = \frac{1}{n}\sqrt{\left(\sum_{i=1}^{n}\cos\frac{2\pi t_i}{T}\right)^{2} + \left(\sum_{i=1}^{n}\sin\frac{2\pi t_i}{T}\right)^{2}}$$
where n is the total number of spikes, t_i is the time of occurrence of the i-th spike, and T is the inter-click interval . Significance of synchronization was examined by a Rayleigh test (p<0.001). Bin width was 1 ms. Repetition rate tuning curves were constructed without smoothing across different repetition rates. Best repetition rate was defined as the repetition rate that evoked the largest response strength for VS or FR. RRTF tuning curves for VS and FR were classified into three filter types. A band-pass filter was assigned when the response peak was flanked by troughs in which the responses dropped below 75% of the peak . If one of the response troughs did not reach the criterion, then RRTFs were considered to be either a low- or high-pass filter. Although most recordings were made from multi-units, past studies have shown that single- and multi-unit recordings share similar RRTFs or modulation transfer functions , , .
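As an illustration of the VS measure and the Rayleigh criterion described above, the following is a minimal Python sketch, not the analysis code used in the study. The function names are hypothetical, spike times and the inter-click interval are assumed to be in milliseconds, and the p-value uses the common large-sample approximation p ≈ exp(−n·VS²), which is an assumption and is not stated in the text.

```python
import math

def vector_strength(spike_times_ms, period_ms):
    """Return (VS, approximate Rayleigh p-value) for spikes relative to a click period.

    VS is the length of the mean resultant vector of spike phases.
    The p-value uses the large-sample approximation exp(-n * VS**2) (assumption).
    """
    n = len(spike_times_ms)
    if n == 0:
        return 0.0, 1.0
    # Map each spike time onto a phase angle within the stimulus period.
    phases = [2.0 * math.pi * t / period_ms for t in spike_times_ms]
    cos_sum = sum(math.cos(p) for p in phases)
    sin_sum = sum(math.sin(p) for p in phases)
    vs = math.hypot(cos_sum, sin_sum) / n
    p_value = math.exp(-n * vs ** 2)
    return vs, p_value

def is_synchronized(spike_times_ms, period_ms, alpha=0.001):
    """True if the approximate Rayleigh p-value falls below alpha."""
    _, p_value = vector_strength(spike_times_ms, period_ms)
    return p_value < alpha
```

For example, ten spikes that each fall 5 ms after a click in a 10 Hz train (100 ms period) give a VS close to 1 and pass the criterion, whereas spikes spread evenly over the period give a VS close to 0.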
ISIs between two consecutive spikes were measured in the time window of 550 ms with a bin width of 1 ms for each trial and accumulated across all 15 trials. Spike train irregularity in ISIs was estimated from the CV, defined as the standard deviation of ISIs divided by the mean of ISIs.
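The CV computation can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, ISIs are pooled across trials as described, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which was used.

```python
import statistics

def isi_cv(trials):
    """Coefficient of variation of ISIs pooled across trials.

    `trials` is a list of spike-time lists (one per stimulus presentation).
    Returns NaN when fewer than two intervals are available.
    """
    isis = []
    for spikes in trials:
        ordered = sorted(spikes)
        # Intervals are taken only between consecutive spikes within a trial.
        isis.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    if len(isis) < 2:
        return float("nan")
    mean_isi = statistics.mean(isis)
    if mean_isi == 0:
        return float("nan")
    # Population standard deviation is an assumption of this sketch.
    return statistics.pstdev(isis) / mean_isi
```

A perfectly regular train yields a CV of 0, and larger values indicate more irregular firing.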
To reconstruct the spatial distribution of receptive field or temporal parameters across the cortical surface, tessellation maps were calculated by Voronoi-Dirichlet tessellation . The polygon surrounding each electrode penetration in the tessellation map characterizes the area assigned to the functional parameter at the recording site. Borders between neighboring polygons were determined from the midpoints of a straight line between adjacent recording points. The value of each receptive field or temporal parameter in the cortical surface map is illustrated by color code.
The MI of the repetition rate carried in the FR was computed based on 15 presentations of the same set of repetition rates. MI analyses were limited to six different repetition rates (1, 6, 10, 14, 22, and 30 Hz for which we obtained data sets for a majority of recording sites). MI between repetition rate f and firing rate fr is given by . In our case, all repetition rates f were presented the same number of times, so that
$$I(f; fr) = \frac{1}{N_f}\sum_{f}\sum_{fr} P(fr \mid f)\,\log_2\frac{P(fr \mid f)}{P(fr)}$$
where N_f = 6 was the number of different repetition rates. To account for the fact that MI is positively biased , , the values were linearly extrapolated to the infinite dataset limit (i.e., number of repetitions; not to the limit of infinite word length). Extrapolation was done by removing different sets of one, two, three, or four presentations at a time. The final value and its standard deviation were obtained as a result of a linear fit in 1/N_f, each repeated 15 times for different combinations of dropped presentations.
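The two steps described above, a plug-in MI estimate with equally probable repetition rates followed by a linear extrapolation to infinite dataset size, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: function names are hypothetical, responses are assumed to be already discretized (e.g., spike counts), and the extrapolation is an ordinary least-squares fit of MI against the inverse of the dataset size, with the intercept taken as the bias-corrected value.

```python
import math
from collections import Counter

def mutual_information(responses_by_rate):
    """Plug-in MI (bits) between stimulus and a discrete response.

    `responses_by_rate` maps each repetition rate to a list of discrete responses.
    All rates are treated as equally probable (P(f) = 1 / N_f).
    """
    n_rates = len(responses_by_rate)
    conditional = {}
    for rate, responses in responses_by_rate.items():
        counts = Counter(responses)
        total = len(responses)
        conditional[rate] = {r: c / total for r, c in counts.items()}
    # Marginal response distribution P(r) = (1 / N_f) * sum_f P(r | f).
    marginal = Counter()
    for dist in conditional.values():
        for r, p in dist.items():
            marginal[r] += p / n_rates
    info = 0.0
    for dist in conditional.values():
        for r, p in dist.items():
            info += p * math.log2(p / marginal[r]) / n_rates
    return info

def extrapolate_to_infinite_data(inverse_sizes, info_values):
    """Intercept of a least-squares line through (1/size, MI) points."""
    n = len(inverse_sizes)
    mean_x = sum(inverse_sizes) / n
    mean_y = sum(info_values) / n
    sxx = sum((x - mean_x) ** 2 for x in inverse_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(inverse_sizes, info_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x
```

In use, `mutual_information` would be evaluated on the full set of presentations and on subsets with one to four presentations removed, and the resulting values passed to `extrapolate_to_infinite_data` together with the inverse of each subset size.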
MI between repetition rate and VS was evaluated similarly. VS values were calculated for each stimulus presentation to form distributions of VS values associated with each stimulus periodicity. The MI conveyed by the VS code quantifies how well these distributions (and thus stimulus repetition rates) can be distinguished from each other. Non-significant VS measures were assigned an MI of zero bits/stimulus (Rayleigh test, p>0.001). In the case of information carried by ISIs, the distribution of ISIs P(isi|f) was computed for each stimulus repetition rate f and averaged across repeated stimulus presentations. These information values were then also extrapolated to infinite dataset size, according to the procedures described above.
Additive information values (Code(x) + Code(y) in Fig. 2B) represent the sum of information values computed for each pattern of neural responses separately, with separate extrapolation to infinite dataset size. Joint information values (Code(x) × Code(y) in Fig. 2B) were computed based on joint probabilities of two measures of neural responses, such as VS and FR (Fig. 2B, white bars); extrapolation to infinite dataset size in this case was based on recomputation of these joint probabilities from fractions of the data, and then using a linear extrapolation with respect to the inverse of the dataset size to find the value for infinite dataset size.
The existence of spatial organization for experimental variables was established using two complementary approaches. Spatial autocorrelation, a measure of redundancy, was used to estimate global spatial organization by calculating Geary's C coefficient . C values are based on value differences between pairs of observations and can vary between 0, indicating perfect positive spatial correlation (high spatial uniformity, maximal neighbor similarity), and 2, indicating negative spatial correlation (maximal dispersion, high value contrast between neighbors). Random spatial distribution (the null-hypothesis) results in a C value of 1. In a Monte-Carlo analysis, the statistical significance of the experimental C value was derived from the C-value distribution of 10,000 randomly redistributed map versions.
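A minimal sketch of the global test follows, using the standard definition of Geary's C with binary neighbor weights and a Monte-Carlo redistribution of map values. The function names, the neighbor-dictionary representation of the tessellation, and the one-sided p-value (fraction of shuffled maps with C at or below the observed value) are assumptions made for illustration and are not taken from the original analysis.

```python
import random

def gearys_c(values, neighbors):
    """Geary's C with binary weights.

    `values[i]` is the parameter value at site i; `neighbors[i]` lists the
    indices of sites adjacent to site i (listed symmetrically).
    """
    n = len(values)
    mean = sum(values) / n
    total_var = sum((v - mean) ** 2 for v in values)
    if total_var == 0:
        raise ValueError("Geary's C is undefined for a constant map")
    sq_diff = 0.0
    weight_sum = 0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            sq_diff += (values[i] - values[j]) ** 2
            weight_sum += 1
    return (n - 1) * sq_diff / (2.0 * weight_sum * total_var)

def monte_carlo_p(values, neighbors, n_shuffles=10000, seed=0):
    """Observed C and the fraction of shuffled maps with C <= observed."""
    rng = random.Random(seed)
    observed = gearys_c(values, neighbors)
    shuffled = list(values)
    at_or_below = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # redistribute values over fixed site positions
        if gearys_c(shuffled, neighbors) <= observed:
            at_or_below += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_or_below + 1) / (n_shuffles + 1)
```

In this sketch, a smooth gradient along a chain of sites gives C well below 1, while alternating high and low values give C above 1, matching the interpretation of the coefficient given above.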
Local spatial organization was assessed through the value similarity between each polygon and its nearest neighbors. Statistically significant similarity between a polygon and its direct neighbors was determined by comparison with 10,000 redistributions of the neighboring polygon values. The number of significant polygons in a given experimental map was compared to the number of significant polygons in 1,000 randomized maps. The number of significant polygons estimates the proportion of local parameter clusters. Neither of the two tests takes into account where in the map local or global similarities are situated. However, the larger the number of local clusters, the higher is the probability of a confluence of them, increasing global organization and, thus, spatial autocorrelation.
Two examples of RRTFs for VS and FR. (A, B) Poststimulus time histograms for two different sites. Response strength was normalized to the maximum responses at 1 Hz. (C, D) RRTF tuning curves for the same two sites. VS and FR are illustrated by magenta and cyan lines, respectively, and data points are fit by a polynomial cubic spline for illustration. Filled circles are significant VS values by a Rayleigh test (p<0.001). Site identification and CF are shown in (C, D). Gray areas are the repetition rate range for the focus of our study. (A, C) Moderate VS site. FR showed band-pass property with high best repetition rate. VS information: 0.19 bits/stimulus. FR information: 0.33 bits/stimulus. ISI (1 ms) information: 0.34 bits/stimulus. ISI (10 ms) information: 0.74 bits/stimulus. (B, D) Low VS site. FR increased with increasing repetition rates (high-pass property). VS information: 0.08 bits/stimulus. FR information: 0 bits/stimulus. ISI (1 ms) information: 0.58 bits/stimulus. ISI (10 ms) information: 1.17 bits/stimulus.
An example of ISI histograms for six different repetition rates from one site. There is no additional peak corresponding to the period of the stimulus repetition rates. This site is a less common example. VS information: 0.01 bits/stimulus. FR information: 0.02 bits/stimulus. ISI (1 ms) information: 0.15 bits/stimulus. ISI (10 ms) information: 0.45 bits/stimulus.
Smoothed tonotopic gradient and approximate position of AAF. (A) Hemisphere 073L. (B) Hemisphere 073R. Scale bars: 1 mm. See Figure 4's legend for further explanation.
Spatial distributions of population response (without smoothing) to different repetition rates (hemisphere 111L). Spatial representation of raw VS (A), normalized FR (B), and raw CV of ISI (C) as a function of different repetition rates. Scale bars: 1 mm. See Figure 4's legend for further explanation.
We thank Bénédicte Philibert for help with data collection, Craig Atencio for support with the spatial statistics, and Brian Malone for comments on an earlier manuscript version.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was supported by National Institutes of Health (NIH) grants DC-02260 and MH-077970 to C.E.S., Veterans Affairs Merit Review to S.W.C., as well as Hearing Research Inc., and the Coleman Memorial Fund. N.J.P. was supported by NIH grant EY019288 and the Pew Charitable Trust. T.O.S. was supported by the Alfred P. Sloan Fellowship, the Searle Funds, NIH grants MH068904 and EY019493, National Science Foundation (NSF) grant IIS-0712852, the McKnight Scholarship, the Keck and Ray Thomas Edwards Foundations, and the Center for Theoretical Biological Physics (NSF PHY-0822283). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.