The results are based on recordings from 182 auditory nerve fibers in nine cats. Responses to SAM and transposed tones were collected from 39 high-CF fibers in four cats (CF range: 2.9–13.5 kHz, median 5.7 kHz). These data include 119 level series for matched pairs (same fc and fm) of SAM and transposed tones. Responses to low-frequency (f ≤1,000 Hz) pure-tones were recorded from 70 low-CF fibers from these four and five additional cats (CF range: 170–1,600 Hz, median 740 Hz). Overall, the data include 5.1% low-SR, 37.2% medium-SR, and 57.7% high-SR responses. Responses to pure tones >1,000 Hz were also recorded from an additional 73 higher-CF fibers (CF range: 1–3.3 kHz). These higher-frequency data are used only in and .
In the following sections, we first compare phase locking to pure tones, SAM tones, and transposed tones presented at the fiber’s CF using the synchronization index (SI) measure. Then, we compare phase locking for on- and off-CF presentations of SAM and transposed tones. Last, we examine alternate, autocorrelation-based metrics to quantify phase locking to each of the three stimuli.
Phase locking to pure, SAM, and transposed tones at CF
A set of period histograms measured as a function of stimulus level in response to a pure tone, a transposed tone, and a SAM tone () are shown in . Responses to the pure tone are recorded from a low-CF fiber (385 Hz), whereas responses to the SAM and transposed tones are from a high-CF fiber (7,000 Hz). The modulation frequency of the transposed and SAM tones (250 Hz) roughly matches the frequency of the pure tone (354 Hz).
At low stimulus levels, responses to both SAM and transposed tones are restricted to approximately half of the stimulus cycle, indicating a high degree of phase locking to the envelope. As the level increases, spike discharges gradually spread to the other half-cycle, indicating a decrease in phase locking, consistent with previous reports for SAM tones (Cooper et al. 1993; Javel 1980; Joris and Yin 1992; Smith and Brachman 1980). However, for each level, responses to the transposed tone are restricted to a smaller part of the stimulus cycle than responses to the SAM tone. Although the firing rates in response to both SAM and transposed tones saturate at 50 dB SPL (), phase locking reaches a maximum at lower levels, consistent with previous observations.
For the low-frequency pure tone, the fiber’s response occupies virtually the same portion of the stimulus cycle from threshold (20 dB SPL) up to 60 dB above threshold (80 dB SPL), well beyond the point where the rate begins to saturate (). Thus, consistent with previous observations (Johnson 1980), this medium-SR fiber phase locks to the pure tone over a range of levels wider than its rate-based dynamic range.
These observations are reflected in the synchrony level functions derived from the period histograms (). For the transposed and SAM tones, the SI is highest near threshold and falls dramatically with increasing stimulus level. For each stimulus level, SI is larger for the transposed tone than for the SAM tone. Synchrony to the pure tone remains fairly stable with level. Near threshold, the synchrony values are similar for the pure tone and the transposed tone.
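As a minimal illustration of the SI measure, the sketch below computes vector strength from a list of spike times. It assumes spike times in seconds and a known stimulus (or modulation) period; the function name and inputs are hypothetical and are not taken from the analysis code used in this study.

```python
import math

def synchronization_index(spike_times, period):
    """Vector strength of spike times (s) relative to a stimulus period (s).

    Returns 0 for an empty response, values near 0 for spikes spread evenly
    over the cycle, and 1 when every spike falls at the same stimulus phase.
    """
    if not spike_times:
        return 0.0
    # Map each spike to a phase on the unit circle.
    phases = [2.0 * math.pi * (t % period) / period for t in spike_times]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    # Length of the mean resultant vector.
    return math.hypot(c, s) / len(spike_times)
```

For SAM and transposed tones the period passed in would be 1/fm, so that the index reflects locking to the envelope rather than to the carrier.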
shows synchrony-level functions for pure, transposed, and SAM tones for the entire data set. Responses to SAM and transposed tones with all modulation frequencies are included, as well as pure-tone responses for all frequencies <1,000 Hz. Sound level is expressed in decibels with respect to rate threshold to facilitate comparison between fibers. For the most part, responses are well separated based on the stimulus type, even though data from different frequencies and modulation frequencies are pooled: low-frequency fibers phase lock with the greatest precision to pure tones, followed by high-frequency fibers to transposed tones, and then high-frequency fibers to SAM tones. Near threshold, fibers tend to show similarly high phase locking to transposed tones and pure tones, although synchrony falls more rapidly with level for transposed tones than for pure tones. Synchrony to SAM tones is poorer near threshold and degrades faster with level than for either transposed or pure tones.
With a few exceptions, the sound pressure level of the maximum synchrony occurred between rate threshold and rate saturation for all stimuli, with an average (±SD) of 15.9 ± 10.3, 5.2 ± 8.7, and 2.5 ± 7.1 dB above threshold for pure, transposed, and SAM tones, respectively. Our average for SAM tones is lower than the value of 10 ± 5.0 dB with respect to threshold given by Joris and Yin (1992), although the rather coarse (10-dB) spacing between adjacent stimulus levels in our experiments makes a precise assessment difficult. Further, the difference could be partially accounted for by how the sound pressure level of the SAM tone is defined (root-mean-square of the waveform versus the peak of the waveform envelope), as well as by possible differences in the definition of rate threshold. Consistent with previous observations for SAM tones (Joris and Yin 1992), the SI to SAM and transposed tones decreased monotonically above the level of maximum synchrony and sometimes leveled out at higher levels.
To quantify the fall of synchrony with level for each stimulus, a regression line was fit to the decreasing portion of each SI-level function above the maximum. Only responses containing at least three statistically significant (P < 0.001) SI values based on the Rayleigh statistic were used in this analysis. The upward sloping portions of the SI-level functions were not fit because they typically contained fewer than three data points. The mean downward slopes (±SD) were −0.0036 ± 0.0043/dB for pure tones, −0.0064 ± 0.0028/dB for transposed tones, and −0.0103 ± 0.0026/dB for SAM tones. Our slope estimates for SAM tones are similar to those (−0.012 ± 0.003/dB) given by Joris and Yin (1992). The thick lines in show the mean slopes for the three stimulus types. The slope relating phase locking to level is almost two times shallower for pure tones than for transposed tones. In addition, phase locking to transposed tones is always greater than phase locking to SAM tones for both the average line and the majority of the data. ANOVA revealed a highly significant effect of stimulus type on the slopes [F(2,202) = 71.5, P < 0.001]. Post hoc multiple comparisons with Bonferroni adjustment indicate that slopes for each stimulus are significantly different from slopes for the other two stimuli (P < 0.05).
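The selection and fitting steps described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the Rayleigh P value is taken from the common large-sample approximation exp(−n·SI²), the fit starts at the level of maximum SI, and the three-point minimum is applied to the fitted portion. All function and argument names are hypothetical.

```python
import math

def rayleigh_p(si, n_spikes):
    """Large-sample approximation to the Rayleigh test P value (assumed form)."""
    return math.exp(-n_spikes * si * si)

def falling_slope(levels_db, si_values, n_spikes, alpha=0.001, min_points=3):
    """Least-squares slope (SI per dB) from the level of maximum SI upward.

    Only SI values passing the Rayleigh criterion are kept. Returns None when
    fewer than min_points remain in the fitted portion (an assumption about
    how the three-point rule is applied).
    """
    pts = sorted((lv, si) for lv, si, n in zip(levels_db, si_values, n_spikes)
                 if rayleigh_p(si, n) < alpha)
    if len(pts) < min_points:
        return None
    i_max = max(range(len(pts)), key=lambda i: pts[i][1])
    tail = pts[i_max:]                     # decreasing portion, including the maximum
    if len(tail) < min_points:
        return None
    mx = sum(x for x, _ in tail) / len(tail)
    my = sum(y for _, y in tail) / len(tail)
    sxx = sum((x - mx) ** 2 for x, _ in tail)
    sxy = sum((x - mx) * (y - my) for x, y in tail)
    return sxy / sxx
```

With the 10-dB level spacing used here, a function peaking at 20 dB and sampled up to 50 dB would contribute four points to the fit.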
Although SI-level curves for the three stimuli are fairly well separated (), synchrony to SAM and transposed tones is expected to vary with modulation frequency. As modulation frequency increases, the sidebands are increasingly attenuated by the cochlear filter centered at the CF, thereby decreasing the modulation depth of the mechanical input to the hair cells. In addition, there appears to be a temporal limitation in the ability of AN fibers to phase lock to high modulation frequencies (Greenwood and Joris 1996; Joris and Yin 1992). We therefore examined the dependency of phase locking on modulation frequency for each stimulus type (). Following Johnson (1980), we use the maximum SI (SImax) over all stimulus levels as a measure for comparison. For low modulation frequencies (fm ≤250 Hz) of the transposed tone, SImax is comparable to that for low-frequency pure tones at 250 Hz. However, synchrony to transposed tones begins to degrade for modulation frequencies between 250 and 500 Hz, whereas phase locking to pure tones does not start to degrade until 1,000 Hz. SImax is always lower for SAM tones than for transposed tones, although the rate of decrease at higher modulation frequencies is similar for both stimuli.
Our finding that SImax is greater for pure tones than for transposed tones >250 Hz might be confounded by the fact that the tone-burst stimuli used by Tsai and Delgutte (1997) (on which most of our pure-tone data are based) had shorter durations and higher repetition rates than those of our SAM and transposed tones. However, this is unlikely because a two-way ANOVA comparing SImax for short tone bursts (100 ms; Tsai and Delgutte 1997) and long-duration tones (15–30 s; Johnson 1980) revealed neither a significant main effect of duration [F(1,442) = 0.19, P = 0.66] nor a significant interaction between duration and frequency [F(6,442) = 1.38, P = 0.22].
Both the maximum synchrony and the slope of the synchrony-level function above the maximum may depend on modulation frequency. SI-level functions for transposed and SAM tones are shown in separately for each modulation frequency. Data at fm = 1,000 Hz are not shown because few fibers were studied at this modulation frequency. suggests that the shapes of SI-level functions depend on both the stimulus type and the modulation frequency and that the two variables interact. For low modulation frequencies (≤125 Hz), synchrony falls at a faster rate with level for SAM tones than for transposed tones, as was the case for the pooled data in . However, as modulation frequency increases to 250 Hz and, especially, 500 Hz, the synchrony-level functions for the two stimuli become more parallel. Although the slopes at 500 Hz are similar, the two sets of curves remain well separated. These results are confirmed in , which shows the mean slopes of regression lines fit to the SI-level functions as a function of modulation frequency for all three stimuli. Excluding the limited data at 1,000 Hz, the mean slope for SAM tones tends to increase with fm, whereas the mean slope for transposed tones decreases with fm, so that the two curves converge at 500 Hz. A two-way ANOVA of slopes with stimulus type (SAM and transposed tones) and fm as factors indicates that their interaction is highly significant [F(4,128) = 10.87, P < 0.001]. The mean slopes for transposed tones are always more negative than those for their pure-tone counterparts, except again for fm = 1,000 Hz, where few data points are available.
For both pure and SAM tones, it has been reported that low-SR fibers tend to phase lock with greater precision than medium-SR fibers, which, in turn, phase lock with more precision than high-SR fibers (pure tones: Johnson 1980; SAM tones: Joris and Yin 1992; Joris et al. 1994). Spontaneous activity adds a DC offset to the period histogram by contributing spikes that are not locked to the stimulus and is thus expected to decrease synchrony. To examine whether phase locking to transposed tones also depends on SR, shows SImax as a function of fm separated according to two SR groups (Liberman 1978): high SR (>18/s) and low/medium SR (<18/s). We did not have data from enough low-SR fibers to analyze them separately. For all three stimuli, there is a clear trend for responses from low/medium-SR fibers to have greater SImax than responses from high-SR fibers. This observation is confirmed by two-way ANOVAs, with fm and SR group as factors, which reveal highly significant main effects of SR group on SImax for all three stimuli [pure tones: F(1,117) = 32.4, P < 0.001; transposed tones: F(1,58) = 37.9, P < 0.001; SAM tones: F(1,58) = 31.7, P < 0.001].
Whereas the instantaneous firing rate in response to a low-frequency pure tone falls below the SR during one half of each stimulus cycle (Johnson 1980), this is not expected to be the case for high-frequency modulated sounds such as SAM and transposed tones, neglecting possible effects of adaptation (see DISCUSSION). Responses to transposed tones and pure tones would therefore be expected to become less similar with higher spontaneous activity. To further analyze the interaction between SR and stimulus type, we ran a two-way ANOVA on SImax with SR group and stimulus type (pure or transposed tone) as factors. To minimize the confounding effect of frequency, the analysis included pure-tone data only for frequencies <800 Hz and transposed-tone data for fm ≤250 Hz. As expected, the analysis showed significant main effects of SR group and stimulus type (both P < 0.001). Post hoc multiple comparisons using Bonferroni corrections show that SImax to pure tones is significantly greater than SImax to transposed tones for the high-SR group (P < 0.001), but not for the low/medium-SR group (P > 0.05). Thus temporal discharge patterns to transposed tones are similar to pure-tone responses only for low/medium-SR fibers, and only at low stimulus levels and low modulation frequencies where the phase locking to transposed tones is maximum.
Off-CF responses to SAM and transposed tones
Thus far, we have described phase locking of ANF responses to SAM and transposed tones when the carrier frequency was at the fiber’s CF. However, these stimuli will excite fibers over a range of CFs, particularly at higher stimulus levels where the cochlear excitation patterns become broad. If most of the fibers with CFs near the carrier frequency are saturated, so that phase locking to the envelope becomes weak, temporal information about the modulation frequency may be conveyed primarily by fibers with CFs far from the frequency band where the stimulus has the most energy, i.e., far from the carrier frequency. Therefore it is important to characterize phase locking when fc is presented off the CF.
In general, responses to SAM and transposed tones with carrier frequencies both above and below CF broadly resemble on-CF responses. However, some noticeable differences do exist. shows response patterns of a fiber as a function of level to transposed tones with carrier frequencies at the CF (6,920 Hz), below the CF (5,603 Hz), and above the CF (7,490 Hz). For all three carrier frequencies, the period histograms show the same trend in that the fraction of the stimulus cycle occupied by spike discharges increases with stimulus level, implying a degradation in phase locking. Average discharge rates for this low-SR fiber do not saturate completely, even at high levels (). As expected, phase locking begins to deteriorate at lower levels for the on-CF stimulus than for the off-CF stimuli because the threshold is lower when fc is at the CF. The shapes of the period histograms for stimuli below CF resemble those at the CF. However, the period histograms for stimuli above CF clearly show two modes within each stimulus cycle at lower stimulus levels, unlike the on-CF and below-CF histograms. Such bimodal period histograms may result from the sharp transients occurring twice in each stimulus cycle at times when the envelope of the transposed tone shows a slope discontinuity (), which would be expected to induce ringing in the cochlear filters. Despite these complex histogram shapes, shows that the synchronization index decreases monotonically with the level beyond a maximum for all three carrier frequencies, and the slopes are similar. However, the maximum synchrony is somewhat greater on the CF than either above or below the CF.
Period histograms showing two or more peaks per stimulus cycle were automatically identified from the presence of multiple zero crossings in the numerical derivative of the autocorrelation of the period histogram, which is considerably smoother than the raw period histogram. Multimodal period histograms were seen in response to transposed tones in six of 22 fibers (27%) for carrier frequencies below the CF and in five of 27 fibers (19%) for carrier frequencies above the CF, but were never observed at the CF. Multiple peaks in responses to SAM tones were less prevalent and occurred in only one of 22 fibers (5%) for carrier frequencies below the CF and in four of 27 fibers (15%) for carriers above the CF. These observations are consistent with oscillations seen at the onset and, sometimes, the offset of responses to tone bursts with abrupt onsets (Kiang et al. 1979; Rhode and Smith 1985). Such oscillations, which are observed only when the stimulus is presented in the steep gradients of the tuning curve above and below the CF, have been attributed to the ringing of the cochlear filters. Although ringing of the cochlear filters is characterized by oscillations at the characteristic frequency in low-CF fibers, only the envelope of the ringing can be seen in the present study because the CFs of the neurons studied with SAM and transposed tones were always above the limit of phase locking. According to this interpretation, multimodal discharge patterns are more prevalent for transposed tones than for SAM tones because the transposed tones have more abrupt transients.
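The automatic identification of multimodal period histograms can be sketched as below. The sketch assumes a circular autocorrelation of the binned period histogram and counts positive-to-negative sign changes of its first difference (i.e., local maxima), flagging a histogram when more than one is found. The exact criterion and any smoothing used in the study are not specified in the text, so these details and all names are assumptions.

```python
def circular_autocorrelation(hist):
    """Circular autocorrelation of a period histogram (one value per lag)."""
    n = len(hist)
    return [sum(hist[k] * hist[(k + lag) % n] for k in range(n))
            for lag in range(n)]

def count_modes(hist):
    """Number of local maxima per cycle in the autocorrelation of hist.

    Maxima are found as positive-to-negative sign changes of the circular
    first difference; flat stretches (zero difference) are skipped.
    """
    ac = circular_autocorrelation(hist)
    n = len(ac)
    deriv = [ac[(k + 1) % n] - ac[k] for k in range(n)]
    signs = [1 if d > 0 else -1 for d in deriv if d != 0]
    if not signs:
        return 0
    # signs[i - 1] wraps to the last element when i == 0, closing the cycle.
    return sum(1 for i in range(len(signs))
               if signs[i - 1] > 0 and signs[i] < 0)

def is_multimodal(hist):
    """True when the autocorrelation shows more than one peak per cycle."""
    return count_modes(hist) > 1
```

A unimodal histogram yields a single autocorrelation peak at zero lag, whereas two discharge modes half a cycle apart add a second peak near half the period.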
shows synchrony-level functions for SAM and transposed tones with carrier frequencies at, above, and below the CF for the entire data set. To facilitate comparisons, stimulus level is expressed relative to the level of maximum synchrony at each carrier frequency. For all three ranges of carrier frequencies, the SI-level curves generally decrease monotonically for levels above the maximum and the slopes are similar. Precise trends, however, are hard to identify in because of the large variability between fibers. To quantitatively compare the level dependency of SI on and off CF, for each fiber, we computed the difference between each off-CF SI-level curve and the corresponding on-CF curve. The resulting change in SI (ΔSI) level functions () are generally flat or just slightly increasing for both SAM and transposed tones. Flat ΔSI-level functions indicate that on- and off-CF SI-level functions have the same slopes, whereas sloping ΔSI-level functions indicate that SI decreases with level at different rates on and off the CF.
These effects were quantitatively analyzed by fitting a regression line to the entire set of ΔSI-level functions separately for each stimulus type (SAM and transposed) and each carrier frequency range (above and below the CF). The slopes and intercepts of the four best-fitting lines are given in . Because the ΔSI-level functions are plotted with respect to the level of maximum synchrony, the intercept gives an estimate of the difference in SImax off and on CF. Statistical analyses (t-test) show that the intercepts for SAM tones above CF and transposed tones below CF were significantly negative (P < 0.05), indicating a lower SImax for these two conditions compared with when the carrier is at the CF. On the other hand, the intercepts did not significantly differ from zero for SAM tones below CF and for transposed tones above CF, indicating similar SImax on and off CF in these conditions. The slopes of the best-fitting lines to the ΔSI-level functions did not, in general, differ from zero (), indicating that the rates of decrease of synchrony with level are similar on and off CF. The one exception is for SAM tones above the CF, where the slope was significantly positive, indicating that synchrony decreases less rapidly with level for SAM tones above CF than for SAM tones at CF. Despite these statistically significant differences, the overall effects are small and the general pattern of results suggests that the dependency of synchrony on level does not greatly differ on and off CF for either stimulus.
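The ΔSI analysis can be illustrated with the sketch below. It assumes that ΔSI is the off-CF value minus the on-CF value at matching levels (the sign convention implied by the interpretation of negative intercepts as a lower off-CF SImax), with level expressed in dB relative to the level of maximum synchrony. Function names and the dictionary inputs are hypothetical.

```python
def delta_si_points(on_cf, off_cf):
    """Pair up SI values at shared relative levels and return (level, dSI).

    on_cf, off_cf: dicts mapping level re. level of maximum synchrony (dB)
    to SI. dSI is off-CF minus on-CF (assumed sign convention).
    """
    shared = sorted(set(on_cf) & set(off_cf))
    return [(lv, off_cf[lv] - on_cf[lv]) for lv in shared]

def fit_line(points):
    """Ordinary least-squares line through (x, y) points: (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx
```

Pooling the points from all fibers in one condition and fitting a single line gives a slope (difference in the rate of change of SI with level) and an intercept (difference in SImax) of the kind reported for each of the four conditions.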
Table 1. Slopes and intercepts of ΔSI-level functions for SAM and transposed tones
Phase-locking analysis with autocorrelation
Although the synchronization index is an appropriate measure of phase locking to a periodic stimulus when the period histogram is unimodal (as was largely the case in our data), it is a poor model of computations performed by the CNS because it necessitates a priori knowledge of the stimulus period. Recently, Louage et al. (2004) proposed alternative, autocorrelation-based measures of phase locking that capture the intrinsic periodicities in the spike responses without requiring prior knowledge of the stimulus. These measures make particular sense in the present context because interaural cross-correlation is a key component of most models for ITD processing. This section compares phase locking as measured by the SI with phase locking as measured by the autocorrelation-based metrics, and examines the extent to which these metrics are correlated and can be used interchangeably.
shows the shuffled autocorrelograms (SACs; see METHODS) as a function of stimulus level for the same data that were analyzed with period histograms in . For all three stimuli, the SACs show multiple modes separated by the stimulus period, indicating phase locking to the stimulus. In general, each cycle of the SAC is more symmetric around its maximum than the corresponding period histogram in . For SAM and transposed tones, the increasing width of each SAC mode as level increases parallels the spread of the period histograms to an increasingly large fraction of the stimulus cycle. SACs for the pure tone do not broaden as appreciably with level as SACs for SAM and transposed tones. To quantify phase locking, we first use the height of the largest mode in the normalized SAC, called “peak height” for short (Louage et al. 2004). The peak height varies from 1 for randomly distributed spikes (no phase locking) to arbitrarily large values if phase locking is very precise. Peak-height versus level functions () show similarly strong phase locking to the pure tone and the transposed tone at low levels, but a faster decrease in peak height above the maximum for the transposed tone than for the pure tone. In addition, for each level, the peak height for the SAM tone is always lower than that for the transposed tone. These trends parallel those seen with the synchronization index measure in . Phase locking was also quantified by the “half-width,” the width of the SAC at 50% of the peak height (Louage et al. 2004). The half-width is normalized by the stimulus period to facilitate comparison of responses to stimuli with different frequencies. The normalized half-width varies in the opposite direction of the peak height, from 0 for ideally precise phase locking to 1 for randomly distributed spikes. Half-widths are similar for the pure tone and the transposed tone at low levels in this example (). At higher levels, half-widths for the SAM tone and the transposed tone increase considerably, reaching 1 for the SAM tone, but stabilize near 0.3 for the pure tone. These trends parallel the observations made with the peak height and the synchronization index. Thus for these examples, the three measures of phase locking are highly correlated.
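The two SAC-based metrics can be sketched as follows. The sketch assumes the normalization N(N−1)·r²·Δτ·D (N presentations, mean rate r, bin width Δτ, duration D), under which unlocked spikes give a value near 1; the details in METHODS may differ. It also assumes that the largest mode is the one measured and that its width is the run of contiguous bins at or above half the peak. All names are hypothetical.

```python
def shuffled_autocorrelogram(trains, duration, bin_width, max_lag):
    """Normalized histogram of spike-time differences across presentations.

    trains: list of spike-time lists (s), one per stimulus presentation.
    Intervals within the same train are excluded (the "shuffling").
    Returns one value per bin for lags from -max_lag to +max_lag.
    """
    n_half = int(round(max_lag / bin_width))
    counts = [0] * (2 * n_half + 1)
    for i, a in enumerate(trains):
        for j, b in enumerate(trains):
            if i == j:
                continue
            for ta in a:
                for tb in b:
                    k = int(round((tb - ta) / bin_width))
                    if -n_half <= k <= n_half:
                        counts[k + n_half] += 1
    n = len(trains)
    rate = sum(len(t) for t in trains) / (n * duration)
    norm = n * (n - 1) * rate ** 2 * bin_width * duration
    return [c / norm for c in counts]

def peak_height_and_half_width(sac, bin_width, period):
    """Peak height and half-width (as a fraction of the stimulus period)."""
    peak = max(sac)
    k = sac.index(peak)
    lo = hi = k
    # Grow the window while neighboring bins stay at or above half the peak.
    while lo > 0 and sac[lo - 1] >= 0.5 * peak:
        lo -= 1
    while hi < len(sac) - 1 and sac[hi + 1] >= 0.5 * peak:
        hi += 1
    return peak, (hi - lo + 1) * bin_width / period
```

For perfectly locked trains with one spike per cycle, this normalization gives a peak height equal to the period divided by the bin width, which illustrates why peak height is unbounded as phase locking becomes more precise.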
shows a scatterplot of synchronization index against SAC peak height for the entire data set. Responses to SAM and transposed tones at all modulation frequencies are included, as well as pure-tone responses for frequencies <1 kHz. Despite including data from different frequencies, there is a clear, monotonic relationship between SAC peak height and synchronization index for each stimulus type. The data were fit using a hyperbolic function
where SI is the synchronization index, PH is the SAC peak height, and a is the only free parameter. This equation has the simplest form that gives both PH = 1 when SI = 0, and PH = ∞ when SI = 1. Three different versions of the model were tested: the same curve (i.e., the same value of the parameter a) was fit to all the data; a separate curve (a separate value of a) was fit for each stimulus type; or the SAM and transposed tone data were grouped together while the pure-tone responses were kept separate. The model that minimized the variance of the residuals while using the smallest number of parameters was the last one, in which the transposed and SAM data share the same value of a, but not the pure-tone data. This model performed significantly better than the single-curve model [F(1,1088) = 976.0, P < 0.001], while not performing significantly worse than the separate-curves model [F(1,1087) = 3.079, P = 0.080]. This best model (thick lines in ) accounted for >98% of the variance in the data, confirming the tight relationship between peak height and synchronization index. The best-fitting parameter a was 0.316 for pure tones and 0.460 for SAM and transposed tones. Because the relationship between peak height and synchronization index is the same for SAM tones and transposed tones, all conclusions based on the SI remain valid if we use the peak height instead for these two stimuli.
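The nested-model comparisons above follow the usual extra-sum-of-squares logic, sketched below. The function name and the residual sums of squares in the usage note are hypothetical; only the degrees of freedom are taken from the text. The reported F(1,1088) and F(1,1087) are consistent with one-, two-, and three-parameter models fit to about 1,090 data points, which is an inference from the degrees of freedom and not a number stated in the text.

```python
def nested_f_statistic(rss_simple, rss_complex, n_points, k_simple, k_complex):
    """Extra-sum-of-squares F statistic for two nested least-squares models.

    rss_simple, rss_complex: residual sums of squares of the two fits.
    k_simple, k_complex: number of free parameters in each model.
    Returns (F, numerator df, denominator df).
    """
    df_num = k_complex - k_simple
    df_den = n_points - k_complex
    f_value = ((rss_simple - rss_complex) / df_num) / (rss_complex / df_den)
    return f_value, df_num, df_den
```

For example, `nested_f_statistic(2.0, 1.0, 1090, 1, 2)` compares a one-parameter with a two-parameter model on 1,090 points and returns numerator and denominator degrees of freedom of 1 and 1,088; the P value would then be read from the F distribution with those degrees of freedom.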
To verify that similar trends hold whether the SAC peak height or the SI is used as the phase-locking metric, shows the level-maximum peak height (PHmax) as a function of frequency or fm, which can be compared with SImax in . We do not show slope data comparable to for the peak height because the level dependency of peak height could not be fit by a straight line. As observed with the synchronization index metric (), PHmax for transposed tones at modulation frequencies <250 Hz is similar to pure-tone PHmax at 250 Hz and is higher than PHmax for SAM tones. However, the two metrics differ in that PHmax for pure tones continues to vary considerably with frequency <1,000 Hz, whereas the SI is stable in that frequency range (see also Louage et al. 2004). The PHmax curve for pure tones also shows an unexpected dip at 1,000 Hz. In addition, the PHmax curves for SAM and transposed tones are less parallel to each other than the corresponding SImax curves. In general, the PHmax measure tends to amplify the differences between stimulus conditions relative to SI when phase locking is strong. These differences are likely to reflect the compressive nature of the SI, which cannot exceed unity, whereas peak height can in principle reach arbitrarily large values. Despite these differences, the general trends of the effect of frequency on maximum phase locking across stimulus types hold for both metrics.
Although the SAC peak height measures the extent to which spikes occur in a certain temporal relationship with respect to each other across stimulus presentations, the half-width measures the temporal precision with which spikes occur at these preferred intervals. For periodic stimuli, the two metrics are expected to be negatively correlated because precise phase locking implies both a large peak height and a narrow half-width. However, the two measures might not be perfectly correlated if the relation between the two depended on the stimulus wave shape. shows a scatterplot of peak height against normalized half-width for the same data as in . The figure shows a clear, monotonically decreasing relationship between the two metrics. The data were fit by a hyperbolic function
where HW is the normalized peak half-width and b and c are free parameters. As in , different versions of the model were tested by grouping the responses to the three stimuli in various ways. We chose the model in which the responses to all three stimuli are grouped together (i.e., the values of b and c are constrained to be the same for all three stimuli). The best-fitting parameter values for this model were b = 1.369 and c = 0.857. Although the model with three separate curves fit the data significantly better than the single-curve model [F(4,1085) = 490.5, P < 0.001], we prefer the simpler single-curve model because the fraction of the variance accounted for was only slightly greater for the three-curve model than for the single-curve model (93.62% vs. 93.30%) and the parameter estimates were essentially identical for all three curves in the more complex model. Thus the three phase-locking measures (synchronization index, peak height, and half-width) seem to be essentially equivalent for SAM and transposed tones, and the relationship between peak height and half-width is the same regardless of the stimulus. On the other hand, the relationship between peak height and synchronization index differs somewhat for pure tones compared with the other two stimuli. However, the differences in this relationship are fairly small and do not seem to substantially alter conclusions based on the traditional SI measure ( and ).