The auditory system encodes time with sub-millisecond accuracy. To shed new light on the basic mechanism underlying this precise temporal neuronal coding, we analyzed the neurophonic potential, a characteristic multiunit response, in the barn owl’s nucleus laminaris. We report here that the relative time measure of phase delay is robust against changes in sound level, with a precision sharper than 20 µs. Absolute measures of delay, such as group delay or signal-front delay, had much greater temporal jitter, partly because of their strong dependence on sound level. Our findings support the hypothesis that phase delay underlies the sub-millisecond precision of the representation of interaural time difference needed for sound localization.
The barn owl is well known for its superb sound-localization capabilities (Bala et al. 2003; Carr and Konishi 1990; Gerstner et al. 1996; Keller et al. 1998; Kempter et al. 2001; Koppl 1997; Moiseff and Konishi 1981; Pena et al. 1996; Sullivan and Konishi 1984, 1986; Viete et al. 1997). The owl uses interaural time difference (ITD) to encode sound azimuth with a behavioral accuracy of <10 µs (Bala et al. 2003) and a neuronal sensitivity of 25–100 µs (Bala et al. 2003; Moiseff and Konishi 1981). The highest overall monaural temporal sensitivity has been measured in the third-order nucleus laminaris (NL), where binaural convergence creates tuning to ITD (Carr and Konishi 1990; Reyes et al. 1996; Schwarz 1992; Sullivan and Konishi 1986). The NL exhibits a characteristic frequency-following multiunit response termed the neurophonic (Schwarz 1992; Snyder and Schreiner 1984; Sullivan and Konishi 1986), which we used to study timing. The neurophonic faithfully reflects both monaural and binaural temporal sensitivity (Sullivan and Konishi 1986).
Earlier measurements of the conduction time from the ear to NL yielded values between 2 and 3 ms (Carr and Konishi 1990). Sullivan and Konishi (1984) already noted the importance of phase but did not quantify temporal precision. Koppl (1997) quantified temporal precision in the second-order nucleus magnocellularis (NM), which provides input to NL, but found that the response delay depended on sound level, changing by ~5 µs/dB around the characteristic frequency. Because the interaural level differences experienced by barn owls span ~20 dB (Keller et al. 1998; Viete et al. 1997), such a level dependence could shift delays by up to ~100 µs. How, then, can ITD be represented in the owl’s NL with a neuronal precision much sharper than 100 µs?
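The tension behind this question is simple arithmetic. The sketch below makes it explicit, assuming (as a simplification) that delay varies linearly with level at Koppl’s ~5 µs/dB across the whole ~20-dB ILD range; the function name is hypothetical.

```python
def level_induced_shift_us(slope_us_per_db: float, level_range_db: float) -> float:
    """Delay shift (µs) caused by a level change, assuming a linear
    delay-versus-level relation (a simplifying assumption)."""
    return slope_us_per_db * level_range_db

# ~5 µs/dB (Koppl 1997) over the ~20-dB ILD range experienced by barn owls
shift = level_induced_shift_us(5.0, 20.0)
print(f"{shift:.0f} µs")  # 100 µs
```

The result reaches the upper end of the 25–100 µs neuronal ITD sensitivity quoted above, which is why a level-dependent delay code would be problematic for ITD representation.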
The auditory system encodes delay both as conduction time, an absolute time measure (Fitzgerald et al. 2001; Goldstein et al. 1971; Ruggero 1980), and, through phase locking, as a relative time cue (Anderson et al. 1971; Carr and Konishi 1990; Koppl 1997; Reyes et al. 1996; Sullivan and Konishi 1984, 1986). A similar distinction between absolute and relative time exists in physics, where group velocity and phase velocity give rise to group delay and phase delay, respectively (Anderson et al. 1971; Fitzgerald et al. 2001; Goldstein et al. 1971; Koppl 1997; Ruggero 1980). Group delay describes the latency of the envelope of a band-pass-filtered signal, whereas phase delay refers to the times at which its peaks and troughs occur. The high-frequency limit of group delay, the signal-front delay, has also been used as a measure of delay (Fitzgerald et al. 2001; Ruggero 1980). We analyzed data obtained from the NL of the barn owl to determine which measure of delay is suited for a precise, level-independent representation of ITD.
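In linear-systems terms, both delays derive from the phase response φ(ω) of a filter with transfer function H(ω) = |H(ω)|e^{iφ(ω)}. The standard signal-processing definitions, given here for reference rather than as formulas stated in this study, are:

```latex
\tau_{p}(\omega) = -\frac{\varphi(\omega)}{\omega}, \qquad
\tau_{g}(\omega) = -\frac{d\varphi(\omega)}{d\omega}
```

For a pure delay, φ(ω) = −ωτ, both measures equal τ. Adding a constant phase offset φ₀, so that φ(ω) = −ωτ + φ₀, leaves τ_g = τ unchanged but gives τ_p = τ − φ₀/ω, which illustrates how the envelope latency and the timing of the individual peaks can differ.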
Nine barn owls (Tyto alba pratincola) were used in this study. The procedures conformed to National Institutes of Health guidelines for animal research and were approved by the animal care and use committee of the University of Maryland. In contrast to earlier studies, the analogue waveform of the neurophonic potential in or close to NL was recorded at a sampling period of 20.8 µs with commercial, Epoxylite-coated tungsten electrodes (Frederick Haer, Brunswick, ME) with impedances of 2–8 MΩ. Neurophonic recordings had the advantage of being stable for ≥1 h and allowed measurements of local multiunit activity. Specific recording sites were defined by combining stereotaxic techniques, physiological characterization, and histologically verified lesions.
Acoustic stimuli (clicks and noises) were digitally generated by custom-written software (“Xdphys,” written in Dr. M. Konishi’s lab at the California Institute of Technology, Pasadena, CA) driving a signal-processing system (Tucker-Davis Technologies, Gainesville, FL). Clicks were rectangular pulses of variable intensity [0-dB attenuation (corresponding to 65 dB SPL) to 40-dB attenuation] lasting two samples (41.6 µs). Only condensation clicks were used. The standard click had 0-dB attenuation.
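A minimal Python sketch of such a stimulus follows. It assumes the 20.8-µs sampling period given above, a full-scale amplitude of 1.0 (arbitrary units) for the 0-dB click, and positive polarity as a stand-in for condensation; `make_click` and its parameters are hypothetical and not part of Xdphys.

```python
SAMPLE_PERIOD_US = 20.8  # sampling period stated in the Methods

def make_click(attenuation_db: float, n_pre: int = 10, n_post: int = 10,
               full_scale: float = 1.0) -> list[float]:
    """Two-sample rectangular click attenuated by attenuation_db.
    Positive polarity stands in for a condensation click (assumed convention)."""
    amplitude = full_scale * 10 ** (-attenuation_db / 20)  # dB -> linear amplitude
    return [0.0] * n_pre + [amplitude] * 2 + [0.0] * n_post

click = make_click(20.0)
duration_us = sum(1 for x in click if x != 0.0) * SAMPLE_PERIOD_US
print(round(duration_us, 1), round(max(click), 3))  # 41.6 0.1
```

With 0-dB attenuation calibrated to 65 dB SPL, a 20-dB attenuation would correspond to ~45 dB SPL under the same calibration.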
Neurophonic responses to clicks were recorded in the 3.5- to 7-kHz region of the tonotopically organized NL. Both the spontaneous activity (10 ms before click presentation) and the driven activity (10 ms after click presentation) were stored. Clicks were repeated 128 times (Fig. 1A). The driven activity contained an oscillatory response (Fig. 1, A and B) whose envelope rose smoothly within ~1 ms and fell off almost symmetrically. The oscillation under the envelope typically had a complex waveform containing several spectral components. Fourier analysis showed that one or two components lay below 2 kHz (Fig. 1C), whereas another was close to the best frequency obtained from iso-level frequency response curves. Because we wanted to study processes related to frequency tuning, we analyzed only the high-frequency component and therefore high-pass filtered the neurophonic potential to isolate its oscillation (Fig. 1D). Auditory filtering is well described by gammatone functions and their derivatives (Irino and Patterson 2001; Tan and Carney 2003), so the high-pass filtered click-evoked response was fitted with a gammatone function of order 3 (Fig. 1D).
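A common parameterization of an order-n gammatone with onset latency t0 is g(t) = a(t − t0)^(n−1) e^(−2πb(t − t0)) cos(2πf(t − t0) + φ) for t ≥ t0. The sketch below assumes this form (the exact parameterization used in the study is not stated) and illustrative rather than fitted values (f = 5 kHz, b = 500 Hz, t0 = 2 ms). Setting the derivative of the envelope to zero places its maximum at t0 + (n − 1)/(2πb), the point assigned to the group delay. A fit to recorded data could use a nonlinear least-squares routine such as `scipy.optimize.curve_fit`.

```python
import math

SAMPLE_PERIOD_S = 20.8e-6  # sampling period stated in the Methods

def envelope(t, amp, t0, order, bandwidth_hz):
    """Gamma envelope a*(t - t0)^(n-1)*exp(-2*pi*b*(t - t0)); zero before onset t0."""
    if t < t0:
        return 0.0
    tau = t - t0
    return amp * tau ** (order - 1) * math.exp(-2 * math.pi * bandwidth_hz * tau)

def gammatone(t, amp, t0, order, bandwidth_hz, freq_hz, phase):
    """Gammatone: gamma envelope multiplied by a cosine carrier at freq_hz."""
    carrier = math.cos(2 * math.pi * freq_hz * (t - t0) + phase)
    return envelope(t, amp, t0, order, bandwidth_hz) * carrier

def envelope_peak(t0, order, bandwidth_hz):
    """Analytic envelope maximum: t0 + (n - 1) / (2*pi*b)."""
    return t0 + (order - 1) / (2 * math.pi * bandwidth_hz)

# Illustrative (not fitted) parameters for an order-3 gammatone
params = dict(amp=1.0, t0=2e-3, order=3, bandwidth_hz=500.0)
times = [k * SAMPLE_PERIOD_S for k in range(481)]  # ~10 ms after the click
waveform = [gammatone(t, freq_hz=5000.0, phase=0.0, **params) for t in times]
numeric_peak = max(times, key=lambda t: envelope(t, **params))
analytic_peak = envelope_peak(params["t0"], params["order"], params["bandwidth_hz"])
print(f"analytic {analytic_peak * 1e3:.3f} ms, sampled {numeric_peak * 1e3:.3f} ms")
```

On a 20.8-µs grid, the sampled envelope maximum can differ from the analytic one by at most about one sample.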
Several peaks and troughs could typically be distinguished in the click-evoked response. We considered only local extrema that occurred after stimulus presentation and whose amplitude exceeded the mean plus 2 SDs of the background noise for at least three consecutive data points. The latency of the first extremum detected in this way was defined as the signal-front delay (Fig. 1D). The group delay was assigned to the maximum of the envelope of the fitted gammatone function (Fig. 1D). For the phase delay, we chose the extremum closest to the group delay, determined in the first of the 128 trials whose group delay lay within 1 SD of the average group delay (Fig. 1D). To estimate the variability, or “jitter,” of each type of delay, we measured the corresponding extremum in each of the 128 traces (Fig. 1A). The SD of these delays across the 128 repetitions served as the measure of variability and was entered as one data point in a histogram (Fig. 1E). In our sample of 176 data sets, the signal-front delay had the largest jitter (median: 500 µs) and the phase delay the smallest (median: 10.4 µs), with the group delay (median: 64 µs) in between. Thus the temporal precision necessary for ITD coding (Bala et al. 2003; Moiseff and Konishi 1981) could be achieved with the phase delay and the group delay but not with the signal-front delay, which we therefore did not consider further.
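The detection criterion and the jitter estimate can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors’ analysis code: each trace is assumed to start at stimulus onset, troughs are treated symmetrically to peaks (deviation from the noise mean exceeding 2 SDs), and the phase-delay extremum is chosen as the one nearest a given reference time rather than tracked from a single reference trial. All function names are hypothetical.

```python
import statistics

def find_extrema(trace, noise, dt_us, min_run=3):
    """Times (µs) of local extrema whose deviation from the noise mean exceeds
    2 SDs of the noise for at least `min_run` consecutive samples."""
    mu = statistics.mean(noise)
    thr = 2 * statistics.stdev(noise)
    times, i, n = [], 0, len(trace)
    while i < n:
        dev = trace[i] - mu
        if abs(dev) <= thr:
            i += 1
            continue
        sign = 1 if dev > 0 else -1  # peak (+1) or trough (-1)
        j = i
        while j < n and sign * (trace[j] - mu) > thr:
            j += 1  # extend the supra-threshold run of the same polarity
        if j - i >= min_run:
            k = max(range(i, j), key=lambda m: sign * (trace[m] - mu))
            times.append(k * dt_us)
        i = j
    return times

def signal_front_delay(extrema_us):
    """First detected extremum."""
    return extrema_us[0]

def phase_delay(extrema_us, reference_us):
    """Extremum nearest the reference time (e.g., the group delay)."""
    return min(extrema_us, key=lambda t: abs(t - reference_us))

def jitter(delays_us):
    """Jitter = SD of one delay type across repeated trials."""
    return statistics.stdev(delays_us)

# Toy example: one positive and one negative deflection, plus a brief blip
noise = [0.1, -0.1] * 50
trace = [0, 0, 0, 1, 2, 1, 0, -1, -3, -1, 0, 0.5, 0]
ext = find_extrema(trace, noise, dt_us=20.8)
print(ext)  # [83.2, 166.4]; the one-sample blip at index 11 is rejected
print(jitter([100.0, 110.0, 90.0]))  # 10.0
```

In the study, each delay type was collected over 128 trials before taking the SD; the three toy values above are placeholders.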
In a second experiment, we attenuated the clicks by up to 40 dB. As is typically observed in audition, the group delay increased as the stimulus level decreased (Fig. 2). In the example shown in Fig. 2A, however, peak number 4 at 0-dB attenuation coincided with peak number 3 at 20-dB attenuation and with the barely visible peak number 1 at 40-dB attenuation (vertical line in Fig. 2A). Note that the definition of phase delay allows it to jump between successive peaks (Fitzgerald et al. 2001). In 72 data sets obtained from 43 recording sites, phase delay remained essentially constant, with a narrow distribution around one sample point (mean: −3 µs, Fig. 2B). In contrast, group delay increased by up to 0.6 ms when click level was reduced by 20 dB (Fig. 2B), in agreement with Carr and Konishi (1990).
The low variability and level invariance of phase delay implied that this measure would be the most reliable code for representing the behaviorally relevant ITD. The “best ITD” at a given recording site can be computed from the two phase delays obtained through monaural stimulation (Sullivan and Konishi 1986): we subtracted the phase delay for stimulation of the left ear from that for stimulation of the right ear. This subtraction yielded a “best ITD” that was independent of interaural level difference (ILD) (compare Fig. 3A, 0-dB ILD, with Fig. 3B, 10-dB ILD). By contrast, ITDs computed as the difference between two group delays depended on ILD: even though such an ITD may be zero for 0-dB ILD (Fig. 3A), it changed significantly for 10-dB ILD (Fig. 3B).
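The contrast between the two subtractions can be illustrated with a toy calculation. All delay values are hypothetical, phase delay is idealized as perfectly level-invariant, and group delay is assumed to grow linearly by 30 µs per dB of attenuation (an assumed slope, consistent with but not measured as the increase of up to 0.6 ms per 20 dB reported above). The right ear is attenuated to create the ILD; the function names are illustrative.

```python
def best_itd_us(delay_right_us, delay_left_us):
    """ITD as right-ear minus left-ear monaural delay (sign convention of the text)."""
    return delay_right_us - delay_left_us

def group_delay_us(gd_0db_us, attenuation_db, slope_us_per_db=30.0):
    """Hypothetical linear level dependence of group delay (30 µs/dB assumed)."""
    return gd_0db_us + slope_us_per_db * attenuation_db

def phase_delay_us(pd_us, attenuation_db):
    """Idealized phase delay: attenuation has no effect (by assumption)."""
    return pd_us

for ild_db in (0.0, 10.0):
    # The right ear is attenuated by ild_db; the left ear stays at 0 dB.
    itd_phase = best_itd_us(phase_delay_us(2620.0, ild_db), phase_delay_us(2600.0, 0.0))
    itd_group = best_itd_us(group_delay_us(2640.0, ild_db), group_delay_us(2640.0, 0.0))
    print(ild_db, itd_phase, itd_group)
# 0.0 20.0 0.0
# 10.0 20.0 300.0
```

Under these assumptions the phase-based ITD stays at 20 µs for both ILDs, whereas the group-based ITD jumps from 0 to 300 µs, mirroring the qualitative pattern of Fig. 3, A and B.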
Because the delay differences obtained from monaural responses in Fig. 3, A and B, represent the binaural situation only indirectly, we also tested the level dependence of ITD tuning with binaural stimuli in a few cases (n = 6). Unlike the neurophonic responses to monaurally presented clicks analyzed so far, Fig. 3C shows the response at a given recording site as a function of the ITD of binaural noise. The ITD tuning curves for ILDs of 0 and 10 dB were virtually identical, in agreement with previous findings that ITD tuning under binaural stimulation is stable across stimulus levels (Pena et al. 1996; Viete et al. 1997). Thus the level tolerance of ITD tuning shown in Fig. 3C independently demonstrates the importance of phase delay.
Thus phase delay, and not group delay or signal-front delay, appears to underlie ITD tuning. Single-unit recordings from auditory nerve and cochlear nucleus support this conclusion (Koppl 1997; Sullivan and Konishi 1984). Likewise, auditory nerve fibers of squirrel monkeys show very low phase-delay jitter when stimulated with sinusoids near the characteristic frequency (Anderson et al. 1971).
The level independence of phase delay is consistent with a variety of filters proposed for peripheral auditory processing (Irino and Patterson 2001; Tan and Carney 2003). Its remarkable stability is also consistent with the model of Gerstner et al. (1996) and Kempter et al. (2001), who predicted that during development, synapses and axonal arbors from NM to NL are selected such that their phase delays are similar. These authors also argued that only such a selection allows phase delay to be used to code temporal information in NL and to represent ITD. This conclusion is in line with the existence of a neurophonic potential in adult animals if one assumes that the neurophonic is typically the summed response of an ensemble of magnocellular axons. The responses of different axons can sum coherently only if volleys of phase-locked spikes arrive coincidentally at the borders of NL and are transmitted coherently through the nucleus. In other words, theory predicts that phase delays in different magnocellular axons must be similar.
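Why similar phase delays matter for the neurophonic can be illustrated with phasors. Assuming, as a simplification, that each magnocellular axon contributes an equal-amplitude sinusoid at the same frequency, the amplitude of the summed signal relative to perfect alignment equals the magnitude of the mean unit phasor. The function name and the delay values below are hypothetical.

```python
import cmath
import math

def coherence(delays_s, freq_hz):
    """|mean of exp(-i*2*pi*f*d)|: amplitude of the summed sinusoids relative to
    perfect alignment (1 = fully coherent, 0 = complete cancellation)."""
    phasors = [cmath.exp(-2j * math.pi * freq_hz * d) for d in delays_s]
    return abs(sum(phasors)) / len(phasors)

f = 5000.0  # 5 kHz, within the 3.5- to 7-kHz region recorded here
tight = [0.0, 10e-6, 20e-6]            # delays spread over ~20 µs
spread = [0.0, 50e-6, 100e-6, 150e-6]  # delays spread evenly over one 200-µs period
print(round(coherence(tight, f), 3))   # 0.967
print(round(coherence(spread, f), 3))  # 0.0
```

Delays clustered within a few tens of microseconds preserve most of the summed amplitude, whereas delays spread across a full period cancel, which is the intuition behind requiring similar phase delays across axons.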
Timing is important in many neuronal systems. It plays a role in models of learning (spike-timing-dependent plasticity), in feature binding, and in precise reactions to dynamic stimuli such as approaching targets. A temporal quality factor is helpful for comparing the precision of different systems, and the coefficient of variation (CV), defined as the ratio of SD to mean, may be a good criterion. In our example, the CV for the phase delay is ~0.01. In systems such as the visual cortex, temporal jitter is much larger (CV ~0.1; Bair et al. 2002; Bisley et al. 2004), whereas CVs similar to that observed in the owl’s NL are found in the electrosensory system (Carr et al. 1986) and the auditory system of bats (Covey and Casseday 1991) and may be derived from synfire chains (Abeles et al. 1993).
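The CV is straightforward to compute from repeated delay measurements. A minimal sketch follows, using hypothetical values (a 1-ms mean delay with a 10-µs SD, chosen only to reproduce a CV of ~0.01).

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample SD / mean (dimensionless)."""
    return statistics.stdev(values) / statistics.mean(values)

delays_us = [990.0, 1000.0, 1010.0]  # hypothetical repeated phase delays (µs)
print(round(coefficient_of_variation(delays_us), 3))  # 0.01
```

Because the CV is normalized by the mean, it allows comparison across systems with very different absolute latencies.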
M. Knepper and E. Smith helped with measuring and analyzing click responses. R. Schätte made helpful comments on the manuscript.
This research was sponsored by the German Research Foundation (DFG, Wa-606/12, Ke-788/1–3) and by National Institutes of Health Grants DC-000636 to C. E. Carr and by P30 04664.