The dynamics of sound localization were studied using a free-field direct localization task (pointing to sound sources) and an observer-weighting analysis that assessed the relative influence of each click in a click-train stimulus. In agreement with previous studies of the precedence effect and binaural adaptation, weighting functions showed increased influence of the onset click when the interclick interval (ICI) was short (<5 ms). For longer ICIs, all clicks in a train contributed roughly the same amount to listeners’ localization responses. Finally, when a short gap was introduced in the middle of a train, the influence of the click immediately following the gap increased, in agreement with the “restarting” results obtained by Hafter and Buell (1990).
Sound localization in the natural world is based on a variety of cues including interaural-time (ITD) and -level (ILD) differences, as well as spectral cues produced by the direction-dependent filtering of sound by the head, shoulders, and pinnae. The interaural differences provide cues to azimuth (direction in the horizontal plane), whereas spectral cues, characterized by the directional transfer function (DTF), provide monaural information that is especially useful in vertical localization. Each of these cues is susceptible to distortion by environmental factors, such as the presence of echoes, reverberation, and competing sources. However, even in some highly reverberant spaces, listeners are relatively unaffected by echoes in their ability to localize sound sources. In some cases, this ability may be partly attributable to the availability of redundant spatial cues, but an additional factor is the perceptual dominance of spatial cues contained in a stimulus onset (which are unaffected by the presence of echoes) over those contained in later portions (Zurek, 1980). This dominance is exhibited in a relatively large and well-studied class of phenomena shown using a variety of different approaches, and known variously as the “precedence effect” (Wallach et al., 1949), “Haas effect” (after Haas, 1972), “law of the first wavefront” (Blauert, 1983), “binaural adaptation” (Hafter, 1997), “echo suppression” (Clifton, 1987), and occasionally “onset dominance” (Freyman et al., 1997). A recent review can be found in Litovsky et al. (1999). Throughout the remainder of this paper, we use the descriptive term, “onset dominance,” to refer to the general phenomenon of increased influence of onsets in spatial hearing. The goal of the current study is to investigate onset dominance by developing a temporal weighting function for the localization of click-train stimuli presented in the free field.
Three types of stimuli have been used to estimate the temporal extent of onset dominance in past studies. In the first, paired stimuli (for example, paired clicks or noise bursts) are presented with a delay between the first (lead) and second (lag) stimulus. Lead and lag are presented with different spatial or intracranial positions, and listeners are asked to make spatial judgments regarding the lagging stimulus or the combined (fused) image of both lead and lag. Commonly employed in studies of the precedence effect (Litovsky et al., 1999), this method reveals a temporary reduction in spatial sensitivity from approximately 1 to 10 ms following the leading stimulus (Zurek, 1980). A second approach, employed by Hafter and colleagues (Hafter and Buell, 1990; Hafter et al., 1988b; Hafter and Dye, 1983) compares spatial discrimination performance for stimuli of different durations, where the different portions of each stimulus (clicks in a train) present redundant spatial information to the listener. While lateralization performance generally improves with stimulus duration—as expected if listeners respond based on pooled information from all clicks—improvement for high-rate stimuli [interclick interval (ICI) shorter than approximately 12 ms] is suboptimal, as if later clicks are less effective than earlier clicks. In modeling this effect, termed “binaural adaptation,” Hafter and Dye (1983) showed that the number of informative events (i.e., the effective number of clicks) available to a listener, N, is a compressive power function of the number of acoustic clicks, n
$$N = n^{k}. \tag{1}$$
If one assumes that the relative effectiveness (or “weight”) of individual clicks declines monotonically following the stimulus onset, then the weight (wj) on each click j can be estimated by calculating the finite difference of Eq. (1)
$$w_j = j^{k} - (j-1)^{k} \tag{2}$$
(Hafter and Buell, 1990; Hafter et al., 1983). The exponent k of Eq. (1) is ICI dependent, with k ≈ 1 (optimal use of all clicks) at ICIs longer than approximately 12 ms; k (and consequently, N) grows smaller and smaller as ICI approaches 2 ms. Below this value, k ≈ 0 and performance for trains of up to 32 clicks is hardly better than for single clicks (Hafter et al., 1988b). Thus, weighting functions estimated by Eq. (2) show a monotonic decline in the effectiveness of clicks over the course of a stimulus, with the slope of decline related to ICI.
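The relationship between the compressive power function and the per-click weights can be illustrated numerically. The following is a minimal sketch, assuming Eq. (1) has the form N = n^k and Eq. (2) is its finite difference, w_j = j^k − (j−1)^k; the function names and the exponent value 0.5 are hypothetical and chosen only for illustration.

```python
def effective_clicks(n, k):
    """Effective number of clicks, N = n**k (assumed form of Eq. 1)."""
    return n ** k


def click_weights(n, k):
    """Finite difference of n**k: the weight on click j is j**k - (j-1)**k."""
    weights = []
    for j in range(1, n + 1):
        # Treat the "zeroth" term as 0 so that click 1 always has weight 1,
        # even when k == 0 (Python evaluates 0**0 as 1).
        previous = (j - 1) ** k if j > 1 else 0.0
        weights.append(j ** k - previous)
    return weights


# Hypothetical exponents: k = 0.5 (adapted) versus k = 1.0 (optimal).
weights_adapted = click_weights(16, 0.5)
weights_optimal = click_weights(16, 1.0)
```

With k = 1 every click receives the same weight, whereas any 0 < k < 1 gives a weighting function that declines monotonically from click 1, and the weights always sum to the effective number of clicks.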
Both of the above methods estimate temporal weighting functions by comparing the performance levels achieved with stimuli of different overall durations. In the precedence method, effects of stimulus rate and duration are confounded. The subtractive method described by Hafter et al. (1988b) has the advantage of estimating the influence, or “perceptual weight,” of each click in an extended stimulus of a given rate. However, there are indications that the assumptions underlying Eq. (2) may not hold for all stimulus arrangements (see, e.g., Saberi, 1996), possibly because localization judgments reflect the retroactive evaluation of spatial information carried by all parts of the stimulus.
A more direct approach is to estimate weights for each portion of an extended stimulus independently; this can be accomplished using observer-weighting analyses (Ahumada and Lovell, 1971; Berg, 1989; Saberi, 1996; Shinn-Cunningham et al., 1993; Stellmack et al., 1999). These techniques were developed to help ascertain the relative influence of multiple stimulus components on a subject’s perception or psychophysical performance. In short, observer-weighting analyses (see, e.g., Berg, 1989) relate random variation of a number of independent stimulus components to variation in subject responses (e.g., detection, scaling, etc.). Stimulus components that, when varied, induce systematic changes in the response are assigned high weights by the analysis, while those which do not affect the response receive low weights.
In the context of onset dominance in sound localization, observer weighting was used by Shinn-Cunningham et al. (1993, 1995) to estimate the relative influence of leading and lagging noise bursts, presented over headphones, on the perceived lateral position of their combined image. The analysis revealed high weights for the lead and correspondingly low weights on the lag—indicating onset dominance—over lead–lag delays of 1 to 10 ms, as expected based on earlier studies of the precedence effect. Similarly, Stellmack et al. (1999) used observer-weighting analysis in a task where listeners were asked to discriminate the lateral position of lead or lag clicks (i.e., not the combined image), again presented over headphones. They found high lead weights at short delays (1–4 ms), regardless of whether subjects were asked to judge the position of the lead or lag click.
Two previous studies using observer-weighting techniques to estimate temporal weighting functions for the lateralization of extended stimuli were completed by Saberi (1996) and Dizon et al. (1998). Saberi (1996) estimated weighting functions for trains of 2–16 clicks, similar to stimuli employed by Hafter and Dye (1983). Saberi’s study employed a lateralization task in which listeners were asked to identify each stimulus as having clicks drawn from one of two normal distributions of ITD. The two distributions were centered at left-leading and right-leading ITD values corresponding to each listener’s discrimination threshold (defined as 75% correct performance), with standard deviations of 100 µs. Each click in a train possessed an ITD drawn at random from that trial’s distribution, and relative weights were computed for individual clicks. For ICIs of 1.8–12 ms, the first click received consistently higher weight than did later clicks; this effect was somewhat diminished at long ICIs and for long trains. Except at an ICI of 12 ms, weights did not decrease monotonically over the duration of the stimulus, as would be expected based on the model of Hafter and Buell (1990). Rather, weights were reduced to a constant level immediately following the onset (click 1).
Dizon et al. (1998) estimated similar weighting functions for broadband noises with varying ITD. Each stimulus was divided into 4–6 temporal “slices,” each with an ITD chosen at random from a discrete distribution of five values spanning the range from −400 µs (left-leading) to +400 µs (right-leading). As in the studies of Shinn-Cunningham et al. (1993, 1995), subjects estimated the perceived lateral position of the fused image, and linear regression was used to calculate weights for the different slices. Slice duration was varied from 2 to 10 ms as an experimental parameter. The results revealed high weight for the first slice, regardless of slice length, and approximately equal (and low) weights for the remaining slices. This pattern of weights matches closely the functions obtained by Saberi (1996), despite differences in procedure (clicks vs continuous noise, identification vs adjustment).
As described earlier, real-world sound localization is based on a combination of acoustic cues, including ITD, ILD, and the DTF. The observer-weighting studies described above employed only headphone listening, manipulating only ITD as a spatial cue. Manipulation of a single parameter in this manner helps to simplify experimental designs and the identification of potential mechanisms. However, we are ultimately interested in extrapolating these findings to real-world listening, where all three cue types are present. In this respect, pure-ITD stimuli are not satisfactory, because they are not mere simplifications of natural stimuli; rather, they present ITD- and ILD cues which are in conflict with each other and with spectral cues derived from the DTF.
Previous work has shown that binaural adaptation affects the processing of ILD cues similarly to that of ITD cues (Hafter et al., 1983), even in situations where the two cue types are presented together (Hafter et al., 1990). Similar results were found by Hafter et al. (1988a) to hold for free-field stimuli. However, there are some indications of important differences between pure-ITD and free-field listening with regard to precedence-like effects. Blauert et al. (1989), for instance, found no evidence of an active “restarting” phenomenon—as observed by Hafter and Buell (1990) under headphone-listening conditions—in the free-field precedence effect. In addition, the findings of Rakerd and Hartmann (1985) suggest that the presence of echoes alters the way in which listeners make combined use of ITD and ILD cues for localization.
Here, we present sounds in the free field via loudspeakers. In that manner, the ITDs, ILDs, and spectral cues are related in a natural fashion and are consistent with the individual subjects’ everyday listening experiences. To obtain localization judgments, we employed a direct localization (pointing) task. This task was similar to the adjustment procedures used by Shinn-Cunningham et al. (1993, 1995) and Dizon et al. (1998), but utilized a visual pointer in the free field rather than an acoustic one presented over headphones.
Subjects included the first author (CS) and four paid subjects (HW, LL, LS, and TL) naive to the purpose of the experiments. All subjects had normal audiograms from 125 to 8000 Hz. Not all subjects participated in all experiments or conditions. Later sections indicate the particular subjects involved in each experiment.
Following previous work (Hafter and Dye, 1983; Saberi, 1996), stimuli throughout this study consisted of trains of high-frequency narrow-band clicks (Gaussian-windowed tone bursts), sampled at 50 kHz. Carrier frequency was fixed at 4 kHz and the Gaussian envelope, centered on a peak of the carrier waveform, had a total duration of 2 ms (measured at the points where the Gaussian window function falls below the limits of 16-bit truncation). Duration measured at ±1σ was 0.6 ms. The measured bandwidth (at −3 dB) of the click was approximately 900 Hz. Trains of 2 or 16 clicks were synthesized with interclick intervals (ICI), defined as time between click peaks, of 1, 3, 5, 8, or 14 ms. At 1-ms ICI, the 2-ms Gaussian envelopes overlap for half their total duration, but cross over at a point 59.9 dB below the peak of either envelope, resulting in negligible overlap in energy. Different stimuli were presented at equal absolute levels, so that SPL at the listener’s position varied systematically as a function of ICI, from 32 dB at 14 ms ICI to 39 dB at 1 ms ICI. All stimuli were clearly audible for all subjects.
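A click train of this general kind can be sketched in a few lines. The sketch below is not the authors' synthesis code: it assumes a Gaussian envelope of the form exp(−t²/2σ²) with σ = 0.3 ms (so that the ±1σ duration is 0.6 ms), truncated to a 2-ms window, and it omits level calibration and loudspeaker equalization. All names are hypothetical.

```python
import numpy as np

FS = 50_000          # sampling rate in Hz (from the text)
CARRIER_HZ = 4_000   # carrier frequency in Hz (from the text)
CLICK_DUR_S = 2e-3   # total window duration in seconds (from the text)
SIGMA_S = 0.3e-3     # assumption: +/-1 sigma duration of 0.6 ms -> sigma = 0.3 ms


def gaussian_click():
    """One Gaussian-windowed tone burst, centered on a carrier peak."""
    half = int(round(CLICK_DUR_S * FS / 2))
    t = np.arange(-half, half + 1) / FS          # time axis centered on 0
    envelope = np.exp(-0.5 * (t / SIGMA_S) ** 2)
    # A cosine carrier places a carrier peak at the envelope maximum.
    return envelope * np.cos(2 * np.pi * CARRIER_HZ * t)


def click_train(n_clicks, ici_s):
    """Sum n_clicks copies of the click, spaced ici_s apart (peak to peak)."""
    click = gaussian_click()
    step = int(round(ici_s * FS))
    out = np.zeros(step * (n_clicks - 1) + click.size)
    for j in range(n_clicks):
        out[j * step : j * step + click.size] += click
    return out


train = click_train(16, 3e-3)   # 16 clicks at 3-ms ICI
```

In the experiments each click in a train was routed to its own loudspeaker, so a fuller simulation would keep the clicks in separate channels rather than summing them into one waveform.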
As depicted in Fig. 1, listeners were seated in an anechoic chamber (Eckels Corp., 8.3×5.4×4.0 m), facing an array of 12 loudspeakers (Audax model MHD12P25 FSM-SQ) placed at ear level along the left, right, and front walls. Loudspeakers were spaced 5.5° apart in listener-centered azimuth, with the center-most loudspeakers placed 2.75° to the left and right of the listener’s midline. Stimuli were delayed and attenuated such that all loudspeakers produced sounds at the listener’s position that were equalized in level and aligned in time. The delays and attenuations simulated a circular array of loudspeakers located 6.1 m from the listener’s location. During the experiment, the loudspeakers were obscured visually by an acoustically transparent white curtain hung 1.07 m in front of the listener. A laser pointer was mounted near the listener’s right hand—approximately 56 cm to the right, 30 cm below, and 15 cm in front of the listener’s intracranial midpoint—on a pair of high-precision potentiometers allowing free rotation in both azimuth and elevation. It projected a bright spot of red light upon the curtain, which listeners used to make localization responses (see below). The laser position was recorded digitally by sampling the potentiometer settings with a pair of 8-bit analog-to-digital converters. Note that, since the curtain was hung in a straight line across the room, the accuracy of laser pointer readings was not equal across the entire field; away from the midline, a given angular rotation of the pointer produced a larger displacement of the point in head-centered coordinates. However, because all responses were transformed to head-centered coordinates for analysis, this distortion did not act to systematically bias responses; rather, the reduced pointer accuracy results in somewhat increased variance in responses away from the midline. 
This increased variance is unlikely to be a concern, especially since auditory spatial acuity is also reduced in those regions (Mills, 1958). The room was lit by two 25-watt soft-white bulbs mounted near the ceiling to either side of the listener. The space beyond the curtain, including the loudspeakers, was darkened.
The spectral characteristics of individual loudspeakers were equalized through digital inverse filtering. Each day, impulse responses from each loudspeaker were recorded digitally and used to construct time-domain inverse filters that produced effectively “flat” spectral responses in phase and level (±1 dB in the range 2–6 kHz).
A stimulus location, θL, was chosen at random on each trial. This location defined the center of a group of three or five loudspeakers that presented the individual click stimuli. The stimulus itself was a train of 2 or 16 clicks, depending on the experiment. Each click in a train was presented from a single loudspeaker, selected at random from within the group. This random variation of location was necessary for the computation of observer weights for each click (see Sec. III).
Three conditions defining the placement of loudspeakers in a group were employed. The conditions, summarized in Table I, are denoted N3, W3, and W5. In each case, the letter indicates the range of azimuths around θL, either ±5° (“narrow,” for N3) or ±11° (“wide,” for W3 and W5). The number (3 or 5) indicates the number of loudspeakers in the group. For example, in condition W3, clicks were presented from one of three loudspeakers on a given trial; these were located at θL, θL−11°, or θL+11° azimuth. Because we used a fixed array of loudspeakers, some values of θL at the far ends of the array were not achievable in each condition; ranges of achievable stimulus azimuths are also shown in Table I.
Figure 2 shows a timeline of two hypothetical trials that might appear in condition W3. In the first trial, location 4 (θL= −13.75°) has been selected as the stimulus location for presentation of a 16-click train. Individual clicks are presented from speakers 2, 4, and 6 (−24.75°, −13.75°, and −2.75°). The stimulus on trial 2 is presented from location 7 (θL= +2.75°), and individual clicks are delivered to speakers 5, 7, and 9 (−8.25°, +2.75°, and +13.75°).
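The loudspeaker geometry and the per-click randomization can be sketched as follows. This is an illustrative reconstruction, assuming loudspeakers are numbered 1 to 12 in order of increasing azimuth and that the three conditions correspond to speaker-index offsets of (−1, 0, +1), (−2, 0, +2), and (−2, …, +2); the W5 layout in particular is an assumption, since Table I is the authoritative description. All names are hypothetical.

```python
import random

N_SPEAKERS = 12
SPACING_DEG = 5.5

# Assumed speaker-index offsets around the center location for each condition.
CONDITIONS = {
    "N3": (-1, 0, 1),
    "W3": (-2, 0, 2),
    "W5": (-2, -1, 0, 1, 2),
}


def speaker_azimuth(index):
    """Azimuth in degrees of loudspeaker `index` (1-12), symmetric about 0."""
    return (index - (N_SPEAKERS + 1) / 2) * SPACING_DEG


def group(center, condition):
    """Loudspeaker indices available for clicks around a center location."""
    return [center + offset for offset in CONDITIONS[condition]]


def valid_centers(condition):
    """Center locations whose whole group fits on the fixed array."""
    offsets = CONDITIONS[condition]
    return [c for c in range(1, N_SPEAKERS + 1)
            if c + min(offsets) >= 1 and c + max(offsets) <= N_SPEAKERS]


def draw_trial(condition, n_clicks, rng):
    """Pick a random center, then an independent random speaker per click."""
    center = rng.choice(valid_centers(condition))
    speakers = group(center, condition)
    return center, [rng.choice(speakers) for _ in range(n_clicks)]


center, clicks = draw_trial("W3", 16, random.Random(1))
```

Under these assumptions, `group(4, "W3")` returns speakers 2, 4, and 6, whose azimuths are −24.75°, −13.75°, and −2.75°, matching the first trial described for Fig. 2.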
The listener’s task on each trial was to point to the location of the stimulus with the laser pointer. The subject guided the pointer using the right hand and held a small response box in the left hand. At the beginning of the trial, subjects were instructed to face forward and maneuver the laser pointer so that its spot was located in the “home” position, above 28° elevation and between −16° (to the right) and +17° (to the left) azimuth. In the home position, the spot was at or above the top of the curtain, and just beyond the subject’s gaze. A stimulus was presented from one of eight or ten potential locations, following which subjects were instructed to direct their gaze, without moving their heads, to foveate the perceived location. They were not told to expect stimuli with multiple apparent locations but, in the event that more than one acoustic image was perceived, they were to respond to the leftmost image. This instruction ensured that when subjects perceived multiple images (as expected, e.g., at long ICIs), the weights were not biased by listeners’ strategies to favor early or late clicks. Rather, since all clicks appeared in the leftmost position with equal probability, the appearance of multiple images should have produced equal weights on all clicks. Next, they were to maneuver the laser pointer to project its spot directly at the point of their gaze and to record the location by pressing a button on the response box. The positions of both potentiometers were recorded and transformed to coordinates of the listener’s gaze for analysis. Following the response, subjects returned the laser spot to the home position and, following a 1-s delay, the next trial began. Each experimental run consisted of 100 uninterrupted trials. Subjects were allowed to take breaks between runs, and completed between 4 and 16 runs in a given condition.
The listener’s task was to indicate a single location belonging to a group of stimuli with potentially disparate locations. Regardless of whether listeners actually perceived a single location for this type of stimulus, they had to respond in a way that combined the locations of the individual clicks. The analysis used here assumes the following linear combination:
$$R = \sum_{i=1}^{n} w_i \theta_i + C, \tag{3}$$
where R is the (predicted) response location in azimuth, θi is the azimuth of the ith click (of n total clicks), wi is the perceptual weight applied to the ith click, and C is a constant that reflects overall bias in response locations. Because each click provides potentially usable information for the task, a reasonable listening strategy would place equal weight on each click (i.e., all wi are equal and responses indicate the mean location of all clicks). For a discrimination task (e.g., if subjects were asked to discriminate θL on two presentations) this strategy is not merely reasonable, but “optimal,” in information-theoretic terms (assuming equality and independence of the performance-limiting noise associated with each click) (Saberi, 1996). In contrast, when not all wi are equal, a suboptimal strategy, favoring some clicks over others, is indicated. Since the localization task used here has no “correct” answer, no strategy can be considered optimal in quite this way; however, equal weighting serves as a useful null hypothesis to which obtained weighting patterns can be compared.
Using response azimuth (θR) as the dependent variable and the azimuths of the n individual clicks (θi, i = 1,…,n) as independent predictor variables for multiple linear regression, a least-squares fit to Eq. (3), minimizing (θR − R)², was computed for each combination of subject, stimulus condition, and ICI. Regression coefficients, also known as “beta weights,” obtained from these analyses provided estimates for wi. Normalized weights
$$\bar{w}_i = \frac{w_i}{\sum_{j=1}^{n} w_j}$$
were then computed to provide a relative weighting function for each combination of subject, condition, and ICI. While the raw weights provide a meaningful interpretation (degrees shift in response per degree shift in click location), normalized weights stress the relative influence of each click and also allow weights to be averaged across subjects (since $\sum_i \bar{w}_i = 1$ for each subject, by definition). Normalized weights are plotted throughout this paper, along with 95% confidence intervals (see the Appendix).
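The regression analysis can be illustrated with a simulated listener. The sketch below assumes the linear model R = Σ w_i θ_i + C and normalization of the raw weights by their sum; it uses ordinary least squares via `numpy.linalg.lstsq` and does not compute the confidence intervals described in the Appendix. The simulated weights, bias, and response noise are hypothetical values, not data from this study.

```python
import numpy as np


def fit_observer_weights(click_az, response_az):
    """Least-squares fit of R = sum_i(w_i * theta_i) + C; returns (w, C)."""
    click_az = np.asarray(click_az, dtype=float)
    response_az = np.asarray(response_az, dtype=float)
    # Append a column of ones so that the last coefficient is the bias C.
    design = np.column_stack([click_az, np.ones(click_az.shape[0])])
    coef, *_ = np.linalg.lstsq(design, response_az, rcond=None)
    return coef[:-1], coef[-1]


def normalize_weights(w):
    """Scale raw weights so that they sum to 1."""
    return w / w.sum()


# --- Simulated listener: every number below is hypothetical ---
rng = np.random.default_rng(0)
n_trials, n_clicks = 400, 16

# Random center azimuth per trial, plus an independent offset per click.
centers = rng.choice(np.arange(-19.25, 20.0, 5.5), size=n_trials)
offsets = rng.choice([-11.0, 0.0, 11.0], size=(n_trials, n_clicks))
click_az = centers[:, None] + offsets

# Onset-dominant weights: click 1 gets 0.40, the other 15 clicks 0.04 each.
true_w = np.full(n_clicks, 0.04)
true_w[0] = 0.40
responses = click_az @ true_w + 2.0 + rng.normal(0.0, 1.0, size=n_trials)

w_hat, bias_hat = fit_observer_weights(click_az, responses)
w_norm = normalize_weights(w_hat)
```

Because each click's azimuth varies independently of the others within a trial, the regression can separate the contribution of each click; without that random variation the predictors would be collinear and individual weights could not be estimated.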
Experiment 1 used pairs of clicks to study the localization-dominance aspect of the precedence effect (Litovsky et al., 1999) with a direct-localization task. While the spatial separation of stimuli was somewhat smaller (5.5°–22° azimuth) than in many studies of localization dominance, the stimuli were designed to be similar to those used in previous work and spanned a comparable range of ICI (1–14 ms). The results of this experiment are compared to those obtained using observer-weighting paradigms for lateralization (Dizon et al., 1998; Saberi, 1996; Shinn-Cunningham et al., 1993; Stellmack et al., 1999).
Four subjects participated in this experiment. Two (CS, LL) listened to stimuli presented in condition N3 and three (CS, HW, TL) listened to stimuli presented in condition W3.
The ICI for trains of 2 clicks varied from 1 to 14 ms. Each subject completed four runs of 100 trials at each ICI, except subject TL, who listened only to ICIs of 3 and 8 ms.
Figure 3 presents the mean weighting functions (across subjects) obtained for click pairs in both conditions. For click pairs, $\bar{w}_2 = 1 - \bar{w}_1$, so the normalized click weight $\bar{w}_1$ is plotted on its own, as a function of ICI. At the shorter ICIs (1–5 ms), weight assigned to click 1 was significantly greater than that assigned to click 2 (i.e., $\bar{w}_1 > 0.5$). There were no significant differences between the weights obtained in conditions N3 (filled circles) and W3 (open circles), except at the shortest ICI (1 ms), where the difference can be explained by differences in the degree of onset dominance exhibited by different subjects at this value of ICI (see Fig. 4). Smaller symbols plot comparable weights obtained in four other observer-weighting studies (Saberi, 1996; Shinn-Cunningham et al., 1993, 1995; Stellmack et al., 1999).
Onset dominance can be summarized by a precedence ratio, pr, the weight on click 1 divided by the summed weight on all later clicks:
$$p_r = \frac{\bar{w}_1}{\sum_{i=2}^{n} \bar{w}_i}$$
For 2-click trains, pr = w1 / w2. This ratio indicates the relative influence of the onset and later-arriving portions of the stimulus on the localization responses. Larger values indicate a stronger influence of the onset, and a value of 1 indicates equal weight between the onset click and the remainder of the stimulus. Precedence ratios are plotted against ICI in Fig. 4. A logarithmic scale is used to avoid overemphasizing large values of pr and to more clearly visualize the trend of pr values declining with increasing ICI. Separate lines indicate trials with different spatial separations between loudspeakers. Since the positions of clicks 1 and 2 were selected randomly from three possibilities (relative to θL) on each trial, there were three possible separations on any given trial: 0°, 5.5°, or 11° in condition N3 and 0°, 11°, or 22° in condition W3. By comparing performance on W3 trials separated by 11° with those trials separated by 22° (for instance), we determined whether precedence differed between trials with wide or narrow separations completed by individual subjects. As can be seen from the figure, there were no systematic differences between the separations. Especially at short ICI, however, large differences between the precedence ratios were calculated for different subjects, with HW (condition W3) showing very large pr values compared to subject LL (condition N3). It seems likely that the difference between conditions N3 and W3 apparent in Fig. 3 reflects this inter-subject difference rather than an effect of spatial separation. In comparison, subject CS—who listened in both conditions—showed similar pr values in both conditions.
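As a small worked illustration, the precedence ratio can be computed directly from a weighting function. The sketch assumes that the ratio is the weight on click 1 divided by the summed weight on all later clicks, which reduces to w1 / w2 for click pairs; the function name and the example weights are hypothetical.

```python
def precedence_ratio(weights):
    """Weight on click 1 divided by the summed weight on all later clicks."""
    onset = weights[0]
    later = sum(weights[1:])
    return onset / later


# Hypothetical 2-click weighting functions.
equal_pair = precedence_ratio([0.5, 0.5])      # no onset dominance
dominant_pair = precedence_ratio([0.8, 0.2])   # strong onset dominance

# Hypothetical 16-click train with equal weight on every click.
equal_train = precedence_ratio([1 / 16] * 16)
```

Under this definition, the equal-weighting reference value depends on train length: it is 1 for click pairs but 1/15 for 16-click trains, so ratios should only be compared across stimuli with the same number of clicks.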
The primary finding of this experiment is that localization was dominated by the first click at short ICIs but that weights for the two clicks were approximately equal at longer ICIs (8–14 ms). Recall that subjects were instructed to point to the leftmost auditory image when more than one image was apparent. Consistent pointing to the leftmost click would cause subjects to point to the lead on 50% of trials; equal weights reflect the fact that both clicks are equally likely to be in the leftmost position. If instructed differently, subjects would likely have been able to accurately localize both clicks at long ICIs. These results are in fair quantitative agreement with those obtained in other studies using headphone stimulation. As shown in Fig. 3, weights obtained by Shinn-Cunningham et al. (1993, 1995), Saberi (1996), and Stellmack et al. (1999) were similar to those measured in this experiment, except at longer delays, where the results of Saberi (1996) and Shinn-Cunningham et al. (1993) indicate stronger precedence effects than those of the current study or Stellmack et al. (1999). Shinn-Cunningham et al. (1993) also conducted a meta-analysis of lead weights estimated from previous studies of the precedence effect using discrimination measures (Gaskell, 1983; Saberi and Perrott, 1990; Zurek, 1980). These tended to decline more gradually (i.e., $\bar{w}_1$ remained above 0.7 for ICIs around 8–10 ms) than weights measured in matching tasks. The results of this experiment are in agreement with that observation, suggesting that precedence at intermediate delays (~5–10 ms) may be weaker for localization dominance than for lag discrimination.
Spatial separation was found to have no significant effect on the weighting functions. Though consistent with the results of Yang and Grantham (1997), this result contrasts with that of Shinn-Cunningham et al. (1993), in which some subjects showed separation-dependent weighting functions at long lead–lag delays and high stimulus levels. Considering the magnitude of interaural separations used in that study (ΔITD up to ±1 ms) relative to the equivalent angular separations used here—a maximum angular separation of 22° corresponds roughly to ΔITD of 150–200 µs (Kuhn, 1987)—it may be that very large separations act to reduce precedence in a way that narrower separations do not.
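The order of magnitude of the ITD difference quoted above can be checked with a rough calculation. The sketch below does not implement Kuhn's (1987) model; it substitutes the simpler Woodworth spherical-head approximation, ITD = (a/c)(θ + sin θ), and the head radius and speed of sound are assumed typical values rather than parameters from this study.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed typical adult head radius (m)
SPEED_OF_SOUND = 343.0    # assumed speed of sound in air (m/s)


def woodworth_itd(azimuth_deg):
    """Spherical-head ITD in seconds for azimuths within +/-90 degrees."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


# Two sources 22 degrees apart, placed symmetrically about the midline.
delta_itd = woodworth_itd(11.0) - woodworth_itd(-11.0)
```

With these assumed values the difference comes out just under 200 µs, in line with the 150–200 µs range cited in the text and far smaller than the ΔITD of up to ±1 ms used in the headphone study.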
A final feature of the weighting functions plotted in Fig. 3 bears discussion: there is a consistent tendency at the longer ICIs (8–14 ms) for a larger weight on click 2 than on click 1. The difference is small, but it is statistically significant in condition W3. This result may reflect a general bias of subjects to point toward the later stimulus, perhaps through the perception (occasionally reported by subjects) of apparent motion. It may also relate to the increase of weights observed toward the end of the stimulus in experiments 2–4, discussed below. Similar effects were seen by Stellmack et al. (1999), who measured elevated echo weights at ICIs from 16–32 ms even when subjects were asked to judge the location of the lead click. Also, ITD-threshold data collected by Tollin and Henning (1998) show some indication that a diotic lag click interferes with discrimination of the lead’s ITD as ICI increases from 0.8 to 12.8 ms, suggesting increased influence of the lag at these delays.
While studies of the precedence effect have generally used pairs of stimuli with a variable delay (“lead” and “lag”), studies of onset dominance and binaural adaptation have used single stimuli with extended durations. Experiment 2 examined the form of localization weighting for trains of 16 bandlimited clicks, similar to stimuli used by Saberi (1996) and Hafter and Dye (1983) to measure binaural adaptation.
All five subjects (CS, LL, LS, TL, and HW) participated in this experiment. Not all subjects were tested at all ICIs; specifically, TL was not tested at 8 ms ICI and LL was not tested at 5 ms ICI.
Stimuli were trains of 16 clicks, as described in Sec. II B. Each subject completed 4 runs of 100 trials at each of four ICIs (3, 5, 8, and 14 ms). Loudspeaker condition W3 defined the spatial layout of stimuli. All other aspects of the experimental procedure, stimulus presentation, and analysis were as described in the section on general methods. Aside from differences in stimuli used and listeners participating, this experiment was identical to experiment 1.
Normalized weighting functions, averaged across subjects, appear in Fig. 5. Weights for click 1 were significantly larger than weights for clicks 2–16 at short ICI (3 ms). At longer ICIs, weights were approximately equal for all clicks. Additionally, at ICIs of 3 and 5 ms, there is a consistent tendency for weights to increase from click 2 to click 16. At 5-ms ICI, this tendency manifested in a weight for click 16 that was significantly larger than the optimal value of 1/16=0.0625 (and also larger than most of the preceding clicks’ weights).
Figure 6 displays the precedence ratios computed from weighting functions for each subject. It can be seen from the figure that ratios were largest at 3-ms ICI, quantifying the onset dominance apparent in Fig. 5. Interestingly, however, the mean precedence ratio remained slightly above 1/15 (the expected equal-weighting value) for all ICIs, in all conditions. This indicates that although the weighting functions for 8- and 14-ms ICI in Fig. 5 did not deviate significantly from equal weighting, there was a tendency for onset dominance even for these stimuli, with click 1 receiving somewhat more weight than later clicks.
The weighting functions plotted in Fig. 5 demonstrate a large onset emphasis for the shortest ICI (3 ms), with relatively even weights for the longer ICIs (8 and 14 ms). Even at short ICIs, however, weights for clicks beyond the first remained positive, indicating that all clicks had some influence on the responses. These results are consistent with previous findings in precedence and binaural adaptation (Hafter and Dye, 1983; Litovsky et al., 1999; Shinn-Cunningham et al., 1993). However, they disagree with the model of binaural adaptation proposed by Hafter and Buell (1990), which predicts a monotonic decrease in the relative effectiveness of each click. Predictions of that model, calculated by fitting Eq. (2) to $\bar{w}_1$ at each ICI, are plotted as dotted lines in Fig. 5. The obtained weighting functions reveal an immediate reduction in weights following click 1. Across conditions, the weight for click 2 tended to be among the smallest, and was significantly overestimated by Eq. (2) at 3-ms ICI. This result is consistent with the findings of Saberi (1996) and Dizon et al. (1998), who also showed a rather abrupt reduction of postonset weights at short ICIs.
Across conditions, the largest weights other than click 1 appeared near the end of the stimulus (for ICIs of 5–8 ms, they were slightly larger than those of click 1). This “recovery” of weights is one of the more intriguing aspects of the obtained weighting functions. No such recovery was seen by Saberi (1996), and of course no such effect would be observed in precedence-type studies where only two clicks are presented—although results showing an increased influence of click 2 (e.g., experiment 1; Stellmack et al., 1999; Tollin and Henning, 1998) may be related. A number of experimental issues remain to be explored with respect to this finding; however, weighting functions for stimuli varying in duration suggest an increased influence of cues near the end of a stimulus, rather than a recovery from transient suppression following the onset (Stecker, 2000). This effect is reminiscent of “recency” effects observed in tests of verbal memory (Glanzer and Cunitz, 1966), and may be related to the integration of spatial information in sensory memory and/or response-planning mechanisms. These issues will be addressed in future work.
Increasing weights are, of course, not compatible with the relative-effectiveness model [Eq. (2)] presented by Hafter and Buell (1990), which assumes only monotonically nonincreasing weighting functions. However, the results are not necessarily at odds with the binaural-adaptation results (Hafter and Buell, 1990; Hafter et al., 1988b; Hafter and Dye, 1983) summarized by Eq. (1), showing suboptimal improvement with duration at short ICIs.
A further consideration when comparing the results of this experiment with those found in the binaural adaptation literature is the difference between discrimination measures and the localization task employed here. There are some indications that the extent of laterality produced by modulated high-frequency stimuli is not necessarily predicted by the discriminability of their interaural cues (Bernstein and Trahiotis, 1994; Buell et al., 1994). Some precedence studies have also shown that lagging stimuli can occasionally be discriminated—possibly based on nonspatial cues—in stimuli that produce single fused images (Litovsky et al., 1999; Saberi and Perrott, 1990). Finally, Tollin and Henning (1998) found discrepancies between listeners’ ability to independently lateralize both clicks in a pair and to discriminate the ITD of one click. Localization and discrimination measures must involve different brain mechanisms at some point (e.g., response selection), possibly with time courses different from those affecting purely sensory mechanisms (viz., auditory mechanisms underlying onset dominance).
One surprising result of the research on binaural adaptation is the so-called “restart” phenomenon (Hafter and Buell, 1990), whereby a brief acoustic change (or “trigger”) in the middle of an extended stimulus produces a release from adaptation. For example, Hafter and Buell (1990) inserted a short gap between clicks 4 and 5 in an 8-click stimulus. Without the gap, the performance improvement from four to eight clicks was suboptimal at short ICI, but with it, performance was equivalent to two optimally combined 4-click stimuli, each adapted independently of the other. A similar improvement was obtained when the gap was replaced by a different acoustic “trigger,” such as a shortening of the ICI or the appearance of a brief tone burst (Hafter, 1997). Hafter and Buell (1990) termed this release from binaural inhibition “restarting,” based on the idea that the effect of the trigger is to return sensitivity to normal, preadapted levels, hence restarting the adaptation mechanism.
Although the results of Hafter and Buell (1990) show clearly the effects of restarting with various triggers, Saberi’s (1996) study of observer weighting in click-train lateralization revealed no effects of inserting gaps (4-ms gaps in trains of 1.8-ms ICI) at various positions within the trains. Saberi suggested that the difference between his results and those of Hafter and Buell (1990) may have been the result of the randomly varying ITD used in his study (compared to the static ITD used in previous studies). In this experiment, we interrupted 16-click trains with short gaps in order to explore the effects of restarting on sound localization.
Five subjects (CS, HW, LL, LS, and TL) participated in this experiment. Subjects completed 4 runs of 100 trials at each ICI (3 or 5 ms).
Stimuli were trains of 16 clicks, generated as in experiment 2, with one major difference: the ICI between clicks 8 and 9 was lengthened by 2 ms (for trains with 3-ms ICI) or 3 ms (for 5-ms trains). As in experiment 2, stimuli were presented using loudspeaker condition W3. Other aspects of the experimental procedure and analytical technique were unchanged from experiment 2.
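The timing of such a gapped train can be sketched in a few lines of Python. This is only an illustration of the interval arithmetic described above, not the stimulus-generation code used in the study; the function name `click_onsets_ms` and its parameters are hypothetical.

```python
def click_onsets_ms(n_clicks=16, ici_ms=3.0, gap_after=8, gap_extra_ms=2.0):
    """Return click onset times (ms) for a train with one lengthened interval.

    The interval that follows click `gap_after` (1-indexed) is
    ici_ms + gap_extra_ms; every other interval equals ici_ms.
    """
    onsets = [0.0]
    for click in range(1, n_clicks):  # `click` is the number of the click just placed
        interval = ici_ms + (gap_extra_ms if click == gap_after else 0.0)
        onsets.append(onsets[-1] + interval)
    return onsets


# The two gap conditions described in the text.
train_3ms = click_onsets_ms(ici_ms=3.0, gap_extra_ms=2.0)  # 3-ms ICI, 2 ms added
train_5ms = click_onsets_ms(ici_ms=5.0, gap_extra_ms=3.0)  # 5-ms ICI, 3 ms added
```

With these settings the interval between clicks 8 and 9 is 5 ms in the 3-ms train and 8 ms in the 5-ms train, while all other intervals keep the nominal ICI.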
Figure 7 plots the weighting functions for conditions incorporating a gap between clicks 8 and 9. The two upper panels show the mean normalized weighting functions, with 95% confidence intervals on the weight estimates. For a 3-ms ICI, a clear and significant elevation of the weight on click 9 was found, indicating that the gap was an effective restarting trigger. For 5-ms ICI, there was no significant increase of the weight on click 9. Lower panels plot the normalized weights for individual subjects; each subject is represented by a different symbol. Note the variation between the weighting functions for the different subjects, especially with regard to the weight on click 9. At 3-ms ICI, subjects CS, HW, and TL showed rather large weights on click 9 (and also quite small weights on click 8). Subjects LL and LS, on the other hand, showed weights on click 9 that were approximately the equal-weight value of 1/16; these same subjects also had the lowest weights on click 1, and the weakest onset dominance (precedence) at short ICIs, as seen in Fig. 6. A similar, though less apparent, trend can be seen in the weights for trains with 5-ms ICI.
At the shortest ICI (3 ms), introducing a gap of 2 ms between clicks 8 and 9 was sufficient to increase the weight on click 9, as expected from previous research on binaural adaptation (Hafter and Buell, 1990). However, the results also show clear intersubject variability with regard to restarting. Interestingly, the subjects that failed to show restarting showed the least onset dominance as well. In contrast to the results at 3-ms ICI, 3-ms gaps within trains of 5-ms ICI did not produce restarting. Some subjects in this condition (CS and TL) did produce larger weights on click 9 than on click 8, and those subjects also had the largest weights on click 1; however, the effects were much smaller than for 3-ms ICI and were not statistically significant.
Saberi (1996) found no effect of introducing gaps in trains of clicks presented over headphones. He reasoned that restarting may have been prevented by the random variation in ITD of his stimuli, compared to the static ITD employed by Hafter and Buell (1990). Based on the current results, we suggest that variation of interaural cues does not explain the difference, since spatial cues were also varied between clicks in this experiment, and restarting was apparent, at least for trains with 3-ms ICI. On the other hand, because the current paradigm employs free-field listening, stimuli carry both ITD and ILD cues, as well as cross-frequency cues related to the DTF. If variation of spatial cues causes the localization mechanism to alter the strategy used in combining time- and level cues, then the differences among the results of the three studies could be related to spatial variation, as Saberi suggested. However, more work comparing the influence of ITD and ILD in this paradigm is necessary before drawing strong conclusions along these lines.
The appearance of intersubject variation in the degree of restart (as seen in Fig. 7) suggests an alternate explanation of the difference between Saberi’s results (Saberi, 1996) and those of the current study: namely, that the two studies sampled listeners with stronger (in our case) or weaker (in Saberi’s) restarting. This explanation, however, appears unlikely for two reasons: First, the results for subjects LL and LS suggest that the degree of restarting may be related to the degree of onset dominance for a given listener; Saberi’s results, in contrast, show large onset weights, but no effect of gaps. Second, Hafter and Buell (1990) observed little variability across subjects concerning the improvement in performance afforded by restarting.
In experiments 1–3, every click contained in a stimulus train was subjected to random variation in its location. As pointed out by Saberi (1996), an important difference between observer-weighting approaches and earlier studies of binaural adaptation is that the latter have employed static interaural cues in assessing performance. Having already established that onset dominance, its rate dependency, and the restarting effect of Hafter and Buell (1990) are observed using the current paradigm, it is still quite possible that the obtained weighting functions differ in important ways from the time course of onset dominance for static stimuli. For example, if change in location were a sufficient trigger for restarting, then the random variations in click location could have acted to produce restarting at random times within each stimulus used in the study. Such occasional restarting would act to increase the weights of clicks following the restart (as was seen in Fig. 7), thus flattening all the functions obtained in this study. As a check for this kind of effect, experiment 4 examined localization weights for stimuli which did not possess random variation in the location of each click.
Ideally, we would derive a method for obtaining weighting functions directly from localization performance using static stimuli. The method of subtraction employed by Hafter and Buell (1990) utilizes static stimuli; however, the assumption underlying the estimation of temporal weighting functions (specifically, that weights are monotonically nonincreasing) appears invalid for the regression paradigm, based on the results of experiments 2 and 3. The observer-weighting analysis employed in the current study avoids this assumption, but cannot be used for static stimuli, since it requires independent variation in each click’s location. In experiment 4, we modified the observer-weighting technique by varying the location of only one click in the stimulus. All other clicks were emitted from a common location. In this paradigm, the relative influence of the varied click (or “probe”) and the main “body” of the stimulus were assessed using multiple linear regression, as before. Because only two weights were computed (probe and body), and the body weight encompassed 15 clicks (for 16-click trains), the interpretation of the weighting functions is not as straightforward as in the other experiments. However, plotting probe weight as a function of probe position produces a function that is comparable to the weighting functions derived in experiments 1–3.
Two subjects (CS and HW) participated in this experiment. Each completed 8 runs of 100 trials at each tested ICI (3, 5, and 14 ms) in the probe condition, and 4 runs of 100 trials at each of the same ICIs in a control condition (see below).
Stimuli were trains of 16 clicks, generated as in experiment 2. One of the 16 clicks was chosen as the “probe” on each trial. There were four potential probe positions corresponding to clicks 1, 2, 9, and 16. A new position was chosen at random from this set on each trial. The remaining clicks (e.g., clicks 1–8 and 10–16 when click 9 was the probe), termed the “body,” shared a common location (the stimulus location—θL—chosen on that trial). The location of the probe was selected in the same manner that individual click locations were selected in experiments 1–3. Loudspeaker condition W5 was used in this experiment; here, the probe was presented from −11, −5.5, 0, +5.5, or +11 degrees relative to the body on any given trial. We chose to use this increased spatial resolution to provide a more stable estimate of the single probe weight. A control condition employed condition W5 and the same subjects as the probe condition but was otherwise identical to experiment 2. Other aspects of the experimental procedure were identical to those used in experiment 2. However, the analytical technique was modified for the probe method employed by the new design.
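The construction of a single probe trial can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `make_probe_trial` is hypothetical, and drawing the probe position and offset uniformly at random is an assumption, since the selection rule from experiments 1–3 is not restated in this section.

```python
import random

PROBE_POSITIONS = (1, 2, 9, 16)                    # 1-indexed click numbers
PROBE_OFFSETS_DEG = (-11.0, -5.5, 0.0, 5.5, 11.0)  # probe azimuth relative to body


def make_probe_trial(body_deg, rng, n_clicks=16):
    """Return (probe_position, per-click azimuths in degrees) for one trial."""
    position = rng.choice(PROBE_POSITIONS)   # assumed uniform over the four positions
    offset = rng.choice(PROBE_OFFSETS_DEG)   # assumed uniform over the five offsets
    locations = [body_deg] * n_clicks        # body clicks share one location
    locations[position - 1] = body_deg + offset
    return position, locations


rng = random.Random(0)
position, locations = make_probe_trial(body_deg=20.0, rng=rng)
```

Each trial therefore yields two quantities that vary independently across trials: the body location and the probe location.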
Because all clicks comprising the body of the stimulus shared a common location, they did not vary independently and hence were not assigned weights individually. Rather, the model used here included two predictors: body and probe
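A two-predictor model of this kind can be written as below. The notation is an assumption made for illustration: θ_R denotes the response azimuth, θ_body and θ_probe the body and probe locations, and k a constant term.

```latex
\theta_R = w_{\mathrm{body}}\,\theta_{\mathrm{body}} + w_{\mathrm{probe}}\,\theta_{\mathrm{probe}} + k
```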
Since the body contained 15 clicks, while the probe contained only 1, the body was expected to exert a stronger influence on localization (i.e., wbody should be somewhat larger than wprobe), regardless of the probe position. As before, normalized weights were computed
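One normalization consistent with the equal-weight value of 1/16 cited for 16-click trains is to scale the weights so that they sum to 1. The form below is an assumption offered for illustration, with an overbar marking a normalized weight.

```latex
\bar{w}_{\mathrm{probe}} = \frac{w_{\mathrm{probe}}}{w_{\mathrm{body}} + w_{\mathrm{probe}}},
\qquad
\bar{w}_{\mathrm{body}} = \frac{w_{\mathrm{body}}}{w_{\mathrm{body}} + w_{\mathrm{probe}}}
```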
The normalized probe weight expresses the relative influence of the probe on localization responses, just as before, and would be expected to vary as a function of probe position, with large values for a probe at click 1, small values for a probe at click 2, and so on.
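The probe analysis can be illustrated with simulated data. The sketch below fits a two-predictor least-squares model and normalizes the probe weight by the sum of the two weights. It is not the analysis code used in the study: the simulated listener, its weights (0.9 and 0.1), the noise level, the range of body azimuths, and the sum-to-one normalization are all assumptions made for the example.

```python
import random


def fit_body_probe(body, probe, response):
    """Least-squares fit of response = w_body*body + w_probe*probe + k."""
    n = len(response)
    mb, mp, mr = sum(body) / n, sum(probe) / n, sum(response) / n
    # Centered sums of squares and cross-products.
    sbb = sum((b - mb) ** 2 for b in body)
    spp = sum((p - mp) ** 2 for p in probe)
    sbp = sum((b - mb) * (p - mp) for b, p in zip(body, probe))
    sbr = sum((b - mb) * (r - mr) for b, r in zip(body, response))
    spr = sum((p - mp) * (r - mr) for p, r in zip(probe, response))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sbb * spp - sbp ** 2
    w_body = (sbr * spp - spr * sbp) / det
    w_probe = (spr * sbb - sbr * sbp) / det
    k = mr - w_body * mb - w_probe * mp
    return w_body, w_probe, k


rng = random.Random(1)
TRUE_W_BODY, TRUE_W_PROBE = 0.9, 0.1  # made-up weights for a simulated listener
body, probe, response = [], [], []
for _ in range(800):
    b = rng.uniform(-40.0, 40.0)                        # hypothetical body azimuth (deg)
    p = b + rng.choice((-11.0, -5.5, 0.0, 5.5, 11.0))   # probe offset relative to body
    body.append(b)
    probe.append(p)
    response.append(TRUE_W_BODY * b + TRUE_W_PROBE * p + rng.gauss(0.0, 1.0))

w_body, w_probe, k = fit_body_probe(body, probe, response)
norm_probe = w_probe / (w_body + w_probe)  # assumed sum-to-one normalization
```

Repeating such a fit separately for trials with the probe at clicks 1, 2, 9, and 16 would give one normalized probe weight per position, which is the quantity plotted against probe position.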
Figure 8 shows the mean normalized probe weights as a function of probe position. Open symbols indicate weighting functions obtained in the control condition (similar to the method used in experiment 2) for comparison. Weights obtained in the control condition did not differ significantly from those obtained in experiment 2. There were additionally no significant differences between the results for the probe and control conditions. For both 3- and 5-ms ICI, there was a trend for slightly more extreme weights (higher at clicks 1 and 16, lower at click 9) in the probe condition, consistent with the notion that spatial variation may have acted to flatten weights in the control condition (and in experiments 1–3); however, none of these effects was statistically significant.
The lack of significant systematic differences between the weights calculated using the probe method and those calculated with random variation on all clicks lends support to the form of observer weights reported in this study. The results of this experiment show a nonsignificant tendency for less-flattened weights using the probe method compared to the control. This would seem to suggest a small homogenizing effect of click variation on obtained weights in experiments 1–3. However, conclusions should be tempered by consideration of the subjects participating in experiment 4, who were more experienced than other subjects, and tended to show larger effects of onset dominance at short ICI in the earlier experiments. Although the results of experiment 4 are expressed in relation to a control condition employing only those subjects, it is possible that other subjects would have been less sensitive to the manipulation. Nevertheless, the correspondence between weighting functions obtained in the two methods suggests that the results of the previous experiments were not largely affected by continuous variation in the locations of clicks; if anything, that variation may have acted to reduce differences between the weights for different clicks.
Support for this work was provided by NIH/NIDCD Research Grant 00087. We thank Miriam Valenzuela, Frédéric Theunissen, and Bruce Berg for their assistance in developing the observer-weighting method employed in this study; Ephram Cohen and Erick Gallun for invaluable assistance with programming and figure preparation; Rich Ivry, Marty Banks, David Wessel, Gerald Kidd, and Chris Mason for helpful comments during the early stages of this research; and finally John Middlebrooks, Brian Mickey, Ewan Macpherson, Julie Arenberg Bierer, along with two anonymous reviewers for providing feedback on earlier versions of this paper.
The standard error of an individual regression coefficient wi is given as
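A textbook form that uses exactly the quantities defined in the following sentence is shown here. It is an assumption based on the standard expression for the standard error of a regression slope, with θ_i denoting the predictor in question.

```latex
s_{w_i} = \frac{s_{\theta_R \cdot \theta_i}}{s_{\theta_i}\,\sqrt{N-1}}
```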
where sθR·θi is the standard error of estimate (square root of error variance) associated with the predictor variable in question and sθsi is the standard deviation of that predictor. Here, N is the number of samples used to estimate wi (Howell, 1997). The statistical significance of differences between coefficients can be examined using t-tests. Alternately, one can compute confidence intervals (CI) on the weight estimates
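The usual two-sided interval for a regression coefficient takes the form below, where s_{w_i} denotes the standard error of w_i (a notational assumption).

```latex
\mathrm{CI}_{w_i} = w_i \pm t_{(\alpha/2,\,df)}\; s_{w_i}
```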
where t(α/2,df) is the critical value of Student’s t for a significance level of α/2 (e.g., 0.025 for 95% confidence intervals) and df degrees of freedom (N–n–1). Ninety-five-percent confidence intervals were calculated based on the raw weights (prior to normalization) for each subject. For plotting with normalized weights, confidence limits were normalized in the same manner as the weights themselves
Plots of mean weights across subjects use error bars displaying mean confidence limits, rather than limits corresponding to the pooling of data across subjects. As such, they do not provide an estimate of intersubject variability; conversely, they overestimate the actual 95% confidence interval for the mean weight, and thus reflect a somewhat conservative estimate of significance.
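The interval computation described in this appendix can be sketched as follows. Several points are assumptions: the standard error uses the textbook slope formula with the quantities named above, the normal quantile stands in for the critical value of Student's t (a close approximation only when df runs into the hundreds), and normalization divides by the sum of the raw weights. All numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def weight_standard_error(s_estimate, s_predictor, n_samples):
    """Assumed textbook form: s_w = s_estimate / (s_predictor * sqrt(N - 1))."""
    return s_estimate / (s_predictor * sqrt(n_samples - 1))


def confidence_limits(weight, std_error, confidence=0.95):
    """Two-sided limits; a normal quantile approximates Student's t for large df."""
    crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return weight - crit * std_error, weight + crit * std_error


def normalize_by_weight_sum(values, raw_weights):
    """Scale values by the sum of the raw weights (assumed normalization)."""
    total = sum(raw_weights)
    return [v / total for v in values]


# Made-up numbers for one predictor estimated from N = 401 trials.
se = weight_standard_error(s_estimate=5.0, s_predictor=10.0, n_samples=401)
lower, upper = confidence_limits(weight=0.10, std_error=se)

# Made-up raw weights for a three-click example; the limits above belong to
# the third weight (0.10) and are normalized the same way as the weights.
raw_weights = [0.5, 0.2, 0.1]
norm_weights = normalize_by_weight_sum(raw_weights, raw_weights)
norm_limits = normalize_by_weight_sum([lower, upper], raw_weights)
```

Averaging the normalized limits across subjects, rather than pooling the data, would give error bars of the kind described above for the mean-weight plots.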
a) Portions of this work have appeared in the first author’s doctoral dissertation and in a poster presentation given at the 24th Annual Midwinter Research Meeting of the Association for Research in Otolaryngology in February 2001.
PACS numbers: 43.66.Qp, 43.66.Mk [LRB]
1) Some readers may object to our lumping together of potentially disparate phenomena, and we tend to agree with the general view that a number of interacting neural mechanisms at different levels may be involved in onset dominance phenomena. Nevertheless, it is our view that at least some distinctions in the literature result primarily from differences in the experimental tasks and stimuli used in various studies. The stimuli employed here were designed to be similar to those used in previous studies of both binaural adaptation and the precedence effect. Labeling the target of this investigation as one or the other at this point seems mistaken, especially considering the remarkably convergent temporal extents of the two phenomena.