Previous studies have demonstrated large errors (over 30°) in visually perceived exocentric directions (the direction between two objects that are both displaced from the observer’s location; e.g., Philbeck et al., in press). Here, we investigated whether a similar pattern occurs in auditory space. Blindfolded participants either attempted to aim a pointer at auditory targets (an exocentric task) or gave a verbal estimate of the egocentric target azimuth. Targets were located at 20° to 160° azimuth in the right hemispace. For comparison, we also collected pointing and verbal judgments for visual targets. We found that exocentric pointing responses exhibited sizeable undershooting errors, for both auditory and visual targets, that tended to become more strongly negative as azimuth increased (up to −19° for visual targets at 160°). Verbal estimates of the auditory and visual target azimuths, however, showed a dramatically different pattern, with relatively small overestimations of azimuths in the rear hemispace. At least some of the differences between verbal and pointing responses appear to be due to the frames of reference underlying the responses; when participants used the pointer to reproduce the egocentric target azimuth rather than the exocentric target direction relative to the pointer, the pattern of pointing errors more closely resembled that seen in verbal reports. These results show that there are similar distortions in perceiving exocentric directions in visual and auditory space.
In light of the seemingly effortless accuracy with which we are typically able to interact with objects in the world, the fact that there are sizeable misperceptions of spatial relationships, even in the nearby environment, is remarkable (e.g., Doumen, Kappers, & Koenderink, 2005; Philbeck, 2000). A particularly striking example comes from distortions in perceiving exocentric directions, that is, perceiving the direction from one object to another, both of which are displaced from one’s current viewing location. One way to study this involves asking observers to aim a pointer mounted just in front of the body at a nearby target (Haber, Haber, Penningroth, Novak, & Radgowski, 1993; Haun, Allen, & Wedell, 2005; Montello, Richardson, Hegarty, & Provenza, 1999). We have demonstrated large errors in this task, even when both the pointer and the target are within arm’s reach (Philbeck, Sargent, Arthur & Dopkins, in press). When participants point without vision, mean errors increase with target eccentricity and peak at −32° for targets at 150° azimuth (i.e., the direction in a horizontal plane relative to the pointing direction of the head). Large errors remain even when observers can see both the pointer and target while responding (averaging −21° for targets at 150°). These errors are apparently not due to misperception of the target location, because verbal estimates of the egocentric target azimuth (centered on a vertical axis through the head) are highly accurate and precise. Most importantly, the errors are also not due to mechanical constraints associated with manipulating the pointer, because participants can use the pointer more accurately in an egocentric task analogous to that underlying the verbal estimates: that is, using the pointer to reproduce the egocentric target azimuth, rather than the exocentric task of aiming the pointer at the target. This suggests that the errors are not fundamentally associated with involvement of the motor system.
We interpret these results as indicating strong misperception of exocentric directions in visually perceived space.
These results are striking for several reasons. First, the task of aiming a pointer at a nearby target has sometimes been assumed to measure perceived egocentric target azimuth (Montello et al., 1999). Our results, however, suggest that this seemingly egocentric task in fact probes exocentric relationships between the pointer and target. Second, our results show that large distortions in perceived pointer-target relations appear to co-exist with accurate perception of the egocentric locations of the targets themselves. Similar effects have been found at somewhat larger distances (e.g., Doumen et al., 2005; Doumen, Kappers, & Koenderink, 2006; Foley, 1965; Kelly, Loomis, & Beall, 2004; Koenderink & van Doorn, 1998; Koenderink, van Doorn, Kappers, & Lappin, 2002). Koenderink and others account for these errors in indicating exocentric directions by concluding that the intrinsic geometry of visually-perceived space is non-Euclidean (Cuijpers, Kappers, & Koenderink, 2000a, 2000b, 2001, 2002; Foley, 1972; Koenderink et al., 2002; Koenderink, van Doorn, & Lappin, 2000, 2003; but see also Gogel, 1990). We will return to this issue in the General Discussion. Regardless of how the errors come about, however, their impact on directional judgments in other research domains is largely unknown. Our goal in the current paper is to investigate these issues in the auditory domain. If the distorted representations that presumably underlie exocentric pointing errors are manifested only during on-line visual perception (or perhaps decay rapidly after vision is occluded), one would not expect similar errors to be manifested during pointing to target locations if they are specified non-visually. 
More specifically, the large apparent discrepancy in visual space between accurately-perceived egocentric locations (indicated by verbal reports of azimuth in our previous work) and inaccurately-perceived exocentric relations (indicated by inaccurate exocentric pointing) should be virtually eliminated when the target locations are supplied by a non-visual sensory modality. Although there may still be systematic errors in exocentric pointing, these errors should be readily explained by errors in the initially perceived egocentric target locations. This means that the pattern of errors in exocentric pointing tasks may be quite different, depending on whether targets are specified by vision or by some non-visual modality, even when the initial egocentric localization is taken into account.
Some evidence from the haptic domain, however, suggests that misperception of exocentric directions is not limited to the visual modality (Blumenfeld, 1937; Kappers, 1999; Kappers & Koenderink, 1999; Kappers, Postma & Viergever, in press). Kappers and Koenderink (1999) demonstrated large systematic errors (up to 40°) in haptic judgments of perceived parallelity, collinearity, and exocentric direction. These large distortions in haptic perception were manifested over a large scale of haptic space and under bimanual conditions (Kappers, 1999). The extent to which this phenomenon might generalize to other non-visual domains remains unknown. In particular, concrete evidence for this idea is lacking in the auditory domain. Ohkura, Yanagida, Maeda and Tachi (2000) reported distortions in two auditory perceptual tasks (creating apparently parallel alleys and equidistant alleys) that were similar to distortions that occur in visual space (e.g., Battro, Netto, & Rozestraten, 1976; Indow & Watanabe, 1984). Whether or not these distortions generalize to include exocentric pointing in the auditory domain remains unclear.
A variety of studies have investigated directional localization of auditory targets; many of these involved responses that required moving the entire body or a hand-held instrument from a body-centered (presumably egocentric-referenced) starting location all the way to the perceived sound source location (e.g., Brungart, Durlach & Rabinowitz, 1999; Langendijk & Bronkhorst, 2002; Loomis, Klatzky, Philbeck & Golledge, 1998). Other studies have used verbal reports of the sound source location (Wightman & Kistler, 1989) or turning the head to face the sound source (Bronkhorst, 1995; Carlile, Leong & Hyams, 1997; Makous & Middlebrooks, 1990; Perrett & Noble, 1995). Gilkey, Good, Ericson, Brinkman and Stewart (1995) asked participants to use a stylus to indicate the perceived source location on a spherical model located just in front of them. Although the model was displaced from the participant, it was used to indicate the egocentric source location, i.e., relative to the participant’s head. None of these studies assessed directional relationships between two locations, both of which were displaced from the participant.
One study that may have probed the perception of exocentric relationships in auditory space was conducted by Oldfield and Parker (1984). In this study, blindfolded listeners used a handheld “gun” to point to a white-noise sound source in an anechoic environment. The mean pointing responses increasingly undershot the physical target location in the rear hemispace, becoming most negative at just over −20° for target azimuths in the range of 120 to 160 deg. This is comparable to the magnitude of systematic errors observed in the indications of visual target azimuth by Philbeck et al. (in press). However, it is not obvious to what extent errors in this task reflect errors in egocentric target localization versus errors in the perceived exocentric alignment between the “gun” and the target. Haber et al. (1993) compared 9 different methods of indicating the direction of auditory targets (1000 Hz pure tones presented outdoors) in a group of congenitally blind participants. One method involved aiming a pointer at the perceived source location during the stimulus presentation. Montello et al. (1999) replicated some features of this method in a group of blindfolded sighted participants. Both studies showed some response errors that could indicate misperception of exocentric alignment in auditory space, but again, it is unclear to what extent the errors may reflect mislocalization of the egocentric sound source locations.
In the following two studies, we aimed to determine whether the exocentric pointing errors that occur for visual targets also occur when target locations are specified by audition. To do this, in Experiment 1 we assessed the ability of participants to aim a pointer without vision toward sound sources (a series of noise bursts) in a horizontal plane. We measured the perceived egocentric localization using verbal reports. For comparison, we collected similar pointing and verbal responses for visual targets. If there are similar misperceptions of exocentric directions in both visual and auditory space, there should be large errors when participants aim the pointer at a target (particularly in the rear hemispace, as seen in our previous results; Philbeck et al., in press), but these errors should not be accompanied by large errors in egocentric localization tasks. As we will describe, the Experiment 1 data supported this prediction. In Experiment 2, we tested the idea that the frame of reference underlying the pointing responses was responsible for the large errors. Instead of aiming the pointer at the perceived location of the target (an exocentric task), participants used the pointer to reproduce the egocentric azimuth of auditory targets. Pointing errors were reduced in this condition.
Auditory localization experiments are often conducted in relatively restricted cue (e.g., anechoic) environments so that the effectiveness of individual cues may be investigated (Blauert, 1983; Middlebrooks & Green, 1991). We did not follow this approach in Experiment 1, for two reasons. First, our goal here was not to isolate individual cues to determine their relative effectiveness, but rather to compare two response modalities (verbal report and manual pointing) with the cue configuration held constant. Second, in our previous work in the visual domain (Philbeck et al., in press), we found large exocentric pointing errors under viewing conditions that also produced highly accurate verbal indications of egocentric azimuth. The accurate egocentric judgments formed the basis of our interpretation that the pointing errors were not simply the result of inaccurate localization of the egocentric target azimuth. To more fully compare the similarity between our previous work in the visual domain to the auditory domain, it was critical to provide multiple sources of auditory spatial information, so that egocentric localization of the auditory targets would be maximized. This should decrease the extent to which inaccurate egocentric localization contributes to errors in the exocentric pointing task.
To these ends, Experiment 1 was conducted in a natural indoor environment, with broad-band auditory stimuli (a series of noise bursts). Broad-band stimuli contain the low and high frequencies that are the stimuli for interaural time difference and interaural level difference cues, respectively (Middlebrooks, Makous & Green, 1989; Zwislocki & Feldman, 1956). Participants were also encouraged to turn their head to face the source location while the noise bursts were presented. Turning the head allows participants to position the stimulus in a plane perpendicular to the interaural axis, a position where static localization tends to be optimized (Makous & Middlebrooks, 1990; Middlebrooks & Green, 1991; Pollack & Rose, 1967). Thus, prior to the onset of head motion, azimuthal cues were probably dominated by interaural time and level differences and pinna cues; during and after the head motion, information related to neck muscle proprioception, efference copy, and vestibular cues was also available.
Fourteen people (7 men and 7 women, average age = 18.9 years, range = 17 to 20 years) participated in the study for course credit. All but one were right-handed. All individuals tested gave their written consent prior to participating and reported having normal or corrected vision in both eyes and normal hearing in both ears. This study was approved by the George Washington University ethics committee (Washington, DC, USA) and was performed in accordance with the ethical standards of the 1964 Declaration of Helsinki.
Each participant was exposed to both auditory and visual targets. These stimulus modalities were presented in separate sessions, with the average separation between sessions being 6.7 days (range 2 – 9 days). This separation was intended to minimize carry-over effects between sessions. Stimulus modality block order (audition first or vision first) was counterbalanced across participants. Two behavioral measures of target azimuth were compared: verbal report of the egocentric target azimuth and exocentric pointing, in which participants attempted to align the pointer with the target location. These response modalities were blocked within each stimulus modality session; half the participants pointed first and half gave verbal estimates first. For each participant, the same response-type block order was used in both stimulus modality sessions. Auditory and visual targets were presented at one of the following possible azimuths to the right of straight ahead: 20°, 40°, 60°, 80°, 100°, 120°, 140°, or 160°. Each azimuth was sampled six times for each combination of response type and stimulus modality. Trials were randomized within each response type block.
The experiment was conducted in a well-lit, carpeted laboratory, 6 m × 6 m, with smooth plastered brick walls and fiberglass ceiling panels at a height of 2.45 m. The background noise level, as measured by a CEM DT-805 sound level meter (Shenzhen Everbest Machinery Industry Company, Ltd.: Shenzhen, China), was 43 dBA. The background frequency spectrum was relatively flat, with somewhat more power in frequencies below 300 Hz (sound measurement is described in more detail below). The RT60 reverberation time for frequencies of 125, 250, 500, 1000, 2000 and 4000 Hz, calculated using Smaart AcousticTools 4.14 software (Eastern Acoustic Works: Whitinsville, MA), was 0.54, 0.57, 0.47, 0.23, 0.21, and 0.19 s, respectively. Reflections of the sound stimulus off the table surface surrounding the participant (described below) likely contributed to the total amount of sound reaching the listeners' ears indirectly.
Participants sat in a chair surrounded by a round table (0.80 m high and 1.52 m in diameter, with a circular hole 0.80 m in diameter in the center; see Figure 1). Although the chair could be rotated to allow participants to sit down, the chair orientation remained fixed during all stimulus presentations and responses. The table was situated approximately in the center of the room, and was surrounded by a heavy black polyester curtain. The curtain hung from ceiling to floor in a circular configuration, 3 m in diameter. It was opaque to light. Although we did not test the acoustical properties of the curtain, it was not likely to be a significant source of sound absorption or reflection at the frequencies our noise-generating device was capable of producing. During testing, the participant faced one corner of the room (although all the walls were obscured by the curtain). Markings under the edge of the table were used by the experimenter to position the targets but were invisible to participants. A pointing device was mounted on the arms of the chair directly in front of the participants, at waist level, in a horizontal plane. The rotation axis of the pointer was offset from the chair axis by approximately 23 cm. The pointer itself consisted of a thin rod, 16 cm long, centered on top of a polar scale graduated to the nearest degree. The scale was covered while participants viewed the visual target. Both auditory and visual targets were presented on the surface of the table surrounding the participant, at a distance of approximately 1 m from the center of the table.
The stimuli were broadband noise bursts, square wave gated with a 200 ms on / 200 ms off duty cycle. Eight pink noise bursts were separately generated using Sound Forge 4.5 (Sonic Foundry, Inc.: Madison, WI); these were combined into a single file of 8 bursts. This eight-burst train was then set to loop continuously and recorded onto cassette tape. During the experiment, these recorded bursts were played by a RadioShack CTR-112 cassette player with a 4.2 cm diameter loudspeaker, using the method described below in the Procedure section. Figure 2 shows the frequency spectrum of a representative noise burst produced by the cassette player when placed in the 20° stimulus position. Measured at the listening position, the mean sound pressure level of the bursts was 60 dBA. Because the cassette player rested on the table while emitting the noise bursts, the sound source was located somewhat below the participants’ interaural axis. The angle of declination depended upon the participants’ individual ear height, but generally ranged between 25–35 deg.
Visual targets were generated by white light-emitting diodes (LEDs) mounted underneath the table surface. The table surface was made of translucent Plexiglass, tinted dark brown and sanded to eliminate specular reflections from the overhead illumination. The stimulus locations were only visible when the LEDs were illuminated. When illuminated, the LEDs had the appearance of a bright circular spot on the table surface, with an angular diameter of approximately 0.5 deg.
Participants were individually tested in two sessions lasting no more than 60 minutes apiece. Following our previous methodology (Philbeck et al., in press), participants were first informed of the offset of the pointer’s axis from the center of the chair. They were instructed to point so that an imaginary line would extend from the pointer and connect directly to the target. That is, they were not to use the pointer to reproduce the direction of the target relative to an axis centered on the chair or their head. This was demonstrated by the experimenter by placing a small object at 90° to the right of the participant (i.e., relative to an axis passing vertically through the center of the chair), and showing the participant that an accurate pointing response would require setting the pointer greater than 90°, relative to the pointer axis. All participants pointed using their right hand to avoid variability that might accrue from hand-specific biases. In previous work, we found that pointing with the non-dominant hand does not exert a strong influence on pointing errors in this paradigm (Philbeck et al., in press). After every pointing response, the experimenter recorded the direction of the pointer to the nearest degree and then turned the pointer back to 180° on the pointer scale. In verbal report trials, participants were instructed to respond in degrees using a 0–90° “quadrant” format, coupled with an indication of the target’s quadrant location (e.g., “back right”). Thus, for example, for a target placed at 100° to the right of the participant, a correct response would be “10°, back right.” To ensure that participants fully understood this 0–90° quadrant format, the experimenter randomly placed an object in one of the two right-side quadrants and asked what the correct response would be. Although we were prepared to offer feedback if participants did not report the correct quadrant in this preliminary trial, this was never required. 
Participants were encouraged to use increments of 10° as often as possible. In our previous work using visual targets in well-lit conditions, observers produced highly accurate and precise judgments of egocentric target azimuth using this response (Philbeck et al., in press).
In auditory trials, participants remained blindfolded throughout the entire block of trials. On each trial, the experimenter placed the cassette player on the table surface with the loudspeaker pointing upwards and aligned with the appropriate azimuth marking underneath the table. Participants were instructed to move their head and torso so as to point their nose at the target as they listened to the stimulus. The noise burst train was played continuously until the experimenter judged that the participant’s nose was indeed pointed at the source location; head pointing accuracy was not measured, but this method ensured that participants genuinely attempted to point their nose on each trial and were reasonably accurate in doing so. The total duration of the burst train on each trial was not recorded, but generally durations ranged between 3 and 8 s. After the burst train was discontinued, the participant generated a response using the specified response type, the response was recorded, and the target was positioned for the next trial. Generally, participants listened to the stimulus first then turned back to their initial position prior to making a response; this avoided an awkward position in the pointing trials. No error feedback was given.
At the beginning of each visual trial, participants raised their blindfold for several seconds to view the target. They were encouraged to move their head and torso so as to get a good view of targets located in the rear quadrant. As participants viewed the target, a piece of paper covered the numbers on the pointing dial. After several seconds, the experimenter announced the response type to be used on that trial. Participants then donned the blindfold and immediately made the specified response. The paper covering the numbers on the pointer dial was removed just after the blindfold was in place and before pointing responses were initiated. The participant’s verbal or pointing response was recorded, and the target was positioned for the next trial. Although donning the blindfold in visual trials imposed a slightly longer delay (approximately 1 s) between the stimulus presentation and the participant’s response than in auditory trials, this difference is likely to be negligible given the relatively small memory load. No error feedback was given.
One conspicuous outlier was removed before analysis: a pointing judgment of 350° for an auditory target at 20° egocentric azimuth. Circular statistics (Batschelet, 1981) yielded highly similar values to the linear means and standard deviations, so we will report only the linear analyses.
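To illustrate why circular statistics are worth checking for directional data, the following minimal sketch computes a circular mean by averaging unit vectors. The function name and the example values are hypothetical; this is a generic textbook formulation and not necessarily the exact procedure used in the analysis reported here.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of angles in degrees, computed by averaging unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

# Hypothetical responses straddling the 0/360 degree wrap-around
responses = [350.0, 20.0]
linear = sum(responses) / len(responses)   # arithmetic mean ignores the wrap-around
circular = circular_mean_deg(responses)    # mean direction on the circle
print(linear, round(circular, 1))
```

When no responses lie near the wrap-around, as is typical for targets between 20° and 160°, the two kinds of mean would be expected to agree closely.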
The rotation axis of the pointer was offset from a vertical axis passing through the center of the chair by approximately 23 cm. Because the chair-centered axis passed through participants’ heads, or nearly so, we will define the egocentric target azimuth as being centered on this chair-centered axis, with zero degrees being straight ahead relative to the torso and azimuth increasing in a clockwise direction. The offset between the chair and pointer axes meant that the physical direction of the target relative to the pointer axis was somewhat different from the direction relative to the chair axis. This directional difference, called “parallax”, complicates the issue of how to calculate and compare errors between verbal reports (which presumably are centered on an axis near the chair center) and pointing responses (centered on a point displaced 23 cm from the chair center). In Experiment 1, we performed separate Analyses of Variance (ANOVA) on the error data using pointing errors calculated relative to the chair-centered reference frame and pointing errors calculated relative to the pointer-centered reference frame. These analyses differed in only minor ways, so we will only report the inferential statistics using the untransformed (pointer-centered) pointing errors. We will, however, report the mean constant errors calculated in both chair-centered and pointer-centered frames of reference as a basis of comparison.
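The size of this parallax can be sketched with simple plane geometry. The example below assumes that the chair axis coincides with the table center, that the pointer axis lies 23 cm straight ahead of the chair axis, and that targets lie 1 m from the table center; the function and constant names are hypothetical, and the values are approximations taken from the apparatus description rather than measured quantities.

```python
import math

POINTER_OFFSET_M = 0.23   # approximate offset of the pointer axis from the chair axis
TARGET_DISTANCE_M = 1.0   # approximate target distance from the table center

def pointer_centered_direction(egocentric_azimuth_deg,
                               offset_m=POINTER_OFFSET_M,
                               distance_m=TARGET_DISTANCE_M):
    """Physical direction of a target relative to the pointer axis, in degrees.

    Assumes the chair axis is at the table center and the pointer axis lies
    straight ahead of it; angles increase clockwise from straight ahead.
    """
    theta = math.radians(egocentric_azimuth_deg)
    x = distance_m * math.sin(theta)              # rightward displacement
    y = distance_m * math.cos(theta) - offset_m   # forward displacement from the pointer
    return math.degrees(math.atan2(x, y)) % 360.0

for azimuth in range(20, 161, 20):
    direction = pointer_centered_direction(azimuth)
    print(f"{azimuth:3d} deg egocentric -> {direction:6.1f} deg from pointer "
          f"(parallax {direction - azimuth:+.1f} deg)")
```

Under these assumptions, a target at 90° egocentric azimuth lies about 103° from the pointer axis, which is consistent with the demonstration given to participants that an accurate response to such a target requires setting the pointer beyond 90°.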
We analyzed constant (signed) errors, which measure the overall tendency to over- or undershoot the physically accurate value. We calculated constant error as the response value minus the physical target azimuth. We also analyzed absolute (unsigned) errors, which measure the overall magnitude of errors, ignoring their direction. This analysis differed in only minor ways from the constant error analysis, so we will not report the absolute errors. Lastly, we also analyzed within-subject variable errors, calculated as the standard deviation across the 6 measurements per condition for each participant. This provides a measure of response consistency, with high error indicating low consistency and vice-versa.
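As a minimal sketch of these three measures for one participant in one condition, consider the following. The response values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not specify which form was used.

```python
import statistics

def error_measures(responses_deg, target_deg):
    """Constant, absolute, and variable error for one participant and condition."""
    signed = [r - target_deg for r in responses_deg]
    constant = statistics.mean(signed)                    # mean signed error
    absolute = statistics.mean([abs(e) for e in signed])  # mean unsigned error
    variable = statistics.stdev(responses_deg)            # sample SD (n - 1); an assumption
    return constant, absolute, variable

# Hypothetical pointing responses (six repetitions) to a target at 140 degrees
constant, absolute, variable = error_measures([130, 125, 120, 128, 122, 125], 140)
print(constant, absolute, round(variable, 2))
```

A negative constant error indicates undershooting of the physical target azimuth. Because variable error is computed around the participant's own mean response, it is unaffected by the size of the constant error.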
Verbal estimates were converted to degrees, ranging from 0 to 360° (e.g., responses of “10° back right” became 100°). Before analysis, the constant and absolute error data were averaged over the six repetitions per condition within subjects. We did not analyze for gender differences or response-type block order effects (i.e., verbal-first versus pointing-first).
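The conversion from the quadrant format can be sketched as follows. Only the two right-side quadrants used in this experiment are handled; the function name and the treatment of "front right" as starting at 0° are assumptions consistent with the example given in the text, and the convention for left-side quadrants is deliberately left out because it is not stated.

```python
# Offsets for the two right-side quadrants; "front right" starting at 0 degrees
# is an assumption consistent with the "10 deg, back right" = 100 deg example.
QUADRANT_OFFSET_DEG = {"front right": 0, "back right": 90}

def verbal_to_azimuth(angle_deg, quadrant):
    """Convert a 0-90 degree quadrant-format report to an azimuth in degrees."""
    if not 0 <= angle_deg <= 90:
        raise ValueError("quadrant-format angles must lie between 0 and 90 degrees")
    if quadrant not in QUADRANT_OFFSET_DEG:
        raise ValueError(f"unsupported quadrant: {quadrant!r}")
    return QUADRANT_OFFSET_DEG[quadrant] + angle_deg

print(verbal_to_azimuth(10, "back right"))
```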
Figure 3 shows the mean constant errors in Part 1 of this experiment. The pattern of results for auditory versus visual targets was quite similar. Pointing responses for both stimulus modalities showed strong underestimations, particularly in the rear quadrant, with errors as large as −19° and −16° for visual and auditory targets, respectively. By contrast, the magnitudes of verbal errors were considerably smaller, reaching maxima of 3.9° and 8.3° for visual and auditory targets, respectively.
We performed a multivariate analysis of variance (MANOVA) on the constant (signed) errors, with stimulus modality, response type and target azimuth as within-subject factors. This analysis showed no main effect of Stimulus Modality (F[1, 13] = 2.59, p = 0.132). However, there was a strong main effect of Response Type (F[1, 12] = 58.30, MSe = 18,052.6, p < 0.0001), with constant errors being more strongly negative in pointing than in verbal reports (see Figure 3). There was also a main effect of Azimuth (F[7, 91] = 5.70, MSe = 307.8, p < 0.0001). These effects were qualified by a Response Type × Azimuth interaction (F[7, 91] = 22.67, MSe = 1,083.0, p < 0.0001): pointing errors grew more negative as azimuth increased, becoming most negative for targets at 140° egocentric azimuth with a mean of −17.9° (collapsing across auditory and visual target conditions). Verbal errors, meanwhile, were generally much smaller, becoming somewhat more positive as azimuth increased. The maximum mean verbal error was +6.7° for targets at 160° egocentric azimuth (again, collapsing across auditory and visual target conditions). These significant differences in response type across target azimuth did not depend on stimulus modality (F[7, 91] = 0.52, p = .816). There was no Response Type × Stimulus Modality interaction (F[1, 13] = 0.04, p = .838), and no Stimulus Modality × Azimuth interaction (F[7, 91] = 1.32, p = .251).
To illustrate the extent to which individual participants tended to produce differing results depending on response type, we calculated difference scores for each participant (mean verbal response minus mean pointing response) for each target azimuth, separately for auditory targets and visual targets. Figure 4 shows the mean difference scores (i.e., averaging the individual difference scores across participants) for each combination of stimulus modality and target azimuth. This shows that, on average, individual participants indeed tended to give larger verbal responses than pointing responses. This is so because pointing responses tended to undershoot the physical target azimuth more than verbal estimates (see Figure 3).
Table 1 shows the mean constant errors in Experiment 1 expressed in both chair-centered and pointer-centered frames of reference. Although this pattern would not necessarily hold true across all error magnitudes and target azimuths, in our data, the constant errors calculated relative to the chair center tended to be somewhat more strongly negative on average than errors calculated relative to the pointer, particularly for targets in the rear hemispace. It is clear that the same pattern of pointing errors exists under both chair- and pointer-centered definitions of error, albeit with slightly different overall magnitudes. This similarity suggests that the difference in constant errors between verbal and pointing responses is not simply an artifact of the frame of reference used to calculate the errors.
Responses in auditory conditions exhibited significantly more within-subject variability than responses in visual conditions, and pointing responses were significantly less variable than verbal responses when auditory targets were involved. A MANOVA on the variable errors, calculated within subjects across 6 measurements per condition, showed a main effect of Stimulus Modality (F[1, 13] = 11.57, MSe = 579.1, p = .005), with variable errors averaging 9.7° and 7.4° for auditory and visual targets, respectively. There was also a main effect of Response Type (F[1, 13] = 10.88, MSe = 298.3, p = .006), with verbal and pointing variable errors averaging 9.4° and 7.8°, respectively. In our previous work (Philbeck et al., in press), we found that verbal and pointing responses to visual targets did not differ in terms of within-subject variability. Interestingly, in the current study, there was a marginally significant Response Type × Stimulus Modality interaction that explains this apparent contradiction (F[1, 13] = 4.49, MSe = 110.0, p = .054): for visual targets, replicating the pattern of our previous work, variable errors for verbal and pointing responses were quite similar (7.76° and 7.20°, respectively). For auditory targets, verbal responses were considerably more variable than pointing responses (11.0° vs. 8.4°, respectively).
There was also a main effect of Azimuth (F[7, 91] = 4.32, MSe = 94.1, p = .004), with variable errors tending to increase with azimuth. This effect was qualified by a Response Type × Azimuth interaction (F[7, 91] = 4.0, MSe = 74.16, p = .0007), such that the variable errors for verbal responses tended to vary more with target azimuth than did those of pointing responses (see Table 2). There was no interaction of Stimulus Modality × Azimuth (F[7, 91] = 1.80, p = 0.97), and no Stimulus Modality × Response Type × Azimuth interaction (F[7, 91] = 0.97, p = .457).
We will comment on the similarity between our auditory localization data and previous research in the General Discussion. Our primary interest in Experiment 1 was to determine whether the large exocentric pointing errors found in visual space also appear in auditory space. The results clearly demonstrate that the same pattern of errors was indeed found for both visual and auditory exocentric pointing tasks.
Although we carefully instructed participants to align the pointer directly with the target, might they have instead set the pointer to match the target azimuth defined relative to themselves? If so, one would expect the pattern of errors shown in the dashed curve in the top panel of Figure 3. This curve was created by comparing egocentrically-defined azimuths for various positions around the table with the corresponding exocentric directions of those positions relative to the pointer. As the figure shows, the observed pattern of errors peaks at a different azimuth and is larger in magnitude than the pattern predicted if participants had mistakenly pointed egocentrically. This argues against the idea that participants pointed using an egocentric frame of reference instead of following the instructions.
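The geometric comparison behind such a predicted-error curve can be sketched as follows. This is only an illustration under stated assumptions, not the computation actually used for Figure 3: the coordinate convention, the function names, and the example target distance and pointer offset (1.0 and 0.3, in arbitrary units) are hypothetical values introduced here.

```python
import math


def exocentric_direction(ego_azimuth_deg, target_dist, pointer_offset):
    """Direction of a target relative to the pointer axis, in degrees.

    Assumed (hypothetical) convention: the observer sits at the origin
    facing +y, azimuth increases clockwise from straight ahead, and the
    pointer axis lies pointer_offset units directly in front of the observer.
    """
    theta = math.radians(ego_azimuth_deg)
    x = target_dist * math.sin(theta)  # rightward displacement of the target
    y = target_dist * math.cos(theta)  # forward displacement of the target
    # Re-express the target position relative to the pointer axis.
    return math.degrees(math.atan2(x, y - pointer_offset))


def predicted_error_if_egocentric(ego_azimuth_deg, target_dist, pointer_offset):
    """Signed error expected if the pointer is set to the egocentric azimuth
    rather than to the correct exocentric direction."""
    correct = exocentric_direction(ego_azimuth_deg, target_dist, pointer_offset)
    return ego_azimuth_deg - correct


# Example with the assumed geometry: target at 90 degrees egocentric azimuth.
example_error = predicted_error_if_egocentric(90, 1.0, 0.3)
```

With these assumed values, a pointer set to 90° for a target at 90° egocentric azimuth would undershoot the correct exocentric setting by roughly 17°. The predicted error is zero for targets straight ahead or directly behind and negative in between.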
Considering just the visual target condition, our results qualitatively replicated the pattern of verbal vs. pointing differences we found in our previous work (Philbeck et al., in press). The most negative mean pointing error for visual targets in Experiment 1 was −19.5°, calculated relative to the pointer center; this corresponds to a value of −23° calculated relative to the chair center. This is somewhat less negative than the −35° chair-centered errors we found for targets at 140° in our previous study (Philbeck et al., in press). The maximum mean verbal error in the current experiment was +3.9° for visual targets at 160° egocentric azimuth, compared with +3.0° for targets at 160° egocentric azimuth in our previous study. These differences may be related to changes in the laboratory environment between the current study and our previous work. Previously, we used a square tabletop (instead of circular, as in the current study) and the laboratory walls were visible (instead of occluded by a circular curtain, as they were here). Eliminating rectilinear features of the environment deprives participants of salient contextual cues that might otherwise be used as a reference when generating responses (Cuijpers et al., 2001). If these changes in apparatus and laboratory are responsible for the differing pointing error magnitudes between studies, however, this suggests that the presence of rectilinear features in the environment may somehow amplify exocentric pointing errors. More research is required to resolve these issues.
Any observed differences in the pattern of responses between two response types (e.g., verbal reports and manual pointing) could arise from at least two sources: (1) One source is at the level of the perceptual representations that presumably underlie the two response types. For example, our interpretation of the greater errors in pointing than in verbal reports in our previous work (Philbeck et al., in press) was that pointing reflected distortions in the perceived exocentric directions of the targets relative to the pointer, while verbal estimates reflected the more accurately-perceived egocentric target azimuths. (2) An alternative possibility is that differences between pointing and verbal responses could be due to biases introduced at the response execution stage (Carlile et al., 1997; Gilkey et al., 1995). In this view, participants perceive the target location equally accurately regardless of which response type is used, but the two responses are calibrated differently. Perception may only be indexed by way of behavior, and all behavioral responses are subject to calibration in some form. Thus, response biases are impossible to rule out completely. Nevertheless, in our previous work (Philbeck et al., in press), we found that large pointing errors persisted even when participants pointed with full vision of both the pointer and the visual target. This allowed participants the opportunity to correct for any response biases arising in the process of pointer manipulation. We have also noticed, more informally, that when someone else aims the pointer accurately at a visual target, observers with unrestricted vision often believe that the pointer is so strongly misaligned that they express disbelief at the accuracy of the setting, even though they never touch the pointer. This argues strongly that these effects, at least in the visual domain, are indeed perceptual.
To provide insight into whether the pointing biases in Experiment 1 were due to distortions in the perceived exocentric target direction or to response biases elicited by the task of manipulating the pointer, in Experiment 2, we included a condition in which targets were specified purely by spatial language (e.g., “rotate the pointer to 60°”; see Gilkey et al., 1995 for a similar procedure). The logic of this approach is that errors in the spatial representation that presumably controls the response should be minimized by using a non-perceptual source. Accurate pointing to targets specified by spatial language would lend force to the argument that the exocentric pointing errors in Experiment 1 were indeed due to perceptual errors, rather than to response biases introduced at the output stage.
We investigated one other issue in Experiment 2. In our previous work (Philbeck et al., in press), we hypothesized that pointing biases such as those in Experiment 1 are related to the frame of reference underlying the task (i.e., the exocentric task of aligning two points, both of which are displaced from the body), rather than involvement of the motor system in setting the pointer. To test this in our previous work, we asked participants to point using a different set of instructions: although they used the same pointing apparatus (thus keeping the motor requirements the same), participants now adjusted the pointer to reproduce the target direction defined relative to their body. Thus, an accurate response for a target 90° to the participant’s right would be 90°. We found that pointing errors were greatly reduced under these instructions (Philbeck et al., in press), supporting the idea that manipulation of the pointer, per se, was not the primary cause of the pointing biases in the exocentric task of aligning the pointer with the target.
In Experiment 2 here, we again hypothesized that the biases observed in pointing to auditory targets in Experiment 1 were related to the exocentric nature of the task. Following our previous logic (Philbeck et al., in press), we altered the frame of reference underlying the pointing response by instructing participants to adjust the pointer to reproduce the target direction defined relative to their body. If biases introduced by the motor system are the primary source of the pointing errors in Experiment 1, pointing responses should closely mimic those of Experiment 1, because the motor requirements are virtually identical. If some aspect of the frame of reference underlying the response played a role in creating the errors, responses should more closely resemble the verbal reports of Experiment 1, which were presumably egocentric-referenced.
Fourteen people (8 women and 6 men, average age = 19.0 years, range = 18 to 20 years) participated in the study for course credit. None of them had participated in Experiment 1. All but one were right-handed. All participants gave their written consent and reported having normal or corrected vision in both eyes and normal hearing in both ears. This study was approved by the George Washington University ethics committee (Washington, DC, USA) and was performed in accordance with the ethical standards of the 1964 Declaration of Helsinki.
In Experiment 2, participants always used the pointing device to indicate the target azimuth. Experiment 2 involved the same 8 egocentric target azimuths as in Experiment 1. Here, however, target locations were specified to participants either by presenting noise bursts by cassette player at one of the possible azimuths, or by spatial language (i.e., the experimenter verbally called out an egocentric azimuth, and participants attempted to set the pointer without vision to the specified azimuth). The auditory and verbal stimulus modalities were blocked and counterbalanced, with equal numbers of participants being exposed to auditory targets first versus verbal targets first. Each combination of stimulus modality and target azimuth was measured 6 times, for a total of 96 trials. Trials were fully randomized within each stimulus modality block.
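The blocked, within-block randomized design described above can be sketched as a trial schedule. This is a minimal illustration rather than the procedure actually used to generate trial orders: the function name, the seeding, and the assumption that the 8 azimuths are spaced in 20° steps from 20° to 160° are introduced here.

```python
import random

# Assumed: 8 azimuths in 20-degree steps spanning 20 to 160 degrees.
AZIMUTHS_DEG = list(range(20, 161, 20))
REPETITIONS = 6  # measurements per modality-by-azimuth combination


def build_schedule(first_modality, seed=None):
    """Return a list of (modality, azimuth) trials for one participant.

    Modalities are blocked; trials are shuffled within each block.
    first_modality sets the counterbalanced block order.
    """
    order = ["auditory", "verbal"]
    if first_modality not in order:
        raise ValueError("first_modality must be 'auditory' or 'verbal'")
    if first_modality == "verbal":
        order.reverse()

    rng = random.Random(seed)  # hypothetical seeding, for reproducibility
    schedule = []
    for modality in order:
        block = [(modality, az) for az in AZIMUTHS_DEG for _ in range(REPETITIONS)]
        rng.shuffle(block)  # full randomization within the block
        schedule.extend(block)
    return schedule


schedule = build_schedule("verbal", seed=1)
```

Each block holds 8 × 6 = 48 trials, so the two blocks together give the 96 trials per participant.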
The experiment was conducted in the same laboratory as Experiment 1 and involved the same circular curtain, the same circular table surrounding the participant, and the same pointing device. The same auditory stimulus was used as in Experiment 1.
Participants were individually tested in one session lasting no more than 30 minutes. They were instructed to adjust the pointer so that it reproduced the direction of the target relative to their head. This was demonstrated by placing a visual target 90° to the right of the participant (i.e., relative to an axis passing vertically through the center of the chair), and showing the participant that an accurate pointing response would require setting the pointer to 90°. Additionally, a cartoon head was affixed to the polar scale directly beneath the pointer, centered on the pointer axis. This was done to further illustrate that participants should point to indicate the target direction relative to an axis centered on their head. As in Experiment 1, all participants pointed with the right hand. After every pointing response, the experimenter recorded the direction of the pointer to the nearest degree and then turned the pointer to 180° on the pointer scale.
In both auditory and verbal target conditions, participants remained blindfolded throughout the entire block of trials. For verbal target trials, the experimenter announced the target direction and participants immediately adjusted the pointer. In auditory target trials, as in Experiment 1, participants were instructed to turn their head and torso to point their nose at the target while the stimulus was presented. The experimenter terminated the sound stimulus when the participant’s head was aligned with the target. Immediately after the stimulus was terminated, participants initiated their pointing response. After every pointing response, the participant’s response was recorded, the pointer was returned to 180° on the pointer scale, and the target was positioned for the next trial. No error feedback was given.
Constant and variable errors were calculated in a manner similar to Experiment 1, with the exception that here, pointing errors were calculated relative to the egocentric (chair-centered) target azimuth. Absolute errors were calculated by taking the absolute value of the constant error for a given condition. Responses that indicated a target on the left side of the egocentric body midline were removed, unless they were very near (±5°) 0° or 180°. This resulted in the removal of 4 data points, all in the auditory target condition. Three were from a single condition in one participant (auditory target at 160°), with 3 of the 6 responses being 203°, 190°, and 193°. These responses perhaps reflected left-right confusion, although it is not clear whether the confusion was in perceiving the target or in using the pointer. One other left-sided response was removed: a pointing response of 198° for a stimulus azimuth of 20°. A fifth data point was removed because it was a particularly conspicuous outlier (pointing response of 170° for an auditory target at 40°; other responses for that condition were 43°, 46°, 61°, 95°, and 99°). These removals amounted to 0.4% of the entire data set.
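The three error measures can be illustrated with a short sketch. The function names are hypothetical, and treating the variable error as the sample standard deviation of the signed errors within a condition is an assumption; the text does not state the exact formula. The example values are the five retained responses for the auditory target at 40° reported above.

```python
import statistics


def signed_errors(responses_deg, target_deg):
    """Signed error of each response: response minus target azimuth."""
    return [r - target_deg for r in responses_deg]


def constant_error(responses_deg, target_deg):
    """Mean signed error for one participant in one condition."""
    return statistics.mean(signed_errors(responses_deg, target_deg))


def variable_error(responses_deg, target_deg):
    """Within-condition variability (assumed here: sample standard deviation)."""
    return statistics.stdev(signed_errors(responses_deg, target_deg))


def absolute_error(responses_deg, target_deg):
    """Absolute value of the constant error for the condition."""
    return abs(constant_error(responses_deg, target_deg))


# Retained responses for the auditory target at 40 degrees (outlier removed).
retained = [43, 46, 61, 95, 99]
ce = constant_error(retained, 40)
ve = variable_error(retained, 40)
ae = absolute_error(retained, 40)
```

For this single condition, the constant error works out to 28.8°, and the absolute error has the same magnitude because the constant error is positive.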
The mean constant errors across azimuths are shown for each stimulus modality in Figure 5. Although there was some overestimation of egocentric azimuth for auditory targets, there was little evidence of the undershooting in the rear quadrant shown in the exocentric pointing of Experiment 1. We performed a MANOVA on the constant errors, with Stimulus Modality and Target Azimuth included as within-subjects factors. This analysis showed no main effect of Stimulus Modality (F[1, 13] = 1.40, p = .258). Collapsing across stimulus modality, the mean constant error was −0.5°. There was a main effect of Azimuth (F[7, 91] = 4.70, MSe = 372.5, p = .0002), as well as a Stimulus Modality × Azimuth interaction (F[7, 91] = 2.45, MSe = 198.2, p = .024). To investigate this interaction, we conducted planned comparisons between responses in the auditory and verbal target conditions at each azimuth. These analyses showed that only the pointing responses for targets at 60° and 80° differed significantly between stimulus modality conditions (p’s = .029 and .0001, respectively).
A MANOVA on the absolute errors showed only a main effect of Stimulus Modality (F[1, 13] = 9.35, MSe = 1,238.6, p = .009), with mean absolute errors being 10.0° and 14.7° for targets specified verbally versus by audition, respectively. There was no main effect of Azimuth (F[7, 91] = 1.19, p = .318) and no Stimulus Modality × Azimuth interaction (F[7, 91] = 0.97, p = .455).
Table 3 gives the mean variable errors, calculated within-subjects across 6 repetitions per condition and then averaged over participants; errors are listed as a function of stimulus modality (auditory or verbal) and azimuth. A MANOVA on the variable errors showed a main effect of Stimulus Modality (F[1, 13] = 14.81, MSe = 1,142.3, p = .002). There was also a main effect of Azimuth (F[7, 91] = 2.61, MSe = 43.1, p = .017), which is evident in Table 3. There was no Stimulus Modality × Azimuth interaction (F[7, 91] = 0.40, p = .90).
Pointing responses for targets defined by spatial language and audition were quite similar in terms of constant errors across most of the tested target azimuths, with only two exceptions (60° and 80°). However, pointing to targets defined by spatial language was more accurate overall in terms of absolute errors, and more precise in terms of variable error, than pointing to auditory targets. Participants in Gilkey et al. (1995) pointed to verbally-defined targets on a spherical model with accuracy and precision, a finding the authors interpreted as evidence that the errors they observed in pointing to auditory targets using the same apparatus were indeed perceptual in nature and not due to response bias. This approach rests on the assumption that spatial representations created on the basis of spatial language are themselves unbiased. Past research has demonstrated some degree of functional equivalence in spatial representations derived from perceptual sources and spatial language (e.g., Avraamides, Loomis, Klatzky & Golledge, 2004; Loomis, Lippa, Klatzky & Golledge, 2002; see also Zwaan & Radvansky, 1998, for a review), but the extent to which this assumption holds true for representations of azimuth has not been fully explored. Pointing in the spatial-language trials of Experiment 2 was not completely error-free, and this raises the possibility that either some pointing-related response biases remain, or that representations of azimuth based on spatial language are somewhat biased. This being the case, using the spatial language pointing data to infer what a purely perception-based (bias-free) response may have been is inappropriate. Nevertheless, the relatively accurate and precise pointing to targets defined by spatial language in Experiment 2 argues that the exocentric pointing errors in Experiment 1 are not due primarily to pointing-specific response biases.
Figure 5 shows that the egocentric pointing responses to auditory targets in Experiment 2 are quite different from the exocentric pointing responses in Experiment 1 (for comparison, see auditory data in Figure 3, top panel). Most importantly, under identical listening conditions, the large, negative constant errors in Experiment 1’s exocentric pointing all but disappeared in Experiment 2, in which pointing was used to indicate the egocentric target azimuth. In fact, in many ways Experiment 2’s egocentric pointing data more closely resemble the verbal reports in Experiment 1 than they do Experiment 1’s exocentric pointing. Notably, although the Experiment 1 verbal reports exhibited somewhat larger (positive) constant errors for targets in the rear hemispace than did pointing responses in Experiment 2, neither type of egocentric response showed the large, negative constant errors in the exocentric pointing judgments of Experiment 1. The pattern of egocentric pointing errors for auditory targets in the current experiment also very closely resembles that of the egocentric pointing responses in our previous work involving visual targets (Philbeck et al., in press).
In principle, one could also collect verbal indications of exocentric directions to address these issues. If the frame of reference, rather than the motor requirements of the task, underlies the large response errors for targets in the rear hemispace, one might expect exocentric verbal responses to show a similar pattern as exocentric pointing responses. While this remains an interesting avenue for future research, fully exploring this prediction would be beyond the scope of the current paper. One complicating factor is that verbal indications of exocentric directions are themselves subject to systematic and random error from various sources, and additional experimentation would be required to determine the extent to which the exocentric verbal responses are indeed responsive to perceived exocentric directions. Experiments investigating verbal judgments of exocentric directions are currently underway in our laboratory.
Experiment 1 clearly shows that the pattern of exocentric pointing errors is quite similar for auditory and visual targets, and that these errors in both modalities co-exist with a markedly different pattern in verbal reports of egocentric target azimuth (particularly for targets in the rear hemispace). In this discussion, we will first address the issue of how to interpret the apparent discrepancy between exocentric and egocentric responses. We will then compare and contrast our auditory localization data with previous work, and discuss the implications of the similarity in visual and auditory exocentric pointing.
One possible explanation for the difference between exocentric pointing and verbal reports of egocentric target azimuths is that, although both are responsive to the perceived egocentric target azimuth, one response is more susceptible to biases at the output stage (Gogel, 1990). Presumably, pointing would be the more biased of the two responses, because it shows the larger absolute errors. In this view, pointing responses should not be taken at face value as direct (relatively untransformed) indications of perceived azimuth. Mechanical constraints associated with the physical properties of the hand, for example, could introduce biases into pointing responses. Also, pointing responses require a complex set of perceptuomotor transformations (Soechting & Flanders, 1992), and biases could arise in these transformations as well. Systematic biases (and random error) associated with manipulating the pointer are undoubtedly present in our pointing data to some extent, but response biases do not seem to fully explain the differences between verbal and pointing responses. Experiment 2 showed that participants could use the pointer relatively accurately when reproducing the egocentric target azimuth, as well as when using it to indicate egocentric azimuths specified by spatial language.
A second possible explanation for the differences in exocentric pointing and verbal reports is that the two response types are both relatively untransformed reflections of perception, but the perceptions underlying each response type differ in their accuracy with respect to the physical layout. In this view, egocentric target azimuths were perceived with some accuracy, but the perception of exocentric alignment was subject to large perceptual errors. Taken together, our data more strongly support this possibility. In Experiment 1, exocentric pointing showed large errors, while verbal reports (presumably egocentric-referenced) were more accurate; in Experiment 2, an egocentric-referenced version of the pointing task yielded a pattern of errors quite similar to that of the verbal reports of Experiment 1. Our interpretation is that response type, per se, is not the source of the differences between exocentric pointing and verbal reports of azimuth. Instead, the difference lies in the relatively accurate perception of egocentric azimuth (given sufficient stimulus support) versus the more distorted perception of exocentric alignment (in the same cue environment). The two response types yield different patterns to the extent that they are differentially responsive to these perceptions. Future experimentation involving verbal indications of exocentric directions promises to further clarify the validity of this interpretation. In particular, exocentric judgments may yet be influenced by non-perceptual response biases that are independent of the motor requirements of the response mode. Examining verbal judgments of exocentric directions may help identify such biases.
Interestingly, strong biases related to exocentric reference frames are also apparent in haptic space. For example, haptic judgments of parallelity, in which rods are manually adjusted to be perceived parallel, are biased toward a reference frame centered on the hand (in our parlance, an exocentric reference frame; Kappers, 1999; Kappers et al., in press). Although similar hand-centered biases may play some role in our task, we have argued that the errors we observed here and elsewhere are not crucially dependent on using the hand when responding (Philbeck et al., in press). In this context, it is important to mention evidence that the functional goal of behavioral responses influences not only the reference frame used for controlling the responses, but also the brain mechanisms underlying them (for reviews, see Creem & Proffitt, 2001; Milner & Goodale, 1995). If errors in perceiving exocentric directions indeed do not crucially depend on involvement of the motor system, this suggests that the errors do not stem from distortions in cortical spatial representations specialized for controlling action-based responses.
Our work shows that large errors in indicating the exocentric alignment of auditory and visual targets apparently coexist with relatively accurate indications of egocentric target azimuths, given sufficient stimulus support. How may this apparently paradoxical result be explained? One possibility is that the egocentric target azimuths are in fact perceived relatively accurately, while simultaneously the exocentric directions of the target with respect to the pointer are perceived inaccurately. This could occur if, for example, there is a functional dissociation in the neural representations of egocentric locations and exocentric directions. Another way this could occur is if the phenomenal geometries underlying visual and auditory space are not Euclidean. Several researchers have concluded that visual space, at least, is indeed non-Euclidean (e.g., Cuijpers et al., 2000, 2001; Foley, 1972; Koenderink et al., 2000, 2002, 2003). Foley (1965), for instance, suggested that the visual angle between points in a horizontal plane is over-perceived. This would result in a non-Euclidean phenomenal geometry, and could explain the undershooting we observed when participants attempted to align the pointer with remembered target locations: if the angular separation between pointer and target is over-perceived, the pointing angle required to aim at the target would not need to be as big, and thus participants should produce pointer angles that are too small.
Although a variety of studies have investigated auditory localization (e.g., Brungart et al., 1999; Carlile et al., 1997; Gilkey et al., 1995; Langendijk & Bronkhorst, 2002; Thurlow & Runge, 1967; Wightman & Kistler, 1989, to name only a few), perhaps the most similar to ours in terms of the available auditory information is that of Makous and Middlebrooks (1990). In this study, as in ours, broad-band stimuli were used, participants were allowed to move their head during stimulus presentation in one condition, and responses were unconstrained (i.e., not limited to a fixed set of possible responses; see Perrett & Noble, 1995). Unlike our study, head pointing was used to measure perceived location (following an extensive training period involving 10–20 sessions with error feedback), and the study was conducted in an anechoic environment. The head pointing technique is likely to be egocentric-referenced, so these data are most comparable to the verbal reports in our Experiment 1 and the egocentric pointing responses in our Experiment 2. Extensive training with feedback no doubt increased the response precision in this study relative to ours, and the constellation of auditory stimulus cues was somewhat different, owing to the anechoic environment and other differences in the specific sound stimuli and method of presentation. Thus, direct comparison between Makous and Middlebrooks (1990) and our study is probably inappropriate. Nevertheless, our results are consistent with theirs along several lines. They found that directional biases for frontal stimulus locations were quite small, whereas biases for rearward targets were somewhat larger, averaging 4.6°; these biases were such that rearward targets were localized to be farther toward the rear than the physical target location. Our own auditory data show the same pattern, although the absolute magnitude of the verbal errors was slightly larger (averaging 6.6° for rearward targets).
Makous and Middlebrooks (1990) also found that response variability was greater for rearward locations than frontal locations; this held true in our data as well (see Table 2). Thus, our results replicated the essential features of foregoing research involving broad-band stimuli and closed-loop, egocentric-referenced responses.
Do the errors in exocentric pointing for auditory and visual targets arise independently in each modality, or do they imply some kind of cross-modal linkage? Clearly, at relatively early cortical stages, information processing is functionally and anatomically segregated by stimulus modality. Even at somewhat later stages of processing (e.g., in posterior parietal cortex), multiple spatial maps exist that are segregated according to different body-centered frames of reference (Andersen, Snyder, Bradley, & Xing, 1997). If the errors are due to processing at these stages, they could be independent to some degree.
Another possibility is that the exocentric pointing errors appear for both auditory and visual targets because they arise at a still later stage that receives input from both modalities. Several varieties of this possibility may be distinguished. (1) Information from non-visual sensory modalities could be converted into a visual format in the process of encoding, so that stored three-dimensional spatial information bears the properties of visual space even if vision is not the original source modality (Auerbach & Sperling, 1974; Platt & Warren, 1972; Shelton & McNamara, 2001, 2004; Simpson, 1972; Warren, 1970). (2) Information from multiple modalities could be integrated into a unified spatial representation, in which individual locations remain labeled according to their source modality (Yamamoto & Shelton, 2005). (3) Lastly, information from multiple modalities could be integrated into a unified spatial representation, but once this integrative process is complete, information about the source modality is discarded (Auvray, Gallace, Tan, & Spence, 2007). Thus, subsequent performance based on this amodal representation or “spatial image” becomes independent of input modality (Loomis, Klatzky, Avraamides, Lippa, & Golledge, 2007; Loomis et al., 2002). These three possibilities are not necessarily mutually exclusive, and they may well exist in conjunction with earlier, modality-specific representations. Distinguishing among these possibilities presents significant challenges for future research, but the remarkable similarity in the perception of visual vs. auditory exocentric directions provides a potential source of leverage for studying these issues.
This work was supported in part by NIH Grant RO1 NS052137 to JP.
PsycINFO classification codes: 2323, 2326
1 We calculated errors somewhat differently in the current article than in our previous work. The difference involves where the frame of reference is centered when calculating errors. In the current paper, errors were calculated relative to a frame of reference centered on the pointer axis, while in our previous work, the frame of reference was centered on the axis passing vertically through the participant’s head. As we will see, mean signed pointing errors tended to be somewhat more strongly negative when calculated relative to the participant’s head when compared with errors calculated relative to the pointer.