Previous studies of heading perception suggest that human observers employ spatiotemporal pooling to accommodate noise in optic flow stimuli. Here, we investigated how spatial and temporal integration mechanisms are used for judgments of heading through a psychophysical experiment involving three different types of noise. Furthermore, we developed two ideal observer models to study the components of the spatial information used by observers when performing the heading task. In the psychophysical experiment, we applied three types of direction noise to optic flow stimuli to differentiate the involvement of spatial and temporal integration mechanisms. The results indicate that temporal integration mechanisms play a role in heading perception, though their contribution is weaker than that of the spatial integration mechanisms. To elucidate how observers process spatial information to extract heading from a noisy optic flow field, we compared psychophysical performance in response to random-walk direction noise with that of two ideal observer models (IOMs). One model relied on 2D screen-projected flow information (2D-IOM), while the other used environmental, i.e. 3D, flow information (3D-IOM). The results suggest that human observers compensate for the loss of information during the 2D retinal projection of the visual scene for modest amounts of noise. This suggests the likelihood of a 3D reconstruction during heading perception, which breaks down under extreme levels of noise.
When an observer travels on a straight path, changes in the perceived visual environment are projected onto the retina and form a two-dimensional (2D) radial pattern, referred to as ‘optic flow’ (Gibson, 1950), with a focus of expansion (FOE) in the direction of locomotion, or heading (Gibson, 1950; Gibson, 1979; Gibson et al., 1955). Since Gibson’s seminal study, psychophysical, physiological and theoretical studies have demonstrated that the pattern of optic flow plays an important role in computing heading (see Andersen and Saidpour, 2002; Britten and Van Wezel, 2002; Britten, 2008; Crowell and Banks, 1993; Grigo and Lappe, 1999; Koenderink and van Doorn, 1987; Li et al., 2009; Longuet-Higgins and Prazdny, 1980; Royden, 1997; Royden et al., 2006; Vaina, 1998; Warren et al., 1988; Warren et al., 1991). Psychophysical studies and theoretical models have shown that heading perception is robust under various conditions, including retinal eccentricity (Crowell and Banks, 1993), eye movements with small rotation rates (Lappe et al., 1999; Royden et al., 1992; Royden et al., 1994; Warren et al., 1991) and high levels of noise (Royden, 1997; van den Berg, 1992; Warren et al., 1991).
Previous work has suggested that spatiotemporal integration contributes to the detection of heading. In a psychophysical study, Warren and colleagues (1991) showed that human observers could accurately perceive heading in the presence of uniformly distributed 2D direction noise. Although heading discrimination thresholds increased with direction noise, their study clearly illustrated that the visual system can tolerate a great deal of noise in the velocity field as long as the global structure of the optic flow pattern is preserved. Royden and Vaina (2004) examined a stroke patient who was severely and permanently impaired on local 2D direction discrimination, while his performance on a straight-trajectory heading task was normal. From these results, the authors conjectured that when the observer is moving in a straight line, accurate heading perception does not require the precise estimation of motion directions. These two studies suggest that the spatial integration of local motion signals by the human visual system helps compensate for noise during heading perception. Warren and colleagues (1991) also suggested that the spatial integration of information in two successive velocity fields should be sufficient for the perception of translational heading, implying that extensive temporal integration is not necessary. However, if available, would human observers benefit from continued integration over time, as previously shown for the perception of 2D direction of motion (Watamaniuk et al., 1989)?
In this study we first characterized psychophysically the extent to which the human visual system utilizes spatial and temporal integration mechanisms in the perception of straight-trajectory heading. Second, we developed two ideal observer models to further investigate the properties of the spatial integration mechanism: We asked whether, in computing heading, the human visual system relies on 2D optic flow or on a three-dimensional (3D) reconstruction of the motion in the scene.
In the psychophysical experiment, stimuli simulated forward motion through a cloud of dots whose 3D trajectories were randomly perturbed between successive frames. We applied three types of external noise to the heading stimulus in order to measure the relative contributions of different integration mechanisms. Although both spatial and temporal integration may be used in all experiments to reduce internal noise, consistent performance changes among external noise types indicated differences in the use of integration mechanisms to compensate for the external stimulus noise. In the first experimental condition, we used ‘random-walk’ perturbations in dots’ 3D paths (Warren et al., 1991; Watamaniuk et al., 1989; Williams and Sekuler, 1984) to provide a baseline measure of performance. Since this noise was uncorrelated both in space and time, both spatial and temporal integration mechanisms would be able to successfully reduce stimulus noise. The second experimental condition used a 3D fixed-random-trajectory noise, which was conceptually similar to the ‘fixed-trajectory’ stimuli employed by Watamaniuk and colleagues (1989) and Williams and Sekuler (1984) within 2D direction discrimination tasks. Here the 3D velocity vector of each dot was perturbed and subsequently held constant across frames. These perturbations were fully correlated in time locally, such that temporal integration could not be used to reduce external stimulus noise. In the third experimental condition, we perturbed the global heading location between successive frames, allowing temporal, but not spatial, integration to reduce external noise and aid performance. Through these experimental manipulations of the direction information available in the task, we contrasted the effects of spatial and temporal integration mechanisms with respect to a baseline in which both mechanisms were used.
Subjects had relatively worse performance for the case where solely temporal integration was used to reduce noise, suggesting that although a temporal integration mechanism was involved, it played a lesser role than the spatial integration mechanisms in processing optic flow.
To determine how the human visual system utilizes spatial information during heading perception, we developed two Ideal Observer Models (IOMs). In the first IOM, we used the projected 2D dot motions (optic flow), while the second IOM had available the full 3D motion information of each dot. Through comparison of human psychophysical performance (from the random-walk experimental condition) to these two models, we showed that observers became more efficient at estimating heading as noise increased (up to 40–60° of direction noise), and that the efficiency changes partly matched the amount of information lost during the projection from 3D to 2D. This suggests that observers’ heading estimates under noisy conditions were based not on the (2D) optic flow per se but rather on motion estimates obtained from reconstructing the 3D visual environment through which the observer translates.
Part of the psychophysical work was presented at the Vision Science Society 2005 Annual Meeting (Sikoglu and Vaina, 2005) and part of the work on ideal observer models was presented at the Vision Science Society 2006 Annual Meeting (Sikoglu et al., 2006).
Stimuli consisted of random dot kinematograms (RDK) generated on an Apple Macintosh G4 computer and displayed on a 17” Apple CRT monitor. RDK motion sequences were displayed at 75 Hz in a calibrated gray-scale mode at a screen resolution of 832 × 624 pixels. Each RDK was displayed in an imaginary square aperture subtending 44.5° × 44.5° at a viewing distance of 30 cm. The dots were distributed in a virtual trapezoidal volume whose bases were located 400 cm and 1500 cm from the observer. Dots were 2 × 2 pixels (4 × 4 arcmin) and were placed with a density of 2 dots/deg². The motion of the dots within this volume simulated the observer’s self-motion along a straight-line trajectory at a speed of 100 cm/sec. Dots moving outside the trapezoidal volume were randomly assigned to new locations such that the density of dots inside the 3D volume was held constant. In each trial, the direction of self-motion, defined by the FOE, was randomized along an imaginary horizontal line extending through the center of the display, so that the FOE could be located at a range of positions within ± 22.25° of the screen center. The RDK stimulus was presented for 480 msec (12 frames), with each frame updated every 3 screen refreshes, resulting in an effective frame duration of 40 msec. At the end of the motion a new static random dot pattern, with the same spatial statistics, was displayed together with a vertical target line (8.96° long) that intersected the horizontal midline of the display. In all experimental conditions, the psychophysical variable of interest was the distance between the target and FOE, which provided a measure of heading accuracy and is referred to as target offset. Target offset levels ranged between 0.0159° and 14.83°. In each stimulus, the dots and target were white (79.55 cd/m²) and displayed against a gray background (10.22 cd/m²).
The 3D trajectories of each dot were randomly perturbed between successive frames in three different experimental conditions: 1- Direction noise from constrained random-walk (Fig. 1a); 2- Direction noise from constrained fixed-random-trajectory (Fig. 1b) and 3- Random heading direction noise (Fig. 1c).
In experimental condition 1 (random-walk), the solid angle (θ) of each dot’s 3D displacement vector was randomly selected from a normal probability distribution, with a specified standard deviation (σnoise), centered around the dot’s unperturbed motion (Fig. 2a). The 3D displacement vector’s magnitude was constant between each pair of frames. A random direction perturbation was applied to each dot in each frame independent of its perturbation in the previous frame (random-walk noise) (Fig. 1a), such that the direction noise was spatially uncorrelated across dots within each frame and temporally uncorrelated across frames. Since this noise was spatially and temporally uncorrelated, motion vectors could be averaged over space and time, thus reducing the effect of external stimulus noise. This 3D noise resulted in perturbations of the local 2D displacement vectors, as illustrated in Fig. 2b and Fig. 2c, that were well characterized by an offset exponential.
In experimental condition 2 (fixed-random-trajectory), the perturbed trajectory of each dot was held constant after the first stimulus frame regardless of its change in position within the display (fixed-random-trajectory noise) (Fig. 1b), creating noise that was spatially uncorrelated across dots within each frame, but temporally correlated across frames since the same deviation was applied to each dot throughout its lifetime.
In experimental condition 3 (random-heading), direction noise was applied by shifting the direction of heading, characterized by the focus of expansion (FOE), in each stimulus frame (random-heading noise) (Fig. 1c). The amount of FOE-shift was randomly selected from a normal distribution, with a specified standard deviation (σFOE-shift), centered around the actual heading. All dots moved coherently according to the heading direction of the current frame, thus creating a frame-wise coherent global motion percept. Random-heading direction noise was spatially correlated within each frame (i.e. each dot had the same 3D perturbation) but temporally uncorrelated across frames. The spatial correlation between direction vector perturbations limited the use of spatial integration for the accurate estimation of dot trajectories and heading location. Throughout the duration of the stimulus, shifting the FOE location had the effect of introducing local motion noise with respect to the unperturbed heading angle, thus providing a common measure of direction noise across all three experimental conditions. In the subsequent analysis of the random-heading results, we simulated shifted FOE locations within the heading stimulus to determine the relationship between the global perturbation and the local direction noise.
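The correlation structure of the three noise types can be illustrated with a minimal Python sketch. It is not the stimulus code used in the experiment: the function names, the coordinate convention (x to the right, z along the line of sight), and the uniform choice of rotation azimuth around the unperturbed vector are all assumptions made for illustration, and only unit direction vectors are generated, not dot positions or screen projections.

```python
import math
import random

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot_flow_direction(heading_deg):
    # Assumed convention: a static dot moves opposite to the observer's
    # translation; heading is an azimuth in the horizontal plane.
    a = math.radians(heading_deg)
    return (-math.sin(a), 0.0, -math.cos(a))

def perturb(v, theta_deg, phi_deg):
    # Tilt unit vector v by theta away from itself, at azimuth phi around v.
    # The vector's magnitude is unchanged.
    v = unit(v)
    helper = (1.0, 0.0, 0.0) if abs(v[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(v, helper))
    w = cross(v, u)
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return tuple(math.cos(t) * v[i]
                 + math.sin(t) * (math.cos(p) * u[i] + math.sin(p) * w[i])
                 for i in range(3))

def noisy_directions(condition, n_dots, n_frame_pairs, heading_deg, sigma_deg, rng):
    # Returns one list of per-dot unit direction vectors for each frame pair.
    base = dot_flow_direction(heading_deg)
    draw = lambda: (rng.gauss(0.0, sigma_deg), rng.uniform(0.0, 360.0))
    fixed = [draw() for _ in range(n_dots)]  # used only by fixed_trajectory
    frames = []
    for _ in range(n_frame_pairs):
        if condition == "random_walk":
            # New perturbation for every dot in every frame pair.
            frame = [perturb(base, *draw()) for _ in range(n_dots)]
        elif condition == "fixed_trajectory":
            # Each dot keeps the perturbation drawn for it at the start.
            frame = [perturb(base, *fixed[d]) for d in range(n_dots)]
        elif condition == "random_heading":
            # One heading shift per frame pair, shared by all dots.
            shifted = dot_flow_direction(heading_deg + rng.gauss(0.0, sigma_deg))
            frame = [shifted] * n_dots
        else:
            raise ValueError(condition)
        frames.append(frame)
    return frames

rng = random.Random(1)
for name in ("random_walk", "fixed_trajectory", "random_heading"):
    frames = noisy_directions(name, n_dots=3, n_frame_pairs=2,
                              heading_deg=5.0, sigma_deg=15.0, rng=rng)
    print(name, frames[0][0], frames[1][0])
```

In this sketch the random-walk vectors differ across both dots and frames, the fixed-trajectory vectors differ across dots but repeat across frames, and the random-heading vectors are identical across dots within a frame but change from frame to frame.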
Prior to the start of an experimental session, observers adapted for 5 minutes to the background luminance of the monitor display in a quiet, darkened room. Each trial was preceded by an auditory cue, immediately followed by the RDK stimulus (480 msec) and then by the presentation of a static random dot pattern containing the target line. During the psychophysical task, observers were required to fixate a small central cross (0.75° × 0.75°). Stimuli were presented binocularly in a two alternative forced choice (2AFC) paradigm with no feedback and the observers’ task was to determine whether their heading was to the left or to the right of the target line. Responses were entered by pressing a predetermined button on the computer keyboard.
Observers’ target offset thresholds (79% correct) were estimated as the average over the last six reversals of the 3-down/1-up constant step size portion of an adaptive staircase procedure (Vaina et al., 2003). In all experimental conditions observers’ performance was reported as the mean threshold averaged across at least three staircases. Two additional staircase thresholds were collected for each test condition containing a measured threshold greater than two standard deviations from the mean.
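A 3-down/1-up rule lowers the target offset after three consecutive correct responses and raises it after any error, which converges near the 79% correct point. The following Python sketch covers only a constant-step portion of such a staircase and is not the procedure of Vaina et al. (2003); the function name, the `respond` callback, the starting level and the step size are hypothetical.

```python
def staircase_threshold(respond, start, step, n_reversals=8, floor=0.0,
                        max_trials=10000):
    """Constant-step 3-down/1-up staircase.

    respond(level) returns True for a correct response at that target offset.
    The threshold is the mean level at the last six reversals.
    """
    level = start
    correct_run = 0
    last_direction = 0  # -1 = last move was down, +1 = up, 0 = no move yet
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_run += 1
            if correct_run < 3:
                continue  # stay at this level until three in a row
            correct_run = 0
            direction = -1
        else:
            correct_run = 0
            direction = +1
        if last_direction and direction != last_direction:
            reversals.append(level)  # direction changed at this level
        last_direction = direction
        level = max(floor, level + direction * step)
    if len(reversals) < 6:
        raise ValueError("staircase did not produce six reversals")
    return sum(reversals[-6:]) / 6.0

# Idealized, noise-free observer: correct whenever the offset is at least 2 deg.
threshold = staircase_threshold(lambda level: level >= 2.0, start=4.0, step=0.5)
print(threshold)
```

With this idealized observer the staircase oscillates between the two levels that bracket the observer's limit, so the estimate falls between them; a real observer responds probabilistically and the reversal levels scatter more widely.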
For experimental conditions 1 and 2, target offset thresholds were obtained for σnoise of 6.32°, 15.09°, 31.58°, 37.54°, 43.51°, 56.84°, 69.47°, 82.11°, 94.74°, 107.37° and 120°. For experimental condition 3, target offset thresholds were obtained for σFOE-shift of 3°, 6°, 9°, 12° and 15°.
Five observers (4 females, 1 male, mean age = 23.2 years, SD = ±2.17) participated in three experimental tasks. All had normal or corrected-to-normal vision. One observer, ES, was an experienced psychophysical observer and an author. The other four observers were naïve and unaware of the purpose of the experiments. All participants gave written informed consent before the start of the experimental sessions in accordance with Boston University’s Institutional Review Board Committee on research involving human subjects.
To determine observers’ ability to perform the heading discrimination task, we first conducted a screening experiment using the test described in sections 2.1 and 2.2 and replicated from the work of Royden and Vaina (2004). Direction perturbation was not used during this screening period. Observers practiced the task for one hour. At the end of the hour each observer’s target offset threshold was approximately 2°, consistent with the performance of normal observers previously reported for this task (Vaina and Soloviev, 2004; Warren et al., 1988).
Figure 3 shows discrimination thresholds for heading direction (mean target offset threshold ± SE) for each observer plotted against the standard deviation of the random-walk noise distribution (σnoise). Thresholds averaged across the five observers (± SE) are indicated by the shaded region. In order to illustrate the effect of an increase in random-walk direction noise on the accuracy of heading judgments, a repeated-measures ANOVA was performed on the observers’ data. There was a significant effect of noise on heading perception across observers (F(1,49)=217.18, p<0.0001). Due to the set-up of the experiment, the maximum measurable target offset was 12°; therefore, in the statistical analysis we considered only those noise levels whose thresholds were less than 12° for all observers. For the five observers tested, this corresponded to a maximum σnoise of 107.37°, and at this level the mean target offset threshold across observers was approximately 8°.
In a similar heading stimulus, Warren et al. (1991) applied random-walk noise with uniform distributions characterized by direction bandwidths of 45°, 90°, and 135° to 2D flow fields. In order to compare our results with theirs, we determined the best-fit distribution of projected 2D flow vectors perturbed with random-walk noise. A least squares fit using uniform distributions with different bandwidth values was used to determine the bandwidth value with the highest correlation and lowest Kullback-Leibler (KL) distance. For σnoise of 37.54°, the bandwidth for the distribution of 2D perturbations was approximately 140°. At this level the average threshold was roughly 4° for the random-walk condition. This is similar to the performance reported by Warren et al. (1991), i.e. an average threshold of roughly 3.5° for a bandwidth of 135°.
Since random-walk direction noise was chosen independently for each dot and for each frame, the noise vectors present in each stimulus were uncorrelated in space and time. Accurate estimates of heading direction could utilize spatiotemporal integration of local dot trajectories, providing a performance baseline. Using fixed-random-trajectory direction noise, we investigated the case where the direction noise applied in 3D was spatially uncorrelated but temporally correlated. In this case, the noise applied to each dot was constant for the duration of each trial. Since the noise was temporally correlated, the temporal integration mechanisms should not be able to reduce stimulus noise. Thus, the purpose of using fixed-random-trajectory direction noise was to determine the effect of impairing the temporal integration mechanisms’ ability to reduce the external noise, thus allowing isolation of spatial integration mechanisms.
Figure 4 shows discrimination thresholds for heading direction (mean target offset thresholds ± SE) for each observer plotted against the standard deviation of the fixed-random-trajectory noise distribution (σnoise). Heading discrimination across observers significantly worsened with increasing fixed-random-trajectory noise, as indicated by a repeated-measures ANOVA (F(1,34)=92.51, p<0.0001). As in the random-walk condition, we considered only those values of σnoise that resulted in measurable thresholds for all observers. In the case of fixed-random-trajectory noise, this corresponded to a σnoise of 69.47° (compared to 107.37° in random-walk noise) and a mean target offset threshold of 8.5° for the observers tested. We attribute this increased sensitivity to direction perturbations to the inability of temporal mechanisms to reduce stimulus noise.
To compare the thresholds from random-walk and fixed-random-trajectory direction noise, a generalized linear model (GLM) was fit to the thresholds averaged across subjects,
$$\text{Threshold} = A + B\,\sigma_{noise} + C\,\eta\,\sigma_{noise}$$
where η was a binary classifier indicating the type of direction noise, (i.e. for random-walk (experimental condition 1) η = 1 and for fixed-random-trajectory (experimental condition 2) η = 0), A was the offset for the threshold versus σnoise fit, B was the slope term of the threshold versus σnoise fit and C was the interaction term between the type and amount of direction noise. The interaction term denoted whether the slopes were the same or different for two types of direction noise conditions. If the interaction term was not significant, then the slopes for different types of direction noise were the same. Note that the slope of the fit for the random-walk condition alone corresponded to the sum of B and C, while B gave the slope for the fixed-random-trajectory condition.
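As a minimal sketch of this kind of fit, the Python code below estimates A, B and C by ordinary least squares, assuming the model has the form threshold = A + B·σnoise + C·η·σnoise implied by the definitions above. The function name and the data are hypothetical (the thresholds are synthetic and noise-free, not the measured values), and the significance test on the interaction term is not reproduced.

```python
def fit_interaction_model(sigmas, etas, thresholds):
    """Least-squares estimate of (A, B, C) in T = A + B*sigma + C*eta*sigma."""
    rows = [(1.0, s, e * s) for s, e in zip(sigmas, etas)]
    # Augmented normal equations: (X^T X | X^T y)
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, thresholds))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Synthetic example: shared offset of 1 deg, slope 0.10 for
# fixed-random-trajectory (eta = 0) and 0.06 for random-walk (eta = 1).
levels = [10.0, 20.0, 30.0, 40.0]
sigmas = levels + levels
etas = [0] * len(levels) + [1] * len(levels)
thresholds = ([1.0 + 0.10 * s for s in levels]
              + [1.0 + 0.06 * s for s in levels])

A, B, C = fit_interaction_model(sigmas, etas, thresholds)
print(A, B, C)        # C is the slope difference between the two conditions
print("random-walk slope:", B + C)
```

A negative C in this parameterization means that thresholds rise more slowly with σnoise in the random-walk condition than in the fixed-random-trajectory condition.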
Comparison of the results from random-walk and fixed-random-trajectory direction noise conditions revealed that heading perception under the effect of fixed-random-trajectory noise was significantly worse than heading perception under the effect of random-walk noise (GLM analysis’ (df=92) interaction term: t=6.8526, p<0.0001). In a 2D direction discrimination task, Watamaniuk et al. (1989) performed a similar comparison and found that there was no significant difference between thresholds obtained when the direction noise resulted from random-walk versus fixed-random-trajectory. The difference between our results and those of Watamaniuk et al. (1989) may indicate a difference in the nature of the integration mechanisms employed in heading perception and 2D direction discrimination. The results obtained with fixed-random-trajectory noise suggest that observers temporally integrate across frames and can make use of the acquired temporal information in a heading discrimination task, but this information is not used in 2D direction discrimination.
In order to address quantitatively the properties of the temporal integration employed by the observers, we computed the cumulative direction vector for each dot from the first to last frame in the random-walk condition and then calculated the average noise (deviation of the 12-frame motion vector from the no-noise vector) over a 12-frame window for random-walk stimuli, which resulted in lower effective noise levels (σnoise-effective). For example, a σnoise of 56.84° with random-walk noise corresponded to a σnoise-effective of approximately 17° for a 12-frame window (480 msec) since over time the dot regressed towards its unperturbed motion vector. This allowed us to understand what performance on the random-walk task should be if observers simply averaged direction vectors over the entire stimulus.
The effective noise calculation results in a decrease in the cumulative noise with time by a factor of $1/\sqrt{n-1}$, where (n−1) is the number of frame-pairs. We scaled the noise values for the random-walk experimental condition to simulate temporal integration in an n-frame time window. For each time window, we performed a GLM fit on the average target offset thresholds from the scaled random-walk and fixed-random-trajectory direction noise data. In the case of fixed-random-trajectory noise, the magnitudes of the noise vectors did not change with the duration of the integration, since the perturbed directions for each dot remained identical through the duration of the stimulus. The interaction terms of the GLM fits were not significant (p>0.05) for durations of 4, 5 and 6 frames, suggesting that thresholds in the two conditions did not differ significantly (GLM analysis’ (df=16) interaction term for 4 frames: t=−0.9468, p=0.3578; 5 frames: t=0.6322, p=0.5362; 6 frames: t=1.7960, p=0.0914). The Pearson correlation (R²) values obtained from the pooled data fits provide a measurement of how closely the threshold values from both noise conditions clustered around the fitted line. Therefore, large values of R² indicate that thresholds showed a high degree of similarity between the random-walk and fixed-random-trajectory conditions’ datasets. The best fit was obtained for a stimulus duration of 5 frames (R²=0.91), meaning that thresholds for random-walk were indistinguishable from thresholds for fixed-random-trajectory conditions when scaled noise values were based on less than half the actual stimulus duration (200 msec).
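The noise scaling can be checked with a short Python sketch. It assumes the reduction factor is 1/√(n−1), i.e. averaging independent perturbations over the (n−1) frame-pairs of an n-frame window; this assumption reproduces the example given earlier (σnoise of 56.84° giving roughly 17° for 12 frames). The function name is hypothetical.

```python
import math

def effective_noise(sigma_noise_deg, n_frames):
    """Effective random-walk noise after averaging over an n-frame window."""
    if n_frames < 2:
        raise ValueError("need at least two frames to form a frame-pair")
    # Averaging (n - 1) independent frame-pair perturbations shrinks the
    # standard deviation by sqrt(n - 1).
    return sigma_noise_deg / math.sqrt(n_frames - 1)

for window in (12, 5):
    print(window, "frames:", round(effective_noise(56.84, window), 2))
```

Under this assumption a σnoise of 56.84° corresponds to about 17.1° for the full 12-frame stimulus and 28.42° for a 5-frame window, so a shorter integration window leaves more effective noise in the random-walk condition.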
Figure 5 illustrates thresholds for both conditions as a function of σnoise-effective in the case of the 5-frame window (200 msec), for which the performance under the random-walk and fixed-random-trajectory noise conditions were most overlapping. The implication of this result is that observers were not performing the task by averaging over a full 12-frame window of the stimulus. Instead, they appear to use temporal integration, which can be explained as a dot-by-dot reduction in noise (i.e. a local process) averaged over a 5-frame window. This time window may reflect a limitation in the duration available to the temporal integration mechanisms, an initial latency before temporal integration began, or a combination of both. For the straight-trajectory heading, it has been reported that observers can compute translational heading by employing spatial integration over two frames (Warren et al., 1991). Here, we show (in experimental conditions 1 and 2) that, in addition to the spatial integration, temporal integration has a beneficial role in the perception of straight-trajectory heading. Our result is similar to Watamaniuk et al.’s (1989) findings that temporal integration leads to an improvement on direction discrimination in 2D RDK stimuli, especially in the presence of motion noise.
In experimental conditions 1 and 2, the noise in the heading stimuli resulted from local perturbations of direction. In experimental condition 3 (random-heading), we investigated the effect of a global direction perturbation on the accuracy of heading perception. The purpose of the random-heading condition was to determine the specific contributions of temporal integration mechanisms to the perception of heading direction, by reducing the involvement of spatial integration mechanisms.
Figure 6 shows discrimination thresholds for heading direction (mean target offset threshold ± SE) for each observer plotted against the standard deviation of the random-heading noise distribution (σFOE-shift). Across observers, heading discrimination accuracy dropped significantly with increasing noise, as shown by a repeated-measures ANOVA (F(1,19)=470.82, p<0.0001). As in the other two locally applied direction noise types, we considered only the levels of direction noise that resulted in thresholds less than 12°, the maximum measurable target offset. In the case of random-heading direction noise (experimental condition 3) this corresponded to σFOE-shift of 12° with a target offset threshold of 12° across all observers.
The shifts in heading angle due to the global noise introduced spatially structured perturbations in the local dot movements. To compare the local effects of the global noise condition with the random-walk and fixed-random-trajectory noise conditions, we derived a common metric by studying the 2D projected local noise levels for all direction noise conditions. We performed simulations for specific 3D noise distributions, i.e. σnoise for random-walk and fixed-random-trajectory direction noise and σFOE-shift for random-heading direction noise. Because the flow vector projections varied with eccentricity, we simulated heading for eccentricities of 0° (fixation) and 22.5° (edge of our aperture). In all noise conditions, σprojected-noise decreased with FOE eccentricity, so the 2D noise distributions corresponding to a central FOE and the most eccentric FOE constituted the upper and lower bounds for the resultant noise distributions (σnoise). We projected the 3D perturbed translation vectors onto the 2D plane and calculated the difference between the perturbed and unperturbed polar angles. Then we fit Gaussian curves to the projected noise distributions to quantify the resulting spread (σprojected-noise), for each FOE location and for each level of 3D direction noise. For all curve fits, the minimum R² value was 0.87 and the maximum KL distance was 7.61. Since we used translation vectors between frames, there was no difference in σprojected-noise between random-walk and fixed-random-trajectory noise conditions. Figures 7a and 7b illustrate the effective range of the projected 2D direction noise (σprojected-noise) corresponding to local perturbations (σnoise) and global FOE perturbations (σFOE-shift), respectively.
Globally applied random-heading noise had a greater effect on heading perception than either random-walk or fixed-random-trajectory noise (as illustrated in Fig. 7c) even for the least effective situation (when FOE was at the periphery – Fig. 7a and Fig. 7b). For example, a random-heading noise (σFOE-shift) of 9° corresponded to local motion perturbations (σprojected-noise) of 14.50° and 6.51° when the heading angle was at the center or at the edge of the screen respectively, which leads to a σprojected-noise of about 10.50°. A similar level of σprojected-noise (11.39°) was obtained for σnoise of 6.32°. At this level, the average threshold across observers was approximately 3.04° for random-walk noise and 4.05° for fixed-random-trajectory noise, while the corresponding threshold for random-heading noise (i.e., for a σFOE-shift of 9°) was 8.23° (Fig. 7c). As discussed previously, in random-heading direction noise, spatial integration mechanisms alone cannot be used to reduce noise. Thus, when comparing on the basis of projected 2D noise distributions, the increased thresholds for the random-heading noise condition suggest that, for perception of heading, the human visual system is more sensitive to noise affecting spatial integration mechanisms than to temporal integration mechanisms.
Using the equivalent local noise values for all the experimental conditions, we showed a progressive drop in heading accuracy from random-walk noise (spatiotemporal integration) to fixed-random-trajectory noise (spatial integration) to random-heading direction noise (temporal integration). The difference between random-walk and fixed-random-trajectory direction noises was fully accounted for by a temporal windowing of the random-walk noise, limiting the temporal integration of motion vectors to a 5-frame (200 msec) window (Fig. 5). Furthermore, the dramatic drop in performance on random-heading noise compared to random-walk and fixed-random-trajectory noise conditions indicates that when spatial integration was not available to improve task performance, subjects were significantly impaired on the task, demonstrating the importance of spatial over temporal integration mechanisms.
The results reported here, together with previous findings (Andersen and Saidpour, 2002; Royden and Vaina, 2004; van den Berg, 1992; Warren et al., 1991), demonstrate the importance of spatiotemporal integration mechanisms for the accurate judgment of straight-trajectory heading. The involvement of spatial integration mechanisms in heading discrimination is supported by computational models (Beck et al., 2007; Longuet-Higgins and Prazdny, 1980; Perrone and Stone, 1998; Royden, 1997; Royden, 2002). There is compelling evidence for the fact that spatiotemporal integration mechanisms benefit from 3D structural information (Beusmans, 1998; Li et al., 2009; van den Berg, 1992; van den Berg and Brenner, 1994a; van den Berg and Brenner, 1994b). Several studies have suggested that heading perception is more robust to noise when depth information is provided (van den Berg, 1992; van den Berg and Brenner, 1994a; van den Berg and Brenner, 1994b). However, it is yet to be determined whether the human visual system compensates for information lost during the projection from 3D environmental coordinates to 2D retinal coordinates. Although 3D spatial reconstruction has been investigated in computer vision systems (Avidan and Shashua, 2000), it has not been specifically studied within the context of the human visual system. Here, we examined the extent to which observers are able to reconstruct 3D information to improve the accuracy of heading judgments.
To compare human performance to the best possible performance under the random-walk direction noise condition, we developed two ideal observer models (IOM). Our aim was to investigate the contribution of 3D reconstruction to heading perception. We contrasted the psychophysical results from experimental condition 1 to a 2D-IOM (see section 3.1), which had full knowledge of the projected dot locations in 2D (screen coordinates) and to a 3D-IOM (see section 3.1), which had full knowledge of the locations of the dots in the simulated 3D environment (environmental coordinates).
IOMs provide a way to measure the visual system’s efficiency by comparing the visual system to an ideal statistical decision-making process. They have been employed previously to make predictions regarding the underlying computational mechanisms using the measured efficiencies (Crowell and Banks, 1993; Crowell and Banks, 1996; Watamaniuk, 1993). For example, Crowell and Banks (1996) developed an ideal observer model to compare human and ideal observer performance in a heading discrimination task with different flow patterns presented at different retinal locations. They found that stimulus information varied with FOE location (the most informative regions were directly above and below the FOE), but that the efficiency with which this information was extracted was “reasonably constant for different flow patterns and quite constant for different retinal eccentricities” (Crowell and Banks, 1996). Here we used IOMs to compare the information content available in 2D and 3D coordinate systems to measure the efficiency of the human visual system in determining heading relative to each coordinate frame.
As in the psychophysical task, the IOM was required to decide whether the FOE for a given trial was to the left or to the right of the target. The decision was made using a Bayesian statistical approach (Equation 1).
The posterior probability that the heading direction was to the left or right of the target given the observed stimulus (X), P(Θleft/right | X), was formed from the product of the conditional probabilities of each dot at every frame arising from a left or right heading angle, P(xi | Θleft/right), multiplied by the prior probability of a left or right heading angle, P(Θleft/right), and divided by the marginal probability of observing the particular stimulus, P(X). For both IOMs, we calculated the probabilities of left and right heading angles separately and used their ratio to make the decision of the IOM. Since the prior probabilities of left and right offsets were each 0.5 and the marginal probability of the stimulus, P(X), was the same for both the left and right heading cases, the Bayesian statistics were reduced to the following equation:
If the ratio was greater than 1.0, the IOM decided the heading angle was to the left of the target, and if the ratio was less than 1.0, the IOM decided the heading angle was to the right of the target (Equation 2). Both IOMs were assumed to know the location of the target and the target offset distance, so they needed to compare only two possible FOE locations (i.e. target ± target offset).
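The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name `decide_heading` and its arguments are hypothetical, the ratio is accumulated in the log domain purely for numerical convenience, and the handling of a ratio of exactly 1.0 (not specified in the text) is an assumption.

```python
import math

def decide_heading(p_left, p_right):
    """Return 'left' or 'right' from per-dot conditional probabilities.

    p_left[i]  stands for P(x_i | heading left of target),
    p_right[i] stands for P(x_i | heading right of target),
    with one entry per dot per frame (hypothetical inputs).
    """
    # The product of per-dot likelihood ratios is computed as a sum of
    # log ratios, which avoids underflow when many terms are multiplied.
    log_ratio = sum(math.log(l) - math.log(r) for l, r in zip(p_left, p_right))
    # A ratio above 1.0 corresponds to a log ratio above 0.
    # Assumption: a ratio of exactly 1.0 is assigned to 'right'.
    return "left" if log_ratio > 0.0 else "right"

print(decide_heading([0.6, 0.7, 0.4], [0.4, 0.3, 0.6]))
```

Because the priors are equal and P(X) is shared, only the conditional probabilities enter the comparison.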
In order to calculate P(xi| Θleft) and P(xi| Θright), we characterized the distributions of differences between the perturbed and the unperturbed polar angles.
For the 2D-IOM, the dots, whose locations were perturbed in 3D, were projected onto the screen, so that each dot could be defined solely by its x and y components. We used the perturbed translation vectors from the projected dots to calculate the polar angles defining each dot’s position at every frame (Equation 3).
where ϕperturbed was the perturbed polar angle, computed from the components of the perturbed translation vector in the x and y directions of the projection.
To account for the loss of information within the visual system, we added simulated internal noise to each of the perturbed direction vectors. The goal of factoring this internal noise into our model was to make the IOM more physiologically plausible by taking into consideration noise introduced at earlier visual processing stages. The amount of internal noise added to the direction vectors was drawn from a normal distribution (spread of 5°). This spread value was chosen based on the error of human observers when estimating the direction of a single dot, as used by Calabro and Vaina (2006) and Crowell and Banks (1996).
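As an illustration of this step, the following Python sketch computes a polar angle from the x and y components of a projected translation vector and adds normally distributed internal noise. The function name and arguments are hypothetical, and the "spread of 5°" is interpreted here as a standard deviation, which is an assumption.

```python
import math
import random

def perturbed_polar_angle(tx, ty, internal_sd_deg=5.0, rng=random):
    """Polar angle (degrees) of a 2D translation vector with internal noise.

    tx, ty: x and y components of the projected, perturbed translation vector.
    internal_sd_deg: assumed standard deviation of the internal noise.
    """
    # atan2 keeps the correct quadrant for all sign combinations.
    angle = math.degrees(math.atan2(ty, tx))
    # Simulated internal noise drawn from a normal distribution.
    noisy = angle + rng.gauss(0.0, internal_sd_deg)
    # Wrap the result into the interval [-180, 180).
    return (noisy + 180.0) % 360.0 - 180.0

random.seed(1)
print(perturbed_polar_angle(1.0, 1.0))
```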
The unperturbed polar angles were derived from the translation vectors with no noise applied, thus determining where the dot should have moved as a function of its location in the previous frame for each possible heading angle, i.e. right-heading = target + target offset and left-heading = target − target offset.
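The comparison between perturbed and unperturbed polar angles can be sketched as follows. The sketch assumes pure observer translation, so that the unperturbed 2D direction of a dot points radially away from the candidate FOE; the helper names and the screen coordinates in the example are hypothetical.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def direction_error(dot_xy, observed_angle_deg, foe_xy):
    """Difference between the observed and the expected polar angle.

    The expected (unperturbed) direction is taken to point from the
    candidate FOE through the dot, as in a radial flow field.
    """
    expected = math.degrees(
        math.atan2(dot_xy[1] - foe_xy[1], dot_xy[0] - foe_xy[0])
    )
    return wrap_deg(observed_angle_deg - expected)

# One dot evaluated against the two candidate FOEs (target ± target offset).
target_x, offset = 0.0, 0.5  # hypothetical values in screen units
dot, observed = (1.0, 1.0), 50.0
err_left = direction_error(dot, observed, (target_x - offset, 0.0))
err_right = direction_error(dot, observed, (target_x + offset, 0.0))
print(err_left, err_right)
```

Each dot therefore yields two direction errors per frame, one for each candidate heading.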
We characterized the probability distributions of direction error, i.e. differences between the perturbed and unperturbed polar angles, as the sum of two exponentials plus an offset (Eq. 4). For large amounts of noise, the second exponential was needed to capture noise around 180°. The offset term was required to characterize the plateau of the distributions, which for large noise ranges (σnoise>35°) do not approach zero.
where Δϕ was the variable of interest, the difference between the perturbed and unperturbed polar angles, and P(xi| Θleft/right) was the conditional probability given a heading to the left or right. The free parameters (A1, A2, τ and O) used to define the conditional probability were estimated as a function of the dots’ angular positions with respect to the possible FOE locations, level of direction noise, and eccentricity of the dots with respect to the possible heading locations using least squares fits to sample noise distributions (Equation 4). From the conditional probability density function, we determined the probabilities for each dot given an FOE at each of two possible heading locations, i.e. P(xi| Θleft) and P(xi| Θright).
In order to simulate the data collection, the resulting probabilities were used to calculate the likelihood ratio of the heading occurring to the left versus the right of the target. We repeated the procedure at discrete target offset levels (ranging from 0.001 to 10 depending on the direction noise range being applied), 50 times each, and determined the proportion correct at each level. We then fit a psychometric (Weibull) function to the proportion correct versus target offset values to determine the 79%-correct target offset threshold for each level of σnoise used in the psychophysical experiment.
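The threshold extraction can be illustrated with a Weibull function and its inverse. The exact parameterization used in the study is not given in the text, so the form below (a two-alternative version with a guess rate of 0.5, scale `alpha` and shape `beta`) is an assumption, and the fitting step itself is omitted; the parameter values in the example are hypothetical.

```python
import math

def weibull_2afc(offset, alpha, beta, guess=0.5):
    """Proportion correct predicted at a given target offset."""
    return 1.0 - (1.0 - guess) * math.exp(-((offset / alpha) ** beta))

def threshold_at(p_correct, alpha, beta, guess=0.5):
    """Target offset at which the Weibull function reaches p_correct."""
    # Invert weibull_2afc analytically for the offset.
    return alpha * (-math.log((1.0 - p_correct) / (1.0 - guess))) ** (1.0 / beta)

# Hypothetical fitted parameters for one direction noise level.
alpha, beta = 2.0, 1.5
t79 = threshold_at(0.79, alpha, beta)
print(t79, weibull_2afc(t79, alpha, beta))
```

Given fitted parameters for each level of σnoise, the 79%-correct threshold follows directly from the inverse function.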
For the 3D-IOM’s decision we again calculated P(xi| Θleft) and P(xi| Θright). In order to find these probabilities, we characterized the distributions of differences between the perturbed heading location for every dot at each frame and the two possible unperturbed heading locations.
Frame-to-frame perturbed dot trajectories for each dot were used to determine the perturbed heading angles (i.e. location of FOE) (Equation 5).
where θxz perturbed was the perturbed heading angle in the xz plane, computed from the components of the perturbed translation vector in the x and z directions. Precise 3D locations of the dots were not accessible to human observers in the psychophysical experimental conditions, but here we were seeking to find the absolute limit on performance in a noisy environment.
Similar to the 2D-IOM, here too we were interested in accounting for the internal losses within the visual system. To do this we applied Gaussian noise (spread of 5°) to the perturbed 2D translation vectors. From these 2D motion vectors with simulated internal noise the 3D vectors were reconstructed using the known depths of the dots. Since the depth of each dot was specified, there was a unique 2D to 3D mapping for all motion vectors. Both 2D and 3D models therefore had the same amount of simulated internal noise due to the fact that this internal noise was applied to the 2D motion vectors for both models.
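The 2D-to-3D mapping with known depth can be sketched as below. The sketch assumes a pinhole projection (x = f·X/Z, y = f·Y/Z) in observer-centred coordinates and takes the observer's motion to be opposite to the displacement of a static dot; the projection model, the sign convention and the function names are assumptions, not details given in the text.

```python
import math

def back_project(x, y, depth, focal=1.0):
    """Recover the 3D position of a dot from its screen position and depth."""
    return (x * depth / focal, y * depth / focal, depth)

def heading_angle_xz(p0, p1):
    """Heading angle (degrees) in the xz plane implied by one dot's motion.

    p0, p1: 3D positions of the same dot in two successive frames.
    The observer's translation is the negative of the dot's displacement.
    """
    dx = p0[0] - p1[0]
    dz = p0[2] - p1[2]
    return math.degrees(math.atan2(dx, dz))

# A dot at (2, 0, 10); the observer translates by (1, 0, 1) between frames,
# so the dot ends up at (1, 0, 9) in observer-centred coordinates.
p0 = back_project(2.0 / 10.0, 0.0, 10.0)
p1 = back_project(1.0 / 9.0, 0.0, 9.0)
print(heading_angle_xz(p0, p1))
```

With the depth of each dot specified, every 2D motion vector maps to a single 3D vector, which is why the same internal noise can be applied in 2D for both models.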
In 3D space, the distribution of direction differences between the perturbed and unperturbed translation vectors for each dot in successive frames was defined as a Gaussian distribution. The projection of the local Gaussian distributions onto the xz plane was compared to the distribution of unperturbed heading locations. We characterized this difference between the perturbed heading location and possible unperturbed heading locations as the sum of two exponentials (Eq. 6).
where Δθ was the variable of interest, i.e. the difference between unperturbed and perturbed heading locations, and P(xi| Θleft/right) was the conditional probability of the motion vector xi arising from the left or right heading angle. The two factors that determined the parameters (A1, A2, τ and κ) in the above equation were the target offset and the direction noise levels (Equation 6). Each target offset and direction noise level defined a different probability function, which fully characterized the possible difference distributions between perturbed and unperturbed heading locations for every dot at each frame. We determined the probabilities for each dot given an FOE at two possible heading locations (P(xi| Θleft) and P(xi| Θright)), which were in turn used to calculate the likelihood of a dot coming from the left or the right of the target for a given frame. The product of likelihood ratios led to the IOM heading judgments for each trial. After calculation of the likelihood ratios, the procedures outlined for the 2D-IOM were used to determine the thresholds for all levels of direction noise (σnoise) used in the psychophysical experiment.
Figure 8 illustrates the performances of the 2D- and 3D-IOMs by plotting the target offset thresholds as a function of direction noise levels. Each data point is the average of six target offset thresholds (± SE).
Linear fits (df=9) to the performance of both models revealed that the ideal observers’ thresholds increased significantly with direction noise (2D-IOM: slope=0.0067±0.0002 arcdeg per degree of noise, t=40.12, p<0.0001; 3D-IOM: slope=0.0021±0.0001 arcdeg per degree of noise, t=19.24, p<0.0001). The difference in slopes between the ideal observers was due to the loss of information that occurred during the projection from 3D environmental coordinates to 2D screen coordinates.
Figure 9 illustrates human performance efficiency relative to the IOMs as a function of direction noise. We calculated the efficiency by dividing the IOMs’ target offset threshold by the observers’ target offset threshold for matched experimental conditions. The threshold values were inversely proportional to the sensitivities, meaning that an increase in efficiency reflects a relative improvement of the human observers compared to the model. Efficiency relative to the 2D-IOM and 3D-IOM first increased with direction noise before reaching a plateau. We used a least squares fit of a piecewise linear function with three free parameters: the slope and intercept of the rising curve (where efficiencies were increasing), and the pivot point (the noise level at which efficiency reached a plateau). The linear fits (2D-IOM: df=3; 3D-IOM: df=4) to the rising portions of these curves showed a significant increase in efficiencies with an increase in direction noise (2D-IOM: slope=0.0883±0.0219, t=4.04, p<0.05; 3D-IOM: slope=0.0290±0.0058, t=4.96, p<0.01). As shown by the difference in slope values, the increase in efficiency was larger for the 2D-IOM. The pivot points were estimated as 41° and 60° of noise for the 2D and 3D models, respectively. Thus, as the direction noise increased, human observers began to use the available information more efficiently, up to 40°–60° of direction range, after which point the efficiency remained constant. The plateau of efficiencies for direction noise levels greater than 40°–60° may indicate that both the human and IOM performances are limited by external noise. For smaller direction noise levels, human internal noise plays a more important role, which results in lower efficiencies. The difference in slopes between the 2D-IOM and 3D-IOM illustrates a difference in how well each model explains human performance. A flat efficiency curve would signify that the model fully accounts for observers’ abilities to compensate for noise.
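The efficiency measure and the three-parameter piecewise linear function can be written compactly as follows. This is a sketch under the assumption that the function is continuous at the pivot point, so that the plateau level equals the value of the rising line at the pivot; the function names and the numerical values in the example are hypothetical and are not data from the study.

```python
def efficiency(iom_threshold, human_threshold):
    """Efficiency as the ratio of IOM threshold to human threshold."""
    return iom_threshold / human_threshold

def rise_then_plateau(noise, slope, intercept, pivot):
    """Piecewise linear function: rising line up to the pivot, flat beyond.

    Assumes continuity at the pivot, which leaves three free parameters.
    """
    return slope * min(noise, pivot) + intercept

# Hypothetical parameter values for illustration only.
for noise in (10.0, 40.0, 80.0):
    print(noise, rise_then_plateau(noise, slope=0.01, intercept=0.1, pivot=40.0))
```

Fitting `slope`, `intercept` and `pivot` by least squares to the measured efficiencies would then give the pivot estimates reported above.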
Efficiency slopes for the rising parts of the curves decreased from the 2D-IOM to 3D-IOM, suggesting that the 3D-IOM captured more of the compensation strategies used in the task and that observers may be partially recovering the 3D information lost during the projection onto 2D retinal coordinates. In addition, for large perturbations, efficiencies with respect to both models were unaffected by an increase in noise. This suggests that 3D reconstruction may play a role in the spatial information processing for relatively low amounts of noise (σnoise ≤ 40°–60°). Note that a 3D reconstruction mechanism is a contributing factor to the spatial integration mechanism, which was critical for perceiving heading accurately in the psychophysical experiments.
In this study, we investigated how the human visual system integrates information temporally and spatially within noisy optic flow fields for perception of heading. While previous psychophysical work reported the involvement of spatiotemporal mechanisms in optic flow perception (Andersen and Saidpour, 2002; Beck et al., 2007; Royden, 2002; Warren et al., 1991), it has not addressed specifically how the spatial and temporal information present within stimuli are utilized by the integration mechanisms during heading perception.
Warren et al. (1991) reported that a spatial integration mechanism is sufficient for the precise discrimination of translational heading. The present study showed that temporal integration mechanisms also contribute to the accuracy of heading judgments. Here, we used three different types of direction noise (random-walk, fixed-random-trajectory and random-heading) to investigate the involvement and contribution of spatial and temporal integration mechanisms to heading perception. In all cases, the accuracy of heading discrimination decreased with the amount of direction noise applied to the stimulus. Furthermore, the impact of equivalent levels of local direction noise varied significantly with the type of noise. Observers’ heading discrimination was most robust to the frame-wise perturbations in local motion associated with the random-walk direction noise, where both spatial and temporal integration mechanisms were involved. Even though heading perception was only slightly degraded when the visual system could not benefit from temporal integration mechanisms (as in the case of fixed-random-trajectory direction noise), an effective noise analysis suggested that observers make use of temporal integration mechanisms which operate on a dot-by-dot basis, and over a limited time window (about 200 msec). A relatively short temporal integration window is sufficient to maintain an accurate percept of heading, given that in natural scenes moving observers are often required to make accurate heading judgments under dynamically changing conditions such as shifts of FOE location due to eye rotations (Banks et al., 1996; Royden et al., 1994). Observers were most sensitive to the frame-wise perturbations in the global structure of coherent motion associated with random-heading direction noise, where only temporal integration mechanisms were beneficial. Taken together, these results suggest an additive effect of temporal integration to heading judgments at the stimulus level.
This is consistent with previous low-level motion studies showing the involvement of both spatial and temporal integration mechanisms (Bair and Movshon, 2004; Fredericksen et al., 1994; Vaina et al., 2003; Williams and Sekuler, 1984). However, in this task, a comparison of the performance across noise conditions illustrates that temporal integration has a significantly weaker influence during heading perception than spatial integration (as illustrated in Fig. 7b).
In order to understand the specifics of spatial information processing used in heading perception, we compared human psychophysical performance with 2D- and 3D-Ideal Observer Models for straight line heading discrimination. Previous studies have suggested that depth information improves subjects’ heading perception (van den Berg, 1992; van den Berg and Brenner, 1994a; van den Berg and Brenner, 1994b). Consistent with this hypothesis, here we provided computational evidence that, for heading discrimination, the human visual system is not limited by the 2D information available in the stimulus, but recovers the 3D scene information, possibly benefiting from a 3D reconstruction of the optic flow fields. Royden and Vaina (2004) previously suggested that 2D information may be sufficient to perceive heading in environments, i.e. when the amount of information loss was not significant. The differences in performance between the IOMs discussed here indicate that while 2D motion is indeed sufficient to estimate heading, in noisy environments a partial reconstruction of the 3D motion is likely to improve performance and robustness to noise. Moreover, efficiency values showed that observers were able to make use of a possible 3D reconstruction under relatively low levels of noise (σnoise ≤ 40°–60°). We suggest that for non-robust optic flow fields, human heading perception mechanisms may take advantage of more involved computations involving 3D reconstruction. Since the efficiencies for the 3D-IOM (Fig. 9) were not constant, the 3D reconstruction mechanism may not fully capture the integration mechanisms employed in this task. The fact that efficiency values with respect to both 2D- and 3D-IOMs remained constant at very high levels of direction noise, i.e. when the scene is highly fragmented, implies that there are other mechanisms that may also contribute to spatial information processing.
Grouping (Braddick, 1993; Smith and Curran, 2000; Treue et al., 2000), for example, may be an important aspect of spatial integration that is not accounted for by the proposed ideal observer models.
We thank Martin Kopcik for programming the basic heading stimuli and Robert I. Pitts for technical advice on the stimulus generation algorithms. We also thank the anonymous reviewers for their comments and suggestions. This work was supported in part by the National Institutes of Health grant R01 NS064100-01A1 to LMV.