Recently, there has been great interest among vision researchers in developing computational models that predict the distribution of saccadic endpoints in naturalistic scenes. In many of these studies, subjects are instructed to view scenes without any particular task in mind so that stimulus-driven (bottom-up) processes guide visual attention. However, whenever there is a search task, goal-driven (top-down) processes tend to dominate guidance, as indicated by attention being systematically biased toward image features that resemble those of the search target. In the present study, we devise a top-down model of visual attention during search in complex scenes based on similarity between the target and regions of the search scene. Similarity is defined for several feature dimensions such as orientation or spatial frequency using a histogram-matching technique. The amount of attentional guidance across visual feature dimensions is predicted by a previously introduced informativeness measure. We use eye-movement data gathered from participants’ search of a set of naturalistic scenes to evaluate the model. The model is found to predict the distribution of saccadic endpoints in search displays nearly as accurately as do other observers’ eye-movement data in the same displays.
visual search; visual attention; top-down attentional control; real-world scenes; informativeness; eye tracking; eye movements; saccadic selectivity; scene perception
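A minimal sketch of the histogram-matching step described in the abstract above, assuming histogram intersection as the matching measure and orientation as the feature dimension (the study's exact binning and matching rule are not given here); all data are synthetic stand-ins:

```python
import numpy as np

def feature_histogram(values, bins=16, value_range=(0.0, np.pi)):
    """Normalized histogram of a feature (e.g., local orientation in radians)."""
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms; 1 = identical, 0 = disjoint."""
    return np.minimum(h1, h2).sum()

# Hypothetical example: orientation samples from a target patch and from
# two scene regions (radians in [0, pi)).
rng = np.random.default_rng(0)
target_ori = rng.normal(np.pi / 4, 0.2, 500) % np.pi
region_a = rng.normal(np.pi / 4, 0.3, 500) % np.pi   # target-like region
region_b = rng.uniform(0, np.pi, 500)                 # unstructured region

h_target = feature_histogram(target_ori)
for name, region in [("A", region_a), ("B", region_b)]:
    sim = histogram_intersection(h_target, feature_histogram(region))
    print(f"region {name}: orientation similarity = {sim:.2f}")
```

In the full model, one such similarity map per feature dimension would be combined, weighted by the informativeness measure the abstract mentions.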
Is it possible to infer a person's goal by decoding their fixations on objects? Two groups of participants categorically searched for either a teddy bear or butterfly among random category distractors, each rated as high, medium, or low in similarity to the target classes. Target-similar objects were preferentially fixated in both search tasks, demonstrating information about target category in looking behavior. Different participants then viewed the searchers' scanpaths, superimposed over the target-absent displays, and attempted to decode the target category (bear/butterfly). Bear searchers were classified perfectly; butterfly searchers were classified at 77%. Bear and butterfly Support Vector Machine (SVM) classifiers were also used to decode the same preferentially fixated objects and found to yield highly comparable classification rates. We conclude that information about a person's search goal exists in fixation behavior, and that this information can be behaviorally decoded to reveal a search target—essentially reading a person's mind by analyzing their fixations.
decoding; fixation duration; categorical search; computer vision; classification
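The SVM decoding reported above can be illustrated schematically. A sketch using scikit-learn with synthetic stand-in feature vectors (the study's actual object features and training data are not specified in the abstract):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical data: feature vectors of objects preferentially fixated
# while searching for bears (label 0) or butterflies (label 1). Real
# features would be computed from the fixated object images.
n_per_class, n_features = 50, 20
bear_fixated = rng.normal(0.0, 1.0, (n_per_class, n_features))
butterfly_fixated = rng.normal(0.7, 1.0, (n_per_class, n_features))

X = np.vstack([bear_fixated, butterfly_fixated])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Linear SVM evaluated with 5-fold cross-validation, mirroring the
# bear/butterfly decoding logic described above.
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
print(f"decoding accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```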
Deficits or atypicalities in attention have been reported in individuals with autism spectrum disorder (ASD), yet no consensus on the nature of these deficits has emerged. We conducted three experiments that paired a peripheral precue with a covert discrimination task, using protocols for which the effects of covert exogenous spatial attention on early vision have been well established in typically developing populations. Experiment 1 assessed changes in contrast sensitivity, using orientation discrimination of a contrast-defined grating; Experiment 2 evaluated the reduction of crowding in the visual periphery, using discrimination of a letter-like figure with flanking stimuli at variable distances; and Experiment 3 assessed improvements in visual search, using discrimination of the same letter-like figure with a variable number of distractor elements. In all three experiments, we found that exogenous attention modulated visual discriminability in a group of high-functioning adults with ASD and that it did so in the same way and to the same extent as in a matched control group. We found no evidence to support the hypothesis that deficits in exogenous spatial attention underlie the emergence of core ASD symptomatology.
covert attention; exogenous attention; crowding; visual search; contrast sensitivity; adults; autism; ASD
It is difficult to recognize an object that falls in the peripheral visual field; it is even more difficult when there are other objects surrounding it. This effect, known as crowding, could be due to interactions between the low-level parts or features of the surrounding objects. Here, we investigated whether crowding can also occur selectively between higher level object representations. Many studies have demonstrated that upright faces, unlike most other objects, are coded holistically. Therefore, in addition to featural crowding within a face (M. Martelli, N. J. Majaj, & D. G. Pelli, 2005), we might expect an additional crowding effect between upright faces due to interference between the higher level holistic representations of these faces. In a series of experiments, we tested this by presenting an upright target face in a crowd of additional upright or inverted faces. We found that recognition was more strongly impaired when the target face was surrounded by upright compared to inverted flanker (distractor) faces; this pattern of results was absent when inverted faces and non-face objects were used as targets. This selective crowding of upright faces by other upright faces only occurred when the target–flanker separation was less than half the eccentricity of the target face, consistent with traditional crowding effects (H. Bouma, 1970; D. G. Pelli, M. Palomares, & N. J. Majaj, 2004). Likewise, the selective interference between upright faces did not occur at the fovea and was not a function of the target–flanker similarity, suggesting that crowding-specific processes were responsible. The results demonstrate that crowding can occur selectively between high-level representations of faces and may therefore occur at multiple stages in the visual system.
vision; perception; awareness; face recognition; ensemble; spatial; lateral; masking; object
One of the most important aspects of visual attention is its flexibility; our attentional “window” can be tuned to different spatial scales, allowing us to perceive large-scale global patterns and local features effortlessly. We investigated whether the perception of global and local motion competes for a common attentional resource. Subjects viewed arrays of individual moving Gabors that group to produce a global motion percept when subjects attended globally. When subjects attended locally, on the other hand, they could identify the direction of individual uncrowded Gabors. Subjects were required to devote their attention toward either scale of motion or divide it between global and local scales. We measured direction discrimination as a function of the validity of a precue, which was varied in opposite directions for global and local motion such that when the precue was valid for global motion, it was invalid for local motion and vice versa. There was a trade-off between global and local motion thresholds, such that increasing the validity of precues at one spatial scale simultaneously reduced thresholds at that spatial scale but increased thresholds at the other spatial scale. In a second experiment, we found a similar pattern of results for static-oriented Gabors: Attending to local orientation information impaired the subjects’ ability to perceive globally defined orientation and vice versa. Thresholds were higher for orientation compared to motion, however, suggesting that motion discrimination in the first experiment was not driven by orientation information alone but by motion-specific processing. The results of these experiments demonstrate that a shared attentional resource flexibly moves between different spatial scales and allows for the perception of both local and global image features, whether these features are defined by motion or orientation.
attention; spatial scale; global; local; motion perception; integration; segmentation
Motion can bias the perceived location of a stationary stimulus, but whether this occurs at a high level of representation or at early, retinotopic stages of visual processing remains an open question. As coding of orientation emerges early in visual processing, we tested whether motion could influence the spatial location at which orientation adaptation is seen. Specifically, we examined whether the tilt aftereffect (TAE) depends on the perceived or the retinal location of the adapting stimulus, or both. We used the flash-drag effect (FDE) to produce a shift in the perceived position of the adaptor away from its retinal location. Subjects viewed a patterned disk that oscillated clockwise and counterclockwise while adapting to a small disk containing a tilted linear grating that was flashed briefly at the moment of the rotation reversals. The FDE biased the perceived location of the grating in the direction of the disk's motion immediately following the flash, allowing dissociation between the retinal and perceived location of the adaptor. Brief test gratings were subsequently presented at one of three locations—the retinal location of the adaptor, its perceived location, or an equidistant control location (antiperceived location). Measurements of the TAE at each location demonstrated that the TAE was strongest at the retinal location, and was larger at the perceived compared to the antiperceived location. This indicates a skew in the spatial tuning of the TAE consistent with the FDE. Together, our findings suggest that motion can bias the location of low-level adaptation.
motion processing; flash-drag effect; tilt aftereffect; orientation adaptation
Perceived time is inherently malleable. For example, adaptation to relatively long or short sensory events leads to a repulsive aftereffect such that subsequent events appear to be contracted or expanded (duration adaptation). Perceived visual duration can also be distorted via concurrent presentation of discrepant auditory durations (multisensory integration). The neural loci of both distortions remain unknown. In the current study, we use a psychophysical approach to establish their relative positioning within the sensory processing hierarchy. We show that audiovisual integration induces marked distortions of perceived visual duration. We proceed to use these distorted durations as visual adapting stimuli yet find subsequent visual duration aftereffects to be consistent with physical rather than perceived visual duration. Conversely, the concurrent presentation of adapted auditory durations with nonadapted visual durations results in multisensory integration patterns consistent with perceived, rather than physical, auditory duration. These results demonstrate that recent sensory history modifies human duration perception prior to the combination of temporal information across sensory modalities and provide support for adaptation mechanisms mediated by duration-selective neurons situated in early areas of the visual and auditory nervous system (Aubie, Sayegh, & Faure, 2012; Duysens, Schaafsma, & Orban, 1996; Leary, Edwards, & Rose, 2008).
temporal perception; multisensory integration; duration adaptation; interval tuning; cue combination
Many socially important search tasks are characterized by low target prevalence, meaning that targets are rarely encountered. For example, transportation security officers (TSOs) at airport checkpoints encounter very few actual threats in carry-on bags. In laboratory-based visual search experiments, low prevalence reduces the probability of detecting targets (Wolfe, Horowitz, & Kenner, 2005). In the lab, this “prevalence effect” is caused by changes in decision and response criteria (Wolfe & Van Wert, 2010) and can be mitigated by presenting a burst of high-prevalence search with feedback (Wolfe et al., 2007). The goal of this study was to see if these effects could be replicated in the field with TSOs. A total of 125 newly trained TSOs participated in one of two experiments as part of their final evaluation following training. They searched for threats in simulated bags across five blocks. The first three blocks were low prevalence (target prevalence ≤ .05) with no feedback; the fourth block was high prevalence (.50) with full feedback; and the final block was, again, low prevalence. We found that newly trained TSOs were better at detecting targets at high compared to low prevalence, replicating the prevalence effect. Furthermore, performance was better (and response criterion was more “liberal”) in the low-prevalence block that took place after the high-prevalence block than in the initial three low-prevalence blocks, suggesting that a burst of high-prevalence trials may help alleviate the prevalence effect in the field.
visual search; prevalence effects; airport security; visual attention; error rates; criterion shift
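The criterion shift described above is conventionally quantified with equal-variance signal detection theory. A worked sketch computing sensitivity (d′) and criterion (c) from hit and false-alarm rates; the rates below are illustrative, not the study's data:

```python
from scipy.stats import norm

def dprime_and_criterion(hit_rate, fa_rate):
    """Standard equal-variance signal detection indices."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)   # c > 0: conservative; c < 0: liberal
    return d_prime, criterion

# Illustrative rates only: a conservative low-prevalence block versus
# a more liberal block following the high-prevalence burst.
for label, hr, far in [("low prevalence", 0.60, 0.05),
                       ("after high-prevalence burst", 0.80, 0.12)]:
    d, c = dprime_and_criterion(hr, far)
    print(f"{label}: d' = {d:.2f}, c = {c:.2f}")
```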
We determined the degree to which change in visual acuity (VA) correlates with change in optical quality using image-quality (IQ) metrics for both normal and keratoconic wavefront errors (WFEs). VA was recorded for five normal subjects reading simulated logMAR acuity charts generated from the scaled WFEs of 15 normal and seven keratoconic eyes. We examined the correlations over a large range of acuity loss (up to 11 lines) and a smaller, more clinically relevant range (up to four lines). Nine IQ metrics were well correlated for both ranges. Over the smaller range of primary interest, eight were also accurate and precise in estimating the variations in logMAR acuity in both normal and keratoconic WFEs. The accuracy for these eight best metrics in estimating the mean change in logMAR acuity ranged between ±0.0065 and ±0.017 logMAR (all less than one letter), and the precision ranged between ±0.10 and ±0.14 logMAR (all less than seven letters).
image quality; image-quality metrics; visual acuity; normal and keratoconic wavefront errors
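One way to compute accuracy and precision figures like those quoted above, assuming accuracy is the mean signed error and precision the ~95% spread of residuals (one common convention; the paper's exact definitions may differ), with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: an IQ metric's predicted change in logMAR versus
# the measured change in logMAR across simulated charts.
predicted_change = rng.uniform(0.0, 0.4, 40)                      # logMAR
measured_change = predicted_change - 0.01 + rng.normal(0, 0.05, 40)

r = np.corrcoef(predicted_change, measured_change)[0, 1]
residuals = predicted_change - measured_change
accuracy = residuals.mean()                 # mean signed error (bias)
precision = 1.96 * residuals.std(ddof=1)    # ~95% limits of agreement

print(f"r = {r:.2f}, accuracy = {accuracy:+.3f} logMAR, "
      f"precision = ±{precision:.3f} logMAR")
```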
Feedforward visual object perception recruits a cortical network that is assumed to be hierarchical, progressing from basic visual features to complete object representations. However, the nature of the intermediate features related to this transformation remains poorly understood. Here, we explore how well different computer vision recognition models account for neural object encoding across the human cortical visual pathway as measured using fMRI. These neural data, collected during the viewing of 60 images of real-world objects, were analyzed with a searchlight procedure as in Kriegeskorte, Goebel, and Bandettini (2006): Within each searchlight sphere, the obtained patterns of neural activity for all 60 objects were compared to model responses for each computer recognition algorithm using representational dissimilarity analysis (Kriegeskorte et al., 2008). Although each of the computer vision methods significantly accounted for some of the neural data, among the different models, the scale invariant feature transform (Lowe, 2004), encoding local visual properties gathered from “interest points,” was best able to accurately and consistently account for stimulus representations within the ventral pathway. More generally, when present, significance was observed in regions of the ventral-temporal cortex associated with intermediate-level object perception. Differences in model effectiveness and the neural location of significant matches may be attributable to the fact that each model implements a different featural basis for representing objects (e.g., more holistic or more parts-based). Overall, we conclude that well-known computer vision recognition systems may serve as viable proxies for theories of intermediate visual object representation.
neuroimaging; object recognition; computational modeling; intermediate feature representation
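A compact sketch of the representational dissimilarity analysis described above: build condensed correlation-distance RDMs for the neural patterns in one searchlight sphere and for the model features, then compare them with a Spearman correlation. All data here are synthetic stand-ins:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_stimuli = 60

# Hypothetical patterns: voxel responses within one searchlight sphere,
# and model features (e.g., pooled local descriptors per image) that
# share structure with the neural responses.
neural = rng.normal(size=(n_stimuli, 100))   # 60 objects x 100 voxels
model = neural @ rng.normal(size=(100, 50)) + rng.normal(size=(n_stimuli, 50))

# Representational dissimilarity matrices via correlation distance,
# kept in condensed (upper-triangle) form.
neural_rdm = pdist(neural, metric="correlation")
model_rdm = pdist(model, metric="correlation")

# Spearman correlation between RDMs quantifies model-brain agreement.
rho, p = spearmanr(neural_rdm, model_rdm)
print(f"RDM agreement: rho = {rho:.2f} (p = {p:.1e})")
```

Repeating this comparison at every searchlight location yields the cortical maps of model fit that the study reports.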
The adaptation of an observer’s saccadic eye movements to artificial post-saccadic visual error can lead to perceptual mislocalization of individual, transient visual stimuli. In this study, we demonstrate that simultaneous saccadic adaptation to a consistent error pattern across a large number of saccade vectors is accompanied by corresponding spatial distortions in the perception of persistent objects. To induce this adaptation, we artificially introduced several post-saccadic error patterns, which led to a systematic distortion in participants’ oculomotor space and a corresponding distortion in their perception of the relative dimensions of a cross-figure. The results indicate a tight coupling between the oculomotor and visual–perceptual spaces that is not limited to misperception of individual visual locations but also affects metrics in the visual–perceptual space. This coupling suggests that our visual perception is continuously recalibrated by the post-saccadic error signal.
visual perception; eye movements; saccadic adaptation; post-saccadic visual error; gaze-contingent stimuli; sensorimotor alignment
The way we perceive an object depends both on feedforward, bottom-up processing of its physical stimulus properties and on top-down factors such as attention, context, expectation, and task relevance. Here we compared neural activity elicited by varying perceptions of the same physical image—a bistable moving image in which perception spontaneously alternates between dissociated fragments and a single, unified object. A time-frequency analysis of EEG changes associated with the perceptual switch from object to fragment and vice versa revealed a greater decrease in alpha (8–12 Hz) accompanying the switch to object percept than to fragment percept. Recordings of event-related potentials elicited by irrelevant probes superimposed on the moving image revealed an enhanced positivity between 184 and 212 ms when the probes were contained within the boundaries of the perceived unitary object. The topography of the positivity (P2) in this latency range elicited by probes during object perception was distinct from the topography elicited by probes during fragment perception, suggesting that the neural processing of probes differed as a function of perceptual state. Two source localization algorithms estimated the neural generator of this object-related difference to lie in the lateral occipital cortex, a region long associated with object perception. These data suggest that perceived objects attract attention, incorporate visual elements occurring within their boundaries into unified object representations, and enhance the visual processing of elements occurring within their boundaries. Importantly, the perceived object in this case emerged as a function of the fluctuating perceptual state of the viewer.
object perception; object attention; bistable perception
Traditional trichromatic theories of color vision conclude that color perception is not possible under scotopic illumination in which only one type of photoreceptor, rods, is active. The current study demonstrates the existence of scotopic color perception and indicates that perceived hue is influenced by spatial context and top-down processes of color perception. Experiment 1 required observers to report the perceived hue in various natural scene images under purely rod-mediated vision. The results showed that when the test patch had low variation in the luminance distribution and was a decrement in luminance compared to the surrounding area, reddish or orangish percepts were more likely to be reported compared to all other percepts. In contrast, when the test patch had a high variation and was an increment in luminance, the probability of perceiving blue, green, or yellow hues increased. In addition, when observers had a strong, but singular, daylight hue association for the test patch, color percepts were reported more often and hues appeared more saturated compared to patches with no daylight hue association. This suggests that experience in daylight conditions modulates the bottom-up processing for rod-mediated color perception. In Experiment 2, observers reported changes in hue percepts for a test ring surrounded by inducing rings that varied in spatial context. In sum, the results challenge the classic view that rod vision is achromatic and suggest that scotopic hue perception is mediated by cortical mechanisms.
hue percepts; scotopic color vision; natural image statistics
We have recently suggested that neural flow parsing mechanisms act to subtract global optic flow consistent with observer movement to aid in detecting and assessing scene-relative object movement. Here, we examine whether flow parsing can occur independently from heading estimation. To address this question, we used stimuli comprising two superimposed optic flow fields of limited-lifetime dots (one planar and one radial). This stimulus gives rise to the so-called optic flow illusion (OFI), in which perceived heading is biased in the direction of the planar flow field. Observers were asked to report the perceived direction of motion of a probe object placed in the OFI stimulus. If flow parsing depends upon a prior estimate of heading, then the perceived trajectory should reflect global subtraction of a field consistent with the heading experienced under the OFI. In Experiment 1 we tested this prediction directly, finding instead that the perceived trajectory was biased markedly in the direction opposite to that predicted under the OFI. In Experiment 2 we demonstrate that the results of Experiment 1 are consistent with a positively weighted vector sum of the effects seen when viewing the probe together with individual radial and planar flow fields. These results suggest that flow parsing is not necessarily dependent on prior estimation of heading direction. We discuss the implications of this finding for our understanding of the mechanisms of flow parsing.
optic flow processing; heading; flow parsing; object movement; ego-motion
Although global motion processing is thought to emerge early in infancy, there is debate regarding the age at which it matures to an adult-like level. In the current study, we address the possibility that the apparent age-related improvement in global motion processing might be secondary to age-related increases in the sensitivity of mechanisms (i.e., local motion detectors) that provide input to global motion mechanisms. To address this, we measured global motion processing by obtaining motion coherence thresholds using stimuli that were equally detectable in terms of contrast across all individuals and ages (3-, 4-, 5-, 6-, and 7-month-olds and adults). For infants, we employed a directional eye movement (DEM) technique. For adults, we employed both DEM and a self-report method. First, contrast sensitivity was obtained for a local task, using a stochastic motion display in which all the dots moved coherently. Contrast sensitivity increased significantly between 3 and 7 months, and between infancy and adulthood. Each subject was then tested on the global motion task with the contrast of the dots set to 2.5 × each individual's contrast threshold. Coherence thresholds were obtained by varying the percentage of coherently moving “signal” versus “noise” dots in the stochastic motion display. Results revealed remarkably stable global motion sensitivity between 3 and 7 months of age, as well as between infancy and adulthood. These results suggest that the mechanisms underlying global motion processing develop to an adult-like state very quickly.
local motion; global motion; infants; adults; contrast; coherence; stochastic motion
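The coherence manipulation described above assigns a fixed proportion of "signal" dots a common direction and the remainder random directions. A minimal generator for one frame of such a display (dot lifetime and spatial layout omitted):

```python
import numpy as np

def dot_directions(n_dots, coherence, signal_direction=0.0, rng=None):
    """Directions (radians) for one frame of a stochastic motion display:
    a `coherence` proportion of dots share the signal direction; the rest
    move in random directions."""
    rng = rng or np.random.default_rng()
    n_signal = int(round(coherence * n_dots))
    directions = rng.uniform(0, 2 * np.pi, n_dots)   # noise dots
    directions[:n_signal] = signal_direction          # signal dots
    rng.shuffle(directions)
    return directions

dirs = dot_directions(100, coherence=0.3, rng=np.random.default_rng(4))
# With 30% coherence the mean resultant vector points near the signal direction.
print(f"mean direction: {np.angle(np.exp(1j * dirs).mean()):.2f} rad")
```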
How does a baseball outfielder know where to run to catch a fly ball? The “outfielder problem” remains unresolved, and its solution would provide a window into the visual control of action. It may seem obvious that human action is based on an internal model of the physical world, such that the fielder predicts the landing point based on a mental model of the ball’s trajectory (trajectory prediction, TP). But two alternative theories, Optical Acceleration Cancellation (OAC) and Linear Optical Trajectory (LOT), propose that fielders are led to the right place at the right time by coupling their movements to visual information in a continuous “online” manner. All three theories predict successful catches and similar running paths. We provide a critical test by using virtual reality to perturb the vertical motion of the ball in mid-flight. The results confirm the predictions of OAC but are at odds with LOT and TP.
outfielder problem; visual control of locomotion; perception-action; modeling; baseball
In three experiments, we examined the influence of visual working memory (VWM) on the metrics of saccade landing position in a global effect paradigm. Participants executed a saccade to the more eccentric object in an object pair appearing on the horizontal midline, to the left or right of central fixation. While completing the saccade task, participants maintained a color in VWM for an unrelated memory task. Either the color of the saccade target matched the memory color (target match), the color of the distractor matched the memory color (distractor match), or the colors of neither object matched the memory color (no match). In the no-match condition, saccades tended to land at the midpoint between the two objects: the global, or averaging, effect. However, when one of the two objects matched VWM, the distribution of landing position shifted toward the matching object, both for target match and for distractor match. VWM modulation of landing position was observed even for the fastest quartile of saccades, with a mean latency as low as 112 ms. Effects of VWM on such rapidly generated saccades, with latencies in the express-saccade range, indicate that VWM interacts with the initial sweep of visual sensory processing, modulating perceptual input to oculomotor systems and thereby biasing oculomotor selection. As a result, differences in memory match produce effects on landing position similar to the effects generated by differences in physical salience.
saccadic eye movements; visual working memory; visual short-term memory
To understand how different spatial frequencies contribute to the overall perceived contrast of complex, broadband photographic images, we adapted the classification image paradigm. Using natural images as stimuli, we randomly varied relative contrast amplitude at different spatial frequencies and had human subjects determine which images had higher contrast. Then, we determined how the random variations corresponded with the human judgments. We found that the overall contrast of an image is disproportionately determined by how much contrast lies between 1 and 6 c/°, around the peak of the contrast sensitivity function (CSF). We then employed the basic components of contrast psychophysics modeling to show that the CSF alone is not enough to account for our results and that an increase in gain control strength toward low spatial frequencies is necessary. One important consequence of this is that contrast constancy, the apparent independence of suprathreshold perceived contrast and spatial frequency, will not hold during viewing of natural images. We also found that images with darker low-luminance regions tended to be judged as having higher overall contrast, which we interpret as the consequence of darker local backgrounds resulting in higher band-limited contrast response in the visual system.
contrast gain control; perceived contrast; reverse correlation; contrast constancy; natural scenes
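A toy version of the adapted classification-image paradigm described above: perturb per-band contrast amplitudes at random, simulate an observer whose judgments weight mid frequencies most (a CSF-like stand-in), and recover the decision weights from choice-conditioned perturbation differences:

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials, n_bands = 5000, 8

# Random per-band contrast perturbations for the two images on each trial.
amp_a = rng.normal(0, 1, (n_trials, n_bands))
amp_b = rng.normal(0, 1, (n_trials, n_bands))

# Simulated observer: decision weights peak at mid frequency bands,
# plus internal noise.
true_weights = np.exp(-0.5 * ((np.arange(n_bands) - 3.5) / 1.5) ** 2)
chose_a = ((amp_a - amp_b) @ true_weights + rng.normal(0, 1, n_trials)) > 0

# Classification-image estimate: the mean perturbation difference,
# conditioned on choice, is proportional to the decision weights.
diff = amp_a - amp_b
estimated = diff[chose_a].mean(axis=0) - diff[~chose_a].mean(axis=0)
print(np.round(estimated / estimated.max(), 2))   # peaks at mid bands
```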
Adapting to a 20 Hz oscillating grating reduces the apparent duration of a 10 Hz drifting grating displayed subsequently in the same location as the adaptor. The effect is orientation-independent as it remains once the adaptor is rotated 90° relative to the tests (Johnston, Arnold, & Nishida, 2006). However, it was shown that, for random dots moving at 3°/s, duration compression follows adaptation only when the adaptor and test drift in the same direction, and it disappears when they drift in opposite directions (Curran & Benton, 2012). Here, we explored the relationship between the relative motion direction of adaptor and test and the strength of duration compression for a wider range of speeds and for narrow-band stimuli (temporal frequencies between 3 and 18 Hz). We first measured perceived temporal frequency for the same stimuli after adaptation, and we used these estimates to match the apparent rate of the adapted and unadapted tests in the duration task. We found that, whereas at 3 Hz the effect of adaptation in the opposite direction on duration is marginal, at higher frequencies there is substantial duration compression in the opposite direction. These results indicate that there may be two contributions to apparent duration compression: a cortical contribution sensitive to orientation and motion direction at a wide range of temporal frequencies and a direction-independent subcortical contribution, which is revealed at higher frequencies. However, while direction specificity implies cortical involvement, subcortical orientation dependency and the influence of feedback to subcortical areas should not be ignored.
perceived duration; temporal frequency adaptation; motion direction specificity; psychophysics
Objects in the environment differ in their low-level perceptual properties (e.g., how easily a fruit can be recognized) as well as in their subjective value (how tasty it is). We studied the influence of visual salience on value-based decisions using a two-alternative forced-choice task, in which human subjects rapidly chose items from a visual display. All targets were equally easy to detect. Nevertheless, both value and salience strongly affected choices made and reaction times. We analyzed the neuronal mechanisms underlying these behavioral effects using stochastic accumulator models, allowing us to characterize not only the averages of reaction times but their full distributions. Independent models without interaction between the possible choices failed to reproduce the observed choice behavior, while models with mutual inhibition between alternative choices produced much better results. Mutual inhibition thus is an important feature of the decision mechanism. Value influenced the amount of accumulation in all models. In contrast, increased salience could either lead to an earlier start (onset model) or to a higher rate (speed model) of accumulation. Both models explained the data from the choice trials equally well. However, salience also affected reaction times in no-choice trials in which only one item was present, as well as error trials. Only the onset model could explain the observed reaction time distributions of error trials and no-choice trials. In contrast, the speed model could not, irrespective of whether the rate increase resulted from more frequent accumulated quanta or from larger quanta. Visual salience thus likely provides an advantage in the onset, not in the processing speed, of value-based decision making.
decision making; accumulator; reaction time
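A schematic sketch of a mutual-inhibition accumulator race in which salience acts on onset (the account the data favored) and value scales the accumulation rate. Parameters are illustrative, not the paper's fitted values:

```python
import numpy as np

def race(value, onset, inhibition=0.05, threshold=30.0, noise=1.0,
         max_steps=2000, rng=None):
    """Two accumulators with mutual inhibition. `value` scales drift;
    `onset` (in steps) delays each accumulator's start -- the 'onset
    model' account of salience. Returns (choice, RT in steps)."""
    rng = rng or np.random.default_rng()
    x = np.zeros(2)
    for t in range(max_steps):
        started = (t >= np.asarray(onset)).astype(float)
        drift = started * np.asarray(value)
        # Each accumulator is suppressed by its competitor (x[::-1]),
        # producing winner-take-all dynamics.
        x += drift - inhibition * x[::-1] + started * rng.normal(0, noise, 2)
        x = np.maximum(x, 0)                 # accumulators stay non-negative
        if (x >= threshold).any():
            return int(np.argmax(x)), t
    return -1, max_steps

rng = np.random.default_rng(6)
# Item 0 carries higher value; item 1 is more salient (earlier onset).
results = [race(value=(1.0, 0.6), onset=(5, 0), rng=rng) for _ in range(500)]
choices, rts = zip(*results)
print(f"P(choose high-value item) = {np.mean(np.array(choices) == 0):.2f}, "
      f"mean RT = {np.mean(rts):.0f} steps")
```

The speed model would instead keep onsets equal and let salience multiply the drift; fitting both variants to RT distributions is what distinguishes them.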
Vision researchers rely on visual display technology for the presentation of stimuli to human and nonhuman observers. Verifying that the desired and displayed visual patterns match along dimensions such as luminance, spectrum, and spatial and temporal frequency is an essential part of developing controlled experiments. With cathode-ray tubes (CRTs) becoming virtually unavailable on the commercial market, it is useful to determine the characteristics of newly available displays based on organic light emitting diode (OLED) panels to determine how well they may serve to produce visual stimuli. This report describes a series of measurements summarizing the properties of images displayed on two commercially available OLED displays: the Sony Trimaster EL BVM-F250 and PVM-2541. The results show that the OLED displays have large contrast ratios, wide color gamuts, and precise, well-behaved temporal responses. Correct adjustment of the settings on both models produced luminance nonlinearities that were well predicted by a power function (“gamma correction”). Both displays have adjustable pixel independence and can be set to have little to no spatial pixel interactions. OLED displays appear to be a suitable, or even preferable, option for many vision research applications.
display characterization; spatio-temporal precision; spectrum; OLED; stimulus presentation
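Fitting the luminance nonlinearity with a power function ("gamma correction"), as in the characterization above, is straightforward. A sketch with synthetic photometer readings standing in for measured values:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(v, gain, gamma, offset):
    """Luminance as a function of normalized pixel value v in [0, 1]."""
    return gain * v ** gamma + offset

# Hypothetical photometer readings (cd/m^2) at stepped pixel values;
# real characterization would use measurements from the display itself.
v = np.linspace(0, 1, 11)
rng = np.random.default_rng(7)
luminance = 100 * v ** 2.2 + 0.1 + rng.normal(0, 0.3, v.size)

params, _ = curve_fit(power_law, v, luminance, p0=(100, 2.0, 0.0),
                      bounds=([0, 0.1, -1], [1000, 5, 10]))
gain, gamma, offset = params
print(f"fitted gamma = {gamma:.2f}")

# Inverse lookup ("gamma correction"): pixel value for a target luminance.
target = 50.0
v_needed = ((target - offset) / gain) ** (1 / gamma)
print(f"pixel value for {target} cd/m^2: {v_needed:.3f}")
```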
Depth estimates from disparity are most precise when the visual input stimulates corresponding retinal points or points close to them. Corresponding points have uncrossed disparities in the upper visual field and crossed disparities in the lower visual field. Due to these disparities, the vertical part of the horopter—the positions in space that stimulate corresponding points—is pitched top-back. Many have suggested that this pitch is advantageous for discriminating depth in the natural environment, particularly relative to the ground. We asked whether the vertical horopter is adaptive (suited for perception of the ground) and adaptable (changeable by experience). Experiment 1 measured the disparities between corresponding points in 28 observers. We confirmed that the horopter is pitched. However, it is also typically convex, making it ill-suited for depth perception relative to the ground. Experiment 2 tracked locations of corresponding points while observers wore lenses for 7 days that distorted binocular disparities. We observed no change in the horopter, suggesting that it is not adaptable. We also showed that the horopter is not adaptive for long viewing distances because at such distances uncrossed disparities between corresponding points cannot be stimulated. The vertical horopter seems to be adaptive for perceiving convex, slanted surfaces at short distances.
stereopsis; binocular vision; horopter; corresponding points; natural environment; depth perception; cyclovergence
Despite the growing popularity of virtual reality environments, few laboratories are equipped to investigate eye movements within these environments. This primer is intended to reduce the time and effort required to incorporate eye-tracking equipment into a virtual reality environment. We discuss issues related to the initial startup and provide algorithms necessary for basic analysis. Algorithms are provided for the calculation of gaze angle within a virtual world using a monocular eye-tracker in a three-dimensional environment. In addition, we provide algorithms for the calculation of the angular distance between the gaze and a relevant virtual object and for the identification of fixations, saccades, and pursuit eye movements. Finally, we provide tools that temporally synchronize gaze data and the visual stimulus and enable real-time assembly of a video-based record of the experiment using the QuickTime MOV format, available at http://sourceforge.net/p/utdvrlibraries/. This record contains the visual stimulus, the gaze cursor, and associated numerical data and can be used for data exportation, visual inspection, and validation of calculated gaze movements.
virtual reality; eye movements; gaze; methods
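One of the primer's building blocks is the angular distance between gaze and a virtual object. A minimal sketch of that step, assuming the gaze direction and object position are already expressed in a common world coordinate frame:

```python
import numpy as np

def angular_distance_deg(gaze_dir, eye_pos, object_pos):
    """Angle (degrees) between the gaze direction and the direction
    from the eye to a virtual object, all in world coordinates."""
    to_object = np.asarray(object_pos, dtype=float) - np.asarray(eye_pos, dtype=float)
    g = np.asarray(gaze_dir, dtype=float)
    g = g / np.linalg.norm(g)
    o = to_object / np.linalg.norm(to_object)
    return np.degrees(np.arccos(np.clip(np.dot(g, o), -1.0, 1.0)))

# Hypothetical frame: observer at the origin looking down +z,
# object slightly up and to the right, about 2 m away.
angle = angular_distance_deg(gaze_dir=[0.0, 0.0, 1.0],
                             eye_pos=[0.0, 0.0, 0.0],
                             object_pos=[0.1, 0.15, 2.0])
print(f"gaze-to-object angle: {angle:.1f} deg")
```

Thresholding this angle over time (together with gaze velocity) is the usual basis for classifying fixations, saccades, and pursuit.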
Search is commonly described as a repeating cycle of guidance to target-like objects, followed by the recognition of these objects as targets or distractors. Are these indeed separate processes using different visual features? We addressed this question by comparing observer behavior to that of support vector machine (SVM) models trained on guidance and recognition tasks. Observers searched for a categorically defined teddy bear target in four-object arrays. Target-absent trials consisted of random category distractors rated in their visual similarity to teddy bears. Guidance, quantified as first-fixated objects during search, was strongest for targets, followed by target-similar, medium-similarity, and target-dissimilar distractors. False positive errors to first-fixated distractors also decreased with increasing dissimilarity to the target category. To model guidance, nine teddy bear detectors, using features ranging in biological plausibility, were trained on unblurred bears and then tested on blurred versions of the same objects appearing in each search display. Guidance estimates were based on target probabilities obtained from these detectors. To model recognition, nine bear/nonbear classifiers, trained and tested on unblurred objects, were used to classify the object that would be fixated first (based on the detector estimates) as a teddy bear or a distractor. Patterns of categorical guidance and recognition accuracy were modeled almost perfectly by an HMAX model in combination with a color histogram feature. We conclude that guidance and recognition in the context of search are not separate processes mediated by different features, and that what the literature knows as guidance is really recognition performed on blurred objects viewed in the visual periphery.
categorical guidance; object detection; visual similarity; classifiers; eye movements
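The guidance stage described above ranks the objects in a display by detector-estimated target probability, and the first fixation goes to the most target-like object. A schematic sketch of that selection step using hypothetical detector scores, with a softmax rule added here for illustration (the model itself may select deterministically):

```python
import numpy as np

def first_fixation(detector_scores, temperature=0.15, rng=None):
    """Pick the first-fixated object from target-probability scores.
    A softmax over scores makes selection stochastic but biased toward
    the most target-like object (argmax as temperature -> 0)."""
    rng = rng or np.random.default_rng()
    p = np.exp(np.asarray(detector_scores) / temperature)
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Hypothetical detector outputs for a four-object array: a target-similar,
# a medium-similarity, and two target-dissimilar distractors.
scores = [0.70, 0.45, 0.20, 0.15]
rng = np.random.default_rng(8)
fixes = [first_fixation(scores, rng=rng) for _ in range(1000)]
print("P(first fixation):", np.round(np.bincount(fixes, minlength=4) / 1000, 2))
```

The graded fall-off in first-fixation probability with decreasing target similarity is the pattern the study compares against observers' eye movements.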
Three studies, involving a total of 145 observers, examined quantitative theories of the overestimation of perceived optical slant. The first two studies investigated depth/width anisotropies for positive and negative slants in both pitch and yaw at 2 and 8 m using calibrated immersive virtual environments. Observers judged the relative lengths of frontal extents and extents in depth. The physical aspect ratio that was perceived as 1:1 was determined for each slant. The observed anisotropies can be modeled by assuming overestimation in perceived slant. Three one-parameter slant perception models (angular expansion, affine depth compression caused by mis-scaling of binocular disparity, and intrinsic bias) were compared. The angular expansion and the affine depth compression models provided significantly better fits to the aspect ratio data than the intrinsic bias model did. The affine model required depth compression at the 2 m distance; however, this was much greater than the depth compression measured directly in the third study using the same apparatus. The present results suggest that depth compression based on mis-scaling of binocular disparity may contribute to slant overestimation, especially as viewing distance increases, but also suggest that a functional rather than mechanistic account may be more appropriate for explaining biases in perceived slant in near space.
slant; non-Euclidean; space perception; orientation; surface layout