The visual system rapidly represents the mean size of sets of objects. Here, we investigated whether mean size is explicitly encoded by the visual system along a single dimension, as are texture, numerosity, and other visual dimensions susceptible to adaptation. Observers adapted to two sets of dots with different mean sizes, presented simultaneously in opposite visual fields. After adaptation, two test patches replaced the adapting dot sets, and participants judged which test appeared to have the larger average dot diameter. They generally perceived the test that replaced the smaller mean size adapting set as being larger than the test that replaced the larger adapting set. This differential aftereffect held for single test dots (Experiment 2) and high-pass filtered displays (Experiment 3), and changed systematically as a function of the variance of the adapting dot sets (Experiment 4), providing additional support that mean size is adaptable, and is therefore an explicitly encoded dimension of visual scenes.
Mean size; Adaptation aftereffect; Summary representations
It is difficult to recognize an object that falls in the peripheral visual field; it is even more difficult when there are other objects surrounding it. This effect, known as crowding, could be due to interactions between the low-level parts or features of the surrounding objects. Here, we investigated whether crowding can also occur selectively between higher level object representations. Many studies have demonstrated that upright faces, unlike most other objects, are coded holistically. Therefore, in addition to featural crowding within a face (M. Martelli, N. J. Majaj, & D. G. Pelli, 2005), we might expect an additional crowding effect between upright faces due to interference between the higher level holistic representations of these faces. In a series of experiments, we tested this by presenting an upright target face in a crowd of additional upright or inverted faces. We found that recognition was more strongly impaired when the target face was surrounded by upright compared to inverted flanker (distractor) faces; this pattern of results was absent when inverted faces and non-face objects were used as targets. This selective crowding of upright faces by other upright faces only occurred when the target–flanker separation was less than half the eccentricity of the target face, consistent with traditional crowding effects (H. Bouma, 1970; D. G. Pelli, M. Palomares, & N. J. Majaj, 2004). Likewise, the selective interference between upright faces did not occur at the fovea and was not a function of the target–flanker similarity, suggesting that crowding-specific processes were responsible. The results demonstrate that crowding can occur selectively between high-level representations of faces and may therefore occur at multiple stages in the visual system.
vision; perception; awareness; face recognition; ensemble; spatial; lateral; masking; object
The ability of the visual system to localize objects is one of its most important functions and yet remains one of the least understood, especially when either the object or the surrounding scene is in motion. The specific process that assigns positions under these circumstances is unknown, but two major classes of mechanism have emerged: spatial mechanisms that directly influence the coded locations of objects, and temporal mechanisms that influence the speed of perception. Disentangling these mechanisms is one of the first steps towards understanding how the visual system assigns locations to objects when there are motion signals present in the scene.
Echolocating organisms represent their external environment using reflected auditory information from emitted vocalizations. This ability, long known in various non-human species, has also been documented in some blind humans as an aid to navigation, as well as object detection and coarse localization. Surprisingly, our understanding of the basic acuity attainable by practitioners—the most fundamental underpinning of echoic spatial perception—remains crude. We found that experts were able to discriminate horizontal offsets of stimuli as small as ~1.2° auditory angle in the frontomedial plane, a resolution approaching the maximum measured precision of human spatial hearing and comparable to that found in bats performing similar tasks. Furthermore, we found a strong correlation between echolocation acuity and age of blindness onset. This first measure of functional spatial resolution in a population of expert echolocators demonstrates precision comparable to that found in the visual periphery of sighted individuals.
Echolocation; Perception; Crossmodal perception; Blindness
One of the most important aspects of visual attention is its flexibility; our attentional “window” can be tuned to different spatial scales, allowing us to perceive large-scale global patterns and local features effortlessly. We investigated whether the perception of global and local motion competes for a common attentional resource. Subjects viewed arrays of individual moving Gabors that group to produce a global motion percept when subjects attended globally. When subjects attended locally, on the other hand, they could identify the direction of individual uncrowded Gabors. Subjects were required to devote their attention toward either scale of motion or divide it between global and local scales. We measured direction discrimination as a function of the validity of a precue, which was varied in opposite directions for global and local motion such that when the precue was valid for global motion, it was invalid for local motion and vice versa. There was a trade-off between global and local motion thresholds, such that increasing the validity of precues at one spatial scale simultaneously reduced thresholds at that spatial scale but increased thresholds at the other spatial scale. In a second experiment, we found a similar pattern of results for static-oriented Gabors: Attending to local orientation information impaired the subjects’ ability to perceive globally defined orientation and vice versa. Thresholds were higher for orientation compared to motion, however, suggesting that motion discrimination in the first experiment was not driven by orientation information alone but by motion-specific processing. The results of these experiments demonstrate that a shared attentional resource flexibly moves between different spatial scales and allows for the perception of both local and global image features, whether these features are defined by motion or orientation.
attention; spatial scale; global; local; motion perception; integration; segmentation
Despite several findings of perceptual asynchronies between object features, it remains unclear whether independent neuronal populations necessarily code these perceptually unbound properties. To examine this, we investigated the binding between an object’s spatial frequency and its rotational motion using contingent motion aftereffects (MAE). Subjects adapted to an oscillating grating whose direction of rotation was paired with a high or low spatial frequency pattern. In separate adaptation conditions, we varied the moment when the spatial frequency change occurred relative to the direction reversal. After adapting to one stimulus, subjects made judgments of either the perceived MAE (rotational movement) or the position shift (instantaneous phase rotation) that accompanied the MAE. To null the spatial frequency-contingent MAE, motion reversals had to physically lag changes in spatial frequency during adaptation. To null the position shift that accompanied the MAE, however, no temporal lag between the attributes was required. This demonstrates that perceived motion and position can be perceptually misbound. Indeed, in certain conditions, subjects perceived the test pattern to drift in one direction while its position appeared shifted in the opposite direction. The dissociation between perceived motion and position of the same test pattern, following identical adaptation, demonstrates that distinguishable neural populations code for these object properties.
Vision; Perception; Motion; Color-motion asynchrony; Binding; Spatial frequency; Texture; Contingent aftereffect; Motion aftereffect; MAE; Localization; Differential latency
Although second-order motion may be detected by early and automatic mechanisms, some models suggest that perceiving second-order motion requires higher-order processes, such as feature or attentive tracking. These types of attentionally mediated mechanisms could explain the motion aftereffect (MAE) perceived in dynamic displays after adapting to second-order motion. Here we tested whether there is a second-order MAE in the absence of attention or awareness. If awareness of motion, mediated by high-level or top-down mechanisms, is necessary for the second-order MAE, then there should be no measurable MAE if the ability to detect directionality is impaired during adaptation. To eliminate the subject’s ability to detect directionality of the adapting stimulus, a second-order drifting Gabor was embedded in a dense array of additional crowding Gabors. We found that a significant MAE was perceived even after adaptation to second-order motion in crowded displays that prevented awareness. The results demonstrate that second-order motion can be passively coded in the absence of awareness and without top-down attentional control.
Vision; Attention; Perception; Motion; Motion perception; Second-order motion; Crowding; Awareness; Consciousness; First-order motion; MAE; Contrast-defined
Motion can bias the perceived location of a stationary stimulus, but whether this occurs at a high level of representation or at early, retinotopic stages of visual processing remains an open question. As coding of orientation emerges early in visual processing, we tested whether motion could influence the spatial location at which orientation adaptation is seen. Specifically, we examined whether the tilt aftereffect (TAE) depends on the perceived or the retinal location of the adapting stimulus, or both. We used the flash-drag effect (FDE) to produce a shift in the perceived position of the adaptor away from its retinal location. Subjects viewed a patterned disk that oscillated clockwise and counterclockwise while adapting to a small disk containing a tilted linear grating that was flashed briefly at the moment of the rotation reversals. The FDE biased the perceived location of the grating in the direction of the disk's motion immediately following the flash, allowing dissociation between the retinal and perceived location of the adaptor. Brief test gratings were subsequently presented at one of three locations—the retinal location of the adaptor, its perceived location, or an equidistant control location (antiperceived location). Measurements of the TAE at each location demonstrated that the TAE was strongest at the retinal location, and was larger at the perceived compared to the antiperceived location. This indicates a skew in the spatial tuning of the TAE consistent with the FDE. Together, our findings suggest that motion can bias the location of low-level adaptation.
motion processing; flash-drag effect; tilt-aftereffect; orientation adaptation
Although the visual cortex is organized retinotopically, it is not clear whether the cortical representation of position necessarily reflects perceived position. Using functional magnetic resonance imaging (fMRI), we show that the retinotopic representation of a stationary object in the cortex was systematically shifted when visual motion was present in the scene. Whereas the object could appear shifted in the direction of the visual motion, the representation of the object in the visual cortex was always shifted in the opposite direction. The results show that the representation of position in the primary visual cortex, as revealed by fMRI, can be dissociated from perceived location.
Visual information is crucial for goal-directed reaching. A number of studies have recently shown that motion in particular is an important source of information for the visuomotor system. For example, when reaching for a stationary object, movement of the background can influence the trajectory of the hand, even when the background motion is irrelevant to the object and task. This manual following response may be a compensatory response to changes in body position, but the underlying mechanism remains unclear. Here we tested whether visual motion area MT+ is necessary to generate the manual following response. We found that stimulation of MT+ with transcranial magnetic stimulation significantly reduced a strong manual following response. MT+ is therefore necessary for generating the manual following response, indicating that it plays a crucial role in guiding goal-directed reaching movements by taking into account background motion in scenes.
action; localization; manual following response; perception; pointing; TMS; visuomotor
Recent evidence suggests those with autism may be generally impaired in visual motion perception. To examine this, we investigated both coherent and biological motion processing in adolescents with autism employing both psychophysical and fMRI methods. Those with autism performed as well as matched controls during coherent motion perception but had significantly higher thresholds for biological motion perception. The autism group showed reduced posterior Superior Temporal Sulcus (pSTS), parietal and frontal activity during a biological motion task while showing similar levels of activity in MT+/V5 during both coherent and biological motion trials. Activity in MT+/V5 was predictive of individual coherent motion thresholds in both groups. Activity in dorsolateral prefrontal cortex (DLPFC) and pSTS was predictive of biological motion thresholds in control participants but not in those with autism. Notably, however, activity in DLPFC was negatively related to autism symptom severity. These results suggest that impairments in higher-order social or attentional networks may underlie visual motion deficits observed in autism.
Perceiving biological motion is important for understanding the intentions and future actions of others. Perceiving an approaching person's behavior may be particularly important, because such behavior often precedes social interaction. To this end, the visual system may devote extra resources for perceiving an oncoming person's heading. If this were true, humans should show increased sensitivity for perceiving approaching headings, and as a result, a repulsive perceptual effect around the categorical boundary of leftward/rightward motion. We tested these predictions and found evidence for both. First, observers were especially sensitive to the heading of an approaching person; variability in estimates of a person's heading decreased near the category boundary of leftward/rightward motion. Second, we found a repulsion effect around the category boundary; a person walking approximately toward the observer was perceived as being repelled away from straight ahead. This repulsive effect was greatly exaggerated for perception of a very briefly presented person or perception of a chaotic crowd, suggesting that repulsion may protect against categorical errors when sensory noise is high. The repulsion effect with a crowd required integration of local motion and human form, suggesting an origin in high-level stages of visual processing. Similar repulsive effects may underlie categorical perception with other social features. Overall, our results show that a person's direction of walking is categorically perceived, with improved sensitivity at the category boundary and a concomitant repulsion effect.
categorical perception; reference repulsion; biological motion; ensemble coding; motion repulsion
People are sensitive to the summary statistics of the visual world (e.g., average orientation/speed/facial expression). We readily derive this information from complex scenes, often without explicit awareness. Given the fundamental and ubiquitous nature of summary statistical representation, we tested whether this kind of information is subject to the attentional constraints imposed by change blindness. We show that information regarding the summary statistics of a scene is available despite limited conscious access. In a novel experiment, we found that while observers can suffer from change blindness (i.e., not localize where change occurred between two views of the same scene), observers could nevertheless accurately report changes in the summary statistics (or “gist”) of the very same scene. In the experiment, observers saw two successively presented sets of 16 faces that varied in expression. Four of the faces in the first set changed from one emotional extreme (e.g., happy) to another (e.g., sad) in the second set. Observers performed poorly when asked to locate any of the faces that changed (change blindness). However, when asked about the ensemble (which set was happier, on average), observer performance remained high. Observers were sensitive to the average expression even when they failed to localize any specific object change. That is, even when observers could not locate the very faces driving the change in average expression between the two sets, they nonetheless derived a precise ensemble representation. Thus, the visual system may be optimized to process summary statistics in an efficient manner, allowing it to operate despite minimal conscious access to the information presented.
Visual awareness; Change blindness; Face perception; Object recognition
The flash-drag effect (FDE) refers to the phenomenon in which the position of a stationary flashed object in one location appears shifted in the direction of nearby motion. Over the past decade, it has been debated how bottom-up and top-down processes contribute to this illusion. In this study, we demonstrate that randomly phase-shifting gratings can produce the FDE. In the random motion sequence we used, the FDE inducer (a sinusoidal grating) jumped to a random phase every 125 ms and stood still until the next jump. Because this random sequence could not be tracked attentively, it was impossible for the observer to discern the jump direction at the time of the flash. By sorting the data based on the flash’s onset time relative to each jump time in the random motion sequence, we found that a large FDE with a broad temporal tuning occurred around 50 to 150 ms before the jump and that this effect was not correlated with any other jumps in the past or future. These results suggest that as few as two frames of unpredictable apparent motion can preattentively cause the FDE with a broad temporal tuning.
motion; position perception; flash drag; binding
The pulvinar nucleus of the thalamus is suspected to play an important role in visual attention, based on its widespread connectivity with the visual cortex and the fronto-parietal attention network. However, at present, there remain many hypotheses on the pulvinar’s specific function, with sparse or conflicting evidence for each. Here we characterize how the human pulvinar encodes attended and ignored objects when they appear simultaneously and compete for attentional resources. Using multivoxel pattern analyses on data from two fMRI experiments, we show that attention gates both position and orientation information in the pulvinar: attended objects are encoded with high precision, while there is no measurable encoding of ignored objects. These data support a role of the pulvinar in distractor filtering – suppressing information from competing stimuli in order to isolate behaviorally relevant objects.
vision; perception; selective attention; spatial attention; distractor filtering; thalamus; fMRI; visual cortex
When a test is flashed on top of two superimposed, opposing motions, the perceived location of the test is shifted in opposite directions depending on which of the two motions is attended. Because the stimulus remains unchanged as attention switches from one motion to the other, the effect cannot be due to stimulus-driven, low-level motion. A control condition ruled out any contribution from possible attention-induced cyclotorsion of the eyes. This provides the strongest evidence to date for a role of attention in the perception of location, and establishes that what we attend to influences where we perceive objects to be.
attention; perceptual organization; motion-2D
Fragile X syndrome is the most common cause of inherited intellectual impairment and the most common single-gene cause of autism. Individuals with fragile X syndrome present with a neurobehavioural phenotype that includes selective deficits in spatiotemporal visual perception associated with neural processing in frontal–parietal networks of the brain. The goal of the current study was to examine whether reduced resolution of spatial and/or temporal visual attention may underlie perceptual deficits related to fragile X syndrome. Eye tracking was used to psychophysically measure the limits of spatial and temporal attention in infants with fragile X syndrome and age-matched neurotypically developing infants. Results from these experiments revealed that infants with fragile X syndrome experience drastically reduced resolution of temporal attention in a genetic dose-sensitive manner, but have a spatial resolution of attention that is not impaired. Coarse temporal attention could have significant knock-on effects for the development of perceptual, cognitive and motor abilities in individuals with the disorder.
crowding; flicker; magnocellular; Mooney; contrast sensitivity
Diagnostic features of emotional expressions are differentially distributed across the face. The current study examined whether these diagnostic features are preferentially attended to even when they are irrelevant for the task at hand or when faces appear at different locations in the visual field. To this aim, fearful, happy and neutral faces were presented to healthy individuals in two experiments while measuring eye movements. In Experiment 1, participants had to accomplish an emotion classification, a gender discrimination or a passive viewing task. To differentiate fast, potentially reflexive, eye movements from a more elaborate scanning of faces, stimuli were either presented for 150 or 2000 ms. In Experiment 2, similar faces were presented at different spatial positions to rule out the possibility that eye movements only reflect a general bias for certain visual field locations. In both experiments, participants fixated the eye region much longer than any other region in the face. Furthermore, the eye region was attended to more pronouncedly when fearful or neutral faces were shown whereas more attention was directed toward the mouth of happy facial expressions. Since these results were similar across the other experimental manipulations, they indicate that diagnostic features of emotional expressions are preferentially processed irrespective of task demands and spatial locations. Saliency analyses revealed that a computational model of bottom-up visual attention could not explain these results. Furthermore, as these gaze preferences were evident very early after stimulus onset and occurred even when saccades did not allow for extracting further information from these stimuli, they may reflect a preattentive mechanism that automatically detects relevant facial features in the visual field and facilitates the orientation of attention towards them. This mechanism might crucially depend on amygdala functioning and it is potentially impaired in a number of clinical conditions such as autism or social anxiety disorders.
Crowding, the inability to recognize objects in clutter, sets a fundamental limit on conscious visual perception and object recognition throughout most of the visual field. Despite how widespread and essential it is to object recognition, reading, and visually guided action, a solid operational definition of what crowding is has only recently become clear. The goal of this review is to provide a broad-based synthesis of the most recent findings in this area, to define what crowding is and is not, and to set the stage for future work that will extend crowding well beyond low-level vision. Here we define five diagnostic criteria for what counts as crowding, and further describe factors that both escape and break crowding. All of these lead to the conclusion that crowding occurs at multiple stages in the visual hierarchy.
Neural transmission latency would introduce a spatial lag when an object moves across the visual field, if the latency were not compensated. A visual predictive mechanism has been proposed, which overcomes such spatial lag by extrapolating the position of the moving object forward. However, a forward position shift is often absent if the object abruptly stops moving (motion-termination). A recent “correction-for-extrapolation” hypothesis suggests that the absence of forward shifts is caused by sensory signals representing ‘failed’ predictions. Thus far, this hypothesis has been tested only for extra-foveal retinal locations. We tested this hypothesis using two foveal scotomas: scotoma to dim light and scotoma to blue light. We found that the perceived position of a dim dot is extrapolated into the fovea during motion-termination. Next, we compared the perceived position shifts of a blue versus a green moving dot. As predicted, the extrapolation at motion-termination was found only with the blue moving dot. The results provide new evidence for the correction-for-extrapolation hypothesis in the region with the highest spatial acuity, the fovea.
When a video of someone speaking is paused, the stationary image of the speaker typically appears less flattering than the video, which contained motion. We call this the frozen face effect (FFE). Here we report six experiments intended to quantify this effect and determine its cause. In Experiment 1, video clips of people speaking in naturalistic settings as well as all of the static frames that composed each video were presented, and subjects rated how flattering each stimulus was. The videos were rated to be significantly more flattering than the static images, confirming the FFE. In Experiment 2, videos and static images were inverted, and the videos were again rated as more flattering than the static images. In Experiment 3, a discrimination task measured recognition of the static images that composed each video. Recognition did not correlate with flattery ratings, suggesting that the FFE is not due to better memory for particularly distinct images. In Experiment 4, flattery ratings for groups of static images were compared with those for videos and static images. Ratings for the video stimuli were higher than those for either the group or individual static stimuli, suggesting that the amount of information available is not what produces the FFE. In Experiment 5, videos were presented under four conditions: forward motion, inverted forward motion, reversed motion, and scrambled frame sequence. Flattery ratings for the scrambled videos were significantly lower than those for the other three conditions. In Experiment 6, as in Experiment 2, inverted videos and static images were compared with upright ones, and the response measure was changed to perceived attractiveness. Videos were rated as more attractive than the static images for both upright and inverted stimuli. Overall, the results suggest that the FFE requires continuous, natural motion of faces, is not sensitive to inversion, and is not due to a memory effect.
face perception; static images; dynamic images; attractiveness; fluency
Conscious visual perception of the constantly changing environment is one of the brain’s most critical functions. In virtually every moment of every daily activity, the visual system is confronted with the task of accurately representing and interpreting scenes that change rapidly over time. Adults can judge the identity and order of changing images presented at a rate of up to 10 Hz (~50 ms per image); this limit reflects a finite temporal resolution of attention. In the research reported here, although 6- to 15-month-old infants could detect the presence of rapid flicker without difficulty, their ability to segment individual alternating states within the flicker was severely limited: Fifteen-month-old infants had a temporal resolution of attention approximately one order of magnitude lower than that of adults (~1 Hz). Coarse temporal resolution constrains how infants perceive and utilize dynamic visual information and may play a role in the visual processing deficits found in individuals with neurodevelopmental disorders.
temporal individuation; Gestalt flicker fusion; contrast sensitivity
Human object recognition degrades sharply as the target object moves from central vision into peripheral vision. In particular, one's ability to recognize a peripheral target is severely impaired by the presence of flanking objects, a phenomenon known as visual crowding. Recent studies on how visual awareness of flanker existence influences crowding have shown mixed results. More importantly, it is not known whether conscious awareness of the existence of both the target and flankers is necessary for crowding to occur.
Here we show that crowding persists even when people are completely unaware of the flankers, which are rendered invisible through the continuous flash suppression technique. The contrast threshold for identifying the orientation of a grating pattern was elevated in the flanked condition, even when the subjects reported that they were unaware of the perceptually suppressed flankers. Moreover, we found that orientation-specific adaptation was attenuated by flankers even when both the target and flankers were invisible.
These findings complement the suggested correlation between crowding and visual awareness. Moreover, our results demonstrate that conscious awareness and attention are not prerequisites for crowding.