Watch any crowded intersection, and you will see how adept people are at reading the subtle movements of one another. While adults can readily discriminate small differences in the direction of a moving person, it is unclear whether this sensitivity is in place early in development. Here, we present evidence that 4-year-old children are sensitive to small differences in a person's direction of walking (~7°) far beyond what has been previously shown. This sensitivity only occurred for perception of an upright walker, consistent with the recruitment of high-level visual areas. Even at 4 years of age, children's sensitivity approached that of adults. This suggests that the sophisticated mechanisms adults use to perceive a person's direction of movement are in place and developing early in childhood. Although the neural mechanisms for perceiving biological motion develop slowly, they are refined enough by age 4 to support subtle perceptual judgments of heading. These judgments may be useful for predicting a person's future location or even their intentions and goals.
biological motion; perceptual development; direction; extrapolation; predictive judgment
Retinal motion can modulate visual sensitivity. For instance, low contrast drifting waveforms (targets) can be easier to detect when abutting the leading edges of movement in adjacent high contrast waveforms (inducers), rather than the trailing edges. This target-inducer interaction is contingent on the adjacent waveforms being consistent with one another – in-phase as opposed to out-of-phase. It has been suggested that this happens because there is a perceptually explicit predictive signal at leading edges of motion that summates with low contrast physical input – a ‘predictive summation’. Another possible explanation is a phase sensitive ‘spatial summation’, a summation of physical inputs spread across the retina (not predictive signals). This should be non-selective in terms of position – it should be evident at leading, adjacent, and trailing edges of motion. To tease these possibilities apart, we examined target sensitivity at leading, adjacent, and trailing edges of motion. We also examined target sensitivity adjacent to flicker, and for a stimulus that is less susceptible to spatial summation, as it sums to grey across a small retinal expanse. We found evidence for spatial summation in all but the last condition. Finally, we examined sensitivity to an absence of signal at leading and trailing edges of motion, finding greater sensitivity at leading edges. These results are inconsistent with the existence of a perceptually explicit predictive signal in advance of drifting waveforms. Instead, we suggest that phase-contingent target-inducer modulations of sensitivity are explicable in terms of a directionally modulated spatial summation.
Motion; Spatial coding; Spatial summation; Predictive coding
Visual input often arrives in a noisy and discontinuous stream, owing to head and eye movements, occlusion, lighting changes, and many other factors. Yet the physical world is generally stable—objects and physical characteristics rarely change spontaneously. How then does the human visual system capitalize on continuity in the physical environment over time? Here we show that visual perception is serially dependent, using both prior and present input to inform perception at the present moment. Using an orientation judgment task, we found that even when visual input changes randomly over time, perceived orientation is strongly and systematically biased toward recently seen stimuli. Further, the strength of this bias is modulated by attention and tuned to the spatial and temporal proximity of successive stimuli. These results reveal a serial dependence in perception characterized by a spatiotemporally tuned, orientation-selective operator—which we call a continuity field—that may promote visual stability over time.
visual history; aftereffect; tilt aftereffect; visual perception; orientation
Individuals can rapidly and precisely judge the average of a set of similar items, including both low-level (Ariely, 2001) and high-level objects (Haberman & Whitney, 2007). However, to date, it is unclear whether ensemble perception is based on viewpoint-invariant object representations. Here, we tested this question by presenting participants with crowds of sequentially presented faces. The number of faces in each crowd and the viewpoint of each face varied from trial to trial. This design required participants to integrate information from multiple viewpoints into one ensemble percept. Participants reported the mean identity of crowds (e.g., family resemblance) using an adjustable, forward-oriented test face. Our results showed that participants accurately perceived the mean crowd identity even when required to incorporate information across multiple face orientations. Control experiments showed that the precision of ensemble coding was not solely dependent on the length of time participants viewed the crowd. Moreover, control analyses demonstrated that observers did not simply sample a subset of faces in the crowd but rather integrated many faces into their estimates of average crowd identity. These results demonstrate that ensemble perception can operate at the highest levels of object recognition, after 3-D viewpoint-invariant representations of faces have been formed.
ensemble coding; face perception; statistical summary
How is visual space represented in cortical area MT+? At a relatively coarse scale, the organization of MT+ is debated: Retinotopic, spatiotopic, or mixed representations have been proposed. However, none of these entirely explains perceptual localization of objects at a fine spatial scale—a scale relevant for tasks like navigating or manipulating objects. For example, perceived positions of objects are strongly modulated by visual motion: stationary flashes appear shifted in the direction of nearby motion. Does spatial coding in MT+ reflect these shifts in perceived position? We performed an fMRI experiment employing this “flash-drag effect”, and found that flashes presented near motion produced patterns of activity similar to physically shifted flashes in the absence of motion. This reveals a motion-dependent change in the neural representation of object position in human MT+, a process that could help compensate for perceptual and motor delays in localizing objects in dynamic scenes.
One of the most important functions of vision is to direct actions to objects [1]. However, every time that vision is used to guide an action, retinal motion signals are produced by the movement of the eye and head as the person looks at the object or by the motion of other objects in the scene. To reach for the object accurately, the visuomotor system must separate information about the position of the stationary target from background retinal motion signals—a long-standing problem that is poorly understood [2–7]. Here we show that the visuomotor system does not distinguish between these two information sources: when observers made fast reaching movements to a briefly presented stationary target, their hand shifted in a direction consistent with the motion of a distant and unrelated stimulus, a result contrary to most other findings [8,9]. This can be seen early in the hand’s trajectory (~120 ms) and occurs continuously from programming of the movement through to its execution. The visuomotor system might make use of the motion signals arising from eye and head movements to update the positions of targets rapidly and redirect the hand to compensate for body movements.
One of the most fundamental functions of the visual system is to code the positions of objects. Most studies, especially those using fMRI, widely assume that the location of the peak retinotopic activity generated in the visual cortex by an object is the position assigned to that object—this is a simplified version of the local sign hypothesis. Here, we employed a novel technique to compare the pattern of responses to moving and stationary objects and found that the local sign hypothesis is false. By spatially correlating populations of voxel responses to different moving and stationary stimuli in different positions, we recovered the modulation transfer function for moving patterns. The results show that the pattern of responses to a moving object is best correlated with the response to a static object that is located behind the moving one. The pattern of responses across the visual cortex was able to distinguish object positions separated by about 0.25 deg visual angle, equivalent to approximately 0.25 mm cortical distance. We also found that the position assigned to a pattern is not simply dictated by the peak activity—the shape of the luminance envelope and the resulting shape of the population response, including the shape and skew in the response at the edges of the pattern, influence where the visual cortex assigns the object’s position. Therefore, visually coded position is not conveyed by the peak but by the overall profile of activity.
Vision; Perception; Motion; Motion perception; fMRI; Retinotopy; Topography; Local sign; Localization; Labeled line; Visual cortex; Striate cortex
Movement of the body, head, or eyes with respect to the world creates one of the most common yet complex situations in which the visuomotor system must localize objects. In this situation, vestibular, proprioceptive, and extra-retinal information contribute to accurate visuomotor control. The utility of retinal motion information, on the other hand, is questionable, since a single pattern of retinal motion can be produced by any number of head or eye movements. Here we investigated whether retinal motion during a smooth pursuit eye movement contributes to visuomotor control. When subjects pursued a moving object with their eyes and reached to the remembered location of a separate stationary target, the presence of a moving background significantly altered the endpoints of their reaching movements. A background that moved with the pursuit, creating a retinally stationary image (no retinal slip), caused the endpoints of the reaching movements to deviate in the direction of pursuit, overshooting the target. A physically stationary background pattern, however, producing retinal image motion opposite to the direction of pursuit, caused reaching movements to become more accurate. The results indicate that background retinal motion is used by the visuomotor system in the control of visually guided action.
Visual perception; Motion; Position; Visuomotor; Reaching; Pointing; Action
How does the visual system assign the perceived position of a moving object? This question is surprisingly complex, since sluggish responses of photoreceptors and transmission delays along the visual pathway mean that visual cortex does not have immediate information about a moving object's position. In the flash-lag effect (FLE), a moving object is perceived ahead of an aligned flash. Psychophysical work on this illusion has inspired models for visual localization of moving objects. However, little is known about the underlying neural mechanisms. Here, we investigated the role of neural activity in areas MT+ and V1/V2 in localizing moving objects. Using short trains of repetitive Transcranial Magnetic Stimulation (TMS) or single pulses at different time points, we measured the influence of TMS on the perceived location of a moving object. We found that TMS delivered to MT+ significantly reduced the FLE; single pulse timings revealed a broad temporal tuning, with the maximum effect for pulses delivered 200 ms after the flash. Stimulation of V1/V2 did not significantly influence perceived position. Our results demonstrate that area MT+ contributes to the perceptual localization of moving objects and is involved in the integration of position information over a long time window.
flash-lag effect; moving objects; MT+; perceived position; TMS
In daily life, we make several saccades per second to objects we cannot normally recognize in the periphery due to visual crowding. While we are aware of the presence of these objects, we cannot identify them and may, at best, only know that an object is present at a particular location. The process of planning a saccade involves a presaccadic attentional component known to be critical for saccadic accuracy, but whether this or other presaccadic processes facilitate object identification as opposed to object detection—especially with high level natural objects like faces—is less clear. In the following experiments, we show that presaccadic information about a crowded face reduces the deleterious effect of crowding, facilitating discrimination of two emotional faces, even when the target face is never foveated. While accurate identification of crowded objects is possible in the absence of a saccade, accurate identification of a crowded object is considerably facilitated by presaccadic attention. Our results provide converging evidence for a selective increase in available information about high level objects, such as faces, at a presaccadic stage.
crowding; saccades; presaccadic information; presaccadic attention; visually guided action
Because the environment is cluttered, objects rarely appear in isolation. The visual system must therefore attentionally select behaviorally relevant objects from among many irrelevant ones. A limit on our ability to select individual objects is revealed by the phenomenon of visual crowding: an object seen in the periphery, easily recognized in isolation, can become impossible to identify when surrounded by other, similar objects. The neural basis of crowding is hotly debated: while prevailing theories hold that crowded information is irrecoverable – destroyed due to over-integration in early stage visual processing – recent evidence demonstrates otherwise. Crowding can occur between high-level, configural object representations, and crowded objects can contribute with high precision to judgments about the “gist” of a group of objects, even when they are individually unrecognizable. While existing models can account for the basic diagnostic criteria of crowding (e.g., specific critical spacing, spatial anisotropies, and temporal tuning), no present model explains how crowding can operate simultaneously at multiple levels in the visual processing hierarchy, including at the level of whole objects. Here, we present a new model of visual crowding—the hierarchical sparse selection (HSS) model, which accounts for object-level crowding, as well as a number of puzzling findings in the recent literature. Counter to existing theories, we posit that crowding occurs not due to degraded visual representations in the brain, but due to impoverished sampling of visual representations for the sake of perception. The HSS model unifies findings from a disparate array of visual crowding studies and makes testable predictions about how information in crowded scenes can be accessed.
attention; visual attention; coarse coding; ensemble coding; summary statistics; perception; neural network
The visual system rapidly represents the mean size of sets of objects. Here, we investigated whether mean size is explicitly encoded by the visual system, along a single dimension like texture, numerosity, and other visual dimensions susceptible to adaptation. Observers adapted to two sets of dots with different mean sizes, presented simultaneously in opposite visual fields. After adaptation, two test patches replaced the adapting dot sets, and participants judged which test appeared to have the larger average dot diameter. They generally perceived the test that replaced the smaller mean size adapting set as being larger than the test that replaced the larger adapting set. This differential aftereffect held for single test dots (Experiment 2) and high-pass filtered displays (Experiment 3), and changed systematically as a function of the variance of the adapting dot sets (Experiment 4), providing additional support that mean size is adaptable and is therefore an explicitly encoded dimension of visual scenes.
Mean size; Adaptation aftereffect; Summary representations
It is difficult to recognize an object that falls in the peripheral visual field; it is even more difficult when there are other objects surrounding it. This effect, known as crowding, could be due to interactions between the low-level parts or features of the surrounding objects. Here, we investigated whether crowding can also occur selectively between higher level object representations. Many studies have demonstrated that upright faces, unlike most other objects, are coded holistically. Therefore, in addition to featural crowding within a face (M. Martelli, N. J. Majaj, & D. G. Pelli, 2005), we might expect an additional crowding effect between upright faces due to interference between the higher level holistic representations of these faces. In a series of experiments, we tested this by presenting an upright target face in a crowd of additional upright or inverted faces. We found that recognition was more strongly impaired when the target face was surrounded by upright compared to inverted flanker (distractor) faces; this pattern of results was absent when inverted faces and non-face objects were used as targets. This selective crowding of upright faces by other upright faces only occurred when the target–flanker separation was less than half the eccentricity of the target face, consistent with traditional crowding effects (H. Bouma, 1970; D. G. Pelli, M. Palomares, & N. J. Majaj, 2004). Likewise, the selective interference between upright faces did not occur at the fovea and was not a function of the target–flanker similarity, suggesting that crowding-specific processes were responsible. The results demonstrate that crowding can occur selectively between high-level representations of faces and may therefore occur at multiple stages in the visual system.
vision; perception; awareness; face recognition; ensemble; spatial; lateral; masking; object
The ability of the visual system to localize objects is one of its most important functions and yet remains one of the least understood, especially when either the object or the surrounding scene is in motion. The specific process that assigns positions under these circumstances is unknown, but two major classes of mechanism have emerged: spatial mechanisms that directly influence the coded locations of objects, and temporal mechanisms that influence the speed of perception. Disentangling these mechanisms is one of the first steps towards understanding how the visual system assigns locations to objects when there are motion signals present in the scene.
Echolocating organisms represent their external environment using reflected auditory information from emitted vocalizations. This ability, long known in various non-human species, has also been documented in some blind humans as an aid to navigation, as well as object detection and coarse localization. Surprisingly, our understanding of the basic acuity attainable by practitioners—the most fundamental underpinning of echoic spatial perception—remains crude. We found that experts were able to discriminate horizontal offsets of stimuli as small as ~1.2° auditory angle in the frontomedial plane, a resolution approaching the maximum measured precision of human spatial hearing and comparable to that found in bats performing similar tasks. Furthermore, we found a strong correlation between echolocation acuity and age of blindness onset. This first measure of functional spatial resolution in a population of expert echolocators demonstrates precision comparable to that found in the visual periphery of sighted individuals.
Echolocation; Perception; Crossmodal perception; Blindness
One of the most important aspects of visual attention is its flexibility; our attentional “window” can be tuned to different spatial scales, allowing us to perceive large-scale global patterns and local features effortlessly. We investigated whether the perception of global and local motion competes for a common attentional resource. Subjects viewed arrays of individual moving Gabors that group to produce a global motion percept when subjects attended globally. When subjects attended locally, on the other hand, they could identify the direction of individual uncrowded Gabors. Subjects were required to devote their attention toward either scale of motion or divide it between global and local scales. We measured direction discrimination as a function of the validity of a precue, which was varied in opposite directions for global and local motion such that when the precue was valid for global motion, it was invalid for local motion and vice versa. There was a trade-off between global and local motion thresholds, such that increasing the validity of precues at one spatial scale simultaneously reduced thresholds at that spatial scale but increased thresholds at the other spatial scale. In a second experiment, we found a similar pattern of results for static-oriented Gabors: Attending to local orientation information impaired the subjects’ ability to perceive globally defined orientation and vice versa. Thresholds were higher for orientation compared to motion, however, suggesting that motion discrimination in the first experiment was not driven by orientation information alone but by motion-specific processing. The results of these experiments demonstrate that a shared attentional resource flexibly moves between different spatial scales and allows for the perception of both local and global image features, whether these features are defined by motion or orientation.
attention; spatial scale; global; local; motion perception; integration; segmentation
Despite several findings of perceptual asynchronies between object features, it remains unclear whether independent neuronal populations necessarily code these perceptually unbound properties. To examine this, we investigated the binding between an object’s spatial frequency and its rotational motion using contingent motion aftereffects (MAE). Subjects adapted to an oscillating grating whose direction of rotation was paired with a high or low spatial frequency pattern. In separate adaptation conditions, we varied the moment when the spatial frequency change occurred relative to the direction reversal. After adapting to one stimulus, subjects made judgments of either the perceived MAE (rotational movement) or the position shift (instantaneous phase rotation) that accompanied the MAE. To null the spatial frequency-contingent MAE, motion reversals had to physically lag changes in spatial frequency during adaptation. To null the position shift that accompanied the MAE, however, no temporal lag between the attributes was required. This demonstrates that perceived motion and position can be perceptually misbound. Indeed, in certain conditions, subjects perceived the test pattern to drift in one direction while its position appeared shifted in the opposite direction. The dissociation between perceived motion and position of the same test pattern, following identical adaptation, demonstrates that distinguishable neural populations code for these object properties.
Vision; Perception; Motion; Color-motion asynchrony; Binding; Spatial frequency; Texture; Contingent aftereffect; Motion aftereffect; MAE; Localization; Differential latency
Although second-order motion may be detected by early and automatic mechanisms, some models suggest that perceiving second-order motion requires higher-order processes, such as feature or attentive tracking. These types of attentionally mediated mechanisms could explain the motion aftereffect (MAE) perceived in dynamic displays after adapting to second-order motion. Here we tested whether there is a second-order MAE in the absence of attention or awareness. If awareness of motion, mediated by high-level or top-down mechanisms, is necessary for the second-order MAE, then there should be no measurable MAE if the ability to detect directionality is impaired during adaptation. To eliminate the subject’s ability to detect directionality of the adapting stimulus, a second-order drifting Gabor was embedded in a dense array of additional crowding Gabors. We found that a significant MAE was perceived even after adaptation to second-order motion in crowded displays that prevented awareness. The results demonstrate that second-order motion can be passively coded in the absence of awareness and without top-down attentional control.
Vision; Attention; Perception; Motion; Motion perception; Second-order motion; Crowding; Awareness; Consciousness; First-order motion; MAE; Contrast-defined
Motion can bias the perceived location of a stationary stimulus, but whether this occurs at a high level of representation or at early, retinotopic stages of visual processing remains an open question. As coding of orientation emerges early in visual processing, we tested whether motion could influence the spatial location at which orientation adaptation is seen. Specifically, we examined whether the tilt aftereffect (TAE) depends on the perceived or the retinal location of the adapting stimulus, or both. We used the flash-drag effect (FDE) to produce a shift in the perceived position of the adaptor away from its retinal location. Subjects viewed a patterned disk that oscillated clockwise and counterclockwise while adapting to a small disk containing a tilted linear grating that was flashed briefly at the moment of the rotation reversals. The FDE biased the perceived location of the grating in the direction of the disk's motion immediately following the flash, allowing dissociation between the retinal and perceived location of the adaptor. Brief test gratings were subsequently presented at one of three locations—the retinal location of the adaptor, its perceived location, or an equidistant control location (antiperceived location). Measurements of the TAE at each location demonstrated that the TAE was strongest at the retinal location, and was larger at the perceived compared to the antiperceived location. This indicates a skew in the spatial tuning of the TAE consistent with the FDE. Together, our findings suggest that motion can bias the location of low-level adaptation.
motion processing; flash-drag effect; tilt-aftereffect; orientation adaptation
Although the visual cortex is organized retinotopically, it is not clear whether the cortical representation of position necessarily reflects perceived position. Using functional magnetic resonance imaging (fMRI), we show that the retinotopic representation of a stationary object in the cortex was systematically shifted when visual motion was present in the scene. Whereas the object could appear shifted in the direction of the visual motion, the representation of the object in the visual cortex was always shifted in the opposite direction. The results show that the representation of position in the primary visual cortex, as revealed by fMRI, can be dissociated from perceived location.
Visual information is crucial for goal-directed reaching. A number of studies have recently shown that motion in particular is an important source of information for the visuomotor system. For example, when reaching toward a stationary object, movement of the background can influence the trajectory of the hand, even when the background motion is irrelevant to the object and task. This manual following response may be a compensatory response to changes in body position, but the underlying mechanism remains unclear. Here we tested whether visual motion area MT+ is necessary to generate the manual following response. We found that stimulation of MT+ with transcranial magnetic stimulation significantly reduced a strong manual following response. MT+ is therefore necessary for generating the manual following response, indicating that it plays a crucial role in guiding goal-directed reaching movements by taking into account background motion in scenes.
action; localization; manual following response; perception; pointing; TMS; visuomotor
Recent evidence suggests that individuals with autism may be generally impaired in visual motion perception. To examine this, we investigated both coherent and biological motion processing in adolescents with autism, employing both psychophysical and fMRI methods. Those with autism performed as well as matched controls during coherent motion perception but had significantly higher thresholds for biological motion perception. The autism group showed reduced posterior Superior Temporal Sulcus (pSTS), parietal, and frontal activity during a biological motion task while showing similar levels of activity in MT+/V5 during both coherent and biological motion trials. Activity in MT+/V5 was predictive of individual coherent motion thresholds in both groups. Activity in dorsolateral prefrontal cortex (DLPFC) and pSTS was predictive of biological motion thresholds in control participants but not in those with autism. Notably, however, activity in DLPFC was negatively related to autism symptom severity. These results suggest that impairments in higher-order social or attentional networks may underlie visual motion deficits observed in autism.