The pulvinar nucleus of the thalamus is suspected to play an important role in visual attention, based on its widespread connectivity with the visual cortex and the fronto-parietal attention network. However, at present, there remain many hypotheses on the pulvinar’s specific function, with sparse or conflicting evidence for each. Here we characterize how the human pulvinar encodes attended and ignored objects when they appear simultaneously and compete for attentional resources. Using multivoxel pattern analyses on data from two fMRI experiments, we show that attention gates both position and orientation information in the pulvinar: attended objects are encoded with high precision, while there is no measurable encoding of ignored objects. These data support a role of the pulvinar in distractor filtering – suppressing information from competing stimuli in order to isolate behaviorally relevant objects.
vision; perception; selective attention; spatial attention; distractor filtering; thalamus; fMRI; visual cortex
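The decoding logic described above can be illustrated with a toy multivoxel pattern analysis. Everything below is a hypothetical sketch, not the study's actual pipeline: voxel counts, signal strengths, and the nearest-centroid decoder are illustrative assumptions. Attention is modeled simply as the presence (attended) or absence (ignored) of stimulus-related signal in the voxel patterns.

```python
import random
import statistics

random.seed(1)
N_VOXELS, N_TRIALS = 50, 200

def make_trials(signal_strength):
    """Synthetic voxel patterns for two stimulus classes (e.g., orientations).

    Each class has a fixed 'template' pattern; a trial is the template scaled
    by signal_strength plus independent Gaussian noise. signal_strength = 0
    models an ignored stimulus that leaves no trace in the pattern.
    """
    templates = {c: [random.gauss(0, 1) for _ in range(N_VOXELS)] for c in (0, 1)}
    trials = []
    for _ in range(N_TRIALS):
        c = random.choice((0, 1))
        pattern = [signal_strength * t + random.gauss(0, 1) for t in templates[c]]
        trials.append((pattern, c))
    return trials

def nearest_centroid_accuracy(trials):
    """Split-half nearest-centroid decoding accuracy."""
    train, test = trials[::2], trials[1::2]
    centroids = {}
    for c in (0, 1):
        pats = [p for p, lab in train if lab == c]
        centroids[c] = [statistics.mean(v) for v in zip(*pats)]
    correct = 0
    for pattern, label in test:
        dists = {c: sum((a - b) ** 2 for a, b in zip(pattern, centroids[c]))
                 for c in (0, 1)}
        if min(dists, key=dists.get) == label:
            correct += 1
    return correct / len(test)

acc_attended = nearest_centroid_accuracy(make_trials(signal_strength=0.8))
acc_ignored = nearest_centroid_accuracy(make_trials(signal_strength=0.0))
print(acc_attended, acc_ignored)
```

With signal present, the decoder performs well above chance; with no signal, it hovers at chance, mirroring the qualitative pattern of precise encoding for attended objects and no measurable encoding for ignored ones.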
When a test stimulus is flashed on top of two superimposed, opposing motions, the perceived location of the test is shifted in opposite directions depending on which of the two motions is attended. Because the stimulus remains unchanged as attention switches from one motion to the other, the effect cannot be due to stimulus-driven, low-level motion. A control condition ruled out any contribution from possible attention-induced cyclotorsion of the eyes. This provides the strongest evidence to date for a role of attention in the perception of location, and establishes that what we attend to influences where we perceive objects to be.
attention; perceptual organization; motion-2D
Fragile X syndrome is the most common cause of inherited intellectual impairment and the most common single-gene cause of autism. Individuals with fragile X syndrome present with a neurobehavioural phenotype that includes selective deficits in spatiotemporal visual perception associated with neural processing in frontal–parietal networks of the brain. The goal of the current study was to examine whether reduced resolution of spatial and/or temporal visual attention may underlie perceptual deficits related to fragile X syndrome. Eye tracking was used to psychophysically measure the limits of spatial and temporal attention in infants with fragile X syndrome and age-matched neurotypically developing infants. Results from these experiments revealed that infants with fragile X syndrome experience drastically reduced resolution of temporal attention in a genetic dose-sensitive manner, but have a spatial resolution of attention that is not impaired. Coarse temporal attention could have significant knock-on effects for the development of perceptual, cognitive and motor abilities in individuals with the disorder.
crowding; flicker; magnocellular; Mooney; contrast sensitivity
Diagnostic features of emotional expressions are differentially distributed across the face. The current study examined whether these diagnostic features are preferentially attended to even when they are irrelevant for the task at hand or when faces appear at different locations in the visual field. To this end, fearful, happy and neutral faces were presented to healthy individuals in two experiments while measuring eye movements. In Experiment 1, participants performed an emotion classification, a gender discrimination, or a passive viewing task. To differentiate fast, potentially reflexive, eye movements from a more elaborate scanning of faces, stimuli were presented for either 150 or 2000 ms. In Experiment 2, similar faces were presented at different spatial positions to rule out the possibility that eye movements only reflect a general bias for certain visual field locations. In both experiments, participants fixated the eye region much longer than any other region in the face. Furthermore, the eye region received even more attention when fearful or neutral faces were shown, whereas more attention was directed toward the mouth for happy facial expressions. Since these results were similar across the other experimental manipulations, they indicate that diagnostic features of emotional expressions are preferentially processed irrespective of task demands and spatial locations. Saliency analyses revealed that a computational model of bottom-up visual attention could not explain these results. Furthermore, as these gaze preferences were evident very early after stimulus onset and occurred even when saccades did not allow for extracting further information from these stimuli, they may reflect a preattentive mechanism that automatically detects relevant facial features in the visual field and facilitates the orienting of attention toward them.
This mechanism might crucially depend on amygdala functioning and it is potentially impaired in a number of clinical conditions such as autism or social anxiety disorders.
Crowding, the inability to recognize objects in clutter, sets a fundamental limit on conscious visual perception and object recognition throughout most of the visual field. Despite how widespread and essential it is to object recognition, reading, and visually guided action, a solid operational definition of what crowding is has only recently become clear. The goal of this review is to provide a broad-based synthesis of the most recent findings in this area, to define what crowding is and is not, and to set the stage for future work that will extend crowding well beyond low-level vision. Here we define five diagnostic criteria for what counts as crowding, and further describe factors that both escape and break crowding. All of these lead to the conclusion that crowding occurs at multiple stages in the visual hierarchy.
Neural transmission latency would introduce a spatial lag when an object moves across the visual field, if the latency were not compensated. A visual predictive mechanism has been proposed that overcomes such spatial lag by extrapolating the position of the moving object forward. However, a forward position shift is often absent if the object abruptly stops moving (motion termination). A recent “correction-for-extrapolation” hypothesis suggests that the absence of forward shifts is caused by sensory signals representing ‘failed’ predictions. Thus far, this hypothesis has been tested only for extra-foveal retinal locations. We tested this hypothesis using two foveal scotomas: the scotoma to dim light and the scotoma to blue light. We found that the perceived position of a dim dot is extrapolated into the fovea during motion termination. Next, we compared the perceived position shifts of a blue versus a green moving dot. As predicted, the extrapolation at motion termination was found only with the blue moving dot. The results provide new evidence for the correction-for-extrapolation hypothesis in the region with the highest spatial acuity, the fovea.
When a video of someone speaking is paused, the stationary image of the speaker typically appears less flattering than the video, which contained motion. We call this the frozen face effect (FFE). Here we report six experiments intended to quantify this effect and determine its cause. In Experiment 1, video clips of people speaking in naturalistic settings as well as all of the static frames that composed each video were presented, and subjects rated how flattering each stimulus was. The videos were rated to be significantly more flattering than the static images, confirming the FFE. In Experiment 2, videos and static images were inverted, and the videos were again rated as more flattering than the static images. In Experiment 3, a discrimination task measured recognition of the static images that composed each video. Recognition did not correlate with flattery ratings, suggesting that the FFE is not due to better memory for particularly distinct images. In Experiment 4, flattery ratings for groups of static images were compared with those for videos and static images. Ratings for the video stimuli were higher than those for either the group or individual static stimuli, suggesting that the amount of information available is not what produces the FFE. In Experiment 5, videos were presented under four conditions: forward motion, inverted forward motion, reversed motion, and scrambled frame sequence. Flattery ratings for the scrambled videos were significantly lower than those for the other three conditions. In Experiment 6, as in Experiment 2, inverted videos and static images were compared with upright ones, and the response measure was changed to perceived attractiveness. Videos were rated as more attractive than the static images for both upright and inverted stimuli. Overall, the results suggest that the FFE requires continuous, natural motion of faces, is not sensitive to inversion, and is not due to a memory effect.
face perception; static images; dynamic images; attractiveness; fluency
Conscious visual perception of the constantly changing environment is one of the brain’s most critical functions. In virtually every moment of every daily activity, the visual system is confronted with the task of accurately representing and interpreting scenes that change rapidly over time. Adults can judge the identity and order of changing images presented at a rate of up to 10 Hz (~50 ms per image); this limit reflects a finite temporal resolution of attention. In the research reported here, although 6- to 15-month-old infants could detect the presence of rapid flicker without difficulty, their ability to segment individual alternating states within the flicker was severely limited: Fifteen-month-old infants had a temporal resolution of attention approximately one order of magnitude lower than that of adults (~1 Hz). Coarse temporal resolution constrains how infants perceive and utilize dynamic visual information and may play a role in the visual processing deficits found in individuals with neurodevelopmental disorders.
temporal individuation; Gestalt flicker fusion; contrast sensitivity
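The rate/duration relation behind the reported limits is simple arithmetic. Reading each limit as a full alternation cycle of two images (an assumption; the study's exact display parameters may differ), a 10 Hz cycle implies ~50 ms per image and a 1 Hz cycle implies ~500 ms per image:

```python
def alternation_rate_hz(ms_per_image, n_states=2):
    """Cycle rate (Hz) when n_states images alternate, each shown for ms_per_image.

    One full cycle shows every state once, so its duration is
    n_states * ms_per_image milliseconds.
    """
    return 1000.0 / (n_states * ms_per_image)

print(alternation_rate_hz(50))   # adult limit
print(alternation_rate_hz(500))  # 15-month-old limit
```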
Human object recognition degrades sharply as the target object moves from central vision into peripheral vision. In particular, one's ability to recognize a peripheral target is severely impaired by the presence of flanking objects, a phenomenon known as visual crowding. Recent studies on how visual awareness of flanker existence influences crowding have shown mixed results. More importantly, it is not known whether conscious awareness of the existence of both the target and the flankers is necessary for crowding to occur.
Here we show that crowding persists even when people are completely unaware of the flankers, which are rendered invisible through the continuous flash suppression technique. Contrast threshold for identifying the orientation of a grating pattern was elevated in the flanked condition, even when the subjects reported that they were unaware of the perceptually suppressed flankers. Moreover, we find that orientation-specific adaptation is attenuated by flankers even when both the target and flankers are invisible.
These findings complement the suggested link between crowding and visual awareness. Moreover, our results demonstrate that neither conscious awareness nor attention is a prerequisite for crowding.
Adaptation to first-order (luminance defined) motion produces not only a motion aftereffect but also a position aftereffect, in which a target pattern’s perceived location is shifted opposite the direction of adaptation. These aftereffects can occur passively (when the direction of motion adaptation cannot be detected) and remotely (when the target is not at the site of adaptation). Although second-order (contrast defined) motion produces these aftereffects, it is unclear whether they can occur passively or remotely. To address these questions, we conducted two experiments. In the first, we used crowding to remove a local adapter’s second-order motion from awareness and still found a significant position aftereffect. In the second experiment, we found that the direction of motion in one region of a crowded array could produce a position aftereffect in an unadapted, spatially separated region of the crowded array. The results suggest that second-order motion influences perceived position over a large spatial range even without awareness.
crowding; motion; awareness; second order; contrast defined; localization; mislocalization; motion aftereffect; MAE; global motion; attention
There has been a recent surge in the study of ensemble coding, the idea that the visual system represents a set of similar items using summary statistics (Alvarez & Oliva, 2008; Ariely, 2001; Chong & Treisman, 2003; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001). We previously demonstrated that this ability extends to faces and thus requires a high level of object processing (Haberman & Whitney, 2007, 2009). Recent debate has centered on the nature of the summary representation of size (e.g., Myczek & Simons, 2008) and whether the perceived average simply reflects the sampling of a very small subset of the items in a set. In the present study, we explored this further in the context of faces, asking observers to judge the average expressions of sets of faces containing emotional outliers. Our results suggest that the visual system implicitly and unintentionally discounts the emotional outliers, thereby computing a summary representation that encompasses the vast majority of the information present. Additional computational modeling and behavioral results reveal that an intentional, cognitive sampling strategy does not accurately capture observer performance. Observers derive precise ensemble information given a 250-msec exposure, suggesting a rapid and flexible system not bound by the limits of serial attention.
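The sampling debate above can be made concrete with a toy simulation. Everything here is an illustrative assumption rather than the study's model or data: set size, the "expression" scale, the outlier magnitude, and the specific robust-averaging rule are hypothetical. The simulation compares three observers estimating the typical expression of a set containing one emotional outlier: a full averager, an outlier-discounting averager, and a cognitive-sampling strategy that averages only two randomly chosen items.

```python
import random
import statistics

random.seed(7)
N_SETS = 2000
INLIER_MEAN = 50.0  # hypothetical 'expression units' for the non-outlier faces

def make_set(n=12, spread=4.0, outlier_offset=40.0):
    """A set of faces clustered near INLIER_MEAN, plus one emotional outlier."""
    faces = [random.gauss(INLIER_MEAN, spread) for _ in range(n - 1)]
    faces.append(INLIER_MEAN + outlier_offset)
    random.shuffle(faces)
    return faces

def trimmed_mean(faces, cutoff=10.0):
    """Average after discounting items far from the median (outlier rejection)."""
    med = statistics.median(faces)
    return statistics.mean([f for f in faces if abs(f - med) < cutoff])

err = {"full": [], "trimmed": [], "sample2": []}
for _ in range(N_SETS):
    faces = make_set()
    err["full"].append(abs(statistics.mean(faces) - INLIER_MEAN))
    err["trimmed"].append(abs(trimmed_mean(faces) - INLIER_MEAN))
    err["sample2"].append(abs(statistics.mean(random.sample(faces, 2)) - INLIER_MEAN))

mae = {k: statistics.mean(v) for k, v in err.items()}
print(mae)
```

The outlier-discounting averager tracks the majority expression more precisely than either the full mean (which the outlier biases) or the two-item sampler (which is both noisy and occasionally captured by the outlier), consistent with the abstract's claim that small-subset sampling does not capture observer performance.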
In everyday life, signals of danger, such as aversive facial expressions, usually appear in the peripheral visual field. Although facial expression processing in central vision has been extensively studied, its processing in peripheral vision remains poorly understood.
Using behavioral measures, we explored the human ability to detect fear and disgust vs. neutral expressions and compared it to the ability to discriminate between genders at eccentricities up to 40°. Responses were faster for the detection of emotion compared to gender. Emotion was detected from fearful faces up to 40° of eccentricity.
Our results demonstrate the human ability to detect facial expressions presented in the far periphery up to 40° of eccentricity. The increasing advantage of emotion compared to gender processing with increasing eccentricity might reflect substantial involvement of the magnocellular visual pathway in facial expression processing. This advantage may suggest that emotion detection, relative to gender identification, is less impacted by visual acuity and within-face crowding in the periphery. These results are consistent with specific and automatic processing of danger-related information, which may drive attention to such signals and allow for a fast behavioral reaction.
Crowding is a fundamental bottleneck in object recognition. In crowding, an object in the periphery becomes unrecognizable when surrounded by clutter or distractor objects. Crowding depends on the positions of target and distractors, both their eccentricity and their relative spacing. In all previous studies, position has been expressed in terms of retinal position. However, in a number of situations retinal and perceived positions can be dissociated. Does retinal or perceived position determine the magnitude of crowding? Here observers performed an orientation judgment on a target Gabor patch surrounded by distractors that drifted toward or away from the target, causing an illusory motion-induced position shift. Distractors in identical physical positions led to worse performance when they drifted towards the target (appearing closer) versus away from the target (appearing further). This difference in crowding corresponded to the difference in perceived positions. Further, the perceptual mislocalization was necessary for the change in crowding, and both the mislocalization and crowding scaled with drift speed. The results show that crowding occurs after perceived positions have been assigned by the visual system. Crowding does not operate in a purely retinal coordinate system; perceived positions need to be taken into account.
Peripheral objects and their features become indistinct when closely surrounding but non-overlapping objects are present. Most models suggest that this phenomenon, called “crowding”, reflects limitations of visual processing, but an intriguing idea is that it may be, in part, adaptive. Specifically, the mechanism generating crowding may simultaneously facilitate ensemble representations of features, leaving meaningful information about clusters of objects. In two experiments, we tested whether visual crowding and the perception of ensemble features share a common mechanism. Observers judged the orientation of a crowded bar, or the ensemble orientation of all bars in the upper and lower visual fields. While crowding was predictably stronger in the upper relative to lower visual field, the ensemble percept did not vary between the visual fields. Featural averaging within the crowded region does not always scale with the resolution limit defined by crowding, suggesting that dissociable processes contribute to visual crowding and ensemble percepts.
Crowding; ensemble perception; mean extraction; visual resolution
Sighted novices rapidly learned echolocation-based size and position discrimination with surprising precision, approaching the performance of a blind expert. We use a novel task to characterize the population distribution of echolocation skill in the sighted and report the highest known human echolocation acuity in our expert subject.
Spatial Perception; Auditory Perception; Blindness; Perception; Echolocation
A visual brain area that is thought to encode higher-level "place" information responds instead to lower-level "edge" information. A corresponding brain area is demonstrated in non-human species.
Defining the exact mechanisms by which the brain processes visual objects and scenes remains an unresolved challenge. Valuable clues to this process have emerged from the demonstration that clusters of neurons (“modules”) in inferior temporal cortex apparently respond selectively to specific categories of visual stimuli, such as places/scenes. However, the higher-order “category-selective” response could also reflect specific lower-level spatial factors. Here we tested this idea in multiple functional MRI experiments, in humans and macaque monkeys, by systematically manipulating the spatial content of geometrical shapes and natural images. These tests revealed that visual spatial discontinuities (as reflected by an increased response to high spatial frequencies) selectively activate a well-known place-selective region of visual cortex (the “parahippocampal place area”) in humans. In macaques, we demonstrate a homologous cortical area, and show that it also responds selectively to higher spatial frequencies. The parahippocampal place area may use such information for detecting object borders and scene details during spatial perception and navigation.
Many reports suggest that different categories of visual stimuli are processed in correspondingly specific “modules” in the visual cortex. For instance, images of faces are processed in one cortical module (the “fusiform face area”), while images of scenes are processed in an adjacent module (the “parahippocampal place area,” or PPA). How does the PPA encode such high-level, complex visual scenes? In this study, we show that at least part of the PPA response is due to a lower-level variable, reflected as higher spatial frequencies. These are prominent in the edges and details of scenes, but less prominent in faces and other stimuli. When we altered standard images of faces and places so that they only contained low, medium, or high spatial frequencies, we found that the PPA responded strongly to images containing high spatial frequencies. Importantly, using the same stimuli as for the human studies, we also demonstrated a homolog of human PPA in macaque temporal cortex (“mPPA”). As in humans, mPPA responds selectively to higher spatial frequencies. This demonstration of PPA in macaques paves the way for further electrophysiological and anatomical studies that may help elucidate the neural mechanisms for place selectivity in the human visual cortex.
Representing object position is one of the most critical functions of the visual system, but this task is not as simple as reading off an object's retinal coordinates. A rich body of literature has demonstrated that the position in which we perceive an object depends not only on retinotopy but also on factors such as attention, eye movements, object and scene motion, and frames of reference, to name a few. Despite the distinction between perceived and retinal position, strikingly little is known about how or where perceived position is represented in the brain. In the present study, we dissociated retinal and perceived object position to test the relative precision of retina-centered versus percept-centered position coding in a number of independently defined visual areas. In an fMRI experiment, subjects performed a five-alternative forced-choice position discrimination task; our analysis focused on the trials in which subjects misperceived the positions of the stimuli. Using a multivariate pattern analysis to track the coupling of the BOLD response with incremental changes in physical and perceived position, we found that activity in higher level areas—middle temporal complex, fusiform face area, parahippocampal place area, lateral occipital cortex, and posterior fusiform gyrus—more precisely reflected the reported positions than the physical positions of the stimuli. In early visual areas, this preferential coding of perceived position was absent or reversed. Our results demonstrate a new kind of spatial topography present in higher level visual areas in which an object's position is encoded according to its perceived rather than retinal location.
Preparing a goal-directed movement often requires detailed analysis of our environment. When we pick up an object, its orientation, size, and relative distance are relevant parameters for preparing a successful grasp. It would therefore be beneficial if the motor system could influence early perception so that the information-processing needs of action control are met at the earliest possible stage. However, only a few studies have reported (indirect) evidence for action-induced improvements in visual perception. We therefore aimed to provide direct evidence for a feature-specific perceptual modulation during the planning phase of a grasping action. Human subjects were instructed either to grasp or to point to a bar while simultaneously performing an orientation discrimination task. The bar could slightly change its orientation during grasp preparation. By analyzing discrimination response probabilities, we found increased perceptual sensitivity to orientation changes when subjects were instructed to grasp the bar rather than point to it. As a control, the same experiment was repeated using bar luminance changes, a feature that is relevant for neither grasping nor pointing. Here, no differences in visual sensitivity between grasping and pointing were found. The present results constitute the first direct evidence for increased perceptual sensitivity to a visual feature that is relevant for a particular skeletomotor act during the movement preparation phase. We speculate that such action-induced improvements in perception are controlled by neuronal feedback from cortical motor-planning areas to early visual cortex, similar to what has recently been established for spatial perception improvements shortly before eye movements.
Conscious awareness of objects in the visual periphery is limited. This limit is not entirely the result of reduced visual acuity, but is primarily caused by crowding—the inability to identify an object when surrounded by clutter. Crowding represents a fundamental limitation of the visual system, and has to date been unexplored in infants. Do infants have a fine-grained “spotlight”, similar to adults, or a diffuse “lantern” that sets limits on what they can register in the periphery? An eye-tracking paradigm was designed to psychophysically measure crowding in 6- to 15-month-olds by showing pairs of faces at three eccentricities, in the presence or absence of flankers, and recording infants’ first saccade from central fixation to either face. Results reveal that infants can discriminate faces in the periphery, and flankers impair this ability as close as 3 degrees; the effective spatial resolution of visual perception increased with age but was only half that of adults.
inversion; crowding; attention; peripheral vision; Mooney face
Several groups have recently reported that people with autism may suffer from a deficit in visual motion processing and proposed that these deficits may be related to a general dorsal stream dysfunction. In order to test the dorsal stream deficit hypothesis, we investigated coherent and biological motion perception as well as coherent form perception in a group of adolescents with autism and a group of age-matched typically developing controls. If the dorsal stream hypothesis were true, we would expect to document deficits in both coherent and biological motion processing in this group but find no deficit in coherent form perception. Using the method of constant stimuli and standard psychophysical analysis techniques, we measured thresholds for coherent motion, biological motion and coherent form. We found that adolescents with autism showed reduced sensitivity to both coherent and biological motion but performed as well as age-matched controls during coherent form perception. Correlations between intelligence quotient and task performance, however, appear to drive much of the group difference in coherent motion perception. Differences between groups on coherent motion perception did not remain significant when intelligence quotient was controlled for, but group differences in biological motion perception were more robust, remaining significant even when intelligence quotient differences were accounted for. Additionally, aspects of task performance on the biological motion perception task were related to autism symptomatology. These results do not support a general dorsal stream dysfunction in adolescents with autism but provide evidence of a more complex impairment in higher-level dynamic attentional processes.
autism; visual motion; biological motion; coherent motion; dorsal stream
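The threshold measurements described above rest on the method of constant stimuli: fixed stimulus levels, many trials each, and a psychometric function fit to the proportion of correct responses. The sketch below illustrates the general approach with synthetic data; the coherence levels, trial counts, cumulative-Gaussian form, and grid-search fit are illustrative assumptions, not the study's analysis.

```python
import math
import random

random.seed(3)

def p_correct(coherence, threshold, slope, guess=0.5):
    """Cumulative-Gaussian psychometric function with a guessing floor."""
    p = 0.5 * (1 + math.erf((coherence - threshold) / (slope * math.sqrt(2))))
    return guess + (1 - guess) * p

# Simulate a session: fixed coherence levels, N Bernoulli trials per level,
# responses generated from a known 'true' observer.
levels = [0.05, 0.10, 0.20, 0.40, 0.80]
TRUE_T, TRUE_S, N = 0.20, 0.10, 200
data = {c: sum(random.random() < p_correct(c, TRUE_T, TRUE_S) for _ in range(N))
        for c in levels}

def neg_log_lik(t, s):
    """Binomial negative log-likelihood of the data under (threshold, slope)."""
    nll = 0.0
    for c, k in data.items():
        p = min(max(p_correct(c, t, s), 1e-6), 1 - 1e-6)
        nll -= k * math.log(p) + (N - k) * math.log(1 - p)
    return nll

# Coarse grid-search maximum-likelihood fit
best = min(((t, s) for t in [i / 100 for i in range(5, 60)]
            for s in [i / 100 for i in range(2, 40)]),
           key=lambda ts: neg_log_lik(*ts))
print(best)
```

The fitted (threshold, slope) pair lands close to the generating values, showing how a single sensitivity number per condition is recovered from trial-by-trial responses.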
We tested whether the intervening time between multiple glances influences the independence of the resulting visual percepts. Observers estimated how many dots were present in brief displays that repeated one, two, three, four, or a random number of trials later. Estimates made farther apart in time were more independent, and thus carried more information about the stimulus when combined. In addition, estimates from different visual field locations were more independent than estimates from the same location. Our results reveal a retinotopic serial dependence in visual numerosity estimates, which may be a mechanism for maintaining the continuity of visual perception in a noisy environment.
fMRI; vision; perception; retinotopy; receptive field; gain; V2; V3; V4
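The lag-dependent independence described above can be illustrated with a toy serial-dependence simulation. This is not the authors' analysis: the pull strength, its decay over lags, and the error-regression measure are all hypothetical. Estimates are generated with an attraction toward recent stimuli that fades with lag, and the analysis then recovers that fading dependence.

```python
import random
import statistics

random.seed(5)
N, PULL, DECAY = 5000, 0.25, 0.5
stims = [random.randint(30, 70) for _ in range(N)]  # hypothetical dot numerosities

# Each estimate is pulled toward recent stimuli, with pull fading over lags
ests = []
for t, s in enumerate(stims):
    est = s + random.gauss(0, 3)
    for lag in (1, 2, 3, 4):
        if t - lag >= 0:
            est += PULL * DECAY ** (lag - 1) * (stims[t - lag] - s)
    ests.append(est)

def dependence(lag):
    """Slope of estimation error vs. (past - current) stimulus at a given lag."""
    xs = [stims[t - lag] - stims[t] for t in range(lag, N)]
    ys = [ests[t] - stims[t] for t in range(lag, N)]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

slopes = [dependence(lag) for lag in (1, 2, 3, 4)]
print(slopes)
```

The recovered slope shrinks as lag grows, the signature reported in the abstract: estimates made farther apart in time are more independent.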
Relatively few studies have been reported that document how proprioception varies across the workspace of the human arm. Here we examined proprioceptive function across a horizontal planar workspace, using a new method that avoids active movement and interactions with other sensory modalities. We systematically mapped both proprioceptive acuity (sensitivity to hand position change) and bias (perceived location of the hand), across a horizontal-plane 2D workspace. Proprioception of both the left and right arms was tested at nine workspace locations and in two orthogonal directions (left-right and forward-backward). Subjects made repeated judgments about the position of their hand with respect to a remembered proprioceptive reference position, while grasping the handle of a robotic linkage that passively moved their hand to each judgment location. To rule out the possibility that the memory component of the proprioceptive testing procedure may have influenced our results, we repeated the procedure in a second experiment using a persistent visual reference position. Both methods resulted in qualitatively similar findings. Proprioception is not uniform across the workspace. Acuity was greater for limb configurations in which the hand was closer to the body, and was greater in a forward-backward direction than in a left-right direction. A robust difference in proprioceptive bias was observed across both experiments. At all workspace locations, the left hand was perceived to be to the left of its actual position, and the right hand was perceived to be to the right of its actual position. Finally, bias was smaller for hand positions closer to the body. The results of this study provide a systematic map of proprioceptive acuity and bias across the workspace of the limb that may be used to augment computational models of sensory-motor control, and to inform clinical assessment of sensory function in patients with sensory-motor deficits.
An object or feature is generally more difficult to identify when other objects are presented nearby, an effect referred to as crowding. Here, we used Mooney faces to examine whether crowding can also occur within and between holistic face representations (C. M. Mooney, 1957). Mooney faces are ideal stimuli for this test because no cues exist to distinguish facial features in a Mooney face; to find any facial feature, such as an eye or a nose, one must first holistically perceive the image as a face. Through a series of six experiments we tested the effect of crowding on Mooney face recognition. Our results demonstrate crowding between and within Mooney faces and fulfill the diagnostic criteria for crowding, including eccentricity dependence and lack of crowding in the fovea, critical flanker spacing consistent with less than half the eccentricity of the target, and inner-outer flanker asymmetry. Further, our results show that recognition of an upright Mooney face is more strongly impaired by upright Mooney face flankers than inverted ones. Taken together, these results suggest crowding can occur selectively between high-level representations of faces and that crowding must occur at multiple levels in the visual system.
peripheral vision; spatial vision; object recognition; inversion; asymmetry
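The half-eccentricity criterion invoked above is Bouma's classic rule of thumb for crowding. As a quick reference, the rule reduces to one line of arithmetic (the 0.5 fraction is the conventional approximation; measured values vary with stimulus and visual-field location):

```python
def critical_spacing_deg(eccentricity_deg, bouma_fraction=0.5):
    """Bouma's rule of thumb: the crowding zone extends roughly
    bouma_fraction times the target's eccentricity."""
    return bouma_fraction * eccentricity_deg

# Flankers closer than these spacings are expected to crowd the target
for ecc in (2, 5, 10, 20):
    print(ecc, critical_spacing_deg(ecc))
```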