How efficiently do we combine information across facial features when recognizing a face? Previous studies have suggested that the perception of a face is not simply the result of an independent analysis of individual facial features, but instead involves a coding of the relationships amongst features. This additional coding of the relationships amongst features is thought to enhance our ability to recognize a face. In our experiments, we tested whether an observer’s ability to recognize a face is in fact better than what one would expect from their ability to recognize the individual facial features in isolation. We tested this by using a psychophysical summation-at-threshold technique that has been used extensively to measure how efficiently observers integrate information across spatial locations and spatial frequencies. Surprisingly, we found that observers integrated information across facial features less efficiently than would be predicted by their ability to recognize the individual parts.
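The optimal benchmark behind this comparison can be illustrated with a short sketch: for an ideal integrator of statistically independent features, squared sensitivities (d′) add. The Python below is a minimal illustration with hypothetical d′ values, not data from the study:

```python
import math

def optimal_dprime(feature_dprimes):
    """Optimal combination of independent features:
    squared sensitivities (d') add."""
    return math.sqrt(sum(d ** 2 for d in feature_dprimes))

def integration_efficiency(observed_dprime, feature_dprimes):
    """Ratio of observed whole-face sensitivity to the optimal
    prediction; 1.0 = optimal integration, < 1.0 = suboptimal."""
    return observed_dprime / optimal_dprime(feature_dprimes)

# Hypothetical d' values for three facial features shown in isolation
parts = [1.0, 0.8, 0.6]
print(optimal_dprime(parts))               # ~1.414
print(integration_efficiency(1.0, parts))  # < 1: worse than optimal
```

An observer whose whole-face d′ falls below `optimal_dprime(parts)` is, in this sense, integrating less efficiently than predicted from the parts.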
Face Recognition; Ideal Observer; Information Integration
The central region of the human retina, the fovea, provides high-acuity
vision. The oculomotor system continually brings targets of interest into the
fovea via ballistic eye movements (saccades). The fovea thus serves both as the
locus for fixations and as the oculomotor reference for saccades. This highly
automated process of foveation is functionally critical to vision and is
observed from infancy [1, 2]. How would the oculomotor system
adjust to loss of foveal vision (central scotoma)? Clinical observations of
patients with central vision loss [3, 4] suggest a
lengthy adjustment period, but the nature and dynamics of this adjustment remain
unclear. Here we demonstrate that the oculomotor system can spontaneously and
rapidly adopt a peripheral locus for fixation and can re-reference saccades to
this locus, in normally sighted individuals whose central vision is blocked by
an artificial scotoma. Once developed, the fixation locus is retained over weeks
in the absence of the simulated scotoma. Our data reveal a basic guiding
principle of the oculomotor system that prefers control simplicity over
optimality. We demonstrate the importance of a visible scotoma for the speed of
the adjustment and suggest a possible rehabilitation regimen for patients with
central vision loss.
Crowding impairs the perception of form in peripheral vision. It is likely to be a key limiting factor of form vision in patients without central vision. Crowding has been extensively studied in normally sighted individuals, typically with a stimulus duration of a few hundred milliseconds to avoid eye movements. These restricted testing conditions do not reflect the natural behavior of a patient with central field loss. Could unlimited stimulus duration and unrestricted eye movements change the properties of crowding in any fundamental way? We studied letter identification in the peripheral vision of normally sighted observers in three conditions: (i) a fixation condition with a brief stimulus presentation of 250 ms, (ii) another fixation condition but with an unlimited viewing time, and (iii) an unrestricted eye movement condition with an artificial central scotoma and an unlimited viewing time. In all conditions, contrast thresholds were measured as a function of target-to-flanker spacing, from which we estimated the spatial extent of crowding in terms of critical spacing. We found that presentation duration beyond 250 ms had little effect on critical spacing with stable gaze. With unrestricted eye movements and a simulated central scotoma, we found a large variability in critical spacing across observers, but more importantly, the variability in critical spacing was well correlated with the variability in target eccentricity. Our results confirm that the large body of findings on crowding made with briefly presented stimuli remains relevant to conditions where viewing time is unconstrained. Our results further suggest that impaired oculomotor control associated with central vision loss can confound peripheral form vision beyond the limits imposed by crowding.
crowding; form vision; peripheral vision; eye movements
There is a need for adaptive technology to enhance indoor wayfinding by visually impaired people. To address this need, we have developed and tested a Digital Sign System. The hardware and software consist of digitally encoded signs widely distributed throughout a building, a handheld sign-reader based on an infrared camera, image-processing software, and a talking digital map running on a mobile device. Four groups of subjects—blind, low vision, blindfolded sighted, and normally sighted controls—were evaluated on three navigation tasks. The results demonstrate that the technology can be used reliably in retrieving information from the signs during active mobility, in finding nearby points of interest, and in following routes in a building from a starting location to a destination. The visually impaired subjects accurately and independently completed the navigation tasks, but took substantially longer than normally sighted controls. This fully functional prototype system demonstrates the feasibility of technology enabling independent indoor navigation by people with visual impairment.
Age-related macular degeneration (AMD) is the leading cause of vision loss among Americans over the age of 65. Currently, no effective treatment can reverse the central vision loss associated with most AMD. Digital image-processing techniques have been developed to improve image visibility for peripheral vision; however, both the selection and efficacy of such methods are limited. Progress has been difficult for two reasons: the exact nature of image enhancement that might benefit peripheral vision is not well understood, and efficient methods for testing such techniques have been elusive. The current study aims to develop both an effective image-enhancement technique for peripheral vision and an efficient means for validating the technique.
We used a novel contour detection algorithm to locate shape-defining edges in images based on natural-image statistics. We then enhanced the scene by locally boosting the luminance contrast along such contours. Using a gaze-contingent display, we simulated central visual field loss in normally sighted young adults (ages 18–30) and older adults (ages 58–88). Visual search performance was measured as a function of contour enhancement strength ("Original" (unenhanced), "Medium", and "High"). For the preference task, a separate group of subjects judged which image in a pair "would lead to better search performance".
We found that while contour enhancement had no significant effect on search time and accuracy in young adults, Medium enhancement resulted in significantly shorter search times in older adults (~13% reduction relative to Original). Both age groups preferred images with Medium enhancement over Original by a factor of 2 to 7. Furthermore, across age groups, image content types, and enhancement strengths, there was a robust correlation between preference and performance.
Our findings demonstrate a beneficial role of contour enhancement in peripheral vision for older adults. Our findings further suggest that task-specific preference judgments can be an efficient surrogate for performance testing.
low vision questionnaire; rehabilitation; functional ability; Veterans Affairs Low-Vision Visual Functioning Questionnaire; Impact of Vision Impairment Questionnaire
The middle temporal area of the extrastriate visual cortex (area MT) is integral to motion perception and is thought to play a key role in the perceptual learning of motion tasks. We have previously found, however, that perceptual learning of a motion discrimination task is possible even when the training stimulus contains locally balanced, motion opponent signals that putatively suppress the response of MT. Assuming at least partial suppression of MT, possible explanations for this learning are that 1) training made MT more responsive by reducing motion opponency, 2) MT remained suppressed and alternative visual areas such as V1 enabled learning and/or 3) suppression of MT increased with training, possibly to reduce noise. Here we used fMRI to test these possibilities. We first confirmed that the motion opponent stimulus did indeed suppress the BOLD response within hMT+ compared to an almost identical stimulus without locally balanced motion signals. We then trained participants on motion opponent or non-opponent stimuli. Training with the motion opponent stimulus reduced the BOLD response within hMT+ and greater reductions in BOLD response were correlated with greater amounts of learning. The opposite relationship between BOLD and behaviour was found at V1 for the group trained on the motion-opponent stimulus and at both V1 and hMT+ for the group trained on the non-opponent motion stimulus. As the average response of many cells within MT to motion opponent stimuli is the same as their response to non-directional flickering noise, the reduced activation of hMT+ after training may reflect noise reduction.
When you see a person’s face, how do you go about combining his or her facial features to make a decision about who that person is? Most current theories of face perception assert that the ability to recognize a human face is not simply the result of an independent analysis of individual features, but instead involves a holistic coding of the relationships among features. This coding is thought to enhance people’s ability to recognize a face beyond what would be expected if each feature were shown in isolation. In the study reported here, we explicitly tested this idea by comparing human performance on facial-feature integration with that of an optimal Bayesian integrator. Contrary to the predictions of most current notions of face perception, our findings showed that human observers integrate facial features in a manner that is no better than would be predicted by their ability to use each individual feature when shown in isolation. That is, a face is perceived no better than the sum of its individual parts.
visual perception; face perception; perception; vision
Processing of shape information in human peripheral visual fields is impeded beyond what can be expected from poorer spatial resolution. Visual crowding—the inability to identify objects in clutter—has been shown to be the primary factor limiting shape perception in peripheral vision. Despite the well-documented effects of crowding, its underlying causes are poorly understood. Since spatial attention both facilitates learning of image statistics and directs saccadic eye movements, we propose that the acquisition of image statistics in peripheral visual fields is confounded by eye-movement artifacts. Specifically, the image statistics acquired under a peripherally deployed spotlight of attention are systematically biased by saccade-induced image displacements. These erroneously represented image statistics lead to inappropriate contextual interactions in the periphery and cause crowding.
On the basis of results from behavioral studies that spatial attention improves the exclusion of external noise in the target region, we predicted that attending to a spatial region would reduce the impact of external noise on the BOLD response in corresponding cortical areas, seen as reduced BOLD responses in conditions with large amounts of external noise but relatively low signal, and increased dynamic range of the BOLD response to variations in signal contrast. We found that, in the presence of external noise, covert attention reduced the trial-by-trial BOLD response by 15.5–18.9% in low signal contrast conditions in V1. It also increased the BOLD dynamic range in V1, V2, V3, V3A/B, and V4 by a factor of at least three. Overall, covert attention reduced the impact of external noise by about 73–85% in these early visual areas. It also increased the contrast gain by a factor of 2.6–3.8.
Crowding occurs when stimuli in the peripheral fields become harder to identify when flanked by other items. This phenomenon has been demonstrated extensively with simple patterns (e.g., Gabors and letters). Here, we characterize crowding for everyday objects. We presented three-item arrays of objects and letters, arranged radially and tangentially in the lower visual field. Observers identified the central target, and we measured contrast energy thresholds as a function of target-to-flanker spacing. Object crowding was similar to letter crowding in spatial extent but was much weaker. The average elevation in threshold contrast energy was on the order of 1 log unit for objects, as compared to 2 log units for letters and silhouette objects. Furthermore, we examined whether the exterior and interior features of an object are differentially affected by crowding. We used a circular aperture to present or exclude the object interior. Critical spacings for these aperture and “donut” objects were similar to those of intact objects. Taken together, these findings suggest that the crowding of letters and of objects is essentially due to the same mechanism, which affects the interior and exterior features of an object equally. However, for objects defined with varying shades of gray, it is much easier to overcome crowding by increasing contrast.
spatial vision; object recognition; detection/discrimination
The inter-subject variability of visual cortex reorganization was assessed in late-blind subjects suffering from retinitis pigmentosa (RP), a degenerative retinal disease that results in tunnel vision and eventual loss of sight. fMRI BOLD responses were measured as blindfolded RP and blindfolded sighted control groups completed a tactile discrimination task (in which subjects determined the relative roughness of sandpaper discs) during successive scans in a 3T Siemens scanner. Resulting activation patterns were compared between the two groups in a whole-brain analysis. We found that vision deprivation leads to elevated activation of the visual cortex elicited with tactile stimuli, and the degree of activation correlates with the degree of visual field loss: higher visual cortex activation is associated with greater vision loss. The location of vision loss in the visual field also correlates with the location of tactile responses in the visual cortex, with greater peripheral vision loss leading to stronger activation in the peripheral representation of V1. Visual cortex responses to tactile stimuli may hence be used as a diagnostic marker in determining the extent of an individual’s vision loss and in tracking sight recovery following treatments.
Crowding is a prominent phenomenon in peripheral vision where nearby objects impede one’s ability to identify a target of interest. The precise mechanism of crowding is not known. We used ideal observer analysis and a noise-masking paradigm to identify the functional mechanism of crowding. We tested letter identification in the periphery with and without flanking letters and found that crowding increases equivalent input noise and decreases sampling efficiency. Crowding effectively causes the signal from the target to be noisier and at the same time reduces the visual system’s ability to make use of a noisy signal. After practicing identification of flanked letters without noise in the periphery for 6 days, subjects’ performance for identifying flanked letters improved (reduction of crowding). Across subjects, the improvement was attributable to either a decrease in crowding-induced equivalent input noise or an increase in sampling efficiency, but seldom both. This pattern of results is consistent with a simple model whereby learning reduces crowding by adjusting the spatial extent of a perceptual window used to gather relevant input features. Following learning, subjects with inappropriately large windows reduced their window sizes, while subjects with inappropriately small windows increased theirs. The improvement in equivalent input noise and sampling efficiency persists for at least 6 months.
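The decomposition into equivalent input noise and sampling efficiency typically comes from fitting threshold-vs-noise data with the linear amplifier model, in which squared contrast threshold grows linearly with external noise power. The sketch below, with synthetic data and illustrative variable names (not values from the study), shows that logic:

```python
import numpy as np

def fit_lam(noise_power, thresholds):
    """Linear amplifier model: threshold^2 = (N_ext + N_eq) / J.
    A straight-line fit of squared threshold vs. external noise power
    gives slope 1/J and intercept N_eq/J."""
    t2 = np.asarray(thresholds, dtype=float) ** 2
    slope, intercept = np.polyfit(np.asarray(noise_power, dtype=float), t2, 1)
    n_eq = intercept / slope       # equivalent input noise
    efficiency = 1.0 / slope       # sampling efficiency (relative units)
    return n_eq, efficiency

# Synthetic observer with N_eq = 0.5 and J = 2 (arbitrary units)
noise = np.array([0.0, 1.0, 2.0, 4.0])
thresholds = np.sqrt((noise + 0.5) / 2.0)
print(fit_lam(noise, thresholds))  # recovers (0.5, 2.0)
```

In this framework, crowding that raises the intercept reflects added equivalent input noise, while crowding that steepens the slope reflects reduced sampling efficiency.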
peripheral vision; crowding; perceptual learning; ideal observer analysis
In this study, we examined the effects of contrast and spatial frequency on reading speed and compared these effects between the normal fovea and periphery. We found that when text contrast was low, reading speed demonstrated spatial-frequency tuning properties, with a peak tuning frequency that partially scaled with print size. The spatial-frequency tuning disappeared when text contrast was 100%. The spatial-frequency tuning and scaling properties for reading were largely similar between the fovea and the periphery, and closely matched those for letter identification. Just as for the task of letter identification, we showed through an ideal-observer analysis that the spatial-frequency properties for reading could be primarily accounted for by the physical properties of the word stimuli combined with human observers’ contrast sensitivity functions.
reading; letter identification; spatial frequency; contrast; ideal observer
Classification image and other similar noise-driven linear methods have found increasingly wider applications in revealing psychophysical receptive field structures or perceptual templates. These techniques are relatively easy to deploy, and the results are simple to interpret. However, being a linear technique, the utility of the classification-image method is believed to be limited. Uncertainty about the target stimuli on the part of an observer will result in a classification image that is the superposition of all possible templates for all the possible signals. In the context of a well-established uncertainty model, which pools the outputs of a large set of linear frontends with a max operator, we show analytically, in simulations, and with human experiments that the effect of intrinsic uncertainty can be limited or even eliminated by presenting a signal at a relatively high contrast in a classification-image experiment. We further argue that the subimages from different stimulus-response categories should not be combined, as is conventionally done. We show that when the signal contrast is high, the subimages from the error trials contain a clear high-contrast image that is negatively correlated with the perceptual template associated with the presented signal, relatively unaffected by uncertainty. The subimages also contain a “haze” that is of a much lower contrast and is positively correlated with the superposition of all the templates associated with the erroneous response. In the case of spatial uncertainty, we show that the spatial extent of the uncertainty can be estimated from the classification subimages. We link intrinsic uncertainty to invariance and suggest that this signal-clamped classification-image method will find general applications in uncovering the underlying representations of high-level neural and psychophysical mechanisms.
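The per-category bookkeeping that this argument relies on can be sketched as follows; the array shapes and category labels are illustrative only:

```python
import numpy as np

def classification_subimages(noise_fields, stimulus, response):
    """Average the noise fields separately within each
    stimulus-response category, rather than combining all
    categories into a single classification image."""
    noise_fields = np.asarray(noise_fields, dtype=float)
    stimulus = np.asarray(stimulus)
    response = np.asarray(response)
    subimages = {}
    for s in np.unique(stimulus):
        for r in np.unique(response):
            mask = (stimulus == s) & (response == r)
            if mask.any():
                subimages[(s, r)] = noise_fields[mask].mean(axis=0)
    return subimages

# Toy example: 4 trials of 2x2 noise fields, two signals 'A' and 'B'
fields = np.arange(16).reshape(4, 2, 2)
subs = classification_subimages(fields,
                                ['A', 'A', 'B', 'B'],
                                ['A', 'B', 'A', 'B'])
```

Keeping the (signal, response) categories separate is what allows the error-trial subimages to be inspected on their own, as the abstract describes.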
classification image; reverse correlation; spatial uncertainty; invariance; nonlinearity
Performance for a variety of visual tasks improves with practice. The purpose of this study was to determine the nature of the processes underlying perceptual learning of identifying letters in peripheral vision. To do so, we tracked changes in contrast thresholds for identifying single letters presented at 10° in the inferior visual field, over a period of six consecutive days. The letters (26 lowercase Times-Roman letters, subtending 1.7°) were embedded within static two-dimensional Gaussian luminance noise, with rms contrast ranging from 0% (no noise) to 20%. We also measured the observers’ response consistency using a double-pass method on days 1, 3 and 6, by testing two additional blocks on each of these days at luminance noise of 3% and 20%. These additional blocks were the exact replicates of the corresponding block at the same noise contrast that was tested on the same day. We analyzed our results using both the linear amplifier model (LAM) and the perceptual template model (PTM). Our results showed that following six days of training, the overall reduction (improvement across all noise levels) in contrast threshold for our seven observers averaged 21.6% (range: 17.2–31%). Despite fundamental differences between LAM and PTM, both models show that learning leads to an improvement of the perceptual template (filter) such that the template is more capable of extracting the crucial information from the signal. Results from both the PTM analysis and the double-pass experiment imply that the stimulus-dependent component of the internal noise does not change with learning.
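The double-pass consistency measure used here is simply the proportion of trials on which the observer gives the same response in the two exact-replicate blocks; a minimal sketch:

```python
def double_pass_consistency(pass1, pass2):
    """Proportion of trials with identical responses across the
    two exact-replicate blocks of a double-pass experiment."""
    if len(pass1) != len(pass2):
        raise ValueError("the two passes must contain the same trials")
    agree = sum(a == b for a, b in zip(pass1, pass2))
    return agree / len(pass1)

# Hypothetical responses to five repeated trials
print(double_pass_consistency("abcda", "abcab"))  # 0.6
```

Higher consistency at fixed accuracy indicates a larger share of performance-limiting noise being stimulus-dependent rather than internal and random.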
Perceptual learning; Training; Letter identification; Peripheral vision
Crowding refers to the increased difficulty in identifying a letter flanked by other letters. The purpose of this study was to determine if the peak sensitivity of the human visual system shifts to a different spatial frequency when identifying crowded letters, compared with single letters. We measured contrast thresholds for identifying the middle target letters in trigrams, for a range of spatial frequencies, letter separations and letter sizes, at the fovea and 5° eccentricity. Plots of contrast sensitivity vs. letter frequency exhibit spatial tuning, for all letter sizes and letter separations tested. The peak tuning frequency grows as the 0.6–0.7 power of the letter size, independent of letter separation. At the smallest letter separation, peak tuning frequency occurs at a frequency that is 0.17 octaves higher for flanked than for unflanked letters at the fovea, and 0.19 octaves at 5° eccentricity. This finding suggests that the human visual system shifts its sensitivity toward a higher spatial-frequency channel when identifying letters in the presence of nearby letters. However, the size of the shift is insufficient to account for the large effect of crowding in the periphery.
Crowding; Letter identification; Spatial frequency channel; Spatial scale shift
The way in which input noise perturbs the behavior of a system depends on the internal processing structure of the system. In visual psychophysics, there is a long tradition of using external noise methods (i.e., adding noise to visual stimuli) as tools for system identification. Here, we demonstrate that external noise affects processing of visual scenes at different cortical areas along the human ventral visual pathway, from retinotopic regions to higher occipitotemporal areas implicated in visual shape processing. We found that when the contrast of the stimulus was held constant, the further away from the retinal input a cortical area was the more its activity, as measured with functional magnetic resonance imaging (fMRI), depended on the signal-to-noise ratio (SNR) of the visual stimulus. A similar pattern of results was observed when trials with correct and incorrect responses were analyzed separately. We interpret these findings by extending signal detection theory to fMRI data analysis. This approach reveals the sequential ordering of decision stages in the cortex by exploiting the relation between fMRI response and stimulus SNR. In particular, our findings provide novel evidence that occipitotemporal areas in the ventral visual pathway form a cascade of decision stages with increasing degree of signal uncertainty and feature invariance.
What may be special about faces, compared to non-face objects, is that their neural representation may be fundamentally spatial, e.g., Gabor-like. Subjects matched a sequence of two filtered images of faces or non-face 3D blobs, each image containing every other combination of spatial frequency and orientation, and judged whether the person or blob was the same or different. On a match trial, the images were either identical or complementary (containing the remaining spatial frequency and orientation content). Relative to an identical pair of images, a complementary pair of faces, but not blobs, reduced matching accuracy and released fMRI adaptation in the fusiform face area.
Face recognition; Fusiform face area; fMRI adaptation; Face vs. object recognition; Spatial representation; Face representation
Visual crowding refers to the marked inability to identify an otherwise perfectly identifiable object when it is flanked by other objects. Crowding places a significant limit on form vision in the visual periphery; its mechanism is, however, unknown. Building on the method of signal-clamped classification images (Tjan & Nandy, 2006), we developed a series of first- and second-order classification-image techniques to investigate the nature of crowding without presupposing any model of crowding. Using an “o” versus “x” letter-identification task, we found that (1) crowding significantly reduced the contrast of first-order classification images, although it did not alter the shape of the classification images; (2) response errors during crowding were strongly correlated with the spatial structures of the flankers that resembled those of the erroneously perceived targets; (3) crowding had no systematic effect on intrinsic spatial uncertainty of an observer nor did it suppress feature detection; and (4) analysis of the second-order classification images revealed that crowding reduced the amount of valid features used by the visual system and, at the same time, increased the amount of invalid features used. Our findings strongly support the feature-mislocalization or source-confusion hypothesis as one of the proximal contributors of crowding. Our data also agree with the inappropriate feature-integration account with the requirement that feature integration be a competitive process. However, the feature-masking account and a front-end version of the spatial attention account of crowding are not supported by our data.
crowding; letter identification; peripheral vision; classification images
Objects in natural scenes are spatially broadband; in contrast, feature detectors in the early stages of visual processing are narrowly tuned in spatial frequency. Earlier studies of feature integration using gratings suggested that integration across spatial frequencies is suboptimal. Here we re-examined this conclusion using a letter identification task at the fovea and at 10 deg in the lower visual field. We found that integration across narrow-band (1-octave) spatial frequency components of letter stimuli is optimal in the fovea. Surprisingly, this optimality is preserved in the periphery, even though feature integration is known to be deficient in the periphery from studies of other form-vision tasks such as crowding. A model that is otherwise a white-noise ideal observer except for a limited spatial resolution defined by the human contrast sensitivity function and using internal templates slightly wider in bandwidth than the stimuli is able to account for the human data. Our findings suggest that deficiency in feature integration found in peripheral vision is not across spatial frequencies.
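The optimal-summation benchmark used in this comparison predicts the composite threshold from the component thresholds by adding squared sensitivities (1/c²) across independent bands. A minimal sketch with hypothetical contrast thresholds, not values from the study:

```python
import math

def optimal_composite_threshold(component_thresholds):
    """Optimal summation across independent narrow-band components:
    squared sensitivities (1/c^2) add, so the composite contrast
    threshold is lower than any single component's threshold."""
    s2 = sum(1.0 / c ** 2 for c in component_thresholds)
    return 1.0 / math.sqrt(s2)

# Two 1-octave bands, each alone at a 20% contrast threshold
print(optimal_composite_threshold([0.2, 0.2]))  # ~0.141
```

Human thresholds for the broadband letter that match this prediction indicate optimal integration across spatial-frequency bands; thresholds above it indicate suboptimal integration.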
spatial frequency channels; summation; letter identification; fovea; periphery