Contrast sensitivity defines the threshold between the visible and invisible, which has obvious significance for basic and clinical vision science. Fechner's 1860 review reported that threshold contrast is 1% for a remarkably wide range of targets and conditions. While printed charts are still in use, computer testing is becoming more popular because it offers efficient adaptive measurement of threshold for a wide range of stimuli. Both basic and clinical studies usually want to know fundamental visual capability, regardless of the observer's subjective criterion. Criterion effects are minimized by the use of an objective task: multiple-alternative forced-choice detection or identification. Having many alternatives reduces the guessing rate, which makes each trial more informative, so fewer trials are needed. Finally, populations who may experience crowding or target confusion should be tested with one target at a time.
Immediately before a large eye movement, a target object is crowded by clutter placed near the target’s future location. This new finding, from a recent study, shows that the brain’s remapping for the anticipated eye movement unavoidably combines features from the target’s current and future retinal locations into one perceptual object.
Here, we systematically explore the size and spacing requirements for identifying a letter among other letters. We measure acuity for flanked and unflanked letters, centrally and peripherally, in normals and amblyopes. We find that acuity, overlap masking, and crowding each demand a minimum size or spacing for readable text. Just measuring flanked and unflanked acuity is enough for our proposed model to predict the observer's threshold size and spacing for letters at any eccentricity.
We also find that amblyopia in adults retains the character of the childhood condition that caused it. Amblyopia is a developmental neural deficit that can occur as a result of either strabismus or anisometropia in childhood. Peripheral viewing during childhood due to strabismus results in amblyopia that is crowding limited, like peripheral vision. Optical blur of one eye during childhood due to anisometropia without strabismus results in amblyopia that is acuity limited, like blurred vision. Furthermore, we find that the spacing:acuity ratio (threshold spacing of flanked letters divided by unflanked acuity) can distinguish strabismic amblyopia from purely anisometropic amblyopia in nearly perfect agreement with lack of stereopsis. A scatter diagram of threshold spacing versus acuity, one point per patient, for several diagnostic groups, reveals the diagnostic power of flanked acuity testing. These results and two demonstrations indicate that the sensitivity of visual screening tests can be improved by using flankers that are more tightly spaced and letter-like.
Finally, in concert with Strappini, Pelli, Di Pace, and Martelli (submitted), we jointly report a double dissociation between acuity and crowding. Two clinical conditions—anisometropic amblyopia and apperceptive agnosia—each selectively impair either acuity A or the spacing:acuity ratio S/A, never both. A direct estimate of crowding confirms the dissociation: each condition impairs either acuity or crowding, not both. Models of human object recognition will need to accommodate this newly discovered independence of acuity and crowding.
amblyopia; crowding; strabismic; anisometropic; acuity; screening; spacing:acuity ratio; critical spacing; threshold spacing; legibility; overlap masking; letter identification; object recognition
Amblyopia is a much-studied but poorly understood developmental visual disorder that reduces acuity, profoundly reducing contrast sensitivity for small targets. Here we use visual noise to probe the letter identification process and characterize its impairment by amblyopia. We apply five levels of analysis — threshold, threshold in noise, equivalent noise, optical MTF, and noise modeling — to obtain a two-factor model of the amblyopic deficit: substantially reduced efficiency for small letters and negligibly increased cortical noise. Cortical noise, expressed as an equivalent input noise, varies among amblyopes but is roughly 1.4× normal, as though only 1/1.4 the normal number of cortical spikes are devoted to the amblyopic eye. This raises threshold contrast for large letters by a factor of √1.4 ≈ 1.2×, a negligible effect. All 16 amblyopic observers showed near-normal efficiency for large letters (> 4× acuity) and greatly reduced efficiency for small letters: 1/4 normal at 2× acuity and approaching 1/16 normal at acuity. Finding that the acuity loss represents a loss of efficiency rules out all models of amblyopia except those that predict the same sensitivity loss on blank and noisy backgrounds. One such model is the last-channel hypothesis, which supposes that the highest-spatial-frequency channels are missing, leaving the remaining highest-frequency channel struggling to identify the smallest letters. However, this hypothesis is rejected by critical band masking of letter identification, which shows that the channels used by the amblyopic eye have normal tuning for even the smallest letters. Finally, based on these results, we introduce a new “Dual Acuity” chart that promises to be a quick diagnostic test for amblyopia.
amblyopia; noise; efficiency; cortical noise; Pelli-Levi Dual Acuity Chart
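The equivalent-noise arithmetic above can be sketched numerically. This is a minimal illustration, assuming the standard linear-amplifier form of the equivalent-input-noise model (threshold energy proportional to total noise divided by efficiency); apart from the 1.4× noise factor, the numbers are placeholders, not data from the study.

```python
import math

def threshold_energy(n_eq, n_ext, efficiency):
    """Equivalent-input-noise model: threshold signal energy grows with the
    total noise (external plus equivalent input noise) and shrinks with the
    observer's efficiency."""
    return (n_ext + n_eq) / efficiency

# Large letters on a blank background (no external noise): the amblyopic eye
# has ~1.4x the normal cortical equivalent noise and near-normal efficiency.
e_normal = threshold_energy(n_eq=1.0, n_ext=0.0, efficiency=0.2)
e_amblyopic = threshold_energy(n_eq=1.4, n_ext=0.0, efficiency=0.2)

# Contrast threshold scales as the square root of energy, so the predicted
# elevation is sqrt(1.4) ~= 1.2, the negligible effect noted above.
elevation = math.sqrt(e_amblyopic / e_normal)
print(round(elevation, 2))  # prints 1.18
```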
The Gestalt psychologists reported a set of laws describing how vision groups elements to recognize objects. The Gestalt laws “prescribe for us what we are to recognize ‘as one thing’” (Köhler, 1920). Were they right? Does object recognition involve grouping? Tests of the laws of grouping have been favorable, but mostly assessed only detection, not identification, of the compound object. The grouping of elements seen in the detection experiments with lattices and “snakes in the grass” is compelling, but falls far short of the vivid everyday experience of recognizing a familiar, meaningful, named thing, which mediates the ordinary identification of an object. Thus, after nearly a century, there is hardly any evidence that grouping plays a role in ordinary object recognition. To assess grouping in object recognition, we made letters out of grating patches and measured threshold contrast for identifying these letters in visual noise as a function of perturbation of grating orientation, phase, and offset. We define a new measure, “wiggle”, to characterize the degree to which these various perturbations violate the Gestalt law of good continuation. We find that efficiency for letter identification is inversely proportional to wiggle and is wholly determined by wiggle, independent of how the wiggle was produced. Thus the effects of three different kinds of shape perturbation on letter identifiability are predicted by a single measure of goodness of continuation. This shows that letter identification obeys the Gestalt law of good continuation and may be the first confirmation of the original Gestalt claim that object recognition involves grouping.
Gestalt; grouping; contour integration; good continuation; letter identification; object recognition; features; snake in the grass; snake letters; dot lattice
To understand why human sensitivity for complex objects is so low, we study how word identification combines eye and ear or parts of a word (features, letters, syllables). Our observers identify printed and spoken words presented concurrently or separately. When researchers measure threshold (energy of the faintest visible or audible signal), they may report either sensitivity (one over the human threshold) or efficiency (ratio of the best possible threshold to the human threshold). When the best possible algorithm identifies an object (like a word) in noise, its threshold is independent of how many parts the object has. But, with human observers, efficiency depends on the task. In some tasks, human observers combine parts efficiently, needing hardly more energy to identify an object with more parts. In other tasks, they combine inefficiently, needing energy nearly proportional to the number of parts, over a 60∶1 range. Whether presented to eye or ear, efficiency for detecting a short sinusoid (tone or grating) with few features is a substantial 20%, while efficiency for identifying a word with many features is merely 1%. Why? We show that the low human sensitivity for words is a cost of combining their many parts. We report a dichotomy between inefficient combining of adjacent features and efficient combining across senses. Joining our results with a survey of the cue-combination literature reveals that cues combine efficiently only if they are perceived as aspects of the same object. Observers give different names to adjacent letters in a word, and combine them inefficiently. Observers give the same name to a word’s image and sound, and combine them efficiently. The brain’s machinery optimally combines only cues that are perceived as originating from the same object. Presumably such cues each find their own way through the brain to arrive at the same object representation.
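The threshold, sensitivity, and efficiency definitions above reduce to simple ratios. A minimal sketch, in which the numeric thresholds are illustrative placeholders, not measurements from the study:

```python
def sensitivity(e_human):
    """Sensitivity is one over the human threshold energy."""
    return 1.0 / e_human

def efficiency(e_ideal, e_human):
    """Efficiency is the ratio of the best possible (ideal observer)
    threshold energy to the human threshold energy."""
    return e_ideal / e_human

# The ideal identification threshold is independent of how many parts the
# object has, so inefficient combining appears as human threshold energy
# growing roughly in proportion to the number of parts.
e_ideal = 1.0
tone_eff = efficiency(e_ideal, e_human=5.0)    # few features: 20%
word_eff = efficiency(e_ideal, e_human=100.0)  # many features: 1%
print(tone_eff, word_eff)
```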
The external world is mapped retinotopically onto the primary visual cortex (V1). We show here that objects in the world, unless they are very dissimilar, can be recognized only if they are sufficiently separated in visual cortex: specifically, in V1, at least 6 mm apart in the radial direction (increasing eccentricity) or 1 mm apart in the circumferential direction (equal eccentricity). Objects closer together than this critical spacing are perceived as an unidentifiable jumble. This is called “crowding”. It severely limits visual processing, including speed of reading and searching. The conclusion about visual cortex rests on three findings. First, psychophysically, the necessary “critical” spacing, in the visual field, is proportional to (roughly half) the eccentricity of the objects. Second, the critical spacing is independent of the size and kind of object. Third, anatomically, the representation of the visual field on the cortical surface is such that position in V1 (and several other areas) is the logarithm of eccentricity in the visual field. Furthermore, we show that much of this can be accounted for by supposing that each “combining field”, defined by the critical spacing measurements, is implemented by a fixed number of cortical neurons.
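The geometry described above can be sketched with a logarithmic map from eccentricity to cortical position. The map form d = A·ln(1 + e/E2) and its parameter values are assumptions in the spirit of standard V1 magnification estimates, not values stated here; combined with Bouma's critical spacing of roughly half the eccentricity, the radial cortical separation comes out nearly constant:

```python
import math

# Assumed logarithmic V1 map: cortical distance (mm) from the foveal
# representation as a function of eccentricity (deg). A_MM and E2_DEG are
# illustrative magnification parameters, not values from this paper.
A_MM, E2_DEG = 17.3, 0.75

def cortical_mm(ecc_deg):
    return A_MM * math.log(1.0 + ecc_deg / E2_DEG)

def radial_critical_mm(ecc_deg, bouma_fraction=0.5):
    """Cortical separation between a target at ecc_deg and a flanker at the
    critical spacing, roughly half the eccentricity (Bouma)."""
    flanker_ecc = ecc_deg * (1.0 + bouma_fraction)
    return cortical_mm(flanker_ecc) - cortical_mm(ecc_deg)

# Because the map is logarithmic, the difference of logs approaches
# A_MM * ln(1.5): a fixed cortical distance, independent of eccentricity.
for ecc in (5, 10, 20):
    print(ecc, round(radial_critical_mm(ecc), 1))
```

With these placeholder parameters the radial critical spacing is about 6.2, 6.6, and 6.8 mm at 5°, 10°, and 20°, converging toward A·ln 1.5 ≈ 7 mm, consistent with the roughly fixed cortical distance described above.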
Binding of features helps object recognition in contour integration, but hinders it in crowding. In contour integration, aligned adjacent objects group together to form a path. In crowding, flanking objects make the target unidentifiable. But, to date, the two tasks have only been studied separately. May and Hess (2007) suggested that the same binding mediates both tasks. To test this idea, we ask observers to perform two different tasks with the same stimulus. We present oriented grating patches that form a “snake letter” in the periphery. Observers report either the identity of the whole letter (contour integration task) or the phase of one of the grating patches (crowding task). We manipulate the strength of binding between gratings by varying the alignment between them, i.e., the Gestalt goodness of continuation, measured as “wiggle”. We find that better alignment strengthens binding, which improves contour integration and worsens crowding. Observers show equal sensitivity to alignment in these two very different tasks, suggesting that the same binding mechanism underlies both phenomena. It has been claimed that grouping among flankers reduces their crowding of the target. Instead, we find that these published cases of weak crowding are due to weak binding resulting from target-flanker misalignment. We conclude that crowding is mediated solely by the grouping of flankers with the target and is independent of grouping among flankers.
crowding; wiggle; grouping; binding; Gestalt; contour integration; good continuation; alignment; object recognition; snake letter
Reading speed matters in most real-world contexts, and it is a robust and easy aspect of reading to measure. Theories of reading should account for speed.
Unless we fixate directly on it, it is hard to see an object among other objects. This breakdown in object recognition, called crowding, severely limits peripheral vision. The effect is more severe when objects are more similar. When observers mistake the identity of a target among flanker objects, they often report a flanker. Many have taken these flanker reports as evidence of internal substitution of the target by a flanker. Here, we ask observers to identify a target letter presented between one similar and one dissimilar flanker letter. Simple substitution takes in only one letter, which is often the target but, by unwitting mistake, is sometimes a flanker. The opposite of substitution is pooling, which takes in more than one letter. Having taken only one letter, the substitution process knows only its identity, not its similarity to the target. Thus, it must report similar and dissimilar flankers equally often. Contrary to this prediction, the similar flanker is reported much more often than the dissimilar flanker, showing that rampant flanker substitution cannot account for most flanker reports. Mixture modeling shows that simple substitution can account for, at most, about half the trials. Pooling and nonpooling (simple substitution) together include all possible models of crowding. When observers are asked to identify a crowded object, at least half of their reports are pooled, based on a combination of information from target and flankers, rather than being based on a single letter.
Electronic supplementary material: The online version of this article (doi:10.3758/s13414-011-0229-0) contains supplementary material.
Crowding; Substitution; Pooling; Mixture modeling
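The simple-substitution prediction tested above is easy to state computationally: a process that takes in only one letter knows nothing about a flanker's similarity to the target, so it must report the two flankers equally often. A minimal sketch, with a hypothetical take-in probability:

```python
def substitution_report_probs(p_take_target=0.7):
    """Simple substitution takes in exactly one letter. If, by mistake, that
    letter is a flanker, its identity carries no information about similarity
    to the target, so each flanker is reported equally often. The value of
    p_take_target is a placeholder, not an estimate from the study."""
    p_take_flanker = 1.0 - p_take_target
    return {
        "target": p_take_target,
        "similar_flanker": p_take_flanker / 2.0,
        "dissimilar_flanker": p_take_flanker / 2.0,
    }

probs = substitution_report_probs()
# The testable prediction: equal report rates for the two flankers. The data
# instead show the similar flanker reported far more often, which is what
# rules out substitution as the whole story.
print(probs["similar_flanker"] == probs["dissimilar_flanker"])  # prints True
```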
It is now emerging that vision is usually limited by object spacing rather than size. The visual system recognizes an object by detecting and then combining its features. ‘Crowding’ occurs when objects are too close together and features from several objects are combined into a jumbled percept. Here, we review the explosion of studies on crowding—in grating discrimination, letter and face recognition, visual search, selective attention, and reading—and find a universal principle, the Bouma law. The critical spacing required to prevent crowding is equal for all objects, although the effect is weaker between dissimilar objects. Furthermore, critical spacing at the cortex is independent of object position, and critical spacing at the visual field is proportional to object distance from fixation. The region where object spacing exceeds critical spacing is the ‘uncrowded window’. Observers cannot recognize objects outside of this window and its size limits the speed of reading and search.
We investigate the channels underlying identification of second-order letters using a critical-band masking paradigm. We find that observers use a single 1–1.5 octave-wide channel for this task. This channel’s best spatial frequency (c/letter) does not change across different noise conditions (indicating the inability of observers to switch channels to improve signal-to-noise ratio) or across different letter sizes (indicating scale invariance), for a fixed carrier frequency (c/letter). However, the channel’s best spatial frequency does change with stimulus carrier frequency (both in c/letter); one is proportional to the other. Following Majaj et al. (Majaj, N. J., Pelli, D. G., Kurshan, P., & Palomares, M. (2002). The role of spatial frequency channels in letter identification. Vision Research, 42, 1165–1184), we define “stroke frequency” as the line frequency (strokes/deg) in the luminance image. That is, for luminance-defined letters, stroke frequency is the number of lines (strokes) across each letter divided by letter width. For second-order letters, letter texture stroke frequency is the number of carrier cycles (luminance lines) within the letter ink area divided by the letter width. Unlike the nonlinear dependence found for first-order letters (implying scale-dependent processing), for second-order letters the channel frequency is half the letter texture stroke frequency (suggesting scale-invariant processing).
Letter identification; Second-order vision; Critical-band masking; Scale invariance; Channel switching
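The stroke-frequency definitions above, and the reported half-ratio rule for second-order letters, can be written out directly. A minimal sketch of the definitions; the function names are ours, not the paper's:

```python
def stroke_frequency(n_strokes, letter_width_deg):
    """First-order (luminance) letters: number of lines (strokes) across
    the letter divided by letter width, in strokes/deg."""
    return n_strokes / letter_width_deg

def texture_stroke_frequency(carrier_cycles_in_ink, letter_width_deg):
    """Second-order letters: carrier cycles (luminance lines) within the
    letter's ink area divided by letter width."""
    return carrier_cycles_in_ink / letter_width_deg

def second_order_channel_frequency(carrier_cycles_in_ink, letter_width_deg):
    """Reported scale-invariant rule: the channel's best frequency is half
    the letter texture stroke frequency (in the same units)."""
    return 0.5 * texture_stroke_frequency(carrier_cycles_in_ink,
                                          letter_width_deg)

# e.g. 8 carrier cycles within the ink of a 2-deg-wide letter:
print(second_order_channel_frequency(8, 2.0))  # prints 2.0
```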
Research in object recognition has tried to distinguish holistic recognition from recognition by parts. One can also guess an object from its context. Words are objects, and how we recognize them is the core question of reading research. Do fast readers rely most on letter-by-letter decoding (i.e., recognition by parts), whole word shape, or sentence context? We manipulated the text to selectively knock out each source of information while sparing the others. Surprisingly, the effects of the knockouts on reading rate reveal a triple dissociation. Each reading process always contributes the same number of words per minute, regardless of whether the other processes are operating.