Reading speed matters in most real-world contexts, and it is a robust and easily measured aspect of reading. Theories of reading should account for speed.
Amblyopia is a much-studied but poorly understood developmental visual disorder that reduces acuity and profoundly impairs contrast sensitivity for small targets. Here we use visual noise to probe the letter identification process and characterize its impairment by amblyopia. We apply five levels of analysis — threshold, threshold in noise, equivalent noise, optical MTF, and noise modeling — to obtain a two-factor model of the amblyopic deficit: substantially reduced efficiency for small letters and negligibly increased cortical noise. Cortical noise, expressed as an equivalent input noise, varies among amblyopes but is roughly 1.4× normal, as though only 1/1.4 the normal number of cortical spikes are devoted to the amblyopic eye. This raises threshold contrast for large letters by a factor of √1.4 ≈ 1.2×, a negligible effect. All 16 amblyopic observers showed near-normal efficiency for large letters (> 4× acuity) and greatly reduced efficiency for small letters: 1/4 normal at 2× acuity, approaching 1/16 normal at acuity. Finding that the acuity loss represents a loss of efficiency rules out all models of amblyopia except those that predict the same sensitivity loss on blank and noisy backgrounds. One such model is the last-channel hypothesis, which supposes that the highest-spatial-frequency channels are missing, leaving the remaining highest-frequency channel struggling to identify the smallest letters. However, this hypothesis is rejected by critical band masking of letter identification, which shows that the channels used by the amblyopic eye have normal tuning for even the smallest letters. Finally, based on these results, we introduce a new “Dual Acuity” chart that promises to be a quick diagnostic test for amblyopia.
amblyopia; noise; efficiency; cortical noise; Pelli-Levi Dual Acuity Chart
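The equivalent-input-noise arithmetic in the amblyopia abstract can be made concrete with a minimal numerical sketch. This assumes the standard linear-amplifier model of equivalent noise (threshold energy proportional to total noise over efficiency); the function names, the constant k, and the units are illustrative, not from the paper.

```python
import math

def threshold_contrast(N_ext, N_eq, efficiency, k=1.0):
    # Linear-amplifier model: threshold energy is proportional to the
    # total noise (external noise plus equivalent input noise) divided
    # by efficiency; threshold contrast is the square root of energy.
    # All quantities here are in arbitrary illustrative units.
    return math.sqrt(k * (N_ext + N_eq) / efficiency)

# Normal eye vs. amblyopic eye on a blank field (no external noise).
# The abstract's figure: cortical noise roughly 1.4x normal.
normal = threshold_contrast(0.0, N_eq=1.0, efficiency=1.0)
amblyopic = threshold_contrast(0.0, N_eq=1.4, efficiency=1.0)
print(round(amblyopic / normal, 2))  # sqrt(1.4) ≈ 1.18, the "≈1.2x" of the abstract
```

Because threshold contrast grows only as the square root of total noise, a 1.4× increase in cortical noise produces barely a 20% threshold elevation for large letters, which is why the abstract calls it negligible.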
The Gestalt psychologists reported a set of laws describing how vision groups elements to recognize objects. The Gestalt laws “prescribe for us what we are to recognize ‘as one thing’” (Köhler, 1920). Were they right? Does object recognition involve grouping? Tests of the laws of grouping have been favourable, but mostly assessed only detection, not identification, of the compound object. The grouping of elements seen in the detection experiments with lattices and “snakes in the grass” is compelling, but falls far short of the vivid everyday experience of recognizing a familiar, meaningful, named thing, which mediates the ordinary identification of an object. Thus, after nearly a century, there is hardly any evidence that grouping plays a role in ordinary object recognition. To assess grouping in object recognition, we made letters out of grating patches and measured threshold contrast for identifying these letters in visual noise as a function of perturbation of grating orientation, phase, and offset. We define a new measure, “wiggle”, to characterize the degree to which these various perturbations violate the Gestalt law of good continuation. We find that efficiency for letter identification is inversely proportional to wiggle and is wholly determined by wiggle, independent of how the wiggle was produced. Thus the effects of three different kinds of shape perturbation on letter identifiability are predicted by a single measure of goodness of continuation. This shows that letter identification obeys the Gestalt law of good continuation and may be the first confirmation of the original Gestalt claim that object recognition involves grouping.
Gestalt; Grouping; Contour integration; Good continuation; Letter identification; Object recognition; Features; Snake in the grass; Snake letters; Dot lattice
Unless we fixate directly on it, it is hard to see an object among other objects. This breakdown in object recognition, called crowding, severely limits peripheral vision. The effect is more severe when objects are more similar. When observers mistake the identity of a target among flanker objects, they often report a flanker. Many have taken these flanker reports as evidence of internal substitution of the target by a flanker. Here, we ask observers to identify a target letter presented between one similar and one dissimilar flanker letter. Simple substitution takes in only one letter, which is often the target but, by unwitting mistake, is sometimes a flanker. The opposite of substitution is pooling, which takes in more than one letter. Having taken only one letter, the substitution process knows only its identity, not its similarity to the target. Thus, it must report similar and dissimilar flankers equally often. Contrary to this prediction, the similar flanker is reported much more often than the dissimilar flanker, showing that rampant flanker substitution cannot account for most flanker reports. Mixture modeling shows that simple substitution can account for, at most, about half the trials. Pooling and nonpooling (simple substitution) together include all possible models of crowding. When observers are asked to identify a crowded object, at least half of their reports are pooled, based on a combination of information from target and flankers, rather than being based on a single letter.
Electronic supplementary material: the online version of this article (doi:10.3758/s13414-011-0229-0) contains supplementary material.
Crowding; Substitution; Pooling; Mixture modeling
Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not apply to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one: participants' performance is consistent with an optimal model in which environmental, within-category variability also plays a role in determining cue weights. Furthermore, we show that in our task the sensory variability affecting the visual modality during cue combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step toward characterizing the mechanisms underlying human cue integration in categorical tasks.
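The normative weighting rule described above can be sketched in a few lines. This is an illustrative reduction, not the paper's model: cues are weighted by inverse variance, and for a categorical task the relevant variance is taken to be sensory variance plus the shared within-category (environmental) variance, as the abstract argues. All names and numbers are hypothetical.

```python
def optimal_cue_weights(var_aud, var_vis, var_category=0.0):
    # Reliability-weighted integration: each cue's weight is proportional
    # to its reliability (inverse variance). For a categorical task, the
    # effective variance per cue is its sensory variance plus the
    # within-category environmental variance.
    rel_a = 1.0 / (var_aud + var_category)
    rel_v = 1.0 / (var_vis + var_category)
    w_a = rel_a / (rel_a + rel_v)
    return w_a, 1.0 - w_a

# A noisier auditory cue gets less weight...
print(optimal_cue_weights(4.0, 1.0))        # (0.2, 0.8)
# ...and adding shared category variance pulls the weights toward equality.
print(optimal_cue_weights(4.0, 1.0, 10.0))  # (0.44, 0.56)
```

The second call illustrates the abstract's key point: once environmental category variance dominates, single-cue sensory reliability alone no longer predicts the optimal weights.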
Understanding foreign speech is difficult, in part because of unusual mappings between sounds and words. It is known that listeners in their native language can use lexical knowledge (about how words ought to sound) to learn how to interpret unusual speech-sounds. We therefore investigated whether subtitles, which provide lexical information, support perceptual learning about foreign speech. Dutch participants, unfamiliar with Scottish and Australian regional accents of English, watched Scottish or Australian English videos with Dutch, English or no subtitles, and then repeated audio fragments of both accents. Repetition of novel fragments was worse after Dutch-subtitle exposure but better after English-subtitle exposure. Native-language subtitles appear to create lexical interference, but foreign-language subtitles assist speech learning by indicating which words (and hence sounds) are being spoken.
It is now emerging that vision is usually limited by object spacing rather than size. The visual system recognizes an object by detecting and then combining its features. ‘Crowding’ occurs when objects are too close together and features from several objects are combined into a jumbled percept. Here, we review the explosion of studies on crowding—in grating discrimination, letter and face recognition, visual search, selective attention, and reading—and find a universal principle, the Bouma law. The critical spacing required to prevent crowding is equal for all objects, although the effect is weaker between dissimilar objects. Furthermore, critical spacing on the cortex is independent of object position, and critical spacing in the visual field is proportional to the object's distance from fixation. The region where object spacing exceeds critical spacing is the ‘uncrowded window’. Observers cannot recognize objects outside this window, and its size limits the speed of reading and search.
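The proportionality stated in this abstract — critical spacing grows linearly with distance from fixation — can be written down directly. The sketch below assumes the commonly cited Bouma factor of about 0.5; the exact factor varies across observers and tasks, and the function names are ours.

```python
def critical_spacing(eccentricity_deg, bouma=0.5):
    # Bouma law: the center-to-center spacing needed to escape crowding
    # is proportional to eccentricity (distance from fixation, in deg).
    # bouma ≈ 0.5 is a common illustrative value, not a universal constant.
    return bouma * eccentricity_deg

def is_crowded(spacing_deg, eccentricity_deg, bouma=0.5):
    # An object pair inside the uncrowded window has spacing that
    # exceeds critical spacing; otherwise it is crowded.
    return spacing_deg < critical_spacing(eccentricity_deg, bouma)

print(is_crowded(spacing_deg=1.0, eccentricity_deg=4.0))  # True: 1 < 0.5 * 4
print(is_crowded(spacing_deg=3.0, eccentricity_deg=4.0))  # False: 3 > 2
```

The same letters that crowd at 4 deg eccentricity would be readable nearer fixation, which is the sense in which the uncrowded window limits reading and search.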
Along with physical luminance, perceived brightness is known to depend on the spatial structure of the stimulus. It is often assumed that the neural computation of brightness is based on an analysis of the stimulus's luminance borders, but this has not been tested directly. We introduce a new variant of the psychophysical reverse-correlation, or classification-image, method to estimate and localize the physical features of the stimuli that correlate with perceived brightness, using a brightness-matching task. We derive classification images for the illusory Craik-O'Brien-Cornsweet stimulus and a “real” uniform step stimulus. For both stimuli, the classification images reveal a positive peak at the stimulus border and a negative peak at the background, but are flat at the center of the stimulus, suggesting that brightness is determined solely by border information. Features in the perceptually completed area of the Craik-O'Brien-Cornsweet stimulus do not contribute to its brightness, nor did we see low-frequency boosting, which has been offered as an explanation for the illusion. The tuning of the classification-image profiles changes remarkably little with stimulus size. This supports the idea that only certain spatial scales are used for computing the brightness of a surface.
We investigate the channels underlying identification of second-order letters using a critical-band masking paradigm. We find that observers use a single 1–1.5 octave-wide channel for this task. This channel’s best spatial frequency (c/letter) did not change across different noise conditions (indicating the inability of observers to switch channels to improve signal-to-noise ratio) or across different letter sizes (indicating scale invariance), for a fixed carrier frequency (c/letter). However, the channel’s best spatial frequency does change with stimulus carrier frequency (both in c/letter); one is proportional to the other. Following Majaj et al. (Majaj, N. J., Pelli, D. G., Kurshan, P., & Palomares, M. (2002). The role of spatial frequency channels in letter identification. Vision Research, 42, 1165–1184), we define “stroke frequency” as the line frequency (strokes/deg) in the luminance image. That is, for luminance-defined letters, stroke frequency is the number of lines (strokes) across each letter divided by letter width. For second-order letters, letter texture stroke frequency is the number of carrier cycles (luminance lines) within the letter ink area divided by the letter width. Unlike the nonlinear dependence found for first-order letters (implying scale-dependent processing), for second-order letters the channel frequency is half the letter texture stroke frequency (suggesting scale-invariant processing).
Letter identification; Second-order vision; Critical-band masking; Scale invariance; Channel switching
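The stroke-frequency definitions in the second-order-letters abstract reduce to simple arithmetic, which the sketch below makes explicit. Function names and the example numbers are illustrative; only the definitions (lines per letter width; channel frequency at half the texture stroke frequency for second-order letters) come from the abstract.

```python
def stroke_frequency(n_lines, letter_width_deg):
    # Stroke frequency (strokes/deg): the number of lines (strokes)
    # across a letter divided by the letter width in degrees.
    return n_lines / letter_width_deg

def second_order_channel_frequency(n_carrier_cycles, letter_width_deg):
    # For second-order letters, the reported channel best frequency is
    # half the letter-texture stroke frequency, i.e. half the number of
    # carrier cycles within the letter ink area per degree of letter width.
    return 0.5 * stroke_frequency(n_carrier_cycles, letter_width_deg)

# Hypothetical example: 8 carrier cycles across a 2-deg-wide letter
# gives a texture stroke frequency of 4 strokes/deg, so the channel
# sits at 2 c/deg.
print(second_order_channel_frequency(8, 2.0))
```

The linear halving relation is what the abstract takes as the signature of scale-invariant processing, in contrast to the nonlinear dependence found for first-order letters.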
The speed and accuracy of decision-making have a well-known trading relationship: hasty decisions are more prone to errors while careful, accurate judgments take more time. Despite the pervasiveness of this speed-accuracy trade-off (SAT) in decision-making, its neural basis is still unknown.
Using functional magnetic resonance imaging (fMRI) we show that emphasizing the speed of a perceptual decision at the expense of its accuracy lowers the amount of evidence-related activity in lateral prefrontal cortex. Moreover, this speed-accuracy difference in lateral prefrontal cortex activity correlates with the speed-accuracy difference in the decision criterion metric of signal detection theory. We also show that the same instructions increase baseline activity in a dorso-medial cortical area involved in the internal generation of actions.
These findings suggest that the SAT is neurally implemented by modulating not only the amount of externally-derived sensory evidence used to make a decision, but also the internal urge to make a response. We propose that these processes combine to control the temporal dynamics of the speed-accuracy trade-off in decision-making.
Research in object recognition has tried to distinguish holistic recognition from recognition by parts. One can also guess an object from its context. Words are objects, and how we recognize them is the core question of reading research. Do fast readers rely most on letter-by-letter decoding (i.e., recognition by parts), whole word shape, or sentence context? We manipulated the text to selectively knock out each source of information while sparing the others. Surprisingly, the effects of the knockouts on reading rate reveal a triple dissociation. Each reading process always contributes the same number of words per minute, regardless of whether the other processes are operating.
We conducted a preliminary study to examine whether Chinese readers’ spontaneous word segmentation is consistent with the national standard rules of word segmentation based on the Contemporary Chinese language word segmentation specification for information processing (CCLWSSIP). Participants were asked to segment Chinese sentences into individual words according to their prior knowledge of words. The results showed that Chinese readers did not follow the segmentation rules of the CCLWSSIP, and that their word segmentation was influenced by the syntactic categories of consecutive words. In many cases, the participants did not treat auxiliary words, adverbs, adjectives, nouns, verbs, numerals, or quantifiers as single word units. Generally, Chinese readers tended to combine function words with content words to form single word units, indicating that they were inclined to chunk single words into larger information units during word segmentation. Additionally, the “overextension of monosyllable words” hypothesis was tested; the results suggest it may need some correction, implying that word length has an implicit influence on Chinese readers’ segmentation. Implications of these results for models of word recognition and eye movement control are discussed.
We tested whether eye color influences perception of trustworthiness. Facial photographs of 40 female and 40 male students were rated for perceived trustworthiness. Eye color had a significant effect, the brown-eyed faces being perceived as more trustworthy than the blue-eyed ones. Geometric morphometrics, however, revealed significant correlations between eye color and face shape. Thus, face shape likewise had a significant effect on perceived trustworthiness but only for male faces, the effect for female faces not being significant. To determine whether perception of trustworthiness was being influenced primarily by eye color or by face shape, we recolored the eyes on the same male facial photos and repeated the test procedure. Eye color now had no effect on perceived trustworthiness. We concluded that although the brown-eyed faces were perceived as more trustworthy than the blue-eyed ones, it was not brown eye color per se that caused the stronger perception of trustworthiness but rather the facial features associated with brown eyes.