Much of our visual experience is an ‘illusion of completeness’, without true conscious access to the details of objects in a typically cluttered natural scene. Recent results on this ‘crowding’ phenomenon are starting to bridge the gap between the striking limits of perception and the subjective impression of a rich visual world.
Conscious visual perception is more limited than we think. The limits are not simply imposed by acuity (the resolution of vision) but also by crowding, an impairment in the recognition of individual objects located in cluttered scenes. Figure 1A, for example, shows a natural scene filled with clutter, typical of our moment-to-moment visual experience. When fixating on the white cross, some characteristics of the scene are readily apparent, such as its visual textures and gist. When probed more deeply, however, it is clear that we have very little access to the details or particulars; identifying the types of dish, or the number of pieces of silverware in the center of the image, is difficult or impossible. This difficulty is the phenomenon of crowding. We can easily confirm that this is not due to limited acuity, because the individual objects are recognizable in the periphery when isolated from the clutter (Figure 1A,B). Because of crowding, we must make eye movements to individuate and scrutinize objects in naturally cluttered scenes, a costly and time-consuming consequence of limited resources.
Crowding happens in the periphery of almost every natural scene that we look at, though it is usually studied in simplified contexts, using basic visual features such as lines, edges, gratings and letters (Figure 1B–1D). Experiments using these sorts of stimuli have revealed several defining characteristics of crowding, including density and eccentricity dependence: crowding usually happens when the spacing between features or objects is less than half the eccentricity of the object (Bouma’s law [1–3]). Crowding is also stronger when the flanking object is on the far (more eccentric) side of the target than on the near (foveal) side of the target [1,4], and is stronger in the upper visual field, but does not occur at the fovea.
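The rule of thumb stated above (crowding when spacing is less than half the eccentricity) can be sketched in a few lines of Python. This is only an illustrative reading of Bouma’s law as worded here, not a model from the cited work: the function names, the use of degrees of visual angle, and the fixed fraction of 0.5 are assumptions, and the sketch ignores the radial asymmetry and visual-field differences mentioned in the text.

```python
# Illustrative sketch of Bouma's rule of thumb as stated in the text.
# Names, units (degrees of visual angle) and the constant are assumptions.

BOUMA_FRACTION = 0.5  # "less than half the eccentricity"


def critical_spacing(eccentricity_deg, fraction=BOUMA_FRACTION):
    """Spacing below which crowding is expected at a given eccentricity."""
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return fraction * eccentricity_deg


def is_crowded(spacing_deg, eccentricity_deg, fraction=BOUMA_FRACTION):
    """True when target-flanker spacing falls below the critical spacing."""
    return spacing_deg < critical_spacing(eccentricity_deg, fraction)


if __name__ == "__main__":
    # A target at 10 deg eccentricity with flankers 2 deg and 6 deg away.
    print(is_crowded(2.0, 10.0))
    print(is_crowded(6.0, 10.0))
```

Under this simplification the critical spacing shrinks to zero at fixation, so the function never reports crowding at an eccentricity of zero, which is at least consistent with the observation that crowding does not occur at the fovea.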
There is no shortage of models for crowding, ranging from over-integration or pooling of low-level features, to attentional resolution (for reviews, see [2,3,5]). The abundance of hypotheses reveals the continuing struggle to identify a single neural mechanism and computational model that can account for this fundamental limit on object recognition. The search for a single neural mechanism, however, may be misguided. Among many recent results, three in particular suggest that our previous understanding of the locus and homogeneity of the phenomenon of crowding requires revision.
In this issue of Current Biology, Levi and Carney report that, paradoxically, increasing the size or number of flankers can, under certain circumstances, decrease crowding (Figure 1C). The authors found that crowding varied systematically as a function of the center-to-center distance between objects, independent of the size of the flankers or their edge location. Increasing the number of flankers beyond roughly four to eight also decreased crowding. In their simplest form, models that invoke spatial integration, pooling or attentional resolution would generally predict that increasing the size or number of flanking (distractor) elements should increase crowding [2,3,5]. Levi and Carney suggest, in contrast, that crowding occurs because the visual system independently samples a limited number of distinct features (or objects or positions) before integrating them.
Two additional results challenge our current notion of what perceptual crowding really is. First, crowding can be reduced when flanking objects are perceived as a spatial group [9,10] or a temporal group, suggesting that grouping happens before crowding (Figure 1D). Models of crowding that invoke low-level integration of features at an early stage of processing have difficulty accounting for this finding. Second, crowding can happen between features or objects at different levels of visual processing. For example, crowding happens selectively between holistic face representations, independent of the low-level feature-based crowding within those faces (Figure 1E,F). Both grayscale and two-tone Mooney faces are more crowded by upright flanker faces than they are by inverted ones.
This inversion effect in crowding is not simply ‘similarity’ in any low-level featural sense (for example [13,14]) — the similarity is in the ‘faceness’, the holistic nature of the face. Holistic crowding does not happen for cars or other non-face objects and it is not simply grouping (as in Figure 1D), because the flankers do not perceptually ‘group’ in a Gestalt sense any more with each other than they do with the target. Crowding cannot, therefore, be due to a single bottleneck; it cannot even be a single high-level bottleneck. There are layers of crowding: whatever mechanism or set of mechanisms contributes to crowding (be that integration, attentional resolution, positional averaging, and so on), the process must occur redundantly at multiple stages of visual processing.
A model of crowding that can account for the diversity of empirical findings will be a milestone, to be sure, but it may leave unanswered the broader question of how we get from rarefied percepts of objects in crowds to phenomenologically rich percepts of the world. Despite crowding, we nonetheless have an ‘illusion of completeness’: we feel that the visual periphery is meaningful.
At the very end of their article, Levi and Carney leave us with the intriguing speculation that crowding results in a ‘flattened’ percept. This idea is worth exploring. Perhaps there is a sensory threshold for what counts as ‘rich’ or ‘meaningful’: within some limit, the default percept is that the peripheral visual field is organized and detailed (just as the default percept is of a stationary world, even though the image of the world is constantly jittering on the retina because of small eye movements and tremor). The nature of the information on which this sensory threshold operates is unclear, but one possibility is summary statistics. Natural scenes are filled with similar objects, textures and features, resulting in the perception of ensembles (groups of trees, bricks, faces, flocks of soaring birds), and these may contribute to our rich perceptual experience of the world [16–18]. These ensembles are perceived whether or not crowding happens [17–20].
Thus, crowding may not be necessary for the illusion of completeness, but it may force the issue — obligating the visual system to efficiently compress (as opposed to filter or dismantle) the crowded information into a summary statistical representation. This happens at a number of independent levels ranging from low-level features and textures to high-level objects. Developing a model that accounts for the diverse effects of crowding while simultaneously bridging the explanatory gap between a ‘flattened’ percept and a rich visual impression of the world remains an important goal for vision science in the near future.
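As a purely illustrative aside, the idea of compressing a crowded set of items into a summary statistic can be sketched as an ensemble mean. The example below assumes, hypothetically, that each item is an oriented line or grating described only by its orientation in degrees, and it uses a standard circular mean for axial data (angles doubled so that 0° and 180° coincide). Nothing here is taken from the cited studies; the function name and the choice of orientation as the feature are assumptions.

```python
import math

# Hypothetical sketch: summarise an ensemble of oriented items by their
# mean orientation. Orientations are axial (period 180 deg), so angles are
# doubled before averaging and the result is halved afterwards.


def mean_orientation_deg(orientations_deg):
    """Circular mean of orientations, returned in the range [0, 180)."""
    if not orientations_deg:
        raise ValueError("need at least one orientation")
    sin_sum = sum(math.sin(math.radians(2 * a)) for a in orientations_deg)
    cos_sum = sum(math.cos(math.radians(2 * a)) for a in orientations_deg)
    return (math.degrees(math.atan2(sin_sum, cos_sum)) / 2) % 180


if __name__ == "__main__":
    # Individual orientations are discarded; only the summary is kept.
    crowded_items = [10.0, 20.0, 30.0]
    print(round(mean_orientation_deg(crowded_items), 3))
```

The point of the sketch is only that the summary survives even when the individual values are no longer accessible, which is the sense of "compress" as opposed to "filter or dismantle" in the paragraph above.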