A great many, perhaps a majority, of shape recognition theories propose that contour attributes, i.e., orientation, curvature and linear extent, provide the elemental features that define the shape of an object. Selfridge [9] may have been the first to characterize the perceptual process in terms of an assemblage of filters, each having the ability to register a distinctive contour attribute, but many others have followed this lead [see [10
The minimal transient discrete cue (MTDC) protocol [5] provides a means to evaluate the validity of this hypothesis. This method briefly displays a spaced array of dots that mark the outer boundary of the shape, the number of dots being just sufficient for recognition of the shape if all of them are shown with minimal delay. By choosing which dots to sample, and introducing delays between successive samples that are chosen, one can assess the effectiveness of the shape cues being provided by the samples.
The present goal was to examine whether contiguous subsets of dots would be more effective at eliciting recognition of shapes than would subsets having an equal number of dots that were randomly chosen from the full inventory of dots. The contiguous subsets should provide a more effective stimulus for the filters that are presumed to register the contour attributes. If shapes are specified on the basis of their contour attributes, then the contiguous subsets should convey the best partial shape cues, and one would expect these subsets to be more effective for eliciting recognition.
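The contrast between the two sampling schemes can be made concrete with a minimal sketch. It assumes that the boundary dots are indexed in order around the shape and that each subset holds four dots, as in the contiguous subsets discussed here. The total of 24 dots, the function names, and the seed are hypothetical choices for illustration, not details of the actual protocol.

```python
import random

def contiguous_subsets(n_dots, size):
    """Split dot indices 0..n_dots-1 (ordered along the boundary)
    into runs of adjacent dots."""
    indices = list(range(n_dots))
    return [indices[i:i + size] for i in range(0, n_dots, size)]

def random_subsets(n_dots, size, seed=None):
    """Split the same dot indices into equal-sized subsets drawn
    at random, without replacement, from the full inventory."""
    rng = random.Random(seed)  # seed is only for reproducible illustration
    indices = list(range(n_dots))
    rng.shuffle(indices)
    return [indices[i:i + size] for i in range(0, n_dots, size)]

# Hypothetical inventory: 24 boundary dots, shown four at a time.
N_DOTS = 24
SUBSET_SIZE = 4

for subset in contiguous_subsets(N_DOTS, SUBSET_SIZE):
    print("contiguous:", subset)
for subset in random_subsets(N_DOTS, SUBSET_SIZE, seed=1):
    print("random:    ", subset)
```

Both schemes present every dot exactly once and use the same number of dots per subset; only the spatial adjacency within a subset differs. Note that a purely random draw can occasionally place neighboring dots in the same subset, which this sketch does not prevent.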
The overall result was that contiguous and randomly selected subsets contributed equally to shape recognition, even though the randomly selected subsets did not display cues that relate to the orientation, curvature and linear extent of the boundary. This indicates that under the present test conditions, contour attributes did not provide cues that are essential for shape perception.
For the present task conditions, one might speculate that information persistence allowed successive dots to accumulate, such that dots from the random subsets could eventually form contiguous strings that provided contour attributes. There is persistence of brief visual stimuli, as reported by Sperling [15], Neisser [16], Haber and Standing [17], and Eriksen and Collins [18], among others, and reviewed by Coltheart [20], Long [21], and Nisly and Wasserman [22]. Whereas local contour information was not provided by a given random subset, one could argue that the contour-filtering process simply waited for a number of the subsets to be delivered, after which the contour attributes could be extracted from the aggregate pool of dots.
Recent work using the present experimental protocols, however, has found that millisecond and even submillisecond differentials in the display of dot subsets can produce significant differences in shape recognition [6]. The result that is most critical to this discussion was provided by the second experiment in each of the cited studies, wherein the total time (and thus duration of persistence) for a given shape was held constant. Under these conditions, it was found that varying the interval between successive dots impaired recognition, with temporal separation of as little as half a millisecond being significant. Shape-relevant contour attributes are delivered directly by the contiguous dot subsets, but they could be provided by random subsets only through aggregation. The prior studies demonstrate that the cues do not aggregate without a recognition penalty.
When neural substrates for shape perception are discussed, most see the orientation-selective cells characterized by Hubel & Wiesel [3] as providing the first step for registering contour attributes. A given cell can be activated by a contour, and because the firing rate is influenced by the orientation, length, and (possibly) curvature of the contour, the response is thought to convey information about these attributes. It is further suggested that an assemblage of these contour filters delivers the full complement of contour attributes needed for recognition.
However, previous results from this laboratory [2] raise the question of whether shape analysis depends on activation of orientation-selective cells. That study found that recognition was possible when the full complement of dots being shown was relatively sparse. Recognition was well above chance when dot spans exceeded the length of orientation-selective receptive fields [23]. That outcome suggests that each dot is acting as an independent marker of boundary position, and that shape is defined by an unspecified (not yet known) relationship among the individual markers. Even when the orientation-selective cells are activated by an array of dots, the essential information might be the locations that have been specified rather than the collinearity in the array.
With respect to the present results, one might wonder whether the contiguous subsets were effective stimuli for the orientation-selective cells. Perhaps the cells did not respond to the very brief presentation of just four dots. There are three reasons to suggest that the subsets delivered adequate stimulation.
First, although the stimulus duration was very brief, the flashes were easily visible, i.e., consciously perceived. It is generally accepted that conscious awareness of a visual stimulus requires processing by the primary visual cortex; thus the stimulus strength was adequate for activating its neurons.
Second, the span of each contiguous subset was a suitable fit to the size of receptive fields. Sceniak et al. [23] examined receptive field size of orientation-selective cells in V1 of macaque, and found the average space constant to be 60 arc', and the average length-summation tuning curve to be 49 arc'. The four-dot array of the contiguous subsets spanned 35 arc' for horizontal or vertical alignments, and 47 arc' for diagonal alignments. Therefore, each of the contiguous subsets displayed an image size that would provide four dots to the receptive fields.
Third, there is direct electrophysiological evidence that an array of briefly flashed dots will stimulate the cortical cells. Jones & Palmer [24] examined responsiveness of orientation-selective cells with successive stimulation of local points across the receptive fields, the typical duration of each stimulus being 50 ms. They reported that the responses that could be elicited by stimulating one location at a time were too weak to be of practical value in the analysis of receptive field structure. However, simultaneous activation of three sites within the receptive field yielded usable data. As indicated above, the contiguous subsets of the present experiment displayed four dots that would register on a given receptive field, and this would provide a stronger stimulus than was found to be effective by Jones & Palmer [24].
The more general point is that the random subsets as well as the contiguous subsets were seen by the subject and delivered sufficient stimulation to elicit recognition. If one took the position that the contiguous arrays provided an insufficient stimulus for activating orientation-selective cells, it would mean that recognition was accomplished without any contribution from these cells.
It is possible that the cues used for this experiment may be especially salient for activating a primitive shape encoding system. The pattern provided by the full complement of dots is very similar to a silhouette, and recognition is best when there is maximal simultaneity of the flashed dots. This is not unlike conditions that might face an early vertebrate, perhaps a fish, that detects simultaneous movement through small openings in a wall of seaweed. The pattern that is seen could be a predator, or might be prey, and successful recognition by the creature would have implications for survival. It is likely that these recognition skills evolved, and are present in a great many present-day animals that have no cortex.
Recent evidence from this laboratory [25], gathered and published after the present research was conducted, has demonstrated that the retina contains a neural system that is sensitive to millisecond-level simultaneity when the subsets consist of dot pairs. This suggests that the present task draws on primitive shape-encoding mechanisms that put a premium on very tight temporal proximity within a stimulus pattern. More advanced image-processing systems, such as primary visual cortex, might have similar requirements for simultaneity, but with a longer time constant. This could explain the differential at T3 = 27 ms as a contribution to the temporal-integration process by orientation-selective cells that could not be accomplished in the retina.
The finding that the contour attributes did not benefit recognition under the present test conditions should not be taken as a blanket rejection of a useful role in the perception of objects. The fact that we can detect edges with a contrast differential as small as 3% speaks to the benefit of these filters for registering the presence of a boundary. Doubtless this is useful for detecting an object that is almost the same color or luminance as the background, or where it must be seen through haze. Contour filters may make it possible to see the object's boundaries under a variety of degraded conditions, and there is ample evidence that alignment of lines and edges provides a basis for object completion. It is possible, however, that this processing allows the position of discrete markers to be specified. Shape perception, per se, may then be based on metric relationships that have little or nothing to do with collinearity of the markers.
It is unclear why so many insist that shape is defined by the orientation, curvature and linear extent of the contours. We know that all manner of cues can contribute to identification of objects, but have no trouble discarding most of them as being ancillary. Fig. 6 illustrates the situation. The left image shows a detailed colored sketch that can be readily identified as a rooster. In fact the image is devoid of various depth cues that would be present in the real object. Nonetheless, we accept that the 2D image has the shape of a rooster, so the depth cues must be ancillary to our concept of shape.
Figure 6. Stimuli that are available as shape cues are listed above each image. The right image provides the number of dots in a display set that allows for 75% successful recognition of the rooster when the dots are displayed successively, each being shown for ...
The middle image has eliminated internal contours, texture, and color, replacing all these cues with uniform black. Yet this silhouette is readily identified as being in the shape of a rooster. The internal parts, color and texture must be at least somewhat ancillary, i.e., nonessential.
The right image has replaced the boundary edge with an array of dots, and we can still see the stimulus as having the shape of a rooster. Contour attributes of the boundary have been eliminated, but many will insist that they must be inferred in order to identify the shape.
Previous research demonstrated that as few as 19 dots allowed for recognition of the rooster by half of the subjects [2]. It was hypothesized that the individual dots serve as markers of boundary positions, and the information needed for encoding and storage of shapes might be based on metric relationships among these markers. For the image provided by a natural object, a great many more markers are activated by its contours. But here also, some unspecified metric relationship among the markers may provide the basis for recognition, rather than collinearity of the markers.