Sparse coding has long been recognized as a primary goal of image transformation in the visual system [1–4]. Sparse coding in early visual cortex is achieved by abstracting local oriented spatial frequencies and by excitatory/inhibitory surround modulation. Object responses are thought to be sparse at subsequent processing stages [7, 8], but neural mechanisms for higher-level sparsification are not known. Here, convergent results from macaque area V4 neural recording and simulated V4 populations trained on natural object contours suggest that sparse coding is achieved in mid-level visual cortex by emphasizing representation of acute convex and concave curvature. We studied 165 V4 neurons with a random, adaptive stimulus strategy to minimize bias and explore an unlimited range of contour shapes. V4 responses were strongly weighted toward contours containing acute convex or concave curvature. In contrast, the tuning distribution in non-sparse simulated V4 populations was strongly weighted toward low curvature. But as sparseness constraints increased, the simulated tuning distribution shifted progressively toward more acute convex and concave curvature, matching the neural recording results. These findings indicate a sparse object coding scheme in mid-level visual cortex based on uncommon but diagnostic regions of acute contour curvature.
We recorded the responses of isolated V4 neurons from three rhesus macaque monkeys (Macaca mulatta) performing a visual fixation task. During the fixation period, five randomly selected stimuli were flashed in the receptive field (RF) for 750 ms each (with a 250 ms interstimulus interval). The stimuli were presented in a color roughly optimized for the neuron and animated in a small-diameter circular motion pattern against a textured background to establish figure/ground organization.
We used an evolutionary stimulus strategy to minimize bias in the experimental design and maximize the effective stimulus space being explored. This strategy is based on initially random, abstract stimuli evolving probabilistically in response to neural activity, so as to concentrate sampling in relevant regions of shape space. Although our ultimate goal was to demonstrate a relationship of V4 tuning to natural objects, the use of natural objects themselves as stimuli could have biased the results in that direction. Instead, we sought to show that even abstract stimulus responses are related to the constraints imposed by natural object coding.
For each stimulus lineage, a first generation of random stimuli was created by probabilistic placement of control points defining concatenated Bezier splines, with constraints against collision and looping (Figure 1A and Supplemental Movie). These spline-based contours could form either closed figures or partial boundaries that extended beyond the RF (Figure 1B). Subsequent stimulus generations included partially morphed descendants of previous stimuli in addition to newly generated random stimuli (Figure S1). The probability of a given stimulus producing descendants was related to the neural response it evoked (see Supplemental Experimental Procedures). The partial morphing procedure provides a way of breaking up effective stimuli into components and recombining them with other elements to discover which shape parameters are critical for evoking responses.
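The response-dependent selection of parent stimuli can be illustrated with a minimal sketch. The proportional weighting, the perturbation used as a stand-in for partial morphing, and all names and parameter values below are assumptions for illustration; the actual procedure is specified in the Supplemental Experimental Procedures.

```python
import random


def select_parents(responses, n_parents, rng):
    """Pick parent stimulus indices with probability proportional to response.

    Proportional weighting is an assumption; the text states only that the
    probability of producing descendants was related to the evoked response.
    """
    floor = 1e-9  # keeps the weights valid if every response is zero
    weights = [max(r, 0.0) + floor for r in responses]
    return rng.choices(range(len(responses)), weights=weights, k=n_parents)


def morph(control_points, rng, fraction=0.3, jitter=0.1):
    """Perturb a random subset of spline control points.

    A hypothetical stand-in for partial morphing: some control points move,
    the rest are inherited unchanged from the parent stimulus.
    """
    morphed = []
    for x, y in control_points:
        if rng.random() < fraction:
            morphed.append((x + rng.gauss(0.0, jitter), y + rng.gauss(0.0, jitter)))
        else:
            morphed.append((x, y))
    return morphed


rng = random.Random(0)
responses = [0.0, 2.0, 10.0, 40.0]  # hypothetical response rates (spikes/s)
parents = select_parents(responses, 1000, rng)
counts = [parents.count(i) for i in range(len(responses))]
child = morph([(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)], rng)
```

Under this weighting, stimuli that evoke stronger responses are chosen as parents more often, which concentrates sampling in the more effective regions of shape space while still allowing weaker stimuli to contribute occasionally.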
For the example cell in Figure 1, two independent lineages, starting from different first generations (Fig. 1B), ran in parallel. After seven generations, high response stimuli in both lineages converged toward the same configuration: a convex projection next to a concave indentation, both oriented toward the left/upper left (Figure 1C). Other neurons converged on different contour configurations (Figure S2A). Convergence of multiple lineages is not essential to the analysis of general response bias presented here, so our sample includes 65 neurons studied with only one lineage.
Previous studies have established that V4 neurons are sensitive to both orientation and curvature (the derivative of orientation) [9–14]. Here, we characterized each V4 neuron's response bias in the curvature/orientation domain with a spike-weighted matrix (Figure 2A). To construct this matrix, we determined for each stimulus which bins were occupied by at least one point along its contour, and we summed the stimulus's response rate into those bins. After repeating this for all stimuli, we normalized each sampled bin by the number of samples, interpolated a surface across these bins, and smoothed with a Gaussian function. We then used a smoothed threshold function to reveal the regions consistently associated with high responses (see Supplemental Experimental Procedures for further details). Fitted tuning models, with position dimensions and nonlinear terms, are much more accurate for predicting responses [12, 13, 15], but the spike-weighted matrix is more inclusive and avoids the assumptions and instabilities of model-fitting, and is thus better suited for our purpose here of measuring overall population bias. For the example cell, high responses were associated with convex curvature with an orientation range centered near 160° and concave curvature with an orientation range centered near 180° (Figure 2A). This pattern corresponds to the convex/concave configuration visible in the highest response stimuli (Figure 1C).
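The binning and normalization steps of the spike-weighted matrix can be sketched as follows. The bin counts, the curvature range of -1 to 1, and the function names are assumptions for illustration, and the interpolation, Gaussian smoothing, and thresholding steps are omitted.

```python
def bin_index(value, low, high, n_bins):
    """Map a value in [low, high] to a bin index, clamping at the edges."""
    frac = (value - low) / (high - low)
    return max(0, min(int(frac * n_bins), n_bins - 1))


def spike_weighted_matrix(stimuli, n_curv_bins=4, n_ori_bins=4):
    """Average response rate per curvature/orientation bin.

    stimuli: list of (contour_points, response_rate), where contour_points
    is a list of (curvature, orientation_deg) samples along the contour.
    Curvature is assumed to be scaled to [-1, 1].
    Unsampled bins are returned as None.
    """
    sums = [[0.0] * n_ori_bins for _ in range(n_curv_bins)]
    counts = [[0] * n_ori_bins for _ in range(n_curv_bins)]
    for points, rate in stimuli:
        # A bin counts once per stimulus, however many points fall in it.
        occupied = set()
        for curv, ori in points:
            ci = bin_index(curv, -1.0, 1.0, n_curv_bins)
            oi = bin_index(ori % 360.0, 0.0, 360.0, n_ori_bins)
            occupied.add((ci, oi))
        for ci, oi in occupied:
            sums[ci][oi] += rate
            counts[ci][oi] += 1
    return [
        [sums[c][o] / counts[c][o] if counts[c][o] else None
         for o in range(n_ori_bins)]
        for c in range(n_curv_bins)
    ]


# Two hypothetical stimuli: one with an acute convex region, one nearly flat.
stimuli = [
    ([(0.9, 170.0), (0.8, 175.0), (0.0, 10.0)], 30.0),
    ([(0.0, 20.0)], 2.0),
]
matrix = spike_weighted_matrix(stimuli)
```

In this toy example the acute-curvature bin is credited only with the high response, while the flat bin, which is occupied by both stimuli, averages the high and low responses.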
To estimate V4 response bias at the population level, we averaged spike-weighted matrices across the 165 neurons in our sample. The resulting population matrix (Figure 2B) is clearly anisotropic in the curvature dimension (Kolmogorov-Smirnov, p < 0.05), with a strong bias toward acute convex and concave curvature. The correlation between response energy and curvature magnitude was highly significant (r = 0.98, p < 0.001). Similar results were obtained with a range of threshold functions.
The bias toward acute curvature was robust across independent lineages. For 100 neurons studied with two lineages, one lineage each was used to construct the population matrix shown in Fig. 2C. The other lineages for each neuron were used to construct Fig. 2D. These two matrices, based on independent datasets, are both strongly biased toward acute curvature. Random sampling of lineages from these 100 neurons was used to construct a 95% confidence interval for the curvature tuning function (Fig. 2E). The histogram of correlations between lineages for these neurons in the curvature/orientation domain (Fig. 2F, blue bars) had a median value of 0.37. In many cases, lower correlation was due to partial rotation invariance of V4 responses, which meant that different lineages could evolve stimuli with similar shape but different orientations. Thus, correlations in the curvature domain alone (Fig. 2F, red bars) were higher, with a median value of 0.65. The acute curvature bias was also consistent when analysis was restricted to neurons with stronger correlation between lineages and higher maximum responses (Figure S2C).
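One way to build such a confidence interval from random sampling of lineages is sketched below. The resampling scheme (one randomly chosen lineage per neuron on each draw, followed by a simple percentile interval) and all names are assumptions; the text does not specify the exact procedure.

```python
import random


def lineage_resampling_ci(lineage_pairs, n_resamples, rng, level=0.95):
    """Percentile interval for a population mean under lineage resampling.

    lineage_pairs: one (value_a, value_b) pair per neuron, e.g. the
    spike-weighted response at a single curvature value estimated
    independently from each of the neuron's two lineages.
    """
    means = []
    for _ in range(n_resamples):
        # Choose one of the two lineages at random for every neuron.
        picks = [pair[rng.randrange(2)] for pair in lineage_pairs]
        means.append(sum(picks) / len(picks))
    means.sort()
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * (n_resamples - 1))]
    upper = means[int(round((1.0 - tail) * (n_resamples - 1)))]
    return lower, upper


rng = random.Random(1)
pairs = [(0.2, 0.4), (0.5, 0.3), (0.9, 0.7)]  # hypothetical values, 3 neurons
lower, upper = lineage_resampling_ci(pairs, 1000, rng)
```

Repeating this at each curvature value would give a confidence band around the curvature tuning function.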
The acute curvature bias was not inherent to the evolutionary sampling method. We performed simulated experiments on 100 hypothetical neurons emitting random response rates. The average maximum stimulus curvature in generation six was 1.3% lower than in generation one. The average mean curvature was 0.04% higher. These negligible differences could not have produced the strong tuning biases observed here.
We hypothesized that the V4 acute curvature bias reflects constraints associated with object coding. We therefore attempted to explain the bias by simulating V4 population coding of natural objects. We found that object discrimination training alone did not produce the observed distribution. However, simulations with an additional sparse coding constraint were biased toward acute curvature in the same way as V4 neural responses.
Each simulation comprised 100 model neurons with V4-like Gaussian tuning for contour curvature, orientation, and object-relative position. We have previously shown that this type of model is successful in explaining neural response variance across large, diverse stimulus sets and supports reconstruction of complex stimuli from neural responses [12, 13, 15]. Here, we constructed a population of these previously validated models and allowed only their mean tuning positions (in the curvature, orientation, and position dimensions) to vary during training. Figure 3A shows three example tuning profiles in red, green, and blue. The curvature/orientation domain is analogous to that in Figure 2. Relative position is specified as angle with respect to object center, with 0° representing contours on the right side of the object, 90° representing contours at the top, etc. For tractability, tuning for retinotopic position (i.e., RF position) was ignored. Instead, each model neuron effectively stood for a subpopulation of V4 neurons, with different RF locations tiling retinotopic space, but with equivalent tuning for curvature, orientation, and object-relative position.
A set of 10966 natural object photographs from the HEMERA Photo-Objects Database was used for training and testing. Each object boundary was fit with a continuous contour (Figure S3) and numerically described as a set of sample points specified by curvature, orientation, and angular relative position (Supplemental Experimental Procedures). Each model neuron responded according to how close any of these contour fragments fell to its curvature/orientation/position tuning peak. For example, the table stimulus (Figure 3B) would evoke a strong response from all three example neurons, based on its correspondingly colored contour fragments. Other objects would evoke different response patterns depending on their constituent contour fragments.
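The response rule described above can be sketched as a Gaussian tuning function evaluated at the contour fragment closest to the tuning peak. Taking the maximum over fragments, the separable Gaussian form, and the tuning widths used here are assumptions for illustration, not the fitted model parameters.

```python
import math


def circ_diff(a, b):
    """Smallest signed difference between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def model_response(fragments, peak, sigma):
    """Response of one model neuron to one object.

    fragments: list of (curvature, orientation_deg, position_deg) samples.
    peak, sigma: tuning peak and width in the same three dimensions.
    The response is set by the fragment that falls closest to the peak.
    """
    best = 0.0
    for curv, ori, pos in fragments:
        d2 = (
            ((curv - peak[0]) / sigma[0]) ** 2
            + (circ_diff(ori, peak[1]) / sigma[1]) ** 2
            + (circ_diff(pos, peak[2]) / sigma[2]) ** 2
        )
        best = max(best, math.exp(-0.5 * d2))
    return best


peak = (0.8, 10.0, 0.0)     # hypothetical: acute convexity, right side of object
sigma = (0.2, 20.0, 30.0)   # hypothetical tuning widths
at_peak = model_response([(0.8, 10.0, 0.0), (0.0, 90.0, 90.0)], peak, sigma)
wrapped = model_response([(0.8, 350.0, 0.0)], peak, sigma)  # 20 deg from peak
far = model_response([(-0.8, 190.0, 180.0)], peak, sigma)
```

Orientation and relative position are circular, so differences are wrapped before the Gaussian is evaluated; a fragment at 350° is treated as 20° from a peak at 10°.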
The training objective was to minimize estimated pairwise object discrimination error by an ideal observer. Pairwise error was defined as the complementary Gaussian error function (erfc) of inter-object distance in the 100-dimensional neural response space. Error was minimized over tuning parameters by constrained optimization in MATLAB. For the sake of tractability, discrimination error was estimated for a subsample of 100,000 randomly selected object pairs. A new subsample was selected every 300 iterations, over the course of 7500 iterations in total (see Supplemental Experimental Procedures for further details). We ran a total of five simulations, starting from different sets of randomly generated tuning parameters.
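The pairwise error estimate can be sketched as follows. The scaling of distance inside erfc assumes equal, isotropic Gaussian response noise with a hypothetical standard deviation `noise_sd`; the text states only that error was an erfc function of inter-object distance, so the scaling and the function names are assumptions.

```python
import math
import random


def pairwise_error(r_a, r_b, noise_sd=1.0):
    """Ideal-observer error for discriminating two response vectors.

    With equal isotropic Gaussian noise, the error for two means separated
    by distance d is 0.5 * erfc(d / (2 * sqrt(2) * noise_sd)).
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(r_a, r_b)))
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0) * noise_sd))


def estimated_error(responses, n_pairs, rng):
    """Mean pairwise error over a random subsample of object pairs."""
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(responses)), 2)
        total += pairwise_error(responses[i], responses[j])
    return total / n_pairs


rng = random.Random(0)
responses = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # 3 objects, 2 toy neurons
err_same = pairwise_error(responses[0], responses[0])
err_near = pairwise_error(responses[0], responses[1])  # distance 5
err_far = pairwise_error(responses[0], responses[2])   # distance 10
err_mean = estimated_error(responses, 200, rng)
```

Error is at chance (0.5) for identical response vectors and falls toward zero as objects move apart in response space, so minimizing the mean pushes object representations away from one another.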
For the resulting 500 model neurons, identification accuracy (based on maximum likelihood) for five sets of 100 objects not used in training averaged 100%. The curvature/orientation tuning peaks for these 500 model neurons are shown in Figure 4A (circles). Tuning peaks are broadly distributed across the orientation dimension, but clustered in the curvature dimension near flat and shallow curvatures. This pattern is the opposite of the V4 neural response distribution, which is replotted for comparison in Fig. 4 (colored surfaces). This suggests that the V4 neural response distribution is not optimized for object discrimination alone.
Simulated tuning distributions became similar to the V4 neural response distribution when sparseness was added as an additional constraint. In these simulations, the error function to be minimized was a weighted sum of discrimination error and a standard index of response density [4, 6], measured across the model neuron population for each object and then averaged across objects:
where x_{i,j} is the response of the jth model unit to the ith object. RD is inversely related to sparseness and has a maximum value of 1. Minimizing RD, as defined here, increased population sparseness, i.e., sparseness of the expected distribution of neural responses to a given image. The simulated tuning distributions in Figures 4B–D are based on increasing RD weights in the error function. With no sparseness constraint (Figure 4A), average RD across images was 0.80; with the strongest weighting (Figure 4D), average RD fell to 0.11. As RD fell and sparseness increased, simulated tuning progressively concentrated toward acute convex and concave curvature, matching the observed V4 response distribution. The strongest correlation with the observed V4 response distribution was obtained with an RD value of 0.11 (r = 0.55). Identification accuracy remained high at all three sparseness levels (100%, 100%, and 97%, respectively).
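A commonly used response density index with the properties described above (maximum of 1, inversely related to sparseness) is the activity ratio, (Σ_j x_{i,j} / n)² / (Σ_j x_{i,j}² / n), computed across the n model units for each object and then averaged across objects. The sketch below assumes this form; whether it matches the expression used in these simulations term for term is an assumption.

```python
def response_density(x):
    """Activity ratio across the population for one object.

    Equals 1.0 when all units respond equally and 1/n when a single
    unit carries the entire response.
    """
    n = len(x)
    mean = sum(x) / n
    mean_sq = sum(v * v for v in x) / n
    return (mean * mean) / mean_sq if mean_sq > 0 else 0.0


def mean_response_density(responses):
    """Average response density over objects (one row per object)."""
    return sum(response_density(row) for row in responses) / len(responses)


dense = response_density([1.0, 1.0, 1.0, 1.0])    # every unit active
sparse = response_density([1.0, 0.0, 0.0, 0.0])   # one unit active
rd = mean_response_density([[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]])
```

With four units, a uniform response gives a density of 1 and a single active unit gives 0.25, so lower values correspond to sparser population responses.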
These modeling results indicate that the observed curvature bias in area V4 would produce much sparser population responses to objects while retaining sufficient information for object discrimination. The observed V4 distribution is very different from the distribution expected on the basis of object discrimination alone. The V4 distribution is strongly biased toward acute curvature, while the expected distribution is strongly biased toward flat contours. This difference has a strong effect on sparseness of population responses to objects. The acute curvature bias, as found in V4, produces an average RD value in the range of 0.1, while the flat contour bias in the expected distribution produces an average RD value in the range of 0.8. Thus, the V4 coding scheme appears markedly sparse relative to what would be expected if object discriminability were the only constraint on intermediate visual processing.
Our results provide the first description of a sparse coding scheme in area V4, a major intermediate stage in ventral pathway visual cortex. Sparse coding is considered an important goal of sensory transformation because it increases representational capacity and reduces metabolic energy requirements. It is reasonable to speculate that the V4 coding scheme evolved in response to these adaptive advantages, though sparseness is not the only constraint that might produce an acute curvature bias. At the modeling level, we found that the acute curvature bias was not produced by minimizing various rearrangements of the terms in the RD expression (Figure S4A), but there must be many mathematical constraints that would produce a similar bias. On the evolutionary level, there could be many other advantages to selective representation of acute curvature, perhaps relating to higher ecological relevance for object parts with acute curvature. Thus, there may be some other constraint that drove the acute curvature bias. However, regardless of how and why it evolved, the curvature bias seems likely to produce sparser object responses in area V4, and sparseness has strong implications for computational efficiency, metabolic efficiency, and memory storage.
This conclusion derives from the assumption that our modeling results are appropriate for interpreting the observed acute curvature bias in area V4. The models were closely based on previously validated models of intermediate visual neurons [12, 13, 15], but mechanisms of intermediate and higher-level vision remain controversial, and no current model can be regarded as definitive. Area V4 neurons might operate in ways not captured by our models that affect sparseness. Moreover, we cannot make any claim, based on these analyses, about the absolute level of sparseness in V4 responses. We are only claiming that, given the tuning bias toward acute curvature, V4 responses are likely to be sparser than they would be without such a tuning bias. This seems logical, apart from any specific modeling results, given the low frequency of curved contours in relation to flat contours (Figure S4B). Neurons tuned for less common image elements are bound to respond less frequently. V4 neurons show a strong tuning bias for acute curvature. Given the relatively low frequency of acute curvature, these neurons are bound to respond more sparsely than neurons without such a bias.
In early visual cortex, sparse coding is achieved by exploiting local statistical regularities in natural images to reduce redundancy of neural signals [1, 2]. Gabor-like RF structures in V1 reduce redundancies due to local spatial frequency correlations in natural images. Nonlinear interactions with the non-classical surround exploit image correlations that extend beyond the classical RF. Coding in early visual cortex is constrained to minimize information loss, since the rest of the brain gets most of its detailed visual information directly or indirectly from V1. Redundancy reduction based on local statistical regularities can be achieved without substantial loss of information.
Sparsification in mid-level cortex might require other mechanisms due to the different constraints of intermediate shape processing. Neurons in mid-level visual cortex integrate information across larger RFs and non-classical surrounds. Statistical correlations are bound to be lower on this larger scale, because physical relatedness between object parts is statistically weaker across greater distances. Thus, redundancy reduction may not be an option for sparsification at this scale.
However, mid-level cortex is also more specialized, with less need for complete preservation of image information, and more scope for emphasizing information required for specific aspects of visual perception. V4, in particular, is part of the ventral pathway [18, 19], which emphasizes shape, color, and texture in the service of object perception. Given this specialization, further sparsification could be achieved by biasing representation toward image features with high object information content but lower probability of occurrence. In this way, a given object could be represented in terms of a small number of uncommon but diagnostic elements.
This alternate kind of sparse coding strategy appears to be implemented in V4 by emphasizing the representation of acute contour curvature, which is appropriately uncommon and diagnostic. Acute curvature was approximately an order of magnitude less common than flat or shallow curvature in our natural object set (Figure S4B). This reflects the fact that, on the scale of visual perception, natural objects have mostly smooth rather than highly intricate boundaries. Thus, sparse coding simulations based on primarily acute curvature tuning (Figure 4C and D) had low response densities (0.22 and 0.11, respectively). In contrast, non-sparse simulations based on tuning for more common flat/shallow contour regions had high response densities (Figure 4A, 0.80). (See also Figure S4C.) At the same time, regions of acute curvature are still highly informative about object identity. In our simulations, accuracy remained high for the sparsest condition (RD = 0.11, accuracy = 97%) even when the remaining low-curvature model neurons (−0.4 < c′ < 0.4) were removed (accuracy = 85%). Curved contour regions are also perceptually salient [20–22] and more perceptually informative than flat contours [1, 23].
Bias toward representation of uncommon features with specialized information content could be a general strategy for sparse coding in higher-level cortex. Some evidence suggests that object coding is sparse at the final stages of the ventral pathway in IT (inferotemporal) cortex and medial temporal lobe structures like the hippocampus [7, 8, 24]. Sparseness at these higher levels could be achieved by selectivity for more complex features [15, 25, 26] with even higher information content. Bias toward tuning for acute curvature, which has been demonstrated in IT, might also enhance response sparseness at this level. Alternatively, IT cortex might be optimized for discrimination at the expense of sparseness. The combined simulation/adaptive search strategy used here might help to elucidate coding strategies in higher-level visual cortex as well as in other sensory modalities.
We thank Zhihong Wang, William Nash, and William Quinlan for technical assistance. This work was funded by the National Eye Institute.