Most neurons in visual cortex respond to contrast borders and are orientation selective, and some are also selective for which side of a border is figure and which side is ground (‘border ownership coding’). These neurons are influenced by the image context far beyond the classical receptive field (CRF), and as early as 25 ms after the onset of activity in the cortex. The nature of the fast context integration mechanism is not well understood. What parts of a figure contribute to the context effect? What is the structure of the ‘extra-classical surround’? Is the context information propagated through horizontal fibers within cortex, or through reciprocal connections via higher-level areas? To address these questions we studied border ownership modulation with fragmented figures. Neurons were recorded in areas V1 and V2 of Macaca mulatta under behaviorally induced fixation. Test figures were fragmented rectangles. While one edge was centered on the CRF, the presence of the fragments outside the CRF was varied. The surround fragments produced facilitation on the preferred border ownership side as well as suppression on the non-preferred side, with about 80% of the locations contributing on average. Fragments far from the CRF influenced the responses even in the absence of fragments closer to the CRF, and without the extra delay that would be incurred by propagation through horizontal fibers. Three principally different models are discussed. The results support a model in which the antagonistic surround influences are produced by reentrant signals from a higher-level area.
Most neurons in the monkey visual cortex (V1, V2) are orientation and edge selective (Hubel and Wiesel, 1968). When tested with a large uniform figure, these neurons respond selectively to the borders of the figure (Friedman et al., 2003). We have recently found that many neurons are also selective for the location of the test figure relative to the receptive field even when the displays are made locally identical (Fig. 1A) (Zhou et al., 2000). About 50% of the neurons in area V2 show side-of-figure selectivity. We called this ‘border ownership coding’, suggesting that the differential activity of neurons with opposite side preference is the code for assigning borders to objects. Supporting this interpretation, side-of-figure selectivity was often combined with selectivity for cues that define depth order of surfaces, such as dynamic occlusion (von der Heydt et al., 2003) and stereoscopic depth (Qiu and von der Heydt, 2005), and the responses of the neurons were consistent with perceptual organization in situations of transparent overlay (Qiu and von der Heydt, 2007). Border-ownership selective activity was also demonstrated in the human visual cortex (von der Heydt et al., 2005; Fang et al., 2009).
The selectivity for side-of-figure is interesting because it shows that the neurons are influenced by the image context, but its mechanism still remains unclear. Are we dealing with Gestalt mechanisms that endow the visual cortex with the ability to identify objects at this early stage of processing? Or does the context influence merely reflect simple operations like end-stopping or surround inhibition that make a neuron selective for convexity of contour? The neurons might sense the way a contour continues outside the CRF. In the test of Fig. 1, knowing the orientation of the corners next to the CRF would be sufficient to tell where the square is located. Such a simple mechanism could certainly contribute to border ownership assignment, but would not have the wisdom for true figure-ground segregation, nor could it enable visual search and selective attention, as proposed by the Gestalt theory (Koffka, 1935).
The aim of the present experiments was to clarify the mechanism of side-of-figure selectivity. We devised a method of fragmented figures to analyze the spatial and temporal characteristics of the surround influence. Capitalizing on the power of factorial design, this method allowed us to measure the contributions of the elementary features composing a figure, and their interactions. The results reveal a sophisticated and specific mechanism. Most parts of the contour of a figure were found to contribute surround modulation: enhancement on one side of the CRF, and suppression on the other. The timing of the influences from spatially dispersed features indicates that the differential modulation involves rapid back projections from a more central cortical area.
We studied neurons in two male adult macaques (Macaca mulatta). The details of our general methods have been described (Qiu and von der Heydt, 2005; Zhou et al., 2000). The animals were prepared by implanting, under general anesthesia, three small posts for head fixation and two recording chambers (one over each hemisphere). The anesthesia was produced by subcutaneous injection of ketamine HCl followed by intravenous infusion of pentobarbital. Buprenorphine was used for postoperative analgesia. Fixation training was achieved by controlling fluid intake and using small amounts of juice or water to reward fixation. All animal procedures conformed to National Institutes of Health and USDA guidelines as verified by the Animal Care and Use Committee of the Johns Hopkins University.
Stimuli were generated with Open Inventor on a Pentium 4 Linux workstation with an NVIDIA GeForce 6800 graphics card using the anti-aliasing feature of the software, and were presented on a 21-inch EIZO FlexScan T965 color monitor with 1600×1200 resolution at 72 Hz refresh rate. Stereoscopic pairs were presented side-by-side and superimposed optically at 40 cm viewing distance. The field of view subtended 17H by 26V deg visual angle. A white (93 cd/m2) cross inside a 20 arcmin diameter disc of 9 cd/m2 served as fixation mark. A neutral gray background of 28 cd/m2 luminance was used, except in the border ownership tests to be described next.
Border ownership (BOS) selectivity was tested with solid figures and ‘Cornsweet figures’ of rectangular shapes (Fig. 1). The contours of the Cornsweet figures consisted of step-edges with exponential decays on either side, so that the color/luminance variation was confined to a narrow band along the edge and the interior of the figures had the same color or gray level as the background. Specifically, a Cornsweet figure was constructed as follows: Let B and F be the color vectors of background and figure colors, respectively, in RGB color space. Then, the profile of an edge at x = x0 was defined by
where x is the spatial dimension orthogonal to the edge, exp is the exponential function, and sgn is the sign function. The space constant λ was set to 1/24th of the figure size in the x direction.
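A profile consistent with the definitions in this paragraph (a step at x0 that decays exponentially to the background color on both sides) can be sketched as follows. The symbol c(x) for the color vector at position x, the amplitude factor of 1/2, and the choice of which side carries the positive excursion are assumptions made for illustration; only B, F, x0, λ, exp and sgn are named in the text.

```latex
\mathbf{c}(x) \;=\; \mathbf{B} \;+\; \tfrac{1}{2}\,(\mathbf{F}-\mathbf{B})\,
\operatorname{sgn}(x - x_0)\,
\exp\!\left(-\frac{|x - x_0|}{\lambda}\right)
```

With this scaling the total step across the edge equals F − B, as for a solid figure, while c(x) returns to B at distances much larger than λ on either side.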
Fragmented figures (see examples in Fig. 2) were created from a Cornsweet figure by occluding parts of it with rectangles of the background color. The size of these ‘invisible occluders’ was half the length of the figure edge. To avoid creating sharp terminations the occluders were made transparent near their borders (12% of the occluder width), where opacity dropped according to a cosine-squared function. As a result, the Cornsweet edges blended smoothly with the background. Eight occluders were used, centered on the edges and corners of the figure.
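The opacity ramp at the occluder borders can be illustrated with a short sketch. It assumes that the 12% ramp applies separately at each border and that position is normalized to the occluder width; the function name `occluder_opacity` and its parameters are hypothetical.

```python
import math


def occluder_opacity(u, ramp=0.12):
    """Opacity at normalized position u (0..1) across an occluder.

    Assumption: within `ramp` (12% of the occluder width) of either border,
    opacity follows a cosine-squared profile from 0 at the border to 1 at the
    inner end of the ramp; elsewhere the occluder is fully opaque.
    """
    d = min(u, 1.0 - u)  # distance to the nearest border
    if d <= 0.0:
        return 0.0
    if d >= ramp:
        return 1.0
    phase = 0.5 * math.pi * (1.0 - d / ramp)  # pi/2 at the border, 0 inside
    return math.cos(phase) ** 2
```

Halfway through the ramp the opacity is 0.5, so the visible edge fades out without a sharp termination.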
In the BOS tests, two colors were used for figure and ground, the cell’s preferred color (or white, if no clear color selectivity was found) and a complementary color (or a neutral gray of 28 cd/m2 luminance). In half of the presentations the first color was used for the figure and the other for the background. In the other half, the two colors were flipped. In tests with Cornsweet figures, a color intermediate between the two colors was used for the background throughout the trial. In solid figure tests, this color was used as the background between trials and in the inter-stimulus intervals within trials.
Direction of gaze was monitored for one eye with an infrared video-based system (Iscan ETL-200) at 60Hz with a spatial resolution of 5120 (H) and 2560 (V). The eyes were imaged through a hot mirror, optically placing the camera on the axis of fixation. The optical magnification in our system resulted in a resolution of the corneal position signal of 0.08 deg visual angle in the horizontal and 0.16 deg in the vertical. Noise and drifts of the signal of course reduced the accuracy. Behavioral trials began with the presentation of the fixation mark on a blank screen. A test sequence was initiated when gaze was in a predetermined fixation window (1 deg radius) and the first stimulus appeared 300 ms after fixation was detected. The animal was rewarded for keeping its gaze in the fixation window for a fixed duration of 2.3 or 3.3 s, depending on the experiment. After successful termination of a trial the display was blanked for an interval of 0.5 – 1.2 s. When fixation was broken, the trial was terminated and the following blank interval was increased by 1 s. The recordings showed that fixation was generally more accurate than required by the size of the fixation window. The standard deviations of the fixational eye movements in horizontal and vertical direction, calculated from all fixation periods of the main experiment, were (0.18, 0.24) deg. This includes drifts and other signal errors produced by the recording system.
Single-neuron activity was recorded extracellularly with epoxy-insulated tungsten microelectrodes inserted through the dura mater. A spike detection system (Alpha Omega MSD 3.22) was used. Spike times, stimulus events, and behavioral events were digitized and recorded by computer. The spike times were corrected for the spike detection delay. The stimulus events refer to the time when the vertical scan of the display monitor reached the average position of the receptive fields.
Cells in area V2 were recorded either in the lunate sulcus after passing through V1 and the white matter, or in the lip of the post-lunate gyrus. The assignment of cells to areas is based on location of tracks, depth of recording, and physiological criteria (topography and size of receptive fields).
After isolating a cell we first characterized its selectivity for color, bar size, and orientation, and mapped its CRF. The color tuning was determined with stationary flashing bars, and the minimum response field was hand mapped with bars or drifting gratings. Orientation and disparity tunings were determined with moving bars. Stimuli were generally presented with zero disparity (i.e., in the fixation plane), except when a cell was found to be disparity selective, in which case the optimum disparity was used in the following tests. For most cells in our sample, CRFs were also mapped automatically by presenting light and dark bars randomly at 81 positions on a 9×9 grid. Bar presentation was for 100 ms with blank intervals of 100 ms.
BOS selectivity was determined by a standard test using edges of solid squares (Fig. 1A). This test was performed with both polarities of edge contrast and with two sizes of squares, 3 and 8 deg (Qiu and von der Heydt, 2005). The same tests were performed with Cornsweet squares (Fig. 1B).
After the initial measurements of BOS selectivity, cells were tested with fragmented figures. Each cell was tested with a square, between 3 and 8 deg on a side, and, if time permitted, with squares of other sizes and rectangular figures. In each test, all conditions (including the two contrast polarities and no-center-edge controls) were presented in pseudo-random order in which each condition was presented once before moving on to the next repetition. The stimuli for studying BOS selectivity were presented for 300 ms with blank intervals of 200 ms, which allowed us to test 5–6 conditions per fixation period.
Spikes were counted during the intervals from 100 to 350 ms relative to stimulus onset for the response, and from −50 to +30 ms for the baseline (which was used only in plotting the no-center-edge controls).
The data from the standard BOS tests were analyzed by performing a 3-factor ANOVA on square-root transformed spike counts. The factors were side of figure, local edge contrast, and figure size. The square-root transform was used to equalize the variances.
The fragmented figure data were analyzed with a 2nd order regression model as described in Results. This analysis was performed on direct spike counts and bootstrapping was used to determine confidence limits of the estimated parameters (no square root transform was used because the bootstrapping estimate does not depend on assumptions about the distribution of the spike counts). The bootstrapping was performed by randomly re-sampling responses within each stimulus condition.
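The within-condition resampling can be sketched as below. This is a minimal illustration, not the authors' code: the percentile method for the confidence limits, the number of resamples, and all names (`bootstrap_ci`, `counts_by_condition`, `statistic`) are assumptions.

```python
import random
from statistics import mean


def bootstrap_ci(counts_by_condition, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence limits for `statistic`.

    Spike counts are resampled with replacement separately within each
    stimulus condition, so only trial-to-trial variability of the same
    neuron enters the estimate (within-neuron design).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = {
            cond: [rng.choice(counts) for _ in counts]
            for cond, counts in counts_by_condition.items()
        }
        estimates.append(statistic(resampled))
    estimates.sort()
    k = round(alpha / 2 * n_boot)  # number of estimates cut from each tail
    return estimates[k], estimates[n_boot - 1 - k]


def fragment_effect(data):
    """Example statistic: mean count with a fragment minus mean count without."""
    return mean(data["with"]) - mean(data["without"])
```

In the actual analysis the statistic would be a coefficient of the regression model, refit on each resampled data set.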
This method estimates the confidence limits based on the variation of the responses between repeated presentations of the same stimulus (within-neuron design). This is the method of choice here because we ask whether an observed effect could be explained by the random variation of responses in the given neuron (or set of neurons): for example, whether a surround stimulus affects the CRF responses or not, and to what extent the measured variation of effects between surround stimuli is due to their different effectiveness rather than random response variation. In this design, the variation between neurons is not considered an error variance. Thus, the present analysis does not answer the question of whether the results would be reproduced if the experiments were repeated in a different set of neurons. To address this question, we have included supplementary figures showing the population means with confidence limits calculated by re-sampling between neurons (Figs. S1 – S2). For this calculation, multiple tests in the same neuron were pooled by averaging the effects of corresponding fragments.
Population results were obtained by computing weighted averages of the regression coefficients. If a neuron was tested with more than one figure size or shape, a separate analysis was performed on each test, and the results from each test were entered separately in the average. Each neuron (or test) was weighted with the inverse of the standard deviation of residuals obtained from the regression analysis.
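The weighting scheme amounts to the following computation, shown as a minimal sketch; the function name `weighted_mean` and the input layout (one coefficient and one residual standard deviation per test) are assumptions.

```python
def weighted_mean(coefficients, residual_sds):
    """Average regression coefficients across tests.

    Each test is weighted by the inverse of the standard deviation of its
    regression residuals, so noisier tests contribute less.
    """
    weights = [1.0 / sd for sd in residual_sds]
    total = sum(w * c for w, c in zip(weights, coefficients))
    return total / sum(weights)
```

For example, two tests with coefficients 2.0 and 4.0 and residual standard deviations 1.0 and 3.0 receive weights 1 and 1/3, giving a weighted mean of 2.5 rather than the unweighted 3.0.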
For the time course plots we computed peri-stimulus time histograms (1 ms bin width) and the difference between the histograms for preferred and non-preferred sides, for each neuron or test. We then obtained weighted averages of these histograms (or differences of histograms) and applied smoothing (LOWESS, Systat 11; tension=0.04 or 0.06, depending on number of cells averaged). The weighting was the same as described above for the regression coefficients. The time points of half-maximal signal were estimated by computing least-square fits to the average histograms in the range from −50ms to 250ms with the function
where t is time, S(t) is the difference histogram (the BOS signal), a is the estimate of the signal amplitude, t0 the estimate of the signal onset, and t1/2 the estimate of the point of half-maximal amplitude. The exponent n was set to 5 (the value of n had little influence on the estimates of t1/2; a value of 5 produced a slightly better fit, as measured by r2, than smaller values). Responses (mean of the two BOS conditions) were fit with the function
with n=2. The exponential factor with the additional parameter τ was added to better fit the peak and decay of the firing rate. For BOS signals this factor was not included because these signals did not decay as much within the range of the fit.
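One functional form consistent with the parameters defined above (amplitude a, onset t0, half-maximum time t1/2, exponent n, and decay constant τ) is a sigmoidal rise of the Naka-Rushton type. It is given here as an assumption for illustration, not as the authors' exact expression.

```latex
% BOS signal (n = 5): rises from 0 at t_0, reaches a/2 at t_{1/2}
S(t) =
\begin{cases}
a\,\dfrac{(t - t_0)^n}{(t - t_0)^n + (t_{1/2} - t_0)^n}, & t > t_0 \\[2ex]
0, & t \le t_0
\end{cases}

% Response (n = 2): same rise multiplied by an exponential decay
R(t) =
\begin{cases}
a\,\dfrac{(t - t_0)^n}{(t - t_0)^n + (t_{1/2} - t_0)^n}\,
\exp\!\left(-\dfrac{t - t_0}{\tau}\right), & t > t_0 \\[2ex]
0, & t \le t_0
\end{cases}
```

In the first form S(t1/2) = a/2 exactly. In the second, the decay factor shifts the true half-maximum point slightly, so t1/2 is best read as a parameter of the rising component.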
The goal of our experiments was to study the spatial structure and temporal dynamics of the modulatory surround influence in border ownership (BOS) selective neurons. We have shown in previous studies that displays of rectangular figures produce strong BOS modulation in many neurons of area V2. Building on these results, we devised tests in which similar figures were broken up into fragments that could be presented in isolation and in combinations. This method allowed us to measure the influences of the individual fragments and their interactions.
The solid figures that we have mainly used in the past cannot be decomposed into fragments without introducing conspicuous new edges that would likely influence the responses themselves, whereas outline figures, which are often used in fragmented figure tests by psychologists, tend to be less effective in producing side-of-figure selective responses (Fangtu Qiu and Rudiger von der Heydt, unpublished results). We have therefore designed displays with ‘Cornsweet edges’ in which luminance (or color) varies across the contour in the form of a step function, as in regular solid figures, but transitions smoothly (exponentially) to the background color on both sides. Examples of such figures are shown in Figs. 1 and 2. We used odd-symmetrical edges with a positive luminance/color excursion on one side and an equally large negative excursion on the other side. (In the case of color variation, positive means a deviation from the background color, a neutral gray, towards the neuron’s preferred color in 3-dimensional linear color space, and negative means a deviation in the opposite direction.) We generally used luminance variation of neutral gray if a cell was not color selective, and combined luminance-color variation in color selective cells. It was important that the transition to the background color had the same shape on both sides of the edges because this made the assignment of the local edge ambiguous. Asymmetric edges with a sharp transition on one side and a gradient on the other tend to bias perception towards assignment to the side of the gradient (von der Heydt and Pierson, 2006; Palmer and Ghose, 2008).
Figures composed of Cornsweet edges (in the following called ‘Cornsweet figures’) appear either brighter or darker than the background, depending on the polarity of the edges (Fig. 2A, top). They also appear tinted with color if the variation has a chromatic component. As in our previous studies with solid figures we always tested both signs of contrast polarity and compared the responses to the right-hand edge of a bright figure with the responses to the left-hand edge of a dark figure, that is, two displays that were identical within and around the CRF (see examples in Fig. 1B). The tests also included the corresponding conditions in which the colors were flipped, producing the opposite sign of local contrast in the CRF (Fig. 1 bottom).
For each neuron we generated the basic pair of Cornsweet figures based on the cell’s preferred orientation and color, and placed each figure so that one edge was centered on the CRF, as shown in Fig. 2A (for a cell with vertical orientation preference). In the following we refer to this edge as the ‘center edge’. From each of the basic figures a number of fragmented figures were generated by variably eliminating edges and corners. The edge fragments tapered off to the background along the edge with a profile of a half cycle of cosine-squared so as to avoid introducing sharp terminations. Each figure was decomposed into eight fragments, four edges and four corners. According to their location relative to the CRF, we refer to these as ‘Near corners’, ‘Near edges’, ‘Far corners’, and ‘Far edges’ (Fig. 3).
In the main test, the center edge was present in each stimulus presentation, and the presence of the other seven fragments was varied factorially, generating 2^7 = 128 combinations for each figure. Thus, the main test consisted of 128 fragmented versions of each of the basic figures (128 versions of the bright figure on the left and 128 of the dark figure on the right) plus the corresponding displays with reversed local contrast, for a total of 512 displays.
In addition, displays without the center edge were tested for control (Fig. 2B). This set consisted of displays of all seven surround fragments on either side (Fig. 2B top) and displays of each of the 14 surround fragments in isolation (Fig. 2B below), again with both contrast polarities, for a total of 32 displays.
In a complete test run, the 512 displays of the main test and 2 sets of the 32 control displays were presented in random order at a rate of two per second (six presentations per fixation).
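The display counts follow directly from the design and can be checked by enumeration. The sketch below uses hypothetical fragment labels and a hypothetical tuple layout (figure side, contrast polarity, center edge present, surround pattern); only the counts themselves come from the text.

```python
from itertools import product

# Seven surround fragments of one figure (labels are illustrative only).
SURROUND = ["near_corner_a", "near_corner_b", "near_edge_a", "near_edge_b",
            "far_corner_a", "far_corner_b", "far_edge"]
SIDES = ["left", "right"]   # figure location relative to the CRF
POLARITIES = [-1, 1]        # local contrast polarity


def main_test_displays():
    """Center edge present; all 2^7 surround patterns per side and polarity."""
    return [(side, pol, True, pattern)
            for side, pol in product(SIDES, POLARITIES)
            for pattern in product([0, 1], repeat=len(SURROUND))]


def control_displays():
    """Center edge absent; all seven fragments together, or one in isolation."""
    displays = []
    for side, pol in product(SIDES, POLARITIES):
        displays.append((side, pol, False, (1,) * len(SURROUND)))
        for k in range(len(SURROUND)):
            single = tuple(int(j == k) for j in range(len(SURROUND)))
            displays.append((side, pol, False, single))
    return displays


def presentations_per_run():
    """One pass through the main test plus two sets of controls."""
    return len(main_test_displays()) + 2 * len(control_displays())
```

Enumeration gives 512 main-test displays and 32 controls, hence 576 presentations per complete run, or 288 s of stimulation at two presentations per second.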
We analyzed the data of each neuron with two regression models, a simple linear model for the control set,
and a 2nd order model for the main test,
where R represents the responses, defined as the spike count between 100 and 350 ms after stimulus onset, R_0 is the baseline spike count of the neuron (the mean across all intervals from 50 ms before to 30 ms after stimulus onset), R_e is the mean center edge response, C is the contrast polarity variable (C = −1 or 1), and F_i (i = 1..14) are the fragment variables (F_i = 0 if absent, F_i = 1 if present). The first sum in Equation 2 represents the linear effects of the surround fragments, and the second sum their interactions with contrast polarity. The last term represents the interactions between fragments. Note that only the products for pairs of fragments on the same side are non-zero, because fragments were present only on one side at a time. The value of R when all F_i = 0 is the estimated response to the center edge alone (which depends only on the contrast polarity). Thus, d_i, e_i and f_ij estimate how much the presence of surround fragments enhanced or reduced the center edge responses. If a neuron was tested with different sizes or shapes of figures, separate regressions were performed on each set of data.
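Forms of the two models that are consistent with this description can be written as follows. The coefficient names a_i, b_i and c, and the exact composition of Equation 1, are assumptions; the symbols R, R_0, R_e, C, F_i, d_i, e_i and f_ij are those defined in the text.

```latex
% Equation 1 (control set, no center edge): linear model, assumed form
R = R_0 + \sum_{i=1}^{14} a_i F_i + C \sum_{i=1}^{14} b_i F_i

% Equation 2 (main test, center edge present): 2nd order model
R = R_e + c\,C
    + \sum_{i=1}^{14} d_i F_i
    + C \sum_{i=1}^{14} e_i F_i
    + \sum_{i<j} f_{ij} F_i F_j
```

With all F_i = 0, Equation 2 reduces to R = R_e + cC, the response to the center edge alone, which depends only on contrast polarity.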
To better understand the design, consider that for each contrast polarity a total of 15 fragments were tested, the center edge and 14 surround fragments. Each fragment is associated with a binary parameter (0=absent, 1=present). Thus, the number of all possible fragment combinations would be 2^15. However, only a subset of these combinations was tested. There were two restrictions: (1) only fragments of one figure at a time were presented; thus only seven surround fragments and the center edge were tested for either figure location; (2) complete factorial sets of surround fragment combinations were tested only in the presence of the center edge; in the absence of the center edge (the control condition), a reduced set was tested: the single fragments and the combinations of seven surround fragments on either side. These restrictions enabled us to use a factorial design to measure the effects of the surround fragments in modulating the CRF response, and also the interactions between the fragments in modulating this response. Without restrictions this would not have been feasible.
The 1st order effects of the surround fragments were estimated from the control data set (Equation 1). In this set, we tested each single fragment on either side in the absence of the center edge. According to the concept of a CRF that is surrounded by regions that are purely modulatory, these stimuli should produce no responses at all. We will test this assumption with the model of Equation 1.
The influences of the surround fragments in the presence of the center edge were estimated by the model of Equation 2. Under the assumption that the 1st order effects are zero, the main effects in this model are estimates of 2nd order effects: the modulatory influences of the surround fragments on the center edge response. The model of Equation 2 also estimates the interactions between fragments in modulating the center edge response. In the CRF-surround concept, these are 3rd order effects.
Neurons were recorded from areas V1 and V2. The receptive field eccentricities ranged between 0.7 and 6.5 (median 1.77) deg visual angle. Only orientation selective cells were studied. After determining the preferred orientation with a suitable bar stimulus and mapping the CRF, the standard BOS test with solid squares was performed (Fig. 1A, for two square sizes, 3 deg and 8 deg). Cells that did not show a significant side-of-figure effect (p<0.05) in this test were usually not studied further.
We first wanted to see if Cornsweet figures were effective in producing BOS signals. Solid figures differ in color from the background over large regions, whereas in Cornsweet figures, the color differences are confined to a narrow seam along the contours. We compared BOS modulation by Cornsweet figures and by solid figures in each neuron. A scatter plot of the main effects of side for the two kinds of figures (from separate ANOVAs) shows that the effects were highly correlated (Fig. 4, Pearson r = 0.85, N=171). Cornsweet figures were 86% as effective as solid figures (as measured by the slope of the line that minimizes the squared perpendicular deviations). This result shows that BOS modulation depends more on the contours than on regions of different color or luminance.
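The slope of the line that minimizes squared perpendicular deviations has a closed form (major-axis, or orthogonal, regression). The sketch below assumes the line passes through the centroid of the data; the function name and arguments are hypothetical.

```python
import math


def perpendicular_fit_slope(x, y):
    """Slope of the line minimizing the sum of squared perpendicular distances.

    Uses the closed-form major-axis solution; requires a non-zero
    covariance between x and y.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
```

Unlike ordinary least squares, this fit treats both variables symmetrically, which is appropriate when both axes are noisy measurements, as with the side effects for solid and Cornsweet figures.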
Fragmented figure tests were completed in 99 neurons from two animals (31 from M23 and 68 from M24), in some cases with multiple sizes and shapes, for a total of 141 tests. Neurons were included in the following analysis if the effect of side-of-figure was significant at p<0.01 (2-factor ANOVA, factors: side-of-figure, local contrast). To be sure that the surround fragments were outside the CRF we also checked that, in the control condition without center edge, the ‘near corners’ did not produce any responses. One neuron (2 tests) was excluded for this reason. A total of 100 tests in 66 neurons were included (20 from M23 and 46 from M24; 43 neurons were tested with one figure size, 23 with multiple figure sizes and shapes). In 80% of the tests, each condition was tested two or more times; in the others, at least one quarter of the conditions were repeated. Eight of the neurons were recorded in area V1 (15 tests) and 57 in V2 (84 tests). One neuron could not be assigned to an area with certainty. Our sample consists mainly of V2 neurons, because BOS selectivity is much less common in V1 than in V2 (Zhou et al., 2000).
The modulatory effects of the surround in two example neurons, each tested with squares of two sizes, 3 and 8 deg, are shown in Fig. 5. We plotted the linear estimates in the form of maps that show the locations of the figure fragments relative to the CRF, and the strengths of the effects coded by color (red for facilitatory, blue for suppressive). The estimates for the two contrast polarities were averaged. As can be seen from Equation 2, these linear estimates quantify how much, in spikes per second, a fragment increased or decreased the responses relative to the center edge response. The insets show the maps of the CRF obtained with bar stimuli; the gray values represent response strength.
The maps for the control condition (no center edge) are shown at the bottom of the figure. Here, the colors represent the mean responses to each single fragment relative to the activity before stimulus onset.
Inspecting plots of the fragment effects for all cells we made the following observations: (1) Figure fragments on both the preferred and the non-preferred sides modulated the responses. (2) On each side, several fragments seemed to have an effect; those on the preferred side tended to have positive effects, those on the non-preferred side tended to have negative effects. However, many of the effects of individual fragments were not statistically significant. (3) The control condition responses did not seem to be correlated with the effects in the main test. The control responses tended to be similar on the two sides, whereas the modulatory effects tended to go in opposite directions. Specifically on the non-preferred side, control responses could be positive, while the modulatory effects were negative. (4) Most cells showed positive and negative modulatory effects, but some showed only negative effects. Otherwise, the results were quite similar across cells.
The population means of the linear estimates of fragment effects are shown in Fig. 6A. Here, we have averaged the effects of symmetrically located fragments, plotting only one mean for near corners, near edges, and far corners, on each side (no difference is expected between the population means for fragments in symmetrical locations, because the receptive field orientation varied from neuron to neuron, and, in the average, only the location of features relative to the receptive fields should matter). The two contrast polarities were also averaged. Zero on the ordinate (left scale) corresponds to the neurons’ responses to the center edge alone (19.5 spikes·s−1). The bars labeled ‘All’ are estimates of the responses to the complete figures on preferred and non-preferred sides. These were calculated from the linear estimates and the interactions (Equation 2). The responses in the control condition (no center edge) are plotted at the bottom of the graph (scale on the right). Here, the bars represent the mean responses to each single fragment and to figures in which only the center edge was missing (‘All’), and zero corresponds to the activity before stimulus onset. The error bars represent the 95% confidence limits determined by bootstrapping, as explained in Methods.
The figure shows that, in the mean, the effects on the preferred side were all positive and of almost equal size for all types of fragments. The mean effects on the non-preferred side were negative and stronger than those on the preferred side. Here, the fragments closer to the CRF produced slightly stronger suppression than farther fragments.
The effects of the complete figures (Fig. 6A All) were much weaker than the sum of the effects of the single fragments: had the effects of the single fragments been additive, the response to the complete figure on the preferred side would have been 3.8 spikes·s−1 greater than the response to the center edge alone, but in fact the response to the complete figure was 0.5 spikes·s−1 smaller than the center edge response (an effect of −0.5 spikes·s−1). On the non-preferred side, adding up the effects of the single fragments would give a suppressive effect of −17.2 spikes·s−1, but the response to the complete figure was only 4.9 spikes·s−1 smaller than the center edge response. This non-additivity indicates interactions between the fragments in modulating the center edge response. These will be discussed below.
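The size of the non-additivity can be made explicit with a short calculation. It assumes that the complete-figure effect is the sum of the seven linear estimates and the 21 pairwise interactions on that side, so that the summed interactions equal the complete-figure effect minus the additive prediction; the notation ΔR_All is introduced here for illustration.

```latex
\sum_{i<j} f_{ij} \;=\; \Delta R_{\mathrm{All}} \;-\; \sum_{i} d_i

% Preferred side:
\sum_{i<j} f_{ij} = -0.5 - 3.8 = -4.3 \ \text{spikes}\cdot\text{s}^{-1}

% Non-preferred side:
\sum_{i<j} f_{ij} = -4.9 - (-17.2) = +12.3 \ \text{spikes}\cdot\text{s}^{-1}
```

The summed interactions are thus negative on the preferred side and positive on the non-preferred side, in each case opposing the linear effects.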
In the absence of the center edge, the surround fragments produced virtually zero responses on average (Fig. 6A bottom). The virtual absence of responses in the single fragment controls demonstrates that the surround stimuli were in fact outside the CRF. Remarkably, the combinations of the 7 surround fragments (Fig. 6A bottom, All) did produce responses on both sides. These were small, but much stronger than expected based on linear summation of the single fragment responses. Note also that the combination of 7 fragments on the non-preferred side produced excitation in the absence of the center edge, but had strong suppressive effects in the presence of the center edge (Fig. 6A top, All). Finding positive responses to the combinations of the seven surround fragments despite the absence of single-fragment responses might suggest a summation-and-threshold mechanism. However, such a mechanism would not explain the suppressive effect in the presence of the center edge response.
As the non-additivity of the main effects indicates, there were interactions between the fragments. The pairwise interactions of fragments are plotted in Fig. 6B. The interactions on the preferred side were mainly negative, whereas those on the non-preferred side were mainly positive. The result is the sub-additivity of effects seen in Fig. 6A.
We also analyzed the data from V1 and V2 separately (Fig. S3). Although our sample of BOS selective V1 neurons is small (8 neurons, 15 tests), the results indicate that those V1 cells that are BOS selective use context integration mechanisms similar to those of their counterparts in V2.
The plot of the population main effects (Fig. 6A) shows that the suppressive effects on the non-preferred side were stronger than the enhancement effects on the preferred side. We wondered if this asymmetry was due to a gain normalization mechanism of the kind proposed by Heeger (1992), in which the gain of a neuron depends on the total amount of ‘stimulus energy’ in the receptive field. This means that the observed response robs is the product of the response r that would be obtained without the gain normalization mechanism and a gain factor g:
where g depends on the stimulus energy, g = g(E), and decreases as E increases. In the case of V2 neurons, the energy summation region might include the surround, and the stimulus energy would then be given by the number of contour fragments in the display, g = g(n). The average response strength of the neurons in our sample did indeed decrease with the number of fragments. In Fig. 7, crosses represent $\bar{r}(n)$, the mean response across all stimuli in which the number of fragments was n. Dots connected by solid lines show the mean responses for preferred and non-preferred sides separately. Being a nonlinear operation, gain normalization would produce interactions between fragments. This raises the question of whether the interactions we found (Fig. 6B) are due in part, or even entirely, to the gain normalization mechanism.
Since $\bar{r}(1)$ is the strength of response to the center edge alone, $\bar{r}(n)/\bar{r}(1)$ is the gain reduction produced by adding n−1 surround fragments. We can formally ‘undo’ this effect by multiplying each single response $r_i$ by $\bar{r}(1)/\bar{r}(n)$, where n = n(i) is the number of fragments present for response $r_i$. The dots connected by dashed lines in Fig. 7 show the mean responses after this scaling.
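The scaling step can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function name `undo_gain` and the firing rates are hypothetical, and the fragment count n is taken to include the center edge, so n = 1 denotes the center edge alone.

```python
from collections import defaultdict

def undo_gain(responses, n_fragments):
    """Multiply each response r_i by rbar(1) / rbar(n_i), where rbar(n) is the
    mean response over all stimuli that contained n fragments."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r, n in zip(responses, n_fragments):
        sums[n] += r
        counts[n] += 1
    rbar = {n: sums[n] / counts[n] for n in sums}
    return [r * rbar[1] / rbar[n] for r, n in zip(responses, n_fragments)]

# Hypothetical firing rates (spikes/s); the values are made up for illustration.
responses   = [20.0, 22.0, 18.0, 14.0, 12.0, 9.0]
n_fragments = [1,    1,    2,    2,    8,    8]
scaled = undo_gain(responses, n_fragments)
```

After scaling, the mean response is the same for every fragment count, so any remaining differences between conditions cannot be attributed to a gain factor that depends only on n.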
The regression coefficients obtained from the scaled responses are plotted in Fig. 8. The main effects were more symmetric (Fig. 8A). The interactions also became slightly more symmetric after the scaling, but the general pattern remained the same – negative on preferred, positive on non-preferred side (Fig. 8B). Thus, gain normalization is not the cause of this characteristic pattern of interactions.
The pattern of main effects and interactions shown in Fig. 8 can be understood by assuming that the neurons have opposite modulatory inputs from contour integration mechanisms on the two sides, and that the signals at these integration stages saturate. Saturation of the modulatory signals results in incomplete summation of the surround influences from either side. In the model of Equation 2, this would produce negative interactions between the positive (enhancing) factors and positive interactions between the negative (suppressive) factors, exactly as observed.
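The sign pattern predicted by saturation can be illustrated with a toy calculation. The sketch below assumes a tanh nonlinearity and an arbitrary input weight, neither of which is taken from the data, and it defines the interaction of two fragments as the usual 2×2 contrast, which may differ by a scale factor from the coding used in Equation 2.

```python
import math

def modulatory_signal(n_active, weight=0.6):
    """Saturating integration stage: tanh of the summed fragment input.
    Both tanh and the weight are illustrative assumptions."""
    return math.tanh(weight * n_active)

def pairwise_interaction(sign):
    """2x2 interaction contrast for two fragments on one side:
    f(both) - f(first only) - f(second only) + f(neither).
    sign = +1 for the enhancing side, -1 for the suppressive side."""
    def f(n_active):
        return sign * modulatory_signal(n_active)
    return f(2) - f(1) - f(1) + f(0)

preferred = pairwise_interaction(+1)      # enhancing inputs
non_preferred = pairwise_interaction(-1)  # suppressive inputs
```

With any compressive function of this kind, `preferred` is negative and `non_preferred` is positive, mirroring the signs of the observed interactions.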
Without the assumption of gain normalization it is hard to explain how the negative interactions on the preferred side can be so strong that the surround effect in the All condition is negative despite the linear effects of the seven fragments all being positive (Fig. 6A). For example, if one assumes response saturation at the recorded neurons, this would explain the negative interactions, but saturation at this point cannot reverse the sign of the All effect. Also, a compressive nonlinearity at the output, as in response saturation, would predict the interactions on the non-preferred side to be negative too, but they are positive. The assumption of gain normalization, which is plausible in itself, reveals the intrinsic symmetry of the BOS mechanism (Fig. 8).
We conclude that one can interpret the pattern of surround effects as the combined result of gain normalization and a mechanism that produces symmetrical enhancement and suppression on the two sides.
In all our experiments we tested figures of both contrast polarities. For a dark-light edge in the CRF, we measured the influence of the fragments of dark figures on the left, and the influence of the fragments of light figures on the right; for a light-dark edge in the CRF, the contrasts of both figures were reversed correspondingly. In Figs. 4–8 we showed averages over the two contrasts. However, in most of the cells in our sample the responses depended on the contrast polarity of the edge in the CRF, and the question arises whether the influence of the surround fragments also depends on their contrast polarity. To see if that was the case, we selected the subsample of cells that were contrast polarity selective (47 neurons, 78 tests, significant effect of C in model Eq. 2, p<0.01) and calculated the surround fragment effects for preferred and non-preferred contrast polarity (in the following called “positive” and “negative” contrasts). Fragments of positive contrast produced much stronger effects than fragments of negative contrast (Fig. 9A; for the interactions see Fig. S4). Of course, the CRF responses differed also in strength (24.7 versus 15.8 spikes·s−1).
If the surround influence were multiplicative and independent of contrast polarity, then the fragment effects would be proportional to the CRF responses. To see if this was the case, we have plotted the mean surround fragment effect against the strength of the CRF responses (Fig. 9B; the solid line connects the points for the preferred figure side, the dashed line those for the non-preferred side). If the surround modulation had been independent of contrast polarity, then the effects for positive contrast could be extrapolated from the effects for negative contrast, as shown by the dotted lines in Fig. 9B. However, this was clearly not the case. While the CRF responses for positive contrast were 56% stronger than those for negative contrast, the differential effect of the surround fragments (preferred – non-preferred side) more than doubled (increase of 127%). This shows that the surround influences themselves are contrast polarity selective in these cells, matching the selectivity of the CRF.
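The comparison can be checked with the numbers given in the text. The snippet below is only a worked calculation; the variable names are ours, and the final ratio is an illustrative summary rather than a statistic reported in the study.

```python
# CRF responses (spikes/s) for preferred and non-preferred contrast polarity.
crf_positive, crf_negative = 24.7, 15.8

# Under a purely multiplicative, polarity-independent surround influence, the
# differential fragment effect should grow by the same factor as the CRF response.
crf_ratio = crf_positive / crf_negative   # about 1.56, i.e. 56% stronger

# The observed differential effect increased by 127%, i.e. by a factor of 2.27.
observed_ratio = 1 + 1.27

# Factor by which the observed increase exceeds the multiplicative prediction.
excess = observed_ratio / crf_ratio
```

The observed increase exceeds the multiplicative prediction by a factor of roughly 1.45, which is the discrepancy that points to contrast polarity selectivity of the surround influence itself.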
In models of BOS coding it is generally assumed that the surround influence is orientation specific (Kikuchi and Fukushima, 2003; Zhaoping, 2005; Baek and Sajda, 2005; Craft et al., 2007). This specificity is important for implementing some of the Gestalt rules that govern figure-ground organization in perception. We performed a control experiment in a subset of the neurons in which we compared squares with scrambled figures. The latter were obtained by rotating each of the seven surround fragments by 90 deg (Fig. 2C). The scrambled figures consisted of the same amount of contour as the regular figures. All combinations of fragments were tested, exactly as was done in the main experiment, but with scrambled and regular figure displays randomly interleaved.
A comparison of the effects of regular and scrambled figures is shown in Fig. 10. The scrambling reduced the differences between preferred and non-preferred sides. Especially on the preferred side, the scrambled figures did not have much of an effect at all, whereas regular figures produced clear enhancement. This indicates that the surround input comes from orientation tuned neurons and that the integration of these inputs is orientation specific. The scrambled figure experiment is discussed further in the section on the time course below.
In principle, there are three different ways in which information can spread laterally in the visual system (Angelucci et al., 2002): by convergence of forward connections; through intracortical connections (‘horizontal fibers’); and by divergence of backward projections. The spreading of forward connections is believed to account for the CRFs, which are small and thus provide only little context. The non-classical surround modulation is thought to be mediated by horizontal fibers and by feedback from higher-level areas.
Accordingly, models of BOS have used three principles of context integration: feed forward mechanisms (e.g., Sakai and Nishimura, 2006), signal propagation via horizontal fibers (e.g., Zhaoping, 2005; Baek and Sajda, 2005), and feedback loops including higher-level areas (e.g., Craft et al., 2007; Jehee et al., 2007). These three principles are illustrated in Fig. 11A–C.
It has been suggested that the surround influence in BOS neurons is similar to the ‘non-classical surrounds’ of receptive fields in primary visual cortex that can be demonstrated by applying a grating in an annular region around the CRF and measuring its effect on the response evoked by the center stimulus. Such gratings generally have a suppressive effect. Using patches of gratings, Walker et al. (1999) found that the surrounds are usually non-uniform. Often, a grating patch that covered only a small fraction of the annulus had the same effect as the whole annulus. Walker et al. concluded that surrounds often consist of a single localized sensitive region.
Based on this and other similar studies of V1, Sakai & Nishimura (2006) proposed that the BOS modulation in neurons of monkey V2 observed by Zhou et al. (2000) could be fully explained by assuming that each neuron possesses two localized sensitive regions in the surround, a suppressive one on the non-preferred side and a facilitatory one on the preferred side (Fig. 11A). They also showed that the observed variation between neurons in BOS selectivity for different stimulus geometries could be explained by assuming that the location and size of these regions vary between neurons, and were able to simulate this variation by randomly picking the location and size of the regions.
The assumption that the modulatory surrounds consist of only two ‘hot spots’ seems to contradict the rather uniform distribution of surround influences on either side in the population means (Fig. 6). But this uniform appearance could of course be the result of averaging the surrounds of many neurons, each of which might have a non-uniform structure. The results in the example neurons (cf. Fig. 5) also suggest that BOS modulation originates from multiple locations on either side of the CRF. However, in the data of the individual neurons only a few of these effects were significant. Thus, it is not clear whether all the locations contribute small amounts that add up to the total BOS signal, or whether most of the surround is insensitive and contributes nothing, as was found for the non-classical surrounds in V1. The population average is not conclusive, and the data from single neurons are too noisy to allow a conclusion about the exact spatial distribution of sensitivity in the surround.
To resolve this dilemma we devised the following method. We first ranked the fragment (main-) effects of each neuron, separately for preferred and non-preferred sides, and then averaged them for each rank. If the majority of neurons had a ‘hot spot’ in their surround, this method would show a high value for the first rank and much smaller values for the other ranks. And for surrounds like those described by Walker et al. in cat V1 which mainly consist of one local suppressive region, our rank-and-average method should turn out a strong negative value for the last rank on the non-preferred side (or two negative values, if two neighboring fragments stimulated the suppressive region), and values close to zero for the other ranks. In fact, the interpretation of the results is not quite so straightforward because, due to random variation of the estimated effects, the ranking tends to exaggerate the differences across ranks. Even in the case of a perfectly uniform sensitivity distribution, the value for the first rank would be higher than the value for the last rank just by sorting the random variations of effects in each neuron.
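The rank-and-average procedure and the sorting bias it introduces can be sketched as follows. This is a minimal illustration with a hypothetical population: every fragment has the same true effect, and Gaussian noise with an arbitrary standard deviation stands in for the estimation error of the regression coefficients.

```python
import random

def rank_and_average(effects_per_neuron):
    """Sort each neuron's fragment effects in descending order (rank 1 = largest),
    then average across neurons rank by rank."""
    ranked = [sorted(effects, reverse=True) for effects in effects_per_neuron]
    n_neurons = len(ranked)
    return [sum(column) / n_neurons for column in zip(*ranked)]

rng = random.Random(0)  # fixed seed so the example is reproducible

# Hypothetical population: 100 neurons x 7 fragments, identical true effect (1.0)
# at every location plus Gaussian estimation noise (SD 1.0, an assumption).
uniform = [[1.0 + rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(100)]
means_by_rank = rank_and_average(uniform)
```

Although the underlying sensitivity is perfectly uniform, `means_by_rank` decreases from rank 1 to rank 7, because sorting within each neuron orders the noise. This is why the ranked data have to be compared with model predictions that went through the same procedure.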
To derive model predictions for comparison with our results, we therefore generated from each set of coefficients of a real neuron a number of synthetic sets according to the model hypotheses, and then performed the same rank-and-average procedure on the synthetic sets of coefficients. Each synthetic coefficient was the sum of a mean and a noise term. The mean was assigned according to the hypothesis to be tested: For the null hypothesis, the means of all effects were set to zero. For the Sakai-Nishimura hypothesis, we took the regression coefficients of a neuron and assigned the sum of the main effects on either side to the location of the largest effect (in absolute terms) on that side, and zero to the other locations. The noise terms were generated from the data of each neuron by randomly re-sampling the responses within stimulus conditions, and fitting the model of Equation 2. The differences between the coefficients of each fit and their bootstrapped means were added as noise terms to the hypothetical means defined above.
To obtain confidence limits for the model predictions we thus generated 2,000 synthetic sets of coefficients, each with 100 members (the number of neuronal data sets). As was done with the real neurons, we ranked the coefficients of each synthetic set, and averaged them by rank across sets. This produced probability distributions of the effects from which the means and 95% ranges were determined.
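The construction of the model predictions can be sketched in the same way. The code below is a simplified stand-in, not the procedure used for Fig. 12: Gaussian noise replaces the bootstrap residuals obtained by re-fitting Equation 2, only three hypothetical neurons and 200 synthetic populations are used instead of 100 data sets and 2,000 sets, and all function names are ours.

```python
import random

def hot_spot_means(effects):
    """Hot-spot hypothesis: assign the summed effect of one side to the location
    of the largest effect in absolute terms, and zero to the other locations."""
    peak = max(range(len(effects)), key=lambda i: abs(effects[i]))
    return [sum(effects) if i == peak else 0.0 for i in range(len(effects))]

def rank_and_average(sets):
    """Sort each set in descending order and average across sets by rank."""
    ranked = [sorted(s, reverse=True) for s in sets]
    return [sum(column) / len(ranked) for column in zip(*ranked)]

def prediction_band(mean_sets, noise_sd, n_sim=200, seed=0):
    """For each rank, return (mean, lower, upper) of the rank-averaged effects over
    n_sim synthetic populations; lower and upper bound the central 95% range."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        synthetic = [[m + rng.gauss(0.0, noise_sd) for m in means]
                     for means in mean_sets]
        sims.append(rank_and_average(synthetic))
    band = []
    for rank_values in zip(*sims):
        v = sorted(rank_values)
        lower = v[round(0.025 * (n_sim - 1))]
        upper = v[round(0.975 * (n_sim - 1))]
        band.append((sum(v) / n_sim, lower, upper))
    return band

# Hypothetical main effects of three neurons (preferred side, 7 fragments).
neurons = [[0.5, 1.2, 0.3, 0.8, 0.1, 0.6, 0.4],
           [0.9, 0.2, 0.7, 0.3, 0.5, 0.1, 0.6],
           [0.4, 0.6, 1.0, 0.2, 0.3, 0.7, 0.5]]
null_band = prediction_band([[0.0] * 7 for _ in neurons], noise_sd=0.5)
hot_band = prediction_band([hot_spot_means(e) for e in neurons], noise_sd=0.5)
```

Under the hot-spot hypothesis the predicted value at rank 1 is far larger than under the null hypothesis, while the remaining ranks stay near zero; it is this signature that the ranked data are tested against.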
The means of the ranked effects characterize the variation of effects across the surround regions in an average neuron (Fig. 12A, crosses). Blue bands show the 95% confidence limits under the null hypothesis. The slopes of these bands show the spurious variation produced by random variation of responses. The prediction from the null hypothesis also simulates the small difference between the grand means of the two sides that is introduced by assigning a preferred side in each neuron (assignment bias). The difference between the data means and the means of the null prediction is plotted in Fig. 12B. This shows the true variation of fragment influences across the surround. Four of the seven fragments on the preferred side had facilitating effects, while two had suppressive effects, on average. On the non-preferred side, all seven fragments had suppressive effects.
As pointed out, the asymmetry between the sides might be due to a separate gain control stage. When computing the variation of surround effects after removing the gain control effect by scaling the responses, as described, we find facilitation from 5/7 fragments on the preferred side, and suppression from 6/7 fragments on the non-preferred side. With or without scaling, 11 of the 14 fragments (79%) contributed positively to the BOS signals on average.
To compare our experimental data with the predictions from Sakai and Nishimura’s model, we first had to address an obvious discrepancy. The model allows only for positive effects on the preferred side and/or negative effects on the non-preferred side, whereas in our data many neurons had negative effects on both sides. To preserve the general idea of the model, we therefore performed the regressions (Equations 1 and 2) after scaling the responses (see Discounting gain normalization above). As a result, the maximum effect on the preferred side was now positive, and the minimum effect on the non-preferred side negative, in nearly all neurons, as in Sakai and Nishimura’s model. In essence, what we assume here is that the actual neuronal responses involve an additional gain normalization mechanism that was not included in that model.
The mean ranked effects are plotted in Fig. 12 together with the predictions from the model (red shaded bands). For comparison we also calculated the results that would be obtained had all fragments on either side influenced the responses equally (blue shaded bands). Most of the data points lie outside the red band, indicating significant deviation from the model predictions. Specifically, the model predicts exceedingly large values at rank 1 on the preferred side and rank 7 on the non-preferred side which are not apparent in the data. Instead, the data points show a smooth progression similar to that obtained under the even distribution assumption (blue), but somewhat steeper. The sensitivity distributions on both sides are clearly much more uniform than postulated by Sakai and Nishimura’s model. Note that the even distribution assumption is illustrated only as a baseline for comparison. The feedback schemes do not necessarily predict an even distribution of surround influences. As illustrated in Fig. 11D, the grouping cell model (Craft et al., 2007) rather predicts a gradual decrease with distance.
Walker et al. also showed (in cat V1) that, when two locations of test gratings contributed surround modulation, then these locations were usually adjacent. This was further support for their conclusion that surround modulation often originates from a single compact region. To see if our data are compatible with such a model in which the surround patches are large enough so that each patch would generally be stimulated by two fragments, we calculated the combined effects of pairs of adjacent fragments and performed another rank-and-average test, this time assigning the sum of all the measured effects on either side to the pair that had the strongest combined effect. The result (Fig. S5) was similar to that shown in Fig. 12, again at variance with the patchy surround model.
In conclusion, the analysis of the population data shows that the surround structure we have seen in the example neurons is typical. BOS selective neurons receive modulatory influences from large portions of their receptive field surrounds, and the receptive field axis divides the surrounds into watersheds of suppressive and facilitatory influences that are fairly symmetric. The apparent patchiness seen in the modulation maps of single neurons is partly due to the random variation of responses. The modulation maps obtained with different figure sizes, as shown in Fig. 5, also indicated that the surround influence is not due to isolated patches, but essentially emerges wherever the surround is stimulated by a contour.
The time course of the surround influence is essential for understanding the underlying mechanism. Specifically the latency of the surround effects poses an important constraint on models. Sakai & Nishimura (2006) assumed that context is integrated by feed-forward mechanisms, but stopped short of explaining how this could be implemented in the visual cortex. This question is not trivial, because V1 and V2 are retinotopically organized areas and, consequently, the ‘region of identical stimulation’ in our BOS test (Fig. 1A) is mapped onto corresponding cortical regions surrounding the neuron under study. The interior of these regions is devoid of relevant context information, and this information must therefore be transmitted to the neuron over some distance. For the figures in our study the cortical distances that need to be bridged are considerable. Neurophysiologically realistic models of BOS have proposed either signal propagation via horizontal fibers within V2 (e.g., Zhaoping, 2005; Baek and Sajda, 2005), or feedback loops including higher-level areas (e.g., Craft et al., 2007; Jehee et al., 2007).
The cortical distances to be bridged can exceed the maximum length of horizontal fibers, which is about 5mm in V2 (Levitt et al., 1994; Stepniewska and Kaas, 1996). In foveal cortex, this corresponds to no more than a degree of visual angle (Gattass et al., 1981; Dow et al., 1985). Over larger distances, signals must be relayed, which requires that each neuron in a chain of relays fires spikes. This means that these neurons must also be activated directly from their CRFs by the visual stimulus (if a neuron were activated by a distant feature alone, this would imply a very large CRF, for which there is no evidence). Thus, the context information must be propagated along the figure contours, or through neurons that are activated by the surface of the figure. Here, the use of Cornsweet figures is decisive, because the interior of these figures is not only devoid of contours, but completely unchanged when the figures are turned on and off. Thus, the presentation of a Cornsweet figure does not directly activate any neurons that have CRFs inside the figure. Therefore, intracortical connection models must assume that signals propagate over cascades of neurons along the contour representation, as shown in Fig. 11B (Zhaoping, 2005).
For the larger figures in our study, the path lengths of signal propagation would be on the order of 10–20mm (see Craft et al., 2007, for details). Because of the limited conduction velocity of intracortical fibers, signal transmission over such distances would introduce significant delays. Assuming a conduction velocity of 0.2m·s−1 (Grinvald et al., 1994; Bringuier et al., 1999), the delays would be on the order of 50–100ms. These estimates are somewhat uncertain, because data on horizontal fiber conduction velocity are scarce (we are not aware of any for V2). Nevertheless, they show that it should be possible to distinguish between different integration schemes on the basis of latencies: In the case of intracortical signal propagation, effects of fragments that are far from the CRF should arrive with longer latencies than effects of fragments close to the CRF. For mechanisms using back projections from a higher level the situation is different, because here the length of the pathways does not increase linearly with the lateral distance in cortex, and because the projections between areas consist of myelinated fibers that are about ten times faster than intracortical fibers (Girard et al., 2001).
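The delay estimate follows directly from path length and conduction velocity. The short calculation below reproduces it; the function name is ours, and applying the tenfold faster velocity to the same path lengths is purely illustrative, since the lengths of the white matter pathways are not given here.

```python
def conduction_delay_ms(path_mm, velocity_m_per_s):
    """Conduction delay in milliseconds.
    1 m/s equals 1 mm/ms, so millimeters divided by m/s yields milliseconds."""
    return path_mm / velocity_m_per_s

# Horizontal fibers: 10-20 mm at 0.2 m/s gives roughly 50-100 ms.
horizontal = [conduction_delay_ms(d, 0.2) for d in (10, 20)]

# Myelinated inter-areal fibers, about ten times faster (2 m/s). Using the same
# path lengths is an assumption made only to show the effect of the velocity.
myelinated = [conduction_delay_ms(d, 2.0) for d in (10, 20)]
```

At the faster velocity the same distances would cost only about 5 to 10 ms, which is why latency differences between near and far fragments can discriminate between the two schemes.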
The fragmented-figure method allowed us to calculate the time course for the effect of each single fragment. To address the question of conduction delays, we calculated the contributions to the BOS signal from the closest fragments (Near corners), and from the farthest fragments (Far corners and Far edge). In each case, the average response difference (preferred – non-preferred side) was calculated across the subset of tests in which only these fragments were present. The results are relatively ‘noisy’ because they are based on small fractions of the data (1/32 and 1/16, respectively).
Two findings are remarkable (Fig. 13). First, the fragments far from the CRF produced a strong BOS signal in the absence of contours connecting to the CRF (dotted line). As explained above, this is not possible in the lateral propagation scheme. Second, the onset of this signal does not show the expected delay caused by signal propagation, but rather the opposite: it arrives earlier than the signal produced by the near corners (dashed line). The signal from the far fragments reached half amplitude at 91 ms (95% CI 81 – 100) and the signal from the near corners at 107 ms (95% CI 90 – 123). (We did not include the Near edges in the comparison, because they are at an intermediate distance; inclusion of the Near edge data in one or the other group did not change the ordering of latencies.) Thus, it is highly unlikely that BOS signals in V2 are generated entirely by interactions within V2 (as suggested by Zhaoping, 2005, and Fig. 11B). At least a major portion of the signal must be generated by mechanisms that use feedback loops through white matter, otherwise the nearly simultaneous arrival of the contributions from near and far locations is hard to explain. Note that we do not argue against the possibility that horizontal fiber connections in V2 contribute. Indeed, the paradoxical observation that the ‘near’ contribution arrives slightly later than the ‘far’ contribution could be explained by assuming that the former involves at least in part horizontal fiber connections while the latter does not. Thus, it is possible that both kinds of mechanisms are combined.
To study the effect of figure size on the time course of context integration we compared the BOS signals in the neurons that were tested with large and small figures. The plots in Fig. 14 show the mean BOS signals for squares of 3–4 deg size (solid thick line) and for squares of 6–8 deg size (dashed thick line). The signal for small figures emerged earlier than the signal for large figures (92 vs. 113 ms). Thus, the assignment of BOS for larger figures takes longer.
Although an increase of the latency of BOS signals with figure size is predicted by the lateral propagation model, the measured difference of 21 ms is shorter than the expected 50–100 ms. An increase of latency with figure size would also be expected in the grouping cell model (Fig. 11C–D, Craft et al., 2007), because the length of the feedback loop is likely to increase somewhat when a figure is made larger. Larger figures would often straddle the vertical meridian, with the result that parts of the figure are represented in the hemisphere opposite the recorded neuron, requiring an extra path length through the corpus callosum. Likewise, if a figure straddles the horizontal meridian, it would be represented in widely separated parts of extrastriate cortex, which would also probably lead to an increase in length of white matter connections (Jeffs et al., 2009), compared to smaller figures which are more likely to fit entirely within the representation of one quadrant. These conjectures can be tested by analyzing the influence of the retinal location of contours on the latency of the BOS signal (N.R. Zhang, P.J. O’Herron and R. von der Heydt, work in progress).
It may seem odd that large figures lead to longer latencies than small figures, while far fragments produce shorter latencies than near fragments (Fig. 13). However, these two findings are not in contradiction. In the subsample of neurons that were tested with two sizes, the BOS signal for large figures showed the same paradoxical delay of Near influences relative to Far influences as seen in Fig. 13, whereas for small figures, the signals emerged simultaneously (Fig. S6). In other words, increasing the figure size delayed the influence of the Near corners more than the influence of the Far fragments. This supports our conclusion that the influence of the Near corners might involve horizontal fibers, whereas the integration of more distant features occurs via the fast feedback scheme.
The time course of the BOS signals for scrambled and regular figures is shown in Fig. 15. The data are from the subset of cells tested with scrambled figures. The curves show the averages over all combinations of fragments in each condition. It can be seen that the scrambled figures generated a reduced BOS signal. Interestingly, the signal for scrambled figures emerged virtually at the same time as the signal for regular figures (99 vs. 97 ms at half amplitude), but the two signals diverged over time: in the interval between 200 and 350ms, scrambling reduced the signal to half its value (3.1 vs. 6.3 spikes·s−1).
It may seem surprising that scrambled figures generate BOS signals at all. However, the scrambled figures also have contour fragments located almost exclusively either on the preferred or the non-preferred side. Thus, the displays are strongly asymmetric for scrambled as well as for regular figures. In comparing the signals for the two conditions, note also that the results shown in Figs. 10 and 15 are averages across neurons, and that not all neurons may be most BOS selective for square shapes. A neuron that is more selective for elongated shapes, for example, would be sensitive to contours parallel to the RF at the locations where our squares have contours orthogonal to the RF. In such a neuron, the Near Edges of our squares would have a stronger influence in the scrambled condition than in the regular condition. Thus, averaging across neurons might have reduced the effect of scrambling in Figs. 10 and 15. We do not have sufficient data from tests with different shapes to show whether the shape selectivity of the context integration mechanism varies between neurons.
The latencies of BOS signals and their confidence limits under the various conditions are summarized in Fig. 16. The figure also includes the mean BOS signal (the difference between the mean of all responses to preferred-side displays minus the mean of all responses to non-preferred-side displays), which reached half amplitude at 101 ms (CI 99, 103).
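How a half-amplitude latency can be read off a BOS signal is sketched below. The criterion used here (the first crossing of half the maximum, with linear interpolation between bins) is an assumption made for illustration; the exact estimation procedure and smoothing are not described in this section, and the response values are hypothetical.

```python
def half_amplitude_latency(times_ms, signal):
    """Time of the first upward crossing of half the maximum of the signal,
    linearly interpolated between samples. Returns None if no crossing is found
    (for example, when the first sample is already above half amplitude)."""
    half = 0.5 * max(signal)
    for i in range(1, len(signal)):
        before, after = signal[i - 1], signal[i]
        if before < half <= after:
            frac = (half - before) / (after - before)
            return times_ms[i - 1] + frac * (times_ms[i] - times_ms[i - 1])
    return None

# Hypothetical mean responses (spikes/s) in 20-ms steps after stimulus onset.
times         = [40, 60, 80, 100, 120, 140]
preferred     = [2.0, 10.0, 18.0, 22.0, 24.0, 24.0]
non_preferred = [2.0, 10.0, 16.0, 16.0, 16.0, 16.0]

# BOS signal: preferred-side response minus non-preferred-side response.
bos_signal = [p - n for p, n in zip(preferred, non_preferred)]
latency = half_amplitude_latency(times, bos_signal)
```

In this toy example the two response curves are identical at first and diverge later, so the BOS signal has a longer latency than the responses themselves, as is the case in the recorded data.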
The latency estimates in Fig. 16 are based on the pooled responses to the various fragmented figures. For complete figures, the latencies are slightly shorter. From the standard test, we obtained 90 ms (CI 86, 95) for Cornsweet figures, and virtually the same for solid figures (88 ms; CI 84, 93). These latencies are still markedly longer than the 68 ms found previously for BOS signals in V2 (Zhou et al., 2000). This discrepancy might be due to a difference in stimulus presentation. Zhou et al. presented only one figure per fixation period, whereas in the present study, we presented five or six different conditions within a fixation period. O'Herron and von der Heydt (2009) have recently shown that BOS signals depend on the stimulus history. As a result, latencies are longer when a figure is flipped from one side to the other than when a figure is presented on a blank screen at the beginning of the fixation period. Thus, the frequent reversals of figure side in the present experiments may have led to the longer latencies.
We used fragmented figures to study the context integration in border ownership selective neurons. The method allowed us to measure the influence of contour segments in various locations outside the CRF. The most important results of our study can be summarized as follows: (1) The context integration fields of BOS neurons consist of two antagonistic regions on either side of the CRF. Either region integrates contour signals over a wide range, with about 80% of locations contributing positively to the BOS signal on average. (2) The contributions of different parts of a figure are largely independent; one piece of contour can influence the responses in the absence of the rest of the contour. (3) The influences of different parts arrive at the cell about simultaneously; the influence of distant parts is not delayed relative to that of close parts. Apparently there is no extra cost for long distance connection.
The fragmented figure method also allowed us to measure the interactions between surround fragments in modulating the CRF response. The interactions were generally opposite in sign to the main effects: negative on the preferred side and positive on the non-preferred side. Thus, when multiple fragments are present, their effects are sub-additive. This is remarkable, because models often assume that a whole figure is more than the sum of its parts, and that closure of contours is important. In other words, the effects of contour fragments should be super-additive. However, psychophysical measurements show that incomplete figures can produce BOS adaptation (Sugihara et al., 2007), which parallels our results. We did find super-additivity in our control condition, in which the center edge was absent (Fig. 6, bottom). This is similar to illusory contour responses which also showed super-additivity for surround stimuli in the absence of a CRF stimulus (von der Heydt et al., 1984).
It is not clear how the present results relate to the surround influences in V1 neurons as demonstrated with grating stimuli. In BOS neurons, we find suppressive as well as facilitatory effects, while the grating studies found surrounds to be mostly suppressive (Walker et al., 1999; Cavanaugh et al., 2002; Levitt and Lund, 2002; Bair et al., 2003; Webb et al., 2005; Shen et al., 2007) (but see Jones et al., 2002). In BOS neurons, the surround influences extend much farther than the suppressive surrounds in V1 neurons. The distance of the ‘far edge’ in our tests was typically about ten times the radius of the CRF (Fig. 5), whereas the V1 surrounds measured with grating stimuli are only about 2–3 times the size of the CRF (Cavanaugh et al., 2002; Levitt and Lund, 2002). However, these are not necessarily differences between V1 and V2, because the two groups of studies differ in methods and scope. Our results were obtained in awake behaving animals whereas most of the V1 studies were done in anesthetized animals. The ‘Cornsweet edges’ of our study confine stimulation to narrow bands along the figure contours, whereas gratings, as used in the V1 studies, massively stimulate entire regions. Our stimuli are likely to activate contour processing mechanisms, whereas gratings may rather stimulate channels for texture processing. Although our V1 sample is small, the results suggest that the context integration mechanisms in BOS selective V1 neurons are similar to those found in V2 (although such neurons are much less frequent in V1 (Zhou et al., 2000)).
Sakai and Nishimura (2006) suggested that BOS selectivity of V2 neurons could be explained by asymmetric location of sensitive patches in the surround (Fig. 11A), similar to those found in neurons of V1. However, the present results—the plots of the surround effects in single neurons (Fig. 5) as well as the analysis of the population data (Fig. 6, Fig. 12)—clearly show that the source of the surround influence in BOS selective neurons is not localized, but broadly distributed on both sides of the CRF.
The model by Zhaoping (2005) tries to explain BOS selectivity by the horizontal connectivity within V2 (Fig. 11B). We have examined our data in the light of two critical predictions from this model: (1) a characteristic variation of latencies due to conduction delays, and (2) the implication that effects of contour fragments far from the CRF are conditional on the presence of contour fragments connecting the far fragments to the CRF. Our results do not show the predicted delays, but rather a small difference in the opposite direction (Fig. 16). Moreover, contrary to the second prediction, we found that the presentation of Far edges and Far corners strongly modulated the responses in the absence of the intervening fragments (Fig. 13). Thus, lateral propagation alone cannot explain the present data.
The grouping cell model by Craft et al. (2007) predicts that all locations on either side of the surround should contribute (Fig. 11C). The grouping cells in this model have annular integration fields of various sizes, leading to distributed sensitivity in the two halves of the surround (Fig. 11D). The present results are consistent with this prediction, except for occasional gaps in the annular pattern of surround influences, which can be seen in the examples (Fig. 5) and, more quantitatively, in the population analysis (Fig. 12). In this model, there are no constraints on continuity of contours as there are in lateral propagation schemes. In principle, each fragment can contribute independently.
The observation that the interactions between fragments are negative on the preferred side and positive on the non-preferred side (Fig. 6B) has a simple explanation in the grouping cell model. It indicates that the grouping cells saturate. The saturation effect can be seen clearly in Fig. 7, which shows how the difference between preferred and non-preferred side responses increases with the number of fragments displayed; as more fragments are added, however, each additional fragment contributes less. The lateral propagation scheme predicts the opposite, because in that scheme neighboring contour segments are cooperative (Fig. 11B). Therefore, the interactions would be positive on the preferred side and negative on the non-preferred side.
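The link between saturation and the sign of the interactions can be illustrated with a toy calculation. The following is a minimal sketch under stated assumptions, not the model of Craft et al. (2007): the saturating function, the parameters `half_sat`, `baseline` and `gain`, and all function names are hypothetical choices made for illustration only.

```python
def saturating(drive, half_sat=1.0):
    """Hypothetical grouping-cell output: grows with drive but levels off."""
    return drive / (drive + half_sat)


def bos_response(n_pref, n_nonpref, baseline=1.0, gain=0.5):
    """Toy BOS neuron (assumed form): facilitated by a grouping cell on the
    preferred side and suppressed by one on the non-preferred side.
    n_pref / n_nonpref are the numbers of fragments shown on each side."""
    return baseline + gain * saturating(n_pref) - gain * saturating(n_nonpref)


def response_on_side(side, n_fragments):
    """Response with n_fragments shown on one side and none on the other."""
    if side == "preferred":
        return bos_response(n_fragments, 0)
    return bos_response(0, n_fragments)


def main_effect(side):
    """Effect of a single fragment relative to no fragment."""
    return response_on_side(side, 1) - response_on_side(side, 0)


def interaction(side):
    """Two-fragment interaction: R(both) - R(first) - R(second) + R(none).
    The two fragments are treated as equivalent in this toy model."""
    r = lambda n: response_on_side(side, n)
    return r(2) - 2 * r(1) + r(0)


def side_difference(n_fragments):
    """Preferred-side minus non-preferred-side response for n fragments."""
    return (response_on_side("preferred", n_fragments)
            - response_on_side("non-preferred", n_fragments))


if __name__ == "__main__":
    for side in ("preferred", "non-preferred"):
        print(side, "main effect:", round(main_effect(side), 3),
              "interaction:", round(interaction(side), 3))
    print("side difference by fragment count:",
          [round(side_difference(n), 3) for n in range(5)])
```

In this sketch the main effect is positive and the interaction negative on the preferred side, with the signs reversed on the non-preferred side, and the side difference grows with the number of fragments in progressively smaller steps. Replacing the saturating function with an expansive one (for example, a product of inputs) would flip the interaction signs, which is the pattern expected from cooperative schemes.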
Because the grouping cells need not reside within V2, the feedback scheme of context integration can be applied. This alleviates the problem of spreading context information quickly over large distances for two reasons: (1) the paths of signal propagation do not increase in proportion to the distances between the neurons in V2, but have a rather fixed length (given by the separation of two cortical areas); (2) the effects of conduction delays are minor, because the signals are conducted by myelinated fibers which are fast. We believe that this is the only way to explain the short latency of the influence of the Far fragments (Fig. 13).
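The scaling argument can be written as a rough first-order comparison. This is a sketch under simplifying assumptions, not a quantitative model from this study; the symbols $d$, $v_h$, $L$, $v_w$ and $\tau$ are notation introduced here for illustration.

```latex
% d:   cortical distance within V2 between the neuron and the site
%      representing the distant fragment
% v_h: conduction velocity of horizontal fibers
% L:   path length between V2 and the higher-level area
% v_w: conduction velocity of myelinated white-matter fibers
% tau: fixed synaptic and integration time at the higher level
% All symbols are illustrative assumptions.
\begin{align}
  t_{\mathrm{horizontal}}(d) &\approx \frac{d}{v_h} \\
  t_{\mathrm{feedback}} &\approx \frac{2L}{v_w} + \tau
\end{align}
```

In this approximation the horizontal delay grows in proportion to $d$, whereas the feedback delay does not depend on $d$ to first order and remains small as long as $v_w$ is large.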
The small increase of the latency of BOS signals with figure size (Fig. 14) does not contradict the grouping cell model, but is rather a natural consequence, as larger figures more often straddle the vertical and horizontal meridians, thus lengthening the path through the white matter.
The lateral propagation and the feedback grouping schemes are not mutually exclusive. Possibly, the cortex uses both kinds of integration for BOS assignment. The paradoxical finding that the effects of contours close to the CRF arrived slightly later than those of more distant contours could be explained by assuming that the influence of the close features is mediated partly by horizontal fibers, and therefore delayed, whereas the integration of the far features occurs via the fast feedback scheme.
The model (Craft et al., 2007) assumes that grouping cells are selective for co-circular arrangement of edges. Our finding that scrambled figures produced weaker BOS signals than squares is in agreement with this assumption, because squares would fit the circular integration fields better than the scrambled figures. A similar effect was demonstrated psychophysically (Sugihara et al., 2007).
In short, many details of the present results support a model in which contour signals are integrated by neurons at a higher level of the cortex, and BOS selectivity is created by feedback to V2. However, several points still need to be clarified. The implementation of the grouping cell model in Craft et al. (2007) assumes that grouping cells combine edge signals after pairwise multiplication. Thus, figure fragments would have cooperative effects in this model. If tested like the neurons in the present study, it would probably produce the opposite sign of fragment interactions—positive on the preferred side and negative on the other side. Future models will have to be tested with fragmented figures to see if they fit the present data. Also, we found that the surround effects are much stronger when the figure fragments match the preferred contrast polarity of the CRF (Fig. 9), indicating that the grouping mechanism is contrast selective. This selectivity was not implemented in Craft et al. (2007).
Non-classical surround effects have often been interpreted in terms of generating or enhancing feature contrast (Allman et al., 1985; Knierim and Van Essen, 1992; Nothdurft et al., 1999; Shen et al., 2007). The present results, especially the configuration-specific context integration and its characteristic spatial asymmetry (Fig. 5), do not fit this theory. We have argued previously that BOS selectivity reflects a mechanism of contour grouping that also creates a structure for object-based attention (Craft et al., 2007; Qiu et al., 2007). The present results strongly indicate an involvement of cortical areas beyond V2, which would also explain the specific attention effects observed in BOS neurons (Qiu et al., 2007), and the persistence of BOS signals across a blank period that shuts down the activity in V2 (O'Herron and von der Heydt, 2009). The origin of this central influence needs to be identified, and further studies are needed to understand the mechanisms of the top-down influence in the visual cortex.
This work was supported by NIH grants EY02966 and EY016281. We thank Anne B. Martin, Stefan Mihalas and Ernst Niebur for critical comments on the manuscript, and Ofelia Garalde for technical assistance.