An amazing feature of our visual system is the ability to detect and track objects in the stream of continually changing retinal images. Theories have proposed that the system creates temporary internal representations that persist across changing images, providing continuity. But how such representations are formed in the brain is not known. Here we examined the time course of the responses of border-ownership selective neurons in visual cortex to displays that portray object continuity. We found that the neurons signal border-ownership immediately when new objects appear, but when a border that has been assigned to one object is reassigned to another object while the first remains in the display, the initial responses persist. The neurons continue to signal the initial assignment despite the presence of contradicting figure-ground cues. We propose that border-ownership selectivity reflects mechanisms that create object continuity.
Despite continual fluctuations of the retinal image we perceive a stable world in which objects have continuity. We can identify objects across a sequence of changing images. We can find objects with the specific features we are looking for, and we can also distinguish and scrutinize objects we have not seen before. To explain these amazing abilities, theories assume that the brain creates temporary internal representations that link the elemental features of an object and persist across changing images to provide the necessary continuity (Kahneman et al., 1992; Pylyshyn and Storm, 1988; Rensink, 2000). But how such representations are formed in the brain is not known.
Neurophysiological studies have shown that the early stages of the visual cortex produce maps of local features, but recent studies related to figure-ground organization have shown that there is also more global processing (e.g., Lamme, 1995; Lee et al., 1998; Roelfsema et al., 1998; Zhou et al., 2000; Zipser et al., 1996). While most neurons in areas V1 and V2 respond to local contrast borders and are orientation selective, about half of the neurons in V2 are also selective for the side on which a border is “owned” by a figure (border ownership, Zhou et al., 2000). The left-hand side of a square, for example, produces high firing rates in neurons of figure-right preference and low firing rates in neurons of figure-left preference. Although these neurons can see only a small segment of border through their classical receptive field, they seem to “know” that this segment is part of the contour of a larger object. They integrate global shape information with various local cues, such as stereoscopic depth and occlusion cues, to infer which side is foreground and which background (Qiu and von der Heydt, 2005; Qiu and von der Heydt, 2007; von der Heydt et al., 2003; Zhang and von der Heydt, 2010; Zhou et al., 2000).
Figure-ground organization has two aspects. It shows that the system uses figure-ground cues to infer the ordering of objects in depth. A display of an L-shaped region next to a square region is perceived as one square overlapping another square; a texture region moving through a textured surround is perceived as a moving surface in front of a background surface. In these cases, the system uses figure-ground cues to infer depth, the dimension that is missing in images. But figure-ground organization also gives insight into the way the system defines objects. The visual image is nothing but a large array of retinal cone signals that vary in time. Perception interprets this stream of signals as objects in space. A display can be as simple as a square region of pixels of one color surrounded by pixels of a different color, but perception interprets the square as an object, and the color boundaries as the contours of the object. Thus, the rules of figure-ground perception reveal something about how the system defines objects.
These two aspects are related, but there is an important difference. The determination of depth order involves the evaluation of figure-ground cues, whereas object representation also requires short-term memory, because the system needs to represent object continuity. When a figure disappears and shortly after that another figure appears in a different location, we usually perceive one moving object. Thus, the second figure is not represented as a new object, but merely as a different state of an existing object. Even in more complex displays, the system is able to keep track of objects (Pylyshyn and Storm, 1988).
We have previously found that border ownership signals persist when the edge of a square is replaced by an ambiguous edge (a split circular field) (O’Herron and von der Heydt, 2009). However, from this study it was not clear if the persistence reflects a slow decay of depth order signals or the persistence of an object representation. Depth order signals might decay slowly in the absence of new depth information. Alternatively, when the square is replaced with an ambiguous edge, a representation of the square might persist, and the edge, which coincides with one side of the square, might still be part of the persisting square representation. Our previous experiments also showed that the persisting border ownership signals at the ambiguous edge could be reversed quickly if a new figure was presented on the opposite side of the edge. Thus, the signals persisted in the absence of figure-ground cues, but could be reset immediately by the presentation of new figure-ground information. This suggested that the absence of figure-ground cues is critical for the persistence. However, the experiment did not dissociate the effect of figure-ground cues from that of the hypothetical object representation because the display sequence did not portray object continuity. If such a representation was indeed formed and persisted into the ambiguous edge phase, it might have been overwritten when the new figure was presented.
In the experiments to be described, we pitted figure-ground cues against the hypothetical object representation. This was achieved by presenting two figures, one of which was moved so that the figure-ground cues reversed while the figures remained the same. Thus, the border between the figures was first assigned to figure A, and then to figure B, while both figures remained in the display. The question we asked was: will the responses to the new figure-ground cues be affected by the previous assignment to figure A? The results clearly show that this was the case. It took more than a second until the new figure-ground cues won the border over from the previous ownership.
All animal procedures conformed to National Institutes of Health and USDA guidelines as verified by the Animal Care and Use Committee of the Johns Hopkins University. We studied neurons in two male adult rhesus monkeys (Macaca mulatta). The details of our general methods have been described (O’Herron and von der Heydt, 2009).
The animals were prepared by implanting, under general anesthesia, first three small posts for head fixation, and later two recording chambers (one over each hemisphere). Fixation training was achieved by controlling fluid intake and using small amounts of juice or water to reward steady fixation.
Stimuli were generated with Open Inventor on a Pentium 4 Linux workstation with NVIDIA GeForce 6800 graphics card using the anti-aliasing feature of the software, and were presented on a 21-inch EIZO FlexScan T965 color monitor with 1600×1200 resolution at 72 Hz refresh rate. Stereoscopic pairs were presented side-by-side and superimposed optically at 40 cm viewing distance. The field of view subtended 17 by 26 deg visual angle. A white (93 cd/m2) cross inside a 20 arc min diameter disc of 9 cd/m2 served as fixation point. The color tuning of each neuron was determined with stationary flashing bars, and the minimum response field was mapped with bars and drifting gratings. Orientation and disparity tunings were determined with moving bars. The square figures were typically 4 deg on a side. Occasionally, larger figures were used, so that the figure was at least twice the linear size of the receptive field. The L-figure (Fig. 1A) was created by adding a small square whose sides measured one-fourth the sides of the square. The figures had a texture of small random dots added to their surfaces in order to enhance the effect of motion on figure-ground assignment. The bottom figure moved at an angle of 45 degrees to the edge in the receptive field (the preferred orientation of the neuron). The distance of movement was 2/3 of the length of the side of the square, and the speed of movement was set by requiring the movement to last 0.5 s.
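The movement parameters stated above follow from simple arithmetic. As a minimal sketch, assuming the typical 4 deg square and the 72 Hz refresh rate, the displacement, speed, and per-frame step can be derived as follows. The function name and the returned keys are illustrative and are not taken from the stimulus software.

```python
import math

def movement_parameters(side_deg=4.0, fraction=2.0 / 3.0, duration_s=0.5,
                        refresh_hz=72.0, angle_deg=45.0):
    """Derive displacement, speed, and per-frame step for the moving figure.

    Defaults follow the values given in the text (typical 4 deg square,
    movement of 2/3 of the side lasting 0.5 s, 72 Hz display).
    """
    distance_deg = fraction * side_deg            # total displacement
    speed_deg_per_s = distance_deg / duration_s   # movement forced to last 0.5 s
    n_frames = round(duration_s * refresh_hz)     # video frames spanned by the movement
    step_deg = distance_deg / n_frames            # displacement per video frame
    # Component of the displacement perpendicular to the edge in the receptive field
    perpendicular_deg = distance_deg * math.sin(math.radians(angle_deg))
    return {
        "distance_deg": distance_deg,
        "speed_deg_per_s": speed_deg_per_s,
        "n_frames": n_frames,
        "step_deg": step_deg,
        "perpendicular_deg": perpendicular_deg,
    }

params = movement_parameters()
print(params)
```

For the typical 4 deg square this gives a displacement of about 2.67 deg at about 5.3 deg/s, spread over 36 video frames.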
The direction of gaze was monitored for one eye with an infrared video-based system (Iscan ETL-200) at 60 Hz with a spatial resolution of 5120 (H) and 2560 (V). The eyes were imaged through an infrared-reflecting mirror, placing the camera on the axis of fixation. The optical magnification in our system resulted in a resolution of the corneal position signal of 0.08 deg visual angle in the horizontal and 0.16 deg in the vertical. Noise and drifts of the signal of course reduced the accuracy. Behavioral trials began with the presentation of the fixation mark on a blank screen. A test sequence was initiated when gaze was in a predetermined fixation window (1 deg radius) and the first stimulus appeared 300 ms after fixation was detected. The monkey was rewarded for keeping its gaze in the fixation window for a fixed duration of 2.3 or 3.3 s, depending on the experiment. After successful termination of a trial the display was blanked for an interval of 0.5 to 1.2 s. When fixation was broken, the trial was terminated and the following inter-trial interval was increased by 1 s.
Each of the experiments described involved variation of several stimulus parameters. For example, the moving figures experiment represented in Fig. 1A–B involved 4 binary variables: the local contrast polarity, the side of the receptive field on which the static figure was presented, whether the foot of the L was above or below the receptive field, and the two conditions (CUE REVERSAL or CONSISTENT, i.e. Fig. 1A or B). The ONSET condition (Fig. 1C) was presented in a separate test and included two local contrast polarities and two directions of overlap. The FLIP condition (Fig. 1D) was run in another test and included two contrast polarities and both sides of initial figure presentation. Factorial designs were used and all conditions of a test were presented in pseudo-random order in which each condition was presented once before moving on to the next repetition.
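The presentation order described above (every condition of the factorial design shown once, in shuffled order, before the next repetition begins) can be sketched as follows. This is an illustration only: the factor names, level labels, and the function `block_randomized_sequence` are hypothetical and do not come from the original experiment control software.

```python
import itertools
import random

def block_randomized_sequence(factors, n_repetitions, seed=0):
    """Return a trial list in which each repetition is a shuffled full factorial block."""
    rng = random.Random(seed)
    names = list(factors)
    # Full factorial crossing of all factor levels
    conditions = [dict(zip(names, levels))
                  for levels in itertools.product(*(factors[n] for n in names))]
    sequence = []
    for _ in range(n_repetitions):
        block = list(conditions)   # every condition exactly once per repetition
        rng.shuffle(block)         # pseudo-random order within the block
        sequence.extend(block)
    return sequence

# Four binary variables of the moving figures test (labels are hypothetical)
factors = {
    "contrast_polarity": ["polarity_1", "polarity_2"],
    "static_figure_side": ["left", "right"],
    "foot_of_L": ["above_rf", "below_rf"],
    "display": ["CUE_REVERSAL", "CONSISTENT"],
}

trials = block_randomized_sequence(factors, n_repetitions=3, seed=1)
print(len(trials))  # 16 conditions x 3 repetitions
```

Shuffling within blocks, rather than across the whole session, keeps the number of presentations per condition balanced even if recording ends early.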
Single-neuron activity was recorded extracellularly with epoxy-insulated tungsten microelectrodes inserted through the dura mater. A spike detection system (Alpha Omega MSD 3.22) was used. Spike times, stimulus events, and behavioral events were digitized and recorded by computer.
Cells in area V2 were recorded either in the lunate sulcus after passing through V1 and the white matter, or in the lip of the post-lunate gyrus. The eccentricities of the receptive fields ranged from 0.74 to 6.8 deg (median 2.2 deg). After isolating a cell we first characterized its selectivity for color, bar size, and orientation, and mapped its receptive field using hand and computer controlled stimuli (Zhou et al., 2000). Next, border ownership selectivity was determined by a standard test using the edge of a square, square sizes of 3 and 8 deg, and both contrast polarities (Qiu and von der Heydt, 2005). If a cell was color selective, the preferred color and a 28 cd/m2 gray were used for the two figure colors, otherwise white (93 cd/m2) and gray (28 cd/m2). The background color was the average of the figure colors, except in the FLIP condition (Fig. 1D), in which the preferred color and the gray were used as the figure and background colors. The color of the blank screen shown between trials was also the average of the two colors. If a cell showed border ownership selectivity for single squares, it was then tested for selectivity to the overlapped condition and subjected to the tests of Fig. 1. In these tests, the figure edge length was set to be at least twice the size of the RF. Neurons that showed no border ownership selectivity in the preliminary standard test (ANOVA, p ≥ 0.05; about half of the neurons) were generally not tested further.
A total of 49 cells were tested in our experiments. Of these, 34 showed selectivity for border ownership in the CONSISTENT condition of Fig. 1B (10 from monkey 1 and 24 from monkey 2), as determined by a significant difference in the firing rates in the interval from 0–1.3 s after stimulus onset (ANOVA p<0.05). Of these 34 neurons, 27 were also tested with the FLIP condition in Fig. 1D (6 from monkey 1, 21 from monkey 2).
For the time course plots (Figs. 2B and 4B) we computed an average of the peristimulus time histograms (2 ms bin width) of the single neurons. Only border ownership selective cells (p ≤ 0.05) were included in the average. The resulting averaged firing rates were smoothed with a Gaussian kernel of σ = 20 ms.
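The histogram and smoothing steps can be sketched as below, using the 2 ms bin width and σ = 20 ms given above. This is a minimal sketch, not the original analysis code: the function names are hypothetical, the kernel is truncated at ±4σ, and the weights are renormalized at the ends of the trace, which are assumptions the text does not specify.

```python
import math

def psth(trials, t_start_ms, t_stop_ms, bin_ms=2.0):
    """Trial-averaged firing rate (spikes/s) per bin; `trials` is a list of spike-time lists (ms)."""
    n_bins = int(round((t_stop_ms - t_start_ms) / bin_ms))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            idx = int((t - t_start_ms) // bin_ms)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    scale = 1000.0 / (bin_ms * len(trials))   # counts per bin -> spikes per second
    return [c * scale for c in counts]

def population_average(psths):
    """Average equally long single-neuron histograms bin by bin."""
    n = len(psths)
    return [sum(values) / n for values in zip(*psths)]

def gaussian_smooth(rates, bin_ms=2.0, sigma_ms=20.0):
    """Smooth with a Gaussian kernel (assumed truncation at 4 sigma, renormalized at the edges)."""
    sigma_bins = sigma_ms / bin_ms
    half = int(math.ceil(4 * sigma_bins))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(rates)):
        num = 0.0
        den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(rates):
                num += w * rates[j]
                den += w
        smoothed.append(num / den)
    return smoothed
```

Renormalizing by the weights that fall inside the trace avoids an artificial drop of the smoothed rate at the first and last bins.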
To describe the time course of the border ownership signal we calculated fits to the population average (2 ms bin width) using multiphase least-squares approximation. For the ONSET condition the fit had two phases: (1) a zero line, and (2) a sum of two exponentials with independent time constants, amplitudes, and asymptotes. For CONSISTENT, CUE REVERSAL and FLIP, the fit consisted of three phases: (1) a zero line, (2) a sum of two exponentials with independent time constants, amplitudes, and asymptotes, and (3) a third phase that differed between conditions. For the CUE REVERSAL and CONSISTENT conditions the third phase was an exponential with a condition-specific time constant but a common asymptote, and the amplitudes were constrained so as to have continuity with the second phase. For FLIP, the third phase was a sum of two exponentials with independent time constants, amplitudes and asymptotes, the amplitudes being constrained to achieve continuity with the second phase. The time points of transition between phases were additional free parameters. However, the transition times from phase 2 to phase 3 for conditions CONSISTENT and CUE REVERSAL (for which estimates had a large uncertainty) were assumed to be the same as that for the FLIP condition. The equations of the functions fitted to the data for CONSISTENT and CUE REVERSAL are as follows:
The term 0·phase_one(t) of course equals 0; we include the first term in the equations to show that the first leg is a zero line. We fixed t2 to the value determined by the fit of FLIP, t2 = 96 ms. The fit returned the following parameters:
c1, c2, c3, c4, a, t1, τ1, τ2, τ3, τ4, s_consist, s_revers. Parameters c1, c2, c3, c4, τ1, τ2, τ3, τ4 specify the sums of exponentials that describe the two signals in phase 2. a is the common asymptote for t → ∞, and s_consist and s_revers are the initial slopes of the functions in phase 3. We used the inverse of the slopes to quantify persistence.
The equation for the FLIP condition was
and the equation for the ONSET condition was
The phase functions are defined as above, replacing t1 and t2 with t5 and t6 for FLIP, and t2 with t9 for ONSET. The fit for FLIP returned the parameters c5, c6, c8, t5, t6, τ5, τ6, τ7, τ8, s_flip and the fit for ONSET returned the parameters c10, t9, τ9, τ10, s_onset. Parameters s_flip and s_onset are the initial slopes after flip and onset, respectively.
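The three-phase description used for CONSISTENT and CUE REVERSAL can be illustrated with a short sketch. The exact parameterization is an assumption here: phase 2 is written as a sum of two saturating exponentials that start from zero at t1, and phase 3 as a single exponential that approaches the asymptote a with initial slope s, continuous with phase 2 at t2. The function names and the example parameter values are hypothetical and are not the fitted values; only t2 = 96 ms is taken from the text.

```python
import math

def three_phase_signal(t, t1, t2, c_a, tau_a, c_b, tau_b, a, s):
    """Assumed form of the three-phase signal (Hz) at time t (ms)."""
    def phase_two(u):
        dt = u - t1
        return (c_a * (1.0 - math.exp(-dt / tau_a))
                + c_b * (1.0 - math.exp(-dt / tau_b)))

    if t < t1:
        return 0.0                      # phase 1: zero line
    if t < t2:
        return phase_two(t)             # phase 2: sum of two exponentials
    y2 = phase_two(t2)                  # value at the transition (continuity)
    gap = a - y2
    if gap == 0.0 or s == 0.0:
        return y2                       # no change after the transition
    tau3 = gap / s                      # time constant implied by the initial slope
    return a - gap * math.exp(-(t - t2) / tau3)

def persistence_ms_per_hz(s):
    """Inverse of the initial slope: time needed for a 1 Hz change."""
    return 1.0 / abs(s)

# Hypothetical parameters for illustration only (t2 = 96 ms is from the text)
example = dict(t1=-430.0, t2=96.0, c_a=-12.0, tau_a=30.0,
               c_b=4.0, tau_b=200.0, a=6.0, s=1.0 / 65.0)
print(three_phase_signal(500.0, **example))
print(persistence_ms_per_hz(example["s"]))
```

Parameterizing phase 3 by its initial slope, rather than by its time constant, makes the inverse slope directly readable as the time the signal needs to change by 1 Hz.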
The goal of this study was to see if the persistence of border ownership signals in V2 is an intrinsic property of the figure-ground mechanisms or if it reflects emerging object representations. Considering figure-ground organization as a process in a network of interconnected neurons, it is conceivable that, once the state of the network is set by the input signals, the network remains in that state until new signals arrive at the input that move it into a different state. Thus, if an edge is first assigned to object A, and then a new object B appears and new figure-ground cues indicate that edge should now be assigned to B, the network will switch border ownership to B. In contrast, if the persistence of signals reflects object representations, we expect that an edge that is assigned to one object cannot easily be assigned to another object.
To distinguish between these hypotheses, we created displays in which the figure-ground cues reverse, while the objects remain the same. This was accomplished by displaying two figures, one occluding the other (Fig. 1A left), and then smoothly moving the bottom figure to a new position (Fig. 1A right). The final configuration is typically perceived as a square overlapping a rectangle. The black square that was initially in back now appears in front of the white figure. Thus, after the cessation of movement, the vertical edge in the center changes ownership from left to right. We refer to this condition as CUE REVERSAL (Movie S1).
For this test, the cue integration hypothesis predicts that the border ownership signal should change from “left” to “right” as soon as the figure-ground cues reverse. In contrast, the object representation hypothesis predicts persistence of signals, because the representation for the white L shape persists after the cessation of movement.
We tested this display sequence in neurons of area V2. The edge that underwent the change of border ownership was placed in the receptive field of the neuron under study (Fig. 1A, red ellipse). For comparison, three other conditions were also tested: One was a similar movement display in which the other figure was in back and moved (Fig. 1B). This produced a sequence in which the assignment of the border between the figures does not change (CONSISTENT, Movie S2). The other two conditions were: presentation of the final overlapping figure configuration without a history (ONSET, Fig. 1C, Movie S3), and a figure-flip condition in which the light figure on the left was deleted while a dark figure appeared on the right (FLIP, Fig. 1D, Movie S4).
The comparison of the CONSISTENT and CUE REVERSAL conditions is shown in Fig. 2. Fig. 2A shows the signals of an example neuron and Fig. 2B the population averages. The CONSISTENT display produced a sustained border ownership signal, as expected (blue traces). For the first 500 ms, foreground and background are defined by dynamic occlusion as well as geometric cues (shape and T-junctions). After the cessation of movement (time 0), only the geometric cues remain. In both phases the cues indicate ownership-right.
The CUE REVERSAL display is similar, except that the cues in the motion phase indicate ownership-left. Accordingly, the border ownership signal first goes negative (Fig. 2, red traces). After the cessation of movement, the signal remains negative for about 700 ms even though the display is now identical to that of the CONSISTENT condition and indicates ownership-right. The two signals slowly approach each other, but do not reach a common level by the end of the fixation period. This shows that the initial border ownership assignment has a long-lasting effect despite the presence of new, contradictory figure-ground cues. The results were similar in the two animals (Fig. S1).
The variation between individual neurons is illustrated in Fig. 3 (see Fig. S2 for the results separated by animal). The signals in CUE REVERSAL were averaged over four time bins, as indicated by double arrows on the time axis. Because the firing rates and hence the amplitudes of the border ownership signals varied widely between neurons, the CUE REVERSAL signal of each neuron was normalized by the mean of its signal in the CONSISTENT condition. The Figure shows that most neurons showed persistence of the negative signal, only slowly approaching the mean CONSISTENT level (represented at 1 in the graph because of the normalization). The median border ownership signals of CUE REVERSAL and CONSISTENT conditions were significantly different in each of the time bins (p<0.05, Wilcoxon signed rank test).
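The normalization described above can be sketched as follows. This is an illustration under stated assumptions: the function names and the example numbers are hypothetical, and the signed rank test itself is not reproduced here.

```python
from statistics import mean, median

def normalize_reversal(reversal_bins, consistent_signal):
    """Divide a neuron's CUE REVERSAL signal (one value per time bin)
    by the mean of its CONSISTENT signal, so that 1 marks the CONSISTENT level."""
    reference = mean(consistent_signal)
    if reference == 0:
        raise ValueError("mean CONSISTENT signal is zero; cannot normalize")
    return [value / reference for value in reversal_bins]

def median_per_bin(normalized_neurons):
    """Median across neurons for each time bin."""
    return [median(values) for values in zip(*normalized_neurons)]

# Hypothetical signals (Hz) for three neurons, four time bins each
neurons = [
    normalize_reversal([-4.0, -2.0, 0.0, 2.0], [8.0, 8.0, 8.0, 8.0]),
    normalize_reversal([-10.0, -5.0, 5.0, 10.0], [20.0, 20.0, 20.0, 20.0]),
    normalize_reversal([-1.0, -1.0, 0.0, 1.0], [4.0, 4.0, 4.0, 4.0]),
]
print(median_per_bin(neurons))
```

After this normalization, values below 1 indicate that a neuron's CUE REVERSAL signal has not yet reached its own CONSISTENT level, and negative values indicate that the initial assignment still dominates.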
The effect of stimulus history can be seen clearly by comparing the CUE REVERSAL with the ONSET condition (Fig. 4, red and black traces): without history, the signal reaches its maximum value in less than 200 ms. Note that for the red and the black curve, visual stimulation is identical after time 0.
In the FLIP condition the border ownership cues reverse, as in CUE REVERSAL, but with the difference that the object of the first assignment is removed at the same time. The result was that the signal reversed quickly (Fig. 4, green trace). Although the signal is first negative, as in CUE REVERSAL, the effect of replacing a figure on one side with a new figure on the opposite side is quite different. Note that the new figure in the FLIP condition is identical to the right figure in CUE REVERSAL at the end of movement. Thus, both conditions stimulated the same receptive fields, producing new edge signals at the same time in the same locations. The critical difference is that the figure to which the edge was initially assigned disappears in FLIP, whereas in CUE REVERSAL it continues to be visible.
Some other differences need to be considered too. (1) ONSET of course included the onset of an edge in the receptive field, whereas this edge was turned on 500 ms earlier in the other conditions. The edge onset might explain some of the rapid change. Note, however, that FLIP produced a similar rapid change of the signal at time 0 without an edge transient. (2) The background changed in FLIP, but not in CUE REVERSAL. However, the background change is unlikely to have a big influence because other experiments have shown that presentations of figures of the type of the Cornsweet illusion produce border ownership signals very similar to those produced by solid figures (Zhang and von der Heydt, 2010). In such a “Cornsweet figure”, the color/luminance varies only along a narrow seam at the contours. Thus, border ownership signals depend mainly on responses evoked by the contours. Moreover, we have shown that a change of background color does not interrupt the persistence of signals at an ambiguous edge (O’Herron and von der Heydt, 2009, Fig. 6). (3) The reason why FLIP produces a larger signal than CUE REVERSAL is that border ownership signals are larger for isolated figures than for borders between overlapping figures (Qiu et al., 2007).
The durations of signal persistence are summarized in Fig. 5. There was a twentyfold difference in persistence between CUE REVERSAL and FLIP: When the owner of the edge disappeared (FLIP condition) the signal changed by 1 Hz in 3 ms, but when it continued to be visible (CUE REVERSAL), the same change took 65 ms. In the ONSET condition, the signal changed as fast as in the FLIP condition. In the CONSISTENT condition, where occlusion cues continuously pointed in one direction, there was a slow negative signal change, indicating that the signal slowly adapted.
We tested the hypothesis that border ownership signals reflect the persistence of emergent object representations. Such representations would enable tracking of object identity, facilitate selective processing, and provide the continuity needed for bridging cuts in the inflow of visual information caused by saccadic eye movements, blinks, or transient occlusions. We had recently found that border ownership signals persist when a figure display is switched to an ambiguous edge, but can be “reset” by presentation of another figure (O’Herron and von der Heydt, 2009). We suggested that the signals persist because of the absence of figure-ground cues. We now find that, when objects in a display are rearranged so that figure-ground cues indicate a new assignment of an edge that was already assigned otherwise, the signals for the initial assignment persist despite the contradicting cues (Fig. 2, CUE REVERSAL). This finding shows that the critical condition for the persistence is not the lack of figure-ground cues, but a continued representation of the object to which the edge has been assigned.
The argument is based on a comparison of four conditions. Fig. 2 shows the duration of the persistence. One and a half seconds after the displays were made identical, the signals of the CONSISTENT and CUE REVERSAL conditions still had not fully converged. These are the displays that portray object continuity. We know that the slow signal change in CUE REVERSAL is not due to inefficiency of figure-ground cues, because the signals rise quickly when the overlapping figures are presented directly (Fig. 4, ONSET). Also, the presence of a negative signal in itself does not hamper the transition to positive values: when an edge that was assigned to an object on one side is taken over by a new object on the other side, the signals change rapidly (Fig. 4, FLIP).
A schematic comparison of the conditions that produced the different signal transitions is shown in Fig. 6. Boxes mark the possible object locations in the first and second phases of display, A and B stand for the two objects, and arrows indicate the direction of border assignment according to the figure-ground cues. The comparison between ONSET and CUE REVERSAL shows the effect of the stimulus history. The comparison between the CUE REVERSAL and FLIP conditions specifically shows the effect of object continuity. The main difference is that in CUE REVERSAL, the object to which the edge was initially assigned remained visible, whereas in FLIP it disappeared. When the object remained visible, the border ownership signal persisted 20 times longer than when the object disappeared.
We propose that the appearance of an object creates a representation in the visual cortex that has persistence. An important constraint is that border ownership signals emerge around 70 ms after stimulus onset (see Figs. 2–3). This is before neurons in IT cortex become active (Bullier, 2001) and long before stimulus-triggered attention affects the activity in visual cortex (Motter, 1994; Roelfsema et al., 1998; Super et al., 2001). It means that the system does not use information from long-term object memory for the creation of these representations. A representation might consist in the activation of a node in a network that has connections to the neurons representing the object features: for example, a set of reciprocal connections between edge neurons in V2 and a small number of common “grouping cells”, where the grouping cells sum the edge signals from the figure, and, by feedback, set the gain of the corresponding edge neurons. “Grouping node” might be a suitable term for such a circuit (for a detailed description of such a model see Craft et al. (2007)). This circuit produces border ownership selectivity in the edge neurons. It also enables selective enhancement of the responses of those neurons by top-down attention, which has been demonstrated experimentally (Qiu et al., 2007). To explain the present results we assume (1) that grouping nodes are activated instantaneously by the stimulus, but maintain their activity in the absence of sufficient input; and (2) that there is mutual inhibition between grouping nodes. The persistence of activity in the grouping nodes explains the persistence of border ownership signals in CUE REVERSAL, and the mutual inhibition accounts for the rapid reversal of border ownership signals in the FLIP condition.
Is the continuity of representation a function of attention? The fact that we recorded persisting signals in monkeys that were trained to fixate and not saccade to the figures suggests that attention is not necessary. It is of course possible that the monkeys’ attention was drawn automatically to the figures, enabling persistence. However, experiments with sequential presentation of two figures ruled out this possibility (O’Herron and von der Heydt, 2009). The second figure was turned on 300 ms after the onset of the first figure, and, after another 300 ms, both figures were reduced to ambiguous edges. If attention was drawn automatically to the figures, the onset of the second figure should have drawn attention away from the first figure. However, the persistence of the border ownership signals at the first figure was undiminished.
Persistence of visually evoked responses has generally been found only in experiments where the task required memorization of stimulus information. For example, the presentation of motion- or texture-defined figures produces enhancement of activity in V1 (Lamme, 1995), and this enhancement persists when a briefly presented figure is used as the target in a memory guided saccade task (Super et al., 2001). The modulation persisted while the monkey attended to the location where the figure had been presented, but it decayed immediately when another target was presented to which the monkey had to saccade instead. Thus, in this case, the persistence depended on continued attention to the stimulated location. On the other hand, Lamme et al. (1998) found persistence of figure-ground modulation after brief presentation of motion-defined figures in the fixation paradigm, under conditions that are similar to those of the present experiments.
The question whether visual cortical representations persist on their own, or need attention to be maintained, is of fundamental importance for understanding the function of the visual cortex. Our studies of border ownership signals indicate that temporary object representations (“grouping nodes”) are created automatically (Qiu et al., 2007), and persist without attention (O’Herron and von der Heydt, 2009). However, it is conceivable that, under certain task conditions, a change of attention might terminate the persistence. Clearly, further studies are needed to clarify the influence of attention on the persistence and decay of figure-ground signals.
In conclusion, our results suggest that border ownership signals reflect the cortical representation of object continuity. Presumably, this representation plays a role in maintaining object identity across eye movements and object movements. Preliminary results support this prediction (O’Herron and von der Heydt, 2010).
Figure S1. Comparison of the border ownership signals for CUE REVERSAL (red) and CONSISTENT (blue) display conditions as in Fig. 2, but separately for the neurons of the two monkeys TH and JA. Thin lines, smoothed histograms. Thin black line shows average of preferred and non-preferred side responses of both conditions. Thick lines, combinations of exponentials fitted to the data (see Methods).
Figure S2. Persistence of the border ownership signal in individual neurons as in Fig. 3, but separately for the neurons of the two monkeys TH and JA.
This research was supported by NIH grants EY02966 and EY016281, and ONR grant N000141010278. We wish to thank Ofelia Garalde and Fangtu Qiu for technical assistance and Howard Egeth, Jonathan Flombaum, Anne Martin and Ernst Niebur for comments on drafts of this paper.