While regions of the lateral occipital cortex (LOC) are known to be selective for objects relative to feature-matched controls, it is not known which cues or configurations give rise to this selectivity. Many theories of perceptual organization have emphasized the figure-ground relationship as being especially important in object-level processing. In the present work, we studied the role of perceptual organization in eliciting visual evoked potentials from the object-selective LOC. To do this, we used two-region stimuli in which the regions were modulated at different temporal frequencies and were arranged either symmetrically or asymmetrically. The asymmetric arrangement produced an unambiguous figure-ground relationship, consistent with a smaller figure region surrounded by a larger background, while four different symmetric arrangements resulted in ambiguous figure-ground relationships but still possessed strong kinetic boundaries between the regions. The surrounded figure-ground arrangement evoked greater activity in the LOC relative to first-tier visual areas (V1-V3). Response selectivity in the LOC, however, was not present for any of the four types of symmetric stimuli. These results suggest that kinetic texture boundaries alone are not sufficient to trigger selective processing in the LOC, but that the spatial configuration of a figure surrounded by a larger background is both necessary and sufficient to selectively activate the LOC.
Objects in the environment create discontinuities that correspond to their physical boundaries, and the detection of these discontinuities is a key component of the image segmentation process. An important result of this segmentation process is the designation of regions as belonging to either the “figure” or the “background”. Figure-ground assignment is thought to be driven by global cues that are extracted over large areas of the image, such as the relative size of image regions and whether one image region surrounds another, and by various cues for depth ordering (Kanizsa, 1979; Palmer, 1992; Rubin, 1915; Vecera & O'Reilly, 1998).
The neural basis of figure-ground assignment has received much attention in recent years (reviewed in Roelfsema, 2006). Neurophysiological studies have shown that cells in early visual areas are sensitive to image discontinuities, with border-ownership relationships first being established in cortical area V2 (Craft, Schutze, Niebur, & von der Heydt, 2007; Zhou, Friedman, & von der Heydt, 2000). Border ownership is extracted within 30-60 ms of stimulus onset (Craft, et al., 2007; Sakai & Nishimura, 2006; Zhaoping, 2005). Sensitivity to global organization appears later in V1, presumably as a result of recurrent feedback from higher-order visual areas (Lamme, 1995; Scholte, Jolij, Fahrenfort, & Lamme, 2008; Zipser, Lamme, & Schiller, 1996). In humans, a likely source of this feedback is the lateral occipital complex (LOC), an extended region of cortex that has been shown to be highly involved in object-level processing (Grill-Spector, Kourtzi, & Kanwisher, 2001; Grill-Spector, Kushnir, Edelman, Itzchak, & Malach, 1998; Malach, et al., 1995; Vuilleumier, Henson, Driver, & Dolan, 2002).
We have previously studied scene segmentation using simple texture-defined figures and an electrophysiological (EEG) paradigm that allowed us to monitor cortical responses to figure and background regions separately by applying distinct periodic ‘frequency tags’ to different regions of the scene (Appelbaum & Norcia, 2009; Appelbaum, Wade, Pettet, Vildavski, & Norcia, 2008; Appelbaum, Wade, Vildavski, Pettet, & Norcia, 2006). Using this approach, we found spatially distinct networks responsible for processing each region: activity related to the figure region, but not to the background region, was preferentially routed between first-tier visual areas (V1-V3) and the LOC. A separate network, extending from the first-tier areas through more dorsal areas, responded preferentially to the background region. The figure-related responses were largely invariant with respect to the texture type used to define the figure and did not depend on its spatial location or size. Routing through these two networks was also largely independent of attention.
Here we ask whether an important aspect of perceptual organization, namely “surroundedness” (e.g., a smaller figure on a larger background), is necessary to selectively activate the LOC. For this purpose, we compared the magnitude of frequency-tagged activity in the object-selective LOC across several different spatial arrangements. The displays consisted either of one small region surrounded by one large region, each temporally modulated, or of two spatially symmetric, temporally modulated regions that were either both small or both large. We asked whether the magnitudes of the tagged-region responses in the LOC ROI, normalized to those recorded in an ROI comprising the first-tier areas (V1, V2 and V3), were sensitive to the spatial configuration of the image regions. Using this approach, we found that replacing the classic surrounded figure-ground organization with a symmetric one greatly reduced the specificity of the LOC response to the different image regions. The surrounded organization therefore exerts a powerful controlling effect on the routing of information about image regions through first-tier areas and lateral occipital cortex.
Thirteen neurologically intact individuals (mean age 34), with normal or corrected-to-normal visual acuity, participated in this experiment. Two subjects were excluded from the final analysis due to poor signal-to-noise ratio in their EEG data. Informed consent was obtained prior to experimentation under a protocol that was approved by the Institutional Review Board of the California Pacific Medical Center.
In the stimuli used in this experiment, periodic temporal modulation (‘frequency-tagging’) was applied to separate regions of the visual input, which allowed us to distinguish the responses arising from the two simultaneously present regions by spectrum analysis. These stimuli included one spatially asymmetric arrangement consisting of a small figure region surrounded by a large background (Figure 1a) and four spatially symmetric configurations (Figure 1b-1e) in which the two tags modulated spatially equivalent regions of space that contained no size or configuration cues to predetermine the relative figure-ground assignment of the two regions.
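Spectrum analysis can only attribute a response to one region at frequencies where the harmonics of the two tags do not coincide. The sketch below assumes the nominal tag frequencies of 3.0 and 3.6 Hz used throughout this study and uses exact rational arithmetic to list harmonic orders that land on the same frequency, and to find the coarsest frequency resolution on which both tags fall exactly on a DFT bin. The function names are illustrative only and are not part of the authors' software.

```python
from fractions import Fraction
from math import gcd


def harmonic_collisions(f1, f2, max_harmonic=12):
    """Return (n, m, freq_hz) for every n*f1 == m*f2 with n, m <= max_harmonic.

    Frequencies are passed as decimal strings so Fraction represents them exactly.
    """
    f1, f2 = Fraction(f1), Fraction(f2)
    harmonics_f1 = {n * f1: n for n in range(1, max_harmonic + 1)}
    collisions = []
    for m in range(1, max_harmonic + 1):
        freq = m * f2
        if freq in harmonics_f1:
            collisions.append((harmonics_f1[freq], m, float(freq)))
    return collisions


def common_resolution(f1, f2):
    """Largest frequency spacing (Hz) of which both f1 and f2 are integer multiples."""
    f1, f2 = Fraction(f1), Fraction(f2)
    # Put both frequencies over a common denominator, then take the gcd of the numerators.
    den = f1.denominator * f2.denominator // gcd(f1.denominator, f2.denominator)
    return Fraction(gcd(int(f1 * den), int(f2 * den)), den)


print(harmonic_collisions("3.0", "3.6"))  # [(6, 5, 18.0), (12, 10, 36.0)]
res = common_resolution("3.0", "3.6")
print(float(res), float(1 / res))         # 0.6 Hz spacing; record length must be a multiple of 1/0.6 s
```

With these tags, for example, the 6th harmonic of f1 and the 5th harmonic of f2 both fall at 18 Hz, so a response at that frequency cannot be assigned to either region from the spectrum alone.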
The asymmetric, spatially surrounded stimulus used in this experiment was identical to the “phase-defined form” stimulus utilized in our previous experiments (Appelbaum & Norcia, 2009; Appelbaum, et al., 2008; Appelbaum, et al., 2006). This stimulus, shown in Figure 1a, consisted of a 5° diameter circular figure region surrounded by a 20° × 20° background in which the figure was defined by local discontinuities present at the border of horizontally oriented textures. The textures comprising both figure and background regions consisted of one-dimensional random-luminance bars with a minimum bar width of 6 arc min and texture contrast of 80%. In this stimulus the figure region modulated by 180° at 3.0 Hz while the background region modulated 180° at 3.6 Hz. Because the figure region was composed of the same texture as the background region, it either blended seamlessly into the background when both figure and background were in their un-rotated state, or appeared to be segmented from the background when the rotation state of the figure or background differed. Because the orientation of the figure and background regions was always horizontal, the segmentation was defined both by local luminance discontinuities that occurred at the figure/background border and by the temporal structure imposed by the texture modulations.
Four types of spatially symmetric stimuli were used to test the influence of the spatial organization of the inputs on the routing of scene information through visual cortex. Figures 1b and 1c show two types of stimuli composed of fields of randomly oriented ‘texture doublets’, each a pair of equal-size grating patches surrounded by a mean gray background. Within each pair, one patch modulated in contrast at 3.0 Hz and the other at 3.6 Hz, causing the two local grating patches to cycle into and out of phase with each other and thereby generating a kinetic border between the two parts of the stimulus that intermittently appeared and disappeared. The patches were modulated in counterphase (pattern reversal) with a temporal waveform that was either a square wave (Figure 1b) or a sine wave (Figure 1c). On each trial, a new field of randomly oriented textures was presented.
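The square-wave and sine-wave counterphase reversals can be sketched as a signed contrast multiplier per video frame: multiplying a patch's grating by this value reverses its contrast polarity every half cycle. The sketch assumes the 72 Hz refresh rate given in the display description and an integer number of frames per cycle (24 at 3.0 Hz, 20 at 3.6 Hz). The function name is hypothetical, and treating "polarities differ" as "patches out of phase" is a simplifying assumption made here for illustration.

```python
import numpy as np


def counterphase_contrast(freq_hz, n_frames, refresh_hz=72, waveform="sine"):
    """Signed contrast multiplier (-1..1) for each frame of a counterphase reversal."""
    frames_per_cycle = refresh_hz / freq_hz
    if abs(frames_per_cycle - round(frames_per_cycle)) > 1e-9:
        raise ValueError("tag frequency must give an integer number of frames per cycle")
    fpc = int(round(frames_per_cycle))
    frame_in_cycle = np.arange(n_frames) % fpc
    if waveform == "sine":
        return np.sin(2 * np.pi * frame_in_cycle / fpc)
    if waveform == "square":
        # +1 for the first half of each cycle, -1 for the second half
        return np.where(frame_in_cycle < fpc / 2, 1.0, -1.0)
    raise ValueError(f"unknown waveform: {waveform!r}")


# Two patches of one doublet, square-wave reversal at the two tag frequencies.
left = counterphase_contrast(3.0, 120, waveform="square")
right = counterphase_contrast(3.6, 120, waveform="square")
# Frames on which the patches have opposite polarity (assumed proxy for "out of phase").
out_of_phase = np.sign(left) != np.sign(right)
```

Because the two patches share no common reversal period shorter than 120 frames, the frames on which their polarities agree and disagree drift over time, which is the intermittent in-phase/out-of-phase cycling described above.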
Figure 1d shows a single symmetric grating pair (texture doublet) composed of two patches, each extending 2.5° vertically and 2.5° horizontally from the midline, surrounded by a mean gray background. Within this pair, the left grating modulated at 3.0 Hz and the right grating at 3.6 Hz, with a sine-wave counterphase reversal. This stimulus presents two regions of similar size to the figure region in the asymmetric figure-ground stimulus and provides a test of whether small, centrally presented stimuli are sufficient to preferentially activate the LOC.
Figure 1e shows two frames of a full-field square-wave grating stimulus whose contrast reversed with a square-wave temporal waveform. This stimulus consisted of abutting rows of alternating black-and-white vertical bars. The even rows reversed contrast (square wave) at 3.0 Hz and the odd rows at 3.6 Hz. As with the texture doublet patches, the two reversal frequencies imposed on this grating caused the alternating rows to cycle into and out of phase with each other, generating a kinetic border between the two parts of the stimulus that intermittently appeared and disappeared. The figure-ground arrangement of these rows was ambiguous because the regions partitioned by these borders were balanced for size and other configural cues.
Stimuli were generated by in-house software running on the Power Macintosh platform and presented on a Sony multisync video monitor (GDP-400) at a resolution of 800 × 600 pixels and a 72 Hz vertical refresh rate. Participants were instructed to fixate a mark at the center of the display and to distribute attention evenly over the entire display. Stimuli were viewed in a dark, quiet room; individual trials lasted 16.7 seconds, and conditions were randomized within a block. A typical session lasted roughly 45 minutes and consisted of 15 blocks of randomized trials; the observer paced the presentation and was given the opportunity to rest between blocks.
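Because stimuli were drawn frame by frame on a 72 Hz display, each tag frequency has to correspond to a whole number of frames per cycle. The arithmetic below checks this for the 3.0 and 3.6 Hz tags and computes how many frames elapse before the two tags return to the same relative phase. It assumes that a 16.7-s trial corresponds to 1200 frames (1200 / 72 ≈ 16.67 s); that frame count is an inference from the stated duration, not a value given in the text, and the helper name is hypothetical.

```python
from fractions import Fraction
from math import gcd


def frames_per_cycle(freq_hz, refresh_hz=72):
    """Whole number of video frames in one tag cycle; raises if not integral."""
    fpc = Fraction(refresh_hz) / Fraction(freq_hz)
    if fpc.denominator != 1:
        raise ValueError(f"{freq_hz} Hz is not an integer divisor of {refresh_hz} Hz")
    return fpc.numerator


refresh_hz = 72
n1 = frames_per_cycle("3.0")          # frames per cycle of tag f1
n2 = frames_per_cycle("3.6")          # frames per cycle of tag f2
beat_frames = n1 * n2 // gcd(n1, n2)  # frames until the two tags are back in phase
trial_frames = 1200                   # assumed: 16.7 s at 72 Hz

print(n1, n2)                                 # 24 20
print(beat_frames, beat_frames / refresh_hz)  # 120 frames, i.e. 1 / (3.6 - 3.0) s
print(trial_frames // n1, trial_frames // n2, trial_frames // beat_frames)  # 50 60 10
```

Under this assumption each trial contains whole numbers of cycles of both tags (50 and 60) and of their relative-phase cycle (10), which keeps the tag frequencies on exact bins of a trial-length Fourier transform.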
The procedures for this experiment (EEG signal acquisition, head conductivity modeling, source estimation, visual area definition, region-of-interest quantification, and statistical analysis) are similar to those used in our previous studies. In the interest of brevity, we provide an overview of these methods and refer the reader to our previous work (Ales & Norcia, 2009; Appelbaum, et al., 2008; Appelbaum, et al., 2006) for a more detailed description.
As in our previous experiments, the electroencephalogram (EEG) was recorded with 128-sensor HydroCell Sensor Nets (Electrical Geodesics, Eugene, OR) and band-pass filtered from 0.1 to 200 Hz. Following each experimental session, the 3D locations of all electrodes and three major fiducials (nasion and left and right peri-auricular points) were digitized using a 3Space Fastrak 3-D digitizer (Polhemus, Colchester, VT). For all observers, the digitized 3D locations were used to co-register the electrodes to the observer's T1-weighted anatomical MRI scan.
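The text does not state how the digitized positions were aligned to the MRI. One standard approach, sketched here purely as an assumption, is a least-squares rigid-body fit (the Kabsch/SVD method) between the three digitized fiducials and the same landmarks identified on the MRI; the resulting rotation and translation are then applied to every electrode position. Function names and the coordinates in the usage note are hypothetical.

```python
import numpy as np


def rigid_fit(src, dst):
    """Rotation R and translation t minimizing sum ||R @ src_i + t - dst_i||^2.

    src, dst: (n_points, 3) corresponding points (n_points >= 3, not collinear).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)     # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip the last axis if SVD gives a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def apply_rigid(points, R, t):
    """Map (n, 3) points, e.g. all electrode positions, into MRI coordinates."""
    return np.asarray(points, dtype=float) @ R.T + t
```

In use, `R, t = rigid_fit(fiducials_digitizer, fiducials_mri)` followed by `apply_rigid(electrodes_digitizer, R, t)` would place the electrodes in MRI space. With only three landmarks the fit is exact but sensitive to landmark placement, so additional points (for example, head-shape samples) are often used in practice.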
Raw data were evaluated offline with a sample-by-sample thresholding procedure to identify noisy sensors, which were replaced by the average of their six nearest spatial neighbors. After this substitution, the EEG was re-referenced to the common average of all sensors. In addition, EEG epochs in which a large percentage of samples exceeded threshold (25-50 μV) were excluded on a sensor-by-sensor basis. Preliminary analysis of the statistical significance of individual observers' data was conducted using the T²circ statistic described by Victor and Mast (1991).
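The preprocessing steps and the significance test can be sketched as follows. Several details are assumptions not given in the text: the bad-sensor flags are taken as input (the detection rule is not specified), replacement neighbors are drawn only from good sensors, and the 50 μV threshold and 10% exceedance fraction are illustrative choices. The T²circ function follows our reading of Victor and Mast (1991): for M complex Fourier coefficients at one frequency, M·T²circ follows an F(2, 2M−2) distribution under the null hypothesis of zero mean, and the F(2, d) survival function has a closed form, so no statistics library is needed. All function names are hypothetical.

```python
import numpy as np


def replace_bad_sensors(data, positions, bad, n_neighbors=6):
    """Replace each bad sensor by the mean of its nearest good sensors.

    data: (n_sensors, n_samples); positions: (n_sensors, 3) array; bad: bool (n_sensors,).
    """
    out = np.array(data, dtype=float, copy=True)
    good_idx = np.flatnonzero(~bad)
    for i in np.flatnonzero(bad):
        dist = np.linalg.norm(positions[good_idx] - positions[i], axis=1)
        nearest = good_idx[np.argsort(dist)[:n_neighbors]]
        out[i] = out[nearest].mean(axis=0)
    return out


def common_average_reference(data):
    """Subtract the instantaneous mean over all sensors."""
    return data - data.mean(axis=0, keepdims=True)


def reject_epochs(epochs, threshold_uv=50.0, max_fraction=0.1):
    """Bool mask (n_epochs, n_sensors); True where too many samples exceed threshold."""
    exceed = np.abs(epochs) > threshold_uv
    return exceed.mean(axis=2) > max_fraction


def t2circ(z):
    """T^2circ test that complex coefficients z (one per epoch, M >= 2) have zero mean.

    Returns F = M * T^2circ, distributed as F(2, 2M - 2) under the null, and its p-value.
    """
    z = np.asarray(z, dtype=complex)
    m = z.size
    zbar = z.mean()
    spread = np.sum(np.abs(z - zbar) ** 2)
    f_stat = m * (m - 1) * np.abs(zbar) ** 2 / spread
    d2 = 2 * m - 2
    # Survival function of F(2, d2): P(F > x) = (1 + 2x / d2) ** (-d2 / 2)
    p_value = (1.0 + 2.0 * f_stat / d2) ** (-d2 / 2.0)
    return f_stat, p_value
```

Testing each harmonic of each tag this way on a per-observer basis gives a quick screen for which frequencies carry a reliable phase-locked response before source-space analysis.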
Structural and functional MRI scanning was conducted at 3T (Siemens Tim Trio, Erlangen, Germany) using a 12-channel head coil. We acquired a T1-weighted MRI dataset (3-D MP-RAGE sequence, 0.8 × 0.8 × 0.8 mm³) and a 3-D T2-weighted dataset (SE sequence at 1 × 1 × 1 mm³ resolution) for tissue segmentation and registration with the functional scans. For fMRI, we employed a single-shot, gradient-echo EPI sequence (TR/TE = 2000/28 ms, flip angle 80°, 126 volumes per run) with a voxel size of 1.7 × 1.7 × 2 mm³ (128 × 128 acquisition matrix, 220 mm FOV, bandwidth 1860 Hz/pixel, echo spacing 0.71 ms). We acquired 30 slices without gaps, positioned in the transverse-to-coronal plane approximately parallel to the corpus callosum and covering the whole cerebrum. Once per session, a 2-D SE T1-weighted volume was acquired with the same slice specifications as the functional series to facilitate registration of the fMRI data to the anatomical scan.
The FreeSurfer software package (http://surfer.nmr.mgh.harvard.edu) was used to perform gray and white matter segmentation and to extract a mid-gray cortical surface. This cortical surface had 20,484 isotropically spaced vertices and was used both as a source constraint and for defining the visual areas. FreeSurfer extracts both the gray/white and the gray/cerebrospinal fluid (CSF) boundaries, but these surfaces can differ in their local orientation. In particular, the gray/white boundary has sharp gyri (the curvature changes rapidly) and smooth sulci (the curvature changes slowly), whereas the gray/CSF boundary is the inverse, with smooth gyri and sharp sulci. To avoid these curvature asymmetries, we generated a surface partway between the two boundaries whose gyri and sulci have approximately equal curvature.
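One simple way to build a surface partway between the two boundaries, sketched here as an assumption rather than the authors' exact procedure, is vertex-wise linear interpolation. FreeSurfer's white and pial surfaces share the same vertex ordering and faces, so corresponding vertices can be blended directly; the fraction that best equalizes gyral and sulcal curvature is not given in the text, and 0.5 is only an illustrative default. The function name is hypothetical.

```python
import numpy as np


def intermediate_surface(white_xyz, pial_xyz, fraction=0.5):
    """Vertex-wise blend of matched gray/white and gray/CSF (pial) surfaces.

    fraction = 0 returns the white surface, 1 returns the pial surface.
    Both inputs are (n_vertices, 3) arrays with identical vertex ordering.
    """
    white_xyz = np.asarray(white_xyz, dtype=float)
    pial_xyz = np.asarray(pial_xyz, dtype=float)
    if white_xyz.shape != pial_xyz.shape:
        raise ValueError("white and pial surfaces must have matching vertices")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return (1.0 - fraction) * white_xyz + fraction * pial_xyz
```

With nibabel, `nibabel.freesurfer.read_geometry` returns the vertex coordinates and triangle faces of files such as `lh.white` and `lh.pial`; because the two surfaces share their faces, the interpolated coordinates can reuse the same triangle list.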
Individual Boundary Element Method (BEM) conductivity models were derived from each observer's T1- and T2-weighted MRI scans. The FSL toolbox (http://www.fmrib.ox.ac.uk/fsl/) was used to segment contiguous volume regions for the scalp, outer skull, and inner skull and to convert these MRI volumes into inner skull, outer skull, and scalp surfaces (Smith, 2002; Smith, et al., 2004).
Rotating wedge stimuli were used to map polar angle, and expanding and contracting ring stimuli were used to map retinal eccentricity up to 3.5°. Complete cycles lasted 24 sec, and 10 cycles were collected in each of 3 scans for each participant. Fourier analysis was used to extract the magnitude and phase of the BOLD signal, which was visualized on a flattened representation of the cortical surface. Retinotopic mapping yielded regions of interest (ROIs) for each participant's visual cortical areas V1, V2v, V2d, V3v and V3d in each hemisphere (Engel, Glover, & Wandell, 1997). A contrast of intact versus scrambled objects (block design: 12 sec intact / 12 sec scrambled; 10 cycles, 240 sec per scan; 2 to 3 scans) was used to define the LOC, using the stimuli of Kourtzi and Kanwisher (2000). These stimuli produce activation that extends onto both the lateral and ventral surfaces (Vinberg & Grill-Spector, 2008). Only the portion lying on the lateral surface, posterior and adjacent to hMT+, was included in our definition; activations in ventral areas were more variable, and sources in these areas are less visible in the EEG because of their greater depth. Representative functional ROIs are shown in Figure 2 for all subjects.
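The Fourier analysis of the phase-encoded retinotopy runs can be sketched as follows. With a stimulus that completes a fixed number of cycles per scan, each voxel's response amplitude and phase are read from the FFT bin whose index equals the number of cycles, and a coherence measure in the spirit of Engel et al. (1997) expresses how much of the voxel's signal variation lies at that frequency. The sketch assumes a time series spanning exactly 10 stimulus cycles (for example, 120 volumes at TR = 2 s for 24-s cycles, after any discarded volumes); converting phase into polar angle or eccentricity would additionally require a hemodynamic-delay correction, which is not shown. The function name is hypothetical.

```python
import numpy as np


def phase_encoded_map(timeseries, n_cycles):
    """Amplitude, phase (radians) and coherence at the stimulus frequency.

    timeseries: (n_voxels, n_timepoints) BOLD data spanning exactly n_cycles stimulus cycles.
    """
    ts = np.asarray(timeseries, dtype=float)
    ts = ts - ts.mean(axis=1, keepdims=True)  # remove the DC component
    spectrum = np.fft.rfft(ts, axis=1)
    amplitude = np.abs(spectrum)
    at_stim = spectrum[:, n_cycles]           # bin index = cycles per scan
    # Fraction of non-DC spectral amplitude that lies at the stimulus frequency
    coherence = amplitude[:, n_cycles] / np.sqrt(np.sum(amplitude[:, 1:] ** 2, axis=1))
    return np.abs(at_stim), np.angle(at_stim), coherence
```

The same bin-reading logic applies to the 12 s / 12 s LOC block design, whose 10 intact/scrambled cycles per scan place the block-alternation frequency at bin 10 as well.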
An L2 minimum-norm inverse was computed with sources constrained to the location and orientation of the cortical surface. In addition, we modified the source covariance matrix in two ways to reduce the tendency of the minimum-norm procedure to place sources outside the visual areas: (1) the variance allowed within the visual areas was increased by a factor of two relative to other vertices, and (2) a local smoothness constraint was enforced within each area using the first- and second-order neighborhoods on the mesh, with weights of 0.5 for first-order and 0.25 for second-order neighbors. This smoothness constraint therefore respects areal boundaries, unlike smoothing methods such as LORETA that apply the same smoothing rule throughout cortex.
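A minimal sketch of a weighted L2 minimum-norm inverse, assuming a lead-field matrix L (sensors × cortical vertices), a mesh adjacency matrix, and an integer visual-area label per vertex (−1 outside the visual ROIs). The text does not say exactly how the two covariance modifications were combined; here, as an assumption, a smoothing matrix S (1 on the diagonal, 0.5 for first-order and 0.25 for second-order neighbors in the same area) is applied to a diagonal variance D that is 2 inside visual areas and 1 elsewhere, giving C = S D Sᵀ. The regularization parameter λ is a placeholder, and the function names are hypothetical.

```python
import numpy as np


def source_covariance(adjacency, area_labels, visual_gain=2.0, w1=0.5, w2=0.25):
    """Source prior C = S D S^T with within-area smoothing S and variance boost D.

    adjacency: (n, n) symmetric 0/1 mesh adjacency; area_labels: (n,) ints, -1 = no area.
    """
    n = adjacency.shape[0]
    eye = np.eye(n, dtype=bool)
    first = adjacency.astype(bool) & ~eye
    second = ((first.astype(int) @ first.astype(int)) > 0) & ~first & ~eye
    labels = np.asarray(area_labels)
    labeled = labels >= 0
    # Smoothing links only vertices that belong to the same visual area.
    same_area = (labels[:, None] == labels[None, :]) & labeled[:, None] & labeled[None, :]
    S = np.eye(n) + w1 * (first & same_area) + w2 * (second & same_area)
    D = np.diag(np.where(labeled, visual_gain, 1.0))
    return S @ D @ S.T


def minimum_norm_inverse(L, C, lam):
    """Inverse operator W = C L^T (L C L^T + lam^2 I)^-1, shape (n_sources, n_sensors)."""
    G = L @ C @ L.T + lam ** 2 * np.eye(L.shape[0])
    # G and C are symmetric, so solving G X = L C and transposing gives C L^T G^-1.
    return np.linalg.solve(G, L @ C).T
```

Source estimates are then `W @ sensor_data`. Because smoothing weights are only assigned within an area, a vertex on one side of an areal boundary does not borrow variance from its neighbor on the other side.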
A discrete Fourier transform was used to estimate the response magnitude associated with each region tag within each functionally defined ROI. The average response for each ROI was computed by coherently averaging across all nodes within that ROI; because this averaging was performed on the complex Fourier components, phase information was preserved. Averaging across hemispheres and observers was then performed for each individual frequency component, again preserving the complex phase of the response. Finally, for each ROI, the total amplitude of the 2nd, 4th, and 8th harmonics was computed for each region tag by summing the powers of the individual harmonics and then taking the square root:

$$A_{nf_i} = \sqrt{|X(2f_i)|^2 + |X(4f_i)|^2 + |X(8f_i)|^2}, \qquad i \in \{1, 2\},$$

where $X(f)$ is the coherently averaged complex Fourier component at frequency $f$, and $f_1$ and $f_2$ are the two region-tag frequencies.
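A minimal sketch of this ROI quantification, assuming source waveforms for one ROI sampled at a hypothetical 500 Hz over a 5-s epoch, which contains whole numbers of cycles of both tags (15 and 18). Complex Fourier coefficients are averaged across vertices before any magnitude is taken, so components with inconsistent phase partially cancel; the harmonic amplitudes are then combined as a root sum of squares. The function names and signal values are illustrative.

```python
import numpy as np


def roi_components(roi_timeseries, fs, freqs_hz):
    """Coherent (complex) ROI average of the Fourier components at freqs_hz.

    roi_timeseries: (n_vertices, n_samples) source waveforms for one ROI.
    """
    x = np.asarray(roi_timeseries, dtype=float)
    n = x.shape[1]
    bins = np.asarray(freqs_hz, dtype=float) * n / fs
    if not np.allclose(bins, np.round(bins)):
        raise ValueError("epoch must contain an integer number of cycles of each frequency")
    spectrum = np.fft.rfft(x, axis=1) * (2.0 / n)  # a cosine of amplitude A maps to |A|
    return spectrum[:, np.round(bins).astype(int)].mean(axis=0)


def harmonic_rss(components):
    """Square root of the summed power across harmonics (last axis)."""
    return np.sqrt(np.sum(np.abs(components) ** 2, axis=-1))


fs, dur, f1 = 500.0, 5.0, 3.0
t = np.arange(int(fs * dur)) / fs
harmonics = [2 * f1, 4 * f1, 8 * f1]         # 6, 12 and 24 Hz for the f1 tag
signal = np.cos(2 * np.pi * 6 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)
same_phase = np.vstack([signal, signal])     # two vertices responding in phase
opposite = np.vstack([signal, -signal])      # two vertices in antiphase
print(harmonic_rss(roi_components(same_phase, fs, harmonics)))  # ~1.118 = sqrt(1 + 0.25)
print(harmonic_rss(roi_components(opposite, fs, harmonics)))    # ~0: antiphase responses cancel
```

Stacking the complex outputs of `roi_components` for each hemisphere or observer and taking `.mean(axis=0)` before `harmonic_rss` mirrors the subsequent coherent averaging steps.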
To test for differences in region selectivity due to stimulus configuration, response magnitude ratios were computed as the mean activity in the left and right LOC ROIs divided by the mean activity in the first-tier ROIs (Tier 1; V1v, V1d, V2v, V2d, V3v, V3d, averaged over the left and right hemispheres). A two-way, within-subject, repeated-measures analysis of variance (rANOVA) was used to test for main effects of, and interactions between, stimulus arrangement (5 levels) and harmonic component (nf1 or nf2) on the LOC-to-Tier 1 ratio. Paired t-tests were performed between components for each stimulus condition to further probe the effects of stimulus arrangement.
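The ratio and test statistics can be sketched with NumPy alone. The sketch expresses selectivity as LOC amplitude divided by Tier 1 amplitude, the direction in which the reported values (2.14 for the figure tag) are expressed. It computes a paired t statistic and the condition × component interaction F of a two-way repeated-measures ANOVA from a subjects × conditions × components array; with 11 subjects, 5 arrangements and 2 components, the interaction has (4, 40) degrees of freedom, matching the F(4,40) reported in the results. p-values would come from the t and F distributions (for example via `scipy.stats`), and no sphericity correction is applied here. The data are synthetic and the function names hypothetical.

```python
import numpy as np


def selectivity_ratio(loc_amp, tier1_amp):
    """LOC amplitude divided by Tier 1 amplitude (> 1: relatively larger in the LOC)."""
    return np.asarray(loc_amp, dtype=float) / np.asarray(tier1_amp, dtype=float)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(n)), n - 1


def rm_interaction_F(y):
    """A x B interaction F for a fully within-subject design.

    y: (n_subjects, a_levels, b_levels). Returns (F, df_effect, df_error).
    """
    y = np.asarray(y, dtype=float)
    n, a, b = y.shape
    grand = y.mean()
    m_ab = y.mean(axis=0)          # condition x component cell means
    m_a = y.mean(axis=(0, 2))      # condition means
    m_b = y.mean(axis=(0, 1))      # component means
    m_s = y.mean(axis=(1, 2))      # subject means
    m_sa = y.mean(axis=2)          # subject x condition means
    m_sb = y.mean(axis=1)          # subject x component means
    interaction = m_ab - m_a[:, None] - m_b[None, :] + grand
    ss_effect = n * np.sum(interaction ** 2)
    # Error term: the subject x condition x component residual
    resid = (y - m_sa[:, :, None] - m_sb[:, None, :] - m_ab[None, :, :]
             + m_s[:, None, None] + m_a[None, :, None] + m_b[None, None, :] - grand)
    ss_error = np.sum(resid ** 2)
    df_effect = (a - 1) * (b - 1)
    df_error = df_effect * (n - 1)
    return (ss_effect / df_effect) / (ss_error / df_error), df_effect, df_error


# Synthetic LOC/Tier 1 ratios: 11 subjects x 5 arrangements x 2 components.
rng = np.random.default_rng(0)
ratios = 1.0 + 0.1 * rng.standard_normal((11, 5, 2))
ratios[:, 0, 0] += 1.0   # raise one arrangement/component cell to create an interaction
F, df1, df2 = rm_interaction_F(ratios)
t, df = paired_t(ratios[:, 0, 0], ratios[:, 1, 0])
```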
SSVEP activity in all subjects included in the analysis showed statistically significant (T²circ, p < 0.05) narrow-band peaks at harmonics of the input stimulus frequencies. Figure 3 shows the temporal frequency amplitude spectrum for the asymmetric surrounded figure-ground stimulus, derived from the grand-average source-space reconstructions for the LOC and Tier 1 ROIs. In this figure, significant (T²circ, p < .05) responses at harmonics of the first input (f1) are shown as solid lines, and those at harmonics of the second input (f2) as dashed lines. Significant responses are present only at the even harmonics of the region tag frequencies (2f1, 2f2, 4f1, 4f2, etc.), with the amplitude of these responses (and of the background EEG noise) decreasing with increasing frequency. Because the image-updating procedure produces two temporal transients per cycle, it is not surprising that large responses are evoked at the even harmonics, and the lack of odd-harmonic activity indicates that both transitions within a cycle evoked similar responses in the population. In absolute terms, the background responses are larger than the figure responses, as expected given that the background region in these asymmetric stimuli is about 6 times larger than the figure region. Nonetheless, the figure region evokes a larger response in the LOC than in the first-tier ROIs (black lines), while the background region responses are roughly the same in the two ROIs (dashed lines), replicating our previous results (Appelbaum, et al., 2008; Appelbaum, et al., 2006).
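The argument that two similar transients per stimulus cycle produce only even harmonics can be checked numerically. The sketch convolves a train of transitions (two per cycle of a 3.0 Hz tag) with an arbitrary damped-oscillation transient and reads the amplitude at each harmonic of the tag; the 432 Hz sampling rate, the 15-cycle record and the transient shape are hypothetical choices for convenience. When alternate transitions evoke responses of different size (the `asymmetry` parameter), odd harmonics appear.

```python
import numpy as np


def harmonic_profile(f_tag=3.0, fs=432.0, n_cycles=15, asymmetry=0.0, n_harmonics=6):
    """Amplitudes at harmonics 1..n_harmonics of f_tag for a periodic transient response.

    Each stimulus cycle contains two transitions; the second is scaled by (1 + asymmetry).
    """
    n = int(round(n_cycles * fs / f_tag))
    half_period = int(round(fs / (2 * f_tag)))      # samples between transitions
    impulses = np.zeros(n)
    impulses[::half_period] = 1.0
    impulses[half_period::2 * half_period] *= 1.0 + asymmetry
    k_t = np.arange(int(0.1 * fs)) / fs
    kernel = np.exp(-k_t / 0.02) * np.sin(2 * np.pi * 10.0 * k_t)  # hypothetical transient
    # Circular convolution keeps the simulated response exactly periodic over the record.
    response = np.fft.irfft(np.fft.rfft(impulses) * np.fft.rfft(kernel, n), n)
    spectrum = np.abs(np.fft.rfft(response)) * 2.0 / n
    # Harmonic h of f_tag falls on bin h * n_cycles.
    return spectrum[[h * n_cycles for h in range(1, n_harmonics + 1)]]


print(np.round(harmonic_profile(), 4))               # 1f, 3f, 5f are ~0; 2f, 4f, 6f are not
print(np.round(harmonic_profile(asymmetry=0.5), 4))  # unequal transitions add odd harmonics
```

The symmetric case has a period of half a stimulus cycle, so its spectrum can only contain multiples of 2f; this is the same reasoning used in the text to interpret the absence of odd-harmonic activity.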
In our previous work, we found that this response selectivity, defined as the ratio of figure activity in the LOC to that in the first-tier areas, was greater than 1 across several manipulations, including the texture cue (orientation, phase and temporal) and the spatial extent (size and eccentricity) of the display. As in our previous work, the asymmetric figure-ground arrangement here shows a high degree of selectivity for the figure region in the LOC (Figure 4). The ratio of figure activity (nf1) in the LOC ROI to that in first-tier areas was 2.14, whereas this ratio was 1.1 for the background (nf2), consistent with the idea that the LOC selectively processes the figure region but not the background region. The ratio of LOC to Tier 1 response was lower for all four types of symmetric stimuli. For each of these symmetric stimuli, activity in the LOC was roughly comparable to activity in first-tier visual areas, producing ratios close to 1 in all conditions for both stimulus tags. Paired t-tests revealed that all symmetric region response ratios for nf1 differed significantly from the asymmetric figure response (p < .05), whereas none differed significantly from the asymmetric background response at nf2 (p > .17).
Repeated measures ANOVA performed on the ROI response ratios revealed a significant condition by component interaction [F(4,40)=4.61, p=0.038]. Subsequent planned comparisons of the differences between the LOC/Tier 1 ratios for the two tags over the five configurations revealed that this interaction was driven by the preference of the LOC for the conventional figure-ground display configuration (see Table 1).
Through the use of a frequency-tagged SSVEP paradigm, we find that the classic figure-ground arrangement of a smaller figure on a larger background is both necessary and sufficient to activate at least portions of lateral occipital cortex that have been previously associated with object-level and category specific processing. This temporal tagging procedure allowed us to separate and measure the evoked response driven by the different regions of our test images and to determine the relative preference of different cortical areas for the different tagged regions and configurations. By using texture-defined forms, we eliminated many confounding cues to figure-ground assignment that are present in real objects. Nonetheless, texture-defined forms contain a rich set of cues that can be precisely controlled in order to evaluate the effects of shape and configuration on figure-ground assignment and access to LOC.
In traditional figure-ground experiments, including our previous studies, a stable figure-ground assignment is typically obtained with the figure smaller than, and surrounded by, the background. The present data directly show, via the tagged response profiles, that both relative size and surroundedness are important controlling factors in determining access to the “object-specific” LOC. None of the stimuli composed of symmetric texture regions preferentially activated the LOC. These stimuli, by definition, lack a relative size difference between regions. Each of the symmetric stimuli contains texture boundaries, but these alone were not sufficient to trigger object processing in the LOC. Further, none of our symmetric stimuli had a clear sense of surroundedness. We thus conclude that surroundedness and a relative size difference, combined with spatiotemporal discontinuity cues, are both necessary and sufficient to cause a stimulus region to be preferentially processed through both first-tier (V1-V3) and LOC ROIs.
Data from the symmetric pair condition indicate that a small region presented in the central visual field is not sufficient to produce differential activity in the LOC ROI. The ratio of LOC to Tier 1 activity was less than 1 in this condition, instead of about 2 as it is for the asymmetric figure-ground configuration. This control is important because the central-field representation of the areas comprising Tier 1 lies at the occipital pole, relatively close to the LOC ROI. If the resolution of the inverse were insufficient, a preference for a small, centrally fixated figure could be confused with a foveal Tier 1 response. The strong dependence of the central figure-region response on configuration rules out this explanation. We had previously discounted this explanation in a control experiment in which we varied the retinal eccentricity of the figure region: the figure-region response was largely unaffected by this manipulation, as would be expected of the weakly retinotopic LOC but not of the strongly retinotopic Tier 1 areas (see Figure 7 in Appelbaum, et al., 2006). The fact that neither member of the symmetric pair elicits a preferential response suggests that neither region is being treated as a figure, despite the strong asymmetry between these two small regions and the larger static background. As shown in our previous work, however, a single modulating patch of comparable size presented on a gray background does produce a preferential response in the LOC ROI. This suggests a rather high degree of selectivity in which configurations the LOC treats as unitary objects. It thus appears that a relative size difference between regions with unitary temporal characteristics is needed to activate the LOC.
The presence of spatiotemporally distinct regions separated by kinetic boundaries is also not sufficient to selectively activate the LOC. This result is consistent with a previous fMRI study that found that motion-defined gratings activate a focal region of dorsolateral occipital cortex that is distinct from the regions preferentially activated by motion-defined objects (Grill-Spector, et al., 1998). Motion-defined gratings, such as those originally used to define the Kinetic Occipital region (Dupont, et al., 1997), portray structure through regional differences in motion direction and have an ambiguous figure-ground relationship between the different image regions. Our gratings had a similarly ambiguous global structure, but here the borders were created by differences in temporal frequency rather than in direction of motion. Our results are also consistent with those of Vinberg and Grill-Spector (2008), who argued on the basis of fMRI results that the LOC processes shapes, not edges or surfaces. Through the use of temporal tagging, we are able to separately analyze the fate of both the nominal figure region and the background. These contributions could in principle be studied with fMRI in retinotopic areas, where the figure and background regions have spatially distinct representations, but not in weakly retinotopic or non-retinotopic areas such as the LOC. Vinberg and Grill-Spector (2008) and Haushofer et al. (2008) also found that the LOC responds more strongly to objects than to holes with the same shape. Our results are not sufficient to determine whether the mechanisms we have identified in the SSVEP also make this distinction.
In summary, our results indicate that the Gestalt cue of surroundedness is both necessary and sufficient to selectively activate portions of lateral occipital cortex previously associated with object-level processing.
Supported by INRSA EY14536, EY06579 and the Pacific Vision Foundation. The authors wish to thank Spero Nicholas and Nico Boehler for their helpful comments on this manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.