In everyday environments, objects frequently go out of sight as they move and our view of them becomes obstructed by nearer objects, yet we perceive these objects as continuous and enduring entities. Here, we used functional MRI with an attentive tracking paradigm to clarify the nature of perceptual and cognitive mechanisms subserving this ability to fill in the gaps in perception of dynamic object occlusion. Imaging data revealed distinct regions of cortex showing increased activity during periods of occlusion relative to full visibility. These regions may support active maintenance of a representation of the target’s spatiotemporal properties ensuring that the object is perceived as a persisting entity when occluded. Our findings may shed light on the neural substrates involved in object tracking that give rise to the phenomenon of object permanence.
When viewing a moving object, information about the object’s existence and speed is available directly, and the time at which the object will arrive at a particular location can be estimated by extrapolating its trajectory (McBeath, Shaffer, & Kaiser, 1995). These estimates can be updated continually based on visible information. When tracking a moving object that becomes temporarily hidden, however, its persistence, speed, and arrival time must be inferred from a mental representation of the object’s continued existence and its trajectory. In the real world, objects move in and out of our view, and parts of objects are often hidden by surfaces of the same object or by other nearby objects. Yet, seemingly without any effort, the visual system fills in the gaps and our perception of static and moving objects remains uninterrupted despite occlusion (Michotte, Thinès, & Crabbé, 1964; Nakayama, He & Shimojo, 1995), even when up to four targets are tracked simultaneously (Pylyshyn & Storm, 1988; Scholl & Pylyshyn, 1999).
We reasoned that we would be able to isolate the neural correlates of dynamic object occlusion by comparing cortical activity as observers viewed a dynamic occlusion stimulus (i.e., a moving object that becomes temporarily hidden) to activity when viewing a moving, fully visible object. Participants in our task were asked to estimate the arrival time of a moving object at a prespecified location in the display as cortical activity was recorded using magnetic resonance imaging. We considered the possibility, in addition, that estimates of arrival time of a concealed object might rely on a “time-keeping” strategy, rather than a mental representation of object persistence and speed. To distinguish a time-keeping strategy from true object tracking, we compared activation patterns as observers viewed a dynamic occlusion display to activity when viewing a stimulus in which a moving object went out of sight and back into view via shrinking and expansion. This means of disappearance/reappearance has been shown to disrupt perception of object persistence when observers track multiple targets in occlusion displays (Scholl & Pylyshyn, 1999). Perception of persistence is maintained by accretion and deletion of objects by an occluding or virtual surface, a means of disappearance/reappearance that is thought to have greater “ecological validity” (Gibson, 1979).
We used functional magnetic resonance imaging (fMRI) to measure brain activity as observers maintained central fixation and covertly tracked a visual target translating continuously and repetitively on a constant linear trajectory in three types of trial (Unoccluded, Occluded and Shrinking; see Figure 1). Observers’ estimates of target arrival time at a specified location in the stimulus, indicated by a vertical dotted line, were recorded with a button press task. In the Unoccluded condition, we hypothesized that performance would be based on information available directly in the stimulus. In the Occluded condition, we hypothesized that observers would rely on an internal representation of target persistence and trajectory when estimating arrival time. The Shrinking condition, in contrast, was not expected to involve a stable representation of a persisting object, and we hypothesized that observers would employ a non-object based strategy, such as time-keeping, to perform accurately on the task. We predicted that observers would employ different cognitive strategies in each of the conditions, which in turn would yield differences in networks of cortical activation across the three conditions and provide evidence for the computations that subserve perception of dynamic object occlusion.
Data from ten participants were included in the final analysis of behavioral and brain imaging data (5 females, 5 males). Four participants were observed but excluded from the analyses due to excessive motion artifact (1 female, 1 male), failure to understand and correctly perform the task (1 female), or failure to complete the scanning session due to fatigue (1 male). Prior to scanning, all participants performed at least two practice sessions on the behavioral task. The first training session occurred several days prior to scanning and the second training session occurred on the day of the scan session. During training, the experimenter instructed participants to maintain central fixation while covertly tracking the moving target and pressing a button to estimate its arrival time. Participants were instructed not to make eye movements during the task. The experimenter verified compliance by observing them during all practice trials. Voluntary written consent was obtained prior to testing. All aspects of the research were in compliance with safety guidelines for MR research conducted at the Center for Brain Imaging as well as the human subjects (IRB) committee at New York University.
Stimuli consisted of three types of visual display depicting a green spherical target (1º visual angle) moving horizontally across a gap (2.5º visual angle) defined by vertical gray dotted lines (Figure 1). In Unoccluded trials, the target remained visible throughout the trial (Figure 1, left). In Occluded trials, an invisible occluder between the dotted lines temporarily concealed the target in the center of its trajectory in an ecologically valid manner, via accretion and deletion of its visible surface (Figure 1, center). In Shrinking trials, the target imploded at the first set of dotted lines and expanded at the second, an ecologically invalid means of going out of and back into view (Figure 1, right). The target moved at 2º/s, taking 6 s to move from left to right, and 6 s to move back. All stimuli were created using Flash (Macromedia Studio MX 2004) and exported as QuickTime movies.
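The stimulus parameters above imply a few timing quantities that are not stated explicitly. The following is a minimal sketch of that arithmetic, assuming the occluder spans exactly the 2.5º gap between the dotted lines and that the target moves at a constant 2º/s; the function names are illustrative and not part of the original experiment code.

```python
# Illustrative arithmetic only; names below are hypothetical helpers.
SPEED_DEG_PER_S = 2.0   # target speed (from the text)
TRAVERSE_S = 6.0        # time for one left-to-right pass (from the text)
GAP_DEG = 2.5           # distance between the dotted lines (from the text)
TARGET_DEG = 1.0        # target diameter (from the text)


def path_length_deg(speed, duration):
    """Distance covered in one pass at constant speed."""
    return speed * duration


def gap_crossing_s(gap, speed):
    """Time for the leading edge to travel from the first to the second line."""
    return gap / speed


def fully_hidden_s(gap, target, speed):
    """Time during which no part of the target is visible, assuming the
    occluder fills the gap exactly (an assumption, not stated in the text)."""
    return max(gap - target, 0.0) / speed


print(path_length_deg(SPEED_DEG_PER_S, TRAVERSE_S))            # 12.0 degrees
print(gap_crossing_s(GAP_DEG, SPEED_DEG_PER_S))                # 1.25 s
print(fully_hidden_s(GAP_DEG, TARGET_DEG, SPEED_DEG_PER_S))    # 0.75 s
```

Under these assumptions, the ideal interval between the leading edge reaching the first and second dotted lines is 1.25 s, of which the target is entirely out of sight for 0.75 s.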
A run consisted of eight repetitions of each of the three trial types. Each run began with a 4-s central fixation and lasted 5 min 40 s. Within each run, stimuli were presented in a predetermined pseudorandom sequence and were counterbalanced for left-right direction of movement. Each stimulus was preceded by a 2-s interstimulus interval consisting of a centrally located fixation cross (.25º visual angle) against a black background.
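The stated run length can be checked against the trial structure. This sketch assumes each stimulus movie lasts 12 s (one 6-s pass in each direction, as described for the target's motion) and that every stimulus is preceded by exactly one 2-s interstimulus interval; the constant and function names are illustrative.

```python
# Illustrative check of the run timing; names are hypothetical.
N_TRIAL_TYPES = 3         # Unoccluded, Occluded, Shrinking
REPS_PER_TYPE = 8         # repetitions of each type per run
INITIAL_FIXATION_S = 4    # fixation at the start of the run
ISI_S = 2                 # interstimulus interval before each stimulus
STIMULUS_S = 6 + 6        # assumed: one pass left-to-right plus one pass back


def run_duration_s():
    """Total run length in seconds under the assumptions above."""
    n_trials = N_TRIAL_TYPES * REPS_PER_TYPE
    return INITIAL_FIXATION_S + n_trials * (ISI_S + STIMULUS_S)


def format_min_s(total_s):
    """Format a whole number of seconds as 'M min S s'."""
    minutes, seconds = divmod(total_s, 60)
    return f"{minutes} min {seconds} s"


print(format_min_s(run_duration_s()))  # 5 min 40 s
```

The result, 24 trials of 14 s plus the initial 4-s fixation, gives 340 s and matches the reported 5 min 40 s.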
Reaction times were recorded with an MR-compatible fiber-optic button press instrument inside the scanning module. Individuals were instructed to press down when the target’s leading edge reached the first dotted line and to release when the target’s leading edge reached the second dotted line. In Occluded trials, therefore, observers judged the target’s impending re-emergence from behind the occluding surface, and in Shrinking trials, observers judged when the target would begin to expand. All participants completed one practice run immediately prior to scanning and three experimental runs during the scan session, which were included in the final analysis of behavioral data. Each participant contributed at least 65 data points (a minimum of 45% of all possible observations).
Brain activity was measured with fMRI as observers participated in the behavioral task. MR scanning was performed in a 3-Tesla Siemens head-only research scanner equipped with a Siemens 3T Allegra whole-brain surface head coil. All testing was conducted in one session that consisted of structural anatomical scans and blood oxygenation level dependent (BOLD) functional scans for each subject. Each scanning session began by acquiring a set of low-resolution images in the sagittal, axial and coronal planes that were used for slice selection. A set of structural images was acquired using a T1-weighted spin echo pulse sequence (32 slices, 3 mm, TR = 600 ms, TE = 9.1 ms, total duration = 3 min 18 s). Then a set of 3D high-resolution magnetization prepared rapid acquisition gradient echo (MPRAGE) images was acquired (176 slices, 1 mm, TR = 2500 ms, TE = 4.38 ms, total duration = 10 min 42 s). A series of functional scans was performed using a standard GR-EPI BOLD pulse sequence with the same slice orientation prescription as the T1-weighted structural scan (32 slices, 3 mm, TR = 2000 ms, TE = 30 ms, flip angle = 80º, total duration = 5 min 40 s). The functional data were coregistered to the T1-weighted inplane anatomical images, which were then coregistered to the high-resolution images for ensuing data analysis.
Analysis of imaging data was carried out using FEAT (fMRI Expert Analysis Tool) Version 5.1, part of FSL (Smith et al., 2004; http://www.fmrib.ox.ac.uk/fsl) and MRIcro (Rorden & Brett, 2000). Caret was used for three-dimensional image rendering and optimal visualization of statistical parametric maps of activation (Van Essen, 2002; Van Essen, Dickson, Harwell, Hanlon, Anderson & Drury, 2001; http://brainmap.wustl.edu/caret). The model used a gamma function for convolving the hemodynamic response (phase = 0, standard deviation = 3 s, lag = 6 s). Preprocessing procedures included stripping the anatomical images of non-brain structures (BET), motion correction (MCFLIRT), and temporal high pass filtering (42 s cutoff). Statistical images were thresholded using clusters determined by a Gaussian Z > 3.0 and a corrected cluster significance threshold of p < 0.001. Statistical maps of activation differences were plotted in contrast comparisons between each of the three conditions (Unoccluded, Occluded, and Shrinking). Functional imaging data from each individual were co-registered to his/her own anatomical images (initial T1-weighted and 3D high resolution structural images), which were then co-registered to the MNI standard brain template for a group analysis.
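To make the hemodynamic model concrete, the sketch below builds a gamma-shaped response with a mean lag of 6 s and a standard deviation of 3 s and convolves it with a boxcar for one stimulus. It is a NumPy illustration under stated assumptions, not FSL's implementation: the gamma density is parameterized by shape = (lag/SD)² and scale = SD²/lag, the "phase" parameter is ignored, and the 0.1-s sampling step, function names, and example onset are hypothetical.

```python
import math

import numpy as np


def gamma_hrf(lag_s=6.0, sd_s=3.0, dt=0.1, length_s=32.0):
    """Gamma density with mean `lag_s` and standard deviation `sd_s`,
    sampled every `dt` seconds and normalized to sum to 1."""
    shape = (lag_s / sd_s) ** 2      # 4.0 for lag = 6 s, SD = 3 s
    scale = sd_s ** 2 / lag_s        # 1.5 s for lag = 6 s, SD = 3 s
    t = np.arange(0.0, length_s, dt)
    pdf = t ** (shape - 1) * np.exp(-t / scale) / (math.gamma(shape) * scale ** shape)
    return t, pdf / pdf.sum()


def predicted_bold(onsets_s, duration_s, run_s, dt=0.1):
    """Convolve a boxcar (1 during each stimulus, 0 elsewhere) with the HRF."""
    n = int(round(run_s / dt))
    boxcar = np.zeros(n)
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = int(round((onset + duration_s) / dt))
        boxcar[start:stop] = 1.0
    _, hrf = gamma_hrf(dt=dt)
    return np.convolve(boxcar, hrf)[:n]   # trim to the run length


# Hypothetical example: a single 12-s stimulus starting 6 s into a 340-s run.
response = predicted_bold(onsets_s=[6.0], duration_s=12.0, run_s=340.0)
t, hrf = gamma_hrf()
print("HRF peaks near", round(float(t[np.argmax(hrf)]), 1), "s after onset")
```

With these parameters the kernel peaks at (shape − 1) × scale = 4.5 s, earlier than its 6-s mean, because the gamma density is skewed to the right.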
Data consisted of latency differences between the observers’ judgments of target arrival and the actual time of target arrival. Results of a one-way ANOVA yielded a reliable effect of trial type on differences in accuracy judgments of the target’s arrival in each of the three conditions [Mean (SD): Unoccluded = 75.5 ms (111.5), Occluded = −53.6 ms (81.2), Shrinking = −3.2 ms (88.8); F(2, 18) = 12.89, p < .001]. Simple effects tests comparing pairs of trial types revealed reliable differences in all three contrasts (Unoccluded vs. Occluded, p < .01; Unoccluded vs. Shrinking, p < .01; Occluded vs. Shrinking, p < .05). Participants were highly engaged in tracking the stimulus in all three conditions, and accuracy was high (i.e., within 80 ms on average of the correct response). Nevertheless, there were reliable differences in mean latency of response between conditions, which supports our prediction that the three conditions evoked different cognitive operations or strategies for object tracking.
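The reported degrees of freedom, F(2, 18), are what a one-way repeated-measures ANOVA gives for three conditions (k = 3) and ten participants (n = 10): (k − 1, (k − 1)(n − 1)). The sketch below shows that computation, assuming a within-subjects design (which the degrees of freedom suggest) and one mean latency error per participant per condition. The input values are randomly generated placeholders, not the study's data, and the function name is illustrative.

```python
import numpy as np


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    `data` has one row per participant and one column per condition.
    Returns (F, df_condition, df_error).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between conditions
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between participants
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj                    # condition x participant
    df_cond = k - 1
    df_err = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_err / df_err)
    return f_value, df_cond, df_err


# Placeholder data only: 10 hypothetical participants x 3 conditions (ms).
rng = np.random.default_rng(0)
fake_latency_ms = rng.normal(loc=[75.0, -55.0, -5.0], scale=90.0, size=(10, 3))
f_value, df_cond, df_err = rm_anova_oneway(fake_latency_ms)
print(f"F({df_cond}, {df_err}) computed on placeholder data")
```

Removing the between-participant variance from the error term is what distinguishes this from a between-groups ANOVA, for which ten participants per condition would give error degrees of freedom of 27 rather than 18.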
Group analysis of imaging data revealed a network of regions differentially involved in the three types of object tracking conditions. Unless noted otherwise, all cortical activations reported were bilateral, although there were generally more clusters of activation in the right hemisphere. Minimum cluster size was 5 voxels for reported regions of interest. Contrast comparisons between Unoccluded, Occluded and Shrinking conditions yielded increased neural responses in all 10 participants in a network of cortical regions (Z > 3, p < .001). Cortical activation differences resulting from these contrast comparisons are depicted in Figures 2–4 and are listed in detail with XYZ coordinates in MNI space in Appendix 1.
The contrast of Occluded > Unoccluded (Figure 2, blue tones) was designed to isolate cognitive processes involved in tracking a target through space and time during periods of invisibility relative to full visibility. This contrast yielded activation differences in the inferior temporal cortex (lateral/superior region of fusiform and lingual gyri, BA 27/37), posterior and anterior regions of middle and superior temporal cortex (BA 21/22/38/48), insula (BA 48), cuneus (BA 18), inferior parietal lobule (BA 39/40), midbrain regions (hippocampus, thalamus, caudate and putamen), the cerebellum, the precentral sulcus (BA 6), and several regions of prefrontal cortex (anterior dorsolateral prefrontal regions, BA 10/46, ventrolateral prefrontal regions, BA 11/44/45/47, and along the medial wall in the superior frontal cortex in pre-SMA and anterior cingulate, BA 6/8/32). The most robust activation differences were found in four areas: 1) inferior parietal lobule, 2) superior temporal sulcus, 3) pre-SMA, and 4) precentral sulcus. Conversely, a contrast of Unoccluded > Occluded (Figure 2, orange-yellow tones) was designed to isolate processes involved in tracking a target through space and time during periods of full visibility relative to invisibility. This contrast yielded activation differences in inferior and mid-occipital regions of extrastriate cortex (BA 18/19), inferior temporal cortex (medial/ventral region of fusiform gyrus, BA 37), superior parietal lobe near the posterior IPS (BA 5/7) and right superior frontal sulcus (BA 6).
A contrast comparison of Unoccluded > Shrinking (Figure 3, orange-yellow tones) was designed to isolate processes involved in tracking a fully visible target over a period of time relative to tracking a target undergoing an ecologically invalid means of disappearing and reappearing. Results of this contrast yielded activation differences in inferior and mid-occipital regions of extrastriate cortex (BA 18/19), inferior temporal cortex (fusiform gyrus, BA 37), hippocampus and parahippocampal cortex, middle and superior temporal cortex (BA 21/22/48, left BA 38), superior parietal lobe (BA 5/7) as well as in superior and mid-orbital prefrontal regions (BA 6, BA 9, BA 9/46/45, BA 10, left BA 9/32). On the other hand, a contrast of Shrinking > Unoccluded (Figure 3, green tones) was designed to isolate processes involved in tracking a moving target undergoing an ecologically invalid means of disappearing and reappearing relative to tracking a fully visible target over a period of time. Results of this contrast yielded activation differences in the cuneus (BA 18), lingual gyrus (BA 37), medial frontal cortex in pre-SMA and anterior cingulate (BA 6/8/32) and right inferior frontal gyrus (BA 45/47).
The contrast comparison of Occluded > Shrinking (Figure 4, blue tones) was designed to isolate cognitive processes involved in tracking a moving object through space and time during periods of invisibility relative to tracking a moving target undergoing an ecologically invalid means of disappearance/reappearance. Results of this comparison yielded activation differences in superior and mid-occipital cortex (BA 18/19) including the cuneus (medial BA 18), superior parietal lobule (BA 5/7) including the precuneus (BA 5), inferior temporal cortex (BA 37), middle and superior temporal cortex (BA 21/22/48), insula (BA 48), midbrain regions including the hippocampus and parahippocampus (BA 27/30/37), thalamus, caudate and putamen, as well as in frontal regions of cortex in the pre- and postcentral sulci (BA 4/6), superior frontal cortex (BA 9), dorsolateral prefrontal cortex (BA 45/46) and right mid-orbitofrontal cortex (BA 10/11/47). Interestingly, the contrast of Shrinking > Occluded (Figure 4, green tones) yielded no voxels showing greater activation in Shrinking relative to Occluded trials.
Maintenance of object representations across temporary gaps in space and time might comprise a combination of lower and higher perceptual and cognitive mechanisms. We used an object tracking task and functional neuroimaging to begin to clarify these mechanisms, in part by distinguishing them from mechanisms involved in tracking fully visible objects and from mechanisms involved in estimating a simple temporal gap. Results from our behavioral task demonstrated that observers were highly engaged in the tracking task, and judgments of target arrival time were reliably different across the three conditions, suggesting that different strategies may have been employed. When the target was fully visible, judgments of target arrival time tended to be late, implying a strategy in which observers tracked the visible motion and initiated a button press as the target reached the appropriate location. When the target was occluded, judgments of target arrival time tended to be early, suggesting that observers anticipated its arrival, and did not wait to press when it reappeared, which would have led to a longer latency. This suggests that observers maintained a representation of the spatiotemporal information during occlusion and interpolated the invisible motion of the target. When the target shrank and expanded, judgments of target arrival time were close to the actual time of reappearance. Our intuitions of a time-keeping strategy were supported anecdotally: Several observers reported that they employed a “counting” strategy in which they simply tried to time the reemergence by learning the timing and counting to themselves after the target imploded. These reaction time differences suggest that each of the three conditions evoked a different cognitive state, mental operation, or strategy for performing the task as accurately as possible.
The Unoccluded > Occluded and Unoccluded > Shrinking contrasts yielded increased activation in extrastriate (occipital, inferior temporal and posterior parietal) visual cortical areas. This is not surprising given that in unoccluded trials observers tracked a continuously visible target object moving in a constant trajectory. Increased activation in these regions of extrastriate cortex has been previously reported during conditions of attentive tracking relative to passive viewing of multiple moving objects (Culham, Brandt, Cavanagh, Kanwisher, Dale & Tootell, 1998). The Occluded > Unoccluded contrast yielded increased activation in precentral sulcus, inferior parietal lobule, temporal cortex, and prefrontal cortical regions along the dorsal medial wall. Notably, almost identical foci in the precentral sulcus and inferior parietal lobule show an attentive tracking load effect (i.e., activation increases with the number of moving items that are simultaneously tracked) (Culham et al., 2001). Moreover, these same areas also activate during visual working memory (Courtney et al., 1997) and sustained visual attention (Serences & Yantis, in press). The Occluded > Shrinking contrast yielded increased bilateral activation in lateral occipital cortex (LOC) that may have been evoked by the continued representation of the object. Consistent with this interpretation, several other neuroimaging studies also report increased neural activity in LOC during tasks of form perception and object recognition (Lerner, Hendler & Malach, 2002; Malach, Reppas, Benson, Kwong, Jiang, Kennedy, Ledden, Brady, Rosen & Tootell, 1995), illusory contours (Ffytche & Zeki, 1996; Hirsch, DeLaPaz, Relkin, Victor, Kim, Li, Borden, Rubin & Shapley, 1995), and perceptual completion of static surfaces (Mendola, Dale, Fischl, Liu & Tootell, 1999; Stanley & Rubin, 2003). In all comparisons against Unoccluded, we found increased activation in the pre-SMA.
Past studies have found increased activation in this area when an internal representation of time must be maintained (e.g., estimating a length of time or the timing of a motor response).
Maintaining active representations of objects through occlusion is likely accomplished by a combination of mechanisms such as perceptual completion (Nakayama, He & Shimojo, 1995), selective attention (Awh, Jonides & Reuter-Lorenz, 1998; Scholl, 2001), and visual working memory (Pasternak & Greenlee, 2005). Moreover, mechanisms supporting inferred motion and trajectory extrapolation (Assad & Maunsell, 1995; Barborica & Ferrera, 2003) and preparatory oculomotor behaviors (Curtis, 2006; Curtis & D’Esposito, 2006a; Curtis, Rao & D’Esposito, 2004) constantly update the visual system with the moving target’s changing location in space. These mechanisms may work in concert to maintain a representation of the moving object in space and time despite perceptual interference such as occlusion.
Selective attention may serve as a crucial higher order mechanism that facilitates representations of dynamic object occlusion in visual working memory. Increased responses in posterior parietal cortex have been associated with maintaining the locus of visuospatial attention in working memory (Todd & Marois, 2004), and the overall magnitude of posterior parietal activation may be an indicator of observers’ visual working memory capacity (Todd & Marois, 2005). In addition, several frontal, posterior parietal, and temporal cortical areas show evidence of persistent activity during delay periods when observers maintain a representation of an object or its position in working memory (Curtis & D’Esposito, 2006b). Therefore, the activations associated with attentive tracking through occlusion that we report here may reflect some of the same mechanisms that support maintenance of an object’s spatiotemporal information in visual working memory.
A recent study compared human parietal cortex activation during an occlusion task with activation during a task in which the object simply blinked out of existence (Olson, Gatenby, Leung, Skudlarski & Gore, 2003). The authors found that a portion of the posterior parietal cortex, bilaterally, showed a greater response during occlusion. We report here that what appears to be the same portion of the parietal cortex showed greater activation during occlusion than shrinking. This activation may reflect the activity of neurons in posterior parietal cortex that are motion sensitive and increase their rates of firing during a period when a moving object is briefly occluded (Assad & Maunsell, 1995). Therefore, the posterior parietal cortex may be involved in processing spatiotemporal properties, including the inferred motion, of objects. Moreover, we extend the Olson et al. results by demonstrating that the posterior parietal cortex is only one part of a larger neural network, additionally including LOC, superior temporal, superior frontal, and premotor cortices, whose regions together are involved in tracking objects through occlusion. Further studies are necessary to tease apart the relative contributions of these areas in object tracking.
These data provide evidence of separate mechanisms involved in maintaining an object representation during covert tracking under conditions of full visibility versus two kinds of temporary concealment. The evidence points to different networks of cortical regions supporting the cognitive mechanisms (i.e., form and motion perception, spatial attention, and visual working memory) that we propose to be involved in object tracking. More importantly, we used functional neuroimaging techniques to isolate the cognitive operations supporting the active maintenance of an object’s spatiotemporal representation and ultimately the mental state of tracking an object continuously through occlusion.
We thank the MRI Users Group at NYU for helpful suggestions during earlier stages of this research, and Dorothy Schirkofsky for programming the original experiment. SMS thanks David Heeger and Souheil Inati for discussions of experimental design and data analysis. This research was supported in part by federal funds from NIH (R01-HD40432 and R01-HD048733) and NSF (BCS-0418103) to SPJ and from The Beatrice and Samuel A. Seaver Foundation administered by the Center for Brain Imaging at New York University to SMS.
The data reported in this experiment have been deposited in the fMRI Data Center (http://www.fmridc.org). The accession number is _____.