We examined the role of visual attention in the multiple object tracking (MOT) task by measuring the amplitude of the N1 component of the event-related potential (ERP) to probe flashes presented on targets, distractors, or empty background areas. We found evidence that visual attention enhances targets and suppresses distractors (Experiments 1 and 3). However, we also found that when tracking load was light (two targets and two distractors), accurate tracking could be carried out without any apparent contribution from the visual attention system (Experiment 2). Our results suggest that attentional selection during MOT is flexibly determined by task demands as well as tracking load and that visual attention may not always be necessary for accurate tracking.
We often need to monitor several objects in our environment simultaneously. Everyday examples include team sports such as basketball, in which each player needs to keep track of teammates as well as opponents, and driving, in which drivers need to know the positions and motion trajectories of cars in neighboring lanes while also monitoring oncoming traffic. The relative ease with which we do this might suggest that we simply monitor all the objects in our visual field, but it is easy to show that this isn't the case, as the multiple object tracking task demonstrates.
The multiple object tracking (MOT) task was developed by Pylyshyn and Storm (1988) as a laboratory analogue of the divided visual attention situations described above. In this task, observers are first shown a small set (8-16) of identical objects (e.g., black circles). A subset of them is designated as targets by briefly flashing them, and then all objects move randomly about the display while the observer attempts to keep track of the targets. At the end of the trial, the objects become stationary and the observer selects the target objects. The main finding is that observers can accurately track approximately four objects and that accuracy declines precipitously once this limit is exceeded. Thus, divided attention appears to be limited to four objects, a limit that also appears in a variety of other visual attention tasks (see Cowan, 2001) such as whole report (Sperling, 1960), change detection (Luck & Vogel, 1997; Pashler, 1988), and subitizing (Mandler & Shebo, 1982; Trick & Pylyshyn, 1994).
Here we consider how people selectively attend to targets and exclude distractors in the MOT task. Several different selection mechanisms are known to operate in vision, including selection by features, locations, and objects. For example, when we search for a shape such as a letter, we “activate” the features of that shape (“angular”, “horizontal line”), with the result that only visual objects that possess those features interfere with our search (Bacon & Egeth, 1994). This mechanism doesn't seem to be at work in MOT because all the objects are identical and cannot be distinguished on the basis of their features.
Selection by location has been extensively studied beginning with Helmholtz (1867/1925; see Wright & Ward, 2008 for a history of these early investigations of visual attention) who noted that we can voluntarily pay attention to peripheral locations without changing our fixation, an ability known as covert orienting (Posner, 1980). The result is faster and more accurate detection and identification of information appearing in the attended area (Averbach & Coriell, 1961; Eriksen & Hoffman, 1972, 1973; Eriksen & Yeh, 1985; Posner, 1980; Posner, Snyder, & Davidson, 1980). This mechanism has been likened to a “spotlight” illuminating the attended area. The spotlight of attention appears to have a limited spatial resolution (Intriligator & Cavanagh, 2001) and may include spatial locations adjacent to the attended area (Eriksen & Hoffman, 1972, 1973; Hoffman & Nelson, 1981). In addition, it has been likened to a zoom lens that can be focused on a small area to provide detailed information about a few objects or can have a broad focus to yield less detailed information about many objects (Eriksen & St. James, 1986; Eriksen & Yeh, 1985). The attentional spotlight can also be divided and simultaneously directed to different spatial locations (Awh & Pashler, 2000; Franconeri, Alvarez, & Enns, 2007; Malinowski, Fuchs, & Muller, 2007; McMains & Somers, 2004; Muller, Malinowski, Gruber, & Hillyard, 2003).
In addition to feature- and location-based selection mechanisms, researchers have identified a third possible selection mechanism based on objects (see Scholl, 2001 for a review), although it is presently unclear whether this system might ultimately be reducible to selection based on some combination of features and location. For example, O'Craven, Downing, and Kanwisher (1999) presented observers with a partially transparent picture of a house superimposed on a similarly formatted picture of a face. On some trials the face moved back and forth while the house remained stationary, and observers were required to monitor the direction of motion. O'Craven et al. found that brain areas known to be activated by faces (the “fusiform face area”) were also active here, while “house areas” (the parahippocampal place area) were relatively silent. This pattern was reversed when observers attended to the house image. These results show, first, that superimposed images can be separately attended even though they occupy the same general location and, second, that attending to one feature of an object (its motion) results in the selection of other aspects of that same object (its shape: face or house). The selection process itself, which is responsible for admitting faces and excluding houses, might be based on the feature of motion. However, the unit of selection appears to be the whole object and not just individual features (Blaser, Pylyshyn, & Holcombe, 2000; Duncan, 1984; Egly, Driver, & Rafal, 1994).
Pylyshyn and colleagues (Pylyshyn, 2001, 2003; Pylyshyn & Storm, 1988; Sears & Pylyshyn, 2000) have offered a theory of performance in the MOT task that rejects both features and spatial locations as the basis of selection. Pylyshyn (2003) acknowledges the need for a focal attention system that can select different locations in the visual world but argues that there is an earlier, more primitive selection mechanism that serves the function of “indexing” or pointing to objects. Importantly, he maintains that “objects are indexed directly, rather than via their properties or their locations” (p. 202). Each object to be tracked is assigned an index that passively adheres to that object as it moves. The finding that observers can only track four objects suggests a corresponding limit in the number of available indexes (Pylyshyn & Storm, 1988; Scholl & Pylyshyn, 1999; Scholl, 2001; for a review see Scholl, 2009).
According to Pylyshyn, object tracking is carried out by an indexing system that is separate from the location-based spatial attention system. Indexes allow people to determine properties of the tracked objects, such as their color, location, or shape, but these properties themselves are not the basis for tracking. If one needs to know the color or some other property of a tracked object, visual attention can be allocated to a particular object using the index as a pointer to the relevant object. Thus, it is the indexing system that distinguishes targets from distractors, and this mechanism is distinct from the purely location-based “spotlight” of attention. This is an important claim because it is pivotal to the assertion that the proposed indexing system, which is presumed to be the principal basis for attending to objects, is a separate selection mechanism and cannot simply be reduced to some combination of feature- and location-based attention systems. The initial evidence for this claim was provided by Pylyshyn and Storm (1988), who argued that a unitary focus of spatial attention could not be moved among tracked objects quickly enough to support the accurate performance that was observed in the typical MOT experiment. They argued instead for the existence of a set of indexes that can track the objects in parallel.
This argument, however, doesn't exclude a variation of attentional selection in which attention may be split into multiple, independent foci (Awh & Pashler, 2000; Franconeri, Alvarez, & Enns, 2007; Malinowski, Fuchs, & Muller, 2007; McMains & Somers, 2004; Muller, Malinowski, Gruber, & Hillyard, 2003), each devoted to one of the tracked objects (Cavanagh & Alvarez, 2005). In fact, the multi-focal attention model appears quite similar to the indexing model with indexes and attentional foci playing similar roles. Both models suggest that MOT is accomplished by virtue of a limited resource (indexes or visual attention) being simultaneously allocated to tracked objects.
Current evidence doesn't appear to discriminate between these two models. For example, a variety of evidence has accumulated supporting the object-based nature of tracking. When observers attempt to track one end of a barbell-shaped object, tracking accuracy declines because they cannot keep the two ends separate (Scholl, Pylyshyn, & Feldman, 2001), as would be expected if they are tracking a single object rather than part of an object. Similarly, observers have trouble tracking substances that flow smoothly from one location to another (vanMarle & Scholl, 2003), perhaps because the visual system maintains a distinction between objects and substances and tracking can only be applied to the former. Although this evidence is consistent with the claim that at least sometimes the unit of selection is an object, it doesn't rule out the possibility that the mechanism of selection is still spatial in nature.
There is also evidence that seems to argue against a simple indexing mechanism, particularly the idea that limits on object tracking are due to an underlying limit on the number of indexes. First, the number of objects that can be accurately tracked is not fixed at four (or any other number for that matter). Alvarez and Franconeri (2007) showed that this number could vary from one object to as many as eight depending on factors such as speed and spatial separation between the objects. They suggested that the underlying limit in MOT was a finite processing capacity that is shared among the tracked objects. This system can only track a small number of items when conditions are difficult and each object demands a large share of the available capacity but it can track a much larger number of objects when tracking is easy and each object makes small demands on the total capacity.
The limited-capacity mechanism identified by Alvarez and Franconeri (2007) might be visual attention, which becomes “diluted” as it is divided among additional locations containing target objects. And just as visual attention has a limited spatial resolution, so too does MOT (Franconeri, Lin, Pylyshyn, Fisher, & Enns, 2008; Shim, Alvarez, & Jiang, 2008). Several experiments have shown that one important source of errors in MOT is the tendency of observers to start tracking a distractor by mistake if it passes too close to a target (O'Hearn, Landau, & Hoffman, 2005; Pylyshyn, 2004). This kind of error seems to follow naturally from the idea that the spotlight of visual attention is limited in its spatial resolving power, as has been noted by many other investigators (for example, Eriksen & Hoffman, 1972, 1973; Eriksen & Eriksen, 1974; Hoffman & Nelson, 1981; Intriligator & Cavanagh, 2001).
One way to evaluate the role of visual attention in MOT is to require observers to detect brief probes presented during the MOT task. If a separate focus of attention is devoted to each target, then probes presented on targets should be detected better than probes presented on distractors or background areas. This prediction rests on the observation that when visual attention is allocated to a location, all information in the attended region receives enhanced processing, even when it is irrelevant (Luck, Fan, & Hillyard, 1993). Therefore, we would expect that if observers allocate visual attention to an object for the purpose of tracking it, there will also be benefits in processing other information, such as probe objects, that appear in the same spatial neighborhood as the tracked object. Note that, according to Pylyshyn's (2003) conception of indexes, such a finding would not necessarily follow from indexing theory because indexes allocated to tracked objects do not automatically provide access to any of those objects' properties, let alone other visual information, such as probes, that are distinct objects in their own right. In addition, indexes do not provide explicit information about an object's spatial location. Spatial location, like color, is simply another object property that can be retrieved, if needed, by using the index as a pointer, so there isn't any reason why probes that are near to tracked objects should enjoy any advantage over more distant ones.
Nevertheless, Pylyshyn (2006) performed just such a probe experiment and found, surprisingly, that probe detection was better at both target and background locations than at distractor locations. This result doesn't easily fit with the idea that visual attention is being preferentially allocated to targets. Instead, Pylyshyn suggested that distractors are being inhibited. This makes sense because it is distractors passing close to targets that pose the main threat to good performance in the MOT task. A more recent paper (Pylyshyn, Haladjian, King, & Reilly, 2008) showed that distractors that could be preattentively segregated from targets by appearing in a different depth plane received less inhibition, reinforcing the idea that inhibition is applied to distractors when they are close to and could be mistaken for targets.
Although probe detection experiments appear to provide intriguing evidence about how attention is allocated during MOT, we have to keep in mind that observers are being placed in a dual-task situation and may be adopting strategies that are different from those used in the “pure” MOT task. In addition, better probe detection could reflect attentional effects at a variety of levels of processing (Vogel, Woodman, & Luck, 2005). An alternative approach investigates the role of visual attention in MOT by examining brain activity, in the form of event-related brain potentials (ERPs), elicited by task-irrelevant “probe flashes” presented on targets, distractors, or background areas. One advantage of this approach is that ERP amplitude can provide a measure of attention without requiring observers to make any judgment about the probes and therefore may constitute a relatively unobtrusive measure of attention during MOT. In addition, different ERP components, such as the N1 and P300, appear to index different stages of attention, so such measures may also provide information about the level at which attention operates during MOT.
Extensive prior research shows that the amplitudes of two “early” ERP components (the P1 and N1) that are generated in extrastriate visual cortex are larger for stimuli appearing in attended spatial locations (Di Russo, Martinez, & Hillyard, 2003; Di Russo et al., 2001; Hillyard & Anllo-Vento, 1998; Luck, Woodman, & Vogel, 2000; Mangun & Hillyard, 1988; Martinez et al., 2001). Importantly, P1 and N1 amplitudes have been shown to be unaffected by attention to stimulus features such as color or size (Hillyard & Anllo-Vento, 1998; Hillyard & Munte, 1984). More recent experiments have shown similar effects for probes appearing on attended surfaces or objects (Martinez et al., 2006; Martinez, Ramanathan, Foxe, Javitt, & Hillyard, 2007; Martinez, Teder-Salejarvi, & Hillyard, 2007; Valdes-Sosa, Bobes, Rodriguez, & Pinilla, 1998) but these object-based effects could be mediated by visual attention conforming to the shapes of attended objects and surfaces. Current evidence then indicates that the amplitude of these two components is specific to visuo-spatial attention and is not affected by feature-based attention. If visual attention is preferentially allocated to targets during MOT then we would expect larger P1/N1 components for target flashes relative to both distractor and background flashes. Alternatively, if visual attention suppresses distractor items during MOT, as suggested by Pylyshyn (2006), then we would expect to observe larger P1/N1 components for both target and background flashes relative to distractor flashes.
Drew, McCollough, Horowitz, and Vogel (2009) applied this logic to probes appearing on targets, moving distractors, stationary distractors, and blank display areas. They found that target probes elicited larger P1 and N1 components than other probes, suggesting that visual attention is enhancing tracked objects during MOT. The present Experiment 1 is similar in many respects to the study carried out by Drew et al. but uses moving texture-defined objects rather than the luminance-defined objects used by Drew et al. in an effort to equate the physical properties of probes occurring on objects and background areas (see Flombaum & Scholl, under revision; Flombaum, Scholl, & Pylyshyn, 2008 who used similar stimuli in an MOT task). In addition, this experiment serves as a comparison for later experiments that required responses to the probes (Experiment 2) and increased tracking load (Experiment 3).
In this experiment, participants tracked two out of four objects while probe flashes were presented on targets, distractors, or the background. The probe flashes were irrelevant to the observers: they did not require any behavioral response, and they could not be used to aid tracking because they were equally likely to appear at any of these locations. As a result, we were able to use the amplitude of the P1/N1 components elicited by the probe flashes to measure the allocation of visual attention while participants were performing the MOT task by itself.
Eighteen right-handed, neurologically normal volunteers (ages 18-30 years) participated in this experiment. With the exception of MD, all were naïve to the purpose of the experiment and were paid $10/hour for their participation. All participants reported normal or corrected-to-normal acuity and provided informed consent.
Stimuli were generated on a Dell 2.99 GHz computer running custom software written with Blitz3D (Sibly, 2005) and presented on an 18-inch Dell CRT (1024 × 768 pixel resolution; 75 Hz frame rate). Eye fixation was monitored using a Tobii x50 50-Hz eye tracker (Tobii Technology, Stockholm, Sweden) controlled by a Sony 2.86 GHz computer. Testing was conducted in a dimly lit, electrically shielded room with a chinrest maintaining a 70 cm viewing distance. Under these conditions, the total viewable area of the screen subtended approximately 27.5° by 21.1°, but stimuli appeared only in the central 19.7° by 14.0° of the screen, and the remainder of the screen was black.
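The viewing geometry above can be checked with the standard visual-angle relation, θ = 2·arctan(s / 2d), where s is the physical extent and d the viewing distance. The Python sketch below is our own illustration, not the software used in the experiment: it recovers the physical size implied by the reported 27.5° × 21.1° viewable area at 70 cm and gives a rough pixels-per-degree figure. That last figure assumes degrees map linearly onto pixels across the screen, an approximation that ignores flat-screen foreshortening toward the edges.

```python
import math

VIEW_DISTANCE_CM = 70.0     # chinrest viewing distance (from the Method)
SCREEN_PX = (1024, 768)     # display resolution (from the Method)
SCREEN_DEG = (27.5, 21.1)   # reported viewable area in degrees (from the Method)


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) of an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse relation: physical extent (cm) that subtends angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Physical size of the viewable area implied by the (rounded) reported angles
width_cm = size_for_angle_cm(SCREEN_DEG[0], VIEW_DISTANCE_CM)
height_cm = size_for_angle_cm(SCREEN_DEG[1], VIEW_DISTANCE_CM)

# Rough pixels-per-degree estimate (assumption: uniform, linear mapping)
px_per_deg = SCREEN_PX[0] / SCREEN_DEG[0]
object_px = 1.43 * px_per_deg   # approximate on-screen size of a 1.43 deg object

if __name__ == "__main__":
    print(f"viewable area ~ {width_cm:.1f} x {height_cm:.1f} cm")
    print(f"~{px_per_deg:.1f} px/deg; object ~{object_px:.0f} px")
```

Run as a script, it reports an implied viewable area of roughly 34 × 26 cm and an object size of roughly 53 pixels; these are derived estimates, not values reported in the study.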
Participants tracked 2 out of 4 objects (squares, 1.43°) that moved haphazardly throughout the display. The objects and background were constructed from randomly generated black (0.10 cd/m²) and white (76.75 cd/m²) pixels, resulting in displays in which the objects were invisible when stationary (examples of these displays can be viewed at http://hoffman.psych.udel.edu/TextureMOT/index.html). The primary advantage of this display format is that objects and background are physically identical during tracking (Flombaum et al., 2008). At the beginning of each trial, the objects were outlined by black placeholders so that they were visible to the observer. The initial location of each object was selected randomly with the constraint that objects could not overlap.
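The logic of the texture-defined display can be illustrated with a small Python sketch. It assumes, as our reading of the description above, that each object carries its own fixed random-dot texture while the background scrolls as a separate texture; the function names and the wrap-around scrolling of the background are illustrative choices, not details of the original Blitz3D program. Because every pixel in a single frame is an independent black or white sample, a stationary frame contains no luminance edge that marks an object; an object becomes visible only when its texture moves relative to the background.

```python
import random


def random_texture(width, height, rng):
    """Binary texture: 0 = black pixel, 1 = white pixel, each with probability 0.5."""
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]


def render_frame(background, bg_offset, objects, width, height):
    """Compose one display frame.

    background: background texture, wrapped around its edges so it can scroll indefinitely
    bg_offset:  integer (dx, dy) scroll of the background on this frame
    objects:    list of (x, y, texture); (x, y) is the top-left corner of a square object
                whose pixels come from its own texture, so the texture moves with the object
    """
    bg_h, bg_w = len(background), len(background[0])
    dx, dy = bg_offset
    frame = [[background[(y + dy) % bg_h][(x + dx) % bg_w] for x in range(width)]
             for y in range(height)]
    for ox, oy, texture in objects:
        size = len(texture)
        for y in range(size):
            for x in range(size):
                # Skip object pixels that fall outside the frame
                if 0 <= oy + y < height and 0 <= ox + x < width:
                    frame[oy + y][ox + x] = texture[y][x]
    return frame
```

Rendering successive frames with a changing `bg_offset` and changing object corners reproduces the key property of these displays: each frame alone is a uniform field of random dots, and only the relative motion of the textures segregates objects from background.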
Participants initiated each trial by pressing a mouse button which caused two of the objects' placeholders to blink on and off five times for 200 ms, identifying those objects as targets. Once the cueing was complete, the placeholders remained invisible while the objects and background moved for a period of 10 seconds. When the motion ended, the placeholders reappeared and participants used the mouse to select the targets. Immediate feedback was provided by changing the color of the placeholder to green for a correctly selected target and red for an incorrectly chosen distractor. This selection phase ended once participants selected 2 objects. Throughout the trial, a blue fixation circle (0.34°) was present at the center of the display and participants were instructed to maintain fixation. Additional feedback was provided at the end of the trial about the percentage of correct tracking trials (i.e., the percentage of trials in which both targets were correctly chosen) as well as the number of eye movements and blinks that were detected during the trial.
Motion trajectories were computed offline for each participant. At the beginning of each trial, horizontal and vertical speeds were randomly selected from the range ±2°/s and assigned to the objects and the background texture. To keep the objects visible, these initial speeds were constrained so that the objects and background always began with non-zero velocities. For the same reason, the objects and background moved in different directions with different velocities throughout the trial. To produce a “wandering” motion, the horizontal and vertical speeds of each object and of the background were adjusted randomly and independently in increments of up to ±0.5°/s, with the constraint that the speeds never exceeded 2°/s. On average, the objects and background moved at 1.2°/s. The object motions were also constrained so that the objects reflected off each other and off the edges of the display.
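The velocity random walk described above can be sketched in Python as follows. The speed limit and step size come from the text; the interval between velocity adjustments (`UPDATE_EVERY`), the per-frame position update at 75 Hz, the placement of the stimulus region around fixation, and the edge-reflection rule are our assumptions, since they are not reported. For brevity, the sketch bounces an object only off the display edges and does not implement object-object reflections or the requirement that objects and background always differ in velocity.

```python
import random

MAX_SPEED = 2.0       # deg/s, limit on each velocity component (from the Method)
MAX_STEP = 0.5        # deg/s, largest single velocity adjustment (from the Method)
FRAME_RATE = 75.0     # Hz, display refresh rate (from the Method)
UPDATE_EVERY = 15     # frames between velocity adjustments (ASSUMED; not reported)
STIM_BOUNDS = (-9.85, 9.85, -7.0, 7.0)  # central 19.7 x 14.0 deg region (ASSUMED centred)
HALF_SIZE = 1.43 / 2  # half the object width in degrees


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def initial_velocity(rng):
    """Draw a velocity component in [-2, 2] deg/s, redrawing until it is non-zero."""
    v = 0.0
    while v == 0.0:
        v = rng.uniform(-MAX_SPEED, MAX_SPEED)
    return v


def trajectory(n_frames, start, bounds, half_size, rng):
    """Return per-frame (x, y) centre positions, in degrees, for one object.

    Velocity components wander by uniform steps of up to +/-MAX_STEP and are clamped
    to +/-MAX_SPEED; a component reverses when the object's edge reaches a boundary.
    Object-object reflections are not modelled here.
    """
    x, y = start
    vx, vy = initial_velocity(rng), initial_velocity(rng)
    xmin, xmax, ymin, ymax = bounds
    dt = 1.0 / FRAME_RATE
    path = []
    for frame in range(n_frames):
        if frame and frame % UPDATE_EVERY == 0:
            vx = clamp(vx + rng.uniform(-MAX_STEP, MAX_STEP), -MAX_SPEED, MAX_SPEED)
            vy = clamp(vy + rng.uniform(-MAX_STEP, MAX_STEP), -MAX_SPEED, MAX_SPEED)
        x += vx * dt
        y += vy * dt
        # Reflect off the display edges by flipping the relevant velocity component
        if x - half_size < xmin or x + half_size > xmax:
            vx = -vx
            x = clamp(x, xmin + half_size, xmax - half_size)
        if y - half_size < ymin or y + half_size > ymax:
            vy = -vy
            y = clamp(y, ymin + half_size, ymax - half_size)
        path.append((x, y))
    return path
```

A 10-second trial corresponds to 750 frames, for example `trajectory(750, (0.0, 0.0), STIM_BOUNDS, HALF_SIZE, random.Random(7))`.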
To measure the allocation of attention during tracking using ERPs, 12 probe flashes (white squares that were identical in size to the objects; 76.75 cd/m²) were presented one at a time during each MOT trial. The first probe was presented 400-600 ms after the onset of motion on each trial, while the last probe occurred at least 900 ms before the motion ended. Probes remained visible for 106 ms (i.e., 8 refresh frames), and the time between probes was randomly drawn from the interval 400-600 ms (i.e., 30-45 refresh frames). On each trial, probes were distributed equally among targets, distractors, and the background in both visual fields and were presented in random order. All probes moved along the same trajectory as the surface on which they occurred. In other words, object probes followed the same trajectories as the probed object, and background probes followed the same trajectory as the background. Since the background moved according to constraints similar to those for the objects and both were composed of random textures, the background and object probes were physically comparable. Background probes were presented at randomly chosen locations such that the centers of the probes were at least 2.86° from the edge of the display and from the centers of all objects. Participants were informed that the probes could appear on any of the objects or the background, that they could not be used as an aid in tracking, and that they were irrelevant to the tracking task.
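The probe timing is easiest to reason about in refresh frames at 75 Hz: one frame lasts about 13.3 ms, so 8 frames ≈ 106.7 ms, 30 frames = 400 ms, and 45 frames = 600 ms. The Python sketch below generates one trial's schedule under stated assumptions: we treat the 400-600 ms "time between probes" as onset-to-onset, draw intervals uniformly over whole frames, and only label each probe with its location and visual field, leaving out the choice of which object or background patch is actually probed. The function name and structure are illustrative, not the original software.

```python
import math
import random
from itertools import product

FRAME_MS = 1000.0 / 75.0       # one refresh at 75 Hz
PROBE_FRAMES = 8               # probe duration, ~106.7 ms
TRIAL_FRAMES = 750             # 10 s of motion
MIN_TAIL_FRAMES = math.ceil(900 / FRAME_MS)  # last probe ends >= 900 ms before motion stops
LOCATIONS = ("target", "distractor", "background")
FIELDS = ("left", "right")
PROBES_PER_CELL = 2            # 12 probes / (3 locations x 2 visual fields)


def probe_schedule(rng):
    """Return [(onset_frame, location, field), ...] for the 12 probes of one trial.

    Assumption: the 400-600 ms interval between probes is measured onset to onset.
    """
    conditions = [cell for cell in product(LOCATIONS, FIELDS)
                  for _ in range(PROBES_PER_CELL)]
    rng.shuffle(conditions)                 # random order within the trial
    onset = rng.randint(30, 45)             # first probe 400-600 ms after motion onset
    schedule = []
    for location, field in conditions:
        schedule.append((onset, location, field))
        onset += rng.randint(30, 45)        # next onset 400-600 ms later
    last_end = schedule[-1][0] + PROBE_FRAMES
    if TRIAL_FRAMES - last_end < MIN_TAIL_FRAMES:
        raise ValueError("schedule violates the 900 ms end-of-trial margin")
    return schedule
```

Under the onset-to-onset assumption the last probe starts no later than frame 540, so the 900 ms margin at the end of a 750-frame trial is always satisfied.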
Each participant viewed a total of 100 experimental MOT trials in a single session. As a result, participants viewed a total of 200 probe presentations in each condition created by crossing visual field (left vs. right) with probe location (target, distractor, or background). For each participant, pairs of trials were identical except that the identities of the targets and distractors were switched. In other words, for each trial, there was a companion trial that had identical motion trajectories and probe presentations, but the targets for one trial were the distractors on the companion trial. Additionally, participants viewed approximately 10 randomly generated practice trials prior to the beginning of the experimental trials.
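The bookkeeping in this design can be checked directly: 100 trials × 12 probes gives 1,200 probes, which, spread evenly over the 6 cells formed by probe location (3) and visual field (2), yields 200 probes per condition. The Python sketch below verifies that arithmetic and illustrates the companion-trial scheme. A shared `seed` stands in here for the shared motion trajectories and probe stream, and the function name and the shuffled presentation order are hypothetical; the source does not say how companion trials were ordered.

```python
import random

N_TRIALS = 100
PROBES_PER_TRIAL = 12
N_CONDITIONS = 3 * 2   # probe location (target, distractor, background) x visual field

probes_per_condition = N_TRIALS * PROBES_PER_TRIAL // N_CONDITIONS


def make_trial_pair(seed, n_objects=4, n_targets=2):
    """Return two trial specs that share motion and probes but swap targets and distractors.

    The swap is only well defined when targets and distractors are equally numerous.
    """
    if 2 * n_targets != n_objects:
        raise ValueError("swapping requires as many distractors as targets")
    rng = random.Random(seed)
    objects = list(range(n_objects))
    targets = sorted(rng.sample(objects, n_targets))
    distractors = [i for i in objects if i not in targets]
    return ({"seed": seed, "targets": targets, "distractors": distractors},
            {"seed": seed, "targets": distractors, "distractors": targets})


# 50 pairs = 100 trials; shuffling the order is an assumption made for illustration
trials = [t for s in range(50) for t in make_trial_pair(seed=s)]
random.Random(2024).shuffle(trials)
```

Because each companion pair presents identical stimulation with the labels reversed, any physical difference between the objects that happen to be targets and those that happen to be distractors is balanced across the experiment.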
The electroencephalogram (EEG) was recorded with an Electrical Geodesics Inc. (EGI; Eugene, OR) 128-channel Geodesic Sensor Net with individual electrode impedances kept below 50-75 kΩ, as recommended by the manufacturer. The data were referenced online to the vertex, bandpass filtered between 0.01 and 80 Hz, and digitized at 200 Hz. Subsequent processing was performed offline using EGI Net Station 4.1.2. The data were bandpass filtered between 1 and 55 Hz and then segmented using an epoch that began 100 ms before the onset of the probe flash and ended 800 ms after it. Net Station's artifact detection routines were then applied. Individual channels were marked as bad if there was zero variance, the fast average amplitude exceeded 200 μV, or the differential average amplitude exceeded 200 μV. Individual segments were rejected if they contained eye movements or blinks (threshold = 70 μV) or if more than 10 channels were marked as bad. For the remaining segments, bad channels were replaced by interpolating from surrounding channels. Finally, the segments were averaged, re-referenced to the average reference, and baseline corrected using a 100 ms prestimulus interval.
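The final offline steps (segmentation, averaging, re-referencing to the average reference, and baseline correction) are simple linear operations. The stdlib-only Python sketch below shows them on data stored as one list of samples per channel; it is not Net Station's implementation, and it omits filtering, artifact rejection, and bad-channel interpolation. It assumes the −100 to 800 ms epoch spans 20 prestimulus and 160 poststimulus samples at 200 Hz (5 ms per sample).

```python
from statistics import fmean

SRATE_HZ = 200
MS_PER_SAMPLE = 1000 // SRATE_HZ   # 5 ms per sample
PRE = 100 // MS_PER_SAMPLE         # 20 samples before probe onset
POST = 800 // MS_PER_SAMPLE        # 160 samples from onset onward


def segment(data, onset):
    """Cut one epoch from continuous data (a list of channels, each a list of samples).

    onset is the sample index of the probe flash; the epoch covers [-100, +800) ms.
    """
    return [channel[onset - PRE:onset + POST] for channel in data]


def average_epochs(epochs):
    """Average a list of epochs sample by sample, channel by channel."""
    n_ch, n_t = len(epochs[0]), len(epochs[0][0])
    return [[fmean(ep[c][t] for ep in epochs) for t in range(n_t)] for c in range(n_ch)]


def average_reference(erp):
    """Subtract the mean across all channels at each time point."""
    n_t = len(erp[0])
    means = [fmean(ch[t] for ch in erp) for t in range(n_t)]
    return [[ch[t] - means[t] for t in range(n_t)] for ch in erp]


def baseline_correct(erp):
    """Subtract each channel's mean over the 100 ms prestimulus interval."""
    corrected = []
    for ch in erp:
        base = fmean(ch[:PRE])
        corrected.append([v - base for v in ch])
    return corrected
```

Because averaging, average referencing, and baseline subtraction are all linear, applying these three steps in a different order yields the same ERP; for example, `baseline_correct(average_reference(avg))` and `average_reference(baseline_correct(avg))` agree up to rounding error.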
As expected, tracking accuracy was fairly high. On average, participants correctly selected both target objects on 92.6% of the trials (SD = 4.6). Using the formula provided by Pylyshyn and Annan (2006; see also Hulleman, 2005 for a similar formula), this translates to an average of 1.86 out of a maximum of 2 objects effectively tracked (SD = .09)¹. Data from the incorrect trials were not included in the subsequent ERP analyses.
Two separate N1 components were observed in this experiment. The posterior N1 occurred over lateral posterior electrode locations and peaked approximately 175 ms post-stimulus for contralateral stimuli (Figure 1). The posterior N1 was measured as the average amplitude over a time window from 165-205 ms post-stimulus. The anterior N1 occurred over lateral fronto-central electrode locations, peaked approximately 150 ms post-stimulus, and was measured as the mean amplitude 120-160 ms post-stimulus (Figure 2). Each analysis of the amplitude of these components was conducted on the average of four contiguous channels for each hemisphere. These channels showed the largest N1 amplitudes (averaged over target, distractor, and background conditions) for contralateral stimuli in the specified time windows. The channels used for the posterior N1 were 58 (P9), 59 (P7), 65, and 66 for the left hemisphere and 92 (P8), 93 (P6), 97 (P10), and 98 (Tp8) for the right hemisphere. The channels used for the anterior N1 were 7, 13, 21 (Fc1), and 29 (Fc3) for the left hemisphere and 5, 113, 119 (Fc2), and 124 (F4) for the right hemisphere. Note that the P1 amplitudes observed in this experiment were very small and did not show any interpretable effect of probe location so P1 analyses are not included in the following results. All subsequent ANOVAs use the Greenhouse-Geisser correction when appropriate.
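The N1 measurement amounts to averaging the ERP over a fixed time window and over a four-channel cluster contralateral to the probe. The Python sketch below encodes the channel numbers and windows given above. The ERP container (a dict from channel number to a list of samples), the treatment of window end points as inclusive, and the assumption that sample 0 corresponds to −100 ms at 5 ms per sample are ours.

```python
from statistics import fmean

EPOCH_START_MS = -100
MS_PER_SAMPLE = 5  # 200 Hz sampling

# Four-channel clusters per hemisphere (channel numbers from the text)
CLUSTERS = {
    "posterior": {"left": [58, 59, 65, 66], "right": [92, 93, 97, 98]},
    "anterior": {"left": [7, 13, 21, 29], "right": [5, 113, 119, 124]},
}
WINDOWS_MS = {"posterior": (165, 205), "anterior": (120, 160)}


def ms_to_index(t_ms):
    """Sample index of latency t_ms within an epoch that starts at -100 ms."""
    return (t_ms - EPOCH_START_MS) // MS_PER_SAMPLE


def contralateral(field):
    """Hemisphere opposite to the visual field of the probe."""
    return "right" if field == "left" else "left"


def n1_amplitude(erp, component, probe_field):
    """Mean amplitude over the component's window and its contralateral channel cluster.

    erp: dict mapping channel number -> list of samples for one condition average.
    """
    start_ms, end_ms = WINDOWS_MS[component]
    i0, i1 = ms_to_index(start_ms), ms_to_index(end_ms)
    channels = CLUSTERS[component][contralateral(probe_field)]
    return fmean(erp[ch][i] for ch in channels for i in range(i0, i1 + 1))
```

With these conventions the posterior window covers samples 53 through 61 and the anterior window samples 44 through 52.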
Figure 1 shows that probes evoked a contralateral N1 over posterior cortex that peaked at approximately 175 ms. The amplitude of the N1 appeared to be smallest for probes presented on distractor objects and comparable for target objects and the background. This was confirmed by a 2 (hemisphere: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA, which showed a significant effect of probe location (F(2, 34) = 5.83, p < .05, MSE = .342). No other effects were reliable (Fs < 1, ps > .38). Follow-up t-tests² indicated that N1 amplitudes were larger for target vs. distractor probes (t(17) = 3.45, p < .01) as well as background vs. distractor probes (t(17) = 2.29, p < .05). Target and background probes did not differ (t(17) = .60, p = .56). This pattern suggests that the N1 for distractor probes is being suppressed relative to target and background probes.
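For readers who want to reproduce this kind of analysis, the Python sketch below computes a one-way repeated-measures ANOVA F and paired-samples t statistics from per-participant mean amplitudes. In a fully within-subjects 2 × 3 design, the F for the probe-location main effect equals the F from a one-way repeated-measures ANOVA on hemisphere-averaged amplitudes (only the MSE differs, by a constant factor of 2), so the one-way version suffices for that effect. With 18 participants and 3 locations it gives df = (2, 34), and each paired comparison has df = 17, matching the values reported above. The sketch omits the Greenhouse-Geisser correction and p-values; the latter could be obtained from the F and t survival functions, for example `scipy.stats.f.sf` and `scipy.stats.t.sf`.

```python
import math
from statistics import fmean, stdev


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[s][j] is the score of participant s in condition j (e.g. target,
    distractor, background). Returns (F, df_effect, df_error, MSE).
    No sphericity (Greenhouse-Geisser) correction is applied.
    """
    n, k = len(data), len(data[0])
    grand = fmean(v for row in data for v in row)
    cond_means = [fmean(row[j] for row in data) for j in range(k)]
    subj_means = [fmean(row) for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant interaction
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    mse = ss_error / df_error
    return (ss_cond / df_cond) / mse, df_cond, df_error, mse


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for the differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return fmean(diffs) / (stdev(diffs) / math.sqrt(len(diffs))), len(diffs) - 1
```

Removing the between-participant sum of squares before forming the error term is what makes this a repeated-measures test: stable individual differences in overall N1 size do not inflate the denominator.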
Anterior N1s are shown in Figure 2 where it appears that target probes elicited the largest N1 followed by background probes and then the distractor probes. Contralateral anterior N1 amplitude was evaluated with a 2 (hemisphere: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA which showed a main effect of probe location (F(2, 34) = 13.95, p < .001, MSE = .157). No other effects were reliable (hemisphere: F(1, 17) = 4.21, p = .06; hemisphere × probe location: F(2, 34) = 2.57, p = .11). Follow-up t-tests showed a significant difference between target and distractor probes (t(17) = 4.62, p < .001) and target vs. background probes (t(17) = 2.33, p < .05). In addition, the distractor probes resulted in smaller N1s than background probes (t(17) = 3.60, p < .01). If we consider the background probe to be a neutral baseline condition, then the amplitude of the N1 was increased for target probes and decreased for distractor probes. The results for the anterior N1 suggest that attention tracks objects during MOT by preferentially allocating visual attention to targets while simultaneously suppressing distractors.
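Because the N1 is a negative-going component, a "larger" N1 is a more negative voltage, which makes the sign conventions easy to get backwards when the background probe is treated as a neutral baseline. The small Python helper below is one way to express the two effects so that positive values mean target enhancement and distractor suppression. The function name and sign convention are our own, and the amplitudes in the example are hypothetical, not data from this study.

```python
def n1_attention_effects(target_uv, distractor_uv, background_uv):
    """Return (enhancement, suppression) in microvolts relative to the background baseline.

    Inputs are mean N1 amplitudes (negative-going, so more negative = larger N1).
    enhancement > 0: the target N1 is larger (more negative) than the background N1.
    suppression > 0: the distractor N1 is smaller (less negative) than the background N1.
    """
    enhancement = background_uv - target_uv
    suppression = distractor_uv - background_uv
    return enhancement, suppression


# Hypothetical amplitudes in microvolts, for illustration only
enh, sup = n1_attention_effects(target_uv=-2.0, distractor_uv=-1.2, background_uv=-1.6)
```

Computed per participant, these difference scores can be tested against zero; such one-sample tests are equivalent to the paired target-versus-background and distractor-versus-background comparisons reported above.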
In this experiment, participants tracked two out of four objects while irrelevant probe flashes were presented on targets, distractors, or the background. The amplitude of the N1 component of the ERP was used as a measure of the amount of visual attention allocated to each of these display areas. The posterior N1 showed a pattern consistent with the idea that distractors are suppressed during MOT: target and background N1s were comparable in amplitude, and both were larger than distractor N1s. For the earlier-occurring anterior N1, target N1s were larger than background N1s, which in turn were larger than distractor N1s. This pattern is consistent with simultaneous suppression of distractors and enhancement of targets.
The present results indicate that the suppression of distractors operates fairly early in processing, at least as early as the anterior N1, which had a peak latency of 150 ms. In addition, the anterior N1 showed that targets were enhanced relative to both distractors and the background, which is consistent with earlier reports that this component is larger for attended than for unattended flashes (Hillyard & Anllo-Vento, 1998). Apparently, the same process that is engaged when observers are cued to attend to a static display location is also used to track moving targets. This process appears to involve enhancement of neural activity for attended objects at early stages of the visual system, together with suppression of unattended objects. This suppression pattern was also observed in the later posterior N1 at a latency of 175 ms.
These results are broadly consistent with Pylyshyn's (2006) finding that observers are more likely to detect briefly presented probes occurring on targets and background areas relative to distractor probes, a pattern interpreted as evidence that distractors are suppressed during MOT. According to Pylyshyn (2004), although indexes “attached” to targets are the main mechanism underlying object tracking, indexes may sometimes be lost to nearby distractors and are a significant source of tracking errors (see also O'Hearn et al., 2005). Suppression of distractors can then be seen as a useful “add-on” process that helps to ensure the success of the indexing mechanism when distractors are confusable and close to a target.
There are several reasons, however, to be cautious in accepting this pattern of results as supporting the claim that MOT generally involves suppression of distractors. First, the results of a similar study by Drew et al. (2009) suggest that tracked objects are enhanced during MOT without suppression of distractors. There are several differences between the Drew et al. study and the present one that might account for the different pattern of findings in the two studies. Our objects consisted of random dot textures moving on a moving random dot background while Drew et al. used luminance-defined objects. Our probe flashes involved changing the luminance of the object itself while their probe was a separate object occurring inside the contours of a hollow object. Finally, their displays included eight distractors (four moving and four stationary) as opposed to two distractors in our study. Additional research will be required to determine which of these differences in methodology are responsible for the different pattern of results.
Second, several experiments (Flombaum et al., 2008; Flombaum & Scholl, under revision; Pylyshyn, 2006; Pylyshyn et al., 2008; Experiments 2 & 3 this paper) have found that background probes are often detected better than probes on either targets or distractors, and this holds true even when observers are instructed to ignore the tracking task and just detect the probes (Pylyshyn, 2006). This raises the possibility that background probes may simply be more visible than probes occurring on objects and therefore may not constitute the ideal neutral condition for drawing inferences regarding suppression vs. enhancement. Visibility differences might arise because of physical differences between object and background probes. For example, probes on objects will be surrounded by contours, which might produce more masking of the probe compared to those appearing on a blank background. In addition, background probes will generally be stationary, in contrast to the target and distractor objects, which move continuously. If moving objects are at a disadvantage compared to stationary ones (Flombaum & Scholl, under revision), then once again, background probes may have an advantage.
The present method of using moving texture objects on a moving texture background eliminates differences between object and background probes based on whether they are moving or stationary as well as differences in masking by luminance contours. However, it certainly doesn't eliminate all potentially important differences between objects and background areas that might affect probe visibility and so it is important to remain cautious in drawing the inference that distractors are suppressed at early levels during MOT. We will continue to entertain the idea that MOT may rely on mechanisms of target enhancement, distractor suppression, or a combination of the two while recognizing that additional research will be required to settle the issue. The most important inference to draw from these results is simply that early mechanisms of visual attention play a role in favoring tracked objects over distractors.
The results of Experiment 1 support the claim that visual attention does play a role in MOT. However, it isn't clear whether the allocation of visual attention is necessary for MOT or is simply an optional strategy. Experiment 2 addresses this issue by providing observers with an incentive to spread their attention over the entire display during tracking. In this experiment, the probe flashes are task-relevant, as observers are asked to detect occasional target probes embedded in the stream of probe flashes occurring on tracked objects (target objects), distractor objects, and the background. If allocation of visual attention to target objects is obligatory for accurate object tracking, we should see superior detection of target probes when they occur on tracked objects relative to distractors and backgrounds. In addition, we should see a pattern of N1 amplitudes similar to that observed in Experiment 1. On the other hand, if the allocation of visual attention observed in Experiment 1 was optional, then observers would have no particular incentive to allocate visual attention to tracked objects (targets), and we should observe N1s of comparable amplitude for probes occurring anywhere in the display.
The addition of relevant target probes also allowed us to measure the P300 component, which is a later, post-perceptual component that has been associated with consolidation of information into memory (Donchin, 1981; Donchin & Coles, 1988) and is affected by attention (see Coull, 1998, and Polich & Kok, 1995, for reviews). The P300 is also sensitive to the probability of various categories of eliciting stimuli, such that low-probability events produce larger amplitude P300 components than high-probability events (Duncan-Johnson & Donchin, 1977; Kutas, McCarthy, & Donchin, 1977). If probes occurred equally often on targets, distractors, and the background, as in Experiment 1, and observers categorized the probe events as occurring on either objects or the background, then background probes would occur only 1/3 of the time and could produce a large P300 simply because they are rarer than the other events. We tested this by running two versions of the experiment. Experiment 2a was similar to the procedure used in Experiment 1, in which probes occurred equally often on targets, distractors, and the background. In Experiment 2b, half of the probes occurred on the background and the other half were divided equally among targets and distractors. To foreshadow our results, no probability effects were observed on the P300, so we were able to combine the data from both versions of the experiment for added power to detect any potentially small N1 effects.
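The probability argument for the two versions of Experiment 2 can be made concrete with a few lines of Python. This is a sketch under the assumption stated above, namely that observers sort probes into just two categories (object vs. background); the function name is ours and is not part of the original analysis.

```python
from fractions import Fraction

def category_probabilities(p_target, p_distractor, p_background):
    """Return (P(object), P(background)), assuming observers collapse
    target and distractor probes into a single 'object' category."""
    total = p_target + p_distractor + p_background
    p_object = (p_target + p_distractor) / total
    return p_object, p_background / total

# Experiment 2a: probes split equally over the three locations.
exp2a = category_probabilities(Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
# Experiment 2b: half background, the other half split between targets and distractors.
exp2b = category_probabilities(Fraction(1, 4), Fraction(1, 4), Fraction(1, 2))

print(exp2a)  # (Fraction(2, 3), Fraction(1, 3))
print(exp2b)  # (Fraction(1, 2), Fraction(1, 2))
```

Under this assumption, a P300 advantage for background probes that was driven by rarity should appear in Experiment 2a but disappear in Experiment 2b, where background and object probes are equally likely.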
Forty-one right-handed, neurologically normal volunteers (ages 18-32 years) participated in this experiment (Experiment 2a n = 27; Experiment 2b n = 14). With the exception of MD (Experiment 2a), all were naïve to the purpose of the experiment and were paid $10/hour for their participation. All participants reported normal or corrected-to-normal acuity and provided informed consent. Four participants from Experiment 2a were excluded from the analyses for the following reasons. One participant exhibited exceptionally poor probe detection performance (less than 20% of the probes were detected). One participant never detected background or distractor probes, which we took as evidence that the participant did not follow the instruction to respond to probes regardless of where they occurred. One participant showed excessive alpha in the ERP recording. One participant was excluded because too many channels were bad in the ERP recording (more than 10 channels were marked bad in all of the segments).
These experiments were identical to Experiment 1 except as noted below. In Experiment 1 all probe flashes were irrelevant to the observer, but in these experiments 25% of the probe flashes required a response (“response probes”). Response probes were identical to the irrelevant probes except that there was a small gray square (0.11°; 50.09 cd/m²) in their centers. In Experiment 2a, these response probes were evenly distributed over targets, distractors, and the background. In Experiment 2b, half of the probes occurred on the background and the other half of the probes were equally divided among targets and distractors. In both cases, each probe type occurred equally often in both visual fields. All probes were presented in a random order, but the presentation was constrained so that the first probe on a trial was an irrelevant probe and response probes were separated by at least one irrelevant probe. A total of 12 (Experiment 2a) or 16 (Experiment 2b) probes were presented during each MOT trial. The number of response probes presented during each MOT trial varied from two to four (Experiment 2a) or three to five (Experiment 2b). Responses that occurred 200-900 ms after the presentation of a response probe were counted as hits. If no response was made within this time window, a miss was recorded. Any other responses were counted as false alarms.
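The sequencing constraints and the scoring rule described above can be sketched as follows. The paper does not say how the probe sequences were generated, so the rejection-sampling approach, the function names, and the inclusive treatment of the 200 ms and 900 ms boundaries are all assumptions. The scoring function also assumes that each response can be credited to at most one response probe, with any leftover responses counted as false alarms.

```python
import random

def make_probe_sequence(n_probes, n_response, rng=random):
    """Hypothetical sketch: return a list of booleans (True = response probe)
    in which the first probe is irrelevant and no two response probes are adjacent."""
    # With position 0 reserved for an irrelevant probe, at most n_probes // 2
    # response probes can be placed without two of them being adjacent.
    if n_response > n_probes // 2:
        raise ValueError("too many response probes to keep them separated")
    while True:  # rejection sampling: reshuffle until both constraints hold
        seq = [True] * n_response + [False] * (n_probes - n_response)
        rng.shuffle(seq)
        first_is_irrelevant = not seq[0]
        separated = all(not (a and b) for a, b in zip(seq, seq[1:]))
        if first_is_irrelevant and separated:
            return seq

def score_responses(response_probe_onsets, response_times, window=(200, 900)):
    """Count (hits, misses, false_alarms) for one trial; all times in ms."""
    lo, hi = window
    unused = sorted(response_times)
    hits = misses = 0
    for onset in sorted(response_probe_onsets):
        # First not-yet-credited response falling inside this probe's window.
        match = next((t for t in unused if lo <= t - onset <= hi), None)
        if match is None:
            misses += 1
        else:
            hits += 1
            unused.remove(match)
    # Every response not credited as a hit counts as a false alarm.
    return hits, misses, len(unused)
```

For example, with response probes at 1000 ms and 5000 ms and button presses at 1450, 3000, and 5950 ms, `score_responses` credits one hit (450 ms after the first probe), one miss (the 5950 ms press falls 950 ms after the second probe, outside the window), and two false alarms.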
Each experiment consisted of 108 trials that lasted 10 seconds (Experiment 2a) or 13 seconds (Experiment 2b). As a result, participants viewed a total of 162 irrelevant probes and 54 response probes per condition created by crossing visual field (left vs. right) and probe location (target, distractor, or background) in Experiment 2a. In Experiment 2b, participants viewed 162 irrelevant and 54 response probes in each visual field for target and distractor probe locations, while 324 irrelevant and 108 response probes occurred on the background in each visual field. Observers responded to the “target” probes by clicking a mouse button regardless of whether the probe appeared on a tracked object, a distractor, or the background. Participants were instructed to prioritize MOT accuracy over probe detection accuracy. In addition to feedback about tracking accuracy and eye movements/blinks, participants received feedback at the end of each trial about the number of hits, misses, and false alarms recorded during that trial.
Tracking accuracy was high in both versions of the experiment. In Experiment 2a, participants correctly selected both target objects on 89.6% of the trials (SD = 5.7), which translates to an average of 1.80 (SD = .10) objects effectively tracked using the formula from Pylyshyn and Annan (2006). In Experiment 2b, participants correctly selected both target objects on 87.6% of the trials (SD = 4.9), which translates to an average of 1.76 (SD = .09) objects effectively tracked. It is worth noting that the addition of the probe detection task in these experiments produced a small but reliable decrease in accuracy on the MOT task relative to Experiment 1 (t(53) = 2.49, p < .05). Data from the incorrect tracking trials were not included in any analyses of probe detection or ERPs.
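The per-condition counts follow from the number of trials, the number of probes per trial, the 25% response-probe rate, and the allocation of probes to locations. The sketch below checks that arithmetic; it assumes the proportions balance exactly over the whole session, as the reported totals imply (the number of response probes varied from trial to trial), and the helper name is hypothetical.

```python
from fractions import Fraction

def probes_per_condition(n_trials, probes_per_trial, location_weights,
                         response_fraction=Fraction(1, 4), n_fields=2):
    """Return {location: (irrelevant, response)} per visual field (hypothetical helper)."""
    total = n_trials * probes_per_trial
    counts = {}
    for location, weight in location_weights.items():
        per_field = total * weight / n_fields        # probes in one location x visual-field cell
        n_response = per_field * response_fraction   # probes that require a response
        n_irrelevant = per_field - n_response
        if n_response.denominator != 1 or n_irrelevant.denominator != 1:
            raise ValueError("design does not divide evenly")
        counts[location] = (int(n_irrelevant), int(n_response))
    return counts

thirds = {loc: Fraction(1, 3) for loc in ("target", "distractor", "background")}
exp2b_weights = {"target": Fraction(1, 4), "distractor": Fraction(1, 4),
                 "background": Fraction(1, 2)}

print(probes_per_condition(108, 12, thirds))         # Experiment 2a
# {'target': (162, 54), 'distractor': (162, 54), 'background': (162, 54)}
print(probes_per_condition(108, 16, exp2b_weights))  # Experiment 2b
# {'target': (162, 54), 'distractor': (162, 54), 'background': (324, 108)}
```

Exact fractions are used so that a design that does not divide evenly raises an error instead of being silently rounded.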
On average, participants correctly detected 70.1% of the response probes (SD = 13.9) in Experiment 2a and 59.5% (SD = 19.9) in Experiment 2b. False alarm rates were very low, averaging only 0.21 false alarms per MOT trial (SD = 0.13) in Experiment 2a and 0.19 false alarms per MOT trial (SD = 0.18) in Experiment 2b. As a result, false alarms were not included in the probe detection analysis. Probe hit rates are shown in Figure 3, where it appears that for both versions of Experiment 2, performance was highest for background probes, intermediate for targets, and lowest for distractors. This was confirmed by a 2 (Experiment: 2a vs. 2b) × 2 (visual field) × 3 (probe location: target, distractor, background) ANOVA. A main effect of experiment indicated that probes were detected more frequently in Experiment 2a than in Experiment 2b (F(1, 35) = 4.87, p < .05, MSE = .139). Importantly, though, this factor did not interact with any other factors (Fs < 2.92, ps > .09). A significant effect of visual field indicated that probes were detected more often in the right visual field (mean = 69.3%, SD = 16.4) than in the left visual field (mean = 63.8%, SD = 17.1; F(1, 35) = 9.08, p < .01, MSE = .014). More interestingly, a main effect of probe location (F(2, 70) = 72.31, p < .001, MSE = .010) indicated that performance was highest for background probes (background vs. target: t(36) = 5.73, p < .001; background vs. distractor: t(36) = 12.10, p < .001), followed by target probes (target vs. distractor: t(36) = 6.56, p < .001); distractor probes were detected least often. This pattern is consistent with the existing behavioral work that has been interpreted as evidence for distractor suppression in MOT (e.g., Pylyshyn, 2006; Pylyshyn et al., 2008), although, as we pointed out earlier, the superiority of background probes relative to target probes is not predicted by distractor suppression.
The P300 was observed over central-parietal electrode locations and peaked approximately 475 ms after onset of the probe (Figure 4). The mean amplitude was measured at channels 54 (P1), 55 (CPz), 62, and 80 (P2) using a 425-525 ms window following the onset of the probe. Note that only data from the response probes were included in the following analysis. A 2 (experiment: 2a vs. 2b) × 2 (visual field: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA did not show any reliable effects or interactions with the experiment factor (Fs < 3.74, ps > .06). There was a main effect of visual field (F(1, 35) = 4.78, p < .05, MSE = 1.21), indicating that right visual field stimuli produced larger P300s than left visual field stimuli (right visual field: mean = 3.29 μV, SD = 1.94; left visual field: mean = 2.88 μV, SD = 2.18). There was also a main effect of probe location (F(2, 70) = 30.62, p < .001, MSE = 1.261); no interactions involving probe location were reliable (Fs < 2.24, ps > .11). Background probes produced the largest P300, followed by targets, with distractors producing the smallest P300 (target vs. distractor: t(36) = 3.53, p < .01; target vs. background: t(36) = 3.86, p < .001; distractor vs. background: t(36) = 7.45, p < .001). P300 amplitude, shown in Figure 4, closely mirrors the behavioral probe detection results reported above and elsewhere (Pylyshyn, 2006). In addition, the comparability of results across the two versions of Experiment 2 appears to rule out probability confounds as an explanation for the larger P300s associated with background probes. We conclude that at the level of the P300, targets and distractors are differentiated and that this pattern is consistent with behavioral data interpreted as evidence for distractor suppression in MOT (e.g., Pylyshyn, 2006; Pylyshyn et al., 2008), although, again, the larger amplitude P300 for background probes is not predicted by distractor suppression.
In order to maximize our statistical power, the following N1 analyses treat the data from the two versions of Experiment 2 as if they came from a single experiment with 37 participants. Figure 5 shows amplitude plots of the N1 as a function of probe condition. The N1 appears as a contralateral negativity over lateral posterior electrode sites with a latency of approximately 180 ms post-stimulus. Its amplitude is remarkably similar across the different probe locations. N1 amplitude was quantified as the average voltage over a time window from 160-200 ms post-stimulus for contralateral electrodes 59 (P7), 60, 65, and 66 for the left hemisphere and 85, 86, 92 (P8), and 93 (P6) for the right hemisphere. Note that only the data from the irrelevant probes were used in the following analysis (i.e., the response probes were excluded). As in Experiment 1, N1 amplitude was evaluated with a 2 (hemisphere: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA. No reliable main effects or interactions were observed (Fs < 1, ps > .41). In other words, it appears that, contrary to the results of Experiment 1, there was no reliable effect of probe location on posterior N1 amplitude.
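The follow-up comparisons are paired-samples t tests on per-participant condition means, so with 37 participants they carry n − 1 = 36 degrees of freedom. A minimal sketch of the statistic follows; the function name is ours, and the inputs in the usage note are toy numbers for illustration, not data from this study.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom.

    x, y: per-participant condition means (e.g., hit rates for background
    and target probes), listed in the same participant order.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1
```

For example, `paired_t([3, 5, 7], [2, 3, 4])` returns t = 2√3 ≈ 3.46 with 2 degrees of freedom. The sketch does not guard against a zero standard deviation, which would raise a division error.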
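Each ERP component in these analyses is quantified as a mean amplitude: the voltage averaged over a fixed post-stimulus window and a small set of electrodes. The sketch below assumes a hypothetical data layout in which each channel's averaged waveform is a list of microvolt values sampled at known times relative to probe onset; the window is treated as inclusive at both ends.

```python
def mean_amplitude(waveforms, times_ms, channels, window_ms):
    """Average voltage over the given channels and time window (hypothetical layout).

    waveforms: dict mapping channel label -> list of voltages (μV), one per sample
    times_ms:  sample times in ms relative to probe onset, same length as each list
    channels:  channel labels to pool, e.g. ["54", "55", "62", "80"] for the P300
    window_ms: (start, end) of the measurement window, e.g. (425, 525)
    """
    lo, hi = window_ms
    idx = [i for i, t in enumerate(times_ms) if lo <= t <= hi]
    if not idx:
        raise ValueError("window contains no samples")
    values = [waveforms[ch][i] for ch in channels for i in idx]
    return sum(values) / len(values)
```

Because every channel contributes the same samples, pooling all values is equivalent to averaging over time within each channel and then averaging across channels.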
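Because the N1 is measured contralateral to the probe, the hemisphere factor in these ANOVAs is determined by the probe's visual field: a left-field probe is scored at right-hemisphere electrodes and vice versa. The mapping can be sketched as below, using the Experiment 2 posterior electrode sets listed above; the dictionary and function names are hypothetical.

```python
# Posterior N1 electrode sets for Experiment 2, as listed in the text.
POSTERIOR_N1 = {
    "left": ["59", "60", "65", "66"],    # left hemisphere (59 = P7)
    "right": ["85", "86", "92", "93"],   # right hemisphere (92 = P8, 93 = P6)
}

def contralateral_sites(probe_field, site_sets=POSTERIOR_N1):
    """Return (hemisphere, electrodes) contralateral to the probe's visual field."""
    hemisphere = {"left": "right", "right": "left"}[probe_field]
    return hemisphere, site_sets[hemisphere]

def design_cells(locations=("target", "distractor", "background")):
    """Enumerate the 2 (hemisphere) x 3 (probe location) cells of the N1 ANOVA."""
    return [(contralateral_sites(field)[0], loc)
            for field in ("left", "right") for loc in locations]
```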
The anterior N1 was observed over lateral fronto-central electrode locations and peaked approximately 155 ms post-stimulus (Figure 6). The anterior N1 was not as clearly differentiated in terms of scalp topography as in Experiment 1, partly because it overlaps with the posterior N1, which had a slightly earlier onset in this experiment than in Experiment 1. Consequently, we used the same electrodes as in Experiment 1 for analysis of this component. The anterior N1 was defined as the mean amplitude in a 135-175 ms post-stimulus window measured at electrodes contralateral to the probe flash. Note that only the data from the irrelevant probes were used in the following analysis (i.e., the response probes were excluded). A 2 (hemisphere: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA did not find any reliable main effects or interactions (Fs < 2.13, ps > .13).
The failure to find any effect of probe location on either anterior or posterior N1 amplitude stands in contrast to the results of Experiment 1, which showed robust effects of probe location on both N1 components. It seems unlikely that the lack of significant effects can simply be attributed to a lack of power. Using G*Power 3.0.10 (Faul, Erdfelder, Lang, & Buchner, 2007), we determined that there was sufficient power (power = .80, α = .05) to detect main effects of probe location with effect sizes of .0353 and .0158 for the anterior and posterior N1 components, respectively (effect sizes are partial η² values). The effect sizes in Experiment 1 were partial η² = .45 for the anterior N1 and partial η² = .24 for the posterior N1, so we had reasonable power to detect effects less than 1/12 the size of those observed in the previous experiment.
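G*Power specifies effect sizes for F tests as Cohen's f rather than partial η², and the standard conversion is f = √(η²/(1 − η²)). The sketch below applies that conversion to the minimum detectable effects and compares them with the Experiment 1 effects. It is a worked check of the arithmetic, not a reproduction of the original power analysis; G*Power's repeated-measures options can change how a partial η² maps onto f, so the f values should be read as approximate.

```python
import math

def eta2_to_f(partial_eta2):
    """Convert partial eta squared to Cohen's f using f = sqrt(eta2 / (1 - eta2))."""
    return math.sqrt(partial_eta2 / (1 - partial_eta2))

detectable = {"anterior": 0.0353, "posterior": 0.0158}  # detectable at power .80 (Experiment 2)
observed = {"anterior": 0.45, "posterior": 0.24}        # observed in Experiment 1

for comp in detectable:
    ratio = observed[comp] / detectable[comp]
    print(f"{comp}: f = {eta2_to_f(detectable[comp]):.3f}; "
          f"Experiment 1 effect is {ratio:.1f} times larger")
```

Both ratios come out above 12 (about 12.7 for the anterior N1 and 15.2 for the posterior N1), which is the basis for the "less than 1/12 the size" statement.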
Our behavioral results provide a replication of previous experiments (Flombaum et al., 2008; Pylyshyn, 2006; Pylyshyn et al., 2008) showing that the ability to detect target probes during MOT depends on probe location: accuracy was highest with background probes, intermediate for targets, and lowest for distractors. This pattern has been taken as evidence for distractor suppression (Pylyshyn, 2006; Pylyshyn et al., 2008) although the finding that background probes are detected more frequently than target probes appears to require an explanation over and above the idea of distractor suppression. The amplitude of the P300 component corresponded closely to the behavioral detection data, which perhaps is not surprising in the light of previous work showing that P300 amplitude is closely related to hit rate in signal detection paradigms (Imai & Tsuji, 2004).
The effects of probe location on N1 amplitude, however, were quite different. Contrary to the results of Experiment 1, we found no effects of probe location on the amplitudes of either the anterior or posterior N1 components. In other words, the addition of task-relevant probes in Experiment 2 seems to have altered the distribution of visual attention reflected in the N1 components during MOT. We assume that in an attempt to detect target probes occurring anywhere in the display, observers distributed their visual attention over the entire display rather than restricting it to areas containing tracked objects. A distributed mode of allocating spatial attention would account for the lack of N1 differences associated with different probe locations.
Another possible explanation of the lack of N1 effects in this experiment is that attention was switched periodically between the tracking and probe detection tasks throughout the trial. On the surface, this appears plausible since it has been shown recently that MOT can be periodically interrupted without degrading tracking performance. For example, Horowitz et al. (2006) demonstrated good tracking performance even when the tracking display was hidden for periods of up to 1/2 second during which the objects continued to move.
Could it be that in the current experiment, visual attention was involved in the tracking task, but periodically switched to a more broadly tuned state in order to perform the probe detection task? This sort of account seems unlikely because it still predicts that probes appearing on targets during periods when attention was allocated to the tracking task should have resulted in larger N1s for targets compared to distractors and that is not what we found. Depending on what fraction of time attention is allocated to targets, the N1 difference would be smaller than the one observed in Experiment 1 but the power analysis shows that there was sufficient power to detect effects more than 12 times smaller than those observed in Experiment 1 where there was no reason to switch attention from the tracked objects.
Another possibility is that observers did not need to periodically switch to a different attentional state in anticipation of a probe flash because flashes in Experiment 2 were able to capture attention automatically. In other words, observers attended to target objects in the MOT task while also adopting a display-wide attentional set for sudden onsets so that their attention could be rapidly allocated to probes (Most, Scholl, Clifford, & Simons, 2005). If this attention switch occurred rapidly enough to enhance the N1 component, then essentially all probes, regardless of their locations, would be “attended” and produce equivalent N1 components.
This explanation seems unlikely because it appears that the N1 reflects the state of attention existing at the time of probe onset and not changes in that allocation that occur in response to the probe onset. Attention switching in response to a sudden onset is reflected in a different component (the N2pc; Luck & Hillyard, 1994a), which has a latency just beyond that of the N1. In addition, others have reported N1 attention effects in paradigms that might have encouraged the same sort of attentional set by using relevant target events that could occur in an attended location or elsewhere (Eimer, 1997; Mangun & Hillyard, 1988, 1991)³.
For example, Mangun and Hillyard (1988) told participants that their primary task was to respond to infrequent target events that occurred in a pre-specified attended location. As a secondary task, participants were instructed to also respond to target events that occurred in either of two “unattended” locations. In this case, as in the current experiment, it would have been advantageous for participants to adopt an attentional set whereby the onset stimuli could capture attention in order to discriminate targets from nontargets at all locations in the display. However, the results still showed larger P1/N1 components for the stimuli in attended locations relative to the unattended locations. So, at least for the time being, it seems likely that the lack of N1 effects in this experiment is attributable to a broad distribution of attention rather than an attention switching strategy. Nonetheless, we acknowledge that the ability of observers to perform the MOT task with asynchronous access to the display makes attention switching a potentially viable strategy in dual-task situations, and further research will be required before it can be definitively ruled out as an explanation for the striking ability of MOT to coexist with other attention-demanding activities.
In sum, the results of this experiment suggest that suppression is a flexible process that can be applied at different processing levels depending on the task (Vogel, Woodman, & Luck, 2005). In Experiment 1, where probe flashes were irrelevant, it made sense to filter out distractors as soon as possible and differences between target and distractor probes were observed in both anterior and posterior N1 components. However, in Experiment 2 where probes were relevant and occasionally required detection responses, early suppression would mean a high miss rate for distractor probes. In this case, it makes sense to allow the probes access to higher stages of analysis that might be able to make a preliminary determination of whether the probe is likely to be a target that requires a response. This finding also suggests that visual attention is not a necessary process for multiple object tracking (at least when tracking load is fairly light) because tracking can apparently occur without differential visual attention allocated to targets and distractors. Furthermore, these results suggest that caution is warranted in interpreting behavioral probe detection rates during MOT in terms of the allocation of visual attention (especially when tracking load is light) since selectivity in probe detection rates was observed in the absence of any N1 differences.
The results of Experiment 1 showed that visual attention was engaged during the MOT task while Experiment 2 showed that accurate tracking could still occur when attention was distributed over the entire display. These results suggest that the investment of visual attention in tracking that we observed in the first experiment was “optional” and not required for accurate tracking. However, it is important to consider the generality of this finding. It may be that visual attention isn't important for “easy tracking” but plays an important role when tracking is “difficult”. In Experiment 3, we increased tracking difficulty by requiring observers to track three targets among three distractors. As in Experiment 2, probes were task-relevant. If visual attention is needed for tracking under more difficult circumstances, then we might be able to reinstate the effects of attention on the N1 components that we observed in Experiment 1.
Nineteen right-handed, neurologically normal volunteers (ages 18-32 years) participated in this experiment. All were naïve to the purpose of the experiment and were paid $10/hour for their participation. All participants reported normal or corrected-to-normal acuity and provided informed consent. Three participants were excluded from the analyses for the following reasons. One participant exhibited exceptionally poor probe detection performance (less than 1% of the probes were detected). Two participants showed excessive alpha in the ERP recording.
Since the results of Experiment 2 did not show any effects of the probability manipulation, we did not include any such manipulation in this experiment. This experiment was identical to Experiment 2a except as noted below. In Experiment 2, participants tracked two out of four objects. In order to make the tracking task more difficult, participants in the current experiment tracked three out of six objects. We expected that this increase in difficulty would lead to more tracking errors, so this experiment consisted of 120 MOT trials (rather than 108 as in Experiment 2) in order to compensate for the loss of probe presentations due to tracking errors. As a result, each participant viewed a total of 180 irrelevant and 60 response probes per condition created by crossing visual field (left vs. right) with probe location (target, distractor, or background).
Tracking accuracy in this experiment was still fairly good, but was lower than in the previous experiment (t(51) = 5.14, p < .001). In this experiment, participants correctly selected all three target objects on 79.5% of the trials (SD = 7.5), compared to both target objects on 88.9% of the trials (SD = 5.4) in Experiment 2. This drop-off in performance is sensible given that, in this experiment, participants were required to track an additional object in the presence of an additional distractor. Using the formula provided by Pylyshyn and Annan (2006; see also Hulleman, 2005, for a similar formula), observers effectively tracked an average of 2.66 out of a maximum of 3 objects in this experiment (SD = .13).
On average, participants correctly detected 52.3% of the response probes (SD = 19.1), compared to 66.6% (SD = 16.0) in Experiment 2, indicating that the increased difficulty of the tracking task resulted in poorer probe detection performance (t(51) = 2.81, p < .01). As in the previous experiment, false alarm rates were very low and averaged only .19 false alarms per MOT trial (SD = .16). As a result, false alarms were not included in the probe detection analysis.
Probe hit rates are shown in Figure 7 where it appears that performance was highest for background probes, intermediate for targets, and lowest for distractors. This was confirmed by a 2 (visual field) × 3 (probe location: target, distractor, background) ANOVA. A main effect of probe location (F(2, 30) = 26.39, p < .001, MSE = .011) indicated that performance was highest for background probes (background vs. target: t(15) = 2.44, p < .05; background vs. distractor: t(15) = 7.79, p < .001), followed by target probes (target vs. distractor: t(15) = 4.85, p < .001); distractor probes were detected least often. No other effects or interactions were reliable (Fs < 1.2; ps > .33). This pattern is consistent with the existing behavioral work (e.g., Pylyshyn, 2006; Pylyshyn et al., 2008; our Experiment 2) which has been interpreted as evidence for distractor suppression in MOT.
The P300 was observed over central-parietal electrode locations and peaked approximately 500 ms post-stimulus (Figure 8). The mean amplitude was measured using a 450-550 ms window following the onset of the probe at channels 54 (P1), 61 (PO3), 62, and 79 (PO4). Note that only data from the response probes were included in the following analysis. A 2 (visual field: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA showed a main effect of visual field (F(1, 15) = 9.92, p < .01, MSE = .738), indicating that right visual field stimuli produced larger P300s than left visual field stimuli (right visual field: mean = 2.99 μV, SD = 2.05; left visual field: mean = 2.43 μV, SD = 1.91). There was also a main effect of probe location (F(2, 30) = 5.30, p < .05, MSE = 1.833), but the interaction was not reliable (F < 1, p > .56). Background and target probes produced similar amplitude P300 components (t(15) = 1.33, p = .20) that were both larger than the distractor P300 (target vs. distractor: t(15) = 2.42, p < .05; distractor vs. background: t(15) = 3.25, p < .01; Figure 8). As in Experiment 2, probe location affects the amplitude of the P300, and the pattern is consistent with distractor suppression.
Figure 9 shows amplitude plots of the N1 as a function of probe condition. The N1 appears as a contralateral negativity over lateral posterior electrode sites with a latency of approximately 180 ms post-stimulus. N1 amplitude was quantified as the average voltage over a time window from 160-200 ms post-stimulus for contralateral electrodes 52 (P5), 59 (P7), 60, and 66 for the left hemisphere and 78, 86, 92 (P8), and 93 (P6) for the right hemisphere. Note that only the data from the irrelevant probes were used in the following analysis (i.e., the response probes were excluded). N1 amplitude was evaluated with a 2 (hemisphere: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA. A main effect of hemisphere indicated that larger amplitude N1 components were observed at left hemisphere than right hemisphere electrode locations (F(1, 15) = 4.85, p < .05, MSE = 2.674). More importantly, a main effect of probe location was observed (F(2, 30) = 4.65, p < .05, MSE = .485). As can be seen in Figure 9, target and background probes produced comparable N1 components (t(15) = 1.15, p = .27), and target probes produced larger N1s than distractor probes (t(15) = 4.68, p < .001), especially in the left hemisphere, though the interaction was not reliable (F(2, 30) < 1, p = .39). This pattern of data is partially consistent with the suppression of distractors at the level of the N1, except that N1 amplitudes for background and distractor probes were not reliably different (t(15) = 1.52, p = .15). However, it is clear that, as in Experiment 1 and contrary to the results of Experiment 2, posterior N1 amplitude is larger for targets than distractors, suggesting the involvement of visual attention in MOT.
The anterior N1 was observed over lateral fronto-central electrode locations and peaked approximately 150 ms post-stimulus (Figure 10). As in Experiment 2, the anterior N1 was not as clearly differentiated in terms of scalp topography as in Experiment 1, partly because of overlap with the posterior N1 which began slightly earlier in this experiment than in Experiment 1. Consequently, we used the same electrodes as in Experiment 1 for analysis of this component. The anterior N1 was defined as the mean amplitude in a 130-170 ms post-stimulus window measured at electrodes contralateral to the probe flash. Note that only the data from the irrelevant probes were used in the following analysis (i.e., the response probes were excluded). A 2 (hemisphere: left vs. right) × 3 (probe location: target, distractor, background) repeated-measures ANOVA did not show a significant effect of hemisphere (F(1, 15) = 3.92, p = .07, MSE = .286) or interaction (F(2, 30) = .267, p = .70, MSE = .191). More interestingly, a significant effect of probe location (F(2, 30) = 11.63, p < .001, MSE = .205) indicated that target probes produced the largest N1s (target vs. distractor: t(15) = 4.49, p < .001; target vs. background: t(15) = 4.61, p < .001) while background and distractor N1 amplitudes did not differ (t(15) = .33, p = .74). In other words, the target N1 was increased relative to both distractors and the background. Thus, contrary to the results of Experiment 2 but consistent with Experiment 1, visual attention seems to be involved in the tracking task. In this experiment, the effect of probe location on anterior N1 amplitude is one of pure target enhancement which is consistent with the findings of Drew et al. (2009).
The observant reader may have noticed another component in our anterior recordings from all three experiments (Figures (Figures2,2, ,66 and and10)10) that we have ignored until now. Following the anterior N1 there is a positive component (the P2) that is larger for background probes than either target or distractor probes which are similar to each other. We quantified this as the average amplitude of electrodes 6 (Fcz), 7, 107, and 129 (Cz) in the time window 195-235 ms. The average amplitude across the three experiments was 0.62 μV for targets. 0.63 μV for distractors, and 1.11 μV for background probes. A repeated-measures ANOVA revealed a significant main effect of probe location (F(2, 136) = 27.65, p < .001, MSE = .36) and no other effects or interactions were reliable (Fs < 1.8, ps > .14). Follow up t-tests revealed significant differences between the background probes and both kinds of object probe: target probes (t(70) = 6.88, p < .001) and distractor probes (t(70) = 6.54, p < .001). There was no significant difference between targets and distractors (t(70) = .07, p = .94). In addition, the effect of probe location was similar across the three experiments as the Experiment by Probe Location interaction was not significant.
These results show that background probes are indeed different from probes that occur on objects and that this difference appears in a time interval later than the N1. We examine the implications of these P2 differences in the General Discussion.
In this experiment, we made the tracking task more difficult by adding a target and a distractor to the display. The increase in tracking difficulty was accompanied by a return of attention effects on both the anterior and posterior N1 components. The anterior N1 showed a pattern of pure target enhancement in which N1 amplitude to target probes was greater than to both distractor and background probes, which did not differ from each other. This contrasts with the anterior N1 effects in Experiment 1, which showed a combination of target enhancement and distractor suppression, and in Experiment 2, which showed no probe location effects at all. For the posterior N1 component, target and background probes produced equivalent N1s that were larger than distractor N1s, although the difference between background and distractor probes was not significant. Thus we have weak evidence that there may be distractor suppression at the level of the posterior N1, as we found in Experiment 1. In any case, the important point is that both the anterior and posterior N1 components once again showed attention effects. As in Experiment 2, the P300 component showed a pattern that mirrored the behavioral detection results, with P300s for both background and target probes being larger than those for distractors.
This pattern of results suggests that when observers are required to detect probes occurring throughout the display and displays are relatively sparse, as they were in Experiment 2, they adopt an attention allocation policy in which attention is spread broadly over the display, perhaps because early suppression might result in unacceptably low accuracy in detecting probes. When the difficulty of the primary tracking task is increased, observers respond by increasing the allocation of early visual attention processes to the tracked targets, resulting in larger N1s to target probes relative to distractors. In a sense, the pattern observed in Experiment 3 represents a compromise between maintaining good performance on a difficult object tracking task and continuing to detect response probes throughout the display at a reasonable level. Clearly, across these three experiments, we are looking at different points on the trade-off function defining joint performance on the two tasks (Alvarez, Horowitz, Arsenio, DiMase, & Wolfe, 2005), and an important task for future research will be to assess the full attention-operating curve using both early and late ERP components.
Our goal was to use the amplitude of the N1 component of the ERP to probes occurring in different parts of a display to assess the role of visual attention in multiple object tracking (MOT). To simplify our summary of results, we will ignore for the moment the issue of whether N1 amplitude effects reflect distractor suppression or target enhancement and treat any difference between targets and distractors as evidence of an attention effect. The issue of distractor suppression is considered in detail below.
In Experiment 1, observers tracked two objects among two distractors, and we found that irrelevant probes on tracked objects produced larger anterior and posterior N1s than probes on distractors, which we take as evidence that observers allocated more visual attention to tracked objects than to distractors. This finding is similar to that of Drew et al. (2009), who used luminance-defined objects and denser displays (two targets and eight distractors).
In Experiment 2, observers performed the same tracking task while trying to detect occasional targets in the stream of probes. Although tracking performance was similar to that observed in Experiment 1, making the probes relevant had a dramatic effect on the N1 component with equivalent amplitudes observed for target, distractor, and background probes. We interpret this as evidence that observers distributed visual attention over the entire display in an effort to detect probes on distractors and background areas as well as on targets. The fact that tracking accuracy remained high suggests that the differential allocation of visual attention to targets vs. distractors observed in Experiment 1 wasn't necessary for tracking.
The results of Experiment 2 suggest that visual attention is not a necessary component of multiple object tracking. However, it might play an important role at higher tracking loads, as suggested by theories such as Lavie's load theory (Lavie, 2005; Lavie, Hirst, de Fockert, & Viding, 2004), which holds that selectivity tends to increase with increasing perceptual load. To test this, in Experiment 3 we required observers to track three targets in a display of six objects. Note that this also increased display density so that, on average, targets passed closer to distractors, which should further increase the perceptual load of the tracking task. As in Experiment 2, the probes were relevant. In this case, robust attention effects were once again observed, with larger N1 amplitudes for target probes than for distractor probes. This apparent shift of visual attention to the tracking task was accompanied by a drop in accuracy on the probe task compared to Experiment 2, which we view as evidence of competition for visual attention between the two tasks, a competition that wasn't apparent for the sparser displays used in Experiment 2.
These results have important implications for understanding the role of visual attention in MOT. Experiment 2 showed that observers can sometimes track multiple targets without differentially allocating visual attention to targets vs. distractors. Of course, this conclusion needs to be qualified by the usual concerns regarding acceptance of the null hypothesis. Nonetheless, any such attention effects would have to be very small, since Experiment 2 had sufficient power to detect attention effects 1/12th the size of those observed in Experiment 1. This suggests that differential visual attention to targets compared to distractors is not necessary for MOT and that we need to look elsewhere for the primary mechanism underlying object tracking.
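The power argument can be made concrete with the usual normal approximation for a within-subject (paired) comparison: the smallest mean difference detectable with a given power shrinks with the square root of the sample size. The sketch below is illustrative only; `sd_diff` and `n` are placeholders, not the values from Experiment 2, and the normal approximation slightly understates the detectable effect for small samples compared with an exact noncentral-t calculation.

```python
import math
from statistics import NormalDist


def min_detectable_paired_effect(sd_diff, n, alpha=0.05, power=0.80):
    """Approximate smallest mean difference a two-sided paired test detects with the given power.

    Normal approximation: delta = (z_{1 - alpha/2} + z_{power}) * sd_diff / sqrt(n).
    """
    z = NormalDist()  # standard normal
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sd_diff / math.sqrt(n)


# Placeholder values: SD of the target-minus-distractor difference 1.0 μV, 16 participants.
print(round(min_detectable_paired_effect(1.0, 16), 2))  # 0.7
```

Dividing an observed effect by this quantity gives one rough way of expressing how small an undetected effect would have to be.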
One possibility is that object tracking uses a mechanism like that proposed by Pylyshyn (2003) in his visual indexing theory. Pointers are assigned to tracked objects at the beginning of the trial and continue to be bound to their indexed objects throughout tracking. According to Pylyshyn, object indexes do not contain information about properties of the object and so would be unaffected by whether the object briefly increased in luminance. The index itself does not even contain explicit information about the location of the tracked object. It just provides a pointer that can be used to access properties of the indexed object when needed. In fact, according to index theory, there shouldn't necessarily be any advantage in detecting probe flashes on tracked objects vs. distractors because the index is attached to the tracked object and shouldn't provide privileged access to nearby probes (which constitute separate objects).
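The distinction between an index and the properties of the object it points to can be sketched in a few lines. This is a conceptual toy, not an implementation of Pylyshyn's theory: the class names, the dictionary-based "scene", and the object identifiers are hypothetical. The index stores only which object it is bound to; location and luminance are read from the current scene only when accessed.

```python
from dataclasses import dataclass


@dataclass
class DisplayObject:
    x: float
    y: float
    luminance: float


class VisualIndex:
    """Toy pointer: records which object it is bound to, never that object's
    location or features."""

    __slots__ = ("_object_id",)

    def __init__(self, object_id):
        self._object_id = object_id

    def access(self, scene):
        # Properties are looked up only when needed, from the current scene.
        return scene[self._object_id]


scene = {"obj1": DisplayObject(0.0, 0.0, 0.5), "obj2": DisplayObject(5.0, 5.0, 0.5)}
index = VisualIndex("obj1")                   # bound at the start of the trial
scene["obj1"].x, scene["obj1"].y = 3.0, 4.0   # the object moves...
scene["obj1"].luminance = 1.0                 # ...and a probe briefly brightens it
print(index.access(scene))  # DisplayObject(x=3.0, y=4.0, luminance=1.0)
```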
Current evidence suggests that an indexing mechanism with properties like these is located in the inferior intra-parietal sulcus. Recent fMRI experiments show that this area is activated during MOT, with activation levels increasing with target set size (Culham et al., 1998). Xu and Chun (2009) review evidence that this area is capable of individuating a maximum of four objects during working memory tasks, which is similar to the maximum number of tracked objects in MOT suggested by Pylyshyn. Interestingly, activation in this area appears to depend only on the number of objects and not on their complexity (Xu & Chun, 2006), which is consistent with Pylyshyn's suggestion that indexes are merely pointers to objects and do not themselves carry information about the features or properties of indexed objects.
If tracking is primarily carried out by a visual indexing mechanism, visual attention may play an important ancillary role in reducing interference from distractors. An important source of errors during MOT is the tendency to mistake nearby distractors for targets resulting in tracking of the distractor (Pylyshyn, 2004; O'Hearn et al., 2005). This may be viewed as a failure to separately individuate the target and distractor and may reflect the coarse spatial resolution of visual representations in the parietal area (Intriligator & Cavanagh, 2001). Visual attention is known to increase spatial resolution (Yeshurun & Carrasco, 1999) and could reduce errors in tracking by improving target individuation. This could take the form of target enhancement, distractor inhibition, or both. This error reduction process would clearly become more important as display density increases and this may explain why tracking was associated with visual attention to targets in Experiment 3 but not Experiment 2.
Are the attention effects that we observed due to enhancement of the target, suppression of the distractor, or some combination of the two? We consider this question for two separate sets of measures: early (N1 amplitude) and late (behavior and P300). In Experiment 1, we found evidence for both target enhancement and distractor suppression in the anterior N1 component while the posterior N1 showed a “pure” distractor suppression pattern. Experiment 3 showed “pure” target enhancement in the anterior N1 and a distractor suppression pattern for the posterior N1 (although statistical support for this latter effect was weak). These results suggest that both target enhancement and distractor suppression may be present at early stages in visual processing of objects during MOT and these effects may be mediated by visual attention.
These early effects of attention, however, appear to be dissociable from later attention effects reflected in probe detection accuracy and P300 amplitude. In Experiment 2, we found that N1 amplitude was the same for all probe locations, while detection accuracy and P300 amplitude showed strong effects of probe location (background > target > distractor). Pylyshyn and colleagues (Pylyshyn, 2006; Pylyshyn et al., 2008) have used detection performance for probes presented during MOT to argue that distractors are suppressed, particularly when they are likely to be confused with targets. This conclusion rests on the finding that distractor probes are detected less often than probes on targets and blank background areas (Pylyshyn, 2006). However, many (but not all) experiments have also shown that accuracy for background probes was superior to that for target probes, a pattern that we observed as well in Experiments 2 and 3. This raises the question of whether background probes simply produce stronger signals than object probes because of their physical properties, such as the absence of masking by nearby contours. If this were the case, it would invalidate the use of background probes as a “neutral condition” for inferring enhancement and suppression effects.
Using texture-defined objects like those employed here, Flombaum, Scholl, and Pylyshyn (2008) continued to find an advantage for background probes, which argues against a simple (luminance) contour-interference explanation. In a subsequent paper, Flombaum and Scholl (under revision) carried out a series of experiments investigating the background probe advantage, including an experiment in which the target and distractor objects occasionally stopped during the trial. Detection accuracy for probes on stopped objects was much higher than for probes on the same objects when they were moving. This finding implicates movement as the source of poorer performance on object probes compared to background probes, which are generally stationary. This was confirmed in an experiment using texture-defined objects presented on either stationary or moving texture backgrounds: presenting probes on a moving background eliminated their detection advantage and reduced performance to levels slightly worse than for probes on nearby targets and distractors. Moving objects may impair probe detection because the same transient channels that are responsible for detecting the onset of the probe are also involved in processing the motion of the moving object (Breitmeyer, 1984).
Notice that in our experiment, moving backgrounds did not eliminate the background probe advantage, which might appear to be a failure to replicate the Flombaum and Scholl (under revision) results. However, their probes were small squares that might have been effectively masked by the moving pixels of the background. Our probes were much larger and might not have been masked by the moving background, which was composed of small micropatterns of random black and white pixels. In any case, these results together suggest that the background probe advantage in detection may be due, at least in part, to greater masking of probes by moving objects compared to probes on stationary backgrounds. The question remains whether there is any evidence of distractor suppression after eliminating the differential effects of masking.
Pylyshyn, Haladjian, King, and Reilly (2008) recently reported an experiment in which half of the distractors appeared on a different depth plane than the targets. They found that probes on targets and on different-depth distractors were detected equally well and better than probes on same-depth distractors, suggesting that distractors that can be preattentively grouped away from targets are not inhibited. Because all probes in this method appear on moving objects, it provides evidence for distractor suppression without the confound of background probes appearing on stationary backgrounds.
In summary, it appears that the pattern of probe detection accuracy that we and others have observed (background > target > distractor) may be due, at least in part, to a stronger signal associated with background probes, which, unlike object probes, are not masked by a moving object. This possibility is reinforced by our finding that the amplitude of the P2 component of the ERP was enhanced for background probes relative to both targets and distractors, which were similar to each other. Previous research (Luck & Hillyard, 1994b) showed that popout targets presented during a visual search task were associated with larger P2 components, suggesting that our background probes may have had similar popout properties. The ability of background probes to automatically attract attention might help explain why they are detected more often than probes appearing on objects. This suggests that background probes may not constitute a valid neutral condition for inferring enhancement vs. suppression effects and that other procedures, such as that used by Pylyshyn et al. (2008), may be required to make this determination.
The fact that the background probe advantage is observed in a component (the P2) that occurs later than the N1 is consistent with our observation that the detection advantage for background probes, and the large P2 that accompanies it, can be observed in circumstances in which earlier components such as the N1 show no effect whatsoever of probe location (Experiment 2). This suggests that the reduced masking associated with background probes affects processes that occur after the N1, and that the pattern of distractor suppression we observed in the N1 component in Experiments 1 and 3 may not be affected by the masking confound. This is also consistent with the fact that the background N1 is never larger than the target N1, as it is for the late-occurring P300.
Supporting evidence can also be found in the results of Drew et al. (2009), who examined three different nontarget probes: moving distractor, stationary distractor, and empty space. If the sort of differential masking effects outlined above affected N1 amplitude, we would expect small N1s for the moving distractor relative to both the stationary distractor and empty space conditions. However, contrary to the masking predictions, N1 amplitude was slightly (but significantly) larger for the moving distractor probes than for both of the other two probe types. Thus, there appears to be little evidence of a background probe advantage in the N1 component. Nonetheless, it will be important to replicate our N1 suppression pattern using conditions like those employed by Pylyshyn et al. (2008), in which all probes occur on moving objects. Based on this logic, Doran and Hoffman (in preparation) carried out an experiment that examined whether distractor suppression depended on the distance of distractors from the tracked objects. They found a distinct distractor suppression pattern in which probes appearing on distractors near the target object elicited smaller posterior N1s than probes appearing on far distractors. Target and far distractor probes were equivalent.
Regardless of whether the late measures reflect target enhancement or distractor suppression, an intriguing question remains regarding the dissociation of attention effects between early and late measures observed in Experiment 2. Detection accuracy was higher and P300 amplitude was larger for targets than for distractors, even though N1 amplitudes were the same. We used this dissociation above to argue that an indexing mechanism, rather than visual attention, may be the primary mechanism involved in tracking. How could an indexing mechanism produce differences in probe detection without affecting N1 amplitude? One possibility is that indexes result in a sustained activation of target object representations in object-related areas of visual cortex such as the lateral occipital complex (LOC). These target object activations could, in turn, suppress distractor object representations through the kind of winner-take-all network proposed by Desimone and Duncan (1995). LOC is an important area for processing objects (Grill-Spector, Kourtzi, & Kanwisher, 2001) and also appears to be the generator site of the posterior N1 (Martinez et al., 2007).
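The proposed suppression can be illustrated with a toy rate model of biased competition. This is a sketch under arbitrary assumptions (two units, linear-threshold dynamics, an inhibition weight of 0.5, and a top-down bias of 0.2 standing in for sustained index-driven activation of the target), not a model of LOC or of Desimone and Duncan's (1995) proposal itself. With this weak inhibition the competition is soft rather than strictly winner-take-all, but it shows the key property: biasing the target unit raises its activity and pushes the distractor unit below its unbiased level.

```python
def settle(inputs, biases, inhibition=0.5, dt=0.1, steps=500):
    """Leaky mutual-inhibition network integrated with Euler steps.

    Each unit is driven by its input plus a top-down bias and inhibited by the
    summed activity of all the other units; drive is rectified at zero.
    """
    rates = [0.0] * len(inputs)
    for _ in range(steps):
        total = sum(rates)
        drive = [max(0.0, inp + b - inhibition * (total - r))
                 for inp, b, r in zip(inputs, biases, rates)]
        rates = [r + dt * (-r + d) for r, d in zip(rates, drive)]
    return rates


neutral = settle([1.0, 1.0], [0.0, 0.0])  # no top-down bias
biased = settle([1.0, 1.0], [0.2, 0.0])   # unit 0 = indexed target, unit 1 = distractor
print([round(r, 3) for r in neutral])  # [0.667, 0.667]
print([round(r, 3) for r in biased])   # [0.933, 0.533]
```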
Sustained differences between targets and distractors would essentially show up as differences in baseline amplitude in ERPs and would be eliminated by the usual process of baseline correction. In other words, differences in baseline activation levels between targets and distractors would not appear in the amplitude of the N1 component elicited by the probes. These differences in activation could, however, affect how quickly and accurately information about probe flashes could be accessed. It is interesting to note that Drew and Vogel (2008) described an ERP component (the contralateral delay activity or CDA) that is sustained throughout the tracking trial and whose amplitude increases with target set size. This component appears to have a scalp distribution that is similar to the N1 and may reflect the sustained activation of target representations in LOC proposed above.
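The claim that sustained target/distractor differences would be invisible in probe-evoked N1 amplitude follows directly from baseline correction, as a toy example shows. The waveforms are invented: the 100 Hz sampling grid, the -100 to 0 ms baseline window, the size of the sustained offsets, and the idealized N1 shape are all assumptions, not data from these experiments. Two waveforms that differ only by a constant pre-probe offset have different raw N1 window means but identical means after correction.

```python
from statistics import fmean


def baseline_correct(samples, times_ms, base_start=-100, base_end=0):
    """Subtract the mean of the prestimulus window [base_start, base_end) from every sample."""
    base = [s for s, t in zip(samples, times_ms) if base_start <= t < base_end]
    if not base:
        raise ValueError("no samples in the baseline window")
    offset = fmean(base)
    return [s - offset for s in samples]


def n1_amplitude(wave, times_ms, start_ms=130, end_ms=170):
    """Mean amplitude in the N1 measurement window."""
    return fmean(v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms)


times = list(range(-100, 301, 10))
evoked = [-2.0 if 130 <= t <= 170 else 0.0 for t in times]  # idealized probe-evoked N1
target = [v + 1.5 for v in evoked]      # plus a large sustained (pre-probe) offset
distractor = [v + 0.3 for v in evoked]  # plus a smaller sustained offset

print(round(n1_amplitude(target, times) - n1_amplitude(distractor, times), 3))  # 1.2
corrected_diff = (n1_amplitude(baseline_correct(target, times), times)
                  - n1_amplitude(baseline_correct(distractor, times), times))
print(abs(corrected_diff) < 1e-9)  # True
```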
What implications does the preceding discussion have for the N1 suppression effects we observed in Experiment 1? Clearly, it suggests caution in interpreting the pattern of N1 amplitudes in Experiment 1 as evidence of distractor suppression. Additional research that combines better neutral conditions, such as those described in Pylyshyn et al. (2008), with ERP measures is needed to determine whether N1 amplitude reflects target enhancement or distractor suppression. For present purposes, the important point is that there is clear evidence that visual attention plays a role in MOT, particularly for dense displays.
Our results, based on ERPs elicited by probe flashes on targets, distractors, and background areas, show that observers preferentially allocate visual attention to targets during MOT, a conclusion in agreement with Drew et al. (2009). However, we also showed that when tracking load is light, and probes are task-relevant, observers are able to distribute their attention broadly over the display without much loss in tracking accuracy, suggesting that visual attention may not be the primary mechanism underlying MOT. Object tracking may rely on an “indexing” mechanism that is distinct from visual attention. Increasing the difficulty of the tracking task resulted in a reinstatement of visual attention effects and a loss in probe detection accuracy. These results suggest that visual attention may be particularly important during tracking of objects in dense displays where distractors may be confused with targets.
Thanks to Todd Horowitz, Ed Vogel, Jeremy Wolfe, and an anonymous reviewer for helpful comments on an earlier draft. Thanks to Meg Lamborn, David McCoy, Hyun Young Park, Amanda Skoranski, and Laura Yarnall for their invaluable assistance collecting data. This work was supported in part by NIH grant R01 NS 050876, sub award 8508-53692 to JEH from Johns Hopkins University.
¹ It can be difficult to compare accuracy across MOT experiments that use different set sizes, since changes in the number of targets and/or distractors affect guessing probabilities. However, it is possible to correct for differences in guessing probabilities using formulas such as the one provided by Pylyshyn and Annan (2006; see also Hulleman, 2005). In essence, these formulas take the number of objects correctly tracked on each trial and remove the probability of guessing correctly, yielding a measure of the number of objects effectively tracked.
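As an illustration of how such a correction works, the sketch below assumes one simple guessing model; it is not necessarily identical to the formula of Pylyshyn and Annan (2006) or Hulleman (2005). Under this model, an observer who truly tracks m of T targets picks the remaining T - m responses at random from the N - m untracked objects, so the expected number correct is E = m + (T - m)² / (N - m). Solving for m gives the closed form used below. For simplicity, the function is applied to a mean score rather than trial by trial, and the scores in the example are hypothetical.

```python
def effective_tracked(mean_correct, n_targets, n_objects):
    """Estimate how many targets were actually tracked, under an assumed guessing model.

    Model: E = m + (T - m)**2 / (N - m), where T = n_targets, N = n_objects and
    E = mean_correct. Rearranging gives m = (E*N - T**2) / (N - 2*T + E).
    """
    T, N, E = n_targets, n_objects, mean_correct
    if not (T * T / N <= E <= T):
        raise ValueError("mean_correct must lie between chance (T*T/N) and T")
    return (E * N - T * T) / (N - 2 * T + E)


# Hypothetical mean scores: 2 targets among 4 objects, and 3 targets among 6 objects.
print(round(effective_tracked(1.5, 2, 4), 3))  # 1.333
print(round(effective_tracked(2.5, 3, 6), 3))  # 2.4
```

Because chance performance maps to 0 and perfect performance maps to T, the corrected values can be compared across displays with different numbers of targets and distractors.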
² Note that the use of multiple follow-up t-tests is appropriate when an ANOVA produces a significant main effect of a factor with three levels. In this case, there is no need to use any correction procedure (e.g., Bonferroni correction) since the family-wise Type I error rate does not become inflated (Cardinal & Aitken, 2006).
³ In addition, this sort of attention switching seems unlikely given that the same sort of attentional set would be useful in the following Experiment 3, but once again attentional selection was observed in the N1.