HHS Public Access; Author Manuscript; accepted for publication in a peer-reviewed journal.
Atten Percept Psychophys. Author manuscript; available in PMC 2010 October 20.
Published in final edited form as:
Atten Percept Psychophys. 2010 October; 72(7): 1765–1775.
doi:  10.3758/APP.72.7.1765
PMCID: PMC2957661

Direction information in multiple object tracking is limited by a graded resource


Is multiple object tracking (MOT) limited by a fixed set of structures (“slots”), a limited but divisible resource, or both? Here we answer this question by measuring the precision of the direction representation for tracked targets. The signature of a limited resource is a decrease in precision with the square root of the tracking load. The signature of fixed slots is a fixed precision. Hybrid models predict a rapid decrease to asymptotic precision. In two experiments, observers tracked moving disks and reported target motion direction by adjusting a probe arrow. We derived the precision of representation of correctly tracked targets using a mixture distribution analysis. Precision declined with target load according to the square-root law up to six targets. This finding is inconsistent with both pure and hybrid slot models. Instead, directional information in MOT appears to be limited by a continuously divisible resource.


Since we live in a constantly changing environment, we need to update our representations of the world over time. One somewhat artificial example would be air traffic control, in which the controller needs to update her knowledge of aircraft positions and flight plans in order to ensure that takeoffs and landings go smoothly. But some version of this basic cognitive task confronts us in our everyday life, whether we are driving on a busy street, walking through a crowd, or chaperoning a child’s birthday party.

The primary laboratory tool for studying updating, particularly spatial updating, is the multiple object tracking (MOT) task (Pylyshyn & Storm, 1988). In MOT, observers are presented with an array of identical objects. A subset of these objects is designated as targets. The observer must then track these targets while all of the objects move independently for several seconds (or minutes, as in Wolfe, Place, & Horowitz, 2007), and at the end report which items were targets (for recent reviews, see Cavanagh & Alvarez, 2005; Scholl, 2009).

The typical finding in such experiments is that observers can track 3–5 objects. What is the nature of this capacity limit? In the visual short-term memory (VSTM) literature, a similar limit of 3–5 items has been observed. Debate in the VSTM field has contrasted two accounts, which we will term “slot” theory and “flexible resource” theory. According to a slot-based account (Awh, Barton, & Vogel, 2007; Luck & Vogel, 1997), a fixed number of objects can be stored in VSTM, regardless of their complexity. In contrast, graded resource accounts argue that the number of objects that can be stored is inversely related to the complexity of the objects because more complex objects take up more of the fixed resource (Alvarez & Cavanagh, 2004; Eng, Chen, & Jiang, 2005). This debate has proven quite productive for the study of VSTM.

The MOT literature was originally dominated by the slot approach. The best known model is Pylyshyn’s (1989) FINST or visual index model, in which a fixed set of pointers can be assigned to objects for tracking, enumeration, or other visual routines; the phenomenon of MOT was predicted on this account. Here the limit on tracking is the number of pointers.

Alvarez and Franconeri (2007) recently proposed a flexible resource model. In their model, targets are tracked by flexible indexes (FLEXes). The total number of FLEXes is limited by the availability of a finite resource. This resource determines the spatial resolution of each FLEX, such that when fewer items are tracked, resolution is higher. This account predicts that when spatial resolution is at less of a premium, observers could track more objects by allocating the resource more thinly. Kazanovich and Borisyuk (2006) have proposed a somewhat similar approach in which tracking is accomplished by a limited set of central oscillators, and the number of oscillators is limited by a finite resource. Here the finite resource is the phase space of the oscillations, such that when there are more oscillators, it is more difficult to keep them from synchronizing with each other. Thus, tracking is better when fewer independent oscillators are necessary, because they can be further apart in phase space.

Two major approaches to modeling MOT have not come down on either side. Oksama and Hyönä (2008) proposed that target-relevant location information is stored in VSTM, meaning that the limits on spatial tracking would derive from the limitations on VSTM itself, which, as we noted above, are currently under debate. Thus, Oksama and Hyönä’s model of MOT would be considered a slot model if VSTM turned out to be slot-like, whereas a victory for the flexible resource view of VSTM would turn the Oksama and Hyönä model into a flexible resource account of tracking. Yantis (1992) proposed that observers actually tracked only a single object, a virtual polygon whose vertices were formed by the targets. Capacity limits do not naturally fall out of Yantis’ model, though it might be inferred that the limit is again linked to the resolution of VSTM.

The first evidence to directly address the slot vs. resource question comes from Alvarez and Franconeri (2007). They measured the maximum speed at which a given number of targets could be tracked, and found that the velocity limit decreased smoothly as a log function of the number of targets, supporting their flexible resource model. A converging experiment showed that when objects were moving faster, close encounters between objects were more disruptive. Further support for the flexible resource model came from a study by Iordanescu, Grabowecky, and Suzuki (2009), who asked observers to click on the position of a missing target. They found that the precision of responses increased for targets “under threat” from nearby distractors, and decreased for targets in isolated areas of the display.

While these data point towards a flexible resource model, we have two concerns. First, generic tracking accuracy is the wrong measure for testing a resource theory. Second, a reduction in tracking performance with speed rules out only the simplest of fixed-slot architectures.

While tracking accuracy seems like the common sense benchmark for testing theories of tracking, in fact it is problematic. Conceptually, there are two ways in which observers could make errors in an MOT task. First, they could simply lose track of a target. In this case, the observer’s response would be based on a guess. Second, they could swap a target for a distractor, such that they successfully tracked an object that simply happened to be the wrong object. In the first scenario the observers would track an object for only part of the trial, whereas in the second scenario the observers would track an object, though not always the correct object, for the whole trial. We argue that only the first scenario reflects a failure of capacity.¹

Manipulations such as increasing object speed or decreasing the separation between objects are likely to lead to more of the second type of error. This would lead to reduced tracking performance when measured in the conventional way, but would not necessarily reflect reduced capacity. For example, consider an observer who tracks four targets with perfect accuracy at a low speed. At a higher speed, she might still be capable of tracking four objects. However, because of the increased frequency of close encounters, each target is swapped with a distractor, so she ends up with a score of zero correctly tracked targets. In fact, Franconeri, Lin, Pylyshyn, Fisher, and Enns (2008) argued that these close encounters were the only real limit on tracking performance. Thus, reduced tracking accuracy cannot necessarily be taken as evidence for or against a fixed slot architecture.

Instead, it may make more sense to think about how well targets are being tracked. For example, consider a task in which there are four items: two targets and two distractors. Each item stays in its own quadrant of the visual field throughout a trial. No matter how long the trial lasts or how fast the items move, an observer would always be able to indicate which items were targets. However, if they were asked to click on the position of the lower left target, for example (as in Iordanescu, et al., 2009), we would expect to find effects of tracking load, speed, or duration on how precise their responses were. Indeed, Howard and Holcombe (2008) reported just such effects for tracking load on position in multiple object tracking.

Zhang and Luck (2008) applied this insight to the study of visual short-term memory. In the standard Luck and Vogel (1997) task, for example, observers are given a set of colored squares to remember and have to detect any change in the colors after a retention interval. The measure of memory in this task is the accuracy of change detection. Zhang and Luck instead asked observers to indicate the color that they remembered at a given location, using a color wheel (cf. Wilken & Ma, 2004). Their dependent variable was the error between the true color and the reported color, expressed as an angle on the color wheel. They then assumed that these errors came from two distributions: a uniform distribution on trials when the probe was not remembered, and a von Mises (circular Gaussian) distribution on trials when the probe was remembered. Fitting a simple mixture model allowed them to compute the mixture probability Pt, which indicates how often the probe was remembered, as well as the standard deviation of the von Mises distribution, which indicates the resolution with which the color was represented on trials when it was remembered. Note that these two parameters are independent: one might remember very few items but remember them with high precision, or conversely remember many items with low precision.

A straightforward slot model predicts that resolution should be constant as a function of the number of items to be remembered, because each remembered item has its own slot, and forgotten items are factored out by the mixture model. A pure resource model predicts that resolution should decrease indefinitely as more items are loaded into memory, because the resource is spread more thinly.

If we conceptualize resolution in terms of the standard deviation of the representation, then this function should follow a square-root law, since the standard deviation of the average of n independent samples is the standard deviation of the individual samples divided by √n (Bonnel & Miller, 1994; Palmer, 1990). If the resource amounts to a fixed budget of samples, increasing the number of targets decreases the number of samples available for each target. Thus, if the standard deviation for one target is s, then the standard deviation for two targets should be √2·s, for three targets √3·s, for four targets 2s, and so forth.
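The square-root law can be checked with a small simulation. The sketch below assumes, purely for illustration, that the resource is a fixed budget of independent Gaussian samples (12 here, a hypothetical number) split evenly among targets, and compares the simulated standard deviation of each target's averaged estimate with the prediction s·√n. The function names and the sample budget are ours, not the authors'.

```python
import math
import random
import statistics


def predicted_sigma(sigma_one: float, n_targets: int) -> float:
    """Square-root law: SD for n targets is the single-target SD times sqrt(n)."""
    return sigma_one * math.sqrt(n_targets)


def simulated_sigma(total_samples: int, n_targets: int, sample_sd: float,
                    trials: int = 20000, seed: int = 1) -> float:
    """SD of a per-target estimate built by averaging its share of a fixed sample budget.

    The budget (a hypothetical stand-in for the resource) is split evenly, so each
    target's estimate is the mean of total_samples // n_targets Gaussian draws.
    """
    rng = random.Random(seed)
    per_target = total_samples // n_targets
    estimates = [
        statistics.fmean(rng.gauss(0.0, sample_sd) for _ in range(per_target))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    budget, sample_sd = 12, 20.0
    sigma_one = sample_sd / math.sqrt(budget)  # SD when all samples go to one target
    for n in (1, 2, 3, 6):
        print(n, round(predicted_sigma(sigma_one, n), 2),
              round(simulated_sigma(budget, n, sample_sd), 2))
```

With this hypothetical budget, the simulated and predicted columns should agree to within Monte Carlo error at every load.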

In addition to their methodological innovation, Zhang and Luck (2008) also proposed two theoretical alternatives intermediate between a simple fixed-slot architecture and a pure resource account. In the slots + resources model, there is a fixed number of slots in memory, but a variable resource can be distributed unequally between these slots. Thus, if the observer is only remembering one item, all of the resource is put into that slot, and the memory is very high resolution. If two or three items are in memory, the resource is divided among them, leading to lower resolution, following the square-root law. However, only a fixed number of items can be remembered. If the number of items to be remembered exceeds this limit, the excess items are simply not remembered. Thus, resolution declines as more targets are added, but only up to the point where the number of items equals the observer’s capacity. Additional items have no effect on resolution, because they are not remembered and thus contribute uniformly distributed errors to the distribution.

The slots + averaging model makes similar predictions from a different logic. In this model, when the number of to-be-remembered items is at or above capacity, that means that each item has a single slot assigned to it. When fewer items are to be remembered, items can be assigned to more than one slot. This increases resolution because each slot provides an independent sample of the color. Thus, the model predicts that resolution will be constant when the memory load is above capacity, and improve via the square-root law as load drops below capacity.

These two intermediate models thus make the same prediction: the resolution of the representation should decline as the number of items increases, up to the observer’s capacity, at which point additional items should not matter. Note that a decline in resolution means an increase in the standard deviation parameter of the fitted von Mises distribution. This is exactly the pattern that Zhang and Luck (2008) observed in short term color memory. A subsequent experiment allowed them to demonstrate that the slots + averaging model was a better way to explain the data than the slots + resources model.

While this technique has not yet been applied to MOT data, a recent paper by Ma and Huang (2009) analyzed data from several experiments using the trajectory deviation paradigm of Tripathy and colleagues (Tripathy & Barrett, 2004; Tripathy, Narasimhan, & Barrett, 2007). In this paradigm, a number of objects move across the display at a constant speed. When they reach the midpoint, some of the objects may change direction subtly; the observer’s task is to detect these changes. The number of trajectories that observers can simultaneously monitor is roughly 3–4, similar to the MOT limit (Tripathy, et al., 2007). Ma and Huang tested Zhang and Luck’s hybrid models against a Bayesian observer limited only by neural noise rather than a structurally fixed capacity, and found that the limited capacity models could not explain the data as well as the Bayesian observer. However, the trajectory deviation paradigm differs from standard multiple object tracking in a number of ways. Observers in these experiments were trying to detect an event (a change in the direction of one or more moving objects) occurring at a predictable spatiotemporal location (when the objects reach the midpoint of the display). Observers were not required to select targets and ignore distractors; if they confused one item for another, there was no penalty. Thus, while both tasks involve moving objects, we may not be able to generalize from Ma and Huang’s findings. In fact, Vul and colleagues (Vul, Frank, Tenenbaum, & Alvarez, 2009) have argued that an ideal observer model of MOT cannot account for the reductions in performance with increasing tracking load without introducing a capacity limit.

More to the point, Howard and Holcombe (2008) demonstrated that precision for position, spatial frequency, and orientation of tracked objects decreases as a function of the number of targets, even at tracking loads below the conventional four-item limit. These data argue against the standard slot model, and Howard and Holcombe argued as much. However, the response errors they report appear to grow less steeply with tracking load than a pure resource model would predict, suggesting a hybrid model with perhaps three slots. This example highlights the importance of analyzing data only from trials on which the observer correctly tracked the object. While Howard and Holcombe made some effort to separate those responses likely to be guesses, they did not independently report the precision of the representation for correctly tracked objects.

In this paper, we applied Zhang and Luck’s (2008) analysis to the multiple object tracking paradigm. Instead of measuring the number of targets correctly tracked, we measured the resolution of the representation of tracked targets. In particular, we measured direction information. Since direction is circular, it is somewhat more amenable to the Zhang and Luck analysis technique than position. Theoretically, an ideal observer would use both position and its derivative (i.e., trajectory) to track items, assuming that trajectories were not too variable, and human observers seem to follow this principle.

There are also empirical reasons to believe that observers encode trajectory information in attentional tracking. Verghese and colleagues have shown that extended trajectories engage focused attention (Verghese & McKee, 2002), allowing better detection than a series of disconnected brief trajectories (Verghese, McKee, & Grzywacz, 2000; Verghese, Watamaniuk, McKee, & Grzywacz, 1999). Holding trajectories in memory disrupts MOT performance (Shen, Makovski, & Jiang, 2006), and repeating trajectories improves performance (Makovski, Vazquez, & Jiang, 2008; Ogawa & Yagi, 2003). Trajectory information is employed in recovering targets that disappear and reappear during MOT.

Finally, the use of trajectory information allows us to make at least a qualitative comparison with the results of trajectory deviation tasks (Ma & Huang, 2009; Narasimhan, Tripathy, & Barrett, 2009; Tripathy, et al., 2007).

In order to measure direction information, we employed a method of adjustment technique (Figure 1), which allows us to measure the distribution of errors around the true trajectory. Following Zhang and Luck (2008), we then derived an estimate of the variance of this distribution as our measure of resolution for correctly tracked targets. An important feature of the Zhang and Luck approach is that it separates guessing responses from noisy responses, theoretically allowing us to measure the resolution of tracked targets independent of guesses on untracked targets.

Figure 1
Method. Observers were presented with an array of identical dark gray disks on a light gray background. A variable number of targets were designated at the start of the trial (a). All disks then moved independently for several seconds (b). In Experiment ...

In Experiment 1, observers tracked 1, 2, or 4 targets among 8 total disks, and we collected both direction information and standard target-distractor classification responses. The results showed that resolution decreased as more targets were tracked, clearly rejecting the pure slots model. Furthermore, precision was much better for targets than distractors, and distractor precision decreased as load increased, supporting a flexible resource account. However, tracking four targets did not sufficiently exceed capacity for the hybrid models to predict a plateau in resolution. In Experiment 2, we increased the maximum number of targets and the total number of items. Observers tracked 1, 2, 3, or 6 targets among 12 total disks. We used only direction probes on targets in Experiment 2. We found that the pure resource model captured the data much better than either of the hybrid models.

General Method


For each experiment, twelve observers were recruited from the volunteer observer panel of the Visual Attention Laboratory of Brigham and Women’s Hospital and compensated for their time at $10/hour. Observers gave informed consent, as approved by the Partner’s Healthcare Corporation Institutional Review Board. All observers passed the Ishihara color-blindness screen, and scored 20/25 or better (with correction) on the Mentor B-VAT acuity test.

Apparatus & Stimuli

Experiments were conducted on Macintosh computers running Mac OS 10.4. Experiments were programmed in MATLAB 7.4 using the Psychophysics Toolbox version 3.08 (Brainard, 1997; Pelli, 1997). Stimuli were presented on 21” diagonal CRT monitors, either SuperScan Mc801 RasterOps or Mitsubishi Diamond Pro 91TXM. Monitors were set to a resolution of 1024 × 768 and a refresh rate of 75 Hz. Observers were placed at a viewing distance of approximately 57 cm from the monitor, such that 1 cm on the monitor subtended 1 degree of visual angle (dva).
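The choice of a 57 cm viewing distance can be checked with the standard visual-angle formula θ = 2·atan(s / 2d), where s is the object size and d the viewing distance. A minimal sketch (the helper names are ours):

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of size_cm viewed head-on at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def distance_for_one_degree_per_cm() -> float:
    """Viewing distance (cm) at which exactly 1 cm subtends exactly 1 degree."""
    return 0.5 / math.tan(math.radians(0.5))


print(round(visual_angle_deg(1.0, 57.0), 3))   # 1 cm at 57 cm
print(round(visual_angle_deg(2.1, 57.0), 2))   # a 2.1 cm disk at 57 cm
print(round(distance_for_one_degree_per_cm(), 2))
```

At 57 cm, 1 cm subtends about 1.005°, and exact equivalence would require a distance of about 57.3 cm, so the approximation in the text is accurate to well under 1%.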

Tracking stimuli were dark gray disks subtending 2.1 dva in diameter, including a 0.2 dva black border, presented on a medium gray background.


Each trial began with the presentation of eight (Experiment 1) or twelve (Experiment 2) disks at random positions in a 5 × 6 grid, with cells spaced 4.2 dva apart. Targets were designated by flashing white four times at 1 Hz. There were one, two, or four targets in Experiment 1, and one, two, three or six targets in Experiment 2. All disks then began moving at 8 dva/s in random directions. Disks moved smoothly, changing direction only when they approached within 2.1 dva of the edge of the display. Disks that reached this buffer zone changed direction as if bouncing off an invisible boundary. Disks were allowed to occlude one another; the black border was included to help ensure correct disambiguation in this case (Viswanathan & Mingolla, 2002).
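The motion rules above can be sketched as a per-frame update. This is a minimal illustration, not the authors' MATLAB code: the display half-extents (HALF_WIDTH, HALF_HEIGHT) and the reading of the 5 × 6 grid as 5 rows by 6 columns are hypothetical assumptions, while the speed, frame rate, grid spacing, and 2.1 dva buffer come from the text. Disk-disk contacts are ignored, since disks were allowed to occlude one another.

```python
import math
import random

FRAME_RATE_HZ = 75.0      # monitor refresh rate (Apparatus & Stimuli)
SPEED_DVA_S = 8.0         # disk speed in degrees of visual angle per second
BUFFER_DVA = 2.1          # disks bounce when they come within this distance of the edge
GRID_SPACING_DVA = 4.2    # spacing of the start-position grid
# Hypothetical display half-extents in dva; the text does not report them.
HALF_WIDTH, HALF_HEIGHT = 16.0, 12.0


def start_positions(rng, n_disks, rows=5, cols=6):
    """Pick n_disks distinct cells of a rows x cols grid centred on the display."""
    cells = [((c - (cols - 1) / 2) * GRID_SPACING_DVA,
              (r - (rows - 1) / 2) * GRID_SPACING_DVA)
             for r in range(rows) for c in range(cols)]
    return rng.sample(cells, n_disks)


def new_disk(rng, x, y):
    """Disk state [x, y, vx, vy] with a random direction and a per-frame step."""
    theta = rng.uniform(-math.pi, math.pi)
    step = SPEED_DVA_S / FRAME_RATE_HZ
    return [x, y, step * math.cos(theta), step * math.sin(theta)]


def reflect(pos, vel, limit):
    """Mirror a coordinate back inside [-limit, limit] and flip its velocity component."""
    if abs(pos) > limit:
        pos = math.copysign(2 * limit - abs(pos), pos)
        vel = -vel
    return pos, vel


def advance(disk):
    """Move one disk by one frame, bouncing off the invisible buffer boundary."""
    x, y, vx, vy = disk
    x, vx = reflect(x + vx, vx, HALF_WIDTH - BUFFER_DVA)
    y, vy = reflect(y + vy, vy, HALF_HEIGHT - BUFFER_DVA)
    disk[:] = [x, y, vx, vy]


def simulate(n_disks=8, duration_s=7.41, seed=0):
    """Run one trial's motion phase and return the final disk states."""
    rng = random.Random(seed)
    disks = [new_disk(rng, x, y) for x, y in start_positions(rng, n_disks)]
    for _ in range(round(duration_s * FRAME_RATE_HZ)):
        for disk in disks:
            advance(disk)
    return disks


if __name__ == "__main__":
    for x, y, vx, vy in simulate(n_disks=8):
        # Final heading in degrees: the quantity the direction probe asks about.
        print(f"({x:6.2f}, {y:6.2f})  heading {math.degrees(math.atan2(vy, vx)):7.1f} deg")
```

Because a bounce only flips the sign of one velocity component, speed is conserved exactly and direction changes occur only at the buffer boundary, matching the description of smooth motion.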

After a variable tracking duration (minimum 5.84 s, mean 7.41 s, sd 6.79 s), the disks stopped moving, and one disk was probed. There were two probe tasks: classification probes and direction probes. In Experiment 1, targets and distractors were equally likely to be probed. Both tasks were performed on every trial. The order of probes was counterbalanced across observers. In Experiment 2, we only employed direction probes, and only targets were probed.

The classification probe consisted of the probe disk turning blue. The observer’s task was to indicate whether the probe disk was a target or not by pressing the up arrow key for “yes” or the down arrow key for “no”.

The direction probe was a blue arrow, extending 2.1 dva from the centroid of the probe disk. The observer’s task was to rotate the arrow, using the mouse, until it matched the probe disk’s trajectory. The initial angle was selected at random on each trial. In Experiment 2, after this initial response, the probe arrow was replaced at a different random angle and the observer was asked to adjust the angle a second time. This allowed a more precise estimate of the internal representation.²
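Because the response and the true direction are both angles, the error has to be wrapped onto the circle before analysis; otherwise a response of 350° to a true direction of 10° would count as a 340° error rather than −20°. A minimal sketch, assuming both angles are expressed in degrees in the same convention (the function name is ours):

```python
def angular_error_deg(response_deg: float, true_deg: float) -> float:
    """Signed error (response minus true direction), wrapped into [-180, 180)."""
    return (response_deg - true_deg + 180.0) % 360.0 - 180.0


print(angular_error_deg(350.0, 10.0))   # -20.0
print(angular_error_deg(10.0, 350.0))   # 20.0
print(angular_error_deg(90.0, 45.0))    # 45.0
```

Multiplying the result by π/180 gives errors in the [−π, π) range assumed by the mixture analysis.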

Observers were given feedback as to the accuracy of their answers for both types of probe. In the practice trials, the correct angle was illustrated with a red arrow after each trial.

Each block consisted of 10 practice trials, followed by 160 experimental trials; the number of blocks varied across experiments.

Data Analysis

Angular error data were analyzed according to the method devised by Zhang and Luck (2008). We assume that there are two types of trials: trials on which the observer knew the probed item’s trajectory, and trials on which the observer did not know the probed item’s trajectory. On known trajectory trials, responses are assumed to be drawn from a von Mises (circular normal) distribution centered on the actual trajectory, while on unknown trajectory trials, responses are assumed to be drawn from a uniform distribution. The observed distribution of trajectory errors is thus a mixture of these two distributions.

The von Mises distribution has two parameters: the mean μ and the concentration parameter κ. For an angle α, its probability density function is given by:

$$f_{VM}(\alpha \mid \mu, \kappa) = \frac{e^{\kappa \cos(\alpha - \mu)}}{2\pi I_0(\kappa)} \qquad (1)$$

Here I₀ denotes the modified Bessel function of the first kind of order 0.

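The uniform distribution is bounded at −π and π, so its density is f_U(ε) = 1/(2π). The mixture distribution of errors ε is thus given by:

$$p(\varepsilon) = P_t\, f_{VM}(\varepsilon \mid \mu, \kappa) + (1 - P_t)\, f_U(\varepsilon), \qquad f_U(\varepsilon) = \frac{1}{2\pi} \qquad (2)$$
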
We assumed that the correct responses would be centered on the true direction (i.e., μ = 0).³ There were thus two free parameters: Pt, which controls the proportion of known trajectory trials, and κ, the concentration parameter of the von Mises distribution. These parameters were fit individually for each observer and each condition using maximum likelihood estimation. Data fitting was implemented in MATLAB using the Statistics Toolbox.
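The fitting step can be sketched in Python with NumPy and SciPy, which we substitute here for the MATLAB Statistics Toolbox used by the authors. The sketch minimizes the negative log-likelihood of the von Mises + uniform mixture with μ fixed at 0. The simulated data, starting values, and parameter bounds are illustrative assumptions, and the function names are ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0e


def mixture_nll(params, errors):
    """Negative log-likelihood of Pt * vonMises(0, kappa) + (1 - Pt) * Uniform(-pi, pi)."""
    p_t, kappa = params
    # log von Mises density with mu = 0, written with the exponentially scaled
    # Bessel function i0e(k) = exp(-k) * I0(k) to avoid overflow at large kappa
    log_vm = kappa * (np.cos(errors) - 1.0) - np.log(2.0 * np.pi) - np.log(i0e(kappa))
    density = p_t * np.exp(log_vm) + (1.0 - p_t) / (2.0 * np.pi)
    return -np.sum(np.log(density))


def fit_mixture(errors, starts=((0.5, 2.0), (0.8, 10.0), (0.95, 50.0))):
    """Maximum-likelihood (Pt, kappa); several starting points guard against local minima."""
    bounds = [(1e-6, 1.0 - 1e-6), (1e-3, 700.0)]
    best = None
    for start in starts:
        res = minimize(mixture_nll, x0=np.asarray(start, dtype=float), args=(errors,),
                       method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    p_t, kappa = best.x
    return float(p_t), float(kappa)


def simulate_errors(n, p_t, kappa, rng):
    """Draw n angular errors (radians) from the mixture, for checking the fit."""
    tracked = rng.random(n) < p_t
    errors = rng.uniform(-np.pi, np.pi, n)
    errors[tracked] = rng.vonmises(0.0, kappa, tracked.sum())
    return errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    errs = simulate_errors(2000, p_t=0.8, kappa=20.0, rng=rng)
    print(fit_mixture(errs))  # should land near (0.8, 20)
```

Recovering known parameters from simulated data, as in the usage lines, is a cheap check that the likelihood and bounds behave sensibly before fitting real observers.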

The concentration parameter κ can be converted to a standard deviation σ according to Equation 3:


As an example of the mixture model procedure, Figure 2 illustrates data from one observer, in the 2 target condition of Experiment 2. The bars plot a histogram of the angular errors. The broken line plots the estimated distribution of errors from trials where the target was tracked (i.e. the von Mises distribution, fVM, with σ = 9.7), while the dotted line plots the estimated distribution of errors on untracked trials (the uniform distribution, fU). The solid line represents the mixture distribution, .80 fVM + .20 fU.
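A σ such as the 9.7 quoted here is obtained from a fitted κ. As an illustration of such a conversion, one standard choice is the circular standard deviation, σ = √(−2 ln(I₁(κ)/I₀(κ))), where I₁ is the modified Bessel function of the first kind of order 1. We assume this form, and degrees as the unit, for the sketch; it is not necessarily the conversion given in Equation 3. For large κ it approaches the familiar approximation σ ≈ 1/√κ radians.

```python
import numpy as np
from scipy.special import i0e, i1e


def kappa_to_sd_deg(kappa):
    """Circular standard deviation (degrees) of a von Mises distribution.

    Uses sigma = sqrt(-2 ln R), where R = I1(kappa) / I0(kappa) is the mean
    resultant length; the exponential scaling in i1e / i0e cancels in the ratio.
    """
    r = i1e(kappa) / i0e(kappa)
    return np.degrees(np.sqrt(-2.0 * np.log(r)))


for k in (2.0, 10.0, 35.0, 100.0):
    # Compare with the small-angle approximation 1 / sqrt(kappa) radians.
    print(k, round(float(kappa_to_sd_deg(k)), 2), round(float(np.degrees(1 / np.sqrt(k))), 2))
```

The function also accepts NumPy arrays, so a whole vector of per-observer κ estimates can be converted in one call.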

Figure 2
Illustration of the mixture model procedure. Gray bars represent a histogram of the angular errors for one observer tracking two targets in Experiment 1, in 20° bins. The broken line (tracked) represents the estimated (von Mises) distribution ...

The obtained Pt and σ values were then treated as descriptive statistics and used to evaluate the four models, described below. The models attempt to predict σ, the standard deviation of the directional distribution.

Pure slots model

The pure slot model predicts a fixed resolution regardless of the number of targets, so any effect of tracking load on σ will disconfirm this model.

The remaining three models use the fitted σ value from one load condition to predict σ in the remaining load conditions. Thus, these models do not have any free parameters. Capacity was estimated for each observer in Experiment 1 by multiplying the fitted Pt value from the four target condition by four. In Experiment 2, we multiplied the fitted Pt value from the three target condition by three.⁴

Pure resource model

The pure resource model assumes that resolution is maximal when tracking one target, and that adding targets reduces resolution. We took σ for the one target condition and multiplied it by the square root of the number of targets to obtain predictions for the remaining conditions.

Slots + resources model

This model assumes that σ increases as in the pure resource model, up to the observer’s capacity, at which point it remains constant.

Slots + averaging model

This model also assumes a fixed number of slots, but when there are fewer targets than slots, more than one slot can be assigned to a given target, leading to a more precise representation of target direction via more samples. Thus, σ at maximum load is taken to indicate the resolution of a single slot. When load drops below capacity, this σ is divided by the square root of the number of slots available to each target (capacity divided by the number of targets).

Note that the two hybrid models are essentially equivalent, except that the slots + resources model uses σ for the one target case to predict σ at greater loads, while the slots + averaging model works down from σ at the maximum load.
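The three variable-resolution predictions can be written as small functions of load. This is a sketch under stated assumptions: capacity K is treated as a continuous number (e.g., Pt × N from the reference condition) and assumed to be at least 1, and the slots + averaging prediction uses the square-root scaling described for that model, σ(N) = σ(N_max)·√(min(N, K) / min(N_max, K)). The function names and example values are hypothetical.

```python
import math


def pure_resource(loads, sigma_one):
    """sigma grows with sqrt(N) at every load, anchored on the one-target sigma."""
    return {n: sigma_one * math.sqrt(n) for n in loads}


def slots_resources(loads, sigma_one, capacity):
    """As pure_resource up to capacity K, then flat (extra targets are not tracked)."""
    return {n: sigma_one * math.sqrt(min(n, capacity)) for n in loads}


def slots_averaging(loads, sigma_max_load, capacity):
    """Anchored on sigma at the largest load; below K, sigma shrinks as sqrt(N / K)."""
    reference = min(max(loads), capacity)
    return {n: sigma_max_load * math.sqrt(min(n, capacity) / reference) for n in loads}


if __name__ == "__main__":
    loads = (1, 2, 3, 6)               # Experiment 2 loads
    sigma_one, sigma_six = 10.0, 16.0  # hypothetical fitted sigmas (degrees)
    capacity = 0.8 * 3                 # hypothetical Pt = .8 at three targets, so K = 2.4
    for name, pred in [("pure resource", pure_resource(loads, sigma_one)),
                       ("slots + resources", slots_resources(loads, sigma_one, capacity)),
                       ("slots + averaging", slots_averaging(loads, sigma_six, capacity))]:
        print(name, {n: round(s, 2) for n, s in pred.items()})
```

Written this way, the two hybrid models share the same shape (a square-root rise that flattens at K) and differ only in which condition anchors the curve, which is the equivalence noted above.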

Tracking performance

The proportion of targets tracked can be estimated independently from the classification data and from the direction error distribution analysis. Using the classification data, we derived a corrected accuracy value a = H − F, where H is the hit rate and F is the false alarm rate. This is intended to correct for guessing (see Cowan, 2001). The distribution fits provide Pt as a direct measure of the proportion of targets tracked.
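The two capacity estimates can be sketched as follows. Scaling a by the number of targets N to obtain k (by analogy with Cowan's K = N(H − F)) is an assumption here, as are the trial counts in the example; N·Pt follows directly from the mixture fit. The function names are ours.

```python
def corrected_accuracy(hits, target_probes, false_alarms, distractor_probes):
    """a = H - F: hit rate on target probes minus false-alarm rate on distractor probes."""
    hit_rate = hits / target_probes
    false_alarm_rate = false_alarms / distractor_probes
    return hit_rate - false_alarm_rate


def capacity_from_classification(n_targets, a):
    """Number of targets tracked implied by corrected accuracy (assumed k = N * a)."""
    return n_targets * a


def capacity_from_distribution(n_targets, p_t):
    """Number of targets tracked implied by the mixture weight Pt (N * Pt)."""
    return n_targets * p_t


# Hypothetical four-target condition: 70 of 80 targets and 10 of 80 distractors called "target".
a = corrected_accuracy(70, 80, 10, 80)
print(a, capacity_from_classification(4, a), capacity_from_distribution(4, 0.72))
```

A plateau in either estimate across loads is the signature of reaching capacity used in the Results.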


Results from Experiment 1 are shown in Figure 3. The top left panel of Figure 3 shows the estimated σ values as a function of number of targets. σ for targets (plotted as circles) increases significantly as observers are asked to track more targets (F(2, 22) = 43.2, p < .0001). Since σ is the inverse of resolution, this demonstrates that resolution declines with tracking load, thus disconfirming the pure slots model. Unsurprisingly, observers had poor information about direction for distractors (diamonds). The increase in σ for distractors was not significant (F(2, 22) = 2.2, p = .14), but may reflect a redistribution of flexible resources away from distractors as the task becomes harder.

Figure 3
Proportion of targets tracked and distribution parameters as a function of tracking load for Experiment 1 (top panels) and Experiment 2 (bottom panels). Left hand panels plot σ, the standard deviation of the fitted von Mises distribution, for ...

We cannot discriminate between the remaining models in these data. The pure resource model predicts that σ should increase with tracking load more or less indefinitely. The hybrid models predict that σ should increase monotonically with tracking load until capacity is reached. The signature of reaching capacity is that the apparent number of items tracked should plateau. The left hand panel in Figure 4 shows the number of items tracked, according to both classification (k) and distribution (NPt) analyses. The two measures are positively correlated, but not perfectly (Spearman’s r = .71, .65, and .49 for the 1, 2, and 4 target conditions, respectively). Both measures show the same trend, however: there is no apparent plateau in the number of tracked items. Thus, target loads were at or below observers’ capacity in this experiment, and all three remaining models would make similar predictions under these conditions.

Figure 4
Tracking capacity as a function of tracking load for Experiment 1 (left panel) and Experiment 2 (right panel). Triangles denote capacity derived from the distribution analysis (computed as NPt), and squares denote capacity derived from the classification ...

Experiment 2 was designed with two aims. First, we wanted to be able to observe a plateau in tracking capacity, which would allow us to discriminate between pure resource and hybrid models. Second, we wished to eliminate the “dual task” aspect of Experiment 1, in which observers had to make two different kinds of responses (classification and direction), and were sometimes asked about the direction of distractors that they were not supposed to track. Therefore, Experiment 2 was closely modeled on Zhang and Luck’s (2008) Experiment 2. Observers tracked one, two, three, or six targets among twelve total disks. We probed only targets, and used only direction probes. The bottom left panel of Figure 3 shows that σ again increased significantly with the number of targets (F(3, 33) = 7.6, p = .0005). However, as the right hand panel of Figure 4 shows, the increased number of distractors helped to reduce tracking performance (NPt), such that we now observe a plateau when tracking more than two targets. This allows us to test the three variable-resolution models. Since the models were fit separately to each observer’s data without any free parameters, we compared the models to the data using within-subject ANOVAs. Figure 5 shows these comparisons separately for each model, so that we can illustrate the within-subjects confidence intervals (for the data × models interaction) in addition to the RMS error. Note that the pure resource and slots + resources models use the one target data to predict the remaining data, so the one target data are not included in the ANOVA or RMS error analyses, but are shown on the figure (with s.e.m. error bars) for comparison. The slots + averaging model uses the six target condition for this purpose.
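The RMS error comparison can be sketched as below, assuming the error is computed over the load conditions that were not used to anchor a model (one target for the pure resource and slots + resources models, six targets for slots + averaging). The σ values and function name in the example are hypothetical.

```python
import math


def rms_error(observed, predicted, anchor_load):
    """Root-mean-square difference between observed and predicted sigma,
    skipping the load condition that was used to generate the predictions."""
    diffs = [observed[n] - predicted[n] for n in observed if n != anchor_load]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


observed = {1: 10.0, 2: 14.5, 3: 17.0, 6: 25.0}   # hypothetical sigmas (degrees)
pure_resource = {n: observed[1] * math.sqrt(n) for n in observed}
print(round(rms_error(observed, pure_resource, anchor_load=1), 2))
```

Excluding the anchor condition matters: a model reproduces its own anchor exactly, so including it would artificially lower the RMS error.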

Figure 5
Experiment 2 data versus models. Left panel is the pure resource model, middle panel the slots + resources model, right panel the slots + averaging model. Data are plotted as circles and solid lines, model predictions as squares and dotted lines. Error ...


Directional precision for tracked targets decreases markedly as the load increases, disconfirming the pure slot model. Furthermore, we see no evidence of the plateau in resolution that would be expected from a hybrid model with a fixed set of slots. Instead, we found that σ, the inverse of resolution, increased according to the square-root law, in accordance with the predictions of the pure resource model.

Combined with recent evidence that attentional resources are dynamically redistributed during MOT (Iordanescu, et al., 2009), these properties suggest that performance limitations in MOT are not simply the product of a limited capacity system, but a true limited resource system (Logan, 2002). Instead of a fixed set of structures (e.g., visual indexes), the limit in MOT appears to be a fixed pool of resources that can be divided thinly among several targets or concentrated on a single target (Alvarez & Franconeri, 2007; Kazanovich & Borisyuk, 2006).

The relationship between direction information and tracking

One potential objection to our work is that by measuring direction information, we are in some sense not really studying tracking per se, which is traditionally associated with position rather than direction (Keane & Pylyshyn, 2006). We have two responses to this objection. The first is that while direction information is our primary dependent variable, the task demands did require observers to track the targets, as in more traditional MOT studies. In Experiment 1, of course, we directly measured standard tracking performance using the classification responses. In Experiment 2, however, since we only probed targets, observers had a strong incentive to track targets, especially given how poor direction reports were for distractors in Experiment 1.

More conceptually, we argue that the difference between multiple object tracking and a spatial memory task is precisely the fact that the objects move, thus requiring the observer to measure not just position but its derivative(s) over time. When we ask the observer whether a particular item is a target, we are asking whether the item at this position is connected by a trajectory to the original position. Any “ideal observer” performing an MOT task would use both position and direction (and speed) to solve the correspondence problem. Indeed, when motion information is reliable, it is used (Fencsik, Urrea, Place, Wolfe, & Horowitz, 2006). Thus, in tasks where objects move smoothly and predictably, as in our experiments, direction information is an integral part of the tracking task.

In this context, we should note that Ma and Huang (2009) have come to a similar conclusion about the trajectory deviation detection task of Tripathy et al. (Tripathy & Barrett, 2004; Tripathy, et al., 2007). As we noted, there are important differences between MOT and trajectory deviation detection. However, there is also sufficient overlap to suggest that this pattern of results may be a general property of tasks that are limited by trajectory information. Following Vul et al. (2009), we might speculate that as trajectory information becomes less reliable, a different pattern of results would emerge.

It is important to keep in mind that we do not know whether observers have immediate access to target directions. When Howard and Holcombe (2008) asked observers to report the orientation of tracked Gabors, the reported orientations tended to lag by up to 40 ms when tracking four targets (up from essentially zero lag when tracking one target). It would be difficult to repeat their analysis with our data, because we deliberately devised the motion algorithm such that objects changed direction infrequently. This means that an observer could correctly report the direction of a target (that is, report a direction that would be classified as coming from the “tracked” distribution) from memory without knowing the target’s current direction. We do not see this as a problem for our analysis, for three reasons. First, there is a critical difference between our experiment and the Howard and Holcombe (2008) paradigm. In their experiments, targets remained within virtual “cages” in separate regions of the display, so it was always unambiguous which item was being queried. In that design, an observer who is not tracking a particular target could still report something about it from memory. In our paradigm, however, the observer first has to infer which target is being queried in order to associate it with a particular memory. Thus, correct directions could be recalled only for successfully tracked targets. Second, position information lagged behind orientation information in Howard and Holcombe’s experiment, suggesting that observers might not have immediate access to target positions either. Finally, while there are critical differences between memory and tracking as tasks (see below), it seems clear that the mechanisms of tracking and memory are closely bound. In our view, tracking is a cycle of using memory to predict target locations, and updating memory with the currently perceived position and direction of targets.
On this view, tracking is memory + updating, so there is no real distinction between reporting the direction of a tracked target and reporting the “remembered” direction of a tracked object. Any discrepancy between remembered and actual properties of the target is simply informing us about the temporal properties of the updating process.
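To make the “memory + updating” view concrete, here is a minimal sketch of one predict/update cycle for a single target. It is an illustration under our own assumptions, not a model fitted in this paper: the `TargetMemory` record, the `track_step` function, and the fixed blending `gain` are hypothetical. With a gain below 1, the remembered direction approaches a new direction over several cycles, which is one way a lag between remembered and actual properties could arise.

```python
import math
from dataclasses import dataclass


def wrap(angle):
    """Wrap an angle in radians to the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


@dataclass
class TargetMemory:
    x: float
    y: float
    direction: float  # radians
    speed: float      # display units per second


def track_step(memory, observed_x, observed_y, observed_direction, dt, gain=0.5):
    """One hypothetical predict/update cycle; `gain` weights the current percept."""
    # Predict: extrapolate the remembered position along the remembered direction.
    px = memory.x + memory.speed * dt * math.cos(memory.direction)
    py = memory.y + memory.speed * dt * math.sin(memory.direction)
    # Update: move memory part of the way toward what is currently perceived.
    memory.x = px + gain * (observed_x - px)
    memory.y = py + gain * (observed_y - py)
    memory.direction = wrap(memory.direction + gain * wrap(observed_direction - memory.direction))
    return memory


# Hypothetical example: a target remembered as moving rightward (0 rad) bounces upward (pi/2).
memory = TargetMemory(x=0.0, y=0.0, direction=0.0, speed=0.0)
for cycle in range(1, 4):
    track_step(memory, 0.0, 0.0, math.pi / 2, dt=0.02)
    lag_deg = math.degrees(math.pi / 2 - memory.direction)
    print(f"cycle {cycle}: remaining direction lag = {lag_deg:.1f} deg")
```

With a gain of 0.5, each cycle halves the remaining discrepancy (45°, then 22.5°, then about 11°), so the size of the lag reflects the temporal properties of the updating process rather than a separate memory store.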

Although our experiment was not designed to extract the lag information available in the Howard and Holcombe study, we can look at the directional errors as a function of the time since the last change in direction (i.e., the last bounce off a wall). If observers tended to respond from memory, the proportion of responses close to the pre-bounce direction should decline over time. A crude analysis of our data indicated that this proportion was roughly constant over a 200 ms span following the last bounce, suggesting that this was not a factor in our study. However, an experiment designed to answer this question more definitively would be informative.
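A crude analysis of this kind might be organized as in the sketch below. It is hypothetical rather than the analysis we actually ran: the trial tuple layout, the 50 ms bin width, and the 30° tolerance used to call a response “close to the pre-bounce direction” are all assumptions.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two directions in degrees (0 to 180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def pre_bounce_proportions(trials, bin_ms=50, max_ms=200, tolerance_deg=30.0):
    """Proportion of responses near the pre-bounce direction, binned by time since the last bounce.

    `trials` is a list of hypothetical tuples:
    (ms_since_bounce, response_deg, pre_bounce_deg, post_bounce_deg).
    A response counts as near the pre-bounce direction if it lies within
    `tolerance_deg` of it and is closer to it than to the post-bounce direction.
    """
    counts = {}
    for ms, response, pre, post in trials:
        if not 0 <= ms < max_ms:
            continue  # only look at the window following the bounce
        bin_start = int(ms // bin_ms) * bin_ms
        d_pre = angular_distance(response, pre)
        near_pre = d_pre < tolerance_deg and d_pre < angular_distance(response, post)
        hits, total = counts.get(bin_start, (0, 0))
        counts[bin_start] = (hits + int(near_pre), total + 1)
    return {b: hits / total for b, (hits, total) in sorted(counts.items())}


# Hypothetical trials, not data from our experiments.
example = [(10, 5, 0, 90), (20, 85, 0, 90), (120, 2, 0, 90), (130, 350, 0, 90), (250, 0, 0, 90)]
print(pre_bounce_proportions(example))
```

Responding from memory would show up as high proportions in the earliest bins that fall off in later bins; a flat profile across bins corresponds to the “roughly constant” pattern described above.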

There is no reason to think that tracking is limited to just position and direction information. As we noted in the introduction, an ideal observer would use all of the available information to help predict target motion. This would include speed and direction, and possibly even acceleration and rate of direction change. This information could be extracted, for example, from neurons in the MT area that code direction and speed (Lorteije, van Wezel, & van der Smagt, 2008). However, there may be limits on the amount of information that can be stored, such that when more targets are tracked, less information about each target is available.

For example, a number of MOT studies have shown that observers can successfully recover targets that have disappeared for several hundred ms, even when the objects continue to move while invisible (Alvarez, Horowitz, Arsenio, Dimase, & Wolfe, 2005; Horowitz, Birnkrant, Fencsik, Tran, & Wolfe, 2006; Keane & Pylyshyn, 2006). Fencsik, Klieger, and Horowitz (2007) studied performance on this task with and without motion information. In the motion condition, observers tracked the objects as they moved for 2 s before disappearance, while in the no-motion condition, the objects were stationary for 2 s before disappearance. Performance was better in the motion condition, presumably because observers could better predict where the objects might end up after the blank interval. However, this advantage declined as the number of targets increased, indicating that only a limited amount of motion/direction information could be stored.

What is the limited resource?

What might be the nature of the resource limiting tracking performance? We can conceive of the resource as a fixed set of samples available for representing direction (as well as other spatiotemporal properties, such as position and speed). Selective attention is the act of assigning more samples to targets than distractors (Prinzmetal, 2005). One possibility is a pool of neurons, for example in the superior parietal lobule or intraparietal sulcus (Culham, et al., 1998; Howe, Horowitz, Morocz, Wolfe, & Livingstone, 2009; Imaruoka, Saiki, & Miyauchi, 2005; Jovicich, et al., 2001; Seiffert, 2003), which can be assigned to targets. As the load increases, the number of neurons assigned to each target decreases, reducing the precision of the representation. This conception is in agreement with the idea that multiple targets are tracked in parallel (Pylyshyn & Storm, 1988; Yantis, 1992). This approach is supported by recent data from our lab (Howe, Cohen, Pinto, & Horowitz, submitted) showing that tracking is easier when all objects move simultaneously rather than sequentially, as predicted by parallel, but not serial models (Eriksen & Spencer, 1969; Huang & Pashler, 2007; Shiffrin & Gardner, 1972).
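The sample-based view of the resource can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model fitted in this paper: a hypothetical pool of `total_samples` noisy direction samples (Gaussian noise, in radians) is split evenly among the targets, and each target’s direction estimate is the circular mean of its share. Because each target receives total_samples/n samples, the SD of its estimate should grow roughly as √n, which is the square-root law.

```python
import math
import random


def circular_sd(angles):
    """Circular SD, sqrt(-2 ln R), where R is the mean resultant length of the angles (radians)."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.sqrt(-2 * math.log(math.hypot(c, s)))


def simulate_sigma(n_targets, total_samples=120, sample_noise=0.8, trials=2000, seed=0):
    """Hypothetical sample-size model of the tracking resource.

    A fixed pool of `total_samples` noisy direction samples is split evenly across
    `n_targets`; each target's direction estimate is the circular mean of its share.
    Returns the circular SD (radians) of the estimation error for one target.
    """
    rng = random.Random(seed)
    per_target = total_samples // n_targets
    errors = []
    for _ in range(trials):
        # True direction is 0; each sample adds Gaussian noise (radians).
        samples = [rng.gauss(0.0, sample_noise) for _ in range(per_target)]
        errors.append(math.atan2(sum(map(math.sin, samples)), sum(map(math.cos, samples))))
    return circular_sd(errors)


sigma_1 = simulate_sigma(1)
for n in (2, 3, 6):
    ratio = simulate_sigma(n) / sigma_1
    print(f"{n} targets: sigma ratio = {ratio:.2f} (square-root law predicts {math.sqrt(n):.2f})")
```

The simulated ratios should come out close to √2, √3, and √6. A serial, time-sharing account could produce the same scaling if dwell time per target plays the role of the sample count, so this simulation does not by itself distinguish the two accounts.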

Alternatively, the limit might be temporal. D’Avossa, Shulman, Snyder, and Corbetta (2006) have proposed that MOT involves switching a single attentional focus among the targets. On this account, attention is devoted to each target for less time as the number of targets increases, again reducing the fidelity of the representation. Howard and Holcombe (2008) found evidence for a mixture of serial and parallel updating in their data, based on the correlations between multiple reports on the same trial. It may be that some attributes are updated in parallel and others in series. Oksama and Hyönä (2008) have proposed a model in which position tracking occurs in parallel but identities are bound to objects in series.

The idea that multiple foci might be sampled periodically has received support from a recent study of divided attention by VanRullen, Carlson, and Cavanagh (2007). According to their analysis, attention samples periodically even when only a single location is attended. This supports the conception of attention as an oscillatory process, as proposed by dynamic systems theories of attention (Large & Jones, 1999; Martin, et al., 2005). On this approach, the limit on multiple object tracking might lie in the natural rhythm of the attentional oscillation, rather than in a fixed number of structures. For example, as noted above, Kazanovich and Borisyuk (2006) have proposed that tracking is limited by the phase space available to the central oscillators used in tracking. If two independent oscillators are close in phase space, they will be more likely to synchronize with one another, leading to loss of tracking of at least one target.

The relationship between tracking and memory

Given the close connections between VSTM and MOT (Cavanagh & Alvarez, 2005; Fougnie & Marois, 2006), it is surprising that we obtained different results from Zhang and Luck (2008). Why should directional precision in MOT decrease indefinitely (or at least up to six items), while precision in color memory levels off? One possibility is that color and motion vector information differ in kind and are held in different representations. Alternatively, the differences might reflect not the kind of information but the processes used to accomplish the task. In a memory task, the observer is trying to keep a representation from changing (i.e., degrading) over some time interval, while in a tracking task, the observer is actively changing the representation to keep up with (and predict) the environment. These contrary task demands might call for different representational strategies; there might be different representations for different purposes.

The neuroanatomical counterpart of Zhang and Luck’s (2008) slots + resources hybrid can be found in a recent paper by Xu and Chun (2006), who showed that in a series of VSTM tasks, a slot-like limit of roughly four items could be observed in the activity of the posterior intraparietal sulcus (PIPS) as well as the lateral occipital complex, while continuously increasing activity was observed in the anterior intraparietal sulcus (AIPS). In an MOT task, Howe et al. (2009) found that while “tracking” stationary objects (i.e., attending to and remembering their positions) primarily involved only PIPS, tracking moving objects recruited enhanced activity from a larger network centered on AIPS, as well as MT+, the superior parietal lobule, and the frontal eye fields. Thus, one hypothesis is that AIPS is used for representations that require frequent updating, while PIPS is used for representations that need to be preserved. Under this account, color information in the Zhang and Luck task was stored in PIPS, whereas direction information in our experiments is stored in AIPS. A second hypothesis might be that MT+ represents motion information (e.g., Berman & Colby, 2002; Chawla, et al., 1999; Luks & Simpson, 2004), including direction information (Shulman, et al., 1999), and the lack of fixed capacity limits might arise there. In support of the idea that there are different representations for different types of information, Scholl, Pylyshyn, and Franconeri (2004) reported that change detection during MOT tasks was much better for spatiotemporal properties, such as position or speed, than for feature information, such as shape or color. However, from the representations-for-purposes perspective, we would point out that spatiotemporal properties were changing over time in these experiments, whereas featural properties were not. Howard and Holcombe’s (2008) paradigm might be more fruitful here.
In their experiments, position, orientation, and spatial period were varying over time. As previously noted, they found that precision for all three dimensions decreased with load. Applying the mixture distribution analysis to data from similar experiments should allow us to resolve this conundrum.
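For readers unfamiliar with the mixture distribution analysis, here is a minimal sketch in the spirit of Zhang and Luck (2008). It is not our analysis code. Each directional error is modeled as coming either from a von Mises distribution centered on the true direction (a tracked target, with probability Pt) or from a uniform distribution over the circle (a guess). Assumptions: the von Mises mean is fixed at zero (footnote 3 indicates μ was not a free parameter), the parameters are estimated by expectation-maximization, κ is updated with a widely used piecewise approximation to the inverse of I1(κ)/I0(κ), and κ is converted to a circular SD, one common convention for expressing σ. The function names and the simulated data are hypothetical.

```python
import math
import random


def bessel_i(nu, x):
    """Modified Bessel function of the first kind I_nu(x) for nu in {0, 1}, via its power series."""
    term = (x / 2) ** nu / math.factorial(nu)
    total = term
    k = 0
    while True:
        k += 1
        term *= (x / 2) ** 2 / (k * (k + nu))
        total += term
        if term <= 1e-16 * total:
            return total


def a1_inverse(r):
    """Approximate kappa such that I1(kappa) / I0(kappa) = r (piecewise approximation)."""
    if r < 0.53:
        return 2 * r + r ** 3 + 5 * r ** 5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    return 1 / (r ** 3 - 4 * r ** 2 + 3 * r)


def fit_mixture(errors, iterations=300, kappa_max=200.0):
    """EM fit of p(e) = pt * VonMises(e; 0, kappa) + (1 - pt) / (2 pi).

    `errors` are reported minus true directions in radians, wrapped to [-pi, pi].
    Returns (pt, kappa).
    """
    cos_e = [math.cos(e) for e in errors]
    pt, kappa = 0.5, 5.0
    for _ in range(iterations):
        norm = 2 * math.pi * bessel_i(0, kappa)
        guess = (1 - pt) / (2 * math.pi)
        # E-step: posterior probability that each response came from the tracked component.
        resp = [pt * math.exp(kappa * c) / norm for c in cos_e]
        resp = [t / (t + guess) for t in resp]
        # M-step: mixing weight, then kappa from the responsibility-weighted mean cosine.
        weight = sum(resp)
        pt = weight / len(errors)
        r_bar = sum(w * c for w, c in zip(resp, cos_e)) / weight
        kappa = min(a1_inverse(min(max(r_bar, 1e-6), 0.999)), kappa_max)
    return pt, kappa


def kappa_to_sd_deg(kappa):
    """Circular SD in degrees, sqrt(-2 ln(I1(kappa) / I0(kappa)))."""
    return math.degrees(math.sqrt(-2 * math.log(bessel_i(1, kappa) / bessel_i(0, kappa))))


# Simulated errors (not real data): 70% tracked with kappa = 10, 30% uniform guesses.
rng = random.Random(1)
errors = []
for _ in range(2000):
    e = rng.vonmisesvariate(0.0, 10.0) if rng.random() < 0.7 else rng.uniform(0.0, 2 * math.pi)
    errors.append(math.atan2(math.sin(e), math.cos(e)))  # wrap to [-pi, pi]

pt, kappa = fit_mixture(errors)
print(f"Pt = {pt:.2f}, kappa = {kappa:.1f}, sigma = {kappa_to_sd_deg(kappa):.1f} deg")
```

Separating Pt from σ in this way is what allows precision to be measured only for correctly tracked (or correctly remembered) items, independently of how many items are tracked.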


Increasing tracking load in a multiple object tracking task leads to decreased resolution for information about target direction. This disconfirms a simple fixed-slot architecture for tracking. Furthermore, resolution decreases in a continuous fashion according to the square-root law up to at least six targets. This suggests that some limited resource, which effectively corresponds to the number of samples, is being divided up flexibly among targets. There was no evidence that a fixed number of slots plays a role in tracking.


This research was funded by NIH grant MH 75576 to TSH. We would like to thank Piers Howe, Steve Luck, Lauri Oksama, and Jeremy Wolfe for useful suggestions, Weiwei Zhang for assistance with the mixture model analysis, and Christina Howard and Yakov Kazanovich for constructive critiques.


1. The two scenarios are not mutually exclusive. If there are short-term fluctuations in capacity during a trial, an observer might lose a target and then try to recover it using position, direction, or feature information (Horowitz, et al., 2007; Makovski & Jiang, 2009). Errors in this recovery process would lead to swaps.

2. This is true only insofar as the directional errors for the two reports are independent. If memory for the first report influences the second report, then there will be negligible gain from using two reports. The correlation between directional errors when tracking a single target was .49, and it increased with the number of targets. We suggest that this increase is due to the fact that as the number of targets increases, so does the proportion of trials on which the observer is merely guessing. On guessing trials, the responses are not drawn from a true representation of direction, so they are likely to be highly correlated. Thus, we infer that the true correlation between directional errors is best estimated by the correlation on the single-target trials. If this is true, then the two-response method was partially, but not completely, successful at improving the precision of our measurements.
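The size of the gain at stake can be estimated with a short calculation, assuming the two reports are combined by averaging and treating directional errors as approximately linear (reasonable when errors are small). For two reports with equal error SD σ and error correlation ρ, the variance of their mean is σ²(1 + ρ)/2. The function name below is hypothetical.

```python
import math


def sd_of_mean_of_two(sigma, rho):
    """SD of the average of two reports with equal error SD `sigma` and error correlation `rho`.

    Var((e1 + e2) / 2) = (sigma**2 + sigma**2 + 2 * rho * sigma**2) / 4 = sigma**2 * (1 + rho) / 2
    """
    return sigma * math.sqrt((1 + rho) / 2)


for rho in (0.0, 0.49, 1.0):
    print(f"rho = {rho:.2f}: SD of averaged report = {sd_of_mean_of_two(1.0, rho):.3f} x sigma")
```

Independent reports would shrink the SD by a factor of 1/√2 ≈ 0.71; with the single-target correlation of .49 the factor is only √0.745 ≈ 0.86, and perfectly correlated reports give no gain at all.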

3. Relaxing this constraint (i.e., adding μ as a free parameter) did not materially change the results.

4. The three-target condition was selected to match the Zhang and Luck analysis procedure. We also carried out the same analysis estimating capacity by multiplying Pt from the six-target condition by six, but this did not substantially change the results.

Contributor Information

Todd S. Horowitz, Brigham and Women’s Hospital, Harvard Medical School.

Michael A. Cohen, Harvard University.


  • Alvarez GA, Cavanagh P. The capacity of visual short-term memory is set both by visual information load and by number of objects. Psychological Science. 2004;15(2):106–111.
  • Alvarez GA, Franconeri SL. How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. Journal of Vision. 2007;7(13):14, 1–10.
  • Alvarez GA, Horowitz TS, Arsenio HC, Dimase JS, Wolfe JM. Do multielement visual tracking and visual search draw continuously on the same visual attention resources? Journal of Experimental Psychology: Human Perception and Performance. 2005;31(4):643–667.
  • Awh E, Barton B, Vogel EK. Visual working memory represents a fixed number of items regardless of complexity. Psychological Science. 2007;18(7):622–628.
  • Berman RA, Colby CL. Auditory and visual attention modulate motion processing in area MT+. Cognitive Brain Research. 2002;14(1):64–74.
  • Bonnel AM, Miller J. Attentional effects on concurrent psychophysical discriminations: investigations of a sample-size model. Perception & Psychophysics. 1994;55(2):162–179.
  • Cavanagh P, Alvarez GA. Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences. 2005;9(7):349–354.
  • Chawla D, Buechel C, Edwards R, Howseman A, Josephs O, Ashburner J, et al. Speed-dependent responses in V5: A replication study. Neuroimage. 1999;9(5):508–515.
  • Cowan N. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behavioral and Brain Sciences. 2001;24(1):87–114.
  • Culham JC, Brandt SA, Cavanagh P, Kanwisher NG, Dale AM, Tootell RB. Cortical fMRI activation produced by attentive tracking of moving targets. Journal of Neurophysiology. 1998;80(5):2657–2670.
  • d’Avossa G, Shulman GL, Snyder AZ, Corbetta M. Attentional selection of moving objects by a serial process. Vision Research. 2006;46(20):3403–3412.
  • Eng HY, Chen D, Jiang Y. Visual working memory for simple and complex visual stimuli. Psychonomic Bulletin & Review. 2005;12(6):1127–1133.
  • Eriksen CW, Spencer T. Rate of information processing in visual perception: some results and methodological considerations. Journal of Experimental Psychology. 1969;79(2):1–16.
  • Fencsik DE, Klieger SB, Horowitz TS. The role of location and motion information in the tracking and recovery of moving objects. Perception & Psychophysics. 2007;69(4):567–577.
  • Fencsik DE, Urrea J, Place SS, Wolfe JM, Horowitz TS. Velocity cues improve visual search and multiple object tracking. Visual Cognition. 2006;14(1):18–21.
  • Fougnie D, Marois R. Distinct capacity limits for attention and working memory. Psychological Science. 2006;17(6):526–534.
  • Franconeri SL, Lin JY, Pylyshyn ZW, Fisher B, Enns JT. Evidence against a speed limit in multiple-object tracking. Psychonomic Bulletin & Review. 2008;15(4):802–808.
  • Horowitz TS, Birnkrant RS, Fencsik DE, Tran L, Wolfe JM. How do we track invisible objects? Psychonomic Bulletin & Review. 2006;13(3):516–523.
  • Horowitz TS, Klieger SB, Fencsik DE, Yang KK, Alvarez GA, Wolfe JM. Tracking unique objects. Perception & Psychophysics. 2007;69(2):172–184.
  • Howard C, Holcombe AO. Tracking the changing features of multiple objects: Progressively poorer perceptual precision and progressively greater perceptual lag. Vision Research. 2008;48(9):1164–1180.
  • Howe PD, Cohen MA, Pinto YA, Horowitz TS. Distinguishing between parallel and serial accounts of multiple object tracking. Manuscript submitted for publication.
  • Howe PD, Horowitz TS, Morocz IA, Wolfe JM, Livingstone MS. Using fMRI to isolate components of the multiple object tracking task. Journal of Vision. 2009;9(4):1–11.
  • Huang L, Pashler H. A Boolean map theory of visual attention. Psychological Review. 2007;114(3):599–631.
  • Imaruoka T, Saiki J, Miyauchi S. Maintaining coherence of dynamic objects requires coordination of neural systems extended from anterior frontal to posterior parietal brain cortices. Neuroimage. 2005;26(1):277–284.
  • Iordanescu L, Grabowecky M, Suzuki S. Demand-based dynamic distribution of attention and monitoring of velocities during multiple-object tracking. Journal of Vision. 2009;9(4):1–1.
  • Jovicich J, Peters RJ, Koch C, Braun J, Chang L, Ernst T. Brain areas specific for attentional load in a motion-tracking task. Journal of Cognitive Neuroscience. 2001;13(8):1048–1058.
  • Kazanovich Y, Borisyuk R. An oscillatory neural model of multiple object tracking. Neural Computation. 2006;18:1413–1440.
  • Keane BP, Pylyshyn ZW. Is motion extrapolation employed in multiple object tracking? Tracking as a low-level, non-predictive function. Cognitive Psychology. 2006;52(4):346–368.
  • Large EW, Jones MR. The dynamics of attending: How people track time-varying events. Psychological Review. 1999;106(1):119–159.
  • Loftus GR, Masson MEJ. Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review. 1994;1(4):476–490.
  • Logan GD. Parallel and serial processing. In: Pashler H, Wixted J, editors. Stevens’ handbook of experimental psychology (3rd ed.), Vol. 4: Methodology in experimental psychology. Hoboken, NJ US: John Wiley & Sons Inc; 2002. pp. 271–300.
  • Lorteije JA, van Wezel RJ, van der Smagt MJ. Disentangling neural structures for processing of high- and low-speed visual motion. European Journal of Neuroscience. 2008;27(9):2341–2353.
  • Luck SJ, Vogel EK. The capacity of visual working memory for features and conjunctions. Nature. 1997;390(6657):279–281.
  • Luks TL, Simpson GV. Preparatory deployment of attention to motion activates higher-order motion-processing brain regions. Neuroimage. 2004;22(4):1515–1522.
  • Ma WJ, Huang W. No capacity limit in attentional tracking: Evidence for probabilistic inference under a resource constraint. Journal of Vision. 2009;9(11):1–30.
  • Makovski T, Jiang Y. Feature binding in attentive tracking of distinct objects. Visual Cognition. 2009;17(1–2):180–194.
  • Makovski T, Vazquez GA, Jiang YV. Visual learning in multiple-object tracking. PLoS ONE. 2008;3(5):e2228.
  • Martin T, Egly R, Houck JM, Bish JP, Barrera BD, Lee DC, et al. Chronometric evidence for entrained attention. Perception & Psychophysics. 2005;67(1):168–184.
  • Masson MEJ, Loftus GR. Using Confidence Intervals for Graphically Based Data Interpretation. Canadian Journal of Experimental Psychology. 2003;57(3):203–220.
  • Narasimhan S, Tripathy SP, Barrett BT. Loss of positional information when tracking multiple moving dots: the role of visual memory. Vision Research. 2009;49(1):10–27.
  • Ogawa H, Yagi A. Priming effects in multiple object tracking: An implicit encoding based on global spatiotemporal information [Abstract]. Journal of Vision. 2003;3(9):339a.
  • Oksama L, Hyönä J. Dynamic binding of identity and location information: A serial model of multiple identity tracking. Cognitive Psychology. 2008;56(4):237–283.
  • Palmer J. Attentional limits on the perception and memory of visual information. Journal of Experimental Psychology: Human Perception & Performance. 1990;16(2):332–350.
  • Prinzmetal W. Location perception: the X-Files parable. Perception & Psychophysics. 2005;67(1):48–71.
  • Pylyshyn ZW. The role of location indexes in spatial perception: a sketch of the FINST spatial-index model. Cognition. 1989;32(1):65–97.
  • Pylyshyn ZW, Storm RW. Tracking multiple independent targets: evidence for a parallel tracking mechanism. Spatial Vision. 1988;3(3):179–197.
  • Scholl BJ. What have we learned about attention from multiple object tracking (and vice versa)? In: Dedrick D, Trick L, editors. Computation, Cognition, and Pylyshyn. Cambridge, MA: MIT Press; 2009. pp. 49–78.
  • Scholl BJ, Pylyshyn ZW, Franconeri SL. The relationship between property-encoding and object-based attention: Evidence from multiple object … Manuscript submitted for publication; 2004.
  • Seiffert AE. Dissociating neural correlates of attentional tracking and attention to visual motion. Journal of Vision. 2003;3(9):868a.
  • Shen YJ, Makovski T, Jiang Y. Short-term visual memory for motion path [Abstract]. Journal of Vision. 2006;6(6):35a.
  • Shiffrin RM, Gardner GT. Visual processing capacity and attentional control. Journal of Experimental Psychology. 1972;93(1):72–82.
  • Shulman GL, Ollinger JM, Akbudak E, Conturo TE, Snyder AZ, Petersen SE, et al. Areas involved in encoding and applying directional expectations to moving objects. Journal of Neuroscience. 1999;19(21):9480–9496.
  • Tripathy SP, Barrett BT. Severe loss of positional information when detecting deviations in multiple trajectories. Journal of Vision. 2004;4(12):1020–1043.
  • Tripathy SP, Narasimhan S, Barrett BT. On the effective number of tracked trajectories in normal human vision. Journal of Vision. 2007;7(6):1–18.
  • VanRullen R, Carlson T, Cavanagh P. The blinking spotlight of attention. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(49):19204–19209.
  • Verghese P, McKee SP. Predicting future motion. Journal of Vision. 2002;2(5):413–423.
  • Verghese P, McKee SP, Grzywacz NM. Stimulus configuration determines the detectability of motion signals in noise. Journal of the Optical Society of America A, Optics and image science. 2000;17(9):1525–1534.
  • Verghese P, Watamaniuk SN, McKee SP, Grzywacz NM. Local motion detectors cannot account for the detectability of an extended trajectory in noise. Vision Research. 1999;39(1):19–30.
  • Viswanathan L, Mingolla E. Dynamics of attention in depth: evidence from multi-element tracking. Perception. 2002;31(12):1415–1437.
  • Vul E, Frank M, Tenenbaum J, Alvarez G. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. Advances in Neural Information Processing Systems. 2009;22.
  • Wilken P, Ma WJ. A detection theory account of change detection. Journal of Vision. 2004;4(12):1120–1135.
  • Wolfe JM, Place SS, Horowitz TS. Multiple Object Juggling: Changing what is tracked during extended multiple object tracking. Psychonomic Bulletin & Review. 2007;14(2):344–349.
  • Xu Y, Chun MM. Dissociable neural mechanisms supporting visual short-term memory for objects. Nature. 2006;440(7080):91–95.
  • Yantis S. Multielement visual tracking: Attention and perceptual organization. Cognitive Psychology. 1992;24:295–340.
  • Zhang W, Luck SJ. Discrete, fixed-resolution representations in visual working memory. Nature. 2008;453(7192):233–235.