Attention regulates the flood of sensory information into a manageable stream, and so understanding how attention is controlled is central to understanding cognition. Competing theories suggest visual search involves serial and/or parallel allocation of attention, but there is little direct, neural, evidence for either mechanism. Two monkeys were trained to covertly search an array for a target stimulus under visual search (endogenous) and pop-out (exogenous) conditions. Here we present neural evidence in the frontal eye fields (FEF) for serial, covert shifts of attention during search but not pop-out. Furthermore, attention shifts reflected in FEF spiking activity were correlated with 18–34 Hz oscillations in the local field potential, suggesting a ‘clocking’ signal. This provides direct neural evidence that primates can spontaneously adopt a serial search strategy and that these serial covert shifts of attention are directed by the FEF. It also suggests that neuron population oscillations may regulate the timing of cognitive processing.
Theories of attentional control posit both parallel and serial mechanisms (Duncan and Humphreys, 1989; Treisman and Gelade, 1980; Wolfe et al., 1989). Parallel mechanisms are believed to underlie the selection of salient stimuli and those having sought-after features (e.g., stimuli the same color as the target). This is possibly mediated through synchronous activity (Bichot et al., 2005). In contrast, serial mechanisms may underlie the focusing of an attentional “spotlight” onto a particular stimulus (Posner, 1980), as reflected throughout visual cortex (Busse et al., 2008; Desimone and Duncan, 1995; Maunsell and Treue, 2006; Pessoa et al., 2003; Reynolds and Chelazzi, 2004; Woodman and Luck, 1999). There has been interest in whether humans and animals spontaneously search a visual scene in a serial manner (like a moving spotlight) or in parallel. Many of the conclusions are drawn from human studies that used behavioral latencies (e.g., increased latency to find a target as the number of search stimuli increases; Duncan and Humphreys, 1989; Duncan et al., 1994; Treisman and Gelade, 1980; Wolfe et al., 1989). We sought direct evidence in neurophysiological activity in two monkeys trained to covertly search a visual array (Fig. 1a). Their training did not bias them toward a serial, parallel, or any other type of strategy; instead, we report the strategy they spontaneously adopted.
To contrast top-down attention, which is volitional and could include serial attentional shifts, with bottom-up attention, which is thought to be automatic and always parallel, we used two forms of a visual search paradigm: “search” (top-down) and “pop-out” (bottom-up). In pop-out, the distractors were identical and differed from the target along two dimensions (color and orientation), so the target’s salience automatically drew attention to it (Treisman and Gelade, 1980; Wolfe and Horowitz, 2004). During search, each distractor differed independently from the target in either color or orientation. Because the target matched some of the distractors in each dimension, it was not the most salient stimulus in the array and had to be sought using only its remembered appearance. As we were interested in covert search, monkeys were required to maintain central fixation until they found the target and then make a single saccade directly to it.
Human imaging and monkey neurophysiology both point to a central role of frontal and parietal cortex in directing attention. For example, these regions show increases in blood flow during both overt and covert shifts in attention (Corbetta and Shulman, 2002; Hopfinger et al., 2000; Taylor et al., 2008), and neurons in the FEF (Bichot and Schall, 1999), dorsolateral prefrontal cortex (dlPFC; Barcelo et al., 2000; Rossi et al., 2007), and posterior parietal cortex (PPC; Bisley and Goldberg, 2003; Ipata et al., 2006) respond preferentially to attended versus unattended stimuli. We previously found that frontal cortex (dlPFC and FEF) neurons registered top-down shifts of attention with a shorter latency than the parietal cortex (area LIP; Buschman and Miller, 2007). By contrast, automatic (bottom-up) shifts of attention to a salient stimulus showed the opposite latencies. This suggests that top-down attention signals flow from frontal to parietal cortex (and vice versa for bottom-up). This is supported by observations that stimulation of the FEF induces attention-like effects in visual cortex (Armstrong and Moore, 2007; Moore and Armstrong, 2003; Moore and Fallah, 2004; Ruff et al., 2008) as well as a recent fMRI study suggesting that FEF influences parietal cortex during visual search (Bressler et al., 2008). Thus, because the frontal cortex seems to lead top-down search, we focused our study on the FEF and the dlPFC.
During top-down search trials, saccadic reaction times (RTs) suggested that both monkeys often adopted a serial search strategy. Figure 1b shows a typical distribution of RTs from an experimental session. RTs were shortest when the target was in the lower-right position and became progressively longer when the target appeared in the lower-left, upper-left, and the upper-right positions. This suggests that the monkey often first covertly attended the stimulus in the lower right quadrant and then, on a substantial proportion of trials, covertly shifted its focal attention in a clockwise manner until the target was found.
A similar pattern was observed across all of the recording sessions. Both animals showed a tendency to start searching from a preferred target location: monkey S responded quickest to the bottom-right (8/10 recording sessions, p = 3.0 × 10⁻⁵, binomial distribution), while monkey W tended to begin from the upper right (8/15, p = 0.0042, binomial distribution). The tendency to search clockwise from that starting point was also significant for both monkeys. We used a cost-analysis to show that the observed pattern of RTs was closer to a clockwise pattern than any other possible search pattern (p < 0.05 for all comparisons by t-test, see Fig. 1c/d and supplementary data). Additionally, we fit several generalized linear models (GLMs) to the RTs, including unordered, clockwise (CW), and counterclockwise (CCW) models (as well as others, see supplementary data). For search trials the clockwise GLM provided a significantly better fit than the counterclockwise model (18/25 days, p = 0.0073) or any of the alternate models (p = 0.0021). In contrast, there was no significant trend towards an ordered pattern for pop-out trials: no pattern was significantly better than the others in the cost-analysis and the unordered GLM was the best fit. This is consistent with pop-out engaging parallel search.
To estimate the speed at which the animals could shift their attention during visual search we performed a psychophysical experiment in which we varied the number of objects in the visual array from 2 to 4. The behavioral RT to find the target during search increased by 22 ms for every item added to the visual array, suggesting that it took the animals approximately 44 ms to shift their attentional spotlight (see Fig. S1 and experimental methods for details). This estimate fits well with earlier results (Hikosaka et al., 1993) and matches our neural data (see below). Importantly, this method of estimating the time to shift attention does not assume a consistent search pattern or starting point, only that the animal performed a serial search. This is in contrast to a simple estimation of the time to shift attention directly from the raw reaction times observed during the recording sessions (Fig. 1 and Table S1), which would assume the animal always initiated their search from a single location and always searched in the same pattern around the visual field. For example, even if the animal searched in a consistently clockwise manner the behavioral effect would be reduced if they varied their starting position at all. For this reason, we estimated the time to shift attention from the behavioral cost of adding distractors to the search array, finding the time to shift attention to be 44 ms.
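The step from a 22 ms per-item slope to a 44 ms shift time can be illustrated with a short sketch. It assumes a self-terminating serial search in which, on average, only half of the added items are inspected before the target is found, so the per-shift time is twice the set-size slope; that reading is our assumption about the reasoning rather than a stated procedure. The mean RTs below are hypothetical values chosen only to reproduce a 22 ms slope, and the function names are illustrative.

```python
def fit_slope(set_sizes, mean_rts):
    """Ordinary least-squares slope of mean RT (ms) against set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx


def shift_time_from_slope(slope_ms_per_item):
    """Assumed self-terminating serial search: on average half of the
    added items are inspected, so one shift costs twice the slope."""
    return 2.0 * slope_ms_per_item


# Hypothetical mean RTs (ms) for arrays of 2, 3, and 4 items.
set_sizes = [2, 3, 4]
mean_rts = [200.0, 222.0, 244.0]

slope = fit_slope(set_sizes, mean_rts)
shift_time = shift_time_from_slope(slope)
print(f"slope = {slope:.1f} ms/item, estimated shift time = {shift_time:.1f} ms")
```

Because the estimate depends only on the slope, it is unaffected by where the search starts or in which order locations are visited, which is the property the paragraph above relies on.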
These behavioral results suggest that during the search (but not pop-out) task, both animals adopted a covert search strategy that tended to be (but was not necessarily always) serial and clockwise. The animals were not explicitly trained to perform a serial search, but rather spontaneously adopted this strategy. Next, we show that an independent analysis of neural activity supports the same conclusion.
We focused our analysis on FEF and dlPFC neurons involved in directing either attention or the eye to the target location by selecting neurons whose spiking activity reflected the target’s location before and around the saccade (activity from 350 ms before to 150 ms after the saccade, mutual information in independent 25 ms bins, p < 0.05 by randomization test; N = 60 during search, 54 during pop-out for FEF; 70 and 78 for dlPFC). A preferred target location (the one that elicited the most activity) was determined for each neuron by using its activity during the 75 ms after the saccade.
Like behavioral RTs, the activity of FEF neurons showed evidence for a serial pattern during visual search. In Figure 2a, average FEF activity is plotted as a function of time (x-axis) and target location relative to each neuron's preferred location (y-axis). When the target appeared in each neuron’s preferred location, there was a build-up of activity immediately before the saccade (Fig. 2a, top row), as expected.
The second row shows activity on trials during which the target appeared at the location clockwise from the neuron’s preferred location. If monkeys were shifting their attentional focus in a clockwise pattern then attention should be focused onto this neuron’s preferred location before being focused onto the target (which was at the next clockwise location). This was reflected in the earlier, transient, activation of FEF neurons during these trials (Fig. 2a, second row).
The third row (Fig. 2a) shows the clockwise search pattern one step further: when the target was two steps clockwise from each neuron’s preferred location, we observed even earlier activation of FEF neurons. It is important to note that, as variability in the temporal precision of activity adds with each shift of attention, the activation with two attentional shifts to the target (third row) was more dispersed in time. In addition, there were relatively few trials with three attentional shifts: because shifts in attention are cumulative, 3 shifts occurred on only ~50% of trials (i.e., all trials have at least one shift, and a trial with 3 shifts by definition also contains 2 shifts and 1 shift). This also explains the lack of neural signal relating to 4 shifts in attention: only 25% of trials would have 4 shifts, making it very difficult to detect.
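The ~50% and 25% figures follow from simple counting, sketched below. The sketch assumes a strictly serial, self-terminating search over four locations in which the target is equally likely to be found at each step of the search order; the function name is illustrative.

```python
from fractions import Fraction


def fraction_with_at_least(k, n_locations=4):
    """Fraction of trials that contain at least k attentional shifts,
    assuming the target is equally likely to be found at each of the
    n_locations steps of a serial, self-terminating search."""
    if not 1 <= k <= n_locations:
        raise ValueError("k must be between 1 and n_locations")
    return Fraction(n_locations - k + 1, n_locations)


for k in range(1, 5):
    print(k, "shift(s):", float(fraction_with_at_least(k)))
```

Under these assumptions every trial contains a first shift, three quarters contain a second, half contain a third, and a quarter contain a fourth, so the data available for detecting each additional shift shrink steadily.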
Further support for FEF activity reflecting a serial shifting spotlight of attention came from three additional analyses. First, we found the same serial clockwise effect using a vector analysis, suggesting that the clockwise shift in neural activity was seen for the majority of FEF neurons (see Fig. S2). The activity of each neuron at a given point in time was used to construct a vector in visual space that pointed towards the location currently reflected by its activity. The distribution of the direction of these vectors is shown in Fig. S2 and shows a similar pattern to that observed in Fig. 2a. Whereas the average of population activity in Fig. 2 was weighted by the strength of the most activated neurons, all neurons contribute equally to the average in Fig. S2, indicating that the majority of neurons carried activity reflecting the shifting spotlight of attention.
Second, we found evidence for a shifting clockwise attentional spotlight within single trials by comparing pairs of simultaneously recorded FEF neurons whose preferred directions were offset by one clockwise position (N = 47). A shuffle-corrected cross-correlation revealed a significant positive correlation at a 40 ms clockwise offset (p = 0.012, two-tailed t-test against no correlation, see Fig. S4). This means that an FEF neuron whose preferred location was one step “downstream” (clockwise) tended to be activated 40 ms after an FEF neuron “upstream”. This fits well with our, and others’ (Hikosaka et al., 1993), behavioral estimates of the time to shift attention as well as FEF population activity (Fig. 2, top row and second row).
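For readers unfamiliar with the technique, the following is a minimal sketch of one common form of shuffle-corrected cross-correlation (the "shift predictor"), not necessarily the exact procedure used here. Spike trains are binned counts per trial; the correlogram computed between trial i of neuron A and trial i+1 of neuron B is subtracted from the same-trial correlogram to remove correlations that are merely locked to task events. The bin width, the toy spike trains, and all function names are hypothetical.

```python
def cross_correlogram(a, b, max_lag):
    """Sum over bins of a[t] * b[t + lag]; positive lag means b follows a."""
    n = len(a)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                total += a[t] * b[u]
        result[lag] = total
    return result


def shuffle_corrected(trials_a, trials_b, max_lag):
    """Mean same-trial correlogram minus the mean shift predictor
    (trial i of A paired with trial i + 1 of B)."""
    n_trials = len(trials_a)
    lags = range(-max_lag, max_lag + 1)
    raw = {lag: 0.0 for lag in lags}
    predictor = {lag: 0.0 for lag in lags}
    for i in range(n_trials):
        same = cross_correlogram(trials_a[i], trials_b[i], max_lag)
        other = cross_correlogram(trials_a[i], trials_b[(i + 1) % n_trials], max_lag)
        for lag in lags:
            raw[lag] += same[lag] / n_trials
            predictor[lag] += other[lag] / n_trials
    return {lag: raw[lag] - predictor[lag] for lag in lags}


def one_spike(n_bins, t):
    """Toy spike train with a single spike in bin t."""
    return [1 if i == t else 0 for i in range(n_bins)]


# Toy data: the "downstream" neuron B fires two bins after the "upstream"
# neuron A on every trial, at a time that varies from trial to trial.
trials_a = [one_spike(10, 1), one_spike(10, 4), one_spike(10, 6)]
trials_b = [one_spike(10, 3), one_spike(10, 6), one_spike(10, 8)]

corrected = shuffle_corrected(trials_a, trials_b, max_lag=3)
peak_lag = max(corrected, key=corrected.get)
print("peak at lag", peak_lag, "bins")
```

With a hypothetical 20 ms bin width, a peak at a lag of two bins would correspond to the downstream neuron firing about 40 ms after the upstream one.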
Finally, there is no structure observable in the eye position over the trial (Fig. S6), eliminating any concern that the observed pattern of neural activity is due to FEF activity reflecting subtle eye movements.
Pop-out is thought to be parallel and, indeed, no such serial pattern of FEF activity was observed on pop-out trials. Instead, neurons were selectively activated when the target was at their preferred location (Fig. 2b), as expected from a parallel mechanism. Relative to search, FEF activity build-up began earlier in pop-out (about 150 ms before the saccade, Fig. 2b), which is consistent with our prior result (Buschman and Miller, 2007).
No ordering effect was observed in the activity of dlPFC neurons. Instead, dlPFC neurons were only strongly activated by the target appearing in their preferred location (Fig. 2c/d). This lies in contrast to what was observed in FEF. However, this disparity is not due to differences in responsiveness, selectivity for target location, or our ability to decode pre-saccadic activity (see Fig. S3).
Based on these results, it seems that the FEF was more involved in the shifting of attention to search for the target whereas the dlPFC was more involved in identifying the target once it was selected. In other words, it seemed that the FEF led the search while the dlPFC followed. To determine the exact temporal order of activation we performed a shuffle-corrected cross-correlation analysis between FEF and dlPFC neurons with overlapping preferred locations. This revealed a significant positive correlation with FEF spikes preceding dlPFC spikes by 25 ms (p = 0.028, two-tailed t-test against no correlation, see Fig. S5). This is suggestive of the FEF driving the dlPFC during visual search.
Consistent with an interaction between these areas, phase-locking in the 18–34 Hz, “middle”, band of the local field potential was observed between selective FEF and dlPFC electrodes. It peaked in the 40 ms before the saccade, when both areas reflected the target’s location (phase-locking determined by circular correlation of instantaneous phase, ρ = 0.197, search > pop-out, p = 8.3 × 10⁻⁴ by sign test).
We previously reported an increase in synchronous “middle band” oscillations in frontal cortex local field potentials (LFPs) during visual search (Buschman and Miller, 2007). We noted that this frequency band, 18 to 34 Hz, overlaps our behavioral and neurophysiological estimates of the time to shift attention (~40 ms corresponds to 25 Hz). This raised the possibility that shifts in attention were correlated with LFP oscillations. We tested this from a decoding perspective. The analyses above (like many neurophysiological studies) compare spiking activity over static time windows relative to behavioral or task events. However, as illustrated in Figure 3a, LFP oscillations are often not strictly phase-locked to external events or actions. They can vary from trial to trial: there can be shifts in phase (purple traces) and/or wavelength (green traces) relative to, for example, the saccade. So, if the shifts of attention were correlated with LFP oscillations, we should be able to improve our estimate of the locus of attention by analyzing spikes over a dynamic window that reflects the changing LFP instead of a static temporal window locked to an external task event.
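The correspondence between the ~40 ms shift time and the 18–34 Hz band is a period-to-frequency conversion, which the following small sketch makes explicit. The helper names are illustrative.

```python
def period_ms_to_hz(period_ms):
    """Frequency (Hz) of an oscillation with the given period in ms."""
    return 1000.0 / period_ms


def hz_to_period_ms(freq_hz):
    """Period (ms) of an oscillation with the given frequency in Hz."""
    return 1000.0 / freq_hz


BAND_HZ = (18.0, 34.0)  # the "middle" band reported in the text

shift_freq = period_ms_to_hz(40.0)
in_band = BAND_HZ[0] <= shift_freq <= BAND_HZ[1]
band_periods_ms = (hz_to_period_ms(BAND_HZ[1]), hz_to_period_ms(BAND_HZ[0]))

print(f"40 ms per shift corresponds to {shift_freq:.0f} Hz (inside band: {in_band})")
print(f"band periods: {band_periods_ms[0]:.1f} to {band_periods_ms[1]:.1f} ms")
```

One cycle of an oscillation in this band lasts roughly 29 to 56 ms, which brackets the behavioral and neural estimates of the time needed for a single shift of attention.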
As shown above, FEF neurons reflected the allocation of attention to the target location just before the saccade, and to the adjacent, counterclockwise, location just before that. We focused our analysis on decoding the shift of attention between these locations because, as noted above, there were fewer trials with a greater (3 or 4) number of shifts of attention. To capture these attentional shifts in the spiking activity, we defined two analysis windows based on either standard, static, time windows or LFP cycles. To be as conservative as possible we tested a wide range of time windows and found the best (i.e., the strongest effects of shifting attention in neural activity) was with two windows equally dividing a span from 70 ms to 5 ms before the saccade (Fig. 3a). The “late” window was just before the saccade (when attention was at the target location) and “early” was the time window before that (when attention was at the location counterclockwise to the target). We compared this to two analysis windows based on LFPs. In this case, we defined the late window as the LFP cycle just before the saccade and the early window as the LFP cycle immediately before that (see Fig. 3a). Both windows range from 120° before to 240° after their respective peaks, encompassing the falling phase of the oscillation, where spike rates are typically highest (we found our FEF neurons were most active at 0.9π, or about 160 degrees, after the peak). The key question was: which analysis windows, LFP or time, better captured the shift of attention from the counterclockwise location to the target location?
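The LFP-based windows can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: phase is converted to time linearly using a single cycle period, the peak times and spike times are hypothetical, and the function names are ours.

```python
def phase_window(peak_ms, period_ms, before_deg=120.0, after_deg=240.0):
    """Time window spanning before_deg before to after_deg after an LFP
    peak, assuming phase advances linearly over one cycle of period_ms."""
    start = peak_ms - period_ms * before_deg / 360.0
    end = peak_ms + period_ms * after_deg / 360.0
    return start, end


def count_spikes(spike_times_ms, window):
    """Number of spikes falling in the half-open window [start, end)."""
    start, end = window
    return sum(1 for t in spike_times_ms if start <= t < end)


# Hypothetical trial: times are in ms relative to the saccade (negative =
# before). The last two LFP peaks fall at -70 and -30 ms (40 ms period).
period_ms = 40.0
early_window = phase_window(-70.0, period_ms)
late_window = phase_window(-30.0, period_ms)

spikes_ms = [-80.0, -60.0, -50.0, -40.0, -20.0, -10.0, -5.0]
early_count = count_spikes(spikes_ms, early_window)
late_count = count_spikes(spikes_ms, late_window)
print("early:", early_count, "late:", late_count)
```

Because 120° plus 240° spans a full cycle, consecutive windows tile the trial without gaps or overlap when the period is constant, and they stretch or shift with the LFP on each trial, unlike the static 70 ms to 5 ms time windows.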
Figure 3b shows results from the time and LFP windows from decoding the locus of attention for one example neuron. The solid lines show the neuron’s activity when the target was at the preferred location and the dashed lines when the neuron’s preferred location was one step upstream (counterclockwise) from the target location. Thus, the higher dashed line in the early window reflects attention allocated to the position CCW to the target, and the higher solid line in the late window reflects attention allocated to the target location. Note that, for this neuron, the modulation by attention (the difference between the solid and dashed lines) is greater when spikes are summed over the LFP-based windows compared to time-based windows.
In fact, better decoding with LFP windows was found across the FEF population. To directly compare the dynamic windows based on LFP cycles to the static windows linked to the saccade, we generated a goodness-of-fit statistic. Each neuron’s activity was compared to an “ideal” neuron that perfectly reflected a clockwise shift of attention (see experimental procedures for detailed description). This revealed superior performance of LFP over time windows: on average there was a significantly greater reduction in error (from the ideal neuron) when using the LFP (p = 0.036, signed rank test) and the number of neurons showing a reduction in error with LFPs was also significantly larger (N = 34 out of 55, p = 0.0054, randomization test, Fig. 4a). As noted above, to ensure that this effect was not due to the particular time windows we chose, we tested a variety of offsets and durations and found that the LFP model outperformed all of them (see Fig. S9). Therefore, on average, the locus of attention can be better decoded from neural activity using an LFP-based rather than time-based analysis window.
Figure 4b shows that the LFP oscillation cycle captures the shift of attention in the average FEF population activity. Plotted is the average FEF population activity binned over LFP phase for trials in which the target was at each neuron’s preferred location (blue line) versus trials in which the neuron’s preferred location was CCW to the target (green line). When the preferred location was CCW to the target (green line), average activity peaks in the middle of the early LFP cycle and weakens near its end. Then, when the next (late) LFP cycle begins, there is a rise of the average activity on trials in which the preferred location was at the target. The allocation of attention to a location is well-contained within an LFP cycle, suggesting that the shift of attention from one location to the next occurs at the transition between the two cycles. A direct comparison of the LFP and time models can be seen in Fig. S7.
To ensure that our observed effects were due to the specific frequency band of interest and were not the result of our analysis technique, we compared the time-based model of decoding the spotlight of attention to an LFP-based model using the “next” higher frequency band, 35 to 65 Hz. As this band is double the frequency of the 18–34 Hz band, we summed across two LFP cycles in order to match the LFP window to our observed time of 40 ms to shift the spotlight of attention (i.e., for the first two cycles before the saccade attention was taken to be at the target, and for the two cycles before that attention was taken to be counterclockwise to the target). Unlike our results with the 18–34 Hz band, using LFPs from this control frequency band did not yield significantly better decoding of the attention shift than the time model. When compared to the best time model, the best LFP model using the new frequency band explained 7% less variance across the population. Individual cells also failed to show a significant effect: there was an even split in neurons that had lower error for each model (27 for LFP, 28 for time, p = 0.32 by randomization test), and the average difference in error between the two models was not significantly different from zero (p = 0.80, by signed rank test). This suggests that the observed correlation between the shifting spotlight of attention and the local field oscillations is specific to the ‘middle’ frequency band and not an artifact of our analysis.
So far our analyses have shown that shifts in attention were correlated with beta-band LFP oscillations. In addition, as can be seen in Fig. 3, these oscillations can change their frequency from trial to trial. If both of these statements are true, then one would expect a relationship between the frequency of the LFP oscillation and the behavioral RT. In other words, a slower clock (lower frequency) on a given trial should result in slower attentional shifts and thus longer behavioral RTs, while a faster clock should speed up search and produce a faster RT. This is what we found: there was a significant correlation between the frequency of LFP oscillation and how fast the animal found the target on a particular trial (Fig. 5). We determined the frequency of oscillation for every trial by inverting the average peak-to-peak distance across both cycles of interest. The resulting frequencies were divided across trials into 20 equal bins, based on their rank ordering, and the average reaction time was determined for each bin (Fig. 5). There was a significant correlation between the exact frequency of the LFP on a given trial and the resulting reaction time (ρ = −0.67, p = 1.6 × 10⁻³). This suggests that as the frequency of the population oscillation increased, the reaction time to find the target decreased. In other words, speeding up the clock allows a faster shifting of the spotlight, helping, on average, to find the target faster.
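The frequency-versus-RT analysis can be sketched as below. The per-trial frequency is the inverse of the average peak-to-peak distance, trials are split into 20 rank-ordered bins, and mean RT is correlated with mean frequency across bins. The text does not say which correlation coefficient ρ denotes, so the use of Pearson's r here is an assumption, and the trial data are synthetic and built to have a negative relationship; they are not the recorded data.

```python
def trial_frequency_hz(peak_times_ms):
    """Invert the average peak-to-peak distance (ms) to get Hz."""
    gaps = [later - earlier
            for earlier, later in zip(peak_times_ms, peak_times_ms[1:])]
    return 1000.0 / (sum(gaps) / len(gaps))


def rank_bin_means(freqs, rts, n_bins=20):
    """Sort trials by frequency, split into n_bins near-equal bins, and
    return (mean frequency, mean RT) for each bin."""
    order = sorted(range(len(freqs)), key=lambda i: freqs[i])
    n = len(order)
    binned = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        mean_f = sum(freqs[i] for i in idx) / len(idx)
        mean_rt = sum(rts[i] for i in idx) / len(idx)
        binned.append((mean_f, mean_rt))
    return binned


def pearson_r(xs, ys):
    """Pearson correlation coefficient (an assumed choice of statistic)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Three peaks spaced 40 ms apart give a 25 Hz estimate for that trial.
example_freq = trial_frequency_hz([-90.0, -50.0, -10.0])

# Synthetic trials: frequencies spread over the band, RT falling with
# frequency plus a small deterministic jitter.
freqs = [18.0 + 0.16 * i for i in range(100)]
rts = [300.0 - 2.0 * f + ((i * 7) % 5 - 2) for i, f in enumerate(freqs)]

bins = rank_bin_means(freqs, rts, n_bins=20)
r = pearson_r([f for f, _ in bins], [rt for _, rt in bins])
print(f"example frequency = {example_freq:.1f} Hz, r across bins = {r:.2f}")
```

Averaging within rank-ordered bins reduces trial-to-trial noise in RT before the correlation is computed, at the cost of reducing the number of points to the number of bins.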
We present both behavioral and neurophysiological evidence that primates can spontaneously adopt a serial, covert, visual search strategy. This does not mean that covert visual search will always be serial. Psychophysical and neurophysiological studies suggest a mixture of both parallel and serial mechanisms and have shown that different tasks engage them to differing degrees (Duncan and Humphreys, 1989; Thornton and Gilden, 2007; Wolfe et al., 1989; Wolfe and Horowitz, 2004). Our task may have promoted a serial strategy because, while the target stimulus identity varied, the potential target locations were consistent from day to day. Thus, the monkeys may have adopted a strategy of focusing attention on each location individually to take advantage of this consistency. In general, while some cases of visual search might rely almost entirely on serial or parallel mechanisms, the majority of tasks likely fall in between, a mixture of both (Bichot et al., 2005). In any case, the question of whether visual search has serial or parallel mechanisms has long been discussed (Duncan and Humphreys, 1989; Duncan et al., 1994; Treisman and Gelade, 1980; Wolfe et al., 1989), and we believe our study provides some of the first direct, neural, evidence for a spontaneous serial visual search.
FEF appears to play a role in both saccade programming and attention. In our current task both are required, as the animal must shift its attention covertly before finding the target and making an overt eye movement. This raises the question of whether similar results would have been observed in a task that did not explicitly require eye movements. One might expect so: there is a well-established link between shifts in attention and saccade planning (Inhoff et al., 1989; Peterson et al., 2004; Rizzolatti et al., 1987), and FEF neurons are known to show correlates of attentional shifts during visual search, even when no eye movements are required (Thompson et al., 1997). Regardless, a direct, simultaneous, comparison between tasks would likely yield new insight into FEF function.
Our results contrast our two frontal regions, FEF and dlPFC. FEF reflected covert shifts of attention to locations that did not contain the target, while the dlPFC instead reflected the location of the target and not the shifts of attention that preceded its selection. This suggests that the FEF, a brain area known to be centrally involved in volitional eye movements, may also play a leading role in volitional shifts of attention (Rizzolatti et al., 1987). This is consistent with observations that FEF neurons reflect the visual target locations even in the absence of a saccade (Thompson et al., 1997) and that sub-threshold stimulation of FEF induces ‘attention-like’ effects in posterior cortex (Armstrong and Moore, 2007; Moore and Armstrong, 2003; Moore and Fallah, 2004).
In contrast, the dlPFC may play more of a role in comparing the currently attended stimulus to a target stimulus held in short-term memory. PFC neurons have neural correlates of active short-term memory and matching (Funahashi et al., 1989; Fuster, 2008; Fuster and Alexander, 1971; Miller et al., 1996). In humans, the lateral PFC is critical for top-down modulation of sensory cortex during target detection (Barcelo et al., 2000) and reflects target probability (Casey et al., 2001). In monkeys, microstimulation of the lateral PFC biases target selection (Opris et al., 2005) and a high proportion of its neurons reflect whether a stimulus is a target now, at other times, or never (Kusunoki et al., 2009). Taken with our results, this suggests that the dlPFC may play a greater role in directing behavior to targets than in shifts of attention in general. We should note that although we did not find activity in dlPFC reflecting attentional shifts to non-target stimuli, our sampling was limited to the region just anterior to FEF (which seemed most likely to carry such signals). Finally, the dlPFC is also likely to play a role in the maintenance of the current ‘rule’ and thus in establishing the animal’s search strategy (Miller and Cohen, 2001).
We found that oscillations in FEF population activity may be used to regulate the covert shift of attention. This may explain psychophysical observations of a periodic allotment of attention (Cavanagh et al., 2007). Synchronous oscillations of neural activity have been shown to be correlated with performance in a wide variety of tasks. Recent work in visual attention has highlighted the role oscillations may play in spatial attention, either by boosting stimulus representations through synchrony (Engel et al., 2001; Fries et al., 2001; Fries et al., 2008; Landau et al., 2007; Womelsdorf and Fries, 2007) or dynamically establishing communication between areas (Engel et al., 2001; Pesaran et al., 2008; Siegel et al., 2008; Womelsdorf and Fries, 2007).
LFP oscillations associated with the shifting of the spotlight of attention could be extrinsically or intrinsically generated. On one hand, the LFP signal could reflect a separate population of oscillatory neurons whose purpose is to regulate the timing of neural processing. On the other hand, the oscillatory LFP may be intrinsically generated by the process of serially attending to different locations in a rhythmic manner. For example, the network of neurons may shift the spotlight of attention by inhibiting the currently attended location and then exciting the next, producing an oscillatory wave as this process repeats. Future experiments will help to differentiate between these two mechanisms. Regardless of the mechanism, we find that shifts of attention are correlated with oscillations in the network and that these oscillations seem to occur in a specific frequency band.
Indeed, one role for neural oscillations may be to aid complex, multi-step computations. An oscillating wave of inhibition would allow for computations to be temporally constrained on a timescale that makes sense for learning. Additionally, it would ensure that all the involved neurons are simultaneously activated, allowing information to be released in a “packet” at a time when a downstream area is ready to receive it (i.e., when they are both in the depolarizing phase of the oscillation). Conversely, it may impede areas from communicating when they are out of phase, thus sculpting and targeting the flow of neural activity. One prediction of this model is that the frequency of the associated oscillatory activity would vary with the nature of the computation. Highly localized computations may be able to oscillate at higher frequencies while more complex, integrative, computations occur with a slower oscillation.
For example, consider comparisons between attention tasks requiring different behavioral responses (e.g., overt vs. covert attention). One might expect that the increased ‘overhead’ of moving the eyes with each attentional shift during overt search would result in a slower time constant, and thus a lower frequency oscillation. In contrast, a purely covert search task without eye movements might result in faster shifts of attention and thus locking to a higher frequency oscillation. Even more localized computations (such as working memory) might lock to even higher frequencies. Indeed, computations might use the closest inherent ‘eigen-frequency’ or resonance of the cortical network to support their representation.
Another consequence of oscillations playing a role in cognition would be the ‘discretizing’ of events. For example, attention appears to be allocated in discrete chunks of time dependent on oscillatory activity and not as a continuous function that smoothly shifts from location to location at will. Indeed, this discretizing of computation has some history (VanRullen and Koch, 2003), including psychophysical support suggesting shifts in attention play a role in discretizing perception (Kline et al., 2004; VanRullen et al., 2005).
Finally, we do not want to discard the role of time in neural processing. It is clear that in many cases oscillatory activity does not play a clear role in the computations occurring in the brain. However, our results do provide evidence that oscillations may help to time cognitive functions by parsing complex, multi-step, operations into manageable discrete computations. The brain is likely to have mechanisms to time processing throughout its widespread networks – synchronous, oscillating, activity may provide such a “clocking signal”.
Two monkeys were trained to perform a visual search task as outlined in Figure 1. The trial was initiated when the animal fixated a point at the center of the screen. Fixation was required within 1.6 degrees of visual angle of the fixation point. After a short fixation period (500 ms), the animal was presented with a sample, colored, oriented bar for 1000 ms, centered on fixation. The sample stimulus was removed and the monkey then maintained central fixation over a 500 ms memory delay, which ended with the presentation of a visual search array. The array elements were identical in size and shape to the sample and appeared four degrees from fixation. One of the array items matched the sample in both color and orientation (the target). Monkeys needed to make a direct, linear, saccade from central fixation to the target and hold their gaze at the target for 150 ms to receive an apple juice reward. Any deviations from the correct saccade path, including saccades to non-target stimuli, were recorded as errors and not rewarded. This ensured that the search process was covert.
The number of search array items was held at 4 during recording and the items always appeared at positions 45, 135, 225, and 315 degrees from the vertical meridian (see Fig. 1). This was true during both tasks and was the only thing held constant across days. In pop-out, the non-targets (distractors) were all identical: each differed from the target by 90 degrees and was the opposite color to the target. This caused the target stimulus to be the most salient object in the scene and therefore to “grab” attention. In visual search, distractors differed independently from the target by either color or orientation. Under these conditions the target stimulus is not the most salient and therefore top-down, endogenous direction of attention was needed. The difference in color and orientation between the target stimulus and the distractors was the same as the difference between target stimuli on different trials. This allowed a target stimulus on one trial to be a distractor stimulus on the next.
The search and pop-out tasks were interleaved in blocks of approximately 35 trials each. The animals performed a minimum of 720 correct trials during recording sessions, ensuring at least 10 trials for each of the 9 possible targets (3 colors by 3 orientations) at each location and for each task. Data are presented from 25 recording sessions (10 from monkey S, 15 from monkey W).
Two male rhesus monkeys, weighing approximately 6 kg each, were used for all training and electrophysiological recordings. All procedures followed MIT Committee on Animal Care and NIH guidelines.
The recording well was placed at approximately 23 mm AP from the interaural plane. Microstimulation was used to demarcate the frontal eye fields from dorsolateral prefrontal cortex (see below for details; Bruce and Goldberg, 1985). Up to 25 electrodes were simultaneously, and acutely, inserted into the frontal cortex. A total of 515 neurons were recorded across the two anatomical regions (272 neurons from dorsolateral prefrontal cortex, dlPFC, and 243 neurons from the frontal eye fields, FEF). Acute recording allowed us to sample different sets of sites in each recording session.
We analyzed neurons for which we had recorded activity on a minimum of 60 trials for each target location. This yielded 248 dlPFC neurons and 225 FEF neurons during the pop-out task and 251 dlPFC neurons and 225 FEF neurons during the search task. Similar results were obtained for each animal alone, so they are combined for presentation. We focused on neurons that carried significant information about the target location at some point during the trial (FEF: N = 60 for search, N = 54 for pop-out; dlPFC: N = 70 for search, N = 78 for pop-out). Selectivity was measured with a mutual information analysis in a sliding window manner across independent 25 ms time bins (Buschman and Miller, 2007). Significance was determined with a randomization test. The criterion was significant (p < 0.05) information about the target location for two consecutive bins (which corrects for multiple comparisons made across time).
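The two-consecutive-bin criterion can be illustrated with a minimal sketch. This is not the authors' code: the function name `first_significant_run` and the example p-values are hypothetical, and the p-values are assumed to come from a randomization test run separately in each 25 ms bin.

```python
from typing import Optional, Sequence


def first_significant_run(p_values: Sequence[float],
                          alpha: float = 0.05,
                          run_length: int = 2) -> Optional[int]:
    """Return the index of the first bin that starts `run_length`
    consecutive bins with p < alpha, or None if there is no such run."""
    streak = 0
    for i, p in enumerate(p_values):
        # Extend the streak on a significant bin, reset it otherwise.
        streak = streak + 1 if p < alpha else 0
        if streak == run_length:
            return i - run_length + 1
    return None


# Hypothetical per-bin p-values (one value per 25 ms bin).
example_p = [0.40, 0.03, 0.20, 0.04, 0.01, 0.30]
selective_from = first_significant_run(example_p)  # bin index 3
```

An isolated significant bin (index 1 in the example) does not qualify a neuron on its own; only the run starting at index 3 does.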
Due to the large number of simultaneously recorded neurons there was no optimization of the stimulus parameters for recording. Likewise, neurons were not pre-selected for responsiveness. Rather, we randomly selected neurons for recording, ensuring a sampling of neuron properties and a more complete view of cortical function.
Microstimulation was used to demarcate the frontal eye fields from dorsolateral prefrontal cortex. Stimulation was delivered as a 200 ms train of bi-phasic pulses with a width of 400 µs and an inter-pulse frequency of 330 Hz using the same electrodes used for recording. Current level was started at 150 µA and reduced to find the threshold at which an eye movement vector was elicited 50% of the time. Only sites that had thresholds of stimulation amplitudes less than 50 µA were classified as belonging to the frontal eye fields (Bruce and Goldberg, 1985). Anterior sites were classified as belonging to the dlPFC. In general, stimulation at dlPFC sites did not elicit eye movements even at the highest current amplitude tested (150 µA).
Before recording we performed psychophysical tests to determine the rate at which shifts of attention occurred. This was done by determining the cost of adding a distractor to the search array during visual search. The number of objects in the array was varied from 2 to 4 and, as seen in Fig. S1, we found that the cost was 22 ms per item.
Using the psychophysically estimated cost of adding a distractor to the search array it is possible to estimate the time to shift the attentional spotlight from one stimulus to another. We model the time to search the visual array as coming from two sources: one fixed and one variable. The fixed component of the reaction time is due to a variety of events that occur with every trial regardless of task condition. These include, but are not limited to, visual perception, comparing the attended stimulus to the remembered one, and the initiation of eye movement. The variable reaction time component is associated with the cost of actively finding the target – as the target is located randomly in the array the animal has a fixed chance of finding the target on each subsequently attended location. If we assume a strong inhibition of return, then we can directly model the observed reaction time as a combination of fixed reaction time and the scaled cost of shifting attention when there are n stimuli to search through:
The cost of adding a distractor to the search array is therefore:
Using the reaction time cost derived from our psychophysical experiments (22 ms/item), we can estimate the time needed for a shift in attention to be roughly 44 ms/item.
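One way to write out this estimate, as a sketch rather than the authors' exact formulation: assume a serial, self-terminating search with perfect inhibition of return, so that a target placed at random among n items is found after (n + 1)/2 attended locations on average. The symbols RT_fixed and t_shift are labels introduced here for illustration.

```latex
\begin{align}
RT(n) &= RT_{\text{fixed}} + \frac{n+1}{2}\, t_{\text{shift}} \\
RT(n+1) - RT(n) &= \frac{t_{\text{shift}}}{2} \\
t_{\text{shift}} &= 2 \times 22~\text{ms} = 44~\text{ms}
\end{align}
```

Under these assumptions each added distractor costs half a shift on average, which is why the 22 ms per item slope corresponds to roughly 44 ms per shift.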
Figure 2 plots the average normalized activity across all analyzed neurons, showing a clear clockwise trend reflecting the spotlight of attention. All analysis is done relative to the neuron’s preferred direction (see below). This allows all neurons to contribute to the overall average, without making any assumptions about where the animal began its search (instead, only using the knowledge of where the animal ended its search). This is advantageous for two reasons. First, it allows us to completely dissociate the neural evidence for a serial search pattern from the behavioral evidence. Second, as the starting point of the animal can only be estimated for an entire session, it is less reliable than the saccade, which marks where the animal ended its search on every trial.
The preferred direction of each neuron was determined using the post-saccadic response in a 75 ms window following the saccade. A vector of activity was created across trials in which the target was at each of the four possible locations. The direction of this post-saccadic vector was taken to be the preferred direction of the neuron. By using a purely post-saccadic response to determine the preferred direction we avoid ‘contaminating’ the pre-saccadic activity used to follow the shifting spotlight of attention.
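The vector computation described above can be sketched as follows. This is an illustration only: the function names, the example firing rates, and the final step of snapping the vector direction to the nearest array position are assumptions, not details given in the text.

```python
import math

# Array positions in degrees, using the same angular convention as the task.
LOCATIONS_DEG = (45.0, 135.0, 225.0, 315.0)


def preferred_direction(mean_rates):
    """Vector-sum the post-saccadic mean rate for each target location and
    return the direction of the resultant vector, in degrees in [0, 360)."""
    x = sum(r * math.cos(math.radians(a)) for r, a in zip(mean_rates, LOCATIONS_DEG))
    y = sum(r * math.sin(math.radians(a)) for r, a in zip(mean_rates, LOCATIONS_DEG))
    return math.degrees(math.atan2(y, x)) % 360.0


def nearest_location(angle_deg):
    """Snap an angle to the closest array position (an assumption of this sketch)."""
    def circ_dist(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(LOCATIONS_DEG, key=lambda loc: circ_dist(angle_deg, loc))


# Hypothetical mean rates (spikes/s) in the 75 ms post-saccadic window,
# one per target location, in the order of LOCATIONS_DEG.
rates = [5.0, 20.0, 6.0, 4.0]
direction = preferred_direction(rates)
preferred = nearest_location(direction)
```

Because only post-saccadic spikes enter `rates`, the pre-saccadic activity used to track attention plays no part in choosing the preferred direction.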
These analyses were performed on all selective neurons, regardless of when they were selective. This ensures a constant number of neurons across all four target locations. The minimum number of trials observed for any of the selective neurons was 63, with the average number of trials during search above 90. This provides us with a high signal-to-noise ratio. There was no significant difference in the number of trials between locations for either dlPFC or FEF during either search or pop-out (ANOVA: p = 0.9997, FEF/search; p = 0.9964, dlPFC/search; p = 0.9988, FEF/pop-out; p = 0.9827, dlPFC/pop-out).
The firing rate for each individual neuron was normalized in a 40 ms window, slid with 10 ms steps, by constructing a z-score of activity for each neuron, in each bin. The z-score was computed across all correct trials, regardless of target location (i.e. vertically across the figure). This normalization procedure has several advantages. First, it removes the impact of changes in firing activity unrelated to attention and allows for direct comparison of the relative increases in firing rate across different neurons (by normalizing by the standard deviation in each neuron’s activity). In other words, we are measuring the amount of information in the neural signal, rather than raw spiking activity.
The advantage of this technique is even greater when comparing the LFP and time-based models of the shifting spotlight of attention. Without this normalization procedure an increase in observed spiking activity over the time model could be due to spike-field synchronization and not due to a greater isolation of the information about the location of the spotlight of attention. By performing the z-transform we are able to measure the amount of information in the neural signal. Finally, the z-transform allows us to easily determine whether the observed average was significantly above (or below) zero.
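The normalization can be sketched as below, assuming spike times are stored per trial in seconds relative to the saccade. The function name `sliding_zscore`, the data layout, and the synthetic spike trains are hypothetical; only the 40 ms window, the 10 ms step, and the z-scoring across all correct trials come from the text.

```python
import numpy as np


def sliding_zscore(spike_times, t_start, t_stop, win=0.040, step=0.010):
    """spike_times: list of 1-D arrays of spike times (s), one per correct trial.
    Returns (bin_starts, z), where z has shape (n_trials, n_bins) and each bin
    is z-scored across all trials regardless of target location."""
    starts = np.arange(t_start, t_stop - win + 1e-9, step)
    # Firing rate (spikes/s) in each sliding window, for each trial.
    rates = np.array([[np.sum((st >= s) & (st < s + win)) / win for s in starts]
                      for st in spike_times])
    mean = rates.mean(axis=0)
    sd = rates.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero in bins with no variance
    return starts, (rates - mean) / sd


# Synthetic example: 50 trials of random spikes in the 200 ms before the saccade.
rng = np.random.default_rng(0)
trials = [np.sort(rng.uniform(-0.2, 0.0, size=rng.poisson(8))) for _ in range(50)]
starts, z = sliding_zscore(trials, -0.2, 0.0)
```

Averaging the rows of `z` that share a target location then gives one normalized trace per location for that neuron.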
For comparison purposes, Fig. S8 shows the average, raw population activity from all selective neurons. As with the normalized firing rate, the shifting spotlight of activity can be seen in the raw firing rate. The increase in separation between the early and late allocation of attention using the LFP model is also clear in the raw activity.
It is important to note that although z-scores are ideal for averaging across neurons (since they equalize the contribution of each individual neuron) they are relative measures and cannot be used to infer differences between brain areas in the timing of their spiking activity. For example, FEF neurons responding to the target in their preferred location (Fig. 2, top row) must ‘overcome’ the residual activity from ‘CW’ trials (Fig. 2, second row; i.e. those where the FEF neurons were responding to attention into their receptive field but the target was CW). This effect is best observed in Fig. S7, where one can see that the sub-threshold, non-significant information about attention at the CW position (dashed orange line) continues into the “late” cycle. As dlPFC neurons only show target activity, they are not ‘delayed’ by this earlier activation. In order to directly measure the temporal relationships of spiking activity between areas we used a cross-correlation measure (see Fig. S5).
Decoding multiple shifts of attention on a particular trial is a cumulative process. This results in an uneven distribution of trials with 1, 2, 3, or 4 shifts in attention. For example, all trials must have at least a single shift in attention (preceding the saccade), fewer (but most) will have at least 2 shifts, fewer still will have 3 shifts, and the fewest number of trials will have 4 shifts. In other words, trials with 3 shifts must, by definition, include a first and second shift and not all trials will contain 2 or more shifts in attention.
To ensure sufficient and non-biased sampling of data for our analyses, we did not preselect neurons for showing evidence of an attentional shift and we grouped all trials together. Thus, when we analyzed 3 shifts of attention, we included a subset of trials with 3 shifts (or 4 shifts) as well as a subset without 3 shifts (only having 1 or 2). As noted in the main text, these trials add noise to the analysis that grows with the number of attention shifts. The effect on the analysis can be best seen in the activity related to a third shift in the spotlight of attention: although clearly significant, it was slightly earlier than strictly expected and was more temporally diffuse (Fig. 2).
The alternative would have been to attempt to isolate trials on which the reaction time suggested the animal performed 1, 2, 3, or 4 shifts. However, that would have biased the average population values toward the very property we had observed. Instead, we adopted the more conservative approach of averaging together all trials regardless of the reaction time on that trial.
In order to determine the role of the previously observed ‘middle’ frequency band (18–34 Hz) in clocking the shifts in attention, it was necessary to filter the local field potentials into our frequency band of interest. Local field potentials from the entire trial were filtered using a digital IIR filter consisting of 13 sections of Chebyshev Type II filters. The filter was of order 26 and was attenuated to at least −40 dB in the stop-band regions (below 18 Hz and above 34 Hz). The Type II Chebyshev filter avoided any rippling within the pass-band but did have a non-linear phase offset across frequencies. In order to compensate for this non-linear phase effect, we filtered the LFP signal both forward and backward in time, ensuring a zero phase shift. As this increases the effective order of the filter, we filtered across the entire trial’s signal, ensuring enough data to avoid edge effects. For the analysis in Fig. 5, the exact frequency of the LFP signal was determined for each trial by inverting the average peak-to-peak distance across both cycles of interest.
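A filter with these properties can be sketched with SciPy. The sampling rate `FS`, the function names, and the synthetic test signal are assumptions (the text does not state a sampling rate or a software package), so this shows one way to build such a filter rather than the one actually used.

```python
import numpy as np
from scipy import signal

FS = 1000.0  # Hz; assumed sampling rate, not stated in the text

# An order-13 prototype gives an order-26 band-pass, returned here as
# 13 second-order sections. For Chebyshev Type II designs the critical
# frequencies are the stop-band edges, where attenuation reaches 40 dB.
sos = signal.cheby2(13, 40, [18.0, 34.0], btype="bandpass", output="sos", fs=FS)


def bandpass_zero_phase(lfp):
    """Filter forward and backward in time so the net phase shift is zero."""
    return signal.sosfiltfilt(sos, lfp)


def cycle_frequency(filtered, fs=FS):
    """Estimate frequency (Hz) by inverting the mean peak-to-peak interval."""
    peaks, _ = signal.find_peaks(filtered)
    return fs / np.mean(np.diff(peaks))


# Synthetic LFP: a 25 Hz component plus out-of-band 5 Hz and 60 Hz components.
t = np.arange(0.0, 10.0, 1.0 / FS)
in_band = np.sin(2 * np.pi * 25.0 * t)
lfp = in_band + np.sin(2 * np.pi * 5.0 * t) + np.sin(2 * np.pi * 60.0 * t)
filtered = bandpass_zero_phase(lfp)
middle = slice(3000, 7000)  # stay away from the edges of the record
estimated_hz = cycle_frequency(filtered[middle])
```

Filtering the whole record and analyzing only samples away from its edges mirrors the point made above about avoiding edge effects.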
To investigate the role of neural oscillations in shifting attention, we compared our ability to decode the locus of attention when using static time windows to dynamic windows based on the local field potential. Across the population we can see a general improvement (see results above, as well as Fig. 2, Fig. 4, and Fig. S7).
In order to quantify the ability of a given model (either based on static time windows or dynamic LFP windows) to predict the locus of attention we compared each neuron’s response to an ‘ideal’ neuron. In our case, the ideal neuron would be one that followed attention in a binary manner: it would be maximally active when attention was into its receptive field and inactive when attention was away. For example, it would show high activation followed by low activation when the target was clockwise to its preferred location and the opposite pattern (low early, high late) when the target was in its preferred location. The error for each individual neuron was taken to be the distance between the neuron’s observed, relative, average firing rate and this desired, ‘ideal’ one. The model that reduces this error to the greatest extent is the most appropriate model for that neuron.
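The comparison to an ideal neuron can be sketched as a distance between observed and ideal activity patterns. Several details here are assumptions: the use of Euclidean distance, the scaling of relative rates to the range 0 to 1, the condition labels, and all numeric values, which are invented for illustration and are not results from the study.

```python
import math

# Ideal binary response in the (early, late) attention windows.
IDEAL = {
    "target_at_preferred": (0.0, 1.0),  # low early, high late
    "target_clockwise": (1.0, 0.0),     # high early, low late
}


def ideal_error(observed):
    """Euclidean distance between a neuron's observed relative rates and the
    ideal pattern. `observed` maps each condition to (early, late) rates
    that are assumed to be scaled to the range [0, 1]."""
    return math.sqrt(sum((o - i) ** 2
                         for cond, ideal in IDEAL.items()
                         for o, i in zip(observed[cond], ideal)))


# Hypothetical relative rates for one neuron under each windowing model.
lfp_windows = {"target_at_preferred": (0.2, 0.9), "target_clockwise": (0.8, 0.3)}
time_windows = {"target_at_preferred": (0.4, 0.7), "target_clockwise": (0.6, 0.5)}

# The windowing model with the smaller error is preferred for this neuron.
better = "LFP" if ideal_error(lfp_windows) < ideal_error(time_windows) else "time"
```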
Similarly, we can test the time and LFP models across the entire population by using a generalized linear model (GLM) to predict neural activity. The percent of variance in the activity of neurons explained by the shifting spotlight of attention was used as the metric for determining the goodness of fit of each model. The model with the greatest percent explained variance was the preferred one.
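The explained-variance metric can be sketched with an ordinary least-squares fit, which is the simplest GLM (Gaussian noise, identity link). The text does not specify the link function or the regressors, so that choice, the function name `explained_variance`, and the synthetic data below are all assumptions for illustration only.

```python
import numpy as np


def explained_variance(regressor, activity):
    """Fraction of variance in `activity` explained by a linear fit on
    `regressor` plus an intercept (in-sample R^2)."""
    X = np.column_stack([np.ones_like(regressor, dtype=float), regressor])
    beta, *_ = np.linalg.lstsq(X, activity, rcond=None)
    resid = activity - X @ beta
    return 1.0 - resid.var() / activity.var()


# Synthetic example: activity is higher when attention is in the receptive field.
rng = np.random.default_rng(1)
in_rf = rng.integers(0, 2, size=200).astype(float)  # 1 = attention in RF
activity = 2.0 * in_rf + rng.normal(0.0, 1.0, size=200)

r2_model = explained_variance(in_rf, activity)
# A shuffled regressor serves as a reference with no real relationship.
r2_shuffled = explained_variance(rng.permutation(in_rf), activity)
```

In a model comparison, the regressor would be built once from the static time windows and once from the LFP-defined windows, and the version with the larger explained variance would be preferred.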
In order to ensure that we made the fairest comparison possible, we tested the LFP model against a variety of time models. The time model windows were allowed to vary in size (from 30 ms to 55 ms, covering our 18–34 Hz range) and in offset from saccade (ranging from 40 ms prior to 40 ms after the saccade). As with the LFP model, if a window exceeded the time of saccade on a given trial then the window was truncated at the saccade. The overall results are shown in Fig. S9: the LFP model explained a greater percentage of the variance observed in the data than any of the temporal models tested. Based on this analysis, the best fitting time model was one which had bins of 32.5 ms and an offset of 5 ms before the saccade (i.e. the first window ranged from 70 ms to 37.5 ms before the saccade and the second window ranged from 37.5 to 5 ms before the saccade), as shown in Figure 3.
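The window layout of a time model can be sketched as below. The function name `time_model_windows`, its parameters, and the exact clamping rule are assumptions; the bin size, the offset, and truncation at the saccade come from the text. Note that the 30 to 55 ms range matches the oscillation periods of the band of interest (1000/34 ≈ 29.4 ms and 1000/18 ≈ 55.6 ms).

```python
def time_model_windows(bin_ms, offset_ms, n_windows=2, saccade_ms=0.0):
    """Return [(start, end), ...] in ms relative to the saccade (negative means
    before the saccade), earliest window first. A positive offset_ms ends the
    last window before the saccade; a negative one would end it after.
    Any part of a window that falls after the saccade is truncated."""
    windows = []
    end = saccade_ms - offset_ms
    for _ in range(n_windows):
        start = end - bin_ms
        # Clamp both edges so no window extends past the saccade.
        windows.append((min(start, saccade_ms), min(end, saccade_ms)))
        end = start
    return windows[::-1]


# Best-fitting time model reported above: 32.5 ms bins, 5 ms before the saccade.
best = time_model_windows(32.5, 5.0)
# best == [(-70.0, -37.5), (-37.5, -5.0)]
```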
Supported by NSF grant SBE0354378 and National Institute of Neurological Disorders and Stroke grant R01NS035145. We thank S. Henrickson and M. Wicherski for comments on the manuscript; W. Asaad, J. Roy, M. Siegel, and K. Maccully for technical and other support.