In the visual system, spatial attention enhances sensory responses to stimuli at attended locations relative to unattended locations. Which brain structures direct the locus of attention, and how is attentional modulation delivered to structures in the visual system? We trained monkeys on an attention-switch task designed to precisely measure the onset of attentional modulation during rapid shifts of spatial attention. Here we show that attentional modulation appears substantially earlier in the lateral intraparietal area (LIP) than in an anatomically connected lower visual area, the middle temporal area (MT). This temporal sequence of attentional latencies demonstrates that endogenous changes of state can occur in higher visual areas before lower visual areas, and satisfies a critical prediction of the hypothesis that LIP is a source of top-down attentional signals to early visual cortex.
The understanding of visual and motor function has been aided by having discrete endpoints from which a hierarchical understanding of the neural circuits can begin (the retina and the muscle, respectively). However, many neural processes are not strictly tethered to an externally measurable event. For example, attention can be shifted volitionally in the absence of an eye movement or explicit visual cue. Defining the neural circuits underlying such shifts of internal state is a major outstanding problem in neuroscience.
In humans, functional imaging (Corbetta et al., 1998) and studies of patients with cortical damage (Hillis, 2006) suggest that a circuit localized to frontal and parietal cortex plays a central role in the allocation of spatial attention. In nonhuman primates, several lines of evidence suggest that the lateral intraparietal area (LIP) may be a source of attentional modulation to visual cortex: visual responses in LIP neurons are strongly dependent on attention (Gottlieb et al., 1998), neuronal activity in populations of LIP neurons correlates closely in time with attentional state (Bisley and Goldberg, 2003, 2006), lesions in LIP disrupt performance on attentional tasks (Wardak et al., 2004) and microstimulation in LIP can recapitulate some of the behavioral manifestations of attention (Cutrell and Marrocco, 2002). Here we focus on area LIP and an anatomically connected lower visual area, the middle temporal area (MT) (Lewis and Van Essen, 2000). Recently, it has been observed that spike trains and local field potentials in MT and LIP show increased synchronization with attention (Saalmann et al., 2007). However, there is little evidence for a causal relationship between activity in LIP and attentional modulation in the visual system. One simple test of such a relationship is timing: if attentional modulation in MT depends on modulation in LIP, then the modulation should arise first in LIP. Here we test that hypothesis. We trained monkeys to shift attention rapidly to and from the receptive field location of the neurons under study, and we measured the time course of the accompanying rapid modulations of neuronal firing. With this approach, we demonstrate that attentional modulation begins ~60 ms earlier in LIP than in MT, consistent with a top-down flow of attentional information.
Two male monkeys (Macaca mulatta) were implanted with a head post, scleral search coil and recording chamber to allow monitoring of eye movements and single neuron recordings. All surgical and experimental procedures were in accordance with Harvard Medical School and National Institutes of Health guidelines.
The attention-switch task is shown in Figure 1A. At the beginning of a trial the stimulus consisted of a central fixation spot and two annuli, one red and one green, in opposite hemifields at equal eccentricity. The annuli were blurred with a Gaussian luminance profile. The monkey had to maintain gaze within a fixation window throughout the trial (2° × 2° square, centered on a fixation spot). After the monkey fixated, there was a 500-ms delay before two fields of coherently moving random dots appeared within the annuli (“dot onset”). The monkey’s task was to detect a transient increase in the speed (53 ms, 4 video frames) of either dot patch (the “speed pulse”) and respond by releasing a touch bar within a requisite time window (200 – 600 ms). The color of the fixation point (red or green) cued the monkey as to which patch (surrounded by red or green annulus) was more likely to contain the speed pulse (85% valid cues, 15% invalid cues). On 40% of trials the fixation point color cue switched at an unpredictable time during the trial to indicate that the likely speed pulse location had switched. Each trial had at most one cue switch. After an initial fixed delay of 400 ms, additional delays until speed pulses and cue switches, as well as between cue switches and speed pulses, were selected randomly from an exponential distribution (mean = 1 s). The location of the initially cued patch and the color of the annuli were alternated in blocks of 50 and 200 trials, respectively. Thus, after 400 trials, every combination of annulus color (in the receptive field) and initially cued location had been tested.
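The timing rules of the task can be made concrete with a short simulation. This is a minimal sketch, not the custom task-control software used in the study. The function names (draw_event_time, draw_trial, block_condition) are hypothetical, and the sketch assumes, as its own interpretation, that on switch trials the switch-to-pulse interval is a further exponential draw with no additional fixed delay. The block rule shows why 400 trials suffice to cover every combination of cued side and annulus color.

```python
import random

# Timing parameters taken from the task description.
FIXED_DELAY_MS = 400.0    # initial fixed delay after dot onset
MEAN_EXTRA_MS = 1000.0    # mean of the exponential component
P_SWITCH = 0.40           # fraction of trials with a cue switch
P_VALID = 0.85            # probability that the speed pulse occurs at the cued patch


def draw_event_time(rng):
    """Fixed 400-ms delay plus an exponential draw (ms after dot onset)."""
    return FIXED_DELAY_MS + rng.expovariate(1.0 / MEAN_EXTRA_MS)


def draw_trial(rng):
    """Hypothetical event schedule for one trial (ms after dot onset).

    Assumption: on switch trials the switch-to-pulse interval is another
    exponential draw with no extra fixed delay.
    """
    if rng.random() < P_SWITCH:
        switch_ms = draw_event_time(rng)
        pulse_ms = switch_ms + rng.expovariate(1.0 / MEAN_EXTRA_MS)
    else:
        switch_ms = None
        pulse_ms = draw_event_time(rng)
    return {"switch_ms": switch_ms, "pulse_ms": pulse_ms,
            "pulse_at_cued": rng.random() < P_VALID}


def block_condition(trial_index):
    """(initially cued side, RF annulus color) for a 0-based trial index.

    Cued side alternates every 50 trials and annulus color every 200 trials;
    which side and color come first is an arbitrary assumption.
    """
    side = "RF" if (trial_index // 50) % 2 == 0 else "opposite"
    color = "red" if (trial_index // 200) % 2 == 0 else "green"
    return side, color
```

Any aligned run of 200 trials covers both cued sides for one annulus color, so a full run of 400 trials is needed before all four combinations have appeared.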
Stimuli were presented on a computer monitor positioned 57 cm in front of the animal (40° × 30°, 75 Hz refresh, 1152 × 870 resolution). Background luminance was near black (0.001 cd/m²). The fixation point was a 0.4-degree-diameter red or green circle (luminance in cd/m²: monkey M, red: 2.7, green: 3.0; monkey B, red: 2.4, green: 5.2). Dot-patch stimuli consisted of 100% coherently moving, unlimited-lifetime, random dots. Dots were squares with 0.1-degree sides, at a density of 7 dots/degree² and moving at 12 degrees/second. Dot luminance was 0.01 cd/m². We chose to use low contrast stimuli following reports suggesting that attentional modulation is largest at low contrast (Martinez-Trujillo and Treue, 2002; Reynolds and Desimone, 2003). Annuli surrounding the moving dot patches were 0.5 degrees thick and separated from the perimeter of the dot patches by 0.5 degrees. The annuli were blurred with a Gaussian luminance profile to reduce edge effects (peak luminance in cd/m²: monkey M, red: 0.2, green: 0.4; monkey B, red: 0.3, green: 0.3).
Where possible, dot patches were placed in the center of the receptive field of the recorded neuron. Eccentricities ranged from 5 to 16 degrees. The dot-patch motion in the receptive field was set in the neuron’s preferred direction as determined by a direction-mapping task that we ran before the main task for each neuron. The other dot patch was always placed at the equivalent position reflected across the fixation point and had the opposite direction of motion. The size of the dot patches was scaled with eccentricity (ranging from 4.5 to 9.4 degrees in diameter). The magnitude of the speed change was chosen to keep performance on validly cued trials in the target range (65–75% correct) and varied from session to session (range for monkey B: 1.6× to 2.5×; range for monkey M: 1.35× to 1.7×).
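The display and stimulus numbers above imply a few derived quantities that are convenient to check. The following is a small worked sketch using only the stated parameters. The pixels-per-degree values assume the 40° × 30° extent maps linearly onto the 1152 × 870 pixel grid, and dots_in_patch and pulse_speed are hypothetical helper names.

```python
import math

# Display and stimulus parameters from the Methods text.
VIEW_CM = 57.0
WIDTH_DEG, HEIGHT_DEG = 40.0, 30.0
WIDTH_PX, HEIGHT_PX = 1152, 870
REFRESH_HZ = 75.0
DOT_DENSITY = 7.0    # dots per deg^2
BASE_SPEED = 12.0    # deg/s

# At a 57-cm viewing distance, 1 cm on the screen subtends about 1 degree.
deg_per_cm = math.degrees(2 * math.atan(0.5 / VIEW_CM))

px_per_deg_x = WIDTH_PX / WIDTH_DEG            # 28.8 px/deg (assumes linear mapping)
px_per_deg_y = HEIGHT_PX / HEIGHT_DEG          # 29.0 px/deg
frame_ms = 1000.0 / REFRESH_HZ                 # ~13.3 ms per video frame
speed_pulse_ms = 4 * frame_ms                  # 4 frames, ~53 ms
step_deg_per_frame = BASE_SPEED / REFRESH_HZ   # 0.16 deg displacement per frame


def dots_in_patch(diameter_deg, density=DOT_DENSITY):
    """Expected number of dots in a circular patch of the given diameter."""
    return density * math.pi * (diameter_deg / 2.0) ** 2


def pulse_speed(factor, base=BASE_SPEED):
    """Dot speed (deg/s) during a speed pulse scaled by `factor`."""
    return base * factor
```

For the smallest and largest patches (4.5° and 9.4° diameter) this gives roughly 111 and 486 dots. The 4-frame pulse at 75 Hz works out to about 53 ms, which matches the pulse duration stated in the task description.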
The recording chamber was placed at stereotactic coordinates P3 L10, which allowed a dorsal approach to areas MT and LIP. The chamber was outfitted with a guide-tube/grid system (Crist Instrument). MRI was used to confirm sulcal anatomy and chamber placement. Single unit recordings were conducted using tungsten microelectrodes (Frederick Haer & Co, 75 µm diameter, 5 MΩ impedance). Single unit action potentials were isolated using a dual window discriminator (Bak Electronics) and recorded at 1-ms resolution. Horizontal and vertical eye position were monitored using a scleral search coil (Riverbend Instruments) and recorded at 200 Hz. Spike and eye-position recording, stimulus presentation and task control were handled by a Macintosh computer running custom software with a computer interface (ITC-18, Instrutech Corporation).
MT and LIP cells were identified by reference to sulcal anatomy and characteristic physiology. MT cells were characterized by highly direction-selective responses and receptive fields with diameters roughly equal to eccentricity (Maunsell and Van Essen, 1983b, 1987). LIP cells were characterized by robust, spatially tuned responses in a memory delayed saccade task (Colby et al., 1996). Additionally, cells were considered within the target area if they were encountered between cells with characteristic properties. All such stably isolated units were recorded. For monkey M, the majority of the MT population was recorded after the LIP population. Near the end of recording, smaller populations from each area were recorded in interleaved sessions. For monkey B, MT cells were recorded in three roughly equal sets of sessions occurring before, in the middle of, and after the LIP sessions.
A goal of this study was to compare the latency of attentional modulation between MT and LIP neurons. Ideally this comparison would be made between simultaneously recorded neurons. However, a critical concern in comparing the response timing between two neurons is that the visual stimulus should be comparably placed inside the receptive field of both neurons. If instead the stimulus is optimally placed for one neuron but not the other, the response timing could differ between the two neurons based solely on differences in effectiveness of stimulation (Schall et al., 2007). The problem is exacerbated if the neurons are also direction selective (as are most MT and LIP neurons; Fanini and Assad, 2008) and their preferred directions are not aligned. Thus the probability of finding simultaneous pairs of neurons that are well-matched for stimulus-response properties is extremely low. In addition, because the magnitude of attentional modulation is typically only a fraction of the baseline response rate, accurately determining the onset latency of the modulation necessitates averaging across many trials, which is only possible if individual neurons can be stably isolated for long periods (the median number of trials per neuron in our study was 657, range 200–1960). We thus opted to record from single MT and LIP neurons in order to optimize the visual stimulation and the quality of the neural recordings. It is therefore possible, however, that intersession differences (rather than inter-areal differences) contribute to the observed differences in MT and LIP neural responses. We present several analyses below and in the Supplementary Results that argue against this possibility.
Spike-rate functions for individual cells were generated by convolving 1-ms-binned histograms with a Gaussian (sd = 20 ms). Population spike-rate functions were calculated by averaging individual neurons’ spike-rate functions. For no-switch trials, data were included up until the speed pulse began. For switch trials, data were included from 400 ms after dot patch onset (to exclude the onset transient from the pre-switch data) until the speed pulse. All trials were included in the analysis. On trials with early releases or fixation breaks, data were included up until 300 ms before the release or break.
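The smoothing step can be illustrated with the following sketch. It bins one trial's spikes at 1 ms, convolves the bins with a unit-area Gaussian (SD 20 ms), and scales the result to spikes/s. Truncating the kernel at ±4 SD and treating bins outside the trial as empty are assumptions of this sketch, and the function names are hypothetical. The same point-by-point averaging applies whether the curves being combined are trials of one neuron or neurons in a population.

```python
import math


def gaussian_kernel(sd_ms=20.0, n_sd=4):
    """Gaussian weights at 1-ms spacing, truncated at +/- n_sd SDs, summing to 1."""
    half = int(n_sd * sd_ms)
    w = [math.exp(-0.5 * (t / sd_ms) ** 2) for t in range(-half, half + 1)]
    total = sum(w)
    return [v / total for v in w]


def spike_rate_function(spike_times_ms, duration_ms, sd_ms=20.0):
    """Smoothed firing rate (spikes/s) for one trial at 1-ms resolution.

    Bins outside the trial are treated as empty, so values within ~4 SD
    of either edge are biased low.
    """
    counts = [0] * duration_ms
    for t in spike_times_ms:
        if 0 <= t < duration_ms:
            counts[int(t)] += 1
    kernel = gaussian_kernel(sd_ms)
    half = len(kernel) // 2
    rate = []
    for i in range(duration_ms):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < duration_ms:
                acc += w * counts[k]
        rate.append(1000.0 * acc)  # spikes per 1-ms bin -> spikes per second
    return rate


def average_curves(curves):
    """Point-by-point mean, e.g. across trials or across neurons."""
    return [sum(vals) / len(curves) for vals in zip(*curves)]
```

A regular train of one spike every 10 ms yields a smoothed rate of about 100 spikes/s away from the trial edges.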
For each cell we first determined the magnitude of attentional modulation for the ongoing neuronal response before and after a cue switch (Herrington and Assad, 2009). We computed an attentional modulation index equal to (R_IN − R_OUT) / (R_IN + R_OUT), where R_IN and R_OUT are the neural responses in spikes/second when attention is directed into or out of the receptive field, respectively. We measured single-unit attentional latency only for cells with an attentional index greater than 0.03 either before or after the switch. To determine the latencies for individual units, we used three methods (described below). The reliability of each latency estimate was assessed with a bootstrap technique. For each cell, we randomly selected with replacement a number of trials equal to the original number of trials for the cell. The latency was determined on this new data set. This was repeated 1000 times, and the standard deviation of the resulting latencies was taken as the standard error of the original latency value (Efron and Tibshirani, 1994). Latencies were included in further analyses if the standard error was less than 70 ms. The three methods to determine latencies for individual units were as follows:
Most LIP and MT neurons responded with a burst of activity at dot onset, followed by a sustained response. However, for many cells the sustained response increased or decreased over time, which complicated our ability to determine a baseline from which to measure the onset of attentional modulation. For example, for a neuron whose activity is decreasing over time, a baseline taken over a window 300 ms before the switch will overestimate the actual spike rate immediately before the switch.
The deviation-threshold method was designed to address this problem. For both the attend-in and attend-out conditions, we first estimated what the peri-switch spike-rate function would have been had the cue switch not occurred (the “expected spike-rate functions”). We began with the spike-rate function aligned on dot onset from all attend-in or attend-out trials using data up until the time of a cue switch (on switch trials) or speed pulse (on no-switch trials). This represented the neuron’s spike rate in the absence of cue switches or speed pulses (the “unperturbed spike-rate function”). Because cue switches occurred at variable times after dot onset, the neural activity aligned on the cue switch is composed of many different time windows after dot onset. For each trial we selected a window beginning 400 ms after dot onset and ending at the time of the speed pulse, with the cue switch occurring at a variable time in between. The peri-switch spike-rate function was the average of all of these trials’ windows after aligning on the time of the cue switch. In parallel, the expected spike-rate function was the average of the same set of time windows with the same alignment, but with data taken from the unperturbed spike-rate function.
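The alignment-and-averaging step described above can be sketched as a single function. Applied to each trial's own rate function, it gives the actual peri-switch function. Applied to the unperturbed function for every trial, it gives the expected function. The function name, the −300 to +500 ms output range, and the convention of returning None where no trial contributes are assumptions of this sketch. Times are integer milliseconds after dot onset.

```python
def peri_switch_average(curves, trials, pre_ms=300, post_ms=500, window_start_ms=400):
    """Average per-trial windows aligned on the cue switch.

    curves: one rate list per trial (1-ms bins, index = ms after dot onset).
            Pass each trial's own smoothed rate for the actual peri-switch
            function, or the unperturbed function for every trial for the
            expected one.
    trials: (switch_ms, pulse_ms) per trial, integer ms after dot onset.
    Returns values for tau = -pre_ms .. post_ms - 1 (index pre_ms is the
    switch); None where no trial's window covers that tau.
    """
    out = []
    for tau in range(-pre_ms, post_ms):
        vals = []
        for curve, (switch_ms, pulse_ms) in zip(curves, trials):
            t = switch_ms + tau
            # Only data from 400 ms after dot onset up to the speed pulse are used.
            if window_start_ms <= t < min(pulse_ms, len(curve)):
                vals.append(curve[t])
        out.append(sum(vals) / len(vals) if vals else None)
    return out
```

Calling `peri_switch_average([unperturbed] * len(trials), trials)` yields the expected function. Because the same windows and alignment are used for both calls, any slope in the unperturbed rate carries over into the expected function.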
Because the expected spike-rate function is calculated by averaging the unperturbed spike-rate function many times, it exhibits greatly reduced noise. As expected, the actual pre-switch spike-rate functions closely matched the expected spike-rate functions and deviated from expectation only after the cue switch (Fig. S1). For each neuron, variation in the actual spike-rate function was estimated by calculating the standard deviation over a window from −300 ms to 0 ms relative to the cue switch. For OUT-IN latencies, the latency threshold was the expected spike-rate function plus 3 standard deviations. For IN-OUT latencies, the latency threshold was the expected spike-rate function minus 3 standard deviations. The attentional latency was considered the earliest time after the cue switch at which the actual spike-rate function crossed the latency threshold and remained there for at least 50 ms.
The spike-rate-threshold method (Maimon and Assad, 2006b) is similar to the deviation-threshold method, but does not take into account slopes in the spike-rate function. For this method, we used the actual peri-switch spike-rate function. We calculated a baseline mean and standard deviation over a window from −300 ms to 0 ms relative to the cue switch. For OUT-IN latencies, the latency threshold was the baseline mean plus 3 standard deviations. For IN-OUT latencies, the latency threshold was baseline mean minus 3 standard deviations. The attentional latency was considered the earliest time after the cue switch at which the spike-rate function crossed the latency threshold and remained there for at least 50 ms.
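Both threshold rules share the same crossing criterion: the earliest post-switch time at which the rate stays beyond a 3-SD threshold for at least 50 ms. They differ only in what the threshold follows. In the deviation-threshold method it tracks the expected function, and in the spike-rate-threshold method it is a flat pre-switch baseline. The following is a minimal sketch, assuming peri-switch functions at 1-ms resolution with the switch at index 300. The sample SD (statistics.stdev) is our choice because the text does not specify one, and the function names are hypothetical.

```python
import statistics


def first_sustained_crossing(curve, threshold, start, direction, min_ms=50):
    """First index >= start where curve stays beyond threshold for min_ms bins.

    direction=+1 looks for values above threshold (OUT-IN), -1 below (IN-OUT).
    Returns None if no such run exists.
    """
    run = 0
    for i in range(start, len(curve)):
        beyond = curve[i] > threshold[i] if direction > 0 else curve[i] < threshold[i]
        run = run + 1 if beyond else 0
        if run >= min_ms:
            return i - min_ms + 1
    return None


def deviation_threshold_latency(actual, expected, pre_ms=300, direction=1,
                                n_sd=3.0, min_ms=50):
    """Latency (ms after switch) against the expected spike-rate function.

    The SD comes from the actual function over -300 to 0 ms.
    """
    sd = statistics.stdev(actual[:pre_ms])
    thr = [e + direction * n_sd * sd for e in expected]
    i = first_sustained_crossing(actual, thr, pre_ms, direction, min_ms)
    return None if i is None else i - pre_ms


def spike_rate_threshold_latency(actual, pre_ms=300, direction=1, n_sd=3.0, min_ms=50):
    """Same crossing rule, against a flat mean +/- n_sd SD pre-switch baseline."""
    base = actual[:pre_ms]
    level = statistics.mean(base) + direction * n_sd * statistics.stdev(base)
    i = first_sustained_crossing(actual, [level] * len(actual), pre_ms, direction, min_ms)
    return None if i is None else i - pre_ms
```

For a neuron whose rate drifts downward before the switch, the flat baseline of the spike-rate-threshold method sits above the true pre-switch rate. This is the bias the deviation-threshold method avoids by letting the threshold follow the expected function.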
For the slope-threshold method, we detected upward or downward deflections in the spike-rate function independent of the absolute spike rate. We used the same peri-switch spike-rate function as above but differentiated it to look at spike-rate slope. A baseline standard deviation of the differentiated function was calculated from −400 to 0 ms relative to cue switch. The attentional modulation was considered to begin at the earliest point after which the slope crossed a one standard deviation threshold and remained there for 40 ms. Only positive slopes were accepted for OUT-IN latencies, and only negative slopes for IN-OUT latencies.
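The slope-threshold rule can be sketched alongside the bootstrap standard error that was applied to every latency estimate. The text does not say whether the one-SD threshold is measured from zero or from the mean baseline slope, so this sketch assumes the latter. It also assumes that bootstrap resamples yielding no latency are skipped. Both assumptions are marked in the code, and all names are hypothetical.

```python
import random
import statistics

MAX_SE_MS = 70.0  # latencies with a larger bootstrap SE were excluded


def slope_threshold_latency(curve, pre_ms=400, direction=1, n_sd=1.0, min_ms=40):
    """Latency (ms after the switch) from the slope of a peri-switch rate function.

    curve: 1-ms rate function with the cue switch at index pre_ms.
    direction=+1 accepts only rising slopes (OUT-IN), -1 only falling (IN-OUT).
    Assumption: threshold = mean baseline slope +/- n_sd baseline SDs.
    """
    slope = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]  # spikes/s per ms
    base = slope[:pre_ms]  # -400 to 0 ms relative to the switch
    thr = statistics.mean(base) + direction * n_sd * statistics.stdev(base)
    run = 0
    for i in range(pre_ms, len(slope)):
        beyond = slope[i] > thr if direction > 0 else slope[i] < thr
        run = run + 1 if beyond else 0
        if run >= min_ms:
            return i - min_ms + 1 - pre_ms
    return None


def bootstrap_se(trials, estimator, n_boot=1000, seed=0):
    """Bootstrap standard error of estimator(trials).

    Each resample draws len(trials) trials with replacement. Resamples for
    which the estimator returns None are skipped (an assumption).
    """
    rng = random.Random(seed)
    n = len(trials)
    estimates = []
    for _ in range(n_boot):
        est = estimator([trials[rng.randrange(n)] for _ in range(n)])
        if est is not None:
            estimates.append(est)
    return statistics.stdev(estimates)


def keep_latency(se_ms, max_se_ms=MAX_SE_MS):
    return se_ms < max_se_ms
```

In an actual analysis, the estimator passed to bootstrap_se would rebuild the peri-switch function from the resampled trials and apply one of the three latency rules. Any function of a trial list can stand in for a quick check.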
We detected microsaccades in the eye-position records using an adaptation of previously described techniques (Martinez-Conde et al., 2000; Herrington et al., 2009). Eye-position records were differentiated and smoothed with a 25-ms sliding boxcar. Eyes were considered to be moving if the velocity was greater than 5°/s, and stopped otherwise. Additionally, eyes were considered to have stopped if two successive velocity measurements differed in direction by more than 30°. The remaining moving epochs were considered saccades if they were at least 10 ms in duration and 0.05° in length, and if there had not been a saccade in the previous 20 ms. Peak saccade velocity and saccade magnitude were linearly related (the main sequence). Accuracy of the saccade algorithm was further confirmed by visual inspection of raw eye-movement traces for a subset of the data.
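The detection criteria can be sketched for eye position sampled at 200 Hz (5 ms per sample). Several choices here are our own assumptions and are not stated in the text:
- the 25-ms boxcar is applied to velocity (5 samples);
- the 5°/s criterion uses speed, the magnitude of the velocity vector;
- the 20-ms refractory period is measured from the end of the previous accepted saccade.

The function names are hypothetical.

```python
import math

SAMPLE_MS = 5.0  # eye position sampled at 200 Hz


def boxcar(xs, width=5):
    """Centered moving average; 5 samples at 200 Hz spans 25 ms."""
    half = width // 2
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out


def turn_deg(ax, ay, bx, by):
    """Absolute angle (deg) between two velocity vectors."""
    d = abs(math.degrees(math.atan2(by, bx) - math.atan2(ay, ax))) % 360.0
    return min(d, 360.0 - d)


def detect_microsaccades(x, y, vel_thresh=5.0, max_turn=30.0, min_dur_ms=10.0,
                         min_amp=0.05, refractory_ms=20.0):
    """Return (onset_ms, duration_ms, amplitude_deg) for each detected saccade."""
    dt = SAMPLE_MS / 1000.0
    vx = boxcar([(x[i + 1] - x[i]) / dt for i in range(len(x) - 1)])
    vy = boxcar([(y[i + 1] - y[i]) / dt for i in range(len(y) - 1)])
    events, start, last_end_ms = [], None, -math.inf
    for i in range(len(vx) + 1):
        fast = i < len(vx) and math.hypot(vx[i], vy[i]) > vel_thresh
        # A direction change of more than max_turn also ends the moving epoch.
        turned = (fast and start is not None
                  and turn_deg(vx[i - 1], vy[i - 1], vx[i], vy[i]) > max_turn)
        if start is not None and (not fast or turned):
            # Epoch spans velocity samples start..i-1, i.e. positions start..i.
            onset_ms, dur_ms = start * SAMPLE_MS, (i - start) * SAMPLE_MS
            amp = math.hypot(x[i] - x[start], y[i] - y[start])
            if (dur_ms >= min_dur_ms and amp >= min_amp
                    and onset_ms - last_end_ms >= refractory_ms):
                events.append((onset_ms, dur_ms, amp))
                last_end_ms = i * SAMPLE_MS
            start = None
        if fast and start is None:
            start = i
    return events
```

Smoothing spreads a brief step across neighboring samples. As a result, the detected onset and duration of a saccade shift slightly earlier and longer than the raw displacement. This is one reason the text notes that detections were checked by visual inspection.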
We trained two monkeys on an attention-switch task (Fig. 1A, see Methods). The monkey fixated a point at the center of a computer monitor and was cued to attend to one of two peripheral patches of coherently moving dots in order to detect a near-threshold transient speed increase (the “speed pulse”). The two dot patches were surrounded by blurred annuli, one red and one green. The color of the fixation point (red or green) cued the monkey as to which dot patch was more likely to contain the speed pulse on that trial (85% at cued patch, 15% at uncued). On 40% of randomly interleaved trials the fixation point color switched mid-trial to indicate that the likely speed pulse location had switched, inducing the animal to shift his focus of attention. The colors of the annuli were alternated in blocks so that red-green and green-red fixation point changes had no fixed relationship to the direction of the cued attentional shift. After an initial fixed delay of 400 ms, additional delays until speed pulses or cue switches were drawn from an exponential distribution (mean = 1 s) so that the monkey could not predict event timing (Luce, 1986).
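The reason for drawing delays from an exponential distribution can be made explicit through the hazard rate: the probability that an event occurs in the next instant, given that it has not yet occurred. After the fixed 400-ms delay, an exponential delay has a constant hazard (1/mean), so elapsed time carries no information about when the event will come. The following sketch contrasts this with a hypothetical uniform delay, which was not used in the task and appears only for comparison. Under a uniform delay, the hazard climbs steeply toward the end of its range.

```python
import math

FIXED_MS = 400.0   # fixed delay from the task description
MEAN_MS = 1000.0   # mean of the exponential component


def exponential_hazard(t_ms, fixed_ms=FIXED_MS, mean_ms=MEAN_MS):
    """Event rate per ms at time t, given that no event has happened yet."""
    if t_ms < fixed_ms:
        return 0.0
    survival = math.exp(-(t_ms - fixed_ms) / mean_ms)   # P(event later than t)
    density = survival / mean_ms                        # exponential pdf at t
    return density / survival


def uniform_hazard(t_ms, lo_ms=400.0, hi_ms=2400.0):
    """Hazard of a hypothetical uniform delay on [lo_ms, hi_ms), for contrast."""
    if t_ms < lo_ms or t_ms >= hi_ms:
        return 0.0
    return 1.0 / (hi_ms - t_ms)
```

The exponential hazard stays at 0.001 per ms whether 100 ms or 2600 ms have elapsed since the fixed delay. The uniform hazard rises from about 0.0005 to 0.01 per ms over the same span, which would let an animal anticipate the event.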
Having near-threshold speed pulses at both the cued and uncued location allowed us to assess the monkey’s attentional state during performance of the task. Consistent with previous work in monkeys and humans (Posner, 1980; Ciaramitaro et al., 2001; Cook and Maunsell, 2002), both monkeys exhibited increased detection frequency and decreased reaction times for speed pulses at the cued relative to uncued location (Fig. 1B–E). In response to the cue switch the behaviorally favored location switched to the newly cued patch, consistent with the animal using the cue to redirect spatial attention to the most behaviorally relevant location. A detailed behavioral time course and a comparison between the behavior and neurophysiology have been described previously (Herrington and Assad, 2009).
We recorded from 118 LIP neurons (55 from monkey M, 63 from monkey B) and 67 MT neurons (36 from monkey M, 31 from monkey B) during performance of the attention-switch task. For each neuron, one dot patch was placed in the neuron’s receptive field and the other at equal eccentricity reflected across the fixation point. Figure 2 shows the responses of two single neurons (one LIP and one MT) during the task. The onset of the dot stimuli triggered a transient response followed by a sustained response that was increased when the dot stimulus in the neuron’s receptive field was cued relative to when it was uncued (Fig. 2, left panels). We quantified the magnitude of the attentional modulation using an attentional index (AI, see Methods), which was highly significant in both areas for both monkeys (median AI for combined pre- and post-switch data, monkey M, MT: 0.08, LIP: 0.17; monkey B, MT: 0.02, LIP: 0.06; all p < 0.001, signed-rank test for zero median) (Herrington and Assad, 2009).
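The attentional index and the inclusion criterion from the Methods are simple to compute. This is a minimal sketch with hypothetical function names. It computes the per-neuron AI and a population median. The signed-rank test itself would come from a statistics library (for example `scipy.stats.wilcoxon`) and is not reimplemented here.

```python
import statistics

MIN_AI = 0.03  # inclusion criterion for the latency analysis (from Methods)


def attention_index(r_in, r_out):
    """AI = (R_IN - R_OUT) / (R_IN + R_OUT), rates in spikes/s."""
    return (r_in - r_out) / (r_in + r_out)


def eligible_for_latency(ai_pre, ai_post, min_ai=MIN_AI):
    """True if the pre- or post-switch index exceeds the criterion."""
    return ai_pre > min_ai or ai_post > min_ai


def median_ai(rate_pairs):
    """Population median AI from per-neuron (R_IN, R_OUT) pairs."""
    return statistics.median([attention_index(r_in, r_out) for r_in, r_out in rate_pairs])
```

As a worked example, a neuron firing 30 spikes/s with attention in and 20 spikes/s with attention out has an AI of 10/50 = 0.2. Because the index is normalized by the summed rate, the modest medians reported above still correspond to consistent rate differences across the population.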
On switch trials, we observed a rapid reversal of attentional modulation shortly after cue switches, reflecting the reallocation of spatial attention (Fig. 2, right panels). The rapid modulations in the neuronal responses after a cue switch were present in single neurons in both MT and LIP (Fig. 2). A similar response pattern was evident in the population average response in both areas and for both monkeys (Fig. 3, Fig. S2). We compared the time course of this attentional shift between the LIP and MT populations. There were two types of switch trials: those requiring switching attention from out of the receptive field to in (OUT-IN) and the reverse (IN-OUT). Figure 4 plots the baseline-subtracted activity for 152 individual neurons aligned on OUT-IN cue switches. This is the subset of the recorded neurons (82% of the total) that exhibited at least a minimal degree of modulation by attention (see Fig. 4 legend). Even viewed in this raw form it is evident that the increase in spike rate, reflecting the shift of attention into the neuronal receptive field, occurred for most LIP responses (gray) well before even the earliest MT responses (black).
In Figure 5A,B we show the population average spike rate function for OUT-IN shifts of attention. The onset of attentional modulation occurred in LIP ~60 ms earlier than in MT (Fig. 5A,B). To quantify this trend, we determined the attentional latency for the individual neurons in the population using three methods (see Methods). We favored the deviation-threshold method as it dealt naturally with variability in the slope of spike-rate functions encountered in single neurons (Fig. S1). Consistent with the population average, LIP neurons were modulated significantly earlier than MT neurons (Fig. 5E,F; LIP vs. MT median in ms, rank-sum test, monkey M: 166 vs. 228, p < 0.001, monkey B: 230 vs. 281, p = 0.002). The spike-rate-threshold method assayed threshold-crossing departures from the pre-switch spike rate, and did not take into account ongoing nonstationarities in the pre- and post-switch activity that are factored into the deviation-threshold method. Nonetheless the difference in population latencies between LIP and MT was consistent between methods (Fig. 5H,I). We also used a third method, the slope-threshold method, which detected the upward or downward deflection of spike rate after cue switches on OUT-IN or IN-OUT trials, respectively. As expected, this method gave earlier values for the attentional latencies because it was sensitive to the earliest deflection in the spike-rate function, whereas the other two methods required the spike rate to climb above a statistical threshold before the latency was noted. However, the values were shifted equally for the MT and LIP populations, maintaining the key finding that LIP neurons were modulated earlier (Fig. 5K,L).
Is the same latency difference evident for shifts of attention out of the receptive field? We could not address this question in monkey B because of a small, transient decrease in spike rate shortly after the cue switch (Fig. 3B,D) (see below). The presence of this dip obscured the latency of endogenous attentional disengagement from the receptive field location in monkey B. In monkey M, where there was no such dip in activity, attentional disengagement on IN-OUT trials also began earlier in LIP than in MT (Fig. 5C,G,J,M; median in ms, rank-sum test, LIP: 305, MT: 348, p = 0.03). We also observed that in both MT and LIP, attentional modulation began earlier after OUT-IN switches than after IN-OUT switches (e.g., Fig. 2), similar to previous results in V1 (Khayat et al., 2006) and V4 (Motter, 1994). This was clear in the MT and LIP population averages (Fig. 3), as well as across the population of single neurons (median difference between IN-OUT and OUT-IN latencies in ms, paired signed-rank test, MT: 136, p < 0.001, LIP: 142, p < 0.001).
Because the MT and LIP neurons were recorded in separate sessions (to ensure that stimulus location and direction were optimized for the receptive field of each neuron; see Methods), it is possible that inter-session differences in behavior or stimulus placement between MT and LIP recording sessions could have led to the observed latency differences. However, we found that these intersession differences were small and could not account for our findings (Supplementary Results).
For both animals the attentional modulation was greater in LIP than in MT (rank-sum test for difference in median AI, monkey M, p = 0.002; monkey B, p < 0.001). One potential concern is that the larger magnitude of attentional modulation in LIP could favor earlier detection of attentional modulation in LIP than in MT, even if the underlying latency were the same in the two populations. To control for this possibility we selected a subpopulation of LIP neurons in each monkey that had the same average magnitude of attentional modulation as the MT population. This magnitude-normalized LIP population was still modulated by attention ~60 ms earlier than the MT population, similar to the entire population (Fig. 6A,B). A more formal approach based on multiple regression produced a similar result. We defined the magnitude of attentional modulation for individual neurons as the difference in spike rate between two time windows: −400 to 0 ms relative to the cue switch and 400 to 800 ms after the cue switch. As expected, on a neuron-by-neuron basis there was a slight negative correlation between the magnitude of attentional modulation and the attentional latency, reaching statistical significance only for monkey M’s LIP population (r = −0.3, p = 0.05; all other p > 0.25). To test whether differences in the magnitude of attentional modulation on a neuron-by-neuron basis could explain the interareal difference in attentional latency, we regressed the single neuron attentional latencies against two variables, the cortical area (MT or LIP) and the magnitude of attentional modulation, either separately or together. Adding cortical area to a regression model containing only the magnitude of attentional modulation significantly increased the fraction of explained variance (Table 1, monkey M: p = 0.001, monkey B: p = 0.006, partial F test).
Furthermore, adding the magnitude of attentional modulation to a model containing cortical area alone had minimal impact on the regression coefficient or 95% confidence interval for the areal variable. This analysis suggests that differences in the magnitude of attentional modulation likely contributed only 5–10 ms of the observed 60 ms interareal difference.
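The nested-model comparison described above can be sketched without external libraries. It fits latency against modulation magnitude alone (reduced model) and against magnitude plus a 0/1 area indicator (full model), then forms the partial F statistic:

F = ((RSS_reduced − RSS_full) / q) / (RSS_full / (n − p_full))

Here q = 1 added predictor and p_full = 3 coefficients. This is a minimal illustration with hypothetical function names (ordinary least squares via the normal equations), not the authors' analysis code.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations.

    X: list of rows (include a leading 1.0 for the intercept); y: responses.
    Returns (coefficients, residual sum of squares).
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the p x p system A beta = b.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (b[c] - sum(A[c][k] * beta[k] for k in range(c + 1, p))) / A[c][c]
    rss = sum((yi - sum(bk * xk for bk, xk in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, rss


def partial_f_for_area(latency, is_lip, magnitude):
    """F statistic for adding the area term to a magnitude-only model.

    Returns (F, numerator df, denominator df, full-model coefficients), with
    coefficients ordered [intercept, area (LIP = 1), magnitude].
    """
    full = [[1.0, float(a), m] for a, m in zip(is_lip, magnitude)]
    reduced = [[1.0, m] for m in magnitude]
    beta_full, rss_full = ols_fit(full, latency)
    _, rss_reduced = ols_fit(reduced, latency)
    q = len(full[0]) - len(reduced[0])   # one added predictor
    df = len(latency) - len(full[0])     # residual df of the full model
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df)
    return f_stat, q, df, beta_full
```

The p-value would come from the upper tail of the F distribution with (q, df) degrees of freedom, for example via `scipy.stats.f.sf(f_stat, q, df)`. In the full model, the area coefficient estimates the interareal latency difference after accounting for modulation magnitude.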
Monkey B exhibited a small dip in neural activity at short latency after cued switches of attention both into and out of the receptive field (Figure 3B,D). We quantified the dip in activity for individual neurons as the difference in mean spike rate between two time windows, 0–50 ms and 125–175 ms after the cue switch on OUT-IN trials. These windows corresponded to a baseline epoch immediately after the cue switch and the trough of the LIP population dip in monkey B, respectively. Consistent with Figure 3, the dip in activity was far more evident in monkey B than in monkey M (median dip minus baseline activity in spikes/s, Wilcoxon signed-rank test; monkey B: LIP: −3.8, p < 0.001, MT: −4.5, p < 0.001; monkey M: LIP: 0.3, p = 0.35, MT: −1.7, p = 0.01). On average, neural responses tended to decrease over time following the start of dot-patch motion, perhaps due to spike-rate adaptation. This slight downward slope leads to an overestimate of the magnitude of the dip. To address this concern we performed a similar analysis comparing activity from 125–175 ms post-cue-switch to the same time epoch from the expected peri-switch spike-rate function calculated for detecting single unit attentional latencies (deviation-threshold method). Using this approach, which normalizes for the expected slope of the spike-rate function, monkey B’s dips remain while monkey M’s do not (monkey B: LIP: −2.3, p < 0.001, MT: −2.2, p < 0.001; monkey M: LIP: 1.4, p < 0.001, MT: −0.2, p = 0.68).
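The two dip measures and the slope bias they differ on can be shown with a worked sketch. It assumes peri-switch functions at 1-ms resolution with the cue switch at a known index. The function names are hypothetical, and the population statistics (median and signed-rank test) are left out.

```python
def window_mean(curve, switch_idx, start_ms, end_ms):
    """Mean of curve from start_ms to end_ms after the switch (1-ms bins)."""
    seg = curve[switch_idx + start_ms: switch_idx + end_ms]
    return sum(seg) / len(seg)


def dip_raw(actual, switch_idx):
    """Trough window (125-175 ms) minus early baseline window (0-50 ms)."""
    return (window_mean(actual, switch_idx, 125, 175)
            - window_mean(actual, switch_idx, 0, 50))


def dip_vs_expected(actual, expected, switch_idx):
    """Trough window of the actual function minus the same window of the expected one."""
    return (window_mean(actual, switch_idx, 125, 175)
            - window_mean(expected, switch_idx, 125, 175))
```

For a neuron whose rate declines steadily at 0.02 spikes/s per ms with no dip at all, dip_raw reports −2.5 spikes/s, because the two windows are 125 ms apart. dip_vs_expected reports 0 when the expected function captures the same decline. This is the overestimate that the expected-function comparison corrects.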
The origin of the dip is unclear. One possibility is that the dip could have resulted from eye movements within the fixation window occurring in response to the cue switch. However, elimination of trials with small eye movements within 500 ms after the cue switch did not alter the magnitude of the dip (Fig. 7B). Others have observed a similar dip in LIP and FEF and attributed it to the resetting of cumulative processes (Sato and Schall, 2001; Roitman and Shadlen, 2002). Alternatively, the dip may reflect the monkey’s attention being drawn to the fovea in response to the cue before shifting to the cued stimulus location (Busse et al., 2008).
We also considered whether the OUT-IN latency difference between MT and LIP in monkey B could be explained by some difference in the dip in activity between the two areas. For example, perhaps MT has a larger dip that occludes its attentional effect for longer. Several observations argue against this possibility. First, the peak magnitude of the dip was similar in the two areas, but LIP appeared to recover earlier than MT (Fig. 5B). Second, monkey M’s data were largely uncontaminated by the dip, yet exhibited the same robust timing difference (Fig. 5A). Third, eliminating from the population those cells with the largest peri-switch dips did not disrupt the interareal latency difference (Fig. 6C,D).
Neuronal responses in LIP are modulated by the planning and execution of eye movements (Barash et al., 1991a, b). Therefore, we considered the possibility that some difference in responses related to microsaccades could underlie the apparent difference in attentional latency between LIP and MT neurons. For example, the animals might reliably make microsaccades in response to the cue switch. We thus identified microsaccades in the raw eye movement traces, using previously described methods (Martinez-Conde et al., 2000). Contrary to our expectation, for monkey B the microsaccade rate decreased after cue switches, such that only 4% of OUT-IN switch trials had a detectable microsaccade within 500 ms after the cue switch (compared to 11% in the 500 ms before the cue switch). Eliminating these 4% of trials did not alter the post-switch physiology in either MT or LIP (Fig. 7B). In monkey M, noise in the eye position signal complicated detection of the smallest microsaccades. Using the same microsaccade criterion as for monkey B resulted in many false-positive microsaccade detections, as determined by visual inspection of eye movement traces, with 26% of OUT-IN switch trials being flagged as having a microsaccade within 500 ms after a cue switch. Even after using this conservative criterion to eliminate trials with potential microsaccades, there was no discernible effect on the onset of attentional modulation in either area (Fig. 7A).
LIP has been proposed to act as a two-dimensional map of visual salience that serves as a source of attentional modulation to visual cortex (for review see Goldberg et al., 2006). Anatomically, LIP is appropriately situated: it projects broadly to both ventral and dorsal stream visual areas, including robust reciprocal connections with area MT (Blatt et al., 1990; Baizer et al., 1991; Lewis and Van Essen, 2000). One prediction of this hypothesis is that during a shift of attention, activity in LIP must be updated before attentional modulation shifts in visual cortex. We tested this prediction using an attention-switch task that allowed us to measure the onset of attentional modulation in single neurons in LIP and MT. Indeed, attentional signals appeared ~60 ms earlier in LIP than in MT. These results illustrate a general approach by which differences in timing can help define hierarchical relationships between brain areas subserving cognitive functions (Miller and D'Esposito, 2005).
We would also expect attentional modulation to arise earlier in LIP than in MT at the beginning of each trial, after the onset of the dot stimuli. However, attentional modulation arose gradually over this period, presumably because the fixation-point and annuli-color cues were present before the dot stimuli appeared and because no speed-pulse test stimuli occurred in the first 400 ms after stimulus onset. Thus, a precise attentional latency could not be determined at the start of the trial.
One potential limitation of our study is that the LIP and MT recordings were made in separate sessions. We did this to ensure that the visual stimulus could be optimized for the receptive field location and preferred direction of motion of each neuron in the study. As a consequence, however, some of the inter-areal difference in attentional latency could in principle reflect differences between the LIP and MT recording sessions. To minimize the potential impact of behavioral drift across sessions, we performed the recordings in interleaved blocks of LIP and MT sessions. Furthermore, as shown in the Supplementary Results, differences in stimulus and behavior across sessions were small and, where present, could not account for the observed inter-areal difference in attentional latency.
Recently, Saalmann et al. also suggested that attentional signals flow from LIP to MT, based on their observation that LIP exhibits a slight phase lead (5–7 ms at ~35 Hz) in the spike train coherence between LIP and MT during sustained attention (Saalmann et al., 2007). The authors suggested that 5–7 ms is roughly consistent with the expected axonal transmission delays between the areas. Notably, this is markedly less than the ~60 ms difference in attentional latency we observed, and a delay of 60 ms does not seem consistent with a simple feedback mechanism limited by axonal propagation. It is possible that the mechanisms underlying the onset of attentional modulation in our study differ from those at play during sustained attention. However, the inter-areal coherence observed by Saalmann et al. was generally weak (only 10 of their 29 MT-LIP cell pairs exhibited significant coherence, even using a generous p < 0.1 cutoff), and the authors’ estimate of the phase relationship was presumably also noisy, although they did not quantify the reliability of their phase estimate for individual pairs. More generally, assigning a direction to the “flow” of activity is not straightforward for an oscillatory process. For example, their observed phase lead of 6 ms at 35 Hz (a period of ~29 ms) is equivalent to phase leads of 35 ms, 64 ms, etc., or to phase lags of 23 ms, 52 ms, etc. (i.e., the 6 ms lead plus or minus integer multiples of the period). Saalmann et al. did present a second analysis showing that MT spikes more frequently follow LIP spikes within 15 ms in the attentional than in the non-attentional conditions, but that analysis apparently did not account for the increase in the overall spike rate in the attentional conditions, which by itself would increase the number of such coincidences expected by chance.
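Both analytic points above can be made concrete with a short sketch. The first function enumerates the delays that a phase estimate cannot distinguish from a measured lead (lead plus or minus integer multiples of the period; note that 1000/35 ≈ 28.6 ms, which is rounded to 29 ms here as in the text). The second assumes, purely for illustration, that MT spikes follow an independent Poisson process and computes the probability that at least one MT spike falls within 15 ms after an LIP spike by chance alone; the firing rates used are hypothetical and are not values from either study.

```python
import math

def equivalent_delays(lead_ms, period_ms, n_cycles=2):
    """Delays a phase estimate cannot distinguish from lead_ms.

    For a periodic signal, a lead of d is indistinguishable from
    d + k * period for any integer k; negative values are lags.
    """
    return sorted(lead_ms + k * period_ms for k in range(-n_cycles, n_cycles + 1))

def p_follow_by_chance(rate_hz, window_ms):
    """Probability that an independent Poisson spike train with rate_hz
    places at least one spike in the window_ms after a reference spike."""
    expected_spikes = rate_hz * window_ms / 1000.0
    return 1.0 - math.exp(-expected_spikes)

print(equivalent_delays(6, 29))  # [-52, -23, 6, 35, 64]

# Hypothetical MT rates: identical independent process, only the rate differs.
for rate in (20.0, 30.0):
    print(f"{rate:.0f} Hz: P(MT spike within 15 ms) = {p_follow_by_chance(rate, 15.0):.3f}")
```

Even with no interaction between the areas, the higher rate yields a larger fraction of LIP spikes followed by an MT spike within 15 ms, which is why a coincidence analysis of this kind needs a rate-matched baseline.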
Several other areas of the brain have been implicated as sources of attentional modulation, including the frontal eye fields (FEF) (Moore and Armstrong, 2003; Moore and Fallah, 2004; Wardak et al., 2006) and the superior colliculus (SC) (McPeek and Keller, 2004; Muller et al., 2005). LIP, FEF and SC are an interconnected set of brain regions with multiple roles, including executing saccadic eye movements and covert shifts of attention. Microstimulation (Cutrell and Marrocco, 2002; Cavanaugh and Wurtz, 2004; Moore and Fallah, 2004; Muller et al., 2005) or inactivation (McPeek and Keller, 2004; Wardak et al., 2004; Wardak et al., 2006) of any of the three areas has behavioral effects consistent with the enhancement or disruption of attention, respectively. Furthermore, microstimulation of FEF produces effects in area V4 that mimic several key aspects of attentional modulation, including stimulus-dependent enhancement of visual responses and improved stimulus discriminability (Moore and Armstrong, 2003; Armstrong et al., 2006; Armstrong and Moore, 2007; Ekstrom et al., 2008). These effects may be carried directly by FEF projections to V4, through an intermediate area such as LIP or the pulvinar, or by a combination of the two. The hierarchical relationship between FEF and LIP is not presently clear. The pattern of anatomic connectivity suggests either lateral connectivity or frontal-to-parietal feedback (Felleman and Van Essen, 1991; Stanton et al., 1995). In favor of frontal-to-parietal feedback, stimulus selectivity may appear earlier in frontal than in parietal cortex in tasks requiring endogenous shifts of attention (Buschman and Miller, 2007; Grent-'t-Jong and Woldorff, 2007).
In contrast to these and our present results, a recent study by Khayat et al. described attentional signals arising at equivalent latencies after stimulus onset in FEF and V1 in a curve-tracing task (Khayat et al., 2009). Because portions of the FEF and V1 data sets were collected in separate experiments from different animals, their findings must be interpreted with caution. Nevertheless, their results are consistent with the hypothesis that the origin of attentional signals may depend on the behavioral demands of the task (Buschman and Miller, 2007; Khayat et al., 2009). A curve-tracing task, which requires high-resolution spatial information to identify the attentional target, may elicit earlier attentional modulation in a cortical area such as V1 that can supply that information. In contrast, in our task the fixation-point color cue is unlikely to have been discriminated in MT, an area with poor chromatic sensitivity (for review see Gegenfurtner and Kiper, 2003). Rather, the color-change cue signal likely arises first in other color-sensitive areas (V1, V2, V4, IT, etc.). One possibility is that attentional modulation occurred in LIP before MT in our experiment because of the inputs to LIP from visual areas in the temporal visual pathway, such as V4 and IT (Blatt et al., 1990; Lewis and Van Essen, 2000). In contrast, if the post-cue dip in spike rate for monkey B reflects an exogenously cued shift of attention to the fixation point (Busse et al., 2008), it is notable that there was no difference in the latency of the dip between MT and LIP. This may represent a difference between endogenous and exogenous attentional shifts in our data set.
Given the direct anatomic projection from LIP to MT (Lewis and Van Essen, 2000), it is perhaps surprising that the difference in attentional latencies in our data was as large as 60 ms. In contrast, the difference in median visual response latencies to the onset of the dim moving dots (median MT vs. LIP latency, rank-sum test; monkey M: 114 vs. 111 ms, p = 0.93; monkey B: 115 vs. 144 ms, p = 0.007) or to a brighter direction-mapping stimulus (monkey M: 67 vs. 83.5 ms, p = 0.007; monkey B: 58 vs. 77 ms, p = 0.003) was at most 20–30 ms (see Supplemental Results for details). There are several possible explanations. Additional delays may result from attentional signals being transmitted via intermediate areas such as the pulvinar (Maunsell and Van Essen, 1983a; Ungerleider et al., 1984; Baizer et al., 1993). Alternatively, both areas could receive attentional signals from a common source such as the FEF, and those signals could either take longer to reach MT or take longer to influence the local circuit in MT than in LIP. More generally, it may not be appropriate to conceptualize the spread of attentional signals by analogy to monosynaptic transmission. The spread of attentional signals may depend on establishing patterns of activity across larger networks of neurons, for which inter-areal transmission delays are only one factor. In support of this idea, a recent study demonstrated that sustained activity in LIP can be modeled by recurrent excitatory networks, perhaps flexibly spanning multiple cortical areas (Ganguli et al., 2008). Such a model is consistent with studies showing increased intra-areal and inter-areal coherence with attention (Fries et al., 2001; Saalmann et al., 2007; for review see Womelsdorf and Fries, 2007). Given the reciprocal nature of the connectivity, it is perhaps simplistic to describe any process as exclusively feedforward or feedback. For example, although attentional modulation began earlier in LIP in our study, the time courses in the two areas still overlapped to some extent.
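The comparison between visual and attentional latencies can be checked with simple arithmetic. The sketch below uses only the median latencies reported above and computes the LIP minus MT difference for each animal and stimulus; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# Median visual response latencies (ms) reported in the text, as (MT, LIP).
median_latency_ms = {
    ("monkey M", "dim moving dots"): (114.0, 111.0),
    ("monkey B", "dim moving dots"): (115.0, 144.0),
    ("monkey M", "direction-mapping stimulus"): (67.0, 83.5),
    ("monkey B", "direction-mapping stimulus"): (58.0, 77.0),
}
ATTENTIONAL_LATENCY_DIFF_MS = 60.0  # approximate LIP lead in attentional onset

# Positive values mean the LIP visual response is later than MT's.
visual_diff_ms = {key: lip - mt for key, (mt, lip) in median_latency_ms.items()}

for (monkey, stimulus), diff in visual_diff_ms.items():
    print(f"{monkey}, {stimulus}: LIP - MT = {diff:+.1f} ms")

max_visual_diff = max(abs(d) for d in visual_diff_ms.values())
print(f"largest visual latency difference: {max_visual_diff:.1f} ms "
      f"vs. ~{ATTENTIONAL_LATENCY_DIFF_MS:.0f} ms attentional lead in LIP")
```

The largest visual latency difference (29 ms, monkey B, dim moving dots) is roughly half the ~60 ms attentional latency difference, and in the visual case LIP is never earlier than MT by more than a few milliseconds.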
Aside from changes in the magnitude and timing of attentional effects, the overall time course of the attentional shift is remarkably similar between LIP and MT (Fig. 3). In both areas the onset of attentional modulation occurred earlier than the offset, although we were only able to quantify this effect for one of our animals. This is similar to observations made previously in V1 (Khayat et al., 2006), MT (Busse et al., 2008) and V4 (Motter, 1994), and strengthens the argument that this asymmetry is a general feature of the mechanisms governing the onset and offset of attention throughout visual cortex.
Several aspects of LIP activity are well suited to its hypothesized role as a source of attentional signals to visual cortex (for review see Goldberg et al., 2006). The response to visual stimuli is rapid (Bisley et al., 2004) and highly dependent on the salience of the stimulus (Gottlieb et al., 1998; Constantinidis and Steinmetz, 2001, 2005; Ipata et al., 2006). Spatial representations in LIP are also flexible: short-latency, spatially specific responses can occur in response to visual and auditory cues (Mazzoni et al., 1996) or predictively before impending saccades (Duhamel et al., 1992).
However, other studies have proposed alternative functions for LIP activity including representing internal decision variables (Platt and Glimcher, 1999; Roitman and Shadlen, 2002; Huk and Shadlen, 2005; Maimon and Assad, 2006a; Churchland et al., 2008), the log likelihood of reward (Yang and Shadlen, 2007), relative economic value (Dorris and Glimcher, 2004; Sugrue et al., 2004), hazard rate (Janssen and Shadlen, 2005) and time (Leon and Shadlen, 2003). We have assumed that the modulations we observed in LIP and MT reflect the same underlying process, but it is also possible that these modulations are fundamentally different. If so, the differences in timing that we observed may reflect independent and unrelated processes. Although we cannot rule out this possibility, a more parsimonious explanation is that these quite similar modulations reflect a common process. For example, in all of the LIP studies described above, the animal’s attention would presumably be drawn toward or away from the receptive field location of the neuron under study depending on the particular variable that is manipulated (e.g., reward, hazard rate, etc.). Indeed, attentional modulation in visual area V4 is known to reflect both the hazard rate (Ghose and Maunsell, 2002) and task difficulty (Spitzer et al., 1988; Boudreau et al., 2006) in a manner that is difficult to distinguish from many of these modulations observed in LIP (Maunsell, 2004).
Until we better understand the function of the activity in each of these areas, we must be cautious in interpreting our data and extending our conclusions to other paradigms. Nevertheless, using a modification of a common spatial attention paradigm we found that the modulation in LIP substantially leads that in MT. This is consistent with the proposed attentional signal in LIP driving the subsequent signal in MT, but not vice versa.
This work was supported by two grants from the National Eye Institute (EY-12106 to JAA and Vision Core Grant EY12196) and by the Medical Scientist Training Program to TMH (T32 GM07753-26).