We measured cortical activity with functional magnetic resonance imaging to probe the involvement of early visual cortex in visual short-term memory and visual attention. In four experimental tasks, human subjects viewed two visual stimuli separated by a variable delay period. The tasks placed differential demands on short-term memory and attention, but the stimuli were visually identical until after the delay period. Early visual cortex exhibited sustained responses throughout the delay when subjects performed attention-demanding tasks, but delay-period activity was not distinguishable from zero when subjects performed a task that required short-term memory. This dissociation reveals different computational mechanisms underlying the two processes.
Neurophysiological measurements (neuroimaging and single-unit electrophysiology) have shown that visual cortex does not merely serve as the first cortical site for the bottom–up sensory processing of visual information. Specifically, visual cortex can exhibit sustained activity even in the absence of a visual stimulus (Chawla, Rees, & Friston, 1999; Haenny, Maunsell, & Schiller, 1988; Kastner, Pinsk, De Weerd, Desimone, & Ungerleider, 1999; Luck, Chelazzi, Hillyard, & Desimone, 1997; Ress, Backus, & Heeger, 2000; Reynolds, Pasternak, & Desimone, 2000; Silver, Ress, & Heeger, 2007). Such activity is observed when subjects are required to maintain attention during a delay in anticipation of a visual target, implying that these early visual areas receive top–down attentional modulation from higher-level cortical areas. Alternatively, the sustained activity seen in the absence of a visual stimulus might be due to the rehearsal of a short-term memory representation of the anticipated target. Iconic memory operates on a short time scale, on the order of 250–500 ms, or at most 1500 ms (Averbach & Sperling, 1961; Landman, Spekreijse, & Lamme, 2003; Sperling, 1960). It remains an open question whether early visual areas can be recruited for tasks requiring memory for longer periods of time.
Psychophysical evidence predicts that visual short-term memory (a component of visual working memory) involves representations in early visual cortex. For some discrimination tasks, subjects can tolerate delay periods up to 30 s between the two stimulus intervals without any apparent decrement in performance; selective interference is produced by presenting task-irrelevant stimuli during the delay with properties related to those of the remembered stimulus, indicating that the memory is closely associated with encoding mechanisms (Bennett & Cortese, 1996; Magnussen & Greenlee, 1999; Magnussen, Greenlee, Asplund, & Dyrnes, 1991; Pasternak & Greenlee, 2005). Likewise, presenting a priming stimulus can produce an improvement in contrast detection thresholds for similar target stimuli presented up to 16 s later (Tanaka & Sagi, 1998a, 1998b, 2000). These effects are highly specific to stimulus attributes such as location, orientation, spatial frequency, and speed, suggesting memory at an early stage in visual cortex.
There is some physiological evidence as well for a neural correlate of short-term memory in early visual cortex, including V1 (Awh et al., 1999; Pasternak & Greenlee, 2005; Pessoa, Gutierrez, Bandettini, & Ungerleider, 2002; Postle, Awh, Jonides, Smith, & D’Esposito, 2004; Supèr, Spekreijse, & Lamme, 2001); however, it is not clear whether the contextual modulation reported by these groups was due to short-term memory per se, attention, or some other cognitive process required for the task (see Section 4). It has been hypothesized that attention and short-term visual memory are linked in terms of their underlying neural mechanisms (see Kastner & Ungerleider, 2000, for a review, and see Section 4 for further references); however, previous studies have not directly compared memory with attention.
We describe here a series of experiments that probed the involvement of early visual cortex in visual short-term memory and visual attention. All of the experimental tasks consisted of two visual stimuli presented at the same spatial location and separated by a variable delay period, but differed in their dependence on attention and short-term memory. The experiments had three key features. (1) Variable delay periods were used, to encourage subjects to maintain short-term memory, spatial attention, or both. The use of variable delays also allowed us to disambiguate the fMRI responses to the visual stimuli from those recorded during the delay (during which no visual stimulus was present). (2) The tasks were designed to place differential demands on short-term memory and attention, but the stimuli were visually identical until after the delay period. This guaranteed that any difference in cortical response during the delay period was not due to different bottom-up sensory processing. (3) The tasks were performed at psychophysical threshold, to encourage subjects to use a strategy that relied on the most sensitive visual mechanisms to perform the tasks.
Our results revealed a dissociation between the activity in visual cortex during maintenance of attention and short-term memory. Much of early visual cortex showed sustained activity throughout the delay when the subjects performed attention-demanding tasks, but not when subjects performed a task that required short-term memory. This dissociation was most robust in V2, V3, LO1 and LO2, but was also found in V1, hV4 and V3A/B. Such a dissociation reveals different computational mechanisms underlying the two processes of attention and short-term memory, and implies that sustained delay-period activity in early visual cortex plays a role in endogenous spatial selection.
Five experienced subjects participated in this study with written consent; one additional experienced subject participated in a psychophysical control experiment, also with written consent. Procedures were in compliance with the safety guidelines for MRI research and approved by the New York University Committee on Activities Involving Human Subjects. Each of the five subjects who participated in the full study was scanned in 7–9 sessions: one session to obtain a high-resolution anatomical volume; one session to measure the retinotopic maps in visual cortex; one session to measure the hemodynamic impulse response function (HRF) in visual cortex; and 4–6 sessions to measure fMRI responses in the delayed-comparison, detection and discrimination tasks.
Stimuli were presented on an LCD flat panel display. Subjects were supine and viewed the display through an angled mirror in the bore of the magnet. Stimuli were sinusoidal gratings presented in an annulus around fixation (inner radius 1 deg, outer radius 3 deg), and were presented briefly (120–200 ms) to minimize light adaptation. Subjects were instructed to fixate a small target (square, 0.33 deg) at the center of the display that was presented continuously to encourage stable eye position. Subjects performed four different tasks, each in a separate scanning session; some subjects needed an extra session to complete some of the tasks. Subjects completed 8–16 runs, for a total of approximately 100 trials, per task condition.
For all tasks, a high-contrast target (40% contrast; spatial frequency randomized from 1–3 cpd; orientation and phase fully randomized) was presented briefly (200 ms), followed by a variable delay period (1–16 s, uniform distribution). The delay period was randomized so that subjects could not anticipate when each trial would end; the randomization was uniform so that there were enough long-duration delays to disambiguate the fMRI response during the delay period from the response at the beginning and end of the trials. Subjects received feedback after each response. Each trial was followed by a long (16 s) inter-trial interval to allow the hemodynamics to return mostly to baseline. The four tasks differed as follows.
In the delayed-comparison task, following the presentation of the first visual target stimulus and the variable delay period, a second high-contrast target was briefly presented (200 ms) that differed slightly in both spatial frequency and orientation from the initial target. Subjects were then cued to compare either the spatial frequency or the orientation of the stimuli, and responded with a button press. The spatial frequency and orientation differences between targets were determined for each subject individually by extensive behavioral testing (several hours across several days) prior to imaging to be at threshold (80% correct). During the scanning sessions, thresholds were maintained by a staircase procedure. To discourage the subjects from adopting a verbal strategy and encourage them to adopt a visual memory strategy, the spatial frequency, orientation, and phase of the first stimulus were substantially jittered (spatial frequency was randomized from 1–3 cpd, and orientation and phase were fully randomized) (Lages & Treisman, 1998). Furthermore, subjects did not know until after having seen both stimuli whether they would be asked to compare spatial frequency or orientation. Magnussen and colleagues found that these comparisons can be performed without detriment at delays beyond 16 s (Magnussen & Greenlee, 1992, 1999; Magnussen et al., 1991; Magnussen, Greenlee, & Thomas, 1996). Behavioral testing on our subjects prior to scanning confirmed that performance was ~80% correct and was stable across delay durations.
In the detection task, following the presentation of the first visual target stimulus and the variable delay period, a low-contrast target was displayed briefly (120–170 ms) on half of the trials. Display duration and contrast were determined for each subject individually by extensive practice, prior to imaging, to be at threshold (80% correct), and performance level was maintained in the scanner by a staircase procedure. The low-contrast target’s orientation and phase were fully randomized. Spatial frequency was randomized but restricted to range from 2.5 to 3.5 cpd; spatial frequencies within this range are roughly equally detectable (Campbell & Robson, 1968). A final cue then signaled the subject to respond via a button press as to whether the target had been shown. Note that for this task, the initial stimulus did not need to be remembered, but instead merely signaled that the trial had started, thereby cueing the subject to attend to the spatial location defined by the annulus. A variant of this task (the cued-detection task), in which the cue stimulus constrained the parameters of the low-contrast target, is described below.
To confirm that attention contributed to the detection task, we performed a psychophysical dual-task control experiment in separate experimental sessions outside the scanner. Four subjects (three from the main study, and one additional) performed the detection task along with a rapid serial visual presentation (RSVP) task in the periphery (5 deg eccentricity) (Braun, 1998; Lee, Itti, Koch, & Braun, 1999; Lee, Koch, & Braun, 1997; Sperling & Melchner, 1978). For the RSVP task, the letter R was presented repeatedly at random orientations (rate of presentation was about 4 Hz, adjusted individually for each subject to control task difficulty). Occasionally (on average every 2 s), a mirror-reversed R was shown (at a random orientation), and subjects indicated by keypress if they saw the target. Subjects were instructed to focus primarily on the grating-detection task for two blocks of trials (50 trials per block), and on the RSVP task for two blocks. When attention was diverted by the RSVP task, performance on the detection task was significantly worse (Table 1).
The cued-detection task was identical to the detection task, except that the low-contrast target was constrained to have the same orientation (but not spatial frequency) as the initial high-contrast stimulus. To control detectability, the spatial frequency of the low-contrast stimulus had to be more tightly constrained than that of the high-contrast stimulus, as described above.
In the discrimination task, following the presentation of the first visual target stimulus and the variable delay period, a low-contrast target was displayed briefly (120–170 ms); half the time it was oriented slightly counter-clockwise of vertical and half the time it was oriented slightly clockwise of vertical. Contrast of the low-contrast target stimulus was set individually to each subject’s threshold as estimated in the detection task; display duration, spatial frequency range, and phase randomization were the same as in the detection task. Offset from vertical was determined individually for each subject by extensive practice prior to imaging to be at threshold (80% correct), and performance level was maintained by a staircase procedure. A final cue signaled the subject to respond via a button press whether the target was oriented to the left or to the right. For this task, as in the detection task, the initial stimulus merely signaled that the trial had started.
We used magnetic resonance imaging at 3T (Allegra, Siemens, Erlangen, Germany) to measure blood-oxygen level-dependent (BOLD) changes in cortical activity. During each fMRI scan, a time series of volumes was acquired using a T2*-sensitive EPI pulse sequence (repetition time 2000 ms, echo time 30 ms, flip angle 82 deg, 3 × 3 × 3 mm³ voxels, field of view 192 × 192 × 108 mm³, 36 slices oriented to cover the entire cerebral cortex). We acquired images using custom radio-frequency coils (NM-011 transmit head coil and NMSC-021 four-channel phased-array receive coil, NOVA Medical, Wakefield, MA).
To minimize head motion, subjects were stabilized by use of a bite bar, foam padding, or both. Post-hoc image registration was used to correct for residual motion in the functional data (Jenkinson, Bannister, Brady, & Smith, 2002). Data from the first 10 s of each fMRI scan, during which subjects maintained fixation, were discarded to minimize transient effects of magnetic saturation and to allow the hemodynamics to achieve steady state. Further preprocessing of the fMRI data was as follows. First, we band-pass filtered the time series at each voxel to compensate for the slow drift typical in fMRI measurements (Biswal, Hudetz, Yetkin, Haughton, & Hyde, 1997; Biswal, Yetkin, Haughton, & Hyde, 1995; Biswal, Van Kylen, & Hyde, 1997; Purdon & Weisskoff, 1998; Smith et al., 1999; Zarahn, Aguirre, & D’Esposito, 1997). The cutoff frequencies were 0.0125 and 0.125 Hz, making the low-frequency cutoff much lower than the inverse of the longest delay period to ensure that the filter did not attenuate the sustained delay-period activity. Next, we divided each voxel’s time series by its mean intensity to convert the data from arbitrary image intensity units to percent signal modulation (and to compensate for the decrease in mean image intensity with distance from the receive coil). The resulting time series were averaged over a region of cortical gray matter corresponding to each of several visual cortical areas. Methods for defining the visual cortical areas are outlined below.
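The two preprocessing steps described above (band-pass filtering and conversion to percent signal modulation) can be sketched for a single voxel as follows. This is a minimal illustration, not the authors' code: the filter type is not stated in the text, so an ideal (brick-wall) Fourier filter is assumed, and the function names `bandpass_fft` and `preprocess_voxel` are hypothetical. Because band-pass filtering removes the mean, the sketch divides by the mean of the unfiltered series, which is also an assumption.

```python
import numpy as np

def bandpass_fft(ts, tr, low_hz, high_hz):
    """Ideal band-pass: zero all Fourier components outside [low_hz, high_hz].

    Assumption: the source does not specify the filter shape.
    """
    freqs = np.fft.rfftfreq(len(ts), d=tr)
    spectrum = np.fft.rfft(ts)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep] = 0.0
    return np.fft.irfft(spectrum, n=len(ts))

def preprocess_voxel(ts, tr=2.0, low_hz=0.0125, high_hz=0.125):
    """Band-pass filter one voxel's time series and express it in percent.

    The cutoffs and TR default to the values given in the text.
    """
    ts = np.asarray(ts, dtype=float)
    mean_intensity = ts.mean()            # mean of the raw (unfiltered) series
    filtered = bandpass_fft(ts, tr, low_hz, high_hz)
    return 100.0 * filtered / mean_intensity
```

With a 240 s scan sampled every 2 s, a 0.05 Hz component passes through unchanged while slow drift below 0.0125 Hz is removed.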
We then performed two complementary analyses of data to estimate: (i) the fMRI response time course for each trial type, and (ii) the response amplitudes for each of the three intervals of the behavioral task (first target presentation, delay period, second target presentation/behavioral response). The data were analyzed separately for each of the predefined retinotopic cortical areas.
We used standard event-related analysis methods (Burock, Buckner, Woldorff, Rosen, & Dale, 1998; Burock & Dale, 2000; Dale, 1999) to estimate the fMRI response time course for each trial type. For each task, we binned the trials into four trial-type bins, depending on the delay-period duration (1–4 s, 4–8 s, 8–12 s, 12–16 s). We calculated the fMRI responses from each scan by trial-triggered averaging, specifically by averaging throughout the region of cortical gray matter corresponding to each visual cortical area, and averaging across trials within each delay-period bin. The fMRI responses were averaged, separately for each subject, across the 8–16 runs of each task condition. The mean responses were plotted with error bars representing the standard error of the mean (SEM) across these 8–16 repeats (i.e., with N = number of trials in each delay-period bin). Fig. 2B shows example fMRI responses from V3 (in two subjects) for trials with different delay-period durations.
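The binning and trial-triggered averaging described above can be sketched as follows. This is a simplified illustration under stated assumptions: the 40 s averaging window and the function name `trial_triggered_average` are hypothetical, a delay falling exactly on a bin edge is assigned to the longer bin, and the SEM here is computed across trials within a bin rather than across runs.

```python
import numpy as np

# Interior edges of the four delay-period bins (1-4, 4-8, 8-12, 12-16 s).
DELAY_BIN_EDGES_S = (4.0, 8.0, 12.0)

def trial_triggered_average(ts, onsets_s, delays_s, tr=2.0, window_s=40.0):
    """Average ROI time-series epochs, time-locked to trial onset, per delay bin.

    Returns {bin_index: (mean, sem, n_trials)} for bins that contain trials.
    window_s is an assumed epoch length, not a value from the source.
    """
    ts = np.asarray(ts, dtype=float)
    n_win = int(round(window_s / tr))
    bin_index = np.digitize(delays_s, DELAY_BIN_EDGES_S)  # 0..3
    epochs = {b: [] for b in range(4)}
    for onset, b in zip(onsets_s, bin_index):
        start = int(round(onset / tr))
        if start + n_win <= len(ts):       # drop epochs that run past the scan
            epochs[int(b)].append(ts[start:start + n_win])
    result = {}
    for b, rows in epochs.items():
        if not rows:
            continue
        stack = np.vstack(rows)
        n = stack.shape[0]
        if n > 1:
            sem = stack.std(axis=0, ddof=1) / np.sqrt(n)
        else:
            sem = np.full(n_win, np.nan)   # SEM undefined for a single trial
        result[b] = (stack.mean(axis=0), sem, n)
    return result
```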
We used multiple linear regression to estimate the amplitudes of the responses to each of the three distinct intervals of each task. This was done by constructing a linear model of the underlying neural activity for each 240 s scan (Fig. 2A). Each trial was modeled with three components: (1) a transient at the beginning of each trial (an impulse at t = 0) that reflected the onset of the first visual target stimulus (labeled s1), (2) a sustained component that had a constant amplitude and lasted throughout the delay period (labeled d; duration = 1–16 s) and (3) a transient component at the end of each trial, which captured the presentation of the second visual target stimulus as well as the subject’s behavioral response (labeled s2). These three components (s1, d, s2) served as the model of the underlying neural activity, which was then convolved with a model of the hemodynamic impulse response function (HRF) and band-pass filtered in the same way as the measured fMRI data to yield a design matrix with three predictors of the fMRI measurements. The HRF was measured separately for each subject (see below). Linear regression was used to estimate the three response amplitudes that, when multiplied by the three predictors, yielded the best (least-squares) fit to the data. We fit the model to the data from all scans for each experimental condition, separately for each subject and each visual area, to obtain estimates for the three response amplitudes (s1, d, s2). We measured the goodness of fit of the model to the trial-triggered average of the data by calculating r² = 1 − (variance of the residuals/variance of the data), where residuals = (model − data).
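The three-component regression model described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.1 s fine time grid, and the choice to start the delay component at trial onset are assumptions. For brevity the sketch omits the band-pass filtering of the predictors, and it computes r² on the full time series rather than on the trial-triggered average.

```python
import numpy as np

def build_design_matrix(onsets_s, delays_s, hrf, n_vols, tr=2.0, dt=0.1):
    """Return an (n_vols, 3) matrix of predictors for s1, d and s2.

    hrf is the hemodynamic impulse response sampled every dt seconds.
    dt (the fine time grid) is an assumed value.
    """
    n_fine = int(round(n_vols * tr / dt))
    neural = np.zeros((n_fine, 3))
    for onset, delay in zip(onsets_s, delays_s):
        i0 = int(round(onset / dt))
        i1 = int(round((onset + delay) / dt))
        if i1 >= n_fine:
            continue                    # skip trials that run past the scan
        neural[i0, 0] += 1.0 / dt       # s1: unit-area impulse at trial onset
        neural[i0:i1, 1] += 1.0         # d: constant amplitude during the delay
        neural[i1, 2] += 1.0 / dt       # s2: unit-area impulse at trial end
    # Convolve each neural component with the HRF, then sample once per TR.
    predictors = np.column_stack(
        [np.convolve(neural[:, k], hrf)[:n_fine] * dt for k in range(3)]
    )
    step = int(round(tr / dt))
    return predictors[::step]

def fit_amplitudes(design, data):
    """Least-squares estimates of (s1, d, s2) and the goodness of fit r^2."""
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    residuals = design @ beta - data
    r2 = 1.0 - residuals.var() / data.var()
    return beta, r2
```

Because the delay durations vary across trials, the three predictors are not collinear, which is what allows the delay-period amplitude to be separated from the two transients.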
We performed two complementary analyses to confirm that the results were robust. First, a t-test on the delay-period amplitude (d) across subjects was performed to test for difference from zero (Table 3). Second, we computed an index of delay-period activity (d/s1). This analysis was important because the absolute value of the response amplitude (d) was arbitrary (depending, for example, on the choice of normalization for the HRF), and because we were interested in the delay-period activity relative to the stimulus-evoked activity. We performed paired t-tests across subjects on these ratios across experimental conditions (i.e., pairing two of the four tasks), to determine if the different tasks produced different levels of delay-period activity (Fig. 5).
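The two statistical checks described above can be sketched as follows. The helper names are hypothetical, and the sketch returns only the t statistic and degrees of freedom; converting these to p-values requires the t distribution (for example from a statistics library), which is left out here.

```python
import numpy as np

def delay_period_index(d, s1):
    """Delay-period amplitude relative to the stimulus-evoked amplitude (d/s1)."""
    return np.asarray(d, dtype=float) / np.asarray(s1, dtype=float)

def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for a test of mean(values) == popmean."""
    x = np.asarray(values, dtype=float)
    n = x.size
    t_stat = (x.mean() - popmean) / (x.std(ddof=1) / np.sqrt(n))
    return t_stat, n - 1

def paired_t(a, b):
    """Paired t-test across subjects: a one-sample test on the differences."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return one_sample_t(diff)
```

In this scheme, `one_sample_t` would be applied to the per-subject values of d for one task and visual area, and `paired_t` to the per-subject indices from two tasks.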
The hemodynamic impulse response function was measured for each subject in a separate experiment (Fig. 3). In these “HRF scans”, target stimuli (identical to those presented as the first visual targets in the main experiments) were presented briefly (200 ms) every 15–19.5 s (onset was jittered to allow for subsequent analysis). During every scan, subjects performed a task at fixation to control their attention. The white fixation point dimmed with random timing, on average every 5 s, and subjects were instructed to press a button when they detected the dimming. Pulse sequence parameters were the same as for the main experiment, except that a TR of 1.5 s was used, and we placed 27 slices covering all of visual cortex. Data were preprocessed as described above, except that the cutoff frequencies for the band-pass filter were 0.0167 and 0.167 Hz; this small difference was due to the different TR. The data from each subject were modeled by a difference of two gamma functions (Glover, 1999; as implemented by the SPM software package, www.fil.ion.ucl.ac.uk/spm/). We calculated a nonlinear least-squares fit of the model to the data to estimate five parameters: amplitude of neural response, delay of response, delay of undershoot, dispersion of response, and dispersion of undershoot. We measured the goodness of fit of the model to the mean response evoked by the stimulus (technically, the mean stimulus-evoked response was computed as a deconvolution of the response timecourse (Dale, 1999)). The HRF measurements were combined across V1, V2 and V3, and a single HRF per subject was used to analyze the data from all visual areas in the main experiments. The best-fit parameters and r² (goodness of fit) values for each subject are listed in Table 2. We also measured and fit the HRF in each visual area separately for each subject, and reanalyzed the data using separate HRFs for each visual area, with similar results.
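A difference-of-two-gammas HRF with the five parameters named above can be sketched as follows. This is an illustrative parameterization, not a reproduction of the SPM routine: each gamma density is assumed to have shape = delay/dispersion and scale = dispersion, the undershoot is scaled by a fixed `undershoot_ratio` (an assumption, since the text does not list it among the fitted parameters), and the default values are placeholders rather than the fitted values in Table 2.

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale):
    """Gamma probability density, evaluated in log space for stability."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    log_pdf = ((shape - 1.0) * np.log(t[pos]) - t[pos] / scale
               - math.lgamma(shape) - shape * math.log(scale))
    out[pos] = np.exp(log_pdf)
    return out

def double_gamma_hrf(t, amplitude=1.0, delay=6.0, undershoot_delay=16.0,
                     dispersion=1.0, undershoot_dispersion=1.0,
                     undershoot_ratio=1.0 / 6.0):
    """Difference of two gamma densities: a positive lobe minus an undershoot.

    All default parameter values are assumptions for illustration.
    """
    peak = gamma_pdf(t, delay / dispersion, dispersion)
    undershoot = gamma_pdf(t, undershoot_delay / undershoot_dispersion,
                           undershoot_dispersion)
    return amplitude * (peak - undershoot_ratio * undershoot)
```

A nonlinear least-squares routine would then adjust the five parameters so that this curve matches the deconvolved stimulus-evoked response.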
A high-resolution anatomical volume was acquired of each subject’s brain using a T1-weighted, 3D-MPRAGE pulse sequence (1 × 1 × 1 mm³ voxels). These anatomical volumes were used to: (1) register the functional data across scanning sessions, (2) restrict the functional data analysis to gray-matter voxels, and (3) computationally flatten the gray matter to create flattened visualizations of the cortical activity (flat maps). Gray and white matter were segmented, and flat maps were computed using custom software (Larsson, 2001).
Each functional fMRI session began by acquiring a set of anatomical images in the same slices as the functional images (T1-weighted, MPRAGE pulse sequence). An image registration algorithm (Nestares & Heeger, 2000) was used to align these in-plane anatomical images to the high-resolution anatomical volume of the observer’s brain (see above), so that the data from a given subject were co-registered across scanning sessions.
Retinotopic visual areas (V1, V2, V3, hV4, VO1, V3A/B, V7, LO1, LO2) were defined using standard methods (DeYoe et al., 1996; Engel, Glover, & Wandell, 1997; Engel et al., 1994; Larsson & Heeger, 2006; Sereno et al., 1995). Briefly, to estimate the angular component of the retinotopic maps, subjects viewed a 45° wedge of flickering checkerboard that rotated slowly (24 s period, clockwise for half of the runs and counterclockwise for the other half) about a central fixation point. Analogously, to estimate the radial component of the retinotopic maps, subjects viewed an annulus of flickering checkerboard that moved radially (24 s period, 25% duty cycle, expanding for half of the runs and contracting for the other half). These stimuli evoked traveling waves of activity across each of the retinotopic areas. We averaged the response timecourses across repeated runs (time-reversing and time-shifting the responses to the counterclockwise and contracting stimuli), fit the average time series with a sinusoid, and calculated the response phase, separately for each voxel. The phase values measured temporal delay of the fMRI responses relative to the beginning of the experimental cycle and, therefore, corresponded to the polar (or radial) component of the topographic map. Visual areas were identified by visualizing the response phases on flat maps, following the conventions of Larsson and Heeger (2006).
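The per-voxel sinusoid fit described above can be sketched as follows. This is a minimal illustration with a hypothetical function name; the TR of the retinotopy scans is not given in this section, so it is passed in as a parameter. The phase convention is that the fitted response is amplitude · cos(ωt − phase), so a larger phase means a later response within the 24 s cycle.

```python
import numpy as np

def fit_sinusoid(ts, tr, period_s=24.0):
    """Least-squares fit of a sinusoid at the stimulus frequency to one voxel.

    Returns (amplitude, phase in radians within [0, 2*pi), correlation r).
    """
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()
    t = np.arange(len(ts)) * tr
    omega = 2.0 * np.pi / period_s
    basis = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
    (a, b), *_ = np.linalg.lstsq(basis, ts, rcond=None)
    fit = basis @ np.array([a, b])
    amplitude = np.hypot(a, b)
    # a*cos(wt) + b*sin(wt) == amplitude*cos(wt - phase) with phase = atan2(b, a)
    phase = np.arctan2(b, a) % (2.0 * np.pi)
    r = np.corrcoef(ts, fit)[0, 1]
    return amplitude, phase, r
```

The temporal delay in seconds is phase · period_s / (2π), which maps onto polar angle for the wedge stimulus and eccentricity for the ring stimulus.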
Visual area ROIs were restricted to the portion of cortex responsive to the portion of the visual field in which the stimuli were presented. In a separate block-alternation experiment, subjects maintained central fixation while an annulus of flickering checkerboard alternated with a uniform gray screen (12 blocks, 20 s each). These localizers were run at the beginning and end of each scanning session. Data were analyzed by averaging across the repeated scans, and fitting a sinusoid to the time series separately for each voxel. ROIs were restricted by choosing a correlation threshold (r > 0.35) and phase window (2 radians, shifted for each subject to maximize ROI size by compensating for the latency of the hemodynamics) to select contiguous regions of gray matter that responded strongly to the block alternation. Average ROI sizes are listed in Table 3. An additional control ROI was defined corresponding to the periphery of V1 beyond the outer radius of the annulus.
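The voxel-selection rule described above (correlation threshold plus phase window) can be sketched as follows. The function name and the `phase_center` argument are hypothetical, and the 2 radian window is interpreted here as a total width of 2 radians centered on a per-subject phase, which is an assumption. The sketch does not enforce spatial contiguity of the selected voxels.

```python
import numpy as np

def select_roi_voxels(r, phase, phase_center, r_min=0.35, window_rad=2.0):
    """Boolean mask of voxels passing the correlation and phase criteria.

    r and phase are per-voxel arrays from a sinusoid fit to the localizer data.
    phase_center is the per-subject center of the phase window (assumed input).
    """
    r = np.asarray(r, dtype=float)
    phase = np.asarray(phase, dtype=float)
    # Wrap the phase difference into (-pi, pi] so the window works across 0/2*pi.
    dphi = np.angle(np.exp(1j * (phase - phase_center)))
    return (r > r_min) & (np.abs(dphi) <= window_rad / 2.0)
```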
Subjects performed a visual detection task in which the target stimulus was a low-contrast sinusoidal grating displayed in an annulus around fixation (Fig. 1B; see Section 2). An initial high-contrast grating stimulus indicated the beginning of each trial, and was followed by a variable delay period (1–16 s) during which the subject had to maintain spatial attention to detect a low-contrast grating that was presented following the delay period on half of the trials. If early visual cortical areas were involved in spatial selection by boosting the activity of neurons whose receptive fields overlapped with the target stimulus annulus, then activity in those cortical areas would be expected to exhibit spatially selective, sustained activity during the delay period.
Early retinotopic visual cortical areas (including V1, V2 and V3) showed sustained activity during the delay period, replicating previous reports (Kastner et al., 1999; Silver et al., 2007). Example time courses are plotted in Fig. 2; these V3 responses were selected as illustrative because they exhibited the same amount of delay-period activity as quantified by a delay-period index (see below) but with different HRFs for each of the two subjects (see Fig. 3). Trials were sorted into four bins based on length of delay period. For short trials, the response to the initial high-contrast stimulus (which for this task merely indicated the start of the trial) cannot be distinguished from the delay period or the detection period, due to the sluggishness of the hemodynamics. However, for long delay periods, the different phases of the trial can be distinguished. The activity peaked in response to the visual stimulus, and then fell, but did not return to baseline. Rather, it was sustained throughout the delay period and rose again at the end of the trial in response to the low-contrast target, the visual cue indicating the trial’s end, the subject’s behavioral response, and/or the end of the trial itself (as was seen also in the periphery, see below for details). The curve dips closer to baseline for Subject 2 than for Subject 1. This, however, is due to the different shape of the hemodynamic impulse response function (HRF) for the different subjects (Fig. 3), rather than to a difference in delay-period activity, as revealed by the quantitative analysis described below. This example underscores the necessity of independently measuring the HRF for each subject.
To confirm that the sustained activity was spatially selective and related to task demands, we measured activity in subregions of V1 corresponding to the periphery of the visual field outside the stimulus annulus. These peripheral regions did not respond during the delay periods, indicating that the sustained response was not due to global arousal (compare Fig. 4B and C). Instead, these peripheral regions responded transiently only at the end of the trials. Similar transient responses in the periphery have been reported previously (Jack, Shulman, Snyder, McAvoy, & Corbetta, 2006; Shulman et al., 2003; Silver et al., 2007), and might be interpreted as a release from inhibition in areas not involved in the visual task.
It is unlikely that the sustained activity was due to eye movements. We used experienced subjects and the stimuli were presented in an annulus surrounding fixation. There is ample evidence that experienced subjects are able to maintain fixation to within 10–20 min of arc (Kowler, 1990, 1991). Furthermore, the results of a previously reported control experiment (Ress et al., 2000) demonstrated that the best performance was obtained when the eyes were held steady on the fixation point. In this control experiment, the delay-period duration was fixed at 1 s, the stimulus (3° inner radius, 6° outer radius, plaid made of two orthogonal 0.5 cycle/deg component gratings, 0.75 s duration, 4 Hz contrast-reversing) was slightly different from that used in the current detection experiment, and subjects were instructed either to hold central fixation or to move their eyes to the target annulus on each trial.
To quantify the sustained delay-period activity, we estimated the amplitude of the responses in each visual area to the three task components (first visual target stimulus (s1), delay period (d), second visual target stimulus (s2)) using linear regression (see Section 2). We modeled the response using transients for the first and second visual stimuli and a constant-amplitude component for the delay period, convolved with the measured HRF for each subject (Fig. 2A). Our three-component model fit the data well. Supplementary Table 1 lists the r² values separately for each visual area and each subject.
Delay-period amplitudes (d) were statistically significant in 7 out of 9 ROIs (Table 3). We estimated response amplitudes for the three task components for each subject across all delay periods, and performed a t-test across subjects to determine if the delay-period estimates were significantly different from zero. See Table 3 for the results, combined across subjects, in all nine visual area ROIs and all four experimental conditions. Results for individual subjects are in Supplementary Table 2.
We performed a complementary analysis that quantified the delay-period activity relative to the responses evoked by the visual stimuli (Fig. 5; see Section 2) because the absolute values of the model estimates are not informative (they depend on arbitrary scale factors such as the normalization for the HRF). A delay-period index of 0 would have meant that there was no sustained response during the delay period, and an index of 1 would have meant that the sustained delay-period responses were equal in amplitude to the visually-evoked responses. The average measured delay-period index (DPI) across subjects was 21% in V1, 30% in V2, and 24% in V3. See Supplementary Table 1 for DPI values from each individual subject and visual area.
Task design was similar to the detection task, with the following differences (Fig. 1A; see Section 2). The first visual target stimulus was informative and needed to be remembered. After the variable delay period, a second high-contrast visual target stimulus was displayed, with a slightly different orientation and spatial frequency from the first. Subjects were then cued to compare either the spatial frequency or the orientation of the two visual target stimuli. The spatial frequency, orientation and phase of the first visual stimulus were randomized so as to make it very difficult to solve the task using a verbal strategy rather than a visual one. If early visual cortical areas were involved in maintaining a short-term memory representation of the first visual target, then activity in those cortical areas would be expected to exhibit spatially selective, sustained activity during the delay period.
In contrast to the findings for the detection task, we did not find evidence for sustained activity during the delay period in subregions of early visual cortex that corresponded retinotopically to the stimulus annulus (Fig. 4A). Similar results were found for all measured visual areas (Fig. 5). We quantified the delay-period activity in the same way as for the detection task, and found that the amplitude estimates were not significantly different from zero (Fig. 5, Table 3, and Supplementary Tables 1 and 2). Note that the two tasks were visually identical until after the delay period when the second visual target was presented, so that any difference in neural response must have been due to top-down task demands rather than purely visual stimulus-driven factors. The dissociation seen during the delay period confirmed that the delay-period activity found in the detection task could not be interpreted as trivially resulting from a sustained visual response due to the initial target stimulus. It also adds to our confidence that the sustained activity seen for the detection task was not due to eye movements, nor to general (non-spatially selective) arousal or vigilance. Finally, it highlights the role of uncertainty; when there was no uncertainty as to when or where the (high-contrast) target was shown, no delay-period activity was recorded. Importantly, we did find evidence for sustained delay-period activity in other brain areas during the delayed-comparison task, including in pre-frontal, parietal and occipital cortex; we will report these results in more detail elsewhere.
It is possible that early visual cortex did indeed sustain activity during the delayed-comparison memory task, but that the signal was too small to be recorded by fMRI. This might be expected given that, for the delayed-comparison task, only those neural channels that were tuned to (or near) the orientation and spatial frequency of the initial visual target stimulus would have been needed to remember the stimulus image and successfully complete the task. This is in contrast to the detection task, in which the low-contrast target stimulus could appear at any orientation, so that all orientation channels were needed. To test this possible interpretation, we designed two control experiments. Like the delayed-comparison task, both control tasks depended on a smaller subpopulation of neurons; like the detection task, both control tasks used low-contrast target stimuli requiring the maintenance of spatial attention. The first control was a cued-detection task in which the low-contrast target stimulus was constrained to have the same orientation as the (now informative) high-contrast cue stimulus (Fig. 1C; see Section 2). This task was still a detection task requiring the allocation of attention and its maintenance across a delay, but it was similar to the delayed-comparison task in that those neurons that were tuned to the first stimulus were most relevant for detecting the target. Previous work has shown that presenting similar cue stimuli can improve contrast-detection thresholds for target stimuli presented up to 16 s later (Tanaka & Sagi, 1998a, 1998b, 2000), indicating that subjects are capable of using the orientation information available in the cue to improve task performance. A number of visual cortical areas showed sustained delay-period activity for this task (Fig. 5, Table 3 and Supplementary Table 2), suggesting that the lack of activity found in the delayed-comparison task was not merely due to a smaller population of relevant neurons.
The second control task was a discrimination task at threshold contrast; subjects had to discriminate the orientation of a low-contrast visual target stimulus following the variable delay period (Fig. 1D; see Section 2). Neurons tuned to orientations left and right of vertical would be used to discriminate the slight orientation offsets; therefore, it is reasonable to expect that fewer neurons would be recruited for this discrimination task than for the detection of a target grating that could be displayed at any orientation (Jazayeri & Movshon, 2006). This task also served to test whether the sustained activity seen in the detection experiments was unique to detection. A number of visual cortical areas exhibited sustained delay-period activity during this discrimination task, indicating that the activity was not due to detection per se but rather to the attentional demands of perceiving a very low-contrast stimulus (Fig. 5, Table 3 and Supplementary Table 2).
Psychophysical thresholds and percent correct for all five subjects and all four tasks are summarized in Supplementary Table 3. By design, mean percent correct was 81% (controlled by a staircase procedure; range: 76–91%). We found no evidence for a difference in behavioral performance between the detection and the cued-detection tasks. The orientation threshold was greater for the delayed-comparison task than for the discrimination task, despite the fact that the delayed-comparison task used high-contrast visual stimuli. This confirmed the difficulty of the delayed-comparison task, and also indicated a difference in strategy. For the discrimination task, subjects could compare the percept on each trial with vertical. For the delayed-comparison task, there was no way to use such a strategy because the comparison orientation varied randomly from trial to trial. Instead, subjects had to maintain the orientation of the first stimulus in short-term memory.
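To illustrate how a staircase can hold performance near a fixed level, the following minimal sketch implements a generic 3-down/1-up rule. The rule, the multiplicative step, and the names `Staircase` and `update` are assumptions made for illustration; the text above states only that a staircase procedure was used. A 3-down/1-up rule converges near 79% correct, which is close to, but not the same as, the 81% reported here.

```python
from dataclasses import dataclass


@dataclass
class Staircase:
    """Generic n-down/1-up staircase (illustrative; not the authors' exact rule)."""
    level: float        # current stimulus level, e.g. contrast (hypothetical units)
    step: float         # multiplicative step size, must be > 1
    n_down: int = 3     # consecutive correct responses needed to make the task harder
    _streak: int = 0    # running count of consecutive correct responses

    def update(self, correct: bool) -> float:
        """Adjust the stimulus level after one trial and return the new level."""
        if correct:
            self._streak += 1
            if self._streak == self.n_down:
                self.level /= self.step   # harder: lower the stimulus level
                self._streak = 0
        else:
            self.level *= self.step       # easier: raise the stimulus level
            self._streak = 0
        return self.level
```

In use, `update` would be called once per trial with the subject's response, and the threshold would be estimated from the levels visited late in the run.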
Although generally the results were similar across subjects and visual areas, there were some notable differences (Table 3, Supplementary Tables 1 and 2, and Fig. 5). Area VO1 did not show delay-period activity for any of the tasks. The most robust dissociation between sustained delay-period activity for the tasks was observed in V2, V3, LO1 and LO2. V1 and hV4 showed a significant difference in delay-period index between the detection task and the delayed-comparison task, but showed weaker sustained activity during the detection and discrimination tasks, which was statistically significant for some analyses but not for others. V3A/B showed significant delay-period activity for the detection and discrimination tasks, and not for the delayed-comparison task, though the difference in delay-period index between the tasks was not significant. Finally, V7 was not well activated by the stimulus, as reflected in the small ROI size (Table 3); nonetheless, the results in this area are in line with the other retinotopic areas. We also analyzed the data using HRF parameters estimated for each ROI separately, and found the same pattern of results.
It is informative to compare hV4 with LO1, as they are both positioned topographically adjacent to parts of V3, with hV4 located next to ventral V3 and LO1 next to dorsal V3 (Larsson, Landy, & Heeger, 2006); this implies that they might be at comparable levels of the visual hierarchy. In our study, LO1 showed the most robust dissociation in delay-period activity, in terms of statistical significance and consistency across different types of analysis, while hV4 was one of the areas that showed the weakest results in those terms.
In addition to variation across visual areas, there were some differences between subjects. Most notably, one subject (subject 4) out of five did exhibit delay-period activity for the delayed-comparison task (Supplementary Tables 1 and 2), though there was no evidence for differences in this subject’s behavioral performance (Supplementary Table 3).
We repeated all analyses leaving out the shorter trial durations (trials shorter than 2, 4, or 6 s) to test how the results depended on the short delay periods. The short delay periods were included in the experimental protocol primarily to control subjects’ behavior, encouraging them to maintain attention and/or short-term memory continuously. These short delay periods, however, were expected to provide little constraint on the interpretation of the fMRI data. The delay-period index remained essentially the same in all visual areas and all four experiments. As we dropped trials, the statistical significance of some of the comparisons decreased, which was not surprising since there were fewer trials in the analysis, whereas other comparisons gained statistical significance. The results for hV4 were the most sensitive to which trials were included in the analysis, while those for LO1 were the least sensitive, in line with what we found for other analyses. On the whole, the pattern of results remained the same when trials with shorter durations were excluded from the analysis, and supported the same conclusions.
Early visual cortex exhibited spatially selective and sustained activity in the absence of a visual stimulus when subjects maintained attention so as to successfully detect a low-contrast visual target, but did not show evidence of sustained activity while subjects remembered a high-contrast visual stimulus. The dissociation was most robust in V2, V3, LO1 and LO2, but was also found in V1, hV4 and V3A/B. This dissociation might have been thought to reflect a difference in the size of the neural subpopulation needed to perform the tasks. That explanation was ruled out by the significant delay-period activity measured in our two control experiments, supporting the interpretation of a genuine distinction between the neural processing underlying short-term memory and attention.
It is possible that some neurons in early visual cortex did increase their activity during the delay in the delayed-comparison task, but that activity in other neurons decreased at the same time, so that the population average remained unchanged; fMRI cannot reveal such modulations of activity that are washed out in the population. It is in principle impossible to prove a null hypothesis; however, by showing that robust delay-period activity is measured in early visual cortex during an alternate task that is visually identical throughout the delay, we have demonstrated that, at the very least, these early visual cortical areas behave quite differently under the two conditions.
Although the concepts of short-term memory and attention are notionally distinct, consideration of the tasks reveals that they share much in common and that they might be expected to share the same neural processes. The delayed-comparison (short-term memory) task also required that subjects be attentive, and the detection (attention) task required subjects to maintain spatial information about the location of the target. A widely held theoretical framework for attention posits that cueing an observer to attend to a particular visual feature (e.g., vertical orientation) or a particular location in the visual field modulates the activity of a selective subpopulation of neurons, those that respond preferentially to that stimulus feature or location (Reynolds & Chelazzi, 2004). The neural activity can be sustained for a relatively long period of time (tens of seconds), as long as attention is maintained. A similar theoretical framework for visual short-term memory holds that a selective subpopulation of neurons exhibits an elevated response when subjects are cued to remember the features or location of a particular stimulus for later comparison, and that this response is sustained for as long as the memory is maintained (Brody, Romo, & Kepecs, 2003; Fuster, 1997).
Previously, one might have expected the same selective subpopulation of neurons to be active during the delay periods of both tasks. Indeed, it has been hypothesized that visual attention and short-term memory are tightly linked to one another (Awh & Jonides, 2001; Corbetta & Shulman, 2002; Kastner & Ungerleider, 2000). Evidence for the connection between them is two-fold. First, behavioral studies have shown that diverting attention during a delay disrupts spatial short-term memory (Awh, Jonides, & Reuter-Lorenz, 1998; Smyth, 1996; Smyth & Scholey, 1994), and that remembered locations show similar improvement in visual processing to that found for attended locations (Awh et al., 1998; Awh, Smith, & Jonides, 1995). Second, neuroimaging and physiology studies have reported overlap in the cortical regions mediating attention and short-term memory, not only in frontal and parietal cortex (Awh & Jonides, 2001; Corbetta & Shulman, 2002), but also in visual cortex (as discussed below). Nonetheless, our results show a clear dissociation between the delayed-comparison task, in which we did not find evidence for sustained activity in early visual cortex during the delay period, and the detection and discrimination tasks, for which early visual cortex was active even during the visual-stimulus-free delay. It has been hypothesized that short-term memory for location is distinct from short-term memory for object features, such as orientation and spatial frequency (Awh & Jonides, 2001). The results presented here could be understood as evidence in support of such a distinction, because we probed short-term memory for stimulus features whereas Awh and his colleagues studied spatial short-term memory. Unlike those studies, however, our experiments directly compared tasks with differential demands on the maintenance of attention and the maintenance of short-term memory.
A similar conclusion to ours was reached by a recent study reported by the Kastner group (McMains, Fehd, Emmanouil, & Kastner, 2007). They found that sustained activation during an expectation period preceding a visual stimulus did not reflect the stimulus preferences of different visual areas, and concluded therefore that this increased baseline activation did not reflect an activated memory template. However, their study did not explicitly manipulate memory, unlike the experiments we report here.
Psychophysical data suggest that visual short-term memory may involve relatively early visual representations (Bennett & Cortese, 1996; Blake, Cepeda, & Hiris, 1997; Magnussen & Greenlee, 1992, 1999; Magnussen et al., 1991, 1996; Nilsson & Nelson, 1981; Pasternak & Greenlee, 2005; Regan, 1985; Vogels & Orban, 1986). The involvement of early visual cortex in short-term memory implied by these psychophysical studies contrasts with the emphasis on prefrontal cortex in neuroimaging and physiological studies (e.g., Courtney, Petit, Haxby, & Ungerleider, 1998; Haxby, Petit, Ungerleider, & Courtney, 2000; Miller, Erickson, & Desimone, 1996; Ranganath & D’Esposito, 2005), though some previous electrophysiology and neuroimaging studies have reported activity in early visual areas (including V1) during visual short-term memory (Awh et al., 1999; Pasternak & Greenlee, 2005; Pessoa et al., 2002, 2004; Supèr et al., 2001).
The discrepancy in these studies might be attributed (1) to differences in the stimuli, e.g. whether V1 neurons are selectively responsive to the stimuli, or (2) to differences in the tasks, e.g. threshold-difficulty vs. easier discriminations. Using a threshold-difficulty discrimination task encourages subjects to adopt a particular strategy that relies on the most sensitive and robust sensory mechanisms. An easy task, on the other hand, might be performed using any of a variety of strategies, hence any of a variety of neural mechanisms. However, that logic cannot account for our results because we used stimuli that are known to evoke strong, selective responses in early visual cortex, and the tasks in our experiments were performed at psychophysical threshold.
Supèr et al. (2001) reported that cells in primate V1 showed contextual modulation for figure vs. ground that persisted during a delay of 2 s when the stimulus was off; this persistence was only seen when the stimulus was behaviorally relevant, indicating that the activity reflected the active maintenance of information that was needed for the monkey to perform the task. The monkeys made a saccade to the location in which the figure had been shown; thus, the maintenance of spatial attention cannot be distinguished by this task from the maintenance of a working memory representation. Additionally, the sustained contextual modulation they reported was not a sustained increase in activation. For both figure and ground conditions, the neural activity dropped to or below baseline about 250 ms after the stimulus was turned off; however, it dropped more for ground than for figure, and this difference persisted throughout the delay period.
Two groups have reported sustained delay-period activity in extrastriate cortex recorded by fMRI in the absence of a visual stimulus while subjects performed short-term memory tasks (Pessoa et al., 2002; Postle et al., 2004). Though at first it may appear that the results we present here are in conflict with these earlier reports, in fact Postle et al. (2004) interpreted the delay-period activity that they measured as due to attention-based rehearsal, and proposed that this activity reflected spatial selective attention utilized by subjects to succeed at the spatial memory task. Our results are fully consistent with these findings; when spatial attention was employed, we measured activity in early visual cortex during the delay period. However, our experiments, unlike theirs, were specifically designed to distinguish between short-term memory and attention. We found that when differential demands were placed on these two processes, the activity in early visual cortex was different. There are other factors that distinguish the studies presented here from earlier studies, including (1) that we used variable delay periods rather than fixed delays, (2) that we measured the HRF for each subject individually in a separate and independent experiment, and (3) that we defined early visual cortical areas retinotopically. Combined with an experimental design that allowed us to directly compare short-term memory and attention conditions, these features enabled us to distinguish sustained baseline shifts in activity during the delay period from the responses to visual stimuli that came before and after the delay. We found evidence for such a sustained response during attentionally demanding tasks, but not during a mnemonically demanding task.
In understanding why we did not find evidence for sustained activity during the delay period for the delayed-comparison task, we must consider the possibility that iconic memory does recruit early visual cortex, but that our task did not allow subjects to succeed using a strategy based on iconic memory. We probed memory out to 16 s, whereas iconic memory is limited to a time scale of 250–500 ms, or at most 1500 ms (Averbach & Sperling, 1961; Landman et al., 2003; Sperling, 1960). If iconic memory cannot be sustained over more than a few seconds, it will be a challenge to study it using fMRI techniques, due to individual differences in, and the sluggishness of, the hemodynamics.
Although we did not find evidence in retinotopic visual areas for sustained activity during the delay period for the delayed-comparison task, we did find evidence (to be reported elsewhere) for sustained activity in other occipital areas, as well as in parietal and prefrontal areas. These findings are consistent with the growing literature on sustained cortical activity related to visual short-term memory (Cohen et al., 1997; Courtney, Ungerleider, Keil, & Haxby, 1997; Curtis & D’Esposito, 2003; Desimone, 1996; Fuster & Jervey, 1982; Goldman-Rakic, 1987; Jonides, Lacey, & Nee, 2005; Miller et al., 1996; Pasternak & Greenlee, 2005; Pessoa et al., 2002; Rothmayr et al., 2007; Smith & Jonides, 1998; Todd & Marois, 2004, 2005; Vogel & Machizawa, 2004; Xu & Chun, 2006).
It may be important to draw a distinction between visual short-term memory, in which observers are presented with a stimulus and instructed to maintain a representation of it for later comparison, and mental imagery, in which subjects are instructed to recall a visual representation from long-term memory or to manipulate it (e.g., mental rotation). Mental imagery in the absence of a visual stimulus has been reported to elicit activity in early visual cortex (Kosslyn et al., 1993; Kosslyn & Thompson, 2003; Le Bihan et al., 1993), though others did not find such activity (D’Esposito et al., 1997; Ishai, Ungerleider, & Haxby, 2000; Mellet, Petit, Mazoyer, Denis, & Tzourio, 1998; Roland & Gulyas, 1994). It is not obvious, however, why the maintenance of a visual representation should be different for mental imagery and short-term memory. There are also many differences between our experiments and the mental imagery studies, including differences in the types of stimuli and the fact that our experiments systematically varied the delay durations.
There is conflicting evidence regarding the role of early visual cortex in visual attention. It is widely believed that top–down projections from prefrontal or parietal areas provide the attentional control signals that modulate processing in visual cortex (Corbetta & Shulman, 2002). It has been reported that these prefrontal and parietal areas exhibit sustained delay-period activity while attention is maintained, whereas visual cortical areas respond differently, being active only transiently in the presence of a visual stimulus (Corbetta, Kincade, Ollinger, McAvoy, & Shulman, 2000; Hopfinger, Buonocore, & Mangun, 2000; Shulman et al., 1999). On the other hand, some evidence suggests just the opposite: activity in some parietal areas is transient and hence correlated with shifts in attention, whereas activity in extrastriate visual cortex is sustained and hence correlated with the maintenance of attention (Yantis et al., 2002). Some physiological studies have found increased baseline firing rates for attentional tasks in extrastriate cortex (but not in V1) in the absence of visual stimulation (Haenny et al., 1988; Luck et al., 1997; Mehta, Ulbert, & Schroeder, 2000; Reynolds et al., 2000; Williford & Maunsell, 2006).
In contrast, human fMRI studies of visual attention have consistently found increased activity in early visual cortex, including V1, even in the absence of a visual stimulus (Kastner et al., 1999; McMains et al., 2007; Ress et al., 2000; Silver et al., 2007). fMRI is particularly sensitive to measuring small-amplitude activity, as the hemodynamic response pools over large populations of neurons and integrates over time; this might explain why fMRI studies have been able to consistently measure sustained activation even when physiological studies have not. The discrepancy between these findings might also be due to a difference in task difficulty (Ress et al., 2000). The physiological studies typically employed easy tasks and overtrained subjects, while some of the human fMRI studies used tasks at psychophysical threshold. Our results are, therefore, in line with previous human fMRI studies in finding sustained activity in early visual cortex during the delay period for threshold-difficulty detection and discrimination tasks. Task difficulty, however, is not the only relevant factor; we observed striking differences in sustained activity between tasks in spite of the fact that task difficulty was controlled to be the same for each task.
It has been suggested, based on evidence from fMRI, that attentional modulation effects increase from earlier to more advanced levels of the visual hierarchy (Kastner et al., 1999; Kastner & Ungerleider, 2001; O’Connor, Fukui, Pinsk, & Kastner, 2002; Tootell et al., 1998). In particular, Kastner’s group has found stronger attentional effects in hV4 than in earlier visual areas (1999, 2001, 2002, 2007). This is in contrast to the results we report here, which show a weaker effect in hV4 as compared with V1–V3 (Fig. 5, Table 3, Supplementary Tables 1 and 2). This difference is not simply due to a difference in how we define our attentional index. The unnormalized delay-period activity shows the same pattern; also, we recalculated the attentional index with a variety of denominators (including the response to the second stimulus, and the response to the high-contrast flickering checkerboard in the localizer scans) with the same results. Nor was the difference due to the fact that the HRF estimate was based on V1–V3; we re-analyzed the data using HRF estimates for each ROI separately, with the same pattern of results (see Section 2).
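The denominator check described above can be sketched as follows. This assumes, for illustration only, that the index is the delay-period amplitude divided by a reference response; the exact definition is given in Section 2 and is not reproduced here. The function names and all numerical values in the usage example are hypothetical.

```python
from typing import Dict, List


def delay_period_index(delay_amplitude: float, reference_response: float) -> float:
    """Delay-period amplitude expressed as a fraction of a reference response."""
    if reference_response == 0:
        raise ValueError("reference response must be non-zero")
    return delay_amplitude / reference_response


def rank_areas(delay: Dict[str, float], reference: Dict[str, float]) -> List[str]:
    """Order visual areas from largest to smallest index for one choice of denominator."""
    index = {area: delay_period_index(delay[area], reference[area]) for area in delay}
    return sorted(index, key=index.get, reverse=True)


# Hypothetical amplitudes (arbitrary units), not values from the study.
delay = {"V2": 0.20, "hV4": 0.05}
second_stimulus = {"V2": 1.0, "hV4": 0.8}   # response to the second stimulus
localizer = {"V2": 2.0, "hV4": 1.5}         # response to the localizer checkerboard

# The pattern across areas is robust if the ordering survives the change of denominator.
same_pattern = rank_areas(delay, second_stimulus) == rank_areas(delay, localizer)
```

Comparing the ordering of areas, rather than the raw index values, is one simple way to ask whether a result depends on the normalization.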
The discrepancy may be due to differences in stimuli and task demands. The Kastner group used large, colorful, complex stimuli in image-recognition and counting tasks, while our study used smaller black-and-white gratings in threshold detection and orientation–discrimination tasks. In the Kastner group’s experimental design, subjects were required to count or respond to the presence of easily detectable target stimuli following a fixed-duration expectation period. Subjects were instructed to attend during this fixed expectation period; however, there was no temporal uncertainty, and attention during this expectation period was not necessary for success in the behavioral tasks. The baseline shift described by the Kastner group refers to sustained activity during this task-irrelevant and fixed-duration expectation period (1999, 2007). It is remarkable, and an indication of the robustness of the effect, that they succeeded in measuring sustained activity even under conditions in which attention allocation was motivated only by the experimenters’ instructions and not by task demands. This is in contrast to the task-relevant attention in the absence of a visual stimulus during a variable delay period that was investigated in the present study and in previous work from our lab (see Section 2, Table 1) (Silver et al., 2007).
Signal detection theory and attentional selection offer a computational framework for understanding our results. One plausible interpretation is that performance in the detection and discrimination tasks benefited from endogenous spatial selection. The low-contrast target stimulus did not provide much information about its form (orientation and spatial frequency), location, or time of onset; such uncertainty might have forced subjects to rely on (top–down) cognitive signals to select the relevant sensory information. We hypothesize that spatial selection is implemented by boosting the relevant neuronal signals in early visual cortex, i.e., those corresponding to the stimulus location. Evidence from psychophysical experiments suggests that perceptual decisions depend on a maximum-of-outputs or winner-takes-all class of decision rules (Baldassi, Megna, & Burr, 2006; Palmer, 1995; Palmer, Verghese, & Pavel, 2000; Shaw, 1980; Verghese, 2001). Under such a decision rule, increased baseline firing rates in our low-contrast detection and discrimination experiments would have improved performance accuracy essentially by selecting only the most relevant sensory signals for decision and action. The delayed-comparison task, on the other hand, involved little need for selection; the target stimulus had high contrast, and therefore provided clear information as soon as it was presented about its form and location. If this interpretation is correct, then we would expect to find comparable spatially selective, sustained delay-period activity not only when detecting low-contrast stimuli on a uniform background but also when scrutinizing a high-contrast background stimulus for a spatially localized change, or equivalently when detecting a spatially localized high-contrast stimulus presented on a large high-contrast background. Hence, our results suggest a specific role for early visual cortex in spatial selection.
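The benefit of selection under a maximum-of-outputs rule can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions, not the analysis used in this study: it assumes a two-interval task, independent unit-variance Gaussian noise in every channel, and an observer who chooses the interval containing the largest single response. Selection is modeled simply as restricting the number of monitored channels. The function name, the signal strength, and the channel counts are hypothetical.

```python
import random


def max_rule_accuracy(n_monitored: int, signal: float = 1.5,
                      n_trials: int = 20000, seed: int = 0) -> float:
    """Proportion correct for a max-of-outputs observer in a simulated two-interval task.

    The target raises the mean response of one channel (index 0) in the target
    interval; every monitored channel carries independent Gaussian noise.
    """
    rng = random.Random(seed)
    n_correct = 0
    for _ in range(n_trials):
        target_interval = [rng.gauss(0.0, 1.0) for _ in range(n_monitored)]
        target_interval[0] += signal              # only the relevant channel sees the target
        blank_interval = [rng.gauss(0.0, 1.0) for _ in range(n_monitored)]
        if max(target_interval) > max(blank_interval):
            n_correct += 1                        # observer picks the interval with the larger maximum
    return n_correct / n_trials


selective = max_rule_accuracy(n_monitored=1)      # only the relevant channel is monitored
unselective = max_rule_accuracy(n_monitored=16)   # irrelevant channels add noise to the maximum
```

In this toy model accuracy falls as more irrelevant channels are monitored, because noise in those channels occasionally produces the largest response. Restricting the decision to the relevant channel is one way to capture the proposed effect of spatial selection.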
Support contributed by: NEI (R01-EY11794), NIMH (R01-MH69880, F31-MH076611), NSF GRF, and the Seaver Foundation.
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.visres.2007.12.022