While a core function of the working memory (WM) system is the active maintenance of behaviorally relevant sensory representations, it is also critical that distracting stimuli are appropriately ignored. We used functional magnetic resonance imaging to examine the role of domain-general WM resources in the top-down attentional modulation of task-relevant and irrelevant visual representations. In our dual-task paradigm, each trial began with the auditory presentation of six random (high load) or sequentially-ordered (low load) digits. Next, two relevant visual stimuli (e.g., faces), presented amongst two temporally interspersed visual distractors (e.g., scenes), were to be encoded and maintained across a 7-sec delay interval, after which memory for the relevant images and digits was probed. When taxed by high load digit maintenance, participants exhibited impaired performance on the visual WM task and a selective failure to attenuate the neural processing of task-irrelevant scene stimuli. The over-processing of distractor scenes under high load was indexed by elevated encoding activity in a scene-selective region-of-interest relative to low load and passive viewing control conditions, as well as by improved long-term recognition memory for these items. In contrast, the load manipulation did not affect participants' ability to upregulate activity in this region when scenes were task-relevant. These results highlight the critical role of domain-general WM resources in the goal-directed regulation of distractor processing. Moreover, the consequences of increased WM load in young adults closely resemble the effects of cognitive aging on distractor filtering [Gazzaley et al., (2005) Nature Neuroscience 8, 1298-1300], suggesting the possibility of a common underlying mechanism.
Many of our behavioral goals necessitate that information be maintained in an active and available state for a brief period of time via a cognitive process known as working memory (WM) (Baddeley & Hitch, 1974). Given the severe capacity constraints on WM (Cowan, 2005), it is not only important that task-relevant information be adequately attended and encoded, but it is also critical that task-irrelevant information be appropriately filtered (Vogel, McCollough, & Machizawa, 2005). Selective attention provides a mechanism for the enhancement of neural activity in sensory regions representing task-relevant information, a process thought to be driven by goal-directed top-down modulatory influences (Desimone & Duncan, 1995; Kanwisher & Wojciulik, 2000; Kastner & Ungerleider, 2000). However, it is less clear whether the attenuation of task-irrelevant sensory activity is also accomplished by an actively mediated control process (Engle, Conway, Tuholski, & Shisler, 1995; Knight, Staines, Swick, & Chao, 1999; Tsushima, Sasaki, & Watanabe, 2006), or whether it is an indirect consequence of limited attentional resources being allocated elsewhere (Pinsk, Doniger, & Kastner, 2004; Rees, Frith, & Lavie, 1997).
In an effort to gain insight into the nature of these attentional control mechanisms and their involvement in WM (Awh, Vogel, & Oh, 2006; Olivers, 2008), we sought to determine how individuals' ability to selectively upregulate and downregulate visual activity during a visual WM task would be influenced when their attentional resources were partially occupied by a non-visual WM task. We adapted a visual WM task paradigm (Gazzaley, Cooney, McEvoy, Knight, & D'Esposito, 2005) in which a series of face and scene images are presented on each trial, and participants are either instructed to encode the scenes while ignoring the faces, encode the faces while ignoring the scenes, or to passively view the images without trying to remember them. Recent functional magnetic resonance imaging (fMRI) results from our laboratory demonstrated that individuals are able to selectively enhance activity levels in visual association regions representing behaviorally relevant visual stimuli and suppress activity in regions representing irrelevant visual stimuli, relative to activity levels during a passive viewing control condition (Gazzaley, Cooney, McEvoy, et al., 2005). The present study utilized this general task paradigm, but participants were faced with the additional demand of encoding a sequence of auditorily-presented digits before each trial and maintaining this information throughout the trial.
If domain-general WM resources are necessary to ensure that distracting stimuli are ignored, then irrelevant visual representations should be insufficiently filtered when these resources are depleted by the concurrent digit WM task, despite the different sensory modalities of the two tasks. Alternatively, if distractor filtering results from limited processing resources being reallocated to the active encoding and maintenance of relevant stimuli, then irrelevant visual representations should be even more suppressed by the high load digit WM task. A previous fMRI study (de Fockert, Rees, Frith, & Lavie, 2001) provided compelling evidence consistent with the former scenario, but only assessed the consequences of WM load on distractor processing and not on relevant target stimuli. This made it difficult to determine whether WM resources are necessary to maintain one's attentional processing priorities per se (Lavie, Hirst, de Fockert, & Viding, 2004), in which case the attentional enhancement of relevant stimuli should also be diminished, or whether distractor filtering is disproportionately impaired by WM load. Moreover, the WM load-dependent increase in distractor processing was obtained in a task paradigm where strong Stroop-type conflict (Stroop, 1935) existed between simultaneously presented visual targets and distractors. The design of the present fMRI study allows for detailed assessment of how non-visual WM load influences attentional up-modulation and down-modulation of distinct visual representations under circumstances where targets and distractors are presented sequentially and neither semantically conflict, nor directly compete for limited visuospatial processing resources.
Seventeen volunteers (5 females), aged 18-27 years (mean age = 21.7), participated in this study after providing written informed consent in accordance with a protocol approved by the University of California, Berkeley Committee for the Protection of Human Subjects. All participants were right-handed and had normal hearing as well as normal, or corrected-to-normal, vision. Participants were prescreened for the presence of medical, neurological, or psychiatric illnesses, and the use of prescribed drugs. One participant exhibited head motion during the fMRI session that was well beyond our quality assurance criteria, and his data were thus excluded from all analyses.
The first task participants performed in the scanner was a face/scene localizer, designed to isolate face- and scene-selective regions of interest in visual association cortex. Participants attentively viewed alternating 16 sec blocks of face stimuli, scene stimuli, and rest periods (7 blocks of each condition) and were instructed to indicate whenever an infrequent 1-back match (i.e., an immediate stimulus repetition) occurred by pressing buttons with both forefingers.
Each run began with a text display instructing participants to either i) Remember Faces, ii) Remember Scenes, or iii) Passively View the visual stimuli (Fig. 1). The instructions were to be applied to all trials of the upcoming run, and they remained on the screen until the participant made a button press indicating that s/he was ready to begin the run, at which point scanning commenced. At the beginning of every trial, regardless of which task condition they were performing, participants listened to a sequence of six spoken digits presented over MR-compatible headphones (Confon HP-SI01, Magdeburg, Germany). Volume levels were adjusted before the experiment began to ensure that the digits were clearly perceived against the background of the scanner noise. The audio files were recorded from a male speaker, with a separate file for each digit (0-9). The digit stimuli lasted an average of 750 ms each, yielding a total duration of approximately 4.5 sec for the sequence. Participants were required to fixate on a centrally displayed fixation cross while listening to the digits and trying to remember them. On 50% of the trials, the digit sequence consisted of six unique randomly ordered digits (high load condition). On the other 50% of the trials, the digit sequence was “1,2,3,4,5,6” (low load condition). Our choice of this particularly easy low load sequence was motivated by the goal to ensure that bottom-up stimulus content was balanced across conditions (six auditory digits on all trials), the desire to replicate the effects of a previous version of this experiment that did not include the digit WM task (Gazzaley, Cooney, McEvoy, et al., 2005), and the fact that a conceptually similar study used this type of ordered sequence for its low load condition (de Fockert, et al., 2001). Our decision to use six digits was based on a related dual task study that found significant behavioral effects with this level of digit load (Lavie, et al., 2004). 
Within each run, high and low load trials were pseudo-randomly intermixed.
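To make the load manipulation concrete, the following Python sketch generates the digit sequences for one run. It is only an illustration under stated assumptions: the function names are hypothetical, digits are drawn from 0-9 as in the recorded audio files, and a plain shuffle stands in for the pseudo-randomization, whose exact constraints are not specified in the text.

```python
import random

LOW_LOAD_SEQUENCE = [1, 2, 3, 4, 5, 6]  # fixed ordered sequence used on low load trials


def make_digit_sequence(load, rng):
    """Return six digits for one trial ('high' = six unique random digits from 0-9)."""
    if load == "low":
        return list(LOW_LOAD_SEQUENCE)
    if load == "high":
        return rng.sample(range(10), 6)  # sampling without replacement gives unique digits
    raise ValueError(f"unknown load: {load!r}")


def make_run(n_trials, rng):
    """Build a run with equal numbers of high and low load trials in shuffled order."""
    if n_trials % 2:
        raise ValueError("n_trials must be even for a 50/50 split")
    loads = ["high"] * (n_trials // 2) + ["low"] * (n_trials // 2)
    rng.shuffle(loads)  # assumption: a plain shuffle stands in for pseudo-randomization
    return [(load, make_digit_sequence(load, rng)) for load in loads]


# Hypothetical usage: a 16-trial run with a seeded generator for reproducibility.
run = make_run(16, random.Random(0))
```

Because both conditions deliver six spoken digits, the auditory input is matched across loads and only the memory demand differs.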
At exactly 6 sec after trial onset (approx. 1.5 sec after the last digit), four grayscale images were presented sequentially. Each image was displayed for a duration of 800 ms, with a 200 ms blank interval elapsing between stimuli. Two of these images were photographs of novel faces, and two were photographs of novel landscape scenes. The order in which the face and scene stimuli were presented was pseudo-randomized and counterbalanced, with all possible orderings allowed. All images were 225 pixels wide by 300 pixels tall, and subtended approximately 5 by 6 degrees of visual angle. The face stimuli were cropped to an ovoid shape and smoothly blurred along the contours, so that only the internal features of the face were visible. Both male and female faces were used, although the sex of the face stimuli used within each trial was held constant.
Depending on the instructions given at the beginning of the run, participants attempted either to remember the two faces or to remember the two scenes, while trying their best to ignore the two irrelevant images; in the Passively View condition, they passively viewed the four images without explicitly attempting to encode them into memory or to ignore their presence. After the fourth image, a fixation cross appeared, and participants tried to hold the relevant images, if any, in mind across a 7 sec delay period, while continuing to maintain the digit sequence. Next, a probe image appeared for a duration of 1 sec. In the Remember Faces and the Remember Scenes conditions, the probe stimulus was either one of the two relevant images from the initial set (50% of trials) or a novel image (50% of trials), though always of the relevant stimulus class. Participants made a button press response indicating whether or not they recognized the probe image. In the Passively View condition, an arrow appeared as the probe, and participants made a response indicating whether the arrow was pointing to the left or right. In all conditions, 1.5 sec after the termination of the probe stimulus, a single digit appeared on the visual display for 1 sec, and participants made a button press response indicating whether or not that digit was a member of the initial series. A variable duration inter-trial interval (ITI) of 3.5, 5.5, 7.5, or 9.5 sec (mean = 6 sec) then elapsed while participants maintained fixation and waited for the next trial to begin. The fixation cross turned red 500 ms before the start of each trial to alert participants to attend to the upcoming digit sequence. Each Remember Faces and Remember Scenes run consisted of 16 trials, and each Passively View run had 14 trials. 
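The within-trial timing described above can be summarized with a short Python sketch that derives each event onset from the stated durations. The constant and function names are hypothetical, and the sketch assumes that a 200 ms blank follows every image, including the fourth, so that the four images occupy exactly 4 sec.

```python
# All durations are in milliseconds and are taken from the task description.
IMAGE_ONSET_MS = 6000   # first image appears 6 sec after trial onset
IMAGE_MS = 800          # display duration of each image
BLANK_MS = 200          # blank interval between images
N_IMAGES = 4
DELAY_MS = 7000         # delay period
PROBE_IMAGE_MS = 1000   # probe image duration
PROBE_GAP_MS = 1500     # interval between probe image offset and probe digit onset
PROBE_DIGIT_MS = 1000   # probe digit duration


def trial_timeline():
    """Return event onsets (ms from trial onset) for a single trial."""
    slot = IMAGE_MS + BLANK_MS  # assumption: a blank follows every image, including the last
    image_onsets = [IMAGE_ONSET_MS + i * slot for i in range(N_IMAGES)]
    delay_onset = image_onsets[-1] + slot
    probe_image_onset = delay_onset + DELAY_MS
    probe_digit_onset = probe_image_onset + PROBE_IMAGE_MS + PROBE_GAP_MS
    return {
        "digits": 0,
        "images": image_onsets,
        "delay": delay_onset,
        "probe_image": probe_image_onset,
        "probe_digit": probe_digit_onset,
        "trial_end": probe_digit_onset + PROBE_DIGIT_MS,
    }


timeline = trial_timeline()
```

Under these assumptions the probe image falls at 17 sec and the probe digit at 19.5 sec after trial onset, with the variable ITI beginning at 20.5 sec.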
The order in which participants performed the three task conditions was counterbalanced across participants, but each participant experienced a fixed run order (e.g., Remember Scenes, Passively View, Remember Faces) that repeated 3 times for a total of 9 runs. At the end of each run, participants were provided with feedback on their performance on the digit and image memory tasks.
Approximately 5 min after the conclusion of scanning, participants performed a surprise recognition memory test, in which they were presented with a subset of 138 of the scene images they had encountered while in the scanner, intermixed with 138 novel scene stimuli. One scene image was chosen for recognition testing from each of the trials they had performed in the scanner, with the criterion that the image only appeared once in the course of the experiment (i.e., it was never presented a second time as a probe stimulus). Images were displayed one at a time on a computer monitor, and participants indicated their level of recognition on a 4-point scale: 1 = definitely new; 2 = probably new; 3 = probably old; 4 = definitely old. Given our a priori plans to focus our fMRI analyses on enhancement and suppression of scene representations, based on the fact that previous studies utilizing a similar task paradigm had found modulatory effects to be more robust in scene-selective brain regions than face-selective brain regions (Gazzaley, Cooney, McEvoy, et al., 2005; Gazzaley, Cooney, Rissman, & D'Esposito, 2005), we chose only to test post-experiment memory for the scene images. This self-paced recognition task took participants approximately 10 min to complete.
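As a quick consistency check on the size of the recognition test, the Python sketch below recomputes the number of scanner trials from the run structure reported above (three runs per condition, 16 trials per Remember run and 14 per Passively View run). The variable names are hypothetical and the code only restates arithmetic already implied by the text.

```python
TRIALS_PER_RUN = {"Remember Faces": 16, "Remember Scenes": 16, "Passively View": 14}
RUNS_PER_CONDITION = 3  # the three-condition run order repeated 3 times (9 runs in total)

# Trials contributed by each condition across the whole session.
trials_per_condition = {
    condition: n_trials * RUNS_PER_CONDITION
    for condition, n_trials in TRIALS_PER_RUN.items()
}

total_trials = sum(trials_per_condition.values())
old_scenes = total_trials  # one previously seen scene is sampled from each trial
new_scenes = 138           # number of novel scenes reported in the text
```

The total of 48 + 48 + 42 trials equals the 138 old scenes, which are matched by an equal number of novel scenes.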
MR data were collected on a Varian INOVA 4.0 Tesla scanner (Palo Alto, CA) equipped with a transverse electromagnetic send-and-receive radio frequency coil. Functional data were acquired using a two-shot T2*-weighted blood oxygen level dependent (BOLD) sensitive echo-planar imaging (EPI) sequence (TR = 2 sec, TE = 28 ms, FOV = 22.4 cm², matrix size = 64 × 64, in-plane resolution = 3.5 × 3.5 mm). Each functional volume consisted of 18 axial slices of 5 mm thickness with a 0.5 mm interslice gap and provided nearly whole-brain coverage. Each functional run was preceded by 8 sec of dummy gradient RF pulses to achieve steady-state tissue magnetization. Anatomical images coplanar with the EPI data were collected using a gradient-echo multislice (GEMS) sequence (TR = 2000 ms, TE = 5 ms, FOV = 22.4 cm², matrix size = 256 × 256, in-plane resolution = 0.875 × 0.875 mm). E-Prime software (CMU, Pittsburgh, PA) was used to present stimuli and record responses, which participants made using an MR-compatible button box.
Following acquisition, MRI data were reconstructed, and a phase map correction was applied to remove Nyquist ghosts. Data were corrected for between-slice timing differences using a sinc interpolation method and were re-sampled to 1 sec temporal resolution (half of the total TR) by combining each shot of half k-space with the bilinear interpolation of the two flanking shots. Subsequent processing was performed using SPM2 software (www.fil.ion.ucl.ac.uk) run under Matlab 6.5. Functional data were realigned to the first volume acquired using a 6-parameter rigid body motion correction procedure and were spatially smoothed with an 8 mm FWHM Gaussian kernel.
Scene-selective ROIs, corresponding to the parahippocampal place area (Epstein & Kanwisher, 1998), were functionally defined for each participant in native space (i.e., not spatially normalized) based on contrasts generated from the localizer task data. These ROIs were defined separately in the right and left hemisphere as the cluster of 40 contiguous voxels exhibiting the highest t-values from the Scene > Face contrast within the parahippocampal and lingual gyri, often extending posteriorly into the fusiform gyrus. To provide a summary of the mean anatomical localization of these individually-defined ROIs, we performed a group-level analysis of the spatially-normalized fMRI data from the localizer task, which yielded the following MNI coordinates of peak Scene > Face effects: left hemisphere [-30, -40, -14], right hemisphere [30, -42, -12]. Our fMRI analyses focused exclusively on activity in scene-selective regions because they respond strongly to scene stimuli and minimally to face stimuli (Downing, Chan, Peelen, Dodds, & Kanwisher, 2006), and thus provide a robust and highly selective marker of scene processing (Gazzaley, Cooney, McEvoy, et al., 2005). Although face-selective regions, such as the fusiform face area (Kanwisher, McDermott, & Chun, 1997), do show a statistically greater response to face stimuli than scene stimuli, the response profile is not nearly as selective to faces as scene-selective regions are to scenes (Downing, et al., 2006). Given that face-selective regions still respond quite strongly to scene stimuli, they do not provide a clean marker of face processing in this task (i.e., to the extent that face-selective voxels still contribute to the cortical representation of scene stimuli, the enhancement or suppression of face representations will be partially counteracted by the concomitant suppression or enhancement of scene representations in these voxels).
Task-dependent activity changes were assessed by analyzing trial-averaged BOLD timecourses, in an effort to stay as close as possible to the raw fMRI data and preserve temporal information (see also Jha & McCarthy, 2000). The fMRI timeseries of all voxels within each ROI was extracted and high-pass filtered (cutoff period = 128 sec). A summary timeseries was generated for each ROI by taking the first eigenvariate across all voxels. The timeseries was then converted to percent signal change, which was done separately for each run to prevent differences in mean signal across runs from influencing the results. Trial-averaged timecourses were created by averaging across all trials of a given condition, beginning with the onset of the digit sequence and taking a 30 sec window of data for each trial. For display purposes, timecourse graphs were interpolated with a cubic spline function. Condition-specific activity changes were assessed by averaging selected 3 sec segments of the timecourse data and performing planned comparisons using two-tailed paired samples t-tests.
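Two steps of this timecourse analysis, conversion to percent signal change within a run and trial averaging over a fixed window, can be sketched in plain Python as follows. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, percent signal change is assumed to be computed relative to the run mean, the timeseries is assumed to be sampled at 1 sec resolution, and the high-pass filtering and eigenvariate steps are omitted.

```python
from statistics import mean


def percent_signal_change(run_series):
    """Express one run's timeseries as percent change from that run's mean.

    Assumption: the run mean serves as the baseline for the conversion.
    """
    baseline = mean(run_series)
    return [100.0 * (value - baseline) / baseline for value in run_series]


def trial_average(series, onsets, window=30):
    """Average fixed-length windows (in samples) that start at each trial onset."""
    segments = [series[o:o + window] for o in onsets if o + window <= len(series)]
    if not segments:
        raise ValueError("no complete trial windows")
    # zip(*segments) groups the samples that share the same within-trial time point.
    return [mean(samples) for samples in zip(*segments)]


# Toy data: 60 samples at 1 sec resolution containing two 30 sec "trials".
toy_run = [100.0, 102.0, 104.0, 100.0, 98.0, 96.0] * 10
psc = percent_signal_change(toy_run)
avg = trial_average(psc, onsets=[0, 30], window=30)
```

Converting each run separately, as described above, keeps differences in mean signal between runs from leaking into the condition averages.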
The full set of descriptive statistics for the behavioral data from the digit and image tasks is shown in Table 1. Response times (RT) were computed based on correct trials only. Two-tailed paired samples t-tests were used to assess the significance of load effects on behavioral performance. Behavioral data from the Remember Faces and Remember Scenes conditions were combined to increase statistical power, given that, despite a numerical trend toward an increased load effect on the Remember Scenes error rate, there was in fact no significant difference across these conditions in the magnitude of the load effects on error rate (Remember Faces load cost (1.82%) vs. Remember Scenes load cost (8.07%): t(15) = 1.63, p = 0.12) or RT (Remember Faces load cost (23.0 ms) vs. Remember Scenes load cost (19.8 ms): t(15) = 0.17, p = 0.87). Performance on the image WM task was significantly affected by the digit load manipulation (Fig. 2), with increased errors (t(15) = 3.03, p = 0.008) and slower RTs (t(15) = 2.59, p = 0.021) to the probe images under the high load condition as compared to the low load condition. As a confirmation that maintenance of 6 random digits (high load) was indeed more challenging than 6 sequentially-ordered digits (low load), participants made significantly more errors (t(15) = 3.06, p = 0.008) and responded significantly slower (t(15) = 4.91, p = 0.0002) to the probe digits that occurred on Remember Faces and Remember Scenes trials. The digit load manipulation did not affect performance in the Passively View condition of the image task (error rate: p = 0.33; RT: p = 0.17), where participants had to indicate the right/left direction of an arrow and performed at ceiling (<1% errors). Error rates on the digit WM task were also not significantly influenced by digit load on Passively View trials (p = 0.13), but as with the other conditions, participants were significantly slower in the high load condition (t(15) = 3.06, p = 0.008).
Long-term memory for scene stimuli encountered in the scanner was indexed by participants' 4-point recognition ratings from the post-experiment recognition test (Fig. 3). The rating data reveal that participants completely failed to recognize the scene stimuli they had encountered during the low load variant of the Remember Faces condition (henceforth referred to as the Ignore Scenes condition because the data are discussed with respect to the consequences of attention on scene representations). These to-be-ignored scene stimuli were not rated significantly different from novel stimuli (p = 0.55) and were given significantly lower recognition ratings than scenes encountered during Passively View (t(15) = 4.18, p = 0.0007). This pattern provides support that these task-irrelevant scenes were successfully ignored by participants. However, scene stimuli encountered during the high load variant of the Ignore Scenes condition were remembered significantly better than novel stimuli (t(15) = 3.82, p = 0.0015), and recognition ratings were no different than those for scenes encountered during Passively View (p = 0.52). A direct contrast between the low and high load variants of the Ignore Scenes condition confirmed that these to-be-ignored scenes went on to be recognized significantly better when encountered under high load (t(15) = 3.79, p = 0.0016), suggesting that participants overly encoded task-irrelevant visual stimuli when faced with high digit load. In contrast to this load-related improvement in subsequent memory for to-be-ignored scenes, recognition ratings for the to-be-remembered scenes (Remember Scenes condition) showed a marginally significant decrease for scenes encoded under high load versus low load (t(15) = 2.00, p = 0.062). This indicates that the long-term memory representations formed during the encoding of relevant visual stimuli encountered under high load were slightly less robust than those formed for stimuli encountered under low load. 
For both the high and low load conditions, scenes encoded during Remember Scenes were remembered significantly better than those encountered during Passively View (low load: t(15) = 2.70, p = 0.015; high load: t(15) = 2.25, p = 0.039). Subsequent memory for scenes encountered during Passively View was not significantly affected by load (p = 0.26), and ratings were significantly greater than for novel stimuli (low load: t(15) = 3.48, p = 0.0031; high load: t(15) = 2.72, p = 0.015).
Trial-averaged BOLD timecourses from the right and left hemisphere scene-selective ROIs revealed a similar pattern of activity, and hence we chose to average the data across hemispheres and perform all statistical tests on the resulting bilateral ROI (see also Yi, Woodman, Widders, Marois, & Chun, 2004) (Fig. 4). In all conditions, this ROI showed a relatively flat BOLD timecourse during the initial auditory digit presentation stage of the task. The BOLD response attributable to image processing begins to rise approximately 8 sec into the trial (2 sec after the first image appears). Image processing activity peaks at 12-13 sec, and then gradually declines until it rises again in response to the probe image (presented at 17 sec) and probe digit (presented at 19.5 sec). At the probe stage of the trial, the bottom-up stimulus content is no longer held constant across conditions, and thus the inherent scene selectivity of this ROI is readily apparent. Inspection of the image processing activity reveals a selective WM load effect on Ignore Scenes activity. From the top panels of Fig. 4, one can see that on low load trials, Ignore Scenes activity peaked at the same level as Passively View, whereas on high load trials, Ignore Scenes activity peaked well above the Passively View level. This selective load effect on Ignore Scenes activity is readily apparent when the low load and high load timecourses for each task condition are plotted in matched pairs (bottom panel).
While our model-free timecourse-based analysis theoretically affords us the ability to perform statistical contrasts at every time point, we avoided this approach due to issues of multiple comparisons and temporal autocorrelation. We opted instead to divide the portion of the BOLD timecourse most attributable to image processing into two distinct 3 sec epochs (Fig. 5). A series of planned random effects statistical contrasts (two-tailed paired-samples t-tests) were performed on the averaged data from each epoch to compare activity levels for each individual task condition across loads, as well as to separately compare the Remember Scenes and Ignore Scenes conditions with the Passively View baseline within each load. The first epoch (11-13 sec after trial onset) was meant to capture peak image encoding activity and was chosen by identifying the peak time point from the attention-neutral Passively View condition (12 sec, regardless of load) and averaging it with the two temporally adjacent time points. The second epoch (15-17 sec) was meant to capture late encoding and maintenance-related activity. It was chosen by selecting the last time point before the probe image was presented (16 sec) and averaging it with the two adjacent time points (probe-related activity did not begin to rise until 19 sec, and thus did not influence this second epoch).
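The epoch-based comparison described above can be illustrated with a short Python sketch that averages a trial-averaged timecourse over each 3 sec epoch and computes a paired-samples t statistic across participants. The names are hypothetical, the timecourse is assumed to be indexed in seconds from trial onset at 1 sec resolution, and the p-value lookup is left out because the t distribution is not available in the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev

# Inclusive epoch boundaries in seconds after trial onset, as given in the text.
EPOCHS = {"early": (11, 13), "late": (15, 17)}


def epoch_mean(timecourse, epoch):
    """Average a timecourse over one epoch (assumes index i = i sec after trial onset)."""
    start, end = EPOCHS[epoch]
    return mean(timecourse[start:end + 1])


def paired_t(a, b):
    """Return the paired-samples t statistic and its degrees of freedom."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


# Toy example: a linear ramp as the timecourse and four made-up participants.
ramp = [float(t) for t in range(30)]
early = epoch_mean(ramp, "early")
late = epoch_mean(ramp, "late")
t_stat, df = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0])
```

With the 16 participants retained for analysis, the same calculation has 15 degrees of freedom, matching the t(15) values reported in the text.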
In both the low and high load conditions, Remember Scenes activity was significantly greater than Passively View activity during both the first epoch (low load: t(15) = 3.83, p = 0.00006; high load: t(15) = 5.03, p = 0.0000002) and the second epoch (low load: t(15) = 3.97, p = 0.00004; high load: t(15) = 4.63, p = 0.000002). This represents attention-dependent enhancement of activity in this scene-selective ROI when scenes are relevant. This modulatory effect is present throughout encoding and maintenance and is uninfluenced by the digit load. However, the load manipulation did influence the activation timecourse of the Ignore Scenes condition. In the low load condition, Ignore Scenes activity did not differ from Passively View activity during the first epoch (p = 0.99), while in the high load condition, Ignore Scenes activity rose significantly above the Passively View level (t(15) = 2.63, p = 0.019). A direct contrast of low and high load Ignore Scenes activity during the first epoch also demonstrated this load-dependent effect (t(15) = 3.39, p = 0.004), while the equivalent low versus high load contrasts for Remember Scenes and Passively View did not reveal significant load effects (all p's > 0.45). This result indicates that the load manipulation selectively influenced Ignore Scenes activity, with participants over-activating this scene-selective ROI under the high load condition. During the second epoch, Ignore Scenes activity fell significantly below the Passively View level in both the low and high load conditions (low load: t(15) = 2.55, p = 0.022; high load: t(15) = 2.82, p = 0.013). Direct contrasts of low versus high load activity did not reveal any significant differences during the second epoch (all p's > 0.7). 
While we believe our model-free timecourse analysis provides the clearest presentation of our data, it should be noted that the same general pattern of effects was obtained when a general linear model (GLM) analysis was used to estimate activity during the image encoding phase of each task condition.
We sought to determine whether having participants perform an auditory/phonological WM maintenance task concurrently with a visual WM task requiring selective attention would influence their ability to modulate activity in visual regions. Hypothesis-driven fMRI analyses focused on BOLD activation timecourses obtained from a scene-selective ROI, which was identified as the most robust marker of top-down modulation in a previous variant of this task (Gazzaley, Cooney, McEvoy, et al., 2005). The digit WM load manipulation had no effect on visual WM-related activity in the scene-selective ROI during the Remember Scenes or Passively View conditions, but peak encoding activity in the Ignore Scenes (Remember Faces) condition was significantly elevated during high load trials relative to low load trials. This indicates that when taxed by high WM load, participants failed to appropriately attenuate processing of the irrelevant visual stimuli. The load-induced increase in distractor processing was further evidenced by the fact that participants showed better incidental long-term recognition memory, as assessed by a post-experiment recognition test, for irrelevant scenes they had previously encountered under high load than those that had appeared in low load trials.
The finding that irrelevant visual stimuli were excessively processed when participants had to concurrently maintain a high non-visual WM load suggests that distractor filtering, at least under these circumstances, requires active cognitive control. If attenuation of task-irrelevant scene representations in the Ignore Scenes condition was merely a consequence of attentional resources being allocated to remembering faces, then activation of scene representations should have been further attenuated under high load by the additional attentional demands posed by digit maintenance. Thus, one can infer that the digit WM task usurps attentional control resources, presumably mediated by frontal and parietal lobe regions, that are involved in the maintenance and implementation of task goals and the associated top-down modulation of sensory processing (Bunge, Ochsner, Desmond, Glover, & Gabrieli, 2001; Kane & Engle, 2002; Mayer, et al., 2007; McNab & Klingberg, 2008). Our findings highlight the mechanistic overlap of WM and selective attention (Olivers, 2008), and are generally consistent with the results of de Fockert and colleagues (2001), who also observed an increase in task-irrelevant visual processing when a WM load of four digits was imposed. In their study, famous face stimuli were always irrelevant, and participants had to judge whether the written name superimposed over each face referred to a politician or pop star. The profession associated with the names was often incongruous with that of the faces, creating Stroop-type conflict (Stroop, 1935). Their results showed that fusiform activity, associated with face processing, increased under high WM load, but since the face stimuli were never task-relevant and they did not functionally localize a visual word form region, they were unable to draw conclusions about how the load manipulation influenced the enhancement of relevant visual information.
Our experimental approach offers the opportunity to expand upon these findings, given our ability to assess the consequences of non-visual WM load on both task-relevant and task-irrelevant visual processing, relative to a passive viewing control condition. Participants in our task were fully able to enhance the activation of relevant visual representations under high load, suggesting that increased WM load did not result in a generalized loss of modulatory control. Rather, the high load digit WM task selectively interfered with participants' ability to suppress or disengage their attention from task-irrelevant visual stimuli. Importantly, our results were obtained in a task setting in which there was no direct competition between relevant and irrelevant visual stimuli. Our stimuli were never simultaneously present, and thus did not compete directly for visuospatial attention and processing resources, nor were they structured to generate any Stroop-like semantic conflict with each other, unlike the stimuli and tasks used in previous studies (de Fockert, et al., 2001; Egner & Hirsch, 2005; Lavie, et al., 2004). Our results thus suggest that the active cognitive control of distractor processing is not limited to situations in which explicit conflict between targets and distractors must be overcome.
Our findings are consistent with a number of behavioral studies investigating the role that WM resources play in inhibitory control. Performance on the antisaccade task is adversely affected by cognitive load, with participants making significantly more reflexive errors (looking towards the target when they should look away from it) when performing a concurrent mental arithmetic task (Roberts, Hager, & Heron, 1994) or 2-back auditory letter WM task (Mitchell, Macrae, & Gilchrist, 2002). Performance on the go/no-go task is also impaired by letter WM load (Hester & Garavan, 2005). These studies suggest that WM resources are important for the suppression of prepotent responses. In the context of our task, control processes are likely needed to override the prepotent urge to attend to every picture. WM load manipulations have also been shown to influence negative priming effects, with the magnitude of negative priming, a putative index of inhibitory processing (Neill, 1977), decreasing as verbal WM load increased (Conway, Tuholski, Shisler, & Engle, 1999; Engle, et al., 1995). Moreover, recent neuroimaging work utilizing within-subject conjunction analyses revealed that WM tasks and inhibitory control tasks often involve overlapping neural components (McNab, et al., 2008).
Given that non-visual WM load selectively impaired the regulation of task-irrelevant visual processing in our study while sparing control of task-relevant visual processing, it might seem that the high load digit WM task taxed cortical regions necessary for inhibitory control. However, we are hesitant to claim that the observed load effects reflect changing levels of top-down inhibitory signaling per se. Inspection of the raw activation timecourses (Fig. 4) and their epoch-specific summaries (Fig. 5) suggests that enhancement of task-relevant activity (Remember Scenes > Passively View) occurs earlier than the suppression of task-irrelevant activity (Passively View > Ignore Scenes). Interestingly, the load-dependent increase in task-irrelevant activity occurs during this earlier period (where the BOLD signal indexes initial image processing and encoding), with a similar temporal profile to that of the enhancement effect. Thus, the apparent WM load effect on distractor suppression may be alternatively conceptualized as inappropriate enhancement of task-irrelevant visual representations. This early over-activation of irrelevant representations on high load trials is counteracted later in the trial timecourse by successful suppression. During this later period, where BOLD signals presumably represent a mixture of residual stimulus processing activity and maintenance activity, the Ignore Scenes activation level drops significantly below that of the Passively View condition during both high and low load trials. This supports the view that modulatory attentional signals continue to exert their influence on the activation levels of specific posterior visual representations during WM maintenance (Lepsien & Nobre, 2007; Lewis-Peacock & Postle, 2008; Ranganath, Cohen, Dam, & D'Esposito, 2004; Ranganath, DeGutis, & D'Esposito, 2004).
However, despite this late emergence of distractor suppression on high load trials, the early over-activation of irrelevant scene representations may be responsible for the stronger long-term memory representations formed for these scenes. This behavioral load effect is illustrated by the results of the surprise post-experiment recognition task (Fig. 3), in which participants reported no recognition of scenes they had encountered during low load Ignore Scenes trials (i.e., they were rated as being no more familiar than novel scenes and significantly less familiar than scenes from Passively View), while reporting a significantly stronger level of recognition for scenes encountered during high load Ignore Scenes trials.
Our findings raise the question of why the high load digit WM task selectively influenced the processing of task-irrelevant visual stimuli, while exhibiting little effect on the processing of task-relevant images. It is unlikely that the WM load manipulation caused participants to lose track of whether faces or scenes were relevant for a given trial, since trials from the three attentional conditions (Remember Faces, Remember Scenes, and Passively View) were blocked into separate scanning runs that lasted over six minutes each, giving participants ample opportunity to adopt a stable attentional set. Moreover, a generalized failure to maintain attentional processing priorities would have also resulted in diminished enhancement of relevant stimuli under high load. It is more plausible that the WM demands posed by the high load condition consumed critical attentional control resources necessary for implementing the efficient disengagement of one's attention from the irrelevant images. When performing the visual WM task blocks (i.e., not the Passively View condition), participants probably defaulted to a goal state in which they devoted a certain amount of attention to each image when it was first presented, to maximize their opportunity to effectively encode it in the 800 ms allotted, should it be of the relevant stimulus class. In the low load condition, participants were readily able to prevent attention from being further allocated to images they determined to be irrelevant distractors, resulting in activation levels that peaked no higher than those obtained during passive viewing and subsequently dropped significantly below this baseline level. However, in the high load condition, the resource-demanding processes of assessing relevancy and/or terminating the perceptual analysis of distractors were likely delayed in their deployment. This resulted in an early over-processing of the irrelevant visual stimuli, whose neural representations were not successfully suppressed until later in the trial.
Had we designed this experiment with trial-unique attentional cues specifying whether faces or scenes were relevant, it is possible that the digit load manipulation would have resulted in a more general failure of task goal maintenance (Kane, Bleckley, Conway, & Engle, 2001; Kane & Engle, 2003), where the enhancement of relevant visual representations would also be compromised under high WM load. It is worth noting that our post-experiment recognition memory task did reveal a marginally significant decrement in the strength of long-term memory representations for those task-relevant scenes encountered under high load, relative to their low load counterparts. This suggests that at least a small cost of divided attention on the long-term encoding of relevant visual stimuli is present in our data, despite its failure to influence the degree of attentional enhancement observed in our scene-selective ROI.
However, given the disproportionate cost of WM load on distractor filtering, there may be something fundamentally different about the process of disengaging attention from distractors that makes this cognitive operation particularly vulnerable to failure when attentional control resources are depleted. Elegant behavioral work by Dosher and Lu (2000) on the control of spatial attention has suggested that the process of filtering out irrelevant perceptual features (external-noise exclusion) is mechanistically separable from the process of enhancing the representations of relevant stimuli. The importance of a goal-directed distractor filtering mechanism is also suggested by recent neuroscientific work indicating that the efficacy with which individuals can prevent task-irrelevant stimuli from being encoded into memory is a critical determinant of individual differences in WM capacity (McNab & Klingberg, 2008; Vogel, et al., 2005). The apparent dissociation between the enhancement of relevant stimuli and the filtering of irrelevant distractors may have more to do with the unique control processes that prevent the allocation of attention to distracting stimuli than the presence of a distinct top-down inhibitory (i.e., GABAergic) signaling system for suppressing irrelevant perceptual representations. Indeed, in a recent functional connectivity analysis exploring the strength of prefrontal interactions with a scene-selective ROI as a function of stimulus-relevance, we produced evidence suggesting that the successful filtering of irrelevant scene representations is mediated by a reduction in excitatory signals from the prefrontal cortex (Gazzaley, Rissman, et al., 2007). However, excitatory and inhibitory signals are difficult to differentiate with BOLD fMRI, and thus the potentially important role of inhibitory signaling cannot be ruled out. While beyond the scope of the present manuscript, we hope that future work will help elucidate the complex interplay between prefrontally-mediated goal representations and top-down modulatory control signals, so as to further our understanding of the neural mechanisms of distractor filtering.
The behavioral and neural effects that emerge when participants perform the high load digit WM task concurrently with the visual WM task suggest that these domain-specific WM tasks engage a common domain-general processing component. The digit sequences were presented auditorily to promote phonological encoding and maintenance, and yet the high load condition of this verbal WM task reduced the efficiency of visual selective attention and impaired visual WM performance. This cross-domain interference is consistent with the results of a recent behavioral study which found visual WM impairments when participants had to concurrently recite a random 7-digit sequence, but not when they had to recite their own telephone number (Morey & Cowan, 2004). Such results indicate that visual and verbal WM maintenance likely rely on a common capacity-limited attentional control resource. Our results suggest that this shared processing resource is not only utilized to accommodate the attentional demands of high load maintenance, but is also involved in implementing the control necessary to attenuate the processing of distracting stimuli. However, we cannot rule out the possibility that at least some of the cross-domain behavioral interference observed under high load in our study is due to participants using verbal codes to support maintenance of the visual stimuli (Postle & Hamidi, 2007), since participants' ability to form and rehearse verbal feature labels would likely be diminished on high digit load trials.
The existence of a domain-general attentional control resource does not preclude the existence of domain-specific attentional systems (Cocchini, Logie, Della Sala, MacPherson, & Baddeley, 2002; Duncan, Martens, & Ward, 1997), but rather suggests there are circumstances where these capacity-limited systems are not fully independent. Most demonstrations of WM load-dependent increases in distractor processing have involved verbal WM tasks performed concurrently with visual selective attention tasks. However, studies that have assessed the effect of visual WM load on the processing of visual distractors have found either no effect of WM load (Yi, et al., 2004) or decreased distractor processing under high load (Rose, Schmid, Winzen, Sommer, & Buchel, 2004). When visual attention is taxed by a perceptually demanding visual task, irrelevant visual distractors are typically strongly suppressed (Lavie, et al., 2004; Pinsk, et al., 2004; Schwartz, et al., 2005; Yi, et al., 2004). Limited within-modality attentional resources are easily exhausted by demanding perceptual tasks, which leaves little attention left over to process distractors, resulting in early and strong attentional gating (Lavie, 2005). Thus, WM load manipulations will not always have the same consequences on distractor processing; the type of load is critical (Kim, Kim, & Chun, 2005; Park, Kim, & Chun, 2007). In our study, because the modality of the WM load did not overlap with that of the visual distractors, it did not facilitate their perceptual suppression. Rather, the auditory/phonological WM load likely interfered with amodal attentional control processes that are essential for regulating distractor processing.
In the present study, we have demonstrated the selective effect of non-visual WM load on the ability of healthy young adults to attenuate the processing of task-irrelevant visual stimuli. It is interesting to note that a group of healthy older adults (60-77 years old) showed a strikingly similar failure to appropriately filter out distracting stimuli when they performed this same visual WM task without the additional burden of the digit maintenance task (Gazzaley, Cooney, Rissman, et al., 2005). As with the present study's WM load manipulation, aging did not affect the enhancement of task-relevant representations. To illustrate the strong resemblance of this age-related suppression deficit to the present WM load effects, we extracted activation timecourses from the bilateral scene-selective ROIs of the 16 older adults from the Gazzaley et al. (2005) study (see footnote 1). Much like younger adults under high WM load (Fig. 4), the activation pattern seen in older adults (Fig. 6, bottom) revealed an early over-activation of task-irrelevant representations. As expected, the younger control group from that study (Fig. 6, top) showed a pattern of enhancement and suppression effects similar to those observed in our study's low load condition. The weaker suppression effect observed in the low load condition of the present study (Ignore Scenes activity did not drop significantly below the Passively View baseline level until the portion of the timecourse indexing late encoding and maintenance activity) could be attributable to the fact that the low digit load still taxed WM resources to a certain extent. This may have served to reduce the degree of attainable suppression relative to that obtained in the Gazzaley et al. (2005) study, which had no non-visual WM demands. It is noteworthy that both the effect of digit load and the effect of cognitive aging on distractor filtering were manifest early in the activation timecourse, and diminished during the maintenance period. While it is hard to make precise statements about cognitive timing based on fMRI data alone, a recent electroencephalography (EEG) study using the same general task paradigm produced convergent evidence of an early over-activation of task-irrelevant representations in older adults, which interestingly was followed by successful suppression, as in the present study (Gazzaley, et al., 2008). Thus, both cognitive load and cognitive aging appear to reduce the efficiency and/or speed with which distractor filtering operations can be brought to bear.
The fact that taxing young adults with a secondary WM task is sufficient to induce a top-down modulatory control deficit similar to that seen in the older population implies that older adults may have a reduced amodal WM capacity that results in their selective suppression deficit. While simple WM maintenance tasks, such as digit span, are minimally affected by normal aging (Dobbs & Rule, 1989), performance on complex WM span tasks, which assess the ability to use controlled attention in the face of interference, shows age-related impairments (Gazzaley, Sheridan, Cooney, & D'Esposito, 2007; Wingfield, Stine, Lahar, & Aberdeen, 1988). Even though the digit WM task used in our study was a simple maintenance task, when performed in conjunction with a visual WM task that included distracting images, the same higher level domain-general WM resources depleted by aging were likely taxed. Future work will be needed to isolate the neuroanatomical substrates of capacity limitations in this critical controlled attention component of the WM system, especially given that individual differences in WM capacity are known to be highly correlated with individual differences in distractor filtering capabilities (Conway, Cowan, & Bunting, 2001; Kane & Engle, 2003; Vogel, et al., 2005). While frontal regions have been implicated in both age-related cognitive impairments (Hedden & Gabrieli, 2004; West, 1996) and individual differences in WM capacity (Kane & Engle, 2002), it remains to be seen whether the distractor filtering impairment observed in younger adults under high WM load is functionally homologous to that observed in older adults. To the extent that these goal-directed attentional control impairments are attributable to processing limitations in common brain structures, WM load manipulations, such as the dual-task approach used in the present study, may provide a useful tool for simulating and investigating both psychological and neurobiological aspects of cognitive aging.
This work was supported by an NSF Graduate Research Fellowship and an NIH National Research Service Award (J.R.), and grants from the National Institutes of Health (A.G. and M.D.) and the Veterans Administration Research Service (M.D.).
Footnote 1: fMRI data from the Gazzaley et al. (2005) study were previously reported only as GLM parameter estimates from 7-voxel left-lateralized scene-selective ROIs. To improve our ability to compare results across studies, we re-defined the ROIs from that study using the same approach implemented in the present study. It is also important to note that, beyond the addition of the digit WM load manipulation, the experimental parameters of the present study differ slightly from those of Gazzaley et al. (2005), including changes in the delay length, ITI, and number of trials per condition.