Data are reported from 18 right-handed, native English-speaking volunteers (9 female; age range: 18–27 yrs; mean = 20.4, SD = 2.9), each of whom gave written informed consent in accordance with procedures approved by the Institutional Review Board of Stanford University. All participants were recruited from the Stanford community, reported no history of neurological trauma or disease, and were remunerated at a rate of $20/hr. Data from two additional participants were acquired but excluded from analyses because they failed to exhibit a behavioral reorienting effect (see below).
Stimuli consisted of 660 black-and-white line drawings of common objects and 80 artificial ‘greeble’ objects (Gauthier & Tarr, 1997). Common objects were drawn from the International Picture Naming Project (Szekely et al., 2004) and from a set previously used by Uncapher & Rugg (2009). Greebles and common objects were transformed into black-and-white line drawings in Photoshop CS v8.0 (Adobe Systems Inc.). For the purpose of counterbalancing, common objects were divided into 33 sets of 20 objects each, and greebles into five sets of 16 objects each. Object sets were rotated across study and test lists across subjects, and greebles across study lists. For each subject, 10 study lists were created from these sets, each containing 40 objects and 16 greebles. In each study list, 46 stimuli were presented in the cued location (‘valid’ trials, see below; 32 objects and 14 greebles) and the remaining 10 in the non-cued location (‘invalid’ trials; 8 objects and 2 greebles), yielding an 18% probability that a stimulus would be invalidly cued.
Of the 400 common objects encountered during the study phase, 350 were carried forward into the test phase (to minimize test list length). Thus, in each of 10 sessions, 35 common objects served as the critical study items (27 valid and 8 invalid). In the later memory test, 180 common objects served as foils. A separate set of 10 common objects and 3 greebles was used to create a practice study list.
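The trial counts above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the text; the variable names are illustrative and not part of the original materials.

```python
# Per-list trial counts as stated in the text.
valid = {"objects": 32, "greebles": 14}
invalid = {"objects": 8, "greebles": 2}

n_valid = sum(valid.values())        # stimuli in the cued location
n_invalid = sum(invalid.values())    # stimuli in the non-cued location
n_total = n_valid + n_invalid        # stimuli per study list

# Probability that a stimulus is invalidly cued.
p_invalid = n_invalid / n_total

# Totals across the 10 study lists.
n_lists = 10
studied_objects = n_lists * (valid["objects"] + invalid["objects"])
tested_objects = n_lists * (27 + 8)  # critical items per session: 27 valid, 8 invalid

print(n_total, round(100 * p_invalid, 1), studied_objects, tested_objects)
```

Under these counts each list holds 56 stimuli, of which 10 are invalidly cued, which rounds to the 18% reported above.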
Within the present task context, ‘top-down attention’ refers to the goal-directed allocation and maintenance of visuospatial attention evoked by a preparatory cue (described below), and ‘bottom-up attention’ refers to a shift of visuospatial attention evoked by an object occurring outside the focus of attention. We are agnostic as to whether bottom-up shifts of attention occur by means of automatic, involuntary, or ‘exogenous’ processes vs. goal-directed, voluntary, or ‘endogenous’ processes (for a review and discussion see Corbetta et al., 2008; see also Posner & Cohen, 1984; Downar et al., 2001; de Fockert et al., 2004; Kincade et al., 2005; Serences et al., 2005; Indovina & Macaluso, 2007; Burrows & Moore, 2009).
The experiment consisted of 10 scanned incidental study sessions followed by one non-scanned memory test. Each study session lasted approximately 6.5 min, and the memory test lasted approximately 40 min. The interval between the end of the study phase and the start of the test phase was approximately 10–15 min. Participants received instructions and practiced the study task prior to entering the scan suite; study task performance was analyzed during this training session to ensure that participants understood and complied with the task instructions (see below). Moreover, eye movements were visually monitored by two investigators and feedback was given to ensure that participants could perform the study task while maintaining fixation; participants were trained to ceiling level, continuing until no visually detectable saccades occurred at any point during the training session. We acknowledge that this training does not ensure that eye movements did not occur during scanning. However, given recent studies of overt vs. covert attentional shifts that demonstrate that the same fronto-parietal regions are engaged when eye movements occur and when they do not (e.g., Ikkai & Curtis, 2008), we believe the possible presence of eye movements in the present study does not alter the interpretation of the findings. We also note that eye movements were not recorded during scanning in many prior studies of Posner attention cueing, including the study to which we compare our attention-related effects (Corbetta et al., 2000).
During each study session, participants viewed a black screen containing two white boxes (subtending 5° horizontal and vertical angles) appearing to the right and left of a white central fixation crosshair (Figure 1). The nearest edges of the boxes subtended 1° horizontal angle from central fixation. The beginning of each trial was indicated by the appearance of a green arrow cue, which replaced the central fixation for 1 s. The arrow pointed to the left or right, cueing subjects to covertly shift their attention (without moving their eyes) to the corresponding box. After a variable cue-to-stimulus interval (CSI) of 1, 3, or 5 s, an object appeared for 500 ms in one of the two boxes. On 82% of trials, the object appeared in the cued box (‘valid’ trials), while on the remaining trials it appeared in the non-cued box (‘invalid’ trials). Upon appearance of the object, participants were to indicate whether the stimulus represented a real object or belonged to the artificial object class of greebles (index and middle finger key press, respectively); response hand was counterbalanced across subjects. Speed and accuracy were given equal emphasis in the task instructions. A variable intertrial interval (ITI) of 1.5, 3.5, or 5.5 s separated the offset of the object stimulus from the onset of the following cue. The variable CSIs and ITIs were pseudo-randomly distributed across trials such that the regressors in the General Linear Model (see fMRI data analysis) that estimated the neural responses to cues and objects were minimally correlated (< 0.13), thus allowing activity elicited by each phase in the trial (cue and object) to be independently assessed. Stimuli were presented in pseudo-random order, with no more than three trials of one item-type (objects/greebles) occurring consecutively. Stimuli were projected onto a mirror mounted on the MRI head coil.
Figure 1. Experimental design. Left: During the study phase, subjects were scanned while incidentally encoding a series of visually presented objects and greebles in a variant of the Posner cueing paradigm. Items were preceded by an arrow cue pointing to a box.
Immediately following the final study session, participants were transferred from the scanner to a neighboring testing suite and received instructions for the surprise memory test. The test list comprised 350 studied (old) and 180 unstudied (new) common objects, centrally presented, individually, in pseudo-random order. For each test object, participants were to indicate whether or not they had encountered the item in any of the study sessions, and to indicate their level of confidence. One of four responses (sure old, unsure old, unsure new, sure new) was made with the index or middle fingers of each hand. Old and new responses were made using separate hands, with high and low confidence indicated by middle and index fingers, respectively. The mapping of hands to old or new responses was counterbalanced across participants.
Each test trial began when the white fixation crosshair changed to red for 500 ms, after which the test object was presented and remained onscreen until a response was made or until 4 s elapsed (if no response was made). If the object was judged to be new, the test advanced to the next trial. If the object was judged old, it remained onscreen for up to 4 more seconds, during which time participants attempted to recollect the location at which the object had appeared during study (cued with the appearance of a “Left?”, “Right?” prompt). Participants made a left or right index finger button press to indicate memory for the object appearing on the left or right side of the screen, respectively, and made their best guess when uncertain. A 1.5 s ITI (displaying a white fixation crosshair) separated test trials. To mitigate fatigue, participants were given a self-paced break after each third of the test.
fMRI data acquisition
Whole-brain imaging was performed on a 3T Signa MR scanner (GE Medical Systems). Anatomical images were collected using a high-resolution T1-weighted spoiled gradient recalled (SPGR) pulse sequence (130 slices; 1.5 mm thick; 256 × 256 matrix; 0.86 mm² in-plane resolution). Functional images were obtained from 30 4-mm-thick axial slices, aligned to the AC-PC plane and positioned to give full coverage of the cerebrum and most of the cerebellum, using a T2*-weighted two-dimensional gradient-echo spiral-in/out pulse sequence [repetition time (TR) = 2 s; echo time (TE) = 30 ms; flip angle 75°; 64 × 64 matrix; 3.44 mm² in-plane resolution]. Data were acquired in 10 sessions, each comprising 190 volumes. Volumes within sessions were acquired continuously in a descending sequential order. The first four volumes were discarded to allow tissue magnetization to achieve a steady state.
fMRI data analysis
Data preprocessing and statistical analyses were performed with Statistical Parametric Mapping (SPM5, Wellcome Department of Cognitive Neurology, London, UK: http://www.fil.ion.ucl.ac.uk/spm/software/spm5/), implemented in MATLAB 7.7 (The Mathworks, Inc.). For each participant, all volumes were realigned spatially to the first volume, and then to the across-run mean volume. All volumes were corrected for differences in acquisition times between slices (temporally realigned to the acquisition of the middle slice). The anatomical volume was coregistered to the mean functional volume, and then a unified segmentation procedure (Ashburner and Friston, 2005) was applied to segment the anatomical volume into gray matter, white matter, and cerebrospinal fluid. The segmented images were deformed to probabilistic maps of each tissue type in Montreal Neurological Institute (MNI) space, and the resulting deformation parameters were applied to the functional images for normalization. Functional images were resampled into 3-mm³ voxels using non-linear basis functions (Ashburner and Friston, 1999). Functional images were concatenated across sessions. Normalized functional images were smoothed with an isotropic 8-mm full width half maximum (FWHM) Gaussian kernel.
Statistical analyses were performed in two stages of a mixed effects model. In the first stage, neural activity was modeled by a delta function (impulse event) at the onset of each cue and each stimulus. These functions were convolved with a canonical hemodynamic response function (HRF) and its temporal and dispersion derivatives (Friston et al., 1998) to yield regressors in a General Linear Model (GLM) that modeled the BOLD response to each event type. The two derivatives modeled variance in latency and duration, respectively. Analyses of the parameter estimates pertaining to the dispersion derivative of cue-related effects are reported below (parameter estimates pertaining to the derivatives of other items contributed no theoretically meaningful information beyond that contributed by the canonical HRF, and thus are not reported).
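The first-stage modeling step (delta functions at event onsets, convolved with a canonical HRF) can be illustrated as follows. This is a minimal sketch and not the SPM5 implementation: it assumes a common double-gamma parameterization of the HRF (a gamma function with shape 6 minus one sixth of a gamma function with shape 16), omits the temporal and dispersion derivatives, and uses hypothetical function names and onsets.

```python
import math

def canonical_hrf(t):
    """Double-gamma HRF (assumed parameterization): response minus scaled undershoot."""
    if t <= 0:
        return 0.0
    response = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return response - undershoot / 6.0

def convolve_events(onsets_s, n_scans, tr=2.0, dt=0.1, hrf_len_s=32.0):
    """Place a delta function at each onset, convolve with the HRF, sample once per TR."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_fine
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0
    kernel = [canonical_hrf(i * dt) for i in range(int(round(hrf_len_s / dt)))]
    fine = [0.0] * n_fine
    for i, s in enumerate(sticks):
        if s:
            for j, k in enumerate(kernel):
                if i + j < n_fine:
                    fine[i + j] += s * k
    step = int(round(tr / dt))
    return fine[::step]  # one value per scan

# Hypothetical example: a single cue at time 0, 20 scans at TR = 2 s.
regressor = convolve_events([0.0], n_scans=20)
```

The resulting vector has one entry per volume and would form one column of the design matrix; in the analysis described above, a column of this kind exists for each event type.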
The timeseries were high-pass filtered to 1/128 Hz to remove low-frequency noise and scaled to a grand mean of 100 across both voxels and scans. As described below, parameter estimates for events of interest were estimated using one of two GLMs. Nonsphericity of the error covariance was accommodated by an AR(1) model, in which the temporal autocorrelation was estimated by pooling over suprathreshold voxels (Friston et al., 2002). The parameters for each covariate and the hyperparameters governing the error covariance were estimated using Restricted Maximum Likelihood (ReML). Effects of interest were tested using linear contrasts of the parameter estimates. These contrasts were carried forward to a second stage in which subjects were treated as a random effect. Unless otherwise specified, whole-brain analyses were employed when no strong regional a priori hypothesis was possible; in these cases, only effects surviving an uncorrected threshold of p < .001 and including four or more contiguous voxels were interpreted. When we held an a priori hypothesis about the localization of a predicted effect, we corrected for multiple comparisons by employing a small volume correction (SVC) using family-wise error (FWE) rate based on the theory of random Gaussian fields (Worsley et al., 1996). The peak voxels of clusters exhibiting reliable effects are reported in MNI coordinates.
Regions of overlap between the outcomes of two contrasts were identified by inclusively (or ‘conjunctively’) masking the relevant SPMs. When two contrasts are independent, the statistical significance of the resulting SPM can be computed using Fisher’s method of estimating the conjoint significance of independent tests (Fisher, 1950; Lazar et al., 2002). For each inclusive masking procedure, the constituent contrasts were determined to be independent with tests of orthogonality for linear contrasts. This test concludes that two linear contrasts are statistically independent if the sum of the products of the coefficients is equal to zero; i.e., for contrasts A and B with coefficients $a_i$ and $b_i$: if $\sum_i a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n = 0$, then the contrasts are orthogonal. The goal of inclusive masking in the present study was to search within regions that exhibited one pattern of activity to identify whether any voxels also showed a second pattern. As such, we maintained the original threshold of the contrast identifying the first pattern (p < .001), using a threshold of p < .05 for the masking contrast, to give a conjoint significance of p < .0005. Exclusive (or ‘disjunctive’) masking was employed to identify voxels where effects were not shared between two contrasts. The SPM constituting the exclusive mask was thresholded at p < .10, whereas the contrast to be masked was thresholded at p < .001. Note that the more liberal the threshold of an exclusive mask, the more conservative is the masking procedure.
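Both steps of the inclusive masking procedure, the orthogonality check and the conjoint significance, reduce to short calculations. The sketch below is illustrative only: the function names and the example contrast vectors are hypothetical, and the Fisher computation uses the closed form of the chi-square survival function with 4 degrees of freedom, which applies when exactly two independent p-values are combined.

```python
import math

def contrasts_orthogonal(a, b, tol=1e-12):
    """Two linear contrasts are orthogonal if the sum of coefficient products is zero."""
    if len(a) != len(b):
        raise ValueError("contrast vectors must have the same length")
    return abs(sum(x * y for x, y in zip(a, b))) <= tol

def fisher_conjoint_p(p1, p2):
    """Fisher's method for two independent tests.

    The statistic X = -2 * ln(p1 * p2) follows a chi-square distribution with
    4 degrees of freedom, whose survival function is exp(-X/2) * (1 + X/2).
    With k = p1 * p2 this simplifies to k * (1 - ln k).
    """
    k = p1 * p2
    return k * (1.0 - math.log(k))

# Hypothetical contrasts over four regressors.
print(contrasts_orthogonal([1, -1, 0, 0], [0, 0, 1, -1]))

# Thresholds stated in the text: p < .001 masked by p < .05.
print(fisher_conjoint_p(0.001, 0.05))
```

For thresholds of .001 and .05 the combined value is approximately .00055, in line with the conjoint significance of roughly .0005 given above.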
To identify the relationship between attention effects and encoding effects, we orthogonalized the two factors in our design: (a) top-down attention effects were identified by interrogating cue-related activity for all items, collapsing over subsequent memory status, (b) bottom-up attention effects were identified by comparing validly cued vs. invalidly cued items, and (c) positive and negative subsequent memory effects were identified by comparing subsequently remembered vs. forgotten items. To accommodate these analyses, two GLMs were estimated. The first was optimized to model stimulus-related effects (i.e., subsequent memory effects and bottom-up attention effects). Study trials were segregated according to subsequent memory [high confidence hits (HCH), low confidence hits (LCH), misses (M)] and cue validity (valid, invalid), resulting in six object-related regressors. Two additional regressors modeled greebles (validly and invalidly cued). Objects for which memory was not later tested were modeled as events of no interest, as were objects for which a response was omitted at test. Nine additional regressors modeled the cue-related activity associated with each of the aforementioned stimulus types. Finally, six regressors modeled movement-related variance (three rigid-body translations and three rotations determined from the realignment stage), and session-specific constant terms modeled the mean over scans in each session.
The second GLM implemented a parametric modulation analysis designed to detect top-down attention effects. Previous Posner cueing studies (e.g., Corbetta et al., 2000) identified top-down attention effects by isolating activity elicited by cues to shift covert attention. In order to separately estimate neural responses associated with cues vs. stimuli, prior studies employed ‘catch trials’ in which no stimulus appeared post-cue (e.g., Corbetta et al., 2000). Here we opted to de-correlate cue- and stimulus-related activity by parametrically varying the cue-to-stimulus interval (CSI: 1, 3, or 5 s). Furthermore, because cue-related activity could reflect not only top-down attention effects but also low-level visual responses to the cue itself, we used a parametric modulation analysis to distinguish activity transiently responding to the cue from that responding across the duration of the variable CSIs. In other words, by isolating BOLD responses that were elicited by the cue to shift attention and that were also sustained throughout the variable interval over which attention was to be maintained, we could rule out the possibility that any cue-related effects were simply a reflection of low-level visual responses. To accomplish this, we included an orthogonal variable in the GLM to isolate top-down effects, identifying cue-related activity that increased with increasing CSI duration. It should be noted that because the HRF effectively integrates the total activity over a period of a few seconds, parametric modulators identify activity that is modulated in duration or magnitude (or both). As we discuss below, the obtained data suggest that the parametric modulator captured variance in duration of the cue-related HRF. In sum, for this second GLM, one regressor modeled all cue-related activity, and a second regressor parametrically modulated the first according to the CSI following each cue. Thus, the second regressor for each subject in this GLM was of equal length to the first regressor and comprised a vector of values where each value represented the duration of the CSI for the corresponding cue. This regressor identified voxels in which cue-related activity varied as a linear function of the duration over which attention was to be maintained. To accommodate the remainder of the known variance, a third regressor modeled all stimulus-related activity (not segregated according to subsequent memory or validity), and movement and session effects were modeled as described above.
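The relationship between the main cue regressor and its parametric modulator can be sketched at the level of event weights, before any HRF convolution. This is a simplified illustration, not the SPM5 routine: it assumes that the modulator is made orthogonal to the main regressor by mean-centering the CSI values, and the function name and CSI sequence are hypothetical.

```python
def parametric_modulator(csi_s):
    """Mean-center per-cue CSI durations so the modulator is orthogonal to a constant weight."""
    mean_csi = sum(csi_s) / len(csi_s)
    return [c - mean_csi for c in csi_s]

# Hypothetical CSI sequence (in seconds) for six cues.
csi = [1, 3, 5, 3, 1, 5]

main = [1.0] * len(csi)             # first regressor: every cue weighted equally
mod = parametric_modulator(csi)     # second regressor: one value per cue, same length

# Orthogonality check: the sum of products of the two weight vectors is zero.
dot = sum(m * p for m, p in zip(main, mod))
print(mod, dot)
```

Because the modulator sums to zero, it captures only cue-related variance that scales linearly with CSI duration, leaving the mean cue response to the first regressor.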
An alternative method of interrogating sustained BOLD responses across the CSI is to analyze effects associated with the dispersion derivative of the cue-related HRF. In the present design, cue-related attention effects (but not cue-related visual effects) might show a more sustained response, which would be captured by significant loading on the dispersion derivative. Accordingly, below we report the cue-related dispersion derivative outcome, which confirmed the parametric modulator outcome. However, we note that a dispersion derivative analysis is less optimal than a parametric modulation analysis for our design for two reasons. First, a dispersion derivative analysis can only accommodate variance in the duration of the HRF up to approximately 1 s, and here the CSI was 1, 3, or 5 s. Second, the differing durations of the CSI are suboptimally modeled by a dispersion derivative, which is fit across all trial types (i.e., 1, 3, and 5 s CSIs). This trial-by-trial variability is better modeled by a parametric modulation analysis (Henson, 2007).
Response profiles of regions of interest (ROIs) identified in map-wise analyses were investigated using the deconvolution algorithm implemented in MarsBar (marsbar.sourceforge.net). This algorithm deconvolves the BOLD signal in the ROI using a finite impulse response function, which assumes no shape for the hemodynamic response. From these analyses, the cluster-wise integrated percent signal change was extracted for further characterization of the functional response (see Results for details).
The second main goal of the present study was to determine whether the functional dynamics of the putative dorsal and ventral attention networks changed when an event memory was effectively formed relative to when one was not. An understanding of how these attention networks dynamically interact with other neural structures during the encoding of event information may inform how episodic memories are created. We therefore sought to determine whether neural components of the dorsal and ventral attention networks showed connectivity profiles that predicted later memory success or failure.
Multivariate connectivity analyses were conducted by submitting seed regions – ROIs identified in the univariate analyses as exhibiting top-down or bottom-up attention effects (see next paragraph) – to psychophysiological interaction (PPI) analyses to determine whether they showed memory-related functional connectivity with other regions in the brain (Friston et al., 1997). In this manner, we investigated whether the connectivity between attention-related regions and other brain regions differed as a function of encoding success or failure.
Seed regions were components of PPC that exhibited relevant attention effects: a left dPPC (mIPS/SPL) region that displayed top-down attention effects, and bilateral vPPC (TPJ) regions that displayed bottom-up attention effects (for details, see Results section, ‘Neural correlates of top-down and bottom-up attention’). Using standard PPI analysis techniques, seed clusters were individually defined for each subject on the basis of the random effects group analyses. For each subject, the data for each seed region was the principal eigenvariate of all significant (p < .05) voxels within a 4-mm sphere, centered on the local peak maximum that fell within 2*FWHM of the smoothing kernel (i.e., 16 mm) and was within the anatomical region of interest, identified from each subject’s normalized structural scan. These subject-specific timeseries represented the ‘physiological’ component of the PPI. All but one of the 18 subjects met all criteria for every seed region; PPI analyses were performed on these 17 subjects. The timeseries were adjusted for variance associated with effects of no interest, and then a deconvolution with the hemodynamic responses was performed. The resultant vector was weighted by a contrast vector representing the relevant ‘psychological’ factor (in this case, a subsequent memory contrast: HCH vs. M), and then reconvolved with the hemodynamic responses. The outcome of this process formed the ‘psychophysiological interaction’, or PPI regressor. This regressor models the between-condition difference in regression slopes between each voxel in the brain and the seed region. This PPI regressor was entered into a GLM, along with regressors modeling main effects of the psychological and physiological factors (i.e., the condition contrast vector and the timeseries, respectively). In line with the univariate GLMs, we additionally modeled movement-related effects as well as session-specific constant terms. 
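The core of the PPI regressor is the element-wise product of the seed timeseries and the psychological contrast vector. The sketch below shows only that product and is a deliberate simplification: it multiplies at the level of the measured signal and omits the deconvolution and reconvolution steps described above. The function name, seed values, and condition coding are hypothetical.

```python
def ppi_regressor(seed, psych):
    """Interaction term: mean-centered seed signal multiplied by the psychological vector."""
    if len(seed) != len(psych):
        raise ValueError("seed and psychological vectors must have the same length")
    mean_seed = sum(seed) / len(seed)
    return [(s - mean_seed) * p for s, p in zip(seed, psych)]

# Hypothetical seed timeseries (the 'physiological' factor), one value per scan.
seed = [1.0, 2.0, 3.0, 4.0]

# Hypothetical 'psychological' factor: +1 for HCH scans, -1 for M scans, 0 otherwise.
psych = [1, -1, 0, 1]

ppi = ppi_regressor(seed, psych)
print(ppi)
```

In a full model, this interaction vector would be entered alongside the seed timeseries and the psychological vector themselves, so that only variance unique to the interaction is attributed to the PPI term.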
Because PPI analyses are inherently less powered than their univariate counterparts (as only the unshared variance between the three regressors is attributed to the interaction term, or the PPI regressor), we adopted a whole-brain uncorrected threshold of p < .005, four-voxel extent. Voxels that surpassed threshold in these PPI analyses can be interpreted as showing a significant difference in connectivity with the attention-related seed region as a function of later memory outcome.