Under conditions of rapid serial visual presentation (RSVP), subjects display a reduced ability to report the second of two targets (Target 2; T2) in a stream of distractors if it appears within 200–500 ms of Target 1 (T1). This effect, known as the attentional blink (AB), has been central in characterizing the limits of humans’ ability to consciously perceive stimuli distributed across time. Here we review theoretical accounts of the AB and examine how they explain key findings in the literature. We conclude that the AB arises from attentional demands of T1 for selection, working memory encoding, episodic registration and response selection, which prevents this high-level central resource from being applied to T2 at short T1–T2 lags. T1 processing also transiently impairs the re-deployment of these attentional resources to subsequent targets, and the inhibition of distractors that appear in close temporal proximity to T2. While these findings are consistent with a multi-factorial account of the AB, they can also be largely explained by assuming that the activation of these multiple processes depends on a common capacity-limited attentional process to select behaviorally relevant events presented amongst temporally distributed distractors. Thus, at its core, the attentional blink may ultimately reveal the temporal limits of the deployment of selective attention.
Our visual environment constantly changes across the dimensions of both time and space. Within the first few hundred milliseconds of viewing a scene, the visual system is bombarded with much more sensory information than it is able to process up to awareness. To overcome this limitation, humans are equipped with filters at a number of different levels of information processing. For example, high-resolution vision is restricted to the fovea with acuity drastically reduced at the periphery. Such front-end mechanisms reduce the initial input; however, they still leave the visual system with an overwhelming amount of information to analyze. To meet this challenge, the human attentional system prioritises salient stimuli (targets) that are to undergo extended processing and discards stimuli that are less relevant for behavior after only limited analysis (Broadbent, 1958; Bundesen, 1990; Desimone & Duncan, 1995; Duncan, 1980; Kahneman, 1973; Neisser, 1967; Pashler, 1998; Shiffrin & Schneider, 1977; Treisman, 1969).
Given the vital role attention plays in visual cognition, it is not surprising that over the last 50 years understanding the nature of the mechanisms involved in visual attention has been one of the major goals of both cognitive science and neuroscience (Miller, 2003). For the most part, this research has focussed on understanding how humans process information distributed across space (e.g., see Pashler, 1998, for an extensive review). However, in the last 15 years there has been intense interest amongst researchers in the mechanisms and processes involved in deploying attention across time (see Shapiro, Arnell & Raymond, 1997, for an earlier review).
Here we review research on temporal attention, focussing specifically on what is arguably the most widely studied effect in the field: the attentional blink (AB; Raymond, Shapiro & Arnell, 1992). We begin by briefly discussing rapid serial visual presentation (RSVP) (Potter & Levy, 1969), which is the paradigm primarily used to study the AB, and then describe the AB effect and theoretical accounts, both informal and formal, that have been put forward to explain the phenomenon. Following this, we examine how key findings in the literature fit with each model, and conclude by highlighting the mechanisms that are most likely responsible for the AB. Research exclusively investigating the neural substrates of the AB will not be discussed here, as this has recently been summarised elsewhere (see Hommel, Kessler, Schmitz, Gross, Akyürek, Shapiro & Schnitzler, 2006; Marois & Ivanoff, 2005).
In rapid serial visual presentation (RSVP; Potter & Levy, 1969; Fig. 1A), stimuli appear sequentially at the same spatial location, for a fraction of a second each (e.g., 100 ms), and subjects are typically required to report either all the items presented (full report) or to report pre-specified target item(s) and ignore the remaining distractor stimuli (partial report). The basic rationale behind RSVP paradigms is that, by stressing the temporal processing mechanisms to their limit, researchers are able to assess the rate at which information is analyzed and encoded (Chun & Wolfe, 2001; Coltheart, 1999).
A striking characteristic of temporal attention is that, even with RSVP rates of up to approximately 16 items/second, the selection and encoding of a single target is quite easy. Lawrence (1971) found that, at this presentation rate, target report accuracy was approximately 70% with RSVP streams of words that contained a single target defined either featurally (upper case letters as opposed to lower case letters) or categorically (animal word amongst non-animal words). Similarly, in a seminal paper, Potter (1975; see also Potter & Faulconer, 1975; Thorpe, Fize & Marlot, 1996) demonstrated that, when subjects searched RSVP streams of scenes (8 items/second), accuracy was comparable whether they looked for a particular stimulus they had seen previously or one that had simply been described to them.
The results discussed above could be taken to suggest that target processing in RSVP is complete after only 100 ms. However, it can be shown that this is not the case when an additional target is added to an RSVP stream. In fact, at presentation rates of approximately 100 ms/item, subjects show a remarkable deficit in reporting the second (T2) of two different targets presented among distractors if it appears within approximately 200–500 ms of the first target (T1; Broadbent & Broadbent, 1987; Raymond et al., 1992). This effect is the AB (Fig. 1B) and is an important discovery as it helps characterize the limits of our ability to consciously perceive stimuli that are distributed across time (Sergent & Dehaene, 2004).
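The relation between serial position, lag and T1–T2 onset asynchrony in a dual-target stream can be sketched as follows. This is only an illustrative sketch: the function names, the stream length and the target positions are hypothetical, and the default of 100 ms/item simply mirrors the presentation rate mentioned above.

```python
from typing import List, Tuple


def build_rsvp_stream(n_items: int, t1_index: int, lag: int,
                      soa_ms: int = 100) -> List[Tuple[int, str]]:
    """Return (onset_ms, role) pairs for one dual-target RSVP trial.

    `lag` is the serial-position distance between T1 and T2
    (lag 1 means T2 directly follows T1). All names are illustrative.
    """
    t2_index = t1_index + lag
    if lag < 1 or t1_index < 0 or t2_index >= n_items:
        raise ValueError("targets must fit inside the stream with lag >= 1")
    stream = []
    for i in range(n_items):
        if i == t1_index:
            role = "T1"
        elif i == t2_index:
            role = "T2"
        else:
            role = "distractor"
        # Items appear one after another at the same location,
        # so onset time is just serial position times the item SOA.
        stream.append((i * soa_ms, role))
    return stream


def t1_t2_soa_ms(stream: List[Tuple[int, str]]) -> int:
    """Onset asynchrony between T1 and T2 in milliseconds."""
    onsets = {role: onset for onset, role in stream if role != "distractor"}
    return onsets["T2"] - onsets["T1"]
```

With a 15-item stream, T1 at position 4 and T2 at lag 3, the two targets are separated by 300 ms, which falls inside the 200–500 ms window in which the second-target deficit is typically observed.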
Broadbent and Broadbent (1987) were the first to report an AB when they presented subjects with RSVP streams of words containing two targets defined either by category or letter case. On trials where the first target was reported correctly, second target performance was impaired if it appeared up to half a second after the first target. Broadbent and Broadbent (1987) explained this result by proposing that while early perceptual features were extracted in parallel from RSVP streams, at short temporal intervals target identification processes interfered with one another, thereby resulting in the second target deficit.
A similar post-target processing deficit was found by Weichselgartner and Sperling (1987). In one of their experiments, subjects were presented with RSVP streams of digits at the rate of 100 ms/item, and their task was to name an outlined or bright digit (T1) and then the three stimuli that directly followed it. Subjects typically reported T1, the subsequent item, and then the stimulus that appeared 400 ms after the target. Weichselgartner and Sperling (1987; see also Reeves & Sperling, 1986) took this pattern of results as evidence for the existence of two partially overlapping attentional processes: A fast acting automatic process responsible for detecting (identifying) T1 as well as the item that directly followed it, and a slow effortful process that led to the recall of stimuli presented later in the stream.
Raymond et al. (1992), who first coined the term “attentional blink”, provided a crucial extension to the earlier work by demonstrating that the previously observed target-processing deficit was an attentional, rather than a sensory, limitation. In their experiment, RSVP streams of black letter stimuli were presented at the rate of 100 ms/item and the subjects were required to name a single white target letter (T1) and detect the presence/absence of the letter “X” as T2. Raymond et al. (1992) found that on trials where T1 was reported correctly, T2 performance was impaired if it appeared within half a second of the first target. Crucially, detection of the second target strongly improved when subjects ignored T1. This finding demonstrated that the effect was due to attentional rather than sensory limitations, as the same visual stimuli yielded different effects depending on task requirements. Two other important characteristics of the AB were also revealed in the Raymond et al. (1992) study. While T2 accuracy was impaired if it appeared within 200–600 ms of T1 (and T1 required report), there was virtually no deficit when the second target was presented directly after the first target, an effect now known as lag 1 sparing (see Figure 1; Potter, Chun, Banks & Muckenhoupt, 1998; Visser, Bischof & Di Lollo, 1999). In addition, T2 performance was strongly improved when T1 was followed by a blank gap in the RSVP stream, rather than by a distractor, suggesting that the stimulus following the first target plays a vital role in generating the AB. Before turning to a discussion of these and other key findings in the literature, we first review theories of the AB and Lag 1 sparing.
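Because the deficit is assessed on trials where T1 was reported correctly, the dependent measure is T2 accuracy conditional on T1 accuracy, computed separately for each lag. A minimal sketch of that computation is given below; the function name and the trial format (lag, T1 correct, T2 correct) are assumptions made for illustration, and any data passed to it would be the analyst's own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def t2_given_t1_by_lag(
    trials: Iterable[Tuple[int, bool, bool]]
) -> Dict[int, float]:
    """Proportion of correct T2 reports among T1-correct trials, per lag.

    Each trial is (lag, t1_correct, t2_correct). Trials on which T1 was
    missed are excluded, as in the conditional T2|T1 measure.
    """
    # lag -> [number of T1-correct trials, number of those with T2 correct]
    counts = defaultdict(lambda: [0, 0])
    for lag, t1_ok, t2_ok in trials:
        if t1_ok:
            counts[lag][0] += 1
            counts[lag][1] += int(t2_ok)
    return {lag: hits / n for lag, (n, hits) in sorted(counts.items())}
```

Plotting the returned proportions against lag gives the familiar AB function, with lag 1 sparing visible as high accuracy at lag 1 followed by a dip at the subsequent lags.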
Several theoretical accounts have been introduced to explain the AB. For the most part, theories of the phenomenon have been informal, with researchers simply describing the processes that underlie the effect. However, recently a number of computational frameworks have explicitly modeled the T2 deficit and other relevant findings. In this section, we begin by describing informal theories of the AB and then computational frameworks that formalize many of the ideas put forward in these purely descriptive accounts.
Raymond et al. (1992) proposed that the AB was the result of a suppressive mechanism that inhibited post-target stimuli in order to reduce target and distractor featural confusion. This gating theory predicts that, when viewing a dual-target RSVP stream, an attentional episode is triggered after the physical features (e.g., colour, shape) of the first target are detected. The initiation of this attentional episode is likened to a gate opening to admit T1 for the purpose of identification. When an item follows T1, its features will also be processed along with those of the first target, thus increasing the chance that the features of the two stimuli will be confused. For example, the colour of T1 may be incorrectly bound to the identity of the subsequent stimulus. In order to reduce interference from post-target stimuli and increase the probability that T1 will be correctly reported, the stimuli following the first target are suppressed at an early perceptual level. Raymond et al. (1992) likened this inhibitory process to the gate closing. This model assumes that this attentional gate stays closed until identification (e.g., colour and identity bound together) of T1 is complete, a process which Raymond et al. (1992) hypothesized took approximately 500 ms. Thus, the AB arises because T2 is inhibited when it appears in close temporal proximity to the first target. When the second target appears after T1 has been identified, the gate will no longer be closed and as a result the second target can be the subject of focussed attention. Lag 1 sparing is said to occur when T1 is followed directly by the second target because both stimuli are admitted by the gate and undergo identification processes together. Furthermore, Lag 1 sparing is deemed to be dependent on T1 and T2 not being separated by a distractor rather than the two target stimuli appearing within 100 ms of one another.
Recently, Olivers and his colleagues (Olivers, Van der Stigchel & Hulleman, 2007; see also Olivers & Meeter, 2008) have revised and extended Raymond et al.’s (1992) inhibition hypothesis. They suggest that inhibition is not initiated to prevent color binding errors between T1 and T1+1, but rather to suppress distractors so that they do not interfere with target processing. This inhibition takes place at a relatively late stage of visual information processing, after the conceptual representations of the RSVP stimuli have been activated (but prior to working memory). In this revised inhibition framework, the T1+1 distractor is processed along with T1 because its temporal proximity to the first target confers it the same attentional enhancement (boosting) that T1 receives. To prevent additional distractors from receiving this attentional boost and interfering with T1 processing, post-T1+1 stimuli are strongly suppressed, thereby impairing T2 performance at short T1–T2 lags. This model is discussed in more detail in the formal model section (the boost and bounce theory).
In Raymond et al.’s (1992) gating theory, it is the potential for featural confusion during T1 identification that leads to the AB. Subsequently, Shapiro, Raymond and Arnell (1994) obtained results challenging the conclusion that the identification of T1 was necessary to elicit the second target deficit, as they found it also occurred when T1 only required detection. Consequently, Shapiro et al. (1994; see also Isaak, Shapiro & Martin, 1999; Shapiro & Raymond, 1994) proposed the interference theory to account for their findings.
Based on Duncan and Humphreys’s (1989) model of spatial visual search, Shapiro et al.’s interference theory assumes that, when viewing an RSVP stream, initial perceptual representations are established for each stimulus. These representations are compared to selection templates (generated from the task instructions) and those stimuli that most closely match are selected and registered in visual working memory. Once in this store, each item is assigned a weighting based on the available space and its similarity to the templates. Typically, both targets as well as the items that directly succeed each of them (T1+1 and T2+1) enter working memory due to their temporal proximity to the targets. In working memory, items interfere with one another as retrieval processes are undertaken during report of the targets. In this model, an AB occurs when the targets are separated by a short interval because T2 receives a diminished weighting due to the limited capacity of working memory, leaving it more open to interference from the other items in the store, and therefore reducing the likelihood of it being reported. Shapiro et al. (1994; see also Isaak et al., 1999; Shapiro & Raymond, 1994) suggest that an AB is not observed at long T1–T2 lags because visual working memory “may be flushed after sufficient time has passed with no demand made on it” (p.371; although this aspect of the theory appears to be difficult to reconcile with the fact that both targets require report in the standard AB task at the end of the RSVP stream and therefore would both need to be maintained even at long lags). This framework suggests that Lag 1 sparing occurs because stimulus interference is reduced when T2 appears directly after the first target, as only three items enter visual working memory: T1, T2 and T2+1. Thus, according to this model, Lag 1 sparing is determined by the characteristics of the T1+1 stimulus rather than the temporal gap between T1 and T2.
Chun and Potter (1995) presented a number of important findings for understanding the mechanisms responsible for the AB. Firstly, they provided evidence that was inconsistent with Raymond et al.’s (1992) gating theory, as they observed a significant AB when the targets were defined categorically (searching for two black letter targets among black digit distractors) rather than perceptually (red target among black distractors), thereby demonstrating that the deficit can still arise even when there is no potential for a feature conjunction error between the colour of T1 and the identity of the T1+1 stimulus. In addition, this result demonstrated that the AB was not the result of a task switch between the two targets as both letters required identification and were drawn from the same stimulus set. Finally, by revealing how the AB was modulated by the extent to which the targets and distractors were both featurally and categorically similar, Chun and Potter’s (1995) study also highlighted the influence of target-distractor discriminability on this deficit (see also Dux & Coltheart, 2005; Maki, Bussard, Lopez & Digby, 2003).
To account for their findings, Chun and Potter (1995) proposed a two-stage model of the AB. In stage 1, a stimulus activates its stored conceptual representation. Recognition at this stage occurs rapidly, so the specific identities of most items in an RSVP stream are available (Potter, 1975, 1976, 1993), but this information is volatile and susceptible to both decay and overwriting by subsequent stimuli. Consistent with this notion, Giesbrecht and Di Lollo (1998; see also Dell’Acqua, Pascali, Jolicœur, & Sessa, 2003; Giesbrecht, Bischof & Kingstone, 2003) demonstrated that an AB only occurs if T2 is backward masked. To avoid being overwritten, stimuli must undergo the capacity limited stage 2 processing, during which they are encoded and consolidated into working memory. Stage 2 processing is initiated when relevant target features are identified in stage 1, triggering a transient attentional response that leads to the target being encoded in working memory. The model explains the AB as being due to the severe capacity limitations of this second stage of processing. Consequently, when the second target is presented in close temporal proximity to T1, it must wait to be encoded into working memory until stage 2 processing of T1 is completed, and therefore is more susceptible to decay and interruption by distractors. Lag 1 sparing is said to occur when T2 appears directly after T1 because due to the slow temporal dynamics of the attentional system, T2 receives the same enhancement and access to stage 2 processing as the first target. Thus, this model predicts that Lag 1 sparing is chiefly determined by the temporal distance between T1 and T2. 
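The waiting logic of this two-stage account can be illustrated with a toy calculation. This is a sketch under loud assumptions, not Chun and Potter's (1995) model as specified: the stage 2 duration (500 ms), the lag 1 sparing window (100 ms), the exponential decay of the stage 1 representation and its time constant are all hypothetical values chosen only to make the qualitative pattern visible.

```python
import math


def t2_wait_ms(lag: int, soa_ms: int = 100, stage2_ms: int = 500,
               sparing_window_ms: int = 100) -> int:
    """Time T2 must wait in stage 1 before stage 2 becomes available.

    Assumptions (illustrative only): stage 2 is busy with T1 for
    `stage2_ms` after T1 onset, and a T2 arriving within
    `sparing_window_ms` of T1 enters stage 2 together with T1.
    """
    t1_t2_soa = lag * soa_ms
    if t1_t2_soa <= sparing_window_ms:
        return 0  # lag 1 sparing: T2 shares T1's attentional episode
    return max(0, stage2_ms - t1_t2_soa)


def t2_survival(lag: int, decay_tau_ms: float = 300.0, **kwargs) -> float:
    """Hypothetical chance that T2's stage 1 trace survives the wait.

    Exponential decay is an assumption; the account itself only states
    that the waiting representation is vulnerable to decay and to
    overwriting by subsequent stimuli.
    """
    return math.exp(-t2_wait_ms(lag, **kwargs) / decay_tau_ms)
```

Under these assumed parameters the wait is zero at lag 1, longest at lag 2 and back to zero once the T1–T2 interval reaches the assumed stage 2 duration, which reproduces the shape, though not the actual values, of the AB function.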
There is now considerable neuroimaging and electrophysiological support for a two-stage framework, as several studies have demonstrated that while visual areas respond to both missed and reported T2s, parietal-frontal regions selectively respond to reported second targets (see Gross, Schmitz, Schnitzler, Kessler, Shapiro, Hommel, & Schnitzler, 2004; Kranczioch, Debener, Schwarzbach, Goebel, & Engel, 2005; Marois, Yi & Chun, 2004).
A similar bottleneck model was proposed by Ward, Duncan and Shapiro (1996; see also Duncan, Ward & Shapiro, 1994) to account for the second-target deficit they observed with a modified AB task. Ward et al. (1996) investigated the speed with which attention could be shifted to targets when these items were distributed across both time and space. In their experimental conditions, only the two targets, followed by their respective masks, were presented on distinct corners of an invisible diamond. By varying the stimulus onset asynchrony (SOA) between the two targets, Ward et al. (1996) demonstrated that report of the second target was impaired, relative to a control single-target condition, if it appeared within approximately 450 ms of Target 1. Ward et al. (1996) proposed the attentional dwell time hypothesis to account for their results and those of the standard AB. This hypothesis asserts that the two target objects compete for limited capacity visual processing resources, with the winner of this competition undergoing extended processing at the expense of the loser. Because of its head start in the competition, T1 is typically the winner, thereby leaving T2 open to interference from the mask and therefore increasing the likelihood that it will go undetected. As lag 1 sparing was not found by Ward et al. (1996) they did not incorporate an explanation of this effect into their theory. Indeed, it should be noted that in a detailed meta-analysis Visser et al. (1999) concluded that lag 1 sparing is rarely observed under conditions where the target stimuli are spatially displaced or there is a multi-dimensional attentional set switch between T1 and T2 (e.g., T1 is a categorically defined letter whereas T2 is a color defined digit, see also Potter et al., 1998).
Jolicœur (1998, 1999; Jolicœur & Dell’Acqua, 1998; see also Ruthruff & Pashler, 2001) extended the two-stage account of Chun and Potter (1995) to explain not only the AB but also its relationship to the Psychological Refractory Period (PRP; see Pashler, 1994). The PRP refers to subjects’ tendency to respond more slowly to the second of two sensory-motor tasks as the SOA between them is reduced. This Task 2 postponement is thought to result from a central capacity-limited stage that prevents two response selection processes from being performed concurrently (Pashler, 1994). Jolicœur (1998) investigated the extent to which interference during response selection influenced the magnitude of the AB. His experiments used a similar methodology to that employed by Raymond et al. (1992) except that T1 required an immediate rather than a delayed response for some of the trials (T2 response was always offline). The inclusion of this speeded T1 task ensured that subjects performed response selection to Task 1 online, thereby creating at short lags a processing overlap between T1 response selection and T2 working memory encoding. Jolicœur’s (1998) results revealed that a larger AB was elicited in speeded T1 trials relative to unspeeded trials. Furthermore, the magnitude of the deficit increased as the T1 reaction times and number of T1 response alternatives increased. These findings provided clear evidence that response selection to Task 1 significantly exacerbates the AB. To account for these findings, Jolicœur (1998, 1999; Jolicœur & Dell’Acqua, 1998; see also Ruthruff & Pashler, 2001) proposed the central interference theory. This model is similar to Chun and Potter’s (1995) theory, with the key difference being that in Jolicœur’s (1998, 1999; Jolicœur & Dell’Acqua, 1998) framework, both response selection and working memory encoding require capacity-limited central processing.
Potter, Staub and O’Connor (2002) proposed a further extension to the two-stage account (Chun & Potter, 1995), challenging the hypothesis that T1 gained privileged access to limited capacity processing resources due to its temporal position. It had previously been found that at lag 1, performance for T2 was typically superior to that for T1 and that the report order of these two targets was often reversed (Chun & Potter, 1995; see also Hommel & Akyürek, 2005), hinting that T1 may not always be the first item to enter the bottleneck. To test this hypothesis, Potter et al. (2002; Bachmann & Hommuk, 2005) presented a word target in each of two concurrent, spatially displaced, RSVP streams of symbol distractors (one stream above the other) to reduce the temporal lag between the targets without altering stimulus duration. The results demonstrated that when the targets were separated by 13–53 ms, report of T2 was superior to that of T1, which is the opposite pattern of results to those typically found in AB experiments. By contrast, at an SOA of 100 ms T1 and T2 performance was comparable (this is an example of Lag 1 sparing occurring with spatially displaced targets), and at an SOA of 213 ms the standard T2 deficit emerged. The superior report of T2 at very short SOAs indicated that T1 is not always consolidated before T2.
To account for their findings, Potter et al. (2002) proposed the two-stage competition model of visual attention. This theory postulates that targets compete in stage 1 to gain access to the capacity-limited second stage of processing, with the first target that is initially identified entering the stage 2 bottleneck first. The model explains the T1 deficit at very short lags (13–53 ms) as follows: When T1 is detected, an attentional window is opened and the processes involved in initial identification begin. Crucially, when T2 appears very shortly after T1, it benefits from the attentional window having already been opened, and accrues resources faster than would have been the case if the attentional window had not been opened after T1 detection. As a result, T2 is identified more efficiently and enters the bottleneck before the first target (although the model is not specific about how resources would accrue faster for T2 than for T1 at these short SOAs). By contrast, at the presentation rates typically used in RSVP tasks (approximately 100 ms per item), T1 will have already been identified and gained access to stage 2 before the second target arrives, therefore resulting in a T2 deficit. According to this account, Lag 1 sparing is dependent on T1 and T2 having an SOA of approximately 100 ms because under these conditions both the T1 and T2 stage 1 representations are stable enough so that attention (stage 2 processing) can process both items without cost.
While Chun & Potter’s (1995) original hypothesis that the AB primarily results from a central bottleneck of information processing has been incorporated in several recent models of the AB, there is considerable debate regarding the number and location (along the information processing pathways) of such bottlenecks. For one, Awh, Serences, Laurey, Dhaliwal, van der Jagt and Dassonville (2004) challenged the hypothesis that the depletion of a limited capacity central processing resource was responsible for the AB. These researchers suggested that rather than reflecting the competition for a single visual processing channel (stage/resource), the AB arises from capacity limited processing in a multitude of processing channels. While they observed an AB when a face target temporally preceded a letter/digit target, no such T2 cost was observed when the order of target presentation was reversed (letter/digit first, then face). The data were explained by the hypothesis that face recognition engages both featural and configural information processing, thereby transiently preventing the featural channel from processing any subsequent letter/digit stimuli, whereas letter/digit identification relies only on the featural channel, thereby allowing the configural channel to process any subsequently presented face stimuli. However, Awh et al.’s findings of multiple bottlenecks have recently been questioned by Landau and Bentin (2008; see also Jackson & Raymond, 2006), with these researchers suggesting that Awh et al.’s (2004) results reflect the salience of face stimuli rather than the existence of multiple bottlenecks. Moreover, and as mentioned earlier, the finding that drawing on the central processing stage of response selection affects the AB (Jolicœur, 1998, 1999; Jolicœur & Dell’Acqua, 1998; Ruthruff & Pashler, 2001) suggests that the deficit involves a central amodal bottleneck of information processing. 
This is further reinforced by studies that have observed an AB even when the two target stimuli originate from different modalities (auditory vs. visual; e.g., Arnell & Duncan, 2002; Arnell & Jenkins, 2004; Arnell & Jolicœur, 1999; Arnell & Larson, 2002; but see Chun & Potter, 2001; Duncan, Martens & Ward, 1997; Potter et al., 1998), although it has been difficult to rule out a task-switching account of this cross-modal attention deficit (Chun & Potter, 2001; Potter et al., 1998). Similarly, it is still unsettled as to whether this central amodal stage of information processing wholly encompasses the AB bottleneck, or whether this deficit arises from processing limitations at both this amodal stage and at an earlier visual stage of information processing (Chun & Potter, 2001; Jolicœur, Dell’Acqua & Crebolder, 2001; Ruthruff & Pashler, 2001).
A final extension to Chun and Potter’s (1995; see also Potter et al., 2002) bottleneck theory is that offered by Dux and Harris (2007a), who tested whether the encoding bottleneck also limited distractor inhibition. Dux and Harris (2007a; see also Drew & Shapiro, 2007) presented subjects with RSVP streams similar to those employed by Chun and Potter (1995), with black letter targets appearing amongst black digit distractors. The crucial manipulation was that the items directly preceding (T1−1) and succeeding (T1+1) the first target were identical to one another on half the trials and different on the other half. Dux and Harris (2007a) reasoned that if target selection involves distractor inhibition, then repeating the distractors on either side of T1 would reduce the masking strength of the T1+1 distractor as its representation would have been suppressed by the earlier presentation of the same character. This suppression of the T1+1 distractor would in turn improve T1 accuracy and, therefore, reduce the AB. This is indeed what Dux and Harris (2007a) observed, suggesting that distractor inhibition plays a key role in RSVP target selection (see also Dux, Coltheart & Harris, 2006; Dux & Marois, 2008; Maki & Padmanabhan, 1994; Olivers & Watson, 2006). Importantly, in a subsequent experiment, when they repeated the distractors on either side of the second target instead of the first, Dux and Harris (2007a) found that distractor repetition did not benefit T2 report when the T2−1 distractor was presented during the blink. These data suggest that distractor suppression is impaired by the AB bottleneck because it does not take place unless the distractor receives attention. Consistent with this notion, Dux and Harris (2007a) did observe distractor suppression when the T2−1 distractor was presented at lag 1, a position that favors attentional processing of that distractor along with the first target (see also Dux & Marois, 2008).
It should be noted that Drew and Shapiro (2006) also found that the AB was attenuated when T1 was temporally flanked by repeat distractors. However, these authors proposed a different account to Dux and his colleagues (Dux et al., 2006; Dux & Harris, 2007a), suggesting that this effect was caused by the same mechanism(s) responsible for repetition blindness (RB; see Kanwisher, 1987). RB refers to subjects’ impaired ability to report both occurrences of a repeat stimulus in RSVP if they appear within 500 ms of one another, and is typically thought to result from subjects’ failure to register both repeat targets as episodically distinct objects rather than from inhibition of the second target (e.g., see Kanwisher, 1987). Dux et al. (2006) suggested that the distractor repetition effect and RB reflect different mechanisms because the former does not occur between letter stimuli that only differ in case, whereas the latter is found under these conditions as well. In addition, Dux and Harris’s (2007a) finding that the distractor repetition effect taps the same mechanism as the AB also suggests that it differs from RB as the processes underlying the AB and RB have been doubly-dissociated (Chun, 1997a). Finally, Dux and Marois’s (2008) distractor priming effects described below are convergent evidence for the inhibition of distractors in RSVP. Nevertheless, further research is required to isolate the mechanisms that give rise to the distractor repetition effect found in RSVP and to understand how it relates to RB.
All of the models discussed above, with the exception of the inhibition account (Olivers et al., 2007; Raymond et al., 1992), are consistent with the notion that the AB results from the depletion of capacity-limited attentional resources by T1 processing, leaving too few of these resources available at short lags to be applied to the second target. Employing an innovative paradigm, Di Lollo, Kawahara, Ghorashi and Enns (2005; see also Olivers et al., 2007) provided data that posed a challenge for such T1 capacity limited models. They presented subjects with RSVP streams that contained three successive targets (all of which required delayed report), with the third target appearing in a position where the blink is typically maximal (lag 2). When the targets were members of the same category (e.g., letters) there was no deficit in reporting T3 (Olivers et al., 2007, refer to this effect as ‘spreading of the [Lag 1] sparing’), a result that is inconsistent with T1 resource depletion accounts of the AB. However, impaired T3 report was observed if the second target belonged to a different category than the other target stimuli (e.g., a computer symbol as opposed to letters).
Di Lollo et al. (2005) proposed the temporary loss of control (TLC) hypothesis to explain their results. This theory postulates that RSVP processing is governed by a filter configured to select targets and exclude distractors, which is endogenously controlled by a central processor that can execute only a single operation at a time (as acknowledged by Di Lollo et al., 2005, this feature of the model adds a capacity-limited component to the framework). When a target is initially identified, the central processor switches from monitoring to consolidation processes, and the input filter is then under exogenous control (but see Nieuwenstein, 2006, for evidence that endogenous control is maintained during the AB). When the second target is drawn from the same category as T1, the input filter’s configuration is unaltered and as a result this target is processed efficiently. If, however, T2 is drawn from a different category, it takes longer to process and is more susceptible to interruption from distractors. More importantly, the input filter’s configuration is disrupted and needs to be reconfigured, resulting in all subsequent stimuli being processed less efficiently until this reconfiguration takes place. Di Lollo et al. (2005) suggest that both these sources of disruption contribute to the manifestation of the AB in this three-target paradigm. In addition, the same disruption of input configuration account is invoked to explain the AB in a typical two-target RSVP paradigm except that disruption here is triggered by the T1+1 distractor instead of by a categorically distinct target. According to the TLC model, Lag 1 sparing is observed with the sequential presentation of intra-category targets because such presentation does not disrupt the input filter. 
Thus, like the inhibition (Raymond et al., 1992; Olivers et al., 2007) and interference (Shapiro et al., 1994) theories, but contrary to bottleneck models (e.g., Chun & Potter, 1995), the TLC account predicts that Lag 1 sparing is dependent on the nature of the T1+1 stimulus rather than T1 and T2 appearing within 100 ms of one another.
The delayed attentional reengagement account introduced by Nieuwenstein and his colleagues (Nieuwenstein, 2006; Nieuwenstein, Chun, van der Lubbe, & Hooge, 2005; Nieuwenstein & Potter, 2006; Nieuwenstein, Potter & Theeuwes, 2009), suggests that the AB reflects the dynamics of attentional selection: When viewing a dual-target RSVP stream, the presentation of the first target elicits the deployment of top-down attentional resources to that stimulus. However, once target information ceases to be presented (either due to the appearance of a distractor or a blank gap in the RSVP stream), these resources are disengaged from the RSVP stream. In this model, the AB occurs because subjects are unable to rapidly re-engage top-down attention to the second target shortly after it has been disengaged from the first target stimulus. Lag 1 sparing results from attention being sustained for T2 processing because T1 is followed by additional goal-relevant target information (T2) rather than by irrelevant information (distractor/blank gap). This model makes no specific prediction regarding the cause of Lag 1 sparing – whether it is determined by the nature of the T1+1 stimulus or the temporal distance between T1 and T2 - because in this framework both the duration of the T1 enhancement and nature of the post-T1 information influences RSVP performance.
It follows from this theory that experimental conditions that help re-engage or sustain attention after T1 processing should diminish the AB. Consistent with this prediction, Nieuwenstein et al. (2005; see also Maki, Frigen & Paulson, 1997) demonstrated that if T2 is immediately preceded by a distractor that shares featural characteristics with that target - a manipulation that should help re-engage attention prior to T2 presentation - the AB is reduced. Furthermore, the AB was virtually abolished when the task required report of all the RSVP stimuli, an experimental condition that is presumed to continuously engage attention throughout the RSVP stream (Nieuwenstein & Potter, 2006).
A number of researchers have suggested that a combination of the mechanisms described above provides the most complete account of the AB. Vogel, Luck and Shapiro (1998; see also Sergent, Baillet & Dehaene, 2005; Vogel & Luck, 2002) demonstrated using event-related potentials (ERPs) that T2 did not elicit a P300 – a component believed to reflect the updating of working memory (Donchin, 1982; Donchin & Coles, 1988) – during the AB, suggesting that missed T2s do not enter the working memory store. To explain their results, Vogel et al. (1998; see also Maki, Couture, Frigen & Lien, 1997; Maki, Frigen et al., 1997; Shapiro, Arnell et al., 1997) proposed a model that incorporated aspects from both the two-stage framework (Chun & Potter, 1995) and the interference theory (Shapiro et al., 1994; Shapiro & Raymond, 1994). This account suggests the existence of two processing stages, with stimuli first being conceptually processed before being selected to undergo capacity limited encoding into visual working memory. Whether stimuli are selected for extended processing after their semantic representations are activated is determined by how closely they match target templates. Furthermore, due to interference between stimuli during preliminary conceptual processing, distractor items that appear in close temporal proximity to the second target are often incorrectly consolidated. Thus, both a bottleneck in working memory and interference between the conceptual representations of stimuli are hypothesised to give rise to the AB.
More recently, Kawahara, Enns and Di Lollo (2006) have also suggested that a combination of independent mechanisms contributes to the failure to identify the second target under RSVP conditions. Specifically, they propose that a combination of Di Lollo et al.’s (2005) TLC hypothesis and bottleneck models provides the best account of the AB. They predict that three factors determine target accuracy under RSVP conditions: 1) switching from rejecting distractors to selecting targets when presented with the first target affects T1 performance, whereas 2) the disruption of an input filter once T1 encoding has commenced, and 3) an encoding bottleneck that delays T2 processing and leaves it susceptible to backward masking at short lags (due to online T1 processing), both affect T2 performance.
Chartier, Cousineau and Charbonneau (1994) presented a framework to account for the AB observed when subjects searched a stream of green digit distractors for two red digit targets. This model predicts that from an input layer, stimuli are evaluated via two networks, with one performing number identification and feeding this information into an auto-associator (working memory), and another comparing the color of each stimulus to the target color specified by task instructions. Stimuli are admitted to and maintained in the auto-associator - as well as selected for report - based on their weighting. An item’s weighting is in turn determined by the extent to which an attentional gate is open when the stimulus is represented at the identification layer (i.e., perceptually presented). If the gate is open when the stimulus is identified, it will have a high weighting and be more likely to be admitted to the auto-associator and reported. This gating mechanism opens when the color comparison process recognizes that a stimulus is a target. However, this gating process is inhibited and thus becomes less efficient when another stimulus is being encoded into working memory. It is this inhibitory process, together with its slow recovery rate, that leads to T2 being weakly weighted at short T1–T2 lags, thereby giving rise to the AB. In this model, Lag 1 sparing occurs because the time that the gate remains open exceeds the presentation duration of T1.
Fragopanagos, Kockelkoren and Taylor’s (2005) model of the AB is an application of Taylor and Rogers’s (2002) influential theory of attention control: the Corollary Discharge of Attention Movement model (CODAM). Fragopanagos et al. (2005) suggest that RSVP processing proceeds in the following manner: Initially, stimuli pass through input and object map modules before they reach a working memory module, where they become consciously available. Crucial to this model’s account of the AB is the role played by an inverse model controller (IMC), which attentionally boosts items in the object map in order for them to be admitted into working memory. The AB occurs because this attentional boost is withheld from T2 at short lags to prevent it from interfering with the encoding of the first target. A monitor module inhibits the IMC until T1 processing is complete, which is determined by the monitor continually comparing the target representation to a predictor of the current stimulus, which is computed via the current attentional control signal (corollary discharge) from the IMC. When T2 appears in close temporal proximity to the first target, the IMC will be suppressed because the target representation will be that for T1, while the corollary discharge will represent the second target. As a result, T2 will not be attentionally enhanced and therefore not enter working memory. At longer lags, both the target representation and corollary discharge will be that for the second target, and hence no second target deficit will be observed. In this framework, Lag 1 sparing results from the temporal dynamics of the IMC inhibition, which does not begin until after the T1+1 stimulus has been presented. Consequently, it too will be attentionally enhanced and enter working memory.
Nieuwenhuis, Gilzenrat, Holmes and Cohen (2005) suggest that the AB reflects the activation dynamics of the Locus Coeruleus (LC). The LC is a brainstem nucleus that is thought to contain up to half of all noradrenergic neurons in the central nervous system (Berridge & Waterhouse, 2003). It projects widely to many areas of cortex and it has been suggested that its innervation particularly influences regions involved in attentional processing (Nieuwenhuis, Aston-Jones & Cohen, 2005). With respect to attentional tasks, it has been hypothesized that the presentation of a salient stimulus triggers LC neurons, causing the release of norepinephrine in brain areas innervated by the LC, thereby enhancing the responsivity of these areas to their input. This LC response is phasic, and the norepinephrine modulation effects are also fleeting, lasting less than approximately 200 ms. After this initial firing, the LC enters into a refractory period where it does not respond to subsequent salient stimuli for approximately 500 ms. In Nieuwenhuis et al.’s (2005) AB model, there are two major components: the LC, and a behavior network that is made up of input, detection and decision layers. On detection of a salient stimulus, the LC transiently adjusts the gain of the behavioral network by simulating the release of norepinephrine. Following this phasic response, the LC is suppressed and unavailable to enhance subsequent target processing, thus causing the AB. Importantly, the magnitude of this post-T1 suppression, and thus of the AB, is tied to the size of the LC phasic response to T1, giving this framework a capacity-limited component. As is the case with many of the models described above, Lag 1 sparing results from the temporal dynamics of the attentional enhancement rather than the nature of the T1+1 stimulus.
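The timing logic of this account can be illustrated with a deliberately simple sketch. It is not the Nieuwenhuis et al. (2005) network: it only encodes the two approximate durations given above (an enhancement of about 200 ms and a refractory period of about 500 ms) and asks whether a T2 at a given lag would receive a gain boost. The 100 ms item-to-item interval, the function names and the all-or-none outcome are assumptions made for illustration.

```python
# Toy illustration only; not the published LC model.
ENHANCE_MS = 200     # approximate duration of the T1-triggered gain boost (from the text)
REFRACTORY_MS = 500  # approximate LC refractory period (from the text)
SOA_MS = 100         # ASSUMED item-to-item interval in the RSVP stream


def t2_is_boosted(lag, soa_ms=SOA_MS):
    """Return True if a T2 presented `lag` items after T1 receives a gain boost."""
    delay = lag * soa_ms
    if delay < ENHANCE_MS:
        # T2 arrives while the boost triggered by T1 is still active (Lag 1 sparing).
        return True
    if delay <= REFRACTORY_MS:
        # The LC is refractory and cannot fire again for T2 (the blink).
        return False
    # The LC has recovered and can respond to T2.
    return True


def blink_profile(max_lag=8):
    """Map each lag from 1 to max_lag onto whether T2 is boosted."""
    return {lag: t2_is_boosted(lag) for lag in range(1, max_lag + 1)}


if __name__ == "__main__":
    for lag, boosted in blink_profile().items():
        print(lag, "boosted" if boosted else "not boosted")
```

Under these assumed values T2 is boosted at lag 1, unboosted at lags 2 to 5, and boosted again from lag 6, which mirrors the qualitative shape of the AB. The sketch omits the graded link between the size of the T1 response and the depth of the subsequent suppression.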
The global workspace model is an influential theory of conscious perception and attentional control (Baars, 1989; Dehaene, Sergent & Changeux, 2003). This framework has many similarities to the bottleneck theories described above and predicts that for individuals to become aware of a specific stimulus, the item must enter a global neuronal workspace. This workspace activates neurons with long-distance axons capable of connecting distinct brain regions, such as those responsible for higher level processing and those involved in initial sensory analysis. Once a stimulus has activated a sufficient set of workspace neurons, activity becomes self-sustained and this item can then be employed by a variety of neural areas via the long-distance connections. However, once activated by a stimulus these workspace neurons inhibit neighboring workspace neurons thus making them unavailable for subsequently presented stimuli. When two targets are presented in close temporal proximity, they each go through an initial sensory stage of information processing by distinct neuronal assemblies (“feed forward sweep”) that do not inhibit one another. The AB results when these two neural assemblies compete for access to the global workspace (“top-down amplification”), with the winner’s activity becoming self-sustained and triggering consciousness. Recovery from the AB occurs at later lags once the T1 brain-scale state has subsided, allowing the global workspace to become available for T2. Lag 1 sparing occurs due to the delayed onset of the workspace inter-neuronal inhibition, thereby allowing T1 and T2 to both enter consciousness. Thus, in this model Lag 1 sparing is governed by time rather than by the T1+1 item.
In the boost and bounce theory (Olivers & Meeter, 2008) capacity limits play no role in the generation of the AB. This model has two major stages: sensory processing and working memory. During sensory processing, both the perceptual features of a stimulus, such as its shape, color and orientation, and its high-level representations, including semantic and categorical information, are activated. As stimuli are presented at the same spatial location in RSVP, each item’s activation strength (during sensory processing) is influenced by those stimuli that appear around it due to forward and backward masking. Working memory plays several roles in the model. Firstly, it maintains task instructions establishing an attentional set. Secondly, it stores encoded representations, where items to be reported have been linked to a response. Finally, and most importantly, working memory employs an input filter that enhances the processing of stimuli that match the target set and inhibits stimuli that do not (i.e., distractors). Specifically, the input filter inhibits the distractors presented before T1, thereby preventing them from gaining access to working memory, and attentionally enhances T1, which can therefore gain access to this store. Because of its temporal proximity to T1 and the dynamics of the enhancement, the T1+1 distractor also receives a strong attentional “boost” despite the fact that it is a distractor from a different stimulus set than the target. The attentional enhancement of a distractor stimulus - an item that does not require report - triggers strong but transient suppression (“the bounce”) of subsequently presented stimuli by the input filter to prevent the T1+1 distractor from entering working memory, thus causing the AB. 
According to this model, Lag 1 sparing occurs due to the duration of the initial T1 boost, and this sparing extends to lags 2 and 3 of a continuous stream of targets (i.e., spreading of the sparing, Di Lollo et al., 2005; Olivers et al., 2007) because no inhibitory signal is elicited when the T1+1 (and T1+2) stimulus originates from the same target set as T1.
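The gating logic described above can be sketched as a small deterministic toy. It is not the Olivers and Meeter (2008) implementation: the function name, the stream encoding ("T" for target, "D" for distractor) and the fixed three-item suppression window are assumptions chosen only to show how a boosted distractor, rather than T1 itself, triggers the blink.

```python
# Toy illustration only; not the published boost and bounce model.
def boost_and_bounce(stream, bounce_items=3):
    """Return the positions of targets that enter working memory.

    stream       -- string of "T" (target) and "D" (distractor) items
    bounce_items -- ASSUMED number of items suppressed after a boosted distractor
    """
    reported = []
    boosted = False      # True while the boost from the preceding target is active
    suppress_left = 0    # items still to be blocked by the bounce
    for pos, item in enumerate(stream):
        if suppress_left > 0:
            # The input filter is closed: the item is blocked, target or not.
            suppress_left -= 1
            boosted = False
            continue
        if item == "T":
            reported.append(pos)
            boosted = True   # the boost carries over to the next item
        else:
            if boosted:
                # A distractor inherited the boost: trigger the bounce.
                suppress_left = bounce_items
            boosted = False
    return reported


if __name__ == "__main__":
    print(boost_and_bounce("DDTDTDDD"))   # T2 at lag 2 is blocked
    print(boost_and_bounce("DDTTTDDD"))   # three successive targets all pass
```

In this sketch a target that directly follows another target never triggers suppression, so sparing spreads across an uninterrupted run of targets, while a single intervening distractor is enough to close the filter for the following items.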
Bowman and Wyble (2007) presented a sophisticated theory of temporal attention and working memory known as the Simultaneous Type/Serial Token (STST) Model and recently extended it to become the Episodic STST framework (eSTST; Wyble, Bowman & Nieuwenstein, 2009). This model borrows heavily from Chun and Potter’s two-stage theory (1995; see also Chun, 1997a; Kanwisher, 1987) and suggests that the AB reflects processes involved in episodically distinguishing objects from one another. The eSTST account predicts that all stimuli in the RSVP are identified at a conceptual stage (i.e., have their type representation activated). However, for these stimuli to be reported they must have this identity information bound to a token in working memory that provides episodic information about the stimulus (e.g., the position of the item in the RSVP stream relative to other stimuli). For a type representation to be bound to a token, it must be attentionally enhanced by a blaster that is transiently triggered when a target is detected in the RSVP stream, and it is the slow temporal dynamics of this blast that gives rise to Lag 1 sparing. However, because the binding process for a target is capacity-limited at the stage of episodic registration (i.e., two stimuli cannot be encoded in the correct temporal position at the same time) and is susceptible to interference from other targets, the blaster is suppressed until the first target is linked to its specific token and consolidated into working memory. This T1-triggered blaster suppression prevents the T2 type from being bound to a token, thereby triggering an AB. Thus, the eSTST theory conceptualises the AB as arising due to a kind of unconscious perceptual strategy that helps subjects overcome their capacity limits in episodic registration by suppressing the processing of future targets until T1 is episodically registered.
An important aspect of this theory, and a key difference between the STST and eSTST models, is that due to the interaction of excitatory and suppressive processes, the blaster is maintained in an enhanced mode when target items appear in an uninterrupted sequence. As a result, the identity encoding of several successive targets can be successfully performed, thereby giving rise to spreading of the sparing. However, the model further predicts that, under successive target conditions, there should be a high proportion of target report order reversals (it also explains other episodic errors, such as RB; Kanwisher, 1987) and T1 accuracy should be reduced because of increased competition between T1 and T2. Both of these predictions, which are not explicitly modelled by other AB frameworks, have been experimentally confirmed (Bowman & Wyble, 2007; see also Chun & Potter, 1995).
The formal models described above are all connectionist frameworks that involve many parameters and complex interactions between layers of artificial neurons to model the AB. Recently, Shih (2008) has put forward a mathematical model – the attention cascade theory – that has fewer parameters yet provides a detailed account of the system that performs RSVP processing (this model, however, does not make any predictions regarding the neural processes involved in the AB). Like the eSTST theory, this framework borrows heavily from the two-stage account of Chun and Potter (1995); however, it also incorporates characteristics of Shapiro et al.’s (1994; Shapiro & Raymond, 1994) interference theory. Specifically, the attention cascade model predicts that stimuli are initially processed along one of two channels: a mandatory pathway or a bottom-up salience pathway. Stimuli processed by the mandatory pathway activate conceptual long-term memory representations that are then passed into a “peripheral” sensory buffer. If a representation in this buffer matches the target template, it will trigger an attentional window and be enhanced. Following this enhancement, if there are enough available encoding resources, the target’s representation will undergo encoding/consolidation where its strength will grow further, leading to it being passed into a decision processor (within working memory). Stimuli with strong bottom-up salience can also trigger the attentional window and enter directly into the decision processor (bottom-up salience pathway). Indeed, it is this component of the model that can account for the finding that salient distractors that share features with the target set can also trigger an AB for a subsequent target (Folk, Leber & Egeth, 2002; Maki & Mebane, 2006).
In Shih’s framework, the AB occurs because the encoding/consolidation processor is capacity limited, causing a second target appearing in close temporal proximity to a first one to wait for this encoding resource to become available, thereby leaving that second target susceptible to interference and decay. Lag 1 sparing is said to result from the duration of the attentional enhancement, which extends beyond the presentation time of T1. Importantly, because the duration of this attentional enhancement window may vary according to task demands, it could encompass the successive presentations of several targets in an RSVP, allowing all of them to be reported successfully. Thus, Shih’s model can also account for the spreading of the sparing results observed in three-target RSVP tasks. However, it should be noted that given that this model’s account of the spreading of the sparing is dependent on task strategy, it is somewhat difficult to see how the framework explains this result under conditions where uniform and varied trials are randomly intermixed instead of being blocked (e.g., Olivers et al., 2007).
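The waiting-and-decay idea in this account can be made concrete with a minimal sketch. It does not reproduce Shih’s (2008) equations: the exponential decay, the parameter values (100 ms item interval, 450 ms T1 encoding time, 150 ms attentional window, 200 ms decay constant) and the function name are all assumptions introduced for illustration.

```python
# Toy illustration only; the functional form and all values are assumptions.
import math


def t2_strength(lag, soa_ms=100, t1_encode_ms=450, window_ms=150, decay_ms=200):
    """Return a relative strength (0 to 1) for T2 when it reaches encoding.

    lag          -- position of T2 relative to T1, in items
    soa_ms       -- ASSUMED item-to-item interval
    t1_encode_ms -- ASSUMED time the encoder is occupied by T1
    window_ms    -- ASSUMED duration of the attentional window opened by T1
    decay_ms     -- ASSUMED time constant of decay while T2 waits
    """
    delay = lag * soa_ms
    if delay < window_ms:
        # T2 falls inside the attentional window opened for T1 (Lag 1 sparing).
        return 1.0
    # Otherwise T2 waits until the encoder is free and decays in the meantime.
    wait = max(0.0, t1_encode_ms - delay)
    return math.exp(-wait / decay_ms)


if __name__ == "__main__":
    for lag in range(1, 8):
        print(lag, round(t2_strength(lag), 3))
```

With these assumed values the strength is maximal at lag 1, lowest at lag 2, and recovers monotonically until the encoder is free at lag 5. Lengthening `window_ms` so that it covers several successive items gives a rough analogue of spreading of the sparing.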
In a recent computational model, Taatgen, Juvina, Schipper, Borst and Martens (in press) have proposed that the AB reflects a protective production rule that prevents T2 from interfering with T1 consolidation. Borrowing from Anderson’s (2007) Adaptive Control of Thought – Rational (ACT-R) architecture, the threaded cognition framework conceptualises cognition as multiple processes that are threaded through a single processor (a single resource). With respect to dual-target RSVP search, this model predicts that target detection and consolidation can operate in parallel. However, due to default task allocation policies, target detection is held offline during the encoding of another target into working memory. Thus, at short T1–T2 lags subjects have impaired T2 performance because they adopt an implicit strategy to suppress detection of this second target until consolidation is complete. The model accounts for Lag 1 sparing and spreading of that sparing by assuming that the system recognises that targets appearing directly after T1 require report and that this supersedes the control production rule that protects consolidation. As a result, detection is not suppressed for these target stimuli. It should be noted that while this model appears compatible with the item-based account of Lag 1 sparing, it also has a temporal component as the application of the production rule and consolidation rate are independent of the stimulus presentation rate. On the surface, this model bears strong similarity to the eSTST framework (both suggest T2 detection is strategically suppressed during T1 processing). However, a key difference between the theories is that in the threaded cognition account, there are no capacity limitations; the AB merely reflects the application of an unnecessary protective rule. Indeed, Martens, Munneke, Smid & Johnson (2006) have identified a group of subjects that apparently are immune to the AB deficit, and Taatgen et al. (2009) suggest that these individuals do not apply the production rule under RSVP conditions.
Consideration of the theories described above indicates that virtually all of them provide adequate accounts of both the AB and Lag 1 sparing. In addition, many of these models include mechanistically comparable stages of RSVP processing. Indeed, each theory predicts that one or more of the following processes leads to the AB:
In this section we review key empirical results and how they fit with the mechanisms described above. Our summary and concluding remarks highlight the models that provide the most detailed account of the AB and associated findings.
As previously discussed, there is little evidence to support Raymond et al.’s (1992) gating hypothesis that it is the potential for featural confusion between T1 and T1+1 that gives rise to the AB by triggering the perceptual inhibition of post-T1 stimuli. Shapiro et al.’s (1994) finding that detection of T1 is sufficient to produce an AB and Chun and Potter’s (1995) result that categorically defined targets also elicit the second target deficit are inconsistent with this framework, as gating theory predicts that it is the potential for feature conjunction errors between the first target and the T1+1 item that leads to the AB. It should also be noted that these results cannot be explained by the gated auto-associator model (Chartier et al., 1994), as this framework was only designed to explain ABs observed when both targets are defined by color. However, the gated auto-associator model is not alone in being limited to a particular task. Due to their specificity, many of the formal models described are only able to account for the AB under a particular set of conditions. For example, Wyble et al.’s (2009) eSTST framework only models RSVP performance where targets are defined categorically (e.g., two digit targets amongst letter distractors).
Further evidence against gating theory and its prediction that the AB has a perceptual locus is that stimuli that are not reported from RSVP streams nevertheless undergo semantic/conceptual processing. Luck, Vogel and Shapiro (1996) examined the extent to which missed second targets were processed in the AB using ERPs and observed an N400 – a component associated with semantic processing – for a second target in an AB task even when that target failed to be reported. Similarly, Shapiro, Driver, Ward and Sorensen (1997) found in a three-target RSVP search that a missed T2 could conceptually prime report of a subsequent target. In addition, Marois et al. (2004) have demonstrated, using functional magnetic resonance imaging (fMRI), that missed T2s activate high-level visual areas in the brain (see also Kranczioch et al., 2005; Sergent et al., 2005). And it is not only missed T2s that undergo semantic analysis, but also distractors: Maki, Frigen et al. (1997; see also Chua, Goh & Hon, 2001) presented subjects with RSVP streams of words and found that a distractor that appeared in close temporal proximity to the second target could semantically prime that target’s report. Thus, there is good evidence that in standard RSVP tasks non-reported stimuli undergo considerable processing.
It should be noted, however, that under some conditions, initial processing of RSVP stimuli does display capacity limits. Like Luck et al. (1996), Giesbrecht, Sy and Elliott (2007) examined the N400 for missed second targets in an RSVP and found that while an N400 was observed for T2 when T1 involved a low perceptual load (T1 spatially flanked by congruent distractors), it was completely suppressed for trials where T1 involved a high perceptual load (T1 flanked by incongruent distractors). Similarly, in a series of studies, Dell’Acqua, Jolicœur, Robitaille and Sessa (Dell’Acqua, Sessa, Jolicœur, & Robitaille, 2006; Jolicœur, Sessa, Dell’Acqua, & Robitaille, 2006a; Jolicœur, Sessa, Dell’Acqua, & Robitaille, 2006b) have found that the N2pc, a pre-semantic ERP component thought to be associated with visuo-spatial shifts of attention, is also influenced by the AB. In addition, Williams, Visser, Cunnington and Mattingley (2008) have shown with fMRI that activity in primary visual cortex is also sensitive to the AB. No current theory of the AB can fully account for these results. However, they can be accommodated by assuming that in such non-standard AB tasks - where there are either spatially displaced target or distractor stimuli (as was the case in the studies described above) - the attentional resources devoted to target processing (see Lavie, 2005) can lead to missed/non-reported RSVP stimuli only being processed up to early perceptual levels. These exceptions aside, the bulk of the evidence reviewed here does suggest that, at least for standard AB tasks, missed stimuli are processed post-perceptually.
The preceding section suggests that the perceptual inhibition of post-T1 stimuli due to the potential for a feature conjunction error between T1 and the T1+1 stimulus is not responsible for the AB. However, can gating theory be saved by assuming that such inhibition takes place at a post-perceptual level of information processing and that it is elicited by the T1+1 distractor due to it interfering with T1 encoding (boost and bounce model; Olivers & Meeter, 2008)? A key prediction of the boost and bounce theory is that it is inhibition of post-T1 stimuli that gives rise to the AB. However, this hypothesis is inconsistent with the results of an individual differences analysis of the AB (Dux & Marois, 2008; see also Martens et al., 2006) that suggested that subjects who inhibit post-T1 distractors actually exhibit enhanced T2 performance at short lags (attenuated ABs). These results are not only opposite to those predicted by post-T1 suppression accounts of the AB (Olivers & Meeter, 2008; Raymond et al., 1992), they also provide support for the hypothesis that a failure of distractor inhibition contributes to the AB (see Dux & Harris, 2007a).
The evidence that missed targets and distractors in RSVP undergo considerable processing is consistent with all of the AB models that conceptualise this deficit as a post-perceptual phenomenon. Similarly, Dux and Marois’s (2008; see also Dux & Harris, 2007a) individual differences study is only problematic for models which predict that sustained suppression of all stimuli post T1+1, triggered by a post-T1 distractor, gives rise to the AB (e.g., Olivers & Meeter, 2008; Raymond et al., 1992). A point of greater conjecture amongst models of the AB, however, is whether the deficit is contingent on online T1 processing (i.e., T1 working memory encoding/episodic registration/response selection; see bottleneck theories, hybrid models, global workspace model, gated auto-associator model, CODAM, LCNE, attention cascade theory, eSTST, threaded cognition model) or whether it results from mechanisms that are independent of, and/or subsequent to, T1 encoding/episodic registration/response selection (see the delayed attentional reengagement account; interference theory; TLC; the boost and bounce model). Given that several AB theories make very different predictions regarding the influence of T1 processing on T2 performance, numerous studies have examined the effect of T1 manipulations on the magnitude of the AB.
Jolicœur’s (1998; 1999) finding that an increase in the number of T1 response alternatives leads to larger AB magnitude supports the hypothesis that the extent to which subjects process T1 directly influences T2 performance, and thus causes an AB. Further support for this hypothesis comes from Ouimet and Jolicœur (2007; see also Akyürek, Hommel & Jolicœur, 2007; Colzato, Spapè, Pannebakker, & Hommel, 2007) who have demonstrated that the AB is also larger when T1 working memory encoding load is increased. In addition, several studies have shown that increasing the masking strength of the T1+1 item (either perceptually or conceptually), and thus increasing the time required to process T1, leads to a larger T2 deficit (Dux & Coltheart, 2005; Grandison, Ghirardelli & Egeth, 1997; McAuliffe & Knowlton, 2000; Raymond, Shapiro & Arnell, 1995; Seiffert & Di Lollo, 1997; Shore, McLaughlin & Klein, 2001; see also Marois et al., 2004). Finally, an inverse correlation between T1 performance and AB magnitude has also been observed at the individual subject level (Dux & Marois, 2008; Martens et al., 2006; Seiffert & Di Lollo, 1997).
However, not all T1 manipulations have been shown to affect the magnitude of the AB. Ward et al. (1997), for example, demonstrated that the AB was unaffected by whether subjects had to make “easy” or “difficult” size discriminations for T1. Similarly, Shapiro et al. (1994) reported no difference in AB magnitude between conditions where subjects had to simply detect T1 or identify it. Furthermore, McLaughlin, Shore and Klein (2001) found that manipulating the perceptual quality of the T1 stimulus by covarying the exposure duration of T1 and the T1+1 mask did not influence the size of the AB.
The mixed evidence that T1 manipulations influence the AB has led some researchers to postulate that T1 processing and the AB are not related. According to this view, previous results suggesting a link between T1 and AB magnitude can be accounted for by processes that are unrelated or subsequent to online T1 working memory encoding, episodic registration or response selection, such as task switching between T1 and T2, post-T1 stimulus suppression, offline target retrieval, and attentional filter disruption (e.g., Potter et al., 1998; see also Enns, Visser, Kawahara & Di Lollo, 2001; Kawahara, Zuvic, Enns & Di Lollo, 2003; McLaughlin et al., 2001; Olivers & Meeter, 2008). In addition, it could be argued that some T1 manipulations, such as those employed in Jolicœur’s speeded AB studies, made the experimental task so different from standard AB paradigms that mechanisms related to the blink were no longer tapped. However, a number of studies have found T1 effects on the AB without a task switch between T1 and T2, while holding T1+1 masking strength constant and where both targets required a delayed response. For example, in RSVP streams containing word targets, increasing the difficulty (and presumably the duration) of T1 encoding by making this stimulus disyllabic instead of monosyllabic - a manipulation known to increase working memory difficulty for words (Baddeley, Thomson & Buchanan, 1975; see also Coltheart & Langdon, 1998; Coltheart, Mondy, Dux & Stephenson, 2004) - decreased T2 performance at short T1–T2 lags but not at long T1–T2 lags (Olson, Chun & Anderson, 2001). This suggests that increasing the phonological length of T1 delays T2’s admittance to the encoding bottleneck. Similarly, Dux and Harris (2007b) presented subjects with 2D line drawings of objects and manipulated T1’s in-plane orientation, which previous studies (e.g., Jolicœur, 1985) have demonstrated influences the time required to recognize objects.
They observed that the magnitude of the AB was increased when T1 was presented at a rotation of 90 degrees (an orientation where objects take longer to name relative to upright and upside-down objects; see Jolicœur, 1985), providing more evidence that online T1 processing influences the AB.
If T1 processing affects the AB, then why did some of the studies fail to find an effect of this variable on the magnitude of this deficit? One potential reason for this discrepancy in the literature is that outlined by Olson et al. (2001; see also Visser, 2007, for a similar point of view), who claimed “that the blink is sensitive to only those variables that immediately affect attentional processing between initial sensory registration of a target and consolidation into working memory. The difficulty of tasks that can be performed on representations in working memory, after consolidation has been completed, do not affect T2 performance” (p. 1117). In addition, Visser (2007) has suggested that strong masking manipulations of T1 may cut short T1 processing, thereby reducing its influence on the processing of subsequent targets. It is therefore possible that previous experiments that failed to observe effects of T1 manipulations on the AB did so because they did not tap a stage of processing that occurred prior to or at the bottleneck. This point also raises an intrinsic limitation of the AB paradigm, which is that because it relies on accuracy rather than reaction time as a measure of performance, it is difficult to temporally pinpoint the different stages of processing that take place during that task and to identify which of these stages are the loci of interference in dual-target paradigms. 
Nevertheless, the discussed T1 effects are problematic for models that posit that it is the T1+1 stimulus that elicits the AB rather than T1 processing (e.g., delayed attentional engagement; the boost and bounce model) as these hypotheses ascribe a limited role for a T1 bottleneck in the generation of the AB and thus would not predict that delaying T2’s access to the bottleneck (through a T1 difficulty manipulation) would increase the magnitude of the T2 deficit (the exception is the threaded cognition model, which despite its assumption of unlimited T1 processing capacity, predicts that T2 detection is suppressed by an unnecessary production rule until T1 processing is complete). Similarly, the T1 manipulation results are not easily reconcilable with Shapiro et al.’s (1994) interference theory, for while this account suggests that capacity-limited T1 processing underlies the AB, the limitation it proposes to be responsible for the deficit comes at the information processing stage of memory retrieval, which takes place offline from the RSVP.
The discussion of T1 vs. T1+1 processing also applies to Lag 1 sparing, as there is theoretical dispute as to whether the effect is determined by T1 and T2 having an SOA of 100 ms (e.g., Bowman & Wyble, 2007; Chun & Potter, 1995) or by the categorical identity of the T1+1 stimulus (e.g., Di Lollo et al., 2005). Bowman and Wyble (2007; see also Nieuwenhuis et al., 2006; Potter et al., 2002) examined this question by reducing the stimulus exposure duration of the RSVP stimuli in a standard AB task (two letter targets amongst digits) to 54 ms. This manipulation created, at lag 2, a condition that pitted the lag at which the AB is normally maximal against the time when Lag 1 sparing is usually obtained (108 ms). Under these conditions, sparing was observed at lag 2, for at this lag T2 performance was comparable to that for both T1 (cited in Olivers & Meeter, 2008) and T2 performance at post-AB lags, and was significantly greater than T2 performance at AB lags (post lag 2). Thus, Bowman and Wyble (2007) concluded that Lag 1 sparing is time-dependent.
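The timing logic behind this manipulation can be made explicit with a small calculation. The sketch below is only illustrative: the function name `soa_ms` is hypothetical, and the 100 ms per item "standard" rate is an assumption taken from the approximate Lag 1 SOA mentioned above, not a parameter reported for any particular study.

```python
def soa_ms(lag: int, item_duration_ms: float) -> float:
    """T1-T2 stimulus onset asynchrony when T2 appears `lag` items after T1.

    Assumes items follow one another with no extra gap, so each serial
    position adds exactly one item duration.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return lag * item_duration_ms


# Assumed standard rate (about 100 ms per item) versus the 54 ms rate.
standard_rate = {lag: soa_ms(lag, 100) for lag in (1, 2, 3)}
fast_rate = {lag: soa_ms(lag, 54) for lag in (1, 2, 3)}

print(standard_rate)  # lag 1 falls at 100 ms
print(fast_rate)      # lag 2 falls at 108 ms
```

At the faster rate, lag 2 lands at roughly the SOA that lag 1 occupies at the assumed standard rate, which is what allows serial position and elapsed time to be dissociated.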
Recently, however, Martin and Shapiro (2008) have suggested that Lag 1 sparing is determined by the nature of the T1+1 stimulus rather than the temporal distance between T1 and T2. In one of their experiments, subjects searched for two white letter targets amongst black letter distractors with each stimulus appearing for 17 ms followed by a temporal gap of 85 ms (102 ms per item RSVP rate). In two Lag 1 (SOA of 100 ms) experimental conditions, a black digit was inserted 34 and 64 ms after T1 onset in the temporal gap between T1 and T2 (the T1–T2 SOA was maintained at 102 ms in these two conditions). These trials were compared to a control condition where no distractor appeared in the temporal gap between the targets. Martin and Shapiro (2008) found that T2 performance at Lag 1 was superior in the control condition relative to the experimental conditions and concluded that Lag 1 sparing is determined by the nature of the stimulus that follows T1 rather than by the time that elapses between the two targets. However, it should be noted that even with the presence of the inserted distractor between the two targets at Lag 1 (SOA 100 ms), T1 and T2 performance here did not significantly differ and T2 performance at Lag 1 was always superior to that at Lag 3 (the point where the AB was maximal). Thus, it is questionable whether Lag 1 sparing really was absent in the distractor conditions, suggesting that Lag 1 sparing is primarily determined by T1 and T2 having an SOA of approximately 100 ms.
As discussed above, the evidence that T1 processing plays an important role in generating the AB is problematic for theories which predict that the AB is independent of T1 encoding/episodic registration/response selection. However, if online capacity-limited T1 processing contributes to the AB, then why did Di Lollo et al. (2005, see also Olivers et al., 2007; Kawahara, Kumada & Di Lollo, 2006) find that three consecutive RSVP targets can be recalled equally well if they are members of the same stimulus category (uniform condition) but not if the middle target is a member of a different category (varied condition)? As the target number is increased for uniform trials relative to the standard AB task, online T1 processing accounts predict that an AB would be observed under these conditions as well. Thus, spreading of the sparing appears problematic for AB models (namely the bottleneck, hybrid, global workspace, gated auto-associator, CODAM, and LCNE models) that predict that online T1 capacity limitations underlie the AB.³
Dux, Asplund and Marois (2008; 2009) have recently disputed the claim that the AB deficit is abolished under uniform conditions. Specifically, they pointed out that while the studies of Di Lollo et al. (2005), Olivers et al. (2007) and Kawahara et al. (2006) found that T3 and T1 performance did not differ in uniform trials, these authors also showed that T3 accuracy increased and T1 performance decreased in uniform trials relative to the varied trials, suggestive of a trade-off between T1 and T3 in the uniform condition. Such a T1-T3 trade-off is consistent with online T1 capacity limitations contributing to the AB, but not with the hypothesis that it is the appearance of the T1+1 distractor (or the discontinuation of target information) that elicits the deficit, as these models ascribe a limited role for T1 processing in the blink (gating theory, boost and bounce, delayed attentional engagement, TLC, eSTST and threaded cognition models).
To test their hypothesis that a T1-T3 trade-off underlies the disappearance of the AB under uniform conditions, Dux and his colleagues (Dux et al. 2008; 2009) manipulated, both exogenously and endogenously, the extent to which subjects devoted attention to T1 and examined its influence on the T1-T3 performance difference (AB). The logic they followed was that if the standard uniform effect was due to a trade-off between the first and third targets, then making the T1 stimulus more salient would increase the attentional resources subjects devoted to its encoding and thus reduce the resources available to T3 encoding, leading to a T1-T3 performance difference.
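The trade-off logic can be illustrated with a deliberately simple toy calculation. This is not a model proposed by Dux and colleagues; it only assumes, for illustration, that a fixed pool of attention is divided among targets in proportion to hypothetical salience weights. The function name `share_capacity`, the weights, and the unit capacity are all made up, and the resulting shares are not meant to be read as report accuracies.

```python
def share_capacity(weights: dict, capacity: float = 1.0) -> dict:
    """Split a fixed attentional capacity among targets in proportion to weight.

    Toy assumption: capacity is constant, so any increase in one target's
    share must come at the expense of the others.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: capacity * w / total for name, w in weights.items()}


# Hypothetical weights: equal salience versus a more salient T1.
baseline = share_capacity({"T1": 1.0, "T2": 1.0, "T3": 1.0})
salient_t1 = share_capacity({"T1": 2.0, "T2": 1.0, "T3": 1.0})

for label, shares in (("baseline", baseline), ("salient T1", salient_t1)):
    print(label, "T1 - T3 share difference:", shares["T1"] - shares["T3"])
```

Under these assumptions the T1-T3 difference is zero when the targets are weighted equally and becomes positive once T1 is made more salient, which mirrors the qualitative prediction described above.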
To exogenously manipulate the attention devoted to T1, Dux et al. (2008) colored the targets and the post-T3 distractors red (pre-T1 distractors were white) so that T1 would exogenously capture attention due to its abrupt color onset (Maki & Mebane, 2006). Consistent with a trade-off between T1 and T3, this manipulation improved T1 performance but worsened T3 performance (but see Olivers, Spalek, Kawahara & Di Lollo, 2009). As a further test of a T1-T3 performance trade-off under uniform conditions, Dux et al. (2009) endogenously manipulated the attentional resources subjects devoted to either T1 or T3 by varying their task relevance in separate blocks of trials. Specifically, in T1-relevant blocks of trials, subjects had to report T1 on all trials, whereas T3 (and T2) required report on only 50% of the trials (and vice versa for T3-relevant blocks of trials). Dux and colleagues predicted that more attention would be devoted to the target that was 100% task-relevant relative to the other two targets. Consistent with this prediction, T1 performance was superior to T3 performance in T1-relevant blocks (suggestive of an AB), and conversely T3 performance was greater than T1 performance in T3-relevant blocks (a reversed AB!). Importantly, the exogenous and endogenous manipulations had a similar, though somewhat reduced, effect on T1 and T3 in the varied trials. This result suggests that the same processes are involved in sharing attentional resources between targets regardless of whether the targets are presented successively or separated by distractors, thus generalizing Dux et al.’s results to more typical AB paradigms. Nevertheless, further research is warranted in order to understand why subjects trade off T1 and T3 performance to a greater extent in the uniform trials than in the varied trials.
One particularly worthwhile hypothesis to explore is the possibility that attentional resources can be more easily manipulated across stimuli when these stimuli belong to the same attentional episode (as when targets are successively presented) than to different episodes (as when targets are separated by distractors).
Collectively, Dux et al.’s (2008, 2009) results are consistent with the hypothesis that the absence of an AB in standard uniform trials is the result of a trade-off between T1 and T3 performance, and resonate with recent neuroimaging findings showing that targets share attentional resources during RSVP processing (e.g., Sergent et al., 2005; Shapiro, Schmitz, Martens, Hommel, & Schnitzler, 2006). They also fit very well with recent research from Dell’Acqua, Jolicœur, Luria and Pluchino (2009) who also found an AB under uniform conditions when T3 report accuracy was conditionalized on correct report of the first and second targets. On the other hand, these findings are problematic for models which predict that it is exclusively the processing of the T1+1 distractor that gives rise to the blink (e.g., Di Lollo et al., 2005; Olivers & Meeter, 2008). Furthermore, Nieuwenstein et al.’s (2009) study suggests that the presence of the T1+1 distractor may not even be necessary for an AB to be observed, as they have shown that an AB can occur even when no stimuli follow T1 (no T1 masking) as long as T2 is presented briefly and heavily masked (T2 must be masked to obtain an AB unless there is a task-switch between T1 and T2; Giesbrecht & Di Lollo, 1998; Kawahara, Zuvic, Enns & Di Lollo, 2003). This is additional strong evidence against T1+1 distractor-based accounts of the AB (e.g., Di Lollo et al., 2009; Olivers & Meeter, 2008).
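To make concrete what conditionalizing T3 accuracy on correct report of T1 and T2 involves, the following minimal sketch computes raw and conditional accuracy from per-trial report outcomes. The trial records are invented purely for illustration and do not come from any of the studies cited; the function name `conditional_accuracy` and the dictionary layout are likewise assumptions.

```python
def conditional_accuracy(trials, target, given):
    """Proportion of correct `target` reports among trials in which every
    target listed in `given` was also reported correctly.

    Returns None when no trial satisfies the condition.
    """
    kept = [t for t in trials if all(t[g] for g in given)]
    if not kept:
        return None
    return sum(t[target] for t in kept) / len(kept)


# Hypothetical trial outcomes (True = target reported correctly).
trials = [
    {"T1": True, "T2": True, "T3": True},
    {"T1": True, "T2": True, "T3": False},
    {"T1": False, "T2": True, "T3": True},
    {"T1": True, "T2": False, "T3": True},
]

raw_t3 = sum(t["T3"] for t in trials) / len(trials)
t3_given_t1_t2 = conditional_accuracy(trials, "T3", ("T1", "T2"))
print(raw_t3, t3_given_t1_t2)
```

In this made-up data set, raw T3 accuracy is 0.75 whereas T3 accuracy given correct T1 and T2 report is 0.5, showing how an unconditional score can conceal a deficit that appears once trials with missed earlier targets are excluded.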
Thus far, our examination of the literature suggests that the AB results, at least to some extent, from the devotion of capacity-limited attentional resources to T1, leaving too few of these resources available for processing the second target at short T1–T2 lags. While the exact nature of this capacity-limited T1 processing has yet to be fully elucidated (is it identity encoding into working memory, episodic registration, response selection, a combination of these processes or something else?), Nieuwenstein and his colleagues (Nieuwenstein et al., 2005; Nieuwenstein, 2006; for similar findings see also Olivers & Meeter, 2008; Olivers et al., 2007; Wee & Chua, 2004) have demonstrated that an all-or-none, inflexible T1 bottleneck, where T2 waits for attentional resources until T1 has been fully processed, cannot be the mechanism responsible for the AB. Nieuwenstein et al. (2005) presented subjects with RSVP streams of black letter distractors and red digit targets and found that the AB was attenuated when a red distractor unexpectedly appeared one or two lags before T2. This effect was observed even when the cue and T2 were of different colors as long as both colors were task-relevant (e.g., search for red T1 and green T2 with presentation of a red cue; Nieuwenstein, 2006). Importantly, these cuing manipulations had limited effects on T1 accuracy, suggesting that a trade-off between T1 and T2 was not responsible for the result. These findings suggest that the engagement of attention to the second target is either suppressed or delayed by attentional processing of T1, as cuing attention just prior to the presentation of the second target reduces the AB (see also Chun, 1997b; Vul, Nieuwenstein & Kanwisher, 2008).
Thus, it appears that impaired/suppressed attentional enhancement of the T2 representation - a feature common to the delayed attentional engagement, LCNE, CODAM, eSTST, attention cascade and threaded cognition models of the AB - plays a vital role in generating the second target deficit.
The hypothesis that T1 processing suppresses the attentional enhancement of T2 also fits well with a range of other findings in the literature. Shapiro, Caldwell and Sorensen (1997) demonstrated that the AB was substantially attenuated when T2 was a subject’s own name compared to a different name. Similarly, Anderson and Phelps (2001) found that the AB was reduced when T2 was an emotionally arousing word. It may be the case that T2 performance was improved under these conditions because the strong bottom-up attentional saliency of these stimuli more than compensated for the attentional suppression brought about by T1 processing. In addition, a reduction in the extent to which attention is suppressed by T1 processing may explain cases where distraction reduces the AB (Taatgen et al., 2009; Wyble et al., 2009), findings which have been taken as evidence inconsistent with the predictions of online T1 processing accounts. For example, Olivers and Nieuwenhuis (2005; see also 2006) found that the AB was reduced when subjects performed a concurrent auditory detection task. Similarly, Arend, Johnston and Shapiro (2006) demonstrated that the AB was attenuated when the RSVP stream was presented on top of task-irrelevant moving dots (star-field motion). It may be the case that distraction relieves attentional suppression of T2 by preventing the over-commitment of attention towards T1 and the RSVP stream. In any event, the results discussed in this section suggest that while T1 processing may limit the subsequent encoding/episodic registration/response selection of T2 at short lags, a major factor giving rise to the AB deficit is the suppressed attentional enhancement of the second target when it appears in close temporal proximity to T1.
Our examination of the theories and empirical studies discussed above suggests the following processes may give rise to the AB: During a standard dual-target RSVP task, all stimuli in the stream are processed both perceptually and conceptually, with semantic information about each of these stimuli available for further processing after this preliminary analysis (e.g., Chun & Potter, 1995; Luck et al. 1996; Maki, Frigen et al., 1997; Shapiro, Driver et al., 1997). The strength of these initial representations is determined by their salience (Anderson & Phelps, 2001; Arnell, Killman, & Fijavz, 2007; Most, Chun, Widders, & Zald, 2005; Shapiro, Caldwell et al., 1997; Smith, Most, Newsome, & Zald, 2006) and by the similarity (both perceptual and conceptual) between the target and distractor stimuli presented in the stream (Chun & Potter, 1995; Dux & Coltheart, 2005; Maki et al., 2003), with greater similarity between items leading to greater masking and, therefore, weaker representations. Based on the attentional set established from the task instructions (e.g., Shapiro et al., 1994), distractor items are inhibited (e.g., Dux et al., 2006; Dux & Harris, 2007a; Olivers & Watson, 2006), and upon detection of a target (or a highly salient distractor, Arnell et al., 2007; Most et al., 2005; Smith et al., 2006), an attentional episode is triggered (e.g., Bowman & Wyble, 2007; Chun & Potter, 1995). This attentional episode leads to the enhancement of the representation of the target stimulus and the T1+1 target/distractor (if there is one) due to the temporal dynamics of the attentional deployment. Stimuli that are processed in the same attentional window compete to be admitted to higher stages of processing (e.g., Potter et al., 2002; Potter, Dell’Acqua, Pesciarelli, Job, Peressotti & O’Connor, 2005) with the winner of this competition undergoing episodic registration, working memory consolidation, and/or immediate response selection – all processes that require attention. 
Typically, under standard RSVP conditions and timing, it will be the T1 stimulus that wins this competition due to its salience, its head start in preliminary identification relative to the T1+1 item and, in the case of T1+1 being a distractor, its task-relevance. Because encoding, episodic registration and response selection stages of processing are attentionally demanding, other stimuli that appear in close temporal proximity to T1 will not receive the same attentional enhancement (except when T2 appears at Lag 1 – see below) and access to working memory, leaving them vulnerable to decay and overwriting (e.g., Chun & Potter, 1995; Giesbrecht & Di Lollo, 1998). Put differently, as attention is devoted to encoding/registering/responding to the first target, this limits/suppresses the attention that is available to enhance subsequently presented targets (e.g., Bowman & Wyble, 2007; Chun & Potter, 1995) and the ability of the system to inhibit distractors (e.g., Dux & Harris, 2007a; Dux & Marois, 2008). All these factors contribute to the generation of the AB. Under conditions where T2 appears at Lag 1, this will typically lead to sparing of T2 because both T1 and T2 will be attentionally enhanced and undergo high-level processing simultaneously. Nevertheless, at Lag 1 there will still be competition between the target stimuli which may result in the superior report of T2 compared to T1 (e.g., Bowman & Wyble, 2007; Chun & Potter, 1995; Potter et al., 2002) and significant numbers of temporal order swaps between the two targets (e.g., Bowman & Wyble, 2007; Chun & Potter, 1995; Hommel & Akyürek, 2005).
The AB is a robust phenomenon that has been demonstrated across a wide range of experimental conditions. Our review of the literature suggests that the AB reflects the competition between targets for attentional resources, not only for working memory encoding, episodic registration and response selection (and perhaps additional processes that have yet to be identified), but also for the enhancement of target representations and the inhibition of distractors. T1 processing renders these attentional resources temporarily unavailable for subsequent stimuli, thereby impairing the report of the second target at short T1–T2 lags. Of the current models, those proposed by Wyble et al. (2009; eSTST; Bowman & Wyble, 2007; see also Chun & Potter, 1995) and Shih (2008; attention cascade model) accommodate the largest number of empirical findings as they incorporate limited capacity T1 processing (episodic registration in the eSTST and encoding in the attention cascade model) which leads to the impaired attentional enhancement for subsequent targets at short T1–T2 lags. Although Dehaene et al. (2003), Chartier et al. (2004), Fragopanagos et al. (2005), Nieuwenhuis et al. (2005), Nieuwenstein et al. (2009) and Taatgen et al. (2009) present somewhat related hypotheses, these models fail to incorporate mechanisms that account for several important findings (e.g., those related to T1+1 and T2+1 masking in the RSVP stream). Having said this, no current theory can fully account for all the findings related to this complex phenomenon.
Irrespective of which model best fits the AB literature, our theoretical summary seems to argue for a multifactorial origin to this processing deficit: Attentional selection, working memory encoding, episodic registration, response selection, attentional enhancement and engagement, and distractor inhibition have all been implicated in limiting multi-target performance in RSVP. However, it is also possible that these multiple processes rely on a common capacity-limited attentional resource, and that it is this resource that underlies the AB deficit. According to this view, the process that is responsible for the trade-off between T1 and T3 performance in the serial target experiments of Dux et al. (2008; 2009) is the same one that underlies the AB impairment in the distractor-less design of Nieuwenstein et al. (2009) and the attenuating effect of distraction in the experiments of Olivers and Nieuwenhuis (2005; 2006); namely, the deployment of selective attention. The more attention that is deployed to T1, whether because it is more salient, more task-relevant or requires more encoding into working memory, the less is available to process subsequent targets. Similarly, drawing attention away from T1, either by cuing a distractor prior to T2 (e.g., Nieuwenstein, 2006) or by including distracting tasks (see above), may alleviate the T2 deficit. The neuroimaging evidence that AB manipulations recruit the frontal-parietal attentional networks of the brain (Hommel et al., 2006; Marois & Ivanoff, 2005) adds further weight to the view that, first and foremost, the attentional blink represents a deficit of selective attention. Attention, after all, is generally regarded as the mechanism by which behaviorally relevant items, such as targets, are selectively processed over other items, such as distractors (Pashler, 1998).
Thus, any stages of information processing that are involved in achieving that behavioral goal may be the recipient of attention, and hence contribute to the AB deficit.
While appealing in its simplicity, this selective attention account of the AB - like all the models elaborated above - does not encapsulate all of the characteristics of this deficit. But as mentioned previously, it is unlikely that a single mechanism can explain the myriad of AB findings. Moreover, even if the AB can be attributed to more than one process, the extent to which each of these processes contributes to the deficit remains to be determined. In particular, there have been few attempts to distinguish, both in the theoretical and experimental literature, between the factors that cause the AB (i.e., are essential for its occurrence) and those that merely modulate its magnitude. Evidently, more research is needed to further understand the cognitive mechanisms that give rise to the AB and, more importantly, the implications of this fundamental temporal processing limitation for visual awareness.
This work was supported by an ARC grant (DP0986387) to P.E.D. and NIMH (R01 MH70776) and NSF (#0094992) grants to R.M. We thank Howard Bowman, Roberto Dell’Acqua, Mark Nieuwenstein, Adriane Seiffert and Kimron Shapiro for helpful comments.
¹ Here we list the eSTST model as a framework consistent with online T1 resource depletion accounts of the AB because this theory suggests that episodic registration of T1 is capacity-limited and that this registration causes the attentional blaster to be suppressed when T2 is presented at short lags. It should be noted, however, that Wyble et al. (2009) view their account as one that implicates a perceptual strategy at the origin of the deficit.
² Note that here we discuss the influence of T1 processing on the AB rather than the influence of T1 difficulty on the AB. This is an important distinction because a significant problem with accuracy data is that it does not allow one to fully elucidate whether incorrect responses reflect short or long duration processing (the same can be said of correct responses). Put differently, the fact that subjects show reduced performance on one T1 task relative to another does not mean that they devoted the same amount of attention/time to the two conditions; they may have devoted less to the former because it was too difficult (i.e., they quit processing the stimulus early and moved on to a subsequent task – T2). Thus, T1 capacity-limited models of the AB do not make directional predictions regarding the influence of T1 difficulty on the T2 deficit (difficult T1s could lead to bigger or smaller ABs); rather, their directional predictions apply to the influence of the attention subjects devote to T1 on the AB.
³ It should be noted that some frameworks that do incorporate some type of capacity-limited T1 processing, such as the eSTST model (limited capacity in the number of stimuli that can be episodically registered as distinct items at a time) and the attention cascade model (limited capacity in the number of items that can be encoded at a time), are not inconsistent with spreading of the sparing as the attentional enhancement of targets in these models is sustained under uniform conditions.