The response-signal speed-accuracy trade-off (SAT) procedure was used to investigate the relationship between measures of working memory capacity and the time-course of short-term item recognition. High- and low-span participants studied sequentially presented 6-item lists, immediately followed by a recognition probe. Analyses of composite list and serial position SAT functions found no differences in retrieval speed between the two span groups. Overall accuracy was higher for high spans than low spans, with more pronounced differences for earlier serial positions. Analysis of false alarms to recent negatives (lures from the previous study list) revealed no differences in the timing or magnitude of early false alarms, thought to reflect familiarity-based judgments. However, critically, analyses of false alarms later in retrieval indicated that recollective information accrues more slowly for low spans, which suggests that recollective information may also contribute less to judgments concerning studied items for low span participants. These findings can provide an explanation for the greater susceptibility of low spans to interference.
Measures of working memory capacity (WMC) have been found to predict performance in several cognitive tasks, including reading and language comprehension, vocabulary learning, writing, reasoning, problem solving, complex learning and procedural skills (see Bors & MacLeod, 1996 for a review). Given that successful performance on many complex cognitive tasks requires maintenance and access to the products of prior perceptual and cognitive analyses, a reliable relationship between WMC measures and performance in these types of tasks is not surprising. However, it has proved somewhat more challenging to identify the particular memory operations that differences in WMC measures reflect. In the reported study, we investigated the relationship between WMC measures and retrieval operations—both automatic and controlled retrieval operations—that underlie recognition judgments of recent events.
Research investigating WMC effects has often used performance on WM span tasks to form two contrast groups: high spans (HS), those whose performance falls in the upper quartile, and low spans (LS), those whose performance falls in the lower quartile (e.g., Unsworth, Heitz, Schrock, & Engle, 2005; Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, 2005). A major finding to emerge from studies of memory performance is that LS individuals appear to be more susceptible to interference arising from distracting material (see Engle, 2002 for a review). For instance, Kane and Engle (2000) showed that low spans were more susceptible to proactive interference than high spans. Additionally, they found that when participants were required to perform a secondary task, the performance of high spans decreased, whereas low spans’ performance was unaffected. Accordingly, Kane and Engle suggested that high spans use attentional control to compensate for the negative effects of proactive interference on their memory performance, whereas low spans do not normally allocate their attention to resist interference.
Attentional control has been implicated in other tasks as well, such as the antisaccade task (Kane, Bleckley, Conway, & Engle, 2001), the Stroop task (Kane & Engle, 2003), and the dichotic listening task (Conway, Cowan, & Bunting, 2001). Collectively, these studies provide support for the controlled attention hypothesis (e.g., Engle, Kane, & Tuholski, 1999; Kane et al., 2001), which proposes that individual differences in WMC reflect limitations in allocating attention to the specific goals of the task at hand, especially in the face of interference or distraction. Specifically, controlled (or executive) attention refers to “a capability whereby memory representations are maintained in a highly accessible state in the presence of interference, and these representations may reflect action plans, goal states or task-relevant stimuli in the environment” (Kane & Engle, 2002). Accordingly, this account specifically predicts that WMC effects will emerge in circumstances that require controlled processing.
One way to assess the respective contributions of automatic and controlled processes to a cognitive task is to measure performance as a function of processing time, as the output from automatic operations is typically available before the output of controlled operations (e.g., McElree & Dosher, 1989; McElree, Dolan, & Jacoby, 1999; Öztekin & McElree, 2007). However, to date, this approach has not been applied to study the effects of individual differences in WMC. Here, we report a time-course investigation of how WMC affects the retrieval dynamics of short-term (probe) recognition (McElree & Dosher, 1989), using the response-signal speed-accuracy trade-off (SAT) procedure, which provides independent and unbiased estimates of retrieval success and retrieval speed.
Measuring retrieval dynamics in the probe recognition task enables a direct assessment of whether differences in WMC are associated with differences in retrieval speed, retrieval accuracy, or both. To specifically assess how the contributions of automatic and controlled processes to memory performance may vary with WMC, we manipulated the recency of negative test probes (Monsell, 1978), a manipulation that has been widely used to investigate interference stemming from high episodic familiarity. Crucially for our purposes, manipulating the recency of a negative probe puts responses based on what is often regarded as an automatic assessment of familiarity in opposition to controlled retrieval operations (Jacoby, 1991; McElree et al., 1999). Specifically, a recently studied negative probe will have higher familiarity than a less recently studied (distant) negative probe, leading to an increased false alarm rate. This higher false alarm rate can be corrected by the recovery of specific episodic (e.g., source) information about list membership.
A common finding in response time tasks (e.g., Monsell, 1978) is that correct rejections are slower and/or less accurate for recent negatives than for distant negatives. Importantly, in SAT time-course studies (e.g., Hintzman & Curran, 1994; McElree & Dosher, 1989; Öztekin & McElree, 2007), recent negative probes have been found to induce high false alarm rates early in retrieval compared to distant negative probes. However, the elevated false alarm rates diminish later in retrieval, when participants are able to recover more detailed episodic information (i.e., either that the probe was not a member of the current study list, or that it was studied on the previous trial). This nonmonotonic false alarm function for recent negatives is consistent with two processes in opposition: Automatic assessments of familiarity engender high false alarm rates because recent negatives have high residual familiarity, and these false alarms are subsequently countered by controlled, strategic retrieval operations that serve to recover detailed episodic information.
The critical question we addressed in this study is whether WMC differentially affects early familiarity-based judgments or the episodic retrieval processes that are operative later in retrieval. A fine-grain time-course analysis has the potential to identify the underlying mechanisms responsible for the greater susceptibility of LS individuals to interference in a retrieval context. As noted, previous work suggests that WMC effects largely occur when controlled processing is required (e.g., Barrett, Tugade, & Engle, 2004; see Engle, 2002; Unsworth & Engle, 2007 for reviews). To the degree that familiarity judgments are predominantly based on automatic assessments of familiarity, and judgments based on detailed episodic information require controlled recollective processes, we expected to see differences between HS and LS groups in later phases of retrieval, when source information is used to modulate responses based on familiarity. Tracking differences between HS and LS groups across the time-course of responses to recent and distant lures enabled us to observe differences in both the magnitude and the timing of the two types of information accrual.
The SAT procedure can be used to measure the accuracy and speed of processing in a wide range of cognitive processes, including sentence comprehension (e.g., Foraker & McElree, 2007; Martin & McElree, 2008; McElree, Foraker, & Dyer, 2003), visual attention (e.g., Carrasco, McElree, Denisova, & Giordano, 2003; McElree & Carrasco, 1999), and memory (reviewed in McElree, 2006). Application of SAT in the memory domain has largely focused on investigations of item recognition (e.g., Benjamin & Bjork, 2001; Hintzman & Curran, 1994; McElree & Dosher, 1989; Öztekin & McElree, 2007; Wickelgren, Corbett, & Dosher, 1980), although it has been used to characterize relational memory processes as well (e.g., temporal order: McElree & Dosher, 1993; spatial order: Gronlund, Edwards, & Ohrt, 1997; n-back discriminations: McElree, 2001).
The main advantage of SAT is that it provides conjoint measures of the accuracy and the speed of processing. In contrast, response time (RT) measures from a reaction time task do not provide pure measures of processing speed. One problem with RT measures is that they are subject to speed-accuracy trade-offs. More importantly, however, differences in the quality of memory representations can engender differences in RT, even when the underlying speed of information accrual is identical (e.g., Dosher, 1976, 1981; McElree & Dosher, 1989; Ratcliff, 1978; Murdock, 1971; Wickelgren, 1977; Wickelgren et al., 1980; see McElree, 2006 for a review and discussion). RT measures are therefore less than optimal in applications such as the present when the research goal necessitates breaking apart processing speed from terminal accuracy, and teasing apart the contributions of automatic and controlled processes— precisely what is required in the current investigation.
In the SAT procedure, participants are cued to respond to a response signal (a tone) presented at one of several (typically 6–7) times ranging from 40 to 3000 ms after the onset of the probe. The time of the response signal is random on any trial, and participants are trained to respond within 100–300 ms of the tone. Varying the response signal across this range of times allowed us to measure the full time-course of retrieval. This procedure enables us to construct a retrieval function—accuracy as a function of processing time—for each condition of interest for each participant. SAT retrieval functions typically show an early period of chance performance, followed by a period of rapid increase in accuracy as retrieval time increases, and finally an asymptotic period of accuracy is reached, where additional retrieval time does not improve accuracy (illustrated in Figure 1A). The shape of the functions is usually well fit by an exponential approach to a limit. Three parameters describe these functions: (a) an asymptotic accuracy parameter revealing overall limitations of memory, (b) an intercept, indicating the point in time at which performance departs from chance, and (c) a rate of rise from chance to an asymptote. The asymptote parameter indicates the probability of successful retrieval, while the intercept and the rate parameters jointly constitute retrieval speed measures.
Two hundred and forty-three adults were screened in order to obtain working memory capacity measures using the automated reading span and the automated operation span tasks (Unsworth, Heitz, Schrock, & Engle, 2005; Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, 2005). Participants who scored in the upper quartile and lower quartile in both tasks constituted the high and low span groups, respectively. Low and high span participants were then contacted for participation in the SAT experiment. A total of nine high span and ten low span individuals agreed to participate in the experiment. For the screening session, eighteen of the participants were paid for their time, and the remaining participants received credit for psychology classes via New York University’s subject pool system. All participants who took part in the experimental sessions were paid for their time.
The experiment consisted of six 1-hour sessions, completed over a period of several weeks. Each session contained three 20-minute blocks. Each block consisted of 168 experimental trials, in which participants studied a 6-item list and were cued to respond to a recognition probe following a brief visual mask. Participants indicated whether the test probe was a member of the study list.
The design of the study was based on McElree (1998), which used categorized lists. Lists composed of instances from different categories enabled us to create negative probes that tested participants’ ability to discriminate studied categories from unstudied categories, and studied instances from semantically related but unstudied instances of a studied category. The stimulus set consisted of 36 categories, containing 21 words each, from the category norms of Van Overschelde, Rawson, and Dunlosky (2004). A study list for a trial was constructed by randomly selecting (without replacement) three members of a semantic category for the first three serial positions, and randomly selecting (without replacement) three members of another category for the remaining three serial positions. Categories used on the preceding trial were excluded from selection. Positive probes were drawn from each of the 6 serial positions of the current study list equally often. One quarter of the negative probes consisted of new lures (NN): lures that were neither members of either of the two categories on the study list nor words presented on the study list of the preceding trial. One quarter consisted of distant negatives from the first category (DN1): unstudied members of the same semantic category as the first three words in the study list. One quarter consisted of distant negatives from the second category (DN2): unstudied members of the same semantic category as the last three words in the study list. The remaining quarter consisted of recent negatives (RN): lures drawn from studied words of the preceding trial. Note that because the selection of the categories of the current study list excluded those used on the preceding trial, RN probes were always from a category that had not been studied in the current trial. This ensured that the effects of residual familiarity from recent study were isolated from semantic similarity effects.
This design structure resulted in 36 trials for each of the seven response-deadlines for each of the six serial positions, and 54 trials for each response-deadline for each of the four lure types.
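The trial counts above can be checked against the session structure with simple arithmetic. The sketch below assumes, as the reported counts imply, that every block of every session contributes to the totals; the variable names are illustrative only.

```python
# Trial bookkeeping for the design described above (illustrative names).
SESSIONS = 6
BLOCKS_PER_SESSION = 3
TRIALS_PER_BLOCK = 168
DEADLINES = 7          # response-signal lags
SERIAL_POSITIONS = 6   # positive probes
LURE_TYPES = 4         # NN, DN1, DN2, RN

total_trials = SESSIONS * BLOCKS_PER_SESSION * TRIALS_PER_BLOCK

# 36 trials per deadline per serial position; 54 per deadline per lure type.
positive_trials = 36 * DEADLINES * SERIAL_POSITIONS
negative_trials = 54 * DEADLINES * LURE_TYPES

print(total_trials, positive_trials, negative_trials)
```

Under these assumptions the positive and negative trials each account for half of the session total.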
Figure 2 illustrates the sequence of events in a single trial: (a) A centered fixation point was presented for 500 ms. (b) Study words were then presented sequentially for 500 ms each. (c) The study list was followed by a visual mask, consisting of non-letter symbols, for 500 ms. (d) Following the mask, the test word was presented for the duration of the response-deadline. (e) At 43, 200, 300, 500, 800, 1500 or 3000 ms after the onset of the recognition probe, a 50-ms tone sounded to cue the participants to respond. (f) Participants indicated a yes-no recognition response as quickly as possible after the onset of the tone by pressing a key. (g) After indicating their response, participants were given feedback on their latency to respond. Participants were trained to respond within 300 ms of the tone.¹ They were informed that responses longer than 300 ms were too slow and responses under 100 ms were anticipations, and that both should be avoided. (h) After the latency feedback, participants were asked to give a confidence rating ranging from 1 to 3 (“3” indicating high confidence, “1” indicating low confidence). The confidence ratings primarily served to let participants pace themselves through trials, and were not analyzed. Participants initiated the next trial by pressing a key. Participants were allowed to take breaks between blocks.
For positive trials, each participant’s hit rates were scaled against the false alarm rates to new lures (NN) to obtain (equal-variance Gaussian) d' measures. To ensure d's were measurable, perfect performance in any condition was adjusted with a minimal corrections procedure, where hit rates higher than .99 were adjusted to .98, and false alarm rates lower than .01 were adjusted to .02, approximating the correction suggested by Snodgrass and Corwin (1988).
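As a minimal sketch of this scaling, the equal-variance Gaussian d' is the difference between the z-transformed hit rate and the z-transformed false alarm rate. The function names below are hypothetical, and the correction thresholds are taken from the description above rather than from the authors' analysis code.

```python
from statistics import NormalDist


def correct_rate(rate, is_hit):
    """Apply the minimal correction described in the text.

    Hit rates above .99 become .98; false alarm rates below .01 become .02.
    """
    if is_hit and rate > 0.99:
        return 0.98
    if not is_hit and rate < 0.01:
        return 0.02
    return rate


def d_prime(hit_rate, fa_rate):
    """Equal-variance Gaussian d' = z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(correct_rate(hit_rate, True)) - z(correct_rate(fa_rate, False))


# Hypothetical rates for one condition at one response deadline.
print(round(d_prime(0.90, 0.15), 3))
```

Without the correction, a hit rate of 1 or a false alarm rate of 0 would map to an infinite z-score and leave d' undefined.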
We averaged d's for the last two response-deadlines to obtain an empirical measure of asymptotic accuracy, which reflects the maximum level of accuracy reached and is a measure of the probability of successful retrieval (e.g., McElree & Dosher, 1989, 1993; McElree, 2001; Öztekin & McElree, 2007). Figure 3 illustrates the average asymptotic accuracy across the six serial positions for the high span (HS) and low span (LS) groups. A 2 (group: HS vs. LS) × 6 (serial position of test probe) mixed ANOVA conducted on the asymptotic d's indicated a main effect of group: HS participants were more accurate than LS participants [F(1,17) = 8.332, p < .010]. There was also a reliable main effect of serial position, with more recent serial positions exhibiting higher accuracy [F(1,17) = 138.260, p < .001]. In addition, this analysis indicated a significant group by serial position interaction: LS participants were less accurate than HS participants for early positions, but this difference became less prominent as the test probe became more recent [F(1,17) = 7.718, p < .013].
These asymptotic differences indicate that LS participants have lower overall probability of retrieval success in short-term recognition compared to HS individuals. Moreover, this difference in memory performance across the two groups is more prominent for the early members of the study list compared to more recent probes.
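We estimated the retrieval dynamics by fitting the individual participants’ data and the average data (derived by averaging d' values for each condition across participants) with an exponential approach to a limit:

$$d'(t) = \lambda\left(1 - e^{-\beta (t - \delta)}\right) \text{ for } t > \delta, \text{ otherwise } 0. \tag{1}$$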
In Equation (1), d'(t) is the predicted d' at time t; λ is the asymptotic accuracy level reflecting the overall probability of recognition; δ is the intercept reflecting the discrete point in time when accuracy departs from chance (d' = 0); β is the rate parameter, which indexes the speed at which accuracy grows from chance to asymptote. Previous studies have indicated that this equation provides a good quantitative summary of the shape of the SAT functions (e.g., Dosher, 1981; McElree, 2001; McElree & Dosher, 1989, 1993; Wickelgren & Corbett, 1977; Wickelgren, Corbett & Dosher, 1980).
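The exponential approach to a limit defined by these three parameters, with accuracy at chance until the intercept δ and then rising at rate β toward the asymptote λ, can be sketched as a small function. The parameter values in the example are hypothetical and chosen only to illustrate the shape of the function; they are not estimates from this study.

```python
import math


def sat_dprime(t_ms, lam, beta, delta):
    """Predicted d' at processing time t_ms (in ms).

    lam   -- asymptote (d' units)
    beta  -- rate (1/ms); 1/beta is the time constant in ms
    delta -- intercept (ms), where accuracy departs from chance
    """
    if t_ms <= delta:
        return 0.0  # chance performance before the intercept
    return lam * (1.0 - math.exp(-beta * (t_ms - delta)))


# Hypothetical parameters: asymptote 3.0, 1/beta = 200 ms, intercept 300 ms.
for t in (200, 500, 1000, 3000):
    print(t, round(sat_dprime(t, 3.0, 1 / 200, 300), 3))
```

One time constant (1/β) after the intercept, the function has reached about 63% of its asymptote, which is why rates are often reported in 1/β ms units.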
The quality of the fits in the analyses reported below was examined by three criteria, which have been established in previous SAT studies: (i) The value of an adjusted R2 statistic, which reflects the proportion of variance accounted for by a model, adjusted by the number of free parameters (Reed, 1973); (ii) the consistency of the parameter estimates across participants; (iii) evaluation of whether the fit yielded systematic deviations that could be accounted for by additional parameters.
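For criterion (i), a minimal sketch of an adjusted R² is given below. It assumes the form of the statistic commonly used in SAT modeling, in which both the residual and the total sums of squares are divided by their degrees of freedom; the function name and arguments are illustrative.

```python
def adjusted_r2(observed, predicted, n_params):
    """Proportion of variance accounted for, adjusted for free parameters.

    observed  -- empirical d' values
    predicted -- model d' values at the same time points
    n_params  -- number of free parameters (k) in the model
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if n <= n_params:
        raise ValueError("need more data points than free parameters")
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - (ss_res / (n - n_params)) / (ss_tot / (n - 1))
```

Because the residual term is divided by n − k, an extra parameter raises the statistic only if it reduces the residual error by more than the degree of freedom it costs, which is what makes the nested model comparisons meaningful.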
Initially, we evaluated whether SAT functions showed the same essential pattern that has been observed in other studies (e.g., McElree, 1996; 1998; McElree & Dosher, 1989, 1993; Öztekin & McElree, 2007; Wickelgren et al., 1980). To do so, the SAT functions for the 6 serial positions were fit with sets of nested models that systematically varied the 3 parameters of Equation 1. These models ranged from a null model in which all functions were fit with a single asymptote (λ), rate (β), and intercept (δ) to a fully saturated (18-parameter) model in which each function was fit with a unique asymptote (λ), rate (β), and intercept (δ).
Consistent with prior studies, the data indicated retrieval dynamics differences due to fast rising functions for serial position 6. The 6λ-2β-1δ and 6λ-1β-2δ models provided the best fit of the empirical data. The 6λ-2β-1δ model allocated a separate asymptote (λ) to each serial position, one rate (β) for serial positions 1 through 5, another rate (β) for serial position 6 (the most recently studied item), and a common intercept (δ) for all six serial positions. This two-rate model significantly increased the adjusted-R2 value compared to a 6λ-1β-1δ model, t(18) = 4.61, p < .05. The increase in adjusted-R2 value resulted from a faster rate parameter for the last serial position. The rate in 1/β ms units was 114 ms for serial position 6 versus 207 ms for other serial positions. The difference between the two rate parameters across participants was statistically significant, t(18) = 5.66, p < .05. Similarly, the 6λ-1β-2δ model [which allocated a common rate (β) to all serial positions, one intercept (δ) for serial positions 1 through 5, and another intercept (δ) for serial position 6] also reliably increased the adjusted-R2 value compared to the 6λ-1β-1δ model, t(18) = 3.98, p < .05. The intercept was 281 ms for serial position 6 versus 337 ms for other serial positions. The difference between the two intercept parameters across participants was statistically significant, t(18) = −7.70, p < .05.
For a subset of participants (five out of nine HS and two out of ten LS participants), there was evidence of a fast rising function that included the last three serial positions on the list. For these participants, a model that allocated the faster rate to the last three items further increased the adjusted-R2 value over the 6λ-2β-1δ and 6λ-1β-2δ models reported above. As the most recent three positions shared the same semantic category, this finding is consistent with previous findings (e.g., McElree, 1998, 2006), which have demonstrated that the retrieval dynamics advantage for the last item can extend to several items when the items can be grouped into a single chunk. It appears that only a subset of the participants strongly engaged in chunking by semantic category as a strategy for encoding the list.
We now turn to how WMC measures affect the retrieval dynamics of recognition memory over the short term. To conduct an initial comparison across the two WMC groups, and to determine whether the two groups differ in terms of retrieval dynamics, individual participants’ and average (over participants) d' values were averaged across serial positions. These composite list SAT functions were fit with Equation (1) as described above. The resultant parameter estimates across participants are reported in Table 1. Figure 4A illustrates the SAT functions for the average HS and LS data,² with smooth curves indicating the fitted exponential functions. As shown in the figure, LS participants had lower asymptotic accuracy than HS participants (2.21 vs. 3.33 for average LS and HS groups, respectively). An independent-samples t-test conducted on the asymptote (λ) parameters across participants confirmed that this difference was reliable [t(17) = 3.246, p < .005].
Figure 4B presents the rate and intercept estimates across participants. There were no reliable differences in the rate (β) or the intercept (δ) parameters across the two groups. Hence, the data failed to reject the null hypothesis that HS and LS participants do not differ in retrieval speed, either in the rate of information accrual or in when information first became available. As speed differences arise from factors operative early in retrieval, they may largely reflect the contributions of fast assessments of global memory strength or familiarity of the representation. Hence, the lack of a measurable difference in retrieval speed estimates across HS and LS participants is consistent with the controlled attention account, which asserts that differences across the two groups should occur only when controlled processing is required, and not for decisions that are based on automatic processing, such as strength or familiarity assessments (e.g., Barrett, Tugade, & Engle, 2004).
In addition to the list fits, we also examined performance in more detail by deriving and modeling SAT functions for each of the six serial positions (SP) for individual participants’ and the average data for HS and LS groups. These functions were fit with Equation (1) as outlined above. Parameter estimates for each serial position across participants and the average data for the LS and HS groups are reported in Table 2. Figure 5 shows the empirical SAT data and the fitted exponential functions for each of the six serial positions for the average HS and LS data. Between-group comparisons conducted on the asymptote (λ) parameter of the serial position SAT functions indicated a lower asymptote for LS compared to HS participants for all serial positions [SP 1, t(17)= 3.800, p < .001; SP 2, t(17)= 3.184, p < .005; SP 3, t(17)= 3.325, p < .004; SP 4, t(17)= 2.626, p < .018; SP 5, t(17)= 2.692, p < .015; SP 6, t(17)= 2.467, p < .025]. In addition, a 2 (group) × 6 (serial position) mixed ANOVA was conducted on the asymptote (λ) estimates derived from the fits for each serial position function. Similar to the analysis reported on empirical d' values (see Asymptotic Accuracy section), this analysis indicated a reliable main effect of serial position [F(1,17) = 48.760, p < .001], a reliable main effect of group [F(1,17) = 10.946, p < .004], and a reliable group by serial position interaction [F(1,17) = 7.611, p < .013]. Consistent with the retrieval dynamics measures derived from the composite list SAT functions, there were no reliable differences across LS and HS groups in terms of the rate and the intercept parameters, suggesting that the two groups did not differ in retrieval speed.
We next turn to our analyses of the negative probe conditions to examine whether and how HS and LS individuals differ in their patterns of rejecting lures. Recall that four types of negative test probes were used: New Negatives (NN), items from a different semantic category than the members of the current study list, which had not been studied on the current or the preceding trial; Distant Negatives from the first category (DN1), unstudied items from the same semantic category as the first three serial positions of the study list; Distant Negatives from the second category (DN2), unstudied items from the same semantic category as the most recent three serial positions of the current memory set; and Recent Negatives (RN), lures drawn from studied items in the preceding trial.
Initial analyses were conducted on false alarm (FA) rates across the four lure types, the seven response-deadlines and the two groups. FA rates across conditions for HS and LS participants are illustrated in Figure 6.
A 2 (group) × 7 (response-deadline) × 4 (type of lure) ANOVA was conducted on the probability of falsely recognizing a lure as a member of the current study set. This analysis indicated a main effect of lure type [F(3,51) = 18.616, p < .001]. Additional comparisons across the lure conditions revealed that participants were reliably more accurate in rejecting NN lures than DN1 lures [F(1,17) = 9.444, p < .007]. DN1 probes were rejected marginally more accurately than DN2 probes [F(1,17) = 3.763, p < .069], and DN2 probes were rejected more accurately than RN probes [F(1,17) = 4.941, p < .040]. Although the LS participants exhibited higher FA rates than HS participants for all four lure types, these differences did not reach statistical significance (mean FA rates for the LS and HS groups, respectively, were .163 and .117 for NN, .185 and .136 for DN1, .233 and .152 for DN2, and .256 and .200 for RN probes).
In addition, there was a reliable interaction between condition (recent negative vs. new lures) and lag (response-deadline) [F(6,102) = 28.254, p < .001]: Participants had a higher tendency to false alarm to recent as compared to new lures early in retrieval, but the differences in FA rates diminished later in retrieval (see Figure 6). When the two false alarm rates are directly scaled against one another, as in the next section, this interaction results in a non-monotonic false alarm rate pattern for rejecting recently studied lures across retrieval time. This type of difference scaling directly shows the greater tendency to false alarm to recently studied lures early in retrieval, which decreases with additional retrieval time. This pattern is consistent with previous research (e.g., Hintzman & Curran, 1994; McElree & Dosher, 1989; McElree et al., 1999; Öztekin & McElree, 2007) and implicates the contribution of two types of information accrual to recognition memory judgments: global strength (or familiarity) information, which is dominant early in retrieval, leading to high false alarm rates, and recollective information, which becomes available later in retrieval via controlled/strategic episodic retrieval attempts. Accrual of the latter can diminish the false alarm rates if relevant episodic information is recovered (i.e., source information in this study). Below, we further investigate this interaction across our LS and HS participants.
To examine how WMC measures affect the global assessment of the strength or familiarity of an item (often regarded as an automatic process) and the recovery of detailed episodic information (often thought to be driven by controlled recollective processes), we computed FA difference scores between our recent negative probes and new lures [FA(RN) − FA(NN)] at each of the seven response-deadlines. This scaling provided an unbiased measure of performance by factoring out participants’ bias to judge an item as a member of the study list (e.g., a tendency to respond yes more often than no, regardless of the type of test probe). Hence, the obtained FA difference scores reflect a pure measure of the tendency to false alarm due to the high residual familiarity of the recent negative probes. Crucially, this measure enabled us to examine the differential impact of WMC measures on judgments based on familiarity and on the accrual of detailed episodic (recollective) information, independent of a possible general tendency for LS participants to false alarm more than HS participants.
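The difference scaling amounts to a per-deadline subtraction of the new-lure false alarm rate from the recent-negative false alarm rate. In the sketch below, the function name and the false alarm rates are hypothetical and serve only to illustrate the computation; they are not data from this study.

```python
def fa_difference(fa_rn, fa_nn):
    """FA(RN) - FA(NN) at each response deadline.

    Subtracting the new-lure rate removes any general bias to respond
    'yes', leaving the excess false alarms attributable to recent study.
    """
    if len(fa_rn) != len(fa_nn):
        raise ValueError("need one rate per deadline in both conditions")
    return [rn - nn for rn, nn in zip(fa_rn, fa_nn)]


# Hypothetical rates at the seven response deadlines (43 ... 3000 ms).
fa_rn = [0.45, 0.48, 0.42, 0.30, 0.18, 0.10, 0.08]
fa_nn = [0.44, 0.40, 0.28, 0.15, 0.08, 0.05, 0.04]
print([round(d, 2) for d in fa_difference(fa_rn, fa_nn)])
```

With rates like these, the difference scores first rise and then fall across deadlines, which is the non-monotonic signature the analysis looks for.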
Figure 7 illustrates the FA difference scores for the average LS and HS data. Note that with this scaling, higher scores indicate a higher tendency to false alarm to the RN probes. The figure indicates that for both the HS and LS groups, the FA difference scores increase early in retrieval and then diminish later in retrieval. This non-monotonic pattern indicates that the information basis for the recognition memory judgments shifted across retrieval, and it is consistent with predictions of dual-process theories of recognition memory: The early high FA rates indicate the contribution of familiarity (as the RN probe was studied on the previous trial, it has high residual familiarity compared to the new lure). The observed reduction in FA rates later in retrieval suggests accrual of new information that contributes to the recognition judgments, presumably reflecting source or list-specific information (i.e., the fact that the RN probe was studied on the previous trial, or that it was not a member of the current study list), recovered by what is viewed as a recollective process in dual-process models of recognition memory (see Yonelinas, 2002 for a review). This biphasic nature of FA rates to RN probes reported in Figure 7 is consistent with previous SAT studies that have put familiarity and recollective information in opposition to one another (e.g., McElree et al., 1999; McElree & Dosher, 1989; Öztekin & McElree, 2007; Zeelenberg, Wagenmakers, & Shiffrin, 2004; see McElree et al., 1999 for a complete review).
Of particular interest here are differences in the magnitude or timing of the early familiarity-based judgments, as well as in the timing and accrual of recollective, episodic information, across our HS and LS groups. Figure 7 shows that the decline in FA rate occurs considerably later for the LS group than for the HS group, suggesting that the accumulation of detailed episodic information (presumably recovered by controlled retrieval operations) might have started later and/or been completed more slowly in the LS group. However, the figure further indicates that given enough time (about 3.2 sec), both groups reach comparable accuracy, suggesting that the two groups do not differ in the maximum level of accuracy reached after completion of the controlled retrieval process that accesses relevant episodic information.
To quantify this pattern, and to test the differences between the two groups observed in Figure 7, we fit the FA difference scores with a model that explicitly assumes that retrieval shifts from one source of information to another across processing time. This type of model is formally equivalent to a two-process retrieval model. Such a two-process retrieval model for SAT was originally proposed by Ratcliff (1980) and was later adapted to the exponential form (McElree & Dosher, 1989), which we apply to our current dataset:
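A sketch of the form this model usually takes in the SAT literature is given here, with y(t) standing for the dependent measure (here, the FA difference score) at processing time t. The symbol y(t) is our notation. The bracketing of the second branch is an assumption, chosen so that the two branches agree at t = δ2 and the function approaches λ2 at long times.

```latex
y(t) =
\begin{cases}
\lambda_1 \left(1 - e^{-\beta (t - \delta_1)}\right), & \delta_1 < t < \delta_2 \\[6pt]
\left[\lambda_2 + (\lambda_1 - \lambda_2)\,\dfrac{\delta_2 - \delta_1}{t - \delta_1}\right]
\left(1 - e^{-\beta (t - \delta_1)}\right), & t \ge \delta_2
\end{cases}
\tag{2}
```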
Equation (2) states that during the initial retrieval period (δ1 < t < δ2), accuracy depends on accrual of one type of information, presumably familiarity information. During this initial period, accuracy is modeled by the top portion of Equation (2), a simple exponential approach to an asymptote (λ1). At time δ2, a second source of information starts to contribute to the recognition memory judgments. This source of information could arise from the output from a second process, e.g., a recollective operation that accesses detailed episodic information. The accrual of this second type of information leads to the change in retrieval, shifting the asymptote from λ1 to λ2. The bottom portion of Equation (2) states that response accuracy gradually shifts to the new asymptote (λ2) starting at time δ2.
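The two-branch behavior described above can be sketched as a short function. This is an illustration under stated assumptions rather than the fitting code used in the study: the function name, the exact algebraic form of the second branch (chosen so that the branches meet at δ2), and the values of λ1, λ2, β, and δ1 are all hypothetical. Only the two δ2 values (0.623 s and 1.220 s) are taken from the average HS and LS estimates reported in the text.

```python
import math

def two_process(t, lam1, lam2, beta, delta1, delta2):
    """Two-process SAT function: exponential rise toward lam1, then a
    gradual shift toward lam2 once the second source arrives at delta2."""
    if t <= delta1:
        return 0.0  # no information available yet
    growth = 1.0 - math.exp(-beta * (t - delta1))
    if t < delta2:
        return lam1 * growth  # first source only (e.g., familiarity)
    # Weight on the old asymptote shrinks from 1 toward 0 as t grows.
    weight = (delta2 - delta1) / (t - delta1)
    return (lam2 + (lam1 - lam2) * weight) * growth

# Hypothetical parameters (times in seconds); only delta2 comes from the text.
hs = dict(lam1=0.30, lam2=0.10, beta=5.0, delta1=0.35, delta2=0.623)
ls = dict(lam1=0.30, lam2=0.10, beta=5.0, delta1=0.35, delta2=1.220)

for t in (0.5, 1.0, 1.5, 3.0):
    print(t, round(two_process(t, **hs), 3), round(two_process(t, **ls), 3))
```

With these illustrative values the two curves coincide early, separate once the earlier δ2 is passed, and converge again at long processing times, which is the qualitative pattern the parameter comparisons are designed to detect.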
In order to test whether WMC measures affect familiarity-based responses, the recovery of more detailed episodic information, or both, we fit each participant’s data and the average LS and HS data with Equation 2, and compared the asymptote (λ1, λ2) and intercept (δ1, δ2) parameters that correspond to each. Specifically, if the two groups differ in the timing or the magnitude of responses based on familiarity information, we should see differences in (respectively) the familiarity intercept (δ1) and/or the familiarity asymptote (λ1) across HS and LS participants. If, on the other hand, they differ in the timing or the magnitude of responses based on controlled processes that serve to recover detailed episodic (source) information, then we should see differences in the intercept parameter δ2, which indicates the point in time at which relevant episodic information first becomes available, and/or the asymptote parameter λ2, which reflects the maximum level of accuracy reached upon accrual of relevant episodic (source) information. Table 3 presents the parameter estimates from the fits of the average HS and LS data, and the individual participants’ data in each group. The smooth functions in Figure 7 indicate the fits to the average HS and LS data.
Before examining differences across the two groups, we first compared the λ1 and λ2 estimates and the δ1 and δ2 estimates across all our participants (collapsing over group), to examine whether the observed data pattern is consistent with the model depicted in Equation 2. Paired t-tests across these parameter estimates indicated that λ1, reflecting the familiarity asymptote, was reliably higher than λ2, the asymptote reflecting the level of accuracy reached after recovery of detailed episodic information [t(18) = 3.624, p < .002; with sixteen out of nineteen participants showing this ordering]. The higher λ1 indicates the non-monotonic nature of the functions, and it suggests an early intrusion of familiarity information, which is corrected later in retrieval, presumably with the accrual of specific episodic information. Additionally, δ1, reflecting the familiarity intercept (i.e., the point in time when familiarity information first becomes available), occurred earlier in retrieval than δ2, reflecting the intercept of the accrual of specific episodic information [t(18) = −4.565, p < .001; with seventeen out of nineteen participants showing this ordering]. Hence, the data and the parameter estimates are consistent with the contribution of two types of information accrual to the recognition memory judgments, as represented in Equation 2.
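For readers less familiar with the test, the paired comparison can be sketched as follows. The parameter estimates are hypothetical (five made-up participants rather than the nineteen in the study) and the function name is our own. The point is only to show how the statistic and its degrees of freedom arise from within-participant differences, so that nineteen participants yield the 18 degrees of freedom reported above.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev uses n - 1
    return mean(diffs) / standard_error, n - 1

# Hypothetical per-participant asymptote estimates (illustrative only).
lam1_estimates = [0.30, 0.25, 0.40, 0.35, 0.20]
lam2_estimates = [0.10, 0.15, 0.20, 0.05, 0.15]

t_stat, df = paired_t(lam1_estimates, lam2_estimates)
print(f"t({df}) = {t_stat:.2f}")
```

The sign of the statistic follows the order of subtraction, which is why a comparison of δ1 against the later δ2 produces a negative t.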
Given the applicability of a two-process model, we next examined differences in the estimated parameters across HS and LS participants. These analyses enabled us to investigate whether WMC measures are related either to the magnitude and/or timing of familiarity-based early responses or to the magnitude and/or timing of episodic information.
Consistent with the average data and model fits illustrated in Figure 7, independent-samples t-tests comparing the LS and HS participants indicated that neither the familiarity asymptote (λ1) nor the familiarity intercept (δ1) estimates differed across the two groups (p > .1), suggesting that the HS and LS groups did not differ in either the point in time when they began to false alarm more to RN than to NN probes or the degree to which they false alarmed to the two probe types.
Although the two groups do not appear to differ in their familiarity intercept (δ1) estimates, inspection of the fits of the average data in Figure 7 might suggest that the rate at which familiarity information accrues is slower for the LS than for the HS group. Equation 2 makes the simplifying assumption that the rate at which familiarity information accrues is the same as the rate at which episodic information accrues (both estimated by β). This assumption facilitates the recovery of stable parameter estimates by limiting the parameter tradeoffs that typically result from dual-process models assuming separate intercepts, rates, and asymptotes for the two forms of information (McElree & Dosher, 1989). While the rate estimates from the above fit did not differ across the two groups [t(17) = .18, p > .86], fitting the data with a common rate could conceal differences between the HS and LS groups. To test whether there might be differences in the rates at which the two types of information accrue across the LS and HS groups, we fit a variant of the model in EQ. 2 that allowed for two rates (βs), one for familiarity information and one for the late-accruing episodic information. The quality of the fit for this model was substantially lower than that of the simpler single-rate model in EQ. 2. Specifically, R² dropped from .99 to .91 in fits of the average low-span data, and from .99 to .86 in fits of the average high-span data, and this fit yielded unstable parameter estimates.
Hence, WMC measures did not have a measurable relationship to early familiarity-based responses. The λ2 asymptote parameter, reflecting the maximum level of accuracy reached after the accumulation of specific episodic information, also did not differ across the two groups. However, there was a reliable difference between the two groups in the point in time at which detailed episodic information began to accrue. The δ2 intercept parameter, reflecting the onset of the corrective influence of specific episodic information, was significantly later for LS participants than for HS participants [t(17) = −2.352, p < .031, d = 1.10]. In addition, a default Bayesian t-test (Rouder, Speckman, Sun, Morey & Iverson, 2009) using the “unit information prior” yielded a Bayes factor of 0.3, indicating that the data are approximately three times more likely under the alternative than under the null hypothesis. In the average data, δ2 estimates were 623 ms for the HS group as compared to 1220 ms for the LS group.
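As a worked check on the last two figures, the arithmetic can be written out. This assumes the reported Bayes factor is the ratio of the likelihood of the data under the null to that under the alternative (our reading, since a value of 0.3 is described as favoring the alternative); the subscripts and superscripts are our notation.

```latex
BF_{10} = \frac{1}{BF_{01}} = \frac{1}{0.3} \approx 3.3,
\qquad
\delta_2^{\mathrm{LS}} - \delta_2^{\mathrm{HS}} = 1220\ \mathrm{ms} - 623\ \mathrm{ms} = 597\ \mathrm{ms}.
```

The second quantity is the roughly 600 ms group difference in the onset of episodic information in the average data.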
A single-process model could account for the nonmonotonic pattern of FA rates by assuming that recent negatives are rejected more slowly, so that the false alarm rates for recent negatives take longer to decline (e.g., see Brockdorff & Lamberts, 2000 for such an account). However, such a model is less able to account for related nonmonotonic patterns with crossover effects engendered by manipulations of a single variable. For example, McElree et al. (1999) found that repeated study of a lure produces higher false alarm rates early in retrieval than a once-studied lure, but lower false alarm rates later in retrieval. Crucially, here, a single-process model cannot explain the fact that WMC is related to the speed of rejecting lures, but not to the speed with which studied items are retrieved. Hence, these findings suggest that WMC selectively affects the accrual of detailed episodic information later in retrieval, which may be recovered by controlled recollective processes. We found no evidence to indicate that WMC differences are related to the processes responsible for early strength/familiarity-based responses.
In both the analyses of the composite list SAT functions (viz., averaged over serial position) and the individual serial position SAT functions, we found that LS participants had lower accuracy than HS participants. Reliably lower performance was evident in both the empirical measures of asymptotic accuracy (the average of the two longest response deadlines) and the associated asymptotic estimates derived from fits of the exponential retrieval equation (EQ. 1). These differences in retrieval accuracy are not altogether surprising, as participants were partitioned into HS and LS groups in part by the accuracy of their retrieval, albeit in recall rather than recognition tasks. Perhaps more notable is the absence of any concomitant differences in retrieval speed: The retrieval dynamics measures (the intercept parameter, which indicates the point in time at which information first becomes available, and the rate parameter, which reflects the rate of information accrual) did not differ across the two groups. Prima facie, the absence of dynamics differences suggests that WMC measures are not related to the speed with which participants can gain access to a representation in memory. However, this claim must be qualified by the analyses of the rejection of negative probes, discussed more fully below, which indicate that LS participants recovered detailed episodic information more slowly than HS participants.
It is often assumed that recognition judgments reflect the contributions of two sources of information: an assessment of the overall quality of the match of a test probe to representations in memory (often viewed as an assessment of familiarity) and the recovery of detailed episodic or contextual information (such as source information). The latter has commonly been viewed as a recollective process in dual-process theories of recognition (see Yonelinas, 2002 for review). Timecourse investigations of memory retrieval have consistently found that familiarity information is available earlier than detailed episodic information (e.g., Hintzman & Curran, 1994; Hintzman, Caulton, & Levitin, 1998; McElree & Dosher, 1989; Öztekin & McElree, 2007; McElree et al., 1999). We suggest that the absence of WMC-related differences in retrieval speed for cases when the probe matches an item from the study list indicates that HS and LS participants are equally facile at recovering global assessments of the familiarity of the test probe. As discussed in the next section, this claim is supported by analyses of the false alarm rates for negative probes, which likewise indicate that HS and LS participants do not differ in judgments based on familiarity. Crucially, it is these fast assessments of familiarity that primarily determine performance in the early (preasymptotic) portions of the SAT functions, and hence control the dynamics (the intercept and rate estimates) of the functions (see McElree et al., 1999). To the degree that the recovery of familiarity is a largely automatic process, the absence of an effect of WMC on familiarity-based judgments is consistent with the controlled attention view (e.g., Engle, 2002), which asserts that WMC effects occur in circumstances that require controlled processing. It is also consistent with previous research (e.g., Oberauer, 2005) suggesting that WMC is related to the efficiency of recollection, but not familiarity.
Manipulating the recency of negative probes enabled us to isolate and track the timecourse of responses based on familiarity and those based on the recovery of detailed episodic information. Consistent with our interpretation of the data from positive probes, the comparison of false alarm rates to recent and distant negatives across the time-course of recognition indicated that HS and LS groups did not differ in the magnitude or timing of the early familiarity-based judgments. That is, when recent negatives were scaled against distant negatives, thereby isolating differences in false alarms between the two negative probe types, HS and LS participants did not differ in the point in time when the increased false alarm rate for recent negatives first appeared (δ1 in EQ. 2) nor in the maximal false alarm rate observed (λ1 in EQ. 2).
Notably, however, we found that the point in time when participants began to correct the high false alarm rate for recent negatives was substantially earlier for the HS group than for the LS group. The δ2 parameter in the dual-process model (EQ. 2) estimates the point in time when detailed episodic information is first used to correct responses based on familiarity. Our data indicate that this inflection point in the false alarm functions was on average 600 ms later for LS than for HS participants. Although neither group completely overcame the misleading effects of recent study, LS participants were eventually able to reduce their false alarm rate to the level of HS participants.
The delayed inflection point for LS participants indicates that those with low WMC are slower at using episodic information to correct for the misleading effects of familiarity. There are at least three potential reasons why this might be the case: (1) LS participants might retrieve episodic information at slower rates than HS participants; (2) LS participants might retrieve episodic information at rates comparable to HS participants, but take longer to initiate controlled retrieval operations; or (3) LS participants may initiate retrieval operations and recover episodic information at times comparable to HS participants, but take longer to resolve the conflict between that information and the high familiarity of a recent negative. Any of (1)–(3), in isolation or in combination, is fully compatible with our findings, but we believe that explanations (2) and (3) are the most plausible when viewed in a wider context.
Although it is possible that the intrinsic rate at which LS participants retrieve episodic information is substantially slower than the rate for HS participants, the magnitude of the difference between HS and LS (on average 600 ms) far exceeds the differences that are typically observed from various experimental manipulations of factors that affect retrieval (see footnote 3), or the differences that are observed between individuals in various memory tasks. Additionally, even if global assessments of familiarity and detailed episodic information are recovered with qualitatively different operations, both require access to a memory representation, and, without a detailed characterization of the operations used to recover the two forms of information, it is unclear why the latter but not the former should vary with WMC. We believe that explanations (2) and (3) are more consistent with the evidence from a range of tasks (reviewed in Engle, 2002) suggesting that WMC effects are observed when executive/control processes are essential for performing at high accuracy levels. Finally, the view that WMC may reflect the ability to maintain context bindings (Oberauer, 2005) is also consistent with our findings, as recollective information may often rest on content-context bindings, whereas familiarity is commonly assumed to reflect the strength of an individual item independent of its context.
Explanations (2) and (3) do not assume that WMC predicts the speed of episodic retrieval per se, but rather that it is related to the speed with which participants initiate episodic retrieval operations to counter the misleading effects of familiarity (2) or the speed with which they resolve the conflict between familiarity and episodic information (3). Our data do not discriminate between (2) and (3), and so we will refer to them jointly as controlled/strategic retrieval operations. Crucially, however, both accounts appear consistent with the controlled attention hypothesis (e.g., Engle, Kane, & Tuholski, 1999; Kane et al., 2001), which argues that low WMC reflects a limitation in allocating attention to specific task goals, especially in the face of interference. These accounts attribute the WMC effects to controlled/strategic operations that are similar to those that may be involved in other tasks where WMC effects have been observed, such as the antisaccade task (Kane, Bleckley, Conway, & Engle, 2001), the Stroop task (Kane & Engle, 2003), and the dichotic listening task (Conway, Cowan, & Bunting, 2001).
We also note that Unsworth and Engle (2007) have suggested that the Operation Span task (one of the two tasks used in this study to determine WMC) may reflect two components, namely primary and secondary memory. Within such a framework, the differential effect of WMC on the accrual of recollective information would most plausibly have its locus in the secondary memory component, as this component is hypothesized to reflect controlled memory search (Unsworth & Engle, 2007).
Finally, we note that, beyond providing a specific explanation for the greater susceptibility of LS participants to misleading familiarity information (viz., inefficient deployment of controlled retrieval processes to successfully resolve interference), the controlled/strategic retrieval hypothesis also provides an explanation of two other properties of our data. First, it provides an explanation of the observed reduction in accuracy for LS participants on positive trials (discussed above). For the positive test probes, the recognition judgments can be based on both familiarity and episodic information. While the contributions of familiarity and recollective information to studied probes cannot be isolated (as they both contribute to the tendency for a positive response), if LS participants rely less on episodic information (e.g., as a result of recollective information accruing more slowly), then naturally their asymptotic levels of performance on positive trials would be lower than those of HS participants, who can more effectively make use of episodic recollective information in their memory judgments. Second, this account can also explain the observed group by serial position interaction. Our data indicated that the difference in accuracy across the two groups was more prominent for early serial positions in the study list. Recently studied items have higher familiarity than less recently studied items, so greater reliance on familiarity information will have less of an adverse effect for these items. Additionally, episodic information might be more likely to be spontaneously recovered, without the need to engage in controlled/strategic retrieval operations, for recently studied items. Consistent with both claims, Öztekin and McElree (2007) found that the most recent three positions in 6-item lists comparable to those of the current study were immune to proactive interference effects on asymptotic accuracy.
The neural mechanisms that mediate interference resolution in the recent negative probe paradigm have been widely studied. Neuroimaging studies (e.g., Badre & Wagner, 2005; Jonides, Smith, Marshuetz & Koeppe, 1998; Jonides, Badre, Curtis, Thompson-Schill, & Smith, 2002; Jonides, Marshuetz, Smith, Reuter-Lorenz, & Koeppe, 2000; Nelson, Reuter-Lorenz, Sylvester, Jonides & Smith, 2003; Öztekin, Curtis & McElree, 2008; see Jonides & Nee, 2006 for review) have identified enhanced activation in left ventrolateral prefrontal cortex, namely the left inferior frontal gyrus (LIFG), for recent negative probes compared to unstudied probes in item recognition. It has also been shown that this effect is specific to the retrieval stage of the recent negative probe (D'Esposito, Postle, Jonides & Smith, 1999). Additionally, patient work (e.g., Thompson-Schill, Jonides, Marshuetz, Smith, D'Esposito, Kan, & Swick, 2002) and repetitive transcranial magnetic stimulation (rTMS) investigations (e.g., Feredoes, Tononi, & Postle, 2006) have provided converging evidence for a direct role of LIFG in successful interference resolution in this paradigm.
Behavioral work suggests that interference from recent negative probes can be resolved via controlled retrieval processes that recover specific episodic information, such as source or list-specific information (e.g., McElree & Dosher, 1989; Öztekin & McElree, 2007). In addition, LIFG has been implicated in strategic retrieval of episodic information in the absence of interference as well (e.g., Dobbins, Rice, Wagner, & Schacter, 2003), and patients with left prefrontal cortex lesions show deficits in other tasks that require the recovery of detailed episodic information, such as source memory (e.g., Duarte, Ranganath, & Knight, 2005). Furthermore, neural activation in this region was found to be modulated by the number of successive retrieval operations carried out to recover temporal order information in a short-term judgment of recency task (Öztekin, McElree, Staresina & Davachi, 2008). Accordingly, it has been suggested that the role of LIFG in resolving interference in the recent negative probe paradigm may be to support the controlled/strategic retrieval operations that access the relevant episodic information needed to resolve interference (e.g., Öztekin, Curtis & McElree, 2008).
Our data indicate that LS and HS participants differed in the ability to deploy and use the controlled/episodic retrieval operations necessary to resolve interference, but there was no evidence suggesting a measurable effect of WMC on the timing and magnitude of fast familiarity-based judgments. Hence, the current results converge with studies that have implicated executive/controlled attention in WMC effects and in the greater susceptibility of LS individuals to interference (see Engle, 2002 for overview), and provide further support for the contention that these effects may be mediated by the left ventrolateral prefrontal cortex (see also Kane & Engle, 2002; Rosen & Engle, 1997 for similar arguments regarding the role of the prefrontal cortex in modulating WMC effects). Future work investigating the role of this region in mediating WMC effects would be beneficial in advancing our understanding of the underlying mechanisms that lead to individual differences in WMC. In addition, the time-course pattern observed in our study can be followed up with methods that have better temporal resolution than fMRI, such as EEG and MEG.
This research was supported by grants from the National Science Foundation (BCS-0236732) and the National Institutes of Health (HD056200) to B. McElree, and by a Graduate Scholarship from the American Psychological Foundation/Council of Graduate Departments of Psychology to I. Öztekin.
Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/xlm.
1. Participants completed a 30-minute practice session before the first experimental session to train for the SAT procedure. In addition, trials on which participants took longer than 500 ms to respond, as well as trials on which participants responded before the tone, were excluded from analysis.
2. Note that the illustrated SAT function is derived from fitting the average HS and LS data, rather than averaging over individual fits across participants.
3. An exception might be the 500 ms differences in SAT intercept observed in McElree and Dosher’s (1993) investigation of judgments of recency. However, those differences arose from a manipulation that varied the number of serial retrieval operations required to reach a judgment, from one to five hypothesized operations. As such, they are more properly viewed as arising from repetitive operations, rather than intrinsic differences in one type of operation.