Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback was altogether removed. Overall, these data demonstrate that incremental reinforcement learning mechanisms influence the degree of caution participants exercise when evaluating explicit memories.
Recognition criteria are hypothetical standards by which memory evidence is categorized as either sufficient or inadequate to warrant a judgment of prior encounter (viz. “old”) (Macmillan & Creelman, 1991) (see Figure 1). Although most memory researchers assume criteria are adaptive, there are few models of learning that might support such adaptability (however see Estes & Maddox 1995; Unkelbach, 2006) and to date, the vast majority of successful manipulations of memory decision criteria have involved explicit instructions given to observers about the relative preponderance of old and new items (Hirshman, 1998; Rotello et al., 2005; Strack & Foerster, 1995), or explicit warnings to avoid either errors of omission or commission (Azimian-Faridani & Wilding, 2006). These instructed criterion shifts are sometimes augmented with clear descriptions of monetary losses and gains attached to different response outcomes (payoff matrices) (Van Zandt, 2000) but in all of these cases observers consciously attempt to comply with instructions given their understanding of test list regularities or characteristics. What remains unclear is whether the decision criterion can adapt without an explicit or controlled strategy.
One candidate mechanism that might enable adaptive positioning of the criterion is incremental reinforcement learning, which is central to learning category distinctions in non-recognition domains (e.g., Gluck & Bower, 1988; Poldrack et al., 2005). Such learning requires integrating trial-by-trial feedback outcomes and gradually re-mapping different decisions onto different stimulus features or feature combinations as a function of probabilistic reward likelihood (for a review see Ashby & Maddox, 2005). Two category learning paradigms having this characteristic are information integration and probabilistic classification tasks. During both, the relationship between key stimulus features and appropriate decisions cannot be reduced to a simple explicit, verbalizable strategy because observers must classify the items based on complex combinations of multiple feature dimensions (e.g., a nonlinear combination of thickness and orientation of sinusoidal gratings), or because feedback is rendered probabilistically such that making the same judgment for a given repeated stimulus does not guarantee receiving the same feedback outcome on every trial (see also Ashby & O’Brien, 2007). Neuropsychological findings suggest that learning during these tasks heavily relies upon the integrity of the striatum, a basal ganglia structure linked to implicit procedural and habit learning (Knowlton et al., 1996; Saint-Cyr et al., 1988).
Although feedback-based changes in criteria have been frequently examined in perceptual judgment tasks (e.g., Dorfman & Biderman, 1971; Thomas, 1973), there are fundamental differences between perceptual classification tasks and the regulation of episodic recognition judgments. More specifically, in feedback-based category learning tasks it is assumed that the mapping between object features and category decisions is incrementally altered via trial-by-trial feedback learning. However, during episodic recognition tests the perceptual and semantic features of the probes are not diagnostic of the required categorical distinction, since the memory status of the probes, and the types of features each possesses, are orthogonal. Instead, incremental reinforcement learning, if successful, must alter the mapping between levels of retrieved memory evidence and recognition decisions, and this represents a level of abstraction not found during perceptual classification learning. Additionally, observers cannot learn a reinforced response to each individual test item, because the items within a memory test are never repeated.
Perhaps consistent with the notable differences between episodic recognition and perceptual classification demands, evidence for the efficacy of feedback regulation of recognition criteria has been decidedly mixed. For example, studies by Estes and Maddox (1995), and Healy and Kubovy (1977) both failed to demonstrate criterion shifts during recognition paradigms that manipulated the base rates of studied items in conjunction with trial-based feedback procedures, although similar procedures easily produced shifts in their perceptual classification tasks. In contrast, Rhodes and Jacoby (2007) manipulated the relative item probabilities across two screen locations and found that in conjunction with trial-based feedback, subjects demonstrated different recognition criteria for the two locations. However, the criterion shifts were only prominent for observers explicitly aware of the location manipulation of target density, and the removal of the feedback greatly reduced the criterion difference. Critically, neither explicit awareness nor continued presence of the feedback reinforcement should be necessary if incremental reinforcement learning governed the effects. Using a different procedure, Verde and Rotello (2007) demonstrated an acquired criterion shift for separate halves of a test list containing well versus poorly encoded study items intermixed with lures. This design also used trial-based feedback, in addition to halting the test and providing performance summaries during testing. Although these researchers did not examine observer awareness of the test list characteristics, it is possible that the feedback and particularly the performance summaries may have explicitly alerted subjects to the fact that the well- and poorly-encoded study items were not distributed evenly across test halves. 
In total, although these designs importantly demonstrated criterion flexibility during the course of testing, they point towards a mechanism based on explicit awareness of test list regularities. Additionally, they confound a manipulation of the test list characteristics with the presence of trial-based feedback, which necessarily precludes assigning an exclusive role to the processing of feedback in the observed criterion shifts.
Despite the limited support for an incremental reinforcement learning mechanism governing episodic recognition, recognition decisions arguably share important similarities to feedback-based classification learning tasks. First, episodic information is often assumed to be multidimensional (Johnson et al., 1993; Yonelinas, 1994) such that the category “old” may depend upon complex combinations of different trace attributes in ways difficult to capture in a simple explicit response strategy. Second, under most measurement models of recognition, any decision criterion for responding will only yield positive feedback probabilistically because the evidence evoked by old and new items overlaps and cannot be fully separated by a simple criterion boundary (Macmillan & Creelman, 1991) (Figure 1). These two characteristics suggest that recognition judgments might be influenced by the same learning mechanisms shown to govern classification learning in information-integration and/or probabilistic classification tasks, provided such mechanisms are sensitive to abstract mnemonic evidence representations.
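The second point can be made concrete with a short worked expression. This is only a sketch under an assumed equal-variance Gaussian detection model, which the text does not commit to: lure evidence is taken as N(0, 1), target evidence as N(d', 1), k is the criterion location on the evidence axis, and \pi is the proportion of old items on the test (all four symbols are introduced here for illustration).

```latex
% Probability that veridical feedback is positive, given each response,
% under the assumed equal-variance Gaussian model.
\begin{align}
P(\text{correct} \mid \text{``old''}) &=
  \frac{\pi\,[1-\Phi(k-d')]}{\pi\,[1-\Phi(k-d')] + (1-\pi)\,[1-\Phi(k)]} \\
P(\text{correct} \mid \text{``new''}) &=
  \frac{(1-\pi)\,\Phi(k)}{(1-\pi)\,\Phi(k) + \pi\,\Phi(k-d')}
\end{align}
```

For any finite d' and any k, both probabilities lie strictly between 0 and 1, so under these assumptions no placement of the criterion guarantees positive feedback for either response. In this notation the criterion measure c corresponds to k - d'/2.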
One study suggesting such a mechanism underlying learned criterion shifts was Han and Dobbins (2008), which used systematically misleading feedback in order to shift the relative criteria of two recognition groups. During the procedure, one group was given false positive feedback for errors of commission (false alarms) whilst the other was given false positive feedback for errors of omission (misses). All other feedback was correct. This design isolates any criterion shift solely to the nature of the feedback since the actual structure of the test lists remained equivalent across the groups. The manipulation shifted the relative criteria of the groups and this difference remained even when feedback became fully correct in the second test block of the design, suggesting a durable form of learning (cf. Rhodes & Jacoby 2007). Additionally, the majority of subjects did not report any perceived anomalies in the feedback itself post-test.
Although suggestive of incremental reinforcement learning, there were potential drawbacks to Han and Dobbins (2008). First, the feedback was fully deterministic in that every error of a particular kind received the false positive feedback. For example, in the condition designed to instill a lax criterion, all false alarms were incorrectly cued as correct responses. This meant that no “old” response ever received a negative feedback outcome for this group. Such deterministic feedback procedures are known to shift learning towards explicit rule use and away from incremental reinforcement learning, with probabilistic versus deterministic feedback conditions potentially engaging different neural learning systems (e.g., Frank & Kong, 2008; Mehta & Williams, 2002; Robinson et al., 1980). Second, the design relied exclusively upon false positive feedback in order to shift the criteria. This approach was chosen because it was assumed that subjects would be uncertain during the commission of errors and hence the manipulation of the feedback validity would be difficult to detect. Nonetheless, this also potentially weds the manipulation exclusively to surprising event outcomes. Although the reinforcement literature suggests this may be particularly useful for learning, as it should evoke considerable “positive prediction error” (Schultz, 2000), it may also increase the likelihood of explicit awareness of the manipulation. Finally, the design of Han and Dobbins (2008) failed to demonstrate that the criterion shifts survived complete removal of the feedback. Since a hallmark of successful incremental reinforcement learning is the perseverance of decision tendencies in the absence of any form of external reinforcement (e.g., Cincotta & Seger, 2007), it is critical to demonstrate that the acquired memory criterion shifts remain for some notable period absent feedback. 
Thus the goal of the current study was to examine whether feedback-based memory criterion shifts demonstrate three key properties consistent with incremental reinforcement learning processes, namely: a) sensitivity to probabilistic feedback contingencies, b) independence from surprising false positive outcomes, and c) persistence in the complete absence of supporting feedback.
Sixty-four Duke undergraduates (30 in Experiment 1; 34 in Experiment 2) participated in return for partial course credit. Informed consent was obtained as required by the human subjects review committee of Duke University. Experiment 1 administered a post-experiment questionnaire asking about the feedback procedures to assess participant awareness of the manipulation. One participant who correctly believed the feedback to be inaccurate or skewed was removed from Experiment 1.
In Experiments 1 and 2, four lists of 200 words (100 studied items and 100 lures for each cycle; average 7.09 letters, 2.34 syllables, and a Kucera-Francis corpus frequency of 8.85) were constructed for use in sequential study/test cycles. List and condition assignment was randomized for each participant. During study, participants rated words on the computer screen for the number of syllables (“Counting syllables 1/2/3/more than 4”) within a limited amount of time (2 sec). Each study phase was immediately followed by a forewarned memory test. Participants were not forewarned that feedback would be present during testing. In each test, studied and lure items were randomly intermixed and presented serially for self-paced OLD/NEW recognition judgments. Following the old/new response, the participant rated confidence on a scale of 1–3 (“Confidence? Unsure =1 2 3= Certain”). Feedback, when given, immediately followed the confidence report. The key and only manipulation across experiments was the nature of the feedback.
In Experiment 1, the validity of the feedback given to errors was probabilistically altered in order to tacitly encourage lax or strict responding. More specifically, a random portion of a particular type of error (misses or false-alarms) was incorrectly reported as “correct”. Participants were correctly informed during correct responses (hits and correct rejections). Consistent with incremental reinforcement learning principles, the general expectation was that participants would learn to favor the decision more often linked to a positive feedback outcome (“correct” feedback indications) and/or would learn to avoid the response option that more often led to negative outcomes (“incorrect” feedback indications). The false-feedback manipulation was restricted to errors since they are typically of low confidence and hence incorrect feedback should not raise suspicions on the part of the participants. In Experiment 2 the balance of positive/negative feedback was instead shifted by simply omitting correct, negative feedback for one or the other class of error (availability manipulation). The analyses employed the detection theoretic estimate of accuracy, Az (Rotello et al., 2008), and criterion, c.
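As a point of reference for the criterion measure, the following is a minimal sketch of how c is conventionally computed from a participant's response counts, assuming an equal-variance Gaussian model. The function name, the log-linear correction for extreme rates, and the inclusion of d' are illustrative assumptions; the accuracy measure reported here is Az, which is estimated from confidence ratings and is not reproduced in this sketch.

```python
from statistics import NormalDist


def criterion_c(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) for one test under an equal-variance Gaussian model.

    Positive c indicates conservative (strict) responding; negative c
    indicates liberal (lax) responding.
    """
    # Log-linear correction: an assumption made here so that rates of
    # exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c


# Hypothetical counts for a 100-target, 100-lure test.
d_prime, c = criterion_c(hits=60, misses=40, false_alarms=10, correct_rejections=90)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

With symmetric hit and false alarm rates the function returns a c of zero, which corresponds to an unbiased criterion.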
The goal of Experiment 1 was to determine if a probabilistic variant of the false-feedback procedure would induce criterion shifts. Half of the participants were given false positive feedback “That is CORRECT” for approximately 70 percent of their incorrect “Old” classifications of new items (false alarms). All other responses received correct feedback. We refer to this as the Lax condition (L). For the other half of participants, approximately 70 percent of incorrect “New” classifications of old items (miss) received false positive feedback (S - Strict condition). Each group received the same manipulation (L or S) on the first two successive study/test cycles. Following this, two additional study/test cycles were given with no feedback whatsoever during testing (N - no feedback). This allowed us to assess durability of criterion learning in the absence of any external reinforcement. Thus there were two groups, one receiving LLNN feedback conditions and the other SSNN.
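The feedback rule just described can be sketched in a few lines. This is an illustration rather than the experiment's actual software: the function name, the condition codes passed as strings, the "INCORRECT" label, and the use of a seeded random generator are assumptions, and the 0.7 probability stands in for the "approximately 70 percent" stated above.

```python
import random


def experiment1_feedback(condition, item_is_old, response_is_old, rng, p_false=0.7):
    """Return the feedback shown on one trial, or None when no feedback is given.

    condition: "L" (lax), "S" (strict), or "N" (no feedback).
    """
    if condition == "N":
        return None  # Tests 3 and 4: no feedback of any kind
    if item_is_old == response_is_old:
        return "CORRECT"  # hits and correct rejections always get accurate feedback

    is_false_alarm = response_is_old and not item_is_old
    is_miss = (not response_is_old) and item_is_old
    targeted = (condition == "L" and is_false_alarm) or (condition == "S" and is_miss)

    # A random portion of the targeted error type is falsely reported as correct.
    if targeted and rng.random() < p_false:
        return "CORRECT"
    return "INCORRECT"


# Share of false alarms receiving false positive feedback in the Lax condition.
rng = random.Random(1)
outcomes = [experiment1_feedback("L", False, True, rng) for _ in range(1000)]
print(outcomes.count("CORRECT") / len(outcomes))
```

Because the false feedback is drawn independently on each targeted error, both response options still produce some negative outcomes in every condition, which is the property that distinguishes this design from a deterministic one.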
A two-way ANOVA for Az with factors of Group (LLNN or SSNN) and Test (First, Second, Third or Fourth) yielded no main effect of Group (p > .84) or Test (p > .09), and no evidence for an interaction between Group and Test (p > .32) suggesting that the groups displayed similar accuracy during each test (Table 1).
ANOVA for decision criteria c with factors of Group and Test revealed a main effect of Group (F(1,28) = 11.85, p < .01, η2p = .30) with the SSNN group demonstrating a more conservative criterion (mean c = .24) than the LLNN group (−.03). There was no main effect of Test (p > .19) and no evidence for an interaction between Group and Test (p > .65) suggesting a persistent difference in criterion across the two groups regardless of test. Pair-wise comparisons of the groups’ criteria during each of the four separate tests were all significant (t(28) = 2.05, 3.37, 2.57, & 2.48 respectively), although the smallest numerical difference in criteria across the groups was during Test 1 (Table 1).
The probabilistic nature of the current feedback manipulation would have precluded the belief that a given type of response never resulted in errors, yet a relative shift was nonetheless induced. Furthermore, the no-feedback condition ruled out interpretations that necessarily rely on the continued presence of feedback. For example, if the criterion shift represented a trial-to-trial win-stay strategy (Frank & Kong, 2008) on the part of the participants, removing feedback should have eliminated the relative criterion differences. Finally, we parsed Tests 1 and 2 into cumulative sub-blocks of 40 trials (40, 80, 120, 160, and 200 trials) to examine the emergence of the relative shift of criterion c in a finer-grained manner within each test. Test 1 yielded a significant interaction (F(4,92) = 2.86, p < .05, η2p = .11) between Group and Cumulative Sub-block, reflecting an increasingly larger criterion group difference as the total amount of false feedback accumulated within the test. The same analysis during Test 2 yielded a main effect of Group (F(1,23) = 7.52, p < .05, η2p = .25) and no evidence for the interaction between Group and Sub-block (p = .82), suggesting that the relative difference acquired during Test 1 had already reached asymptote and was carried largely intact into Test 2. Partially supporting this conclusion, when the criterion measures for each group were compared across the tests, the SSNN group showed no difference between Tests 1 and 2 (p > .96), although the LLNN group did show a more liberal criterion in Test 2 versus Test 1 (t(14) = 2.48, p < .05). Overall, these findings suggest that some continued learning may take place across Tests 1 and 2, but that the vast majority of criterion learning has occurred prior to the conclusion of Test 1, as indicated by a failure to find a Group by Cumulative Sub-block interaction in the second test. These findings are consistent with an incrementally learned recognition decision tendency (see Footnote 1).
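The cumulative sub-block breakdown can be illustrated with a short sketch that recomputes c over the first 40, 80, 120, 160, and 200 trials of a single test. The trial representation, the function name, and the log-linear correction are assumptions made for the example, not a description of the analysis code actually used.

```python
from statistics import NormalDist


def cumulative_criterion(trials, block_size=40):
    """Compute criterion c over cumulative sub-blocks of one test.

    trials: list of (item_is_old, response_is_old) tuples in presentation order.
    Returns a list of (trials_included, c) pairs.
    """
    z = NormalDist().inv_cdf
    results = []
    for end in range(block_size, len(trials) + 1, block_size):
        seen = trials[:end]  # every trial from the start of the test up to this cut
        n_old = sum(1 for old, _ in seen if old)
        n_new = end - n_old
        hits = sum(1 for old, resp in seen if old and resp)
        false_alarms = sum(1 for old, resp in seen if not old and resp)
        # Log-linear correction (assumed) guards against rates of 0 or 1,
        # which are likely in the earliest, smallest sub-blocks.
        hit_rate = (hits + 0.5) / (n_old + 1)
        fa_rate = (false_alarms + 0.5) / (n_new + 1)
        results.append((end, -0.5 * (z(hit_rate) + z(fa_rate))))
    return results


# Hypothetical participant who responds "old" on every trial of a 200-item test.
demo_trials = [(i % 2 == 0, True) for i in range(200)]
for n_trials, c in cumulative_criterion(demo_trials):
    print(n_trials, round(c, 2))
```

Because each sub-block includes all earlier trials, successive estimates are not independent; they trace how the criterion estimate evolves as manipulated feedback accumulates.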
Based on the prior findings it could be argued that it was the unexpectedly positive outcomes of the manipulated feedback trials that are particularly important for the learning (e.g., Butterfield & Metcalfe, 2001; Schmidt et al., 1989). While this would not preclude a core role for incremental reinforcement learning, it should nonetheless be possible to demonstrate adaptive criteria whenever the balance of positive to negative reinforcement systematically favors one decision. Experiment 2 differentially reinforced the judgments by withholding feedback for certain types of errors. Thus from the subject’s perspective some small portion of trials simply failed to elicit feedback. These neutral, uninformative feedback trials should not reflect unexpectedly positive (or negative) outcomes, but they nonetheless would serve to shift the balance of reinforcement for the two decision types. For half the participants, the first two tests selectively encouraged lax responding by eliminating the negative feedback for their false alarms. All other response types were correctly identified by the feedback (Lax condition). For the other half of participants, their miss responses received no feedback (Strict condition). Thus for each group, one response type was associated with positive and negative outcomes whereas the other was associated with positive and neutral (no feedback) outcomes. Again, all feedback was eliminated during tests 3 and 4 (LLNN or SSNN).
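The availability manipulation can be sketched in the same way. This is a minimal illustration that assumes every error of the targeted type goes without feedback, as the description above implies; the function name, the string labels, and the use of None to represent a trial with no feedback are assumptions.

```python
def experiment2_feedback(condition, item_is_old, response_is_old):
    """Return the feedback shown on one trial, or None when feedback is withheld.

    condition: "L" (lax), "S" (strict), or "N" (no feedback).
    """
    if condition == "N":
        return None  # Tests 3 and 4: all feedback eliminated
    if item_is_old == response_is_old:
        return "CORRECT"  # hits and correct rejections are always identified

    is_false_alarm = response_is_old and not item_is_old
    if condition == "L" and is_false_alarm:
        return None  # Lax: negative feedback for false alarms is withheld
    if condition == "S" and not is_false_alarm:
        return None  # Strict: negative feedback for misses is withheld
    return "INCORRECT"


# Outcomes that can follow each response in the Lax condition.
old_outcomes = {experiment2_feedback("L", item, True) for item in (True, False)}
new_outcomes = {experiment2_feedback("L", item, False) for item in (True, False)}
print(old_outcomes, new_outcomes)
```

In the Lax condition an "old" response can be followed only by positive or absent feedback, whereas a "new" response can be followed by positive or negative feedback, which is the imbalance the design relies on.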
ANOVA for Az with factors of Group and Test yielded a significant main effect only of Test (F(3,96)=6.29, p<.001, η2p = .16), with accuracy gradually declining across the entire experiment. Importantly, there was no interaction between Group and Test (p > .30) (Table 2).
ANOVA for c with factors of Group and Test revealed a main effect of Group (F(1,32) = 11.20, p < .01, η2p = .25) (.34 vs. −.05 for SSNN vs. LLNN group, respectively). There was no main effect of Test (p > .11) or interaction between Group and Test (p > .07). Pair-wise comparisons of the groups’ criteria at each of the four separate tests were all significant (t(32) = 2.48, 3.61, 3.50, & 2.43 respectively), although again, the smallest numerical difference in criteria across the groups was during the first test (Table 2). To our knowledge, this is the first demonstration that the selective availability of feedback can be used to guide memory decision criterion placement, or criterion placement in general. Again, a finer grained analysis by cumulative test sub-blocks revealed a clear interaction within Test 1 between Group and Sub-Block (F(4,112) = 10.98, p < .01, η2p = .28) suggesting a gradual acquisition of the learned criterion as withheld feedback accrued. The same analysis during Test 2 merely approached significance (F(4,116) = 2.40, p = .053, η2p = .08) suggesting a small increase in criterion differences as further withheld feedback accrued. When Tests 1 and 2 criterion measures were directly compared for each Group, the differences were not significant across the tests for either the SSNN Group (p > .26) or the LLNN Group (p > .09). Similar to Experiment 1, this overall pattern suggests that the bulk of criterion learning occurred during the initial test, although some small degree of additional learning or relearning may have occurred during the second test.
A fundamental role for incremental reinforcement learning in episodic memory judgments has not been suggested in humans (cf. Wixted and Gaitan (2002) in non-human animals). Although it is difficult if not impossible to completely rule out a role for explicitly maintained strategies in criterion shift experiments (Unkelbach, 2006), and awareness questionnaires potentially tap only a portion of subject awareness (e.g., Merikle & Reingold, 1991), it is noteworthy that none of the participants included here reported awareness of the biased nature of the feedback manipulations. Furthermore, the current findings are quite similar to other classification learning phenomena that do not require explicit awareness of reward contingencies for learning. In total, these considerations support the notion that the current effects do not require participants to formulate explicit, rule-based strategies in reaction to the biased feedback manipulations, and they clearly demonstrate that no alteration in the test materials themselves is necessary in order to induce a criterion shift (cf. Rhodes & Jacoby 2007; Verde & Rotello 2007). Instead, an incremental reinforcement learning framework suggests that the current manipulations led to shifted decision preferences based on the relation of positive/negative outcomes and levels of recognition evidence.
The current data add to early evidence suggesting different routes to regulating episodic recognition decisions. The first, which has been extensively documented in the recognition literature, is an explicit strategy on the part of the observers typically formed following overt warnings or instructions. Furthermore, in those cases where feedback accompanied a detectable criterion shift for altered lists (e.g., Rhodes & Jacoby 2007), the feedback likely alerted the subjects to the list manipulation, and thus likely represents a similar strategy to those adopted by subjects following explicit instructions or warnings about the riskiness of certain responses. In contrast, the current findings suggest that subjects also appear to develop, through reinforcement learning, incrementally acquired tendencies that durably change the mapping of memory evidence types or levels onto decisions. Similar to the acquisition of habits in other domains, these learned criterion shifts may not require subjects to maintain the intention of responding liberally or conservatively across the multiple trials of the test. Because current models of episodic recognition judgment typically do not assume two independent or partially independent decision influences, future work directly contrasting and attempting to doubly dissociate these putative mechanisms using various methods (e.g., behavioral, functional neuroimaging, special populations) holds promise for further elucidating the mechanisms that regulate the translation of memory content into judgments.
We declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere.
Footnote 1. Although we do not present the data due to space considerations, individual variability in the number of false feedback trials modulated the size of induced criterion shifts. For example, a median split of observers receiving high versus low amounts of manipulated feedback demonstrated that the criterion shift was more prominent for subjects receiving high amounts of manipulated feedback during Experiments 1 and 2. It was not significant when the low subgroups were compared at each test level. Of course this outcome is expected if the feedback manipulation is the cause of the shift, and variability in the composition of the feedback across subjects is inherent in all designs that use individual performance feedback to modulate behavior (e.g., Rhodes & Jacoby, 2007).