Categorical learning is dependent on feedback. Here, we compare how positive and negative feedback affect information-integration (II) category learning. Ashby and O’Brien (2007) demonstrated that both positive and negative feedback are required to solve II category problems when feedback is not guaranteed on each trial, and reported no differences between positive-only and negative-only feedback in terms of their effectiveness. We followed up on these findings and conducted three experiments in which participants completed 2,400 II categorization trials across three days under one of three conditions: positive feedback only (PFB), negative feedback only (NFB), or both types of feedback (CP; Control Partial). An adaptive algorithm controlled the amount of feedback given to each group so that feedback was nearly equated. Using different feedback control procedures, Experiments 1 and 2 demonstrated that participants in the NFB and CP groups were able to engage II learning strategies, whereas the PFB group was not. Additionally, the NFB group was able to achieve significantly higher accuracy than the PFB group by day 3. Experiment 3 revealed that these differences remained even when we equated the information received on feedback trials. Thus, negative feedback appears significantly more effective for learning II category structures. This suggests that the human implicit learning system may be capable of learning in the absence of positive feedback.
Feedback plays a critical role in many forms of learning, so it is not surprising that feedback optimization has been the subject of intense investigation (Abe et al., 2011; Ashby & O’Brien, 2007; Brackbill & O’Hara, 1957; Dunn, Newell, & Kalish, 2012; Edmunds et al., 2015; Galea, Mallia, Rothwell, & Diedrichsen, 2015; Meyer & Offenbach, 1962; Maddox, Love, Glass, & Filoteo, 2008; Wächter et al., 2009). For example, delaying feedback (Dunn, Newell, & Kalish, 2012; Maddox, Ashby, & Bohil, 2003) and adding reward to feedback (Abe et al., 2011; Freedberg, Schacherer, & Hazeltine, 2016; Nikooyan & Ahmed, 2015) impact learning. Of particular interest is the comparative effectiveness of positive and negative feedback for learning (Abe et al., 2011; Brackbill & O’Hara, 1957; Frank, Seeberger, & O’Reilly, 2004; Galea, Mallia, Rothwell, & Diedrichsen, 2015; Meyer & Offenbach, 1962; Wächter et al., 2009). Here, we define positive feedback as a signal that a task has been performed correctly and negative feedback as a signal that a task has been performed incorrectly.
In terms of category learning, several studies (Brackbill & O’Hara, 1957; Meyer & Offenbach, 1962) indicate a stronger influence of negative feedback (such as punishments) over positive feedback (such as rewards) when solving Rule-Based (RB) category problems, which can be solved by applying a verbal strategy (Ashby & O’Brien, 2005). However, it is less clear how effective positive and negative feedback are when solving information-integration (II) category problems (Ashby & O’Brien, 2007). II category learning involves the pre-decisional (non-verbalizable) synthesis of two or more pieces of information (Ashby & O’Brien, 2005). Consider the RB and II category structure examples in Figure 1. The “discs” differ in terms of their bar frequency (x-axis) and bar orientation (y-axis). The left panel is an example of an RB category structure. The optimal linear bound (the black line) represents the best possible method for dividing the stimuli into categories and involves paying attention to the bar frequency and ignoring the bar orientation. The right panel is an example of an II category structure. It uses the same stimuli, but the optimal linear bound is now a diagonal. Here, both dimensions must be used on each trial in order to make an accurate category judgment. This category structure is difficult for participants to describe even when they perform with high accuracy.¹
Previously, Ashby and O’Brien (2007) examined the effectiveness of positive and negative feedback for II learning under four different feedback conditions: 1) partial positive feedback only (PFB), 2) partial negative feedback only (NFB), 3) partial positive and negative feedback (CP; control partial), and 4) full negative and positive feedback (CF; control full). To control the rate of feedback, the researchers employed an adaptive algorithm that adjusted feedback based on each participant’s error rate. Therefore, the PFB, NFB, and CP groups received roughly equivalent feedback frequencies during the course of the experiment. Moreover, the researchers told participants that on trials where they did not receive feedback, they should not assume that they were right or wrong. Thus, participants were instructed to use only the feedback trials to guide their decisions. Note that when there are just two categories, as in Ashby and O’Brien (2007) (and the present study), positive and negative feedback provide the same amount of information in the context of a single trial; both indicate what the correct answer should have been. However, it is possible that this information is more or less useful on correct than incorrect trials, or that positive and negative feedback engage different learning systems that are differentially suited for encoding II categories. The primary finding from Ashby and O’Brien’s (2007) study was that II learning was only observed in the groups that received both types of feedback. Overall, the researchers did not observe a significant difference between the PFB and the NFB groups, nor did they observe significant II learning in either the PFB or NFB groups.
Here, we return to this issue and evaluate two possibilities regarding the utility of positive and negative feedback in shaping behavior, as previously discussed by Kubanek, Snyder, and Abrams (2015). The first possibility is that positive and negative feedback are equal reinforcers in terms of magnitude, but differ in the sign of their effect on behavioral frequency (Thorndike, 1911). This hypothesis predicts that we should find an equal benefit for positive and negative feedback in supporting II learning. A second possibility is that positive and negative reinforcement represent distinct influences on behavior (Yechiam & Hochman, 2013). In contrast to the first, this hypothesis predicts an asymmetrical influence of positive and negative feedback (as in Abe et al., 2011; Wächter et al., 2009; and Galea et al., 2015); one type of feedback may be more useful than the other. Note that when feedback is guaranteed during categorical learning, one may expect a mutual benefit of positive and negative feedback. However, when feedback is ambiguous on some trials (when information is limited), it is possible that one type of feedback may be more useful than the other, or that they may be mutually beneficial, as in the case of Ashby and O’Brien’s (2007) experiment.
Brackbill and O’Hara (1957) and Meyer and Offenbach (1962) demonstrated a clear advantage for negative feedback over positive feedback in solving RB category problems. Thus, as a starting point, we expect that participants who receive only negative feedback will demonstrate significantly stronger II learning than participants only receiving positive feedback, consistent with the asymmetry hypothesis.
However, this hypothesis runs counter to the conclusions of Ashby and O’Brien (2007). It is important to note that Ashby and O’Brien used an II category structure where each category was defined as a bivariate normal distribution and the categories partially overlapped (see Figure 2, left panel). This category structure has three consequences. The first is that the optimal accuracy that could be achieved by obeying the optimal linear bound was 86%. Second, the category structure included items that were distant from the category boundary (to illustrate this we have imposed a line orthogonal to the optimal linear bound in each panel of Figure 2). The consequence of including these items is that these trials can be solved using RB strategies because they are sufficiently far from the optimal bound. Thus, these trials may act as “lures” to initiate an RB strategy. Third, the bivariate overlapping distribution of the categories may have hindered the ability of the PFB and NFB groups to achieve II learning. The optimal accuracy that could be achieved by the best II strategy was 86%, but the best RB strategy was satisfactory enough to yield an accuracy of 77.8%. This difference (8.2%) may not have been sufficiently compelling to promote the abandonment of the default RB strategy. Thus, it is possible that the pattern of results found by Ashby and O’Brien (2007) was shaped by the choice of category structure.
To resolve these issues, we employed a modified version of Ashby and O’Brien’s (2007) II category learning paradigm. First, we modified the category structure so that the two categories were non-overlapping. In this way, the optimal II strategy would produce the correct response on 100% of trials. Thus, the difference between the optimal RB strategy (81.6%) and the II strategy (100%) is relatively large, which maximizes the incentive to abandon the RB strategy. Second, we eliminated trials that were further from the optimal linear bound. The left panel of Figure 2 shows the category structure used by Ashby and O’Brien (2007). Although most trials are concentrated around the optimal linear bound (denoted by the solid diagonal line), many items are distant from the bound and therefore relatively easy. Rule-based strategies generally provide the correct answer for these stimuli and fail for the more difficult ones. Therefore, we opted to exclude the easier items (see Figure 2, middle and right panels) to promote II learning while discouraging RB learning.
Experiment 1 tested the hypothesis that negative feedback would benefit II category learning more than positive feedback when the category structure minimized the effectiveness of rule-based strategies. As a starting point, we used the same adaptive algorithm used by Ashby and O’Brien (2007), with the exception that the category structure was non-overlapping and excluded stimuli that lay more than 0.3 diagonal units from the optimal linear bound (see Figure 2, middle panel).
Fifteen participants were recruited from the University of Texas at Austin community in accordance with the university’s institutional review board. Participants were randomly assigned to one of three conditions: PFB, NFB, or CP. All participants had normal or corrected-to-normal vision and were paid $7 per session.
On each trial, participants were shown a line that varied along two dimensions: length and orientation. Stimulus sets were pre-generated by drawing 10 sets of 80 random values of arbitrary units from two distributions (Table 1). Each set represented one block of trials, and the presentation order of the blocks was randomized between participants. To generate a line stimulus, the orientation value was converted to radians by applying a scaling factor of π/500 (see Ashby, Maddox, & Bohil, 2002). The length value represented the length of the generated line in screen pixels. Unlike Ashby and O’Brien (2007), the large positive covariance between the two dimensions ensured that stimuli represented values close to the optimal linear decision bound of y = x. The rationale for this strategy is that trials that exist on the extreme ends of each category are easier to categorize because one dimension becomes increasingly more important than the other. For instance, if a trial stimulus has an orientation of 90 degrees then there is a significantly greater chance that it belongs to category A than a stimulus that has an orientation of 45 degrees. Likewise, a stimulus with a length of 350 has a significantly greater chance of belonging to category B than a stimulus with a length of 175. Therefore, we excluded these trials.
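The stimulus mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the distance is measured in the same arbitrary units as the two dimensions, and the exclusion threshold is left as a parameter because the scale of a "diagonal unit" is not defined in this section.

```python
import math

# Scaling factor stated in the text: arbitrary units -> radians.
ORIENTATION_SCALE = math.pi / 500.0


def orientation_to_radians(value):
    """Convert an arbitrary-unit orientation value to radians."""
    return value * ORIENTATION_SCALE


def distance_from_bound(length, orientation):
    """Perpendicular distance of a stimulus from the bound y = x.

    Both inputs are in arbitrary units (an assumption of this sketch).
    """
    return abs(orientation - length) / math.sqrt(2.0)


def keep_stimulus(length, orientation, max_distance):
    """Return True if the stimulus lies within max_distance of the bound.

    max_distance is a hypothetical parameter; the text only says that
    items far from the bound were excluded.
    """
    return distance_from_bound(length, orientation) <= max_distance
```

With this scaling, an arbitrary-unit value of 250 maps to π/2 radians (90 degrees), and a stimulus exactly on the bound has a distance of zero.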
Participants in the positive feedback (PFB) condition only received positive feedback. Participants in the negative feedback (NFB) condition only received negative feedback. Finally, participants in the control condition (CP) received both types of feedback (positive and negative) on ~26% of trials. For all groups, the overall proportion of feedback was approximately 27%. The PFB condition never received feedback after an incorrect response and the NFB condition never received feedback after a correct response.
To control the proportion of feedback trials across sessions and conditions, an adaptive algorithm was used (Eq. 1). The algorithm, developed by Ashby and O’Brien (2007), was designed to roughly equate feedback between all groups. Whereas the NFB group was given feedback on 80% of incorrect trials, the PFB group was given feedback on correct trials according to the following algorithm:

$$P(\text{Feedback} \mid \text{correct trial}) = 0.8 \times \frac{Q(\text{errors on last 50 trials})}{Q(\text{correct on last 50 trials})} \qquad (1)$$
where P is probability and Q is proportion. The main function of this algorithm is to decrease the probability of feedback for the PFB group as performance improves (see the supplementary methods section for a graphical representation).² In the CP condition, feedback was given at a rate that would provide the same amount of feedback as the NFB and PFB conditions: the probability of feedback on each trial was P(Feedback) = 0.8 × Q(errors on last 50 trials). During the first 50 trials, the error rate was fixed at 0.5.
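A minimal sketch of how such an adaptive schedule might be implemented is given below. It rests on assumptions that go beyond the text: the PFB probability is taken to be 0.8 × Q(errors) / Q(correct) over the last 50 trials, which is the value that would equate the expected feedback rate with the NFB rule; the probability is capped at 1; and the class and method names are hypothetical.

```python
import random
from collections import deque

WINDOW = 50               # trials used to estimate the error rate
BASE_RATE = 0.8           # NFB feedback probability on incorrect trials
INITIAL_ERROR_RATE = 0.5  # fixed error rate during the first 50 trials


class FeedbackScheduler:
    """Hypothetical scheduler for the PFB, NFB, and CP conditions."""

    def __init__(self, condition, rng=None):
        if condition not in ("PFB", "NFB", "CP"):
            raise ValueError("condition must be 'PFB', 'NFB', or 'CP'")
        self.condition = condition
        self.errors = deque(maxlen=WINDOW)  # True marks an error trial
        self.trials_seen = 0
        self.rng = rng or random.Random()

    def error_rate(self):
        """Proportion of errors over the last 50 trials (0.5 early on)."""
        if self.trials_seen < WINDOW:
            return INITIAL_ERROR_RATE
        return sum(self.errors) / len(self.errors)

    def feedback_probability(self, correct):
        """Probability of giving feedback on the current trial."""
        q_err = self.error_rate()
        if self.condition == "NFB":
            return 0.0 if correct else BASE_RATE
        if self.condition == "PFB":
            if not correct:
                return 0.0
            q_cor = 1.0 - q_err
            if q_cor == 0.0:
                return 1.0
            # Assumed form: scale by errors/correct so the expected
            # feedback rate matches the NFB rule; capped at 1.
            return min(1.0, BASE_RATE * q_err / q_cor)
        return BASE_RATE * q_err  # CP: same rate on every trial

    def record(self, correct):
        """Store the outcome of a completed trial."""
        self.errors.append(not correct)
        self.trials_seen += 1

    def step(self, correct):
        """Decide on feedback for this trial, then record the outcome."""
        give = self.rng.random() < self.feedback_probability(correct)
        self.record(correct)
        return give
```

Under these assumptions, a PFB participant whose recent error rate is 20% would receive feedback on 20% of correct trials, so the feedback probability falls as accuracy rises, as the text describes.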
Participants completed three sessions, each with 800 trials. Participants were instructed to classify the line stimuli into two categories. The PFB group was instructed that on a portion of trials they would receive positive feedback that would be helpful towards making their judgments. The NFB group was instructed that on a portion of trials they would receive negative feedback that would indicate that their selection was wrong. The CP group was told that they would receive both types of feedback. On no feedback trials, participants in all conditions were instructed not to assume that they were right or wrong, but to use their partial feedback to guide their decisions.
Each trial began with the presentation of a single line that remained on the screen until the participant made their judgment. Participants responded on a standard keyboard by pressing the ‘z’ key if they believed the stimulus belonged to category A and the ‘/’ button if they believed the stimulus belonged to category B. The stimulus and feedback were presented in white on a black background. Positive feedback took the form of the phrase “Correct that was an A” and negative feedback took the form of the phrase “Error that was a B.” Each trial began with a 500 ms fixation cross, followed by a response terminated stimulus presentation, followed by 1000 ms of feedback (present or absent), and a 500 ms inter-trial interval (ITI). Blocks included 100 trials that were separated by participant-controlled rest screens. Participants completed 8 blocks in each session. Sessions were usually completed on consecutive days, with no more than 3 days between consecutive sessions.
We fit seven classes of decision bound models to each participant’s data. Four of the models assume a rule-based strategy: 1) Conjunctive A, 2) Conjunctive B, 3) Unidimensional-Length, and 4) Unidimensional-Orientation. The two conjunctive models assume that the participant sets a criterion along the length dimension that divides the stimuli into short and long bars and a criterion along orientation that divides the stimuli into shallow and steep bars. Conjunctive A assumes that the participant classifies stimuli into category A if they are short and shallow, and into category B otherwise. Conversely, Conjunctive B assumes that the participant classifies stimuli into category B if they are long and steep, and into category A otherwise. The unidimensional strategies assume that the participant ignores one dimension when making their judgments. Unidimensional-Length assumes that the participant sets a criterion on bar length and categorizes based on that value. Similarly, Unidimensional-Orientation assumes that the participant sets a criterion on orientation and categorizes based on that value (see Figure 3 for categorization strategy examples). A fifth model assumes that the participant responds randomly (the random responder model). Because our primary interest was in how learning differed between groups, we excluded participants 1) if the best fitting model was the random responder, and 2) if accuracy scores on day 3 did not exceed 50%.
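The response rules of the four rule-based models can be written out as a short sketch. This is a deterministic simplification: fitted decision-bound models normally also include a noise parameter, which is omitted here. The function names and criterion arguments are hypothetical, and for the unidimensional rules the mapping of "above criterion" to a category is an assumption exposed as a parameter.

```python
def conjunctive_a(length, orientation, length_crit, orient_crit):
    """Category A only if the bar is short AND shallow; otherwise B."""
    short = length < length_crit
    shallow = orientation < orient_crit
    return "A" if (short and shallow) else "B"


def conjunctive_b(length, orientation, length_crit, orient_crit):
    """Category B only if the bar is long AND steep; otherwise A."""
    long_bar = length >= length_crit
    steep = orientation >= orient_crit
    return "B" if (long_bar and steep) else "A"


def unidimensional(value, criterion, label_above="B"):
    """Classify on a single dimension, ignoring the other one.

    label_above is the (assumed) category for values above criterion.
    """
    label_below = "A" if label_above == "B" else "B"
    return label_above if value > criterion else label_below
```

For example, with a length criterion of 200 and an orientation criterion of 100, a stimulus with length 300 and orientation 50 is assigned to B by Conjunctive A (it is not short) and to A by Conjunctive B (it is not steep).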
The final two models assumed that the participant uses an II strategy when making their judgments. The optimum general linear classifier (OPT-GLC) represents the most accurate strategy for dividing the stimuli and is denoted by the gray line in the right panel in Figure 3. The suboptimal GLC represents a slightly inferior, but still nonverbal strategy for dividing the stimuli based on the angle of the diagonal line. Thus, whereas the optimal GLC strategy assumes a 45° angle in the diagonal linear bound, the suboptimal GLC assumes a diagonal that deviates slightly from 45°.³ The parameters of each model were estimated for each participant using the method of maximum likelihood, and the best-fitting model was defined as the one with the smallest Bayesian information criterion (BIC; Schwarz, 1978). BIC was calculated by the following equation:

$$\mathrm{BIC} = r \ln N - 2 \ln L \qquad (2)$$

where N equals sample size, r is the number of free parameters, and L is the likelihood of the model given the data.
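As an illustration of the model selection step, the sketch below computes BIC in the standard Schwarz form, r ln N − 2 ln L, and picks the model with the smallest value. The function names, log-likelihoods, and parameter counts are hypothetical and chosen only to show the mechanics.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz (1978) criterion: r * ln(N) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(fits, n_obs):
    """Return (name of the lowest-BIC model, dict of all BIC scores).

    fits maps a model name to (maximized log-likelihood, free parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits for one participant on one 800-trial day.
example_fits = {
    "random": (800 * math.log(0.5), 0),
    "unidimensional": (-450.0, 2),
    "glc": (-445.0, 3),
}
winner, all_scores = best_model(example_fits, n_obs=800)
```

Because the penalty term grows with the number of free parameters, a more flexible model wins only when its gain in log-likelihood outweighs that penalty.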
Trial-level raw data for Experiments 1–3 available at http://psychology.uiowa.edu/hazelab/archived-data.
To evaluate the algorithm’s feedback rate, we submitted the proportion of feedback trials to a 3 (condition: PFB, NFB, CP) × 3 (day) repeated measures ANOVA. There was a significant main effect of day [F(2, 24) = 14.6, p < 0.001], as well as a marginally significant day × condition interaction [F(4, 24) = 2.61, p = 0.06]. This interaction indicates that the NFB and CP groups experienced a significant reduction in feedback across days (NFB: day 1: 31%, day 2: 23%, day 3: 21%; CP: day 1: 29%, day 2: 28%, day 3: 21%), whereas the PFB group did not (PFB: day 1: 30%, day 2: 29%, day 3: 28%). There was no main effect of condition (F < 1). Post-hoc comparisons revealed no significant pairwise differences between the groups (|t| < 1). These results indicate that all groups received feedback on a roughly similar proportion of trials, but that the NFB and CP groups received slightly less feedback than the PFB group on days 2 and 3. Although the NFB group received slightly less feedback than the PFB group, the NFB group demonstrated greater improvements in accuracy.⁴
We performed a pairwise Wilcoxon sign test on the 24 100-trial blocks of the experiment between all groups, similar to Ashby and O’Brien’s (2007) analysis (Figure 4, left panel). There was a significant difference between CP and PFB (sign test: S = 19 of 24 blocks, p < 0.01), but not between NFB and PFB (sign test: S = 15 of 24 blocks, p = 0.31), nor between NFB and CP (sign test: S = 12 of 24 blocks, p = 1.00). Overall, these results indicate that there was only a significant pairwise difference between the CP and PFB groups, indicating that the CP group outperformed the PFB group.
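The reported p-values are consistent with a two-sided exact binomial sign test, which can be checked with a short sketch. The function name is hypothetical, and the calculation assumes that no blocks were tied.

```python
from math import comb


def sign_test_p(wins, n_blocks):
    """Two-sided exact sign test p-value under a fair-coin null.

    wins is the number of blocks on which one group outperformed the
    other; ties are assumed absent.
    """
    k = max(wins, n_blocks - wins)
    upper_tail = sum(comb(n_blocks, i) for i in range(k, n_blocks + 1))
    return min(1.0, 2.0 * upper_tail / 2 ** n_blocks)


p_cp_vs_pfb = sign_test_p(19, 24)   # CP vs. PFB
p_nfb_vs_pfb = sign_test_p(15, 24)  # NFB vs. PFB
p_nfb_vs_cp = sign_test_p(12, 24)   # NFB vs. CP
```

Under these assumptions the three comparisons give p ≈ 0.007, p ≈ 0.31, and p = 1.00, matching the values reported above.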
The left panel of Figure 5 reveals the results of the modeling analysis for Experiment 1. The category boundaries for day 3 were modeled for each participant separately. The best fit model for each participant indicated that an II strategy was used by 3 participants in the CP group, 3 participants in the NFB group, and 1 participant in the PFB group.
Although the NFB group did not significantly outperform the PFB group, there was a trend suggesting that negative feedback led to more learning than positive feedback, and participants receiving both types of feedback performed no better than participants receiving only negative feedback. Moreover, the two groups appeared to prefer different strategies. While 3 out of 5 participants receiving only negative feedback engaged an II strategy, only 1 of the participants receiving only positive feedback did so. Thus, although we did not find differences in accuracy between the PFB and NFB groups, our results suggest that negative feedback may be more helpful for promoting II learning than positive feedback.
Although not predicted, the bulk of the NFB group’s learning improvements were observed between days (offline changes; changes in performance between consecutive sessions) and not within each day (online changes; changes in performance during task engagement) (see Figure 6, top panels). To confirm this impression, we conducted two additional analyses. First, we submitted within-day learning scores (defined as accuracy on block 8 minus accuracy on block 1 for each day) to a group (PFB, NFB, CP) by day repeated-measures ANOVA. This revealed a marginally significant main effect of day [F(2, 42) = 3.170, p = 0.06], but no main effect of group (F < 1). The interaction between group and day, however, was marginally significant [F(4, 42) = 2.490, p = 0.07]. These results suggest that within-day accuracy changes were lower for the NFB group on days 2 and 3, and that accuracy improvements decreased across days (Figure 6, top middle panel).
Second, we submitted between-day scores (defined as accuracy on the first block of the following day minus accuracy on the last block of the previous day) to a group by day (day 2 minus day 1, day 3 minus day 2) repeated-measures ANOVA. The results revealed no main effect of day and no interaction (Fs < 1). The main effect of group, however, was marginally significant [F(2, 12) = 3.545, p = 0.06]. Post-hoc comparisons revealed that NFB differed marginally from PFB (p = 0.06), CP did not differ from PFB (p = 0.741), and NFB did not differ from CP (p = 0.199). This analysis showed that the NFB group experienced larger between-day improvements than the PFB group. Thus, it is possible that negative feedback may engage offline processes not engaged by positive feedback. However, because we did not identify strong behavioral differences between groups, the findings are only suggestive. To resolve this ambiguity, Experiment 2 used the same category structure as Experiment 1, but substituted a more precise method for equating feedback, and included three additional participants in each group to increase statistical power.
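The two learning scores defined above (within-day and between-day) can be computed from block accuracies as in the sketch below. The function names and the accuracy values are hypothetical and serve only to illustrate the definitions.

```python
def within_day_scores(accuracy_by_day):
    """Online change: last block minus first block, for each day."""
    return [day[-1] - day[0] for day in accuracy_by_day]


def between_day_scores(accuracy_by_day):
    """Offline change: first block of the next day minus last block of
    the previous day, for each pair of consecutive days."""
    return [
        accuracy_by_day[d + 1][0] - accuracy_by_day[d][-1]
        for d in range(len(accuracy_by_day) - 1)
    ]


# Hypothetical block accuracies for one participant (three short days).
example = [
    [0.50, 0.55, 0.60],
    [0.66, 0.68, 0.70],
    [0.75, 0.74, 0.76],
]
online = within_day_scores(example)    # one score per day
offline = between_day_scores(example)  # one score per day transition
```

In this made-up example the between-day gains (about 0.06 and 0.05) are comparable to or larger than the later within-day gains, which is the kind of pattern the analyses above test for.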
Despite the fact that the NFB group was given less feedback than the PFB group, Experiment 1 suggested a trend towards more successful engagement of II strategies and higher accuracies for the NFB group. Therefore, in Experiment 2 we used an alternative algorithm to more precisely equate feedback between groups. Additionally, we included eight participants in each group (a total of 24 participants) to increase our power to detect a potential difference between groups. We hypothesized that we would detect stronger learning for the NFB group over the PFB group. Additionally, since Experiment 1 suggested that negative feedback leads to greater offline changes in category learning than positive feedback, we predicted that offline changes in accuracy would be greater for the NFB group over the PFB group.
Thirty participants were recruited from the University of Iowa community in accordance with the university’s institutional review board. Six participants were excluded for poor performance or because they were classified as random responders by our modeling analysis (1 for PFB, 3 for NFB, and 2 for CP). Participants were randomly assigned to one of three conditions: PFB, NFB, or CP. Eight participants were assigned to each group and balanced based on age and sex (PFB: average age = 24.89±4.49, 4 females; NFB: average age = 24.13±4.48, 4 females; CP: average age = 22.85±4.69, 5 females). All participants had normal or corrected-to-normal vision and were paid $10 per session.
Participants were shown a Gabor patch that varied along two dimensions: frequency and orientation. On each trial a random value for each dimension was generated and combined to form a Gabor patch. The orientation was free to rotate between 0° (completely vertical lines) and 90° (completely horizontal lines). Frequency was free to vary between 0.02 and 0.10 cycles per degree. Table 2 details the characteristics for the category structure used in Experiments 2 and 3. The right panel of Figure 2 represents 400 randomly drawn trials from each category distribution. As in Experiment 1, items that extended beyond a distance of 0.3 diagonal units perpendicular to the optimal linear bound were not presented to participants (see Figure 2 for a comparison between category structures).⁵
Participants in the positive feedback (PFB) condition only received positive feedback. Participants in the negative feedback (NFB) condition only received negative feedback, and participants in the control condition (CP) received both types of feedback (positive and negative) on ~20% of trials. For all groups, the overall proportion of feedback was approximately 20%. Furthermore, for the CP condition, there was the constraint that equal amounts of positive and negative feedback be given. The PFB condition received no feedback after an incorrect response and the NFB condition received no feedback after a correct response.
To control the proportion of feedback trials across sessions and conditions, we used an adaptive algorithm (Eq. 3). Although Ashby and O’Brien (2007) were able to roughly equate feedback between the PFB and NFB groups in their experiment, the CP group received less feedback than the NFB and PFB groups (although this was only significant for the PFB group) and participants received different amounts of feedback on each day. This was because the error rate determined how much feedback participants in the PFB group received.
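For Experiment 2, feedback was given on trials eligible for feedback (i.e., incorrect trials for the NFB group and correct trials for the PFB group) if the following expression was true:

$$\left| \frac{\text{feedback trials so far}}{\text{total trials}} - 0.20 \right| > \left| \frac{\text{feedback trials so far} + 1}{\text{total trials}} - 0.20 \right| \qquad (3)$$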
This mechanism adjusted the trial-by-trial feedback so that the total amount of feedback given on each day was as close as possible to 20% of all trials for all groups. After each trial on which the response made feedback possible, we calculated the overall percentage of feedback 1) if feedback was to be given on the current trial (right side of equation 3), and 2) if feedback was not to be given (left side of equation 3). The option that brought the total percentage of feedback closer to 20% was chosen (see supplementary methods for a detailed example). In sum, the feedback algorithm favors the distribution of feedback when the percentage of feedback distributed falls below 20% of all previous trials, and favors the withholding of feedback when feedback exceeds 20%. Thus, there is constant adjustment after each trial response to keep the amount of total feedback anchored near 20%.
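The decision rule described above can be sketched as follows. The sketch assumes that the running total of trials includes the current trial and that feedback is given only when doing so brings the running proportion strictly closer to the target; the function names are hypothetical.

```python
TARGET = 0.20  # desired proportion of feedback trials


def give_feedback(feedback_so_far, trials_so_far, target=TARGET):
    """Decide whether to give feedback on an eligible trial.

    trials_so_far includes the current trial (an assumption). Feedback
    is given if that moves the running proportion closer to target.
    """
    if_withheld = abs(feedback_so_far / trials_so_far - target)
    if_given = abs((feedback_so_far + 1) / trials_so_far - target)
    return if_given < if_withheld


def simulate(eligible):
    """Run the rule over a sequence of trials.

    eligible is a list of booleans marking trials on which feedback is
    possible (e.g., correct trials for PFB, incorrect trials for NFB).
    Returns a list of booleans marking trials on which feedback is given.
    """
    given = []
    feedback_count = 0
    for trial_number, can_give in enumerate(eligible, start=1):
        decision = can_give and give_feedback(feedback_count, trial_number)
        feedback_count += int(decision)
        given.append(decision)
    return given


# If every trial were eligible, the proportion would settle at the target.
schedule = simulate([True] * 1000)
proportion = sum(schedule) / len(schedule)
```

When eligible trials are scarce, as for a highly accurate NFB participant, the rule gives feedback on every eligible trial until the running proportion catches up, which may explain why that group could fall slightly short of the target.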
For the CP condition, feedback type (positive or negative) was dependent on the percentage of positive and negative feedback trials as calculated throughout the experiment. If the response was correct, then the proportion of positive feedback trials was calculated and the circumstance (presenting or withholding feedback) that promoted positive feedback closer to 20% of correct trials was chosen. Likewise, if the response was incorrect, then the proportion of negative feedback trials was calculated and the circumstance (presenting or withholding feedback) that brought the total proportion of feedback closer to 20% was chosen. Thus, participants in the CP condition received no feedback on ~80% of all trials, positive feedback on ~20% of correct trials, and negative feedback on ~20% of incorrect trials. Feedback instructions for Experiment 2 were identical to Experiment 1.
Each trial began with the presentation of a single Gabor patch, which remained on the screen until the participant made their judgment. Participants responded on a standard keyboard by pressing the ‘z’ key if they believed the stimulus belonged to category A and the ‘m’ key if they believed the stimulus belonged to category B. The stimulus and feedback were presented on a gray background. Positive feedback took the form of the word “Correct” presented in green font and negative feedback took the form of the word “Incorrect” presented in red font. All feedback remained on the screen with the stimulus for 1500 ms. Trials with no feedback showed only the stimulus on a gray background for 1500 ms. Ten blocks of 80 trials were completed and were separated by participant-controlled rest periods. Sessions were usually completed on consecutive days, with no more than 3 days between consecutive sessions. One participant in the NFB group only completed nine of the ten blocks on day 1, but completed all blocks on day 2 and day 3.
The modeling analysis for Experiment 2 was identical to Experiment 1.
To determine how our feedback algorithm controlled the rate of feedback, we submitted the proportion of feedback trials to a two-factor ANOVA using condition (PFB, NFB, and CP) and day as factors. There was a significant effect of day [F(2, 42) = 3.60, p < 0.05], condition [F(2, 21) = 7.783, p < 0.005], and a significant day × condition interaction [F(4, 42) = 3.876, p < 0.01]. Post-hoc comparisons revealed a significant difference in the proportion of feedback received between the NFB and PFB groups [t(7) = 3.50, p < 0.05], and between the NFB group and the CP group [t(7) = 4, p < 0.01], but not between the PFB and CP groups (|t| < 1). Note that these effects are the product of the low variance caused by the precision of our feedback mechanism. This is supported by the fact that the CP and PFB groups both received 20% feedback on each day, while the NFB group experienced 19%, 19%, and 18% feedback across the three days. We do note that, similar to Experiment 1, the NFB group received significantly less feedback than the PFB group.
We performed a pairwise Wilcoxon sign test on the 30 blocks between all groups, similar to Ashby and O’Brien (2007) (Figure 4, right panel). There was a significant difference between CP and PFB (sign test: S = 24 of 30 blocks, p < 0.005), NFB and PFB (sign test: S = 23 of 30 blocks, p < 0.01), but not between NFB and CP (sign test: S = 17 of 30 blocks, p = 0.585). This analysis indicates a strong advantage in learning for the NFB and CP groups over the PFB group.
Although the PFB group experienced a gain in accuracy on day 1 from block 1 (53%) to block 10 (64%), no further learning was observed for the rest of the experiment. In contrast, the NFB and CP groups continued to show strong evidence of learning throughout the experiment, reaching accuracies of 73% and 70%, respectively, by block 10 of day 3 (64% for PFB). Despite the fact that the NFB group received significantly less feedback than the PFB and CP groups, we observed a significant advantage for the NFB group over the PFB group.
As in Experiment 1, we submitted the within-day accuracy scores (defined as accuracy on block 10 minus accuracy on block 1 for each day) to a group (PFB, NFB, CP) by day repeated measures ANOVA. This revealed a significant main effect of day [F(2, 42) = 6.726, p < 0.005], but no main effect of group, and no interaction (Fs < 1). These results resembled the results of Experiment 1 (within-day improvements decreased across days), with the exception that day 1 improvements were more similar across groups. Thus, within-day changes in accuracy were statistically similar between groups (Figure 6, bottom middle panel).
Furthermore, we submitted between-day accuracy scores (defined as accuracy on the first block of the following day minus accuracy on the last block of the previous day) to a group by day (day 2 minus day 1, day 3 minus day 2) repeated-measures ANOVA. The results revealed no main effect of day [F(1, 21) = 1.417, p = 0.247], and a marginally significant interaction [F(2, 21) = 3.349, p = 0.06]. The main effect of group, however, was significant [F(2, 21) = 9.966, p < 0.005]. Post-hoc comparisons revealed that NFB differed significantly from PFB (p < 0.005), CP differed marginally from PFB (p = 0.072), but NFB did not differ significantly from CP (p = 0.124). These results are similar to those of Experiment 1, where learning differences between groups were only identified between days, and they provide a clearer picture regarding the benefit of negative feedback over positive feedback; it appears that negative feedback affords the engagement of offline processes that cannot be engaged by positive feedback alone.
As in Experiment 1, the model-based analyses of the patterns of responses in Experiment 2 suggested that participants in the NFB group were more likely to use an II strategy than participants in the PFB group. The right panel of Figure 5 shows the number of participants whose strategies were best modeled by a unidimensional, conjunctive, or II strategy. The PFB group almost unanimously favored a unidimensional strategy; 7 of 8 participants used a unidimensional categorization strategy. For the NFB group, 4 participants used a GLC strategy while the other 4 participants used either a unidimensional or conjunctive strategy. Finally, the CP group mostly engaged a unidimensional or GLC strategy (Unidimensional: 4, Conjunctive: 1, GLC: 3)6.
Although our modeling process selects the best-fitting model based on the lowest BIC, it does not indicate the probability that the best-fitting model is adequately superior to the other models (Edmunds et al., 2015). To determine how likely it is that the best-fitting model derived from our analysis is actually the most appropriate of the candidate models, we computed model probabilities based on Bayesian weights (Wagenmakers & Farrell, 2004; see Supplementary Methods). These probabilities for each model and each participant are plotted in Figure 7 as a heat map (including Experiments 1 and 3). The darker the corresponding box, the higher the model probability is for that model. For the NFB group, the probability of the winning model being the GLC, and not the unidimensional, is 97%; the probability of the winning model being the GLC, and not conjunctive, is 78%. Thus, there is a high probability that the correct model is the winning model derived from our model analysis. Similarly, we can infer with reasonable confidence that the PFB group was correctly modeled by an RB strategy; the probability of the winning model being an RB model (unidimensional or conjunctive), rather than either II model, is 68%. This analysis confirms that the PFB group was best modeled by an RB strategy, and the NFB group was best modeled by an II strategy.
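As a rough illustration of how model probabilities can be derived from BIC values, the sketch below applies the standard information-criterion weight transformation, w_i = exp(-0.5 Δ_i) / Σ_k exp(-0.5 Δ_k), where Δ_i is the difference between model i's BIC and the lowest BIC. The exact procedure used here is described in the Supplementary Methods and may differ; the function names and BIC values are hypothetical.

```python
import math
from typing import Dict


def bic_weights(bics: Dict[str, float]) -> Dict[str, float]:
    """Convert BIC values into weights that sum to 1 (lower BIC -> higher weight)."""
    best = min(bics.values())
    # Subtracting the best BIC first keeps the exponentials numerically stable.
    raw = {name: math.exp(-0.5 * (b - best)) for name, b in bics.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


def pairwise_probability(bics: Dict[str, float], a: str, b: str) -> float:
    """Probability of model a when only models a and b are considered."""
    w = bic_weights(bics)
    return w[a] / (w[a] + w[b])


# Hypothetical BIC values for one participant (not taken from the study).
example_bics = {"unidimensional": 210.0, "conjunctive": 206.0, "glc": 200.0}
weights = bic_weights(example_bics)
glc_vs_uni = pairwise_probability(example_bics, "glc", "unidimensional")
```

The pairwise form corresponds to statements of the kind "the probability of the winning model being the GLC, and not the unidimensional", since it renormalizes the weights over just the two models being compared.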
Experiments 1 and 2 reveal an advantage for negative feedback over positive feedback in promoting II learning. Both experiments used similar category structures (e.g., non-overlapping), but different mechanisms for controlling the rate of feedback; whereas Experiment 1 used an error-based method of controlling feedback, similar to Ashby and O’Brien (2007), feedback in Experiment 2 did not depend on the error rate. While this pattern of results is likely the product of the type of feedback received, the distribution of trials that received feedback may also play a critical role. Given that our mechanisms for controlling the feedback rate did not control which stimuli yielded feedback across perceptual space, the PFB and NFB groups may have received qualitatively different information on their feedback trials.
For instance, consider a situation where two participants, one in the PFB group and the other in the NFB group, are both achieving 75% accuracy. The PFB participant can receive feedback on a range of trials that span the 75% of accurate trials, and the distribution of these trials in the stimulus space should be biased toward stimuli far from the category boundary. However, the NFB participant can only receive feedback on the 25% of trials that were incorrect. These trials are more likely to involve stimuli that are closer to the category boundary. Thus, the feedback received by the NFB participant may be more useful because it focuses on the harder trials (trials closer to the optimal linear bound). This is supported by research showing that II category learning is facilitated by initial training on “harder” trials over “easier” trials (Spiering & Ashby, 2008). Figure 8 plots the percentage of feedback given to the PFB and NFB groups for three levels of difficulty (i.e., distances from the boundary; top) and across perceptual space (bottom). While the NFB group appears to have a strong concentration of feedback trials close to the optimal linear bound (denoted by the white line), the PFB group appears to have received more distributed feedback.
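The intuition that error trials cluster near the category boundary can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions that are not part of the study: stimuli are summarized by a signed distance from the bound drawn uniformly from [-1, 1], and the simulated participant responds according to that distance plus Gaussian perceptual noise. The function name and all parameter values are hypothetical.

```python
import random
from typing import Tuple


def mean_distance_by_outcome(
    n_trials: int = 10000, noise_sd: float = 0.3, seed: int = 0
) -> Tuple[float, float]:
    """Return (mean |distance| on correct trials, mean |distance| on error trials)."""
    rng = random.Random(seed)
    correct_d, error_d = [], []
    for _ in range(n_trials):
        d = rng.uniform(-1.0, 1.0)              # signed distance from the bound
        percept = d + rng.gauss(0.0, noise_sd)  # noisy percept of the stimulus
        is_correct = (percept >= 0) == (d >= 0)
        (correct_d if is_correct else error_d).append(abs(d))
    return sum(correct_d) / len(correct_d), sum(error_d) / len(error_d)


correct_mean, error_mean = mean_distance_by_outcome()
```

Under these assumptions, trials eligible for negative feedback (errors) lie closer to the bound on average than trials eligible for positive feedback (correct responses), which is the sampling bias the paragraph above describes.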
This presents a possible explanation for our results: perhaps the NFB group performed significantly better than the PFB group because the NFB group received more useful information from their feedback (i.e., more feedback concentrated towards the optimal linear bound). To claim that negative feedback is more effective for teaching II categories than positive feedback, the type of stimuli that receive feedback must be equated across the PFB and NFB groups. Thus, in Experiment 3 we adjusted trial feedback so that participants in the PFB group received feedback mostly on difficult trials, in a manner corresponding to the NFB group in Experiment 2.
For Experiment 3, we ran a group that received positive feedback with an adaptive algorithm designed to match the biased distribution of feedback towards harder trials as in the NFB group. We refer to the new PFB group as PFB-HF (harder feedback).
Eight participants were recruited from the University of Iowa community in accordance with the university’s institutional review board. Participants were balanced with the NFB group from Experiment 2 based on age and sex (PFB-HF: average age = 20.88 ± 1.36, 5 females; NFB: average age = 24.13 ± 4.48, 4 females; CP: average age = 22.85 ± 4.69, 5 females). All participants had normal or corrected-to-normal vision and attended all 3 sessions. All participants were paid $10 per session.
All stimuli were identical to Experiment 2.
All procedures were the same as for the PFB group in Experiment 2, except that the probability of feedback was adjusted so that the PFB-HF group was more likely to receive feedback on harder trials (see Supplementary Methods).
The modeling analysis for Experiment 3 was identical to Experiments 1 and 2.
Figure 9 illustrates the distribution of feedback for the PFB-HF group across trial difficulty (left panel) and across perceptual space (right panel). To determine whether we equated the information received by participants between groups, we compared the proportion of feedback trials for the PFB-HF group in Experiment 3 and the NFB group in Experiment 2. Specifically, we performed a three-way ANOVA using day, condition (NFB vs. PFB-HF), and trial difficulty (easy, medium, or hard) as factors, and percentage of feedback trials as our dependent variable. The ANOVA revealed significant main effects of day [F(2, 26) = 20.495, p < 0.001] and trial difficulty [F(2, 26) = 184.909, p < 0.001], and a significant interaction between trial difficulty and day [F(4, 52) = 13.081, p < 0.001]. No other main effects or interactions were significant. Critically, no significant interaction between condition and trial difficulty was revealed (F < 1), so we conclude that our algorithm successfully equated feedback across perceptual space for both groups (for a comparison of the distribution of feedback between the NFB and PFB-HF groups, see the upper right panel of Figure 8 and the left panel of Figure 9). The percentage of trials on which participants received feedback was 20% for all days.
We performed a pairwise Wilcoxon sign test on the 30 80-trial blocks of the experiment between the PFB-HF group and each of the groups in Experiment 2. The pairwise analysis revealed a significant difference between CP and PFB-HF (sign test: S = 28 of 30 blocks, p < 0.001) and between NFB and PFB-HF (sign test: S = 23 of 30 blocks, p < 0.01), but not between PFB-HF and PFB (sign test: S = 16 of 30 blocks, p = 0.856). Similar to the original PFB group, the PFB-HF group experienced a gain in accuracy on day 1 from block 1 (47%) to block 10 (63%), but accuracy did not increase further by block 10 of day 3 (63%).
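For readers who want to see how a block-wise sign test of this kind can be computed, the sketch below evaluates an exact two-sided binomial sign test with a null probability of 0.5. It assumes there are no tied blocks and that the test was two-sided; neither detail is stated in the text, although the resulting p-values appear consistent with those reported above. The function name is hypothetical.

```python
from math import comb


def sign_test_p(wins: int, n: int) -> float:
    """Exact two-sided sign test p-value.

    wins: number of blocks on which one group outperformed the other.
    n: total number of (non-tied) blocks compared.
    """
    k = max(wins, n - wins)  # size of the more extreme tail
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the tail, capped at 1


p_cp = sign_test_p(28, 30)   # CP vs. PFB-HF
p_nfb = sign_test_p(23, 30)  # NFB vs. PFB-HF
p_pfb = sign_test_p(16, 30)  # PFB vs. PFB-HF
```

Because the test uses only the direction of each block-wise difference, it is insensitive to how large the accuracy differences are, which makes it a conservative complement to the ANOVA-based analyses.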
Within- and between-day learning was contrasted between all four groups, similar to Experiment 2. The within-day analysis revealed a significant effect of day [F(2, 56) = 10.883, p < 0.001], but no effect of group and no interaction (F’s < 1). The between-day analysis revealed no significant effect of day [F(1, 28) = 1.107, ns], and a marginally significant interaction [F(3, 28) = 2.658, p = 0.07]. We also identified a significant effect of group [F(3, 28) = 5.079, p < 0.01]. Post-hoc tests revealed group differences between NFB and PFB (p < 0.01) and between NFB and PFB-HF (p < 0.05), but no other comparisons were significant (p’s > 0.24). These results reveal that while all groups experienced similar online changes, the NFB group experienced stronger offline changes compared to the PFB and PFB-HF groups7.
The data for the PFB-HF group were modeled similarly to the previous groups. This analysis revealed that no participant’s data in the PFB-HF group were best modeled by an II strategy: 5 used a unidimensional strategy, and 3 used a conjunctive strategy. Thus, although equating information between groups increased the number of participants using a conjunctive strategy from 1 (in Experiment 2) to 3, we still did not see more than 1 participant engage an II strategy amongst this group. Note also that the model probabilities are high for the PFB-HF group (Figure 7).
Experiment 3 revealed that equating information was not sufficient to eliminate performance differences between groups. After roughly equating the regions of perceptual space that received feedback, we still observed a difference between the PFB-HF and NFB groups in terms of accuracy achieved across days. In addition, we identified only a single PFB participant who was able to engage an II strategy across 21 total PFB participants, compared to 7 out of 13 NFB participants who adopted an II strategy. These results provide further evidence that negative feedback is significantly more effective for teaching II categories than positive feedback.
The current study demonstrates a clear advantage for negative feedback over positive feedback for II category learning. This conclusion is supported by higher accuracy for the NFB over the PFB group and greater use of II strategies in groups that were given negative feedback. In addition, we observed that the advantage for the NFB group was driven by between-session changes in accuracy, rather than within-session changes. These findings remained robust when the information that each group received was equated.
These results contrast with Ashby and O’Brien’s (2007) finding that there was no difference in effectiveness between positive and negative feedback. One possible reason for the disparity relates to the category structure. Unlike Ashby and O’Brien, we used categories that were non-overlapping and excluded trials farther than 0.3 diagonal units perpendicular to the optimal linear bound. Overlapping category structures lead to feedback that is inconsistent with the optimal bound, and this may be particularly detrimental to performance when the overall rate of feedback is low. Feedback on trials that are far from the bound may provide little information about the location of the bound, thereby diluting the proportion of trials on which useful information was given. In the case of limited feedback (such as in the PFB and NFB groups), these conditions are likely to promote the continued use of an RB strategy. By increasing the gap between the optimal accuracy using an II strategy vs. using an RB strategy from 82% to 100%, we promoted conditions necessary to see a difference between these groups. These changes were sufficient to reveal a distinct advantage for negative feedback over positive feedback in promoting II learning.
One might propose that negative feedback is necessary to engage an II strategy because negative feedback signals the need to update the current categorization strategy, whereas positive feedback does not signal the need to change strategy. In other words, negative feedback may present a global signal that the current strategy being used is incorrect on top of the signal that the trial was performed incorrectly. For example, participants in the NFB group who may have used a unidimensional or conjunctive strategy early on may have realized that their current strategy was inadequate, leading to a strategic shift. Thus, negative feedback may signal the need to break out of an inadequate rule-based strategy. In contrast, in the absence of negative feedback, the PFB group may have assumed that their strategy was adequate, resulting in acquiescence to inferior performance. This may explain why participants in the PFB group failed to engage an II strategy.
Another possible explanation for our pattern of results is that negative feedback is required to unlearn incorrect associations between a stimulus and a response that were formed early on during training. In other words, it is possible that when an incorrect response is produced, an association is formed between a stimulus and a response. Evidence for this comes from Wasserman, Brooks, and McMurray (2015), who demonstrated that pigeons can learn to categorize stimuli into multiple categories over the course of thousands of trials. On each trial, the pigeons were shown a target stimulus and a distractor stimulus and were cued to categorize the stimulus into one of sixteen categories. Interestingly, the researchers noted that pigeons were less accurate when the current trial display included a distractor that had been rewarded as a target on the previous trial. This suggests that the pigeons had difficulty suppressing responses to stimuli that had just been rewarded. They posited that associative learning benefitted from “pruning” incorrect associations, as well as the formation of correct associations. Based on our experiments, it is reasonable to suggest that the PFB group formed many incorrect associations early on in training, but that those associations were never corrected in the absence of negative feedback. If this was the case, it may imply that the positive feedback group had difficulty pruning these incorrect or irrelevant associations, which may explain the pattern of results we observed in our experiments.
A final possibility is that positive feedback on correct trials may not have been as informative as negative feedback on incorrect trials in our experiments. Typically, in a two-choice categorization task, positive and negative feedback are equally informative; positive feedback indicates that the response was correct, whereas negative feedback indicates that the alternate option was correct. Nonetheless, it is possible that as training proceeded, positive feedback became less useful than negative feedback. This is because positive feedback may predominantly include information about trials the participant has already mastered. In contrast, negative feedback always indicates information about trials that participants are unsure about, or at least erred on. This may explain the differences in the pattern of information the PFB and NFB groups received during Experiment 2. However, a problem with this explanation is that even when biasing feedback towards harder trials (the PFB-HF group in Experiment 3), which is presumably where one would find feedback most informative, we still observed a significant advantage for the NFB group over the PFB-HF group. Future research will be needed to disambiguate which possibility best explains our pattern of results.
Our analyses pointed to a potential mechanism to explain the advantage for the NFB group: differences in offline changes between groups, rather than online changes. Although we did identify that this was the key difference between groups, we did not specifically isolate the locus of this effect (time away from the task, sleep-dependent consolidation, etc.). Thus, we are unable to give a precise reason why offline changes were greater for the NFB group than for the PFB groups. Generally, however, it is assumed that offline processes do not involve any intentional shifts in categorization strategy. Thus, the between-days advantage suggests that negative feedback affords the engagement of incidental learning processes (i.e., learning not guided by intention) between days in a way that positive feedback does not. Future work will be needed to determine if this is the case.
Our objective was to investigate the effectiveness of positive and negative feedback in promoting II learning. Contrary to previous findings (e.g., Ashby & O’Brien, 2007), we demonstrated a clear advantage for negative feedback over positive feedback. We observed higher accuracies as well as the successful engagement of II strategies for the negative feedback group, whereas only one participant in the PFB group was able to engage an II strategy. These results were observed even after equating the information that was received between groups. In addition, while online changes were similar between groups, stronger offline changes were observed for participants who received negative feedback compared to those who received positive feedback. These results suggest that negative feedback may act as a more effective signal for teaching II categories.
Supplementary Figure 1. The probability of feedback (P) is determined by the proportion of correct trials over the last 50 trials for both groups. Note that for the PFB group, accuracies below 0.45 result in P values of 1, that is, feedback that is guaranteed on eligible trials. Accuracies above 50% result in progressively smaller values of P for both groups.
The authors would like to thank Darrell Worthy for his help with the modeling analyses. Additionally, we would like to thank Bob McMurray and Gavin Jenkins for their insight with respect to the interpretation of our findings. This research was supported in part by the National Institute on Drug Abuse under Award Numbers DA032457 to WTM and 5R03DA031583-02 to EH.
1We note that there are mixed findings regarding whether RB and II category problems are solved by one or more systems (Ashby et al., 2002; Ashby & Maddox, 2011; Dunn et al., 2012; Edmunds et al., 2015; Filoteo et al., 2005; Maddox et al., 2003; Maddox et al., 2004; Maddox et al., 2005; Newell et al., 2011; Stanton & Nosofsky, 2007; Tharp & Pickering, 2009; Waldron & Ashby, 2001), but here we do not wish to enter the debate regarding whether one or more systems are engaged to support RB or II learning. Rather, the main objective of this study is to characterize II learning, specifically whether positive or negative feedback is more effective for solving II category problems.
2Another important consequence of this feedback mechanism is that improvements in performance result in decreases in positive feedback for the PFB group. Thus, the PFB group is penalized with less feedback for improving their performance, and rewarded with more feedback for performing worse. Consequently, this feature of the algorithm may have caused an elimination of group differences, which may also account for Ashby and O’Brien’s (2007) pattern of results.
3Note that although Ashby and O’Brien (2007) do not mention the use of a sub-optimal GLC, they do include this model in their analyses.
4This is also the case for Experiments 2 and 3.
5Note that while Ashby and O’Brien (2007) employed a category structure with a bivariate normal distribution, we used an evenly distributed category structure.
6To test whether our modeling analysis favored the selection of RB models, we conducted three model recovery simulations, similar to Donkin et al. (2015). We performed this analysis for the sub-optimal GLC, the unidimensional-length, and conjunctive B models. To conduct this analysis, we extracted the best fitting parameters for each participant from our original modeling analysis. Next, we used the parameters to form new optimal linear bounds for each of our 24 participants and simulated their day 3 responses as if they were using that bound. Finally, we reran our modeling analyses for all participants with these new responses. If our modeling analysis biases the selection of RB models over II models, then we should identify several participants who cannot be fit with the II models, even when we assume that they are using a GLC strategy. Similarly, if our modeling analysis biases the selection of II strategies over RB strategies, then we should identify several participants who cannot be fit with RB strategies, even when we assume they are using either a conjunctive or unidimensional strategy.
When we modeled each participant’s responses using their sub-optimal II parameters, all participants were best fit by an II strategy. Thus, our modeling analysis did not favor the selection of an RB strategy over an II strategy; all participants in our original modeling analysis had the chance to be modeled using an II strategy, and either were, or another model fit better. Finally, we reran our models assuming all participants adhered to a unidimensional and a conjunctive strategy, in separate analyses. For both of these model recovery simulations, we identified only one participant (of 24) who was best fit by an II strategy. Thus, although our modeling analysis showed a very slight bias towards II strategies, we still did not identify any participants in the PFB group who used an II strategy.
7Note that this pattern of results is the same when directly comparing the PFB-HF and NFB groups.