Previous studies have shown that formation and retention of motor memories are dynamic processes that evolve over multiple behavioral stages: online learning, consolidation and long-term retention [4]. Consolidation has been defined as reduced fragility of fresh memories during the initial hours after the training period or as spontaneous memory improvements [4], measured at 24 hours post practice [8]. Long-term retention of newly acquired memories allows them to be recalled without further practice after longer delays [4].
Several authors have investigated how reward and punishment influence short-term learning in conditioning tasks [1]. Learning under conditions in which good performance is rewarded or bad performance is punished can transiently improve the formation of new associations between events in animal models [12]. In humans, the relative effectiveness of reward and punishment in inducing consolidation and long-term retention of memories is not known.
Activity in dopaminergic neurons [14], fundamental for formation [15] and retention [18] of new motor memories, is differentially modulated by reward and punishment. Neuronal excitability increases with reward and decreases with punishment [19]. Reward's strong reliance on dopaminergic neurotransmission [20] makes it a reasonable candidate to influence long-term retention of newly acquired motor memories [15]. Here, we hypothesized that learning under rewarding conditions would result in better long-term retention of a newly acquired memory than learning under punished or neutral conditions, and that this advantage would be driven by improved consolidation.
Thirty-eight right-handed healthy subjects learned a tracking isometric pinch force task under the influence of monetary reward, punishment, or a neutral control condition (separate groups). Subjects were instructed to pinch a force transducer between the right thumb and index finger in order to keep a red cursor within a moving blue target on a computer screen. At the end of each training trial, subjects were given feedback according to their group: the rewarded group earned money based on the amount of time the red cursor stayed within the blue target, the punished group lost money based on the amount of time the cursor stayed outside the target, and the neutral group received neutral monetary information irrespective of performance. Subjects were told that they would start with $0 and earn money for time on target (rewarded group), would start at $72 and lose money for time off target (punished group), or would simply receive $40 at the end of the training session (neutral group). The monetary values were based on preliminary data, so that all groups would have a comparable amount of money at the end of training (actual amounts were $40.4 ± 1.2, $38.6 ± 2.2 and $40.0 ± 0.0 in the rewarded, punished and neutral groups respectively; see Supporting information 1, S1, for training results). Mean error was evaluated at test blocks. Thus, all test measurements (baseline, immediate, 6 hours, 24 hours and 30 days) were made in the absence of any reward or punishment, while the actual training trials were carried out under the influence of reward, punishment or neutral information.
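The payment rules described above can be sketched in a few lines of Python. This is only an illustration of the logic: the starting amounts ($0, $72, $40) come from the text, but the per-second rate and the total tracking time used in the example are hypothetical, since the actual conversion between time and money is not given here.

```python
# Starting amounts are taken from the task description.
START_REWARDED = 0.0    # rewarded group starts at $0 and earns money
START_PUNISHED = 72.0   # punished group starts at $72 and loses money
NEUTRAL_PAYOUT = 40.0   # neutral group receives a fixed $40


def final_amount(group, seconds_on_target, seconds_off_target, rate_per_second):
    """Money held at the end of training for one subject.

    rate_per_second is a hypothetical dollars-per-second conversion;
    the study's actual rate is not stated in this section.
    """
    if group == "rewarded":
        return START_REWARDED + rate_per_second * seconds_on_target
    if group == "punished":
        return START_PUNISHED - rate_per_second * seconds_off_target
    if group == "neutral":
        return NEUTRAL_PAYOUT
    raise ValueError(f"unknown group: {group}")


# Hypothetical example: 720 s of tracking in total, 400 s of it on target,
# at an assumed rate of $0.10 per second.
for group in ("rewarded", "punished", "neutral"):
    print(group, round(final_amount(group, 400.0, 320.0, 0.10), 2))
```

Under these assumed numbers, the same tracking performance leaves the rewarded and punished subjects with the same final amount, which is the kind of balance across groups that the calibration on preliminary data was meant to achieve.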
Figure 1. Behavioral task. (a) Tracking isometric pinch force task. Subjects pinched a force transducer between the right thumb and index finger. Squeezing the force transducer resulted in the upward movement of a red cursor on the computer screen, while relaxing ...
Figure 2. Experimental design. Subjects participated in 3 different sessions (days 1, 2 and 30) and were separated into 3 training groups, which practiced the task over 4 blocks (20 trials each, black rectangle) under the influence of monetary reward (green, n = 13), monetary ...
All three groups had similar mean errors at baseline (rewarded vs. neutral, p = 0.86 and rewarded vs. punished, p = 0.91, multiple pairwise comparisons with Bonferroni adjustments) and immediately after training (rewarded vs. neutral, p = 0.77 and rewarded vs. punished, p = 0.23). Learning, measured as the change in mean error between the baseline and the immediate post-training time points (Delta immediate - baseline), was similar across groups (10.3 ± 0.5, 10.6 ± 0.5 and 9.8 ± 0.6 for neutral, rewarded and punished groups respectively; rewarded vs. neutral, p = 0.78, rewarded vs. punished, p = 0.45 and neutral vs. punished, p = 0.56), although training performance while feedback information was provided differed between groups (S1).
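For readers unfamiliar with the Bonferroni adjustment mentioned for the pairwise comparisons, a minimal sketch follows. It shows the standard procedure (multiply each raw p value by the number of comparisons and cap at 1); the raw p values in the example are hypothetical and are not data from this study, and the statistical software actually used is not stated in this section.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p values: p * m, capped at 1.0."""
    m = len(p_values)  # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for three pairwise group comparisons.
raw = {
    "rewarded vs. neutral": 0.01,
    "rewarded vs. punished": 0.20,
    "neutral vs. punished": 0.50,
}
adjusted = bonferroni_adjust(list(raw.values()))
for label, p_adj in zip(raw, adjusted):
    print(f"{label}: adjusted p = {p_adj:.2f}")
```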
Figure 3. (a) Effect of reward and punishment on motor skill. Mean errors in the rewarded, punished and neutral groups as a function of time. A repeated measures mixed-model ANOVA with factors GROUP (rewarded/punished/neutral) and TIME (baseline/immediate/6 hours/24 ...
Retention at 6 hours post-training (Delta 6 hours - immediate) was significantly larger in the rewarded group than in the neutral (p = 0.02) and punished (p = 0.04) groups. Within-group comparisons between the immediate and 6 hours post-training time points showed mean errors that remained stable in the rewarded group (p = 0.87) but increased in the neutral and punished groups (worsened performance, p < 0.001 and p = 0.01 respectively; S2).
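The Delta measures used throughout are differences in mean error between two test time points. A minimal sketch of that computation is given below. The error values are hypothetical, and the sign convention (later time point minus earlier one, as the Delta labels are written) is an assumption; the text does not state whether the reported magnitudes follow this sign or its negation.

```python
def delta(mean_error, later, earlier):
    """Change in mean error between two test time points (later - earlier).

    Under this assumed convention a positive value means error grew
    (performance worsened) and a negative value means error shrank.
    """
    return mean_error[later] - mean_error[earlier]


# Hypothetical mean errors for one subject at each test block.
subject = {
    "baseline": 30.0,
    "immediate": 20.0,
    "6 hours": 23.5,
    "24 hours": 22.0,
    "30 days": 25.0,
}

for later in ("6 hours", "24 hours", "30 days"):
    print(f"Delta {later} - immediate:", delta(subject, later, "immediate"))
```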
By 24 hours post-training, Delta 24 hours - immediate, a common measure of overnight consolidation [8], was significantly larger in the rewarded than in the neutral (p = 0.02) and the punished (p = 0.04) groups. Within-group comparisons between the immediate and 24 hours post-training time points showed decreased mean error in the rewarded group (p < 0.001) in the absence of differences in the neutral and punished groups (p = 0.39 and p = 0.19, respectively, S2), indicating successful overnight offline consolidation in the rewarded group. Whereas the punished and neutral groups showed decreased retention at 6 hours relative to the rewarded group, all groups decreased mean errors (improved performance) to similar extents between 6 and 24 hours (1.24 ± 0.4, 1.00 ± 0.2 and 1.35 ± 0.4 for neutral, rewarded and punished conditions respectively; rewarded vs. neutral: p = 0.35 and rewarded vs. punished: p = 0.22).
Most importantly, by 30 days post-training, Delta 30 days - immediate remained larger in the rewarded group than in the neutral (p = 0.01) and punished (p = 0.001) groups. Within-group comparison showed decreased mean error (improved performance) in the rewarded group (p = 0.02), in contrast to the increased mean error (worsened performance) in the punished and neutral groups (p < 0.001 and p = 0.003 respectively, S2). This difference was better accounted for by a relatively stable error between the 24 hours and 30 days time points in the rewarded group (p = 0.31), whereas errors increased in the punished (p < 0.001) and the neutral (p < 0.001) groups (S2). These results document better long-lasting retention of post-training gains in the rewarded group relative to the other two. As a result, the rewarded group had a significantly smaller mean error than the neutral (p = 0.03) and punished (p = 0.002) groups at 30 days (S2). Mean error at each time point and delta mean error between time points were not significantly different between the punished and neutral groups (p > 0.4 for all comparisons).
Finally, time on target during testing showed results comparable to distance error (Methods). A repeated measures mixed-model ANOVA with factors GROUP (neutral/rewarded/punished) and TIME (baseline/immediate/6 hours/24 hours/30 days) showed a significant GROUP x TIME interaction on mean time on target (F = 3.59, p = 0.008), with all three groups having comparable values at baseline (3.25 ± 0.18, 3.47 ± 0.17 and 3.03 ± 0.20 for neutral, rewarded and punished, respectively) and immediately after training (5.79 ± 0.17, 5.98 ± 0.19 and 5.53 ± 0.30 for neutral, rewarded and punished, respectively). Consistent with the measurement of error as cumulative distance away from the target described above, the mean time on target at 30 days was better in the rewarded group (6.33 ± 0.15) than in the neutral (5.22 ± 0.24, p = 0.022) or the punished (5.17 ± 0.20, p = 0.009) groups.
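The two outcome measures, time on target and error as cumulative distance away from the target, can be illustrated with a short sketch operating on sampled cursor and target positions. The sampling interval, the target half-width and the choice to measure distance from the nearest target edge are all assumptions made for illustration; the exact definitions are given in the Methods and are not reproduced in this section.

```python
def time_on_target(cursor, target, half_width, dt):
    """Seconds during which the cursor lay within the target band.

    cursor, target: sampled positions (same length, same units).
    half_width: assumed half-height of the target.
    dt: assumed sampling interval in seconds.
    """
    return sum(dt for c, t in zip(cursor, target) if abs(c - t) <= half_width)


def cumulative_distance_error(cursor, target, half_width):
    """Summed distance from the cursor to the nearest target edge.

    Samples inside the target contribute zero (an assumed convention).
    """
    return sum(max(0.0, abs(c - t) - half_width) for c, t in zip(cursor, target))


# Hypothetical trial: four samples, stationary target centered at 0.
cursor = [0.0, 1.0, 2.0, 5.0]
target = [0.0, 0.0, 0.0, 0.0]
print(time_on_target(cursor, target, half_width=1.0, dt=0.5))
print(cumulative_distance_error(cursor, target, half_width=1.0))
```

The two measures are complementary: time on target ignores how far the cursor strays once it leaves the target, whereas the distance measure penalizes large excursions more than small ones.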