In humans, training in which good performance is rewarded or bad performance punished results in transient behavioral improvements [1–3]. The relative effects of reward and punishment on consolidation and long-term retention, critical behavioral stages for successful learning [4, 5], are not known. Here, we investigated the effects of reward and punishment on these different stages of human motor skill learning. We studied healthy subjects who trained on a motor task under rewarded, punished, or neutral control conditions. Performance was tested before, and immediately, 6 hs, 24 hs and 30 days after training in the absence of reward or punishment. Performance improvements immediately after training were comparable in the three groups. At 6 hs, the rewarded group maintained performance gains while the other two groups experienced significant forgetting. At 24 hs, the rewarded group showed significant offline (posttraining) improvements while the other two groups did not. At 30 days, the rewarded group retained the gains identified at 24 hs, while the other two groups experienced significant forgetting. We conclude that training under rewarded conditions is more effective than training under punished or neutral conditions in eliciting lasting motor learning, an advantage driven by offline memory gains that persist over time.
Previous studies have shown that formation and retention of motor memories are dynamic processes that evolve over multiple behavioral stages: online learning, consolidation and long-term retention [4–6]. Consolidation has been defined as reduced fragility of fresh memories during the initial hours after the training period or as spontaneous memory improvements [4, 5, 7], measured at 24 hs post practice. Long-term retention of newly acquired memories allows them to be recalled without further practice after longer delays [4, 5].
Reward and punishment have been investigated in relation to their influence on short-term learning in conditioning tasks by different authors [1, 2, 9–11]. It has been demonstrated that learning under conditions in which good performance is rewarded or bad performance punished can transiently improve formation of new associations between events in animal models [12, 13]. In humans, their relative effectiveness in inducing consolidation and long-term retention of memories is not known.
Activity in dopaminergic neurons, fundamental for formation [15–17] and retention of new motor memories, is differentially modulated by reward and punishment. Neuronal excitability increases with reward and decreases with punishment. Reward’s strong reliance on dopaminergic neurotransmission [20, 21] makes it a reasonable candidate to influence long-term retention of newly acquired motor memories [15, 22]. Here, we hypothesized that learning under rewarding conditions would result in better long-term retention of a newly acquired memory than learning under punished or neutral conditions and that this advantage would be driven through improved consolidation.
Thirty-eight right-handed healthy subjects learned a tracking isometric pinch force task (Fig 1) under the influence of monetary reward, punishment, or a neutral control condition (different groups, Fig 2). Subjects were instructed to pinch a force transducer between the right thumb and index finger in order to maintain a red cursor within a moving blue target on a computer screen (Fig 1). At the end of each training trial, subjects were given feedback according to their group: the rewarded group earned money based on the amount of time the red cursor stayed within the blue target, the punished group lost money based on the amount of time the cursor stayed outside the target, and the neutral group received neutral monetary information irrespective of performance. Subjects were told that they would start with $0 and earn money for time on target (rewarded group), would start at $72 and lose money for time off target (punished group), or simply receive $40 at the end of the training session (neutral group). The monetary values were based on preliminary data, so that all groups would have a comparable amount of money at the end of training (actual amounts were $40.4 ± 1.2, $38.6 ± 2.2 and $40.0 ± 0.0 in the rewarded, punished and neutral groups respectively, see Supporting information 1, S1 for training results). Mean error (Fig 1c) was evaluated at test blocks (Fig 2). Thus, all test measurements (baseline, immediate, 6 hs, 24 hs and 30 days) were done in the absence of any reward or punishment while actual training trials were carried out under the influence of reward, punishment or neutral information.
All three groups had similar mean errors at baseline (rewarded vs. neutral, p = 0.86 and rewarded vs. punished, p = 0.91, multiple pairwise comparisons with Bonferroni adjustments) and immediately after training (rewarded vs. neutral, p = 0.77 and rewarded vs. punished, p = 0.23, Fig 3). Learning, measured as mean error change between the baseline and the immediate post-training time points (Delta immediate - baseline), was similar across groups (10.3 ± 0.5, 10.6 ± 0.5 and 9.8 ± 0.6 for neutral, rewarded and punished groups respectively; rewarded vs. neutral, p = 0.78, rewarded vs. punished, p = 0.45 and neutral vs. punished, p = 0.56), although training performance while feedback information was provided differed between groups (S1).
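The delta measures used here and in the following paragraphs are differences in mean error between two test blocks. A minimal sketch of that bookkeeping follows. The sign convention (positive values mean improvement, i.e. the earlier error minus the later error) is an assumption that appears consistent with the positive learning values reported above, but the text does not state it explicitly. The function names, dictionary keys and example numbers are hypothetical and are not study data.

```python
# Sketch of the delta measures; names and numbers are illustrative only.

def delta(errors, later, earlier):
    """Improvement from `earlier` to `later` (assumed sign: earlier - later)."""
    return errors[earlier] - errors[later]

def subject_deltas(errors):
    """Return the four deltas discussed in the text for one subject."""
    return {
        "learning": delta(errors, "immediate", "baseline"),
        "retention_6h": delta(errors, "6h", "immediate"),
        "consolidation_24h": delta(errors, "24h", "immediate"),
        "retention_30d": delta(errors, "30d", "immediate"),
    }

# Invented mean errors for one hypothetical subject at each test block.
example = {"baseline": 30.0, "immediate": 20.0, "6h": 20.5, "24h": 18.0, "30d": 18.5}
print(subject_deltas(example))
```

Under this convention a negative value (as for the 6-hour delta in the invented example) corresponds to offline forgetting, and a positive value to offline gains.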
Retention at 6 hours post-training (Delta 6 hours - immediate) was significantly larger in the rewarded group than in the neutral (p = 0.02) and punished (p = 0.04) groups (Fig 3). Within-group comparisons between the immediate and 6 hours post-training time points showed mean errors that remained stable in the rewarded group (p = 0.87) but increased in the neutral and punished groups (worsened performance, p < 0.001 and p = 0.01 respectively, Fig 3 and S2).
By 24 hours post-training, Delta 24 hours - immediate, a common measure of overnight consolidation, was significantly larger in the rewarded than in the neutral (p = 0.02) and the punished (p = 0.04) groups (Fig 3). Within-group comparisons between immediate and 24 hours post-training time points showed decreased mean error in the rewarded group (p < 0.001) in the absence of differences in the neutral and punished groups (p = 0.39 and p = 0.19, respectively, S2), indicating successful overnight offline consolidation in the rewarded group. Whereas the punished and neutral groups showed decreased retention at 6 hours relative to the rewarded group, all groups decreased mean errors (improved performance) to similar extents between 6 and 24 hours (1.24 ± 0.4, 1.00 ± 0.2 and 1.35 ± 0.4 for neutral, rewarded and punished conditions respectively; rewarded vs. neutral: p = 0.35 and rewarded vs. punished: p = 0.22).
Most importantly, by 30 days post-training, Delta 30 days - immediate remained larger in the rewarded group than in the neutral (p = 0.01) and punished (p = 0.001) groups (Fig 3). Within-group comparison showed decreased mean error (improved performance) in the rewarded group (p = 0.02), in contrast to the increased mean error (worsened performance) in the punished and neutral groups (p < 0.001 and p = 0.003 respectively, S2). This difference was better accounted for by a relatively stable error between the 24-hour and 30-day time points in the rewarded group (p = 0.31), whereas errors increased in the punished (p < 0.001) and the neutral (p < 0.001) groups (S2). These results clearly documented better long-lasting retention of post-training gains in the rewarded group relative to the other two. As a result, the rewarded group had a significantly smaller mean error than the neutral (p = 0.03) and punished (p = 0.002) groups at 30 days (Fig 3 and S2). Mean error at each time point and delta mean error between time points were not significantly different between the punished and neutral groups (p > 0.4 for all comparisons).
Finally, time on target during testing showed results comparable to distance error (Methods). A repeated measures mixed-model ANOVA with factors GROUP (neutral/rewarded/punished) and TIME (baseline/immediate/6 hs/24 hs/30 days) showed a significant GROUP x TIME interaction on mean time on target (F = 3.59, p = 0.008) with all three groups having comparable values at baseline (3.25 ± 0.18, 3.47 ± 0.17 and 3.03 ± 0.20 for neutral, rewarded and punished, respectively) and immediately after training (5.79 ± 0.17, 5.98 ± 0.19 and 5.53 ± 0.30 for neutral, rewarded and punished, respectively). Consistent with the measurement of error as cumulative distance away from the target described above, the mean time on target at 30 days was better in the rewarded group (6.33 ± 0.15) than in the neutral (5.22 ± 0.24, p = 0.022) or the punished (5.17 ± 0.20, p = 0.009) groups.
In summary, we found that training under rewarded conditions elicited substantial long-term retention of a newly acquired memory while training under punished or neutral conditions did not, and that this advantage developed through stabilization of offline memory gains in subsequent days.
Under our experimental conditions, all three groups improved significantly although to different extents during rewarded, punished or neutral training. Immediately after training, when testing was carried out in the absence of any reward or punishment, all groups showed comparable and marked learning. Memory changes after completion of training could involve stabilization (consistent performance over time), offline gains (performance improvements beyond stabilization) [4, 5, 7], often referred to as consolidation, or offline forgetting (performance worsening over time). We found that at 6 hours, mean errors increased in the punished and neutral groups, reflecting offline forgetting of memory, but remained stable in the rewarded group. These findings indicate substantive differences in the strength of the motor memory during the initial hours of the consolidation period depending on training type, with stabilization of memory gains present only in the rewarded group.
Consolidation at 24 hours, measured as the difference in performance between the immediate and 24-hour post-training time points, was larger in the rewarded than in the punished or neutral groups. Within-group analysis demonstrated offline improvements only in the rewarded group, which remained present 30 days later. In contrast, the punished and neutral groups did not have significant offline gains and by 30 days showed substantial performance loss (Fig 3).
These results represent, to our knowledge, the first demonstration of a benefit of reward on long-term retention of a motor memory in animals or humans. Long-term retention is important because it impacts our ability to maintain an acquired memory over time without the need to relearn it each time memory retrieval is required. Learning under the rewarded condition induced significant offline memory gains, while learning under the punished and the neutral conditions resulted in the opposite effect: offline memory losses (forgetting) (Fig 3). Therefore, training under reward not only compensated for the offline forgetting seen in the punished and neutral groups by 30 days, but also resulted in a reversal from offline forgetting to lasting offline learning (Fig 4).
To explore the possibility that performance during training influenced long-term retention, we fitted a single exponential function to each subject's training data, extracted the decay parameter, and then looked for correlations with retention at 30 days. We found no significant correlations between individual subjects’ decay parameters and retention at 30 days for any of the groups (p = 0.27, p = 0.23, p = 0.35 for neutral, rewarded and punished groups respectively) or for all subjects together (p = 0.52). Across groups, retention at 30 days was not predicted by mean error in the last 10 training trials either (S3). Thus, we found no evidence that the decay parameter during training or training performance in the last 10 trials predicted long-term retention, an issue amenable to experimental testing in the future.
Reward is associated with increased dopaminergic function in the midbrain and striatum, which influences memory retention in humans, possibly through D1/D5-dopamine-dependent long-term potentiation (LTP) [17, 24, 25]. It is conceivable that dopaminergic neurotransmission could represent a common mechanistic link underlying the synergistic effects of training and reward on long-term retention of motor memories. Dopamine-dependent LTP develops gradually over hours and persists for days to weeks, a time course similar to that of the developing reward benefits in our study. It operates in cortico-striatal loops [17, 25], which are engaged in motor memory formation, and is activated by motor training [4, 5] and reward protocols [23, 27]. It is possible that the facilitatory effect of reward on long-term motor memory retention reveals an underpinning of D1/D5-dopamine-dependent LTP-like mechanisms [15, 17, 25] at the intersection of networks that mediate motor learning and reward processing, as proposed recently in relation to episodic memory and habit formation. Training under punishment did not significantly modify memory formation stages relative to the neutral group, an effect that could be accounted for by its depressing influence on dopamine-dependent LTP-like mechanisms and/or its predominant reliance on activity in serotoninergic pathways [30, 31] that are not part of the network mediating consolidation and long-term retention of motor memories, consistent with results reported in a procedural learning paradigm.
We conclude that training under rewarded conditions is more effective than training under punished or neutral conditions in inducing long-term retention of newly learned memories. Understanding the learning stages influenced by reward, driven through reduction in degradation of the fresh memory and induction of persistent offline memory gains, may influence the design of practice protocols in education as well as the treatment of memory disorders and rehabilitation of function after brain lesions.
Forty-one young adults (24.3 ± 5.2 years, mean ± S.D., 18 females) were enrolled in this study. All subjects were recruited at the laboratory of the Human Cortical Physiology and Stroke Neurorehabilitation Section, NINDS, National Institutes of Health. All participants were right-handed as assessed by the Edinburgh Handedness Inventory, had no abnormal physical or neurological findings, had no past history of neurological or psychiatric diseases, and did not take chronic medications. All subjects gave written informed consent to participate in the study before the experiment. The study was approved by the Ethics Committee of the National Institute of Neurological Disorders and Stroke. We excluded three subjects from the analysis because their baseline performances were two standard deviations beyond the mean baseline performance of all subjects. Thus, data from 38 subjects were used for analysis (rewarded training, n = 13; punished training, n = 12; control training, n = 13).
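The baseline exclusion rule described above can be sketched as a simple filter. This is an illustration only: the use of the sample standard deviation, the two-sided criterion, the function name and the example values are all assumptions, since the text specifies only "two standard deviations beyond the mean baseline performance of all subjects".

```python
from statistics import mean, stdev

def exclude_baseline_outliers(baseline, n_sd=2.0):
    """Split subjects into kept and excluded by baseline mean error.

    `baseline` maps a subject ID to that subject's baseline mean error.
    A subject is excluded when the value lies more than `n_sd` sample
    standard deviations from the mean of all subjects (assumed rule).
    """
    values = list(baseline.values())
    m = mean(values)
    sd = stdev(values)
    kept = {s: v for s, v in baseline.items() if abs(v - m) <= n_sd * sd}
    excluded = sorted(set(baseline) - set(kept))
    return kept, excluded

# Invented baseline errors for nine hypothetical subjects.
baseline = {"s01": 29, "s02": 30, "s03": 31, "s04": 30, "s05": 29,
            "s06": 31, "s07": 30, "s08": 30, "s09": 60}
kept, excluded = exclude_baseline_outliers(baseline)
print(excluded)
```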
Seated subjects pinched a force transducer between the right thumb pad and lateral middle phalanx of the index finger (Fig. 1a), which controlled the vertical movements of a red cursor (0.6 cm²). Subjects were asked to modulate their pinch force to keep the red cursor in the blue target (1.5 cm²). The blue target moved in a sequential pattern along a single vertical axis for 9 s during each trial (Fig. 1b). The force required to reach the target increased logarithmically with the vertical displacement. Error was defined as the vertical distance between the edges of the blue target and the red cursor at each sampled time point, as shown in Fig 1c.
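To make the two outcome measures concrete, the following sketch computes a per-trial mean error and a time on target from sampled vertical positions. Several details are assumptions rather than the authors' implementation: error is taken as the gap between the nearest edges of cursor and target and is zero whenever the two overlap, "on target" is taken to mean a zero gap, and the heights, sampling interval and positions in the example are invented (the text gives areas, not heights, and no sampling rate).

```python
def edge_gap(cursor_y, target_y, cursor_h, target_h):
    """Vertical gap between the nearest edges; 0.0 when the shapes overlap."""
    return max(0.0, abs(cursor_y - target_y) - (cursor_h + target_h) / 2.0)

def trial_metrics(cursor, target, cursor_h, target_h, dt):
    """Return (mean error, time on target) for one trial.

    `cursor` and `target` are equally long lists of vertical centre
    positions, one per sample; `dt` is the sampling interval in seconds.
    """
    gaps = [edge_gap(c, t, cursor_h, target_h) for c, t in zip(cursor, target)]
    mean_error = sum(gaps) / len(gaps)
    time_on_target = dt * sum(1 for g in gaps if g == 0.0)
    return mean_error, time_on_target

# Four invented samples: the cursor lags the target on the third sample.
mean_error, time_on_target = trial_metrics(
    cursor=[0.0, 1.0, 2.0, 5.0],
    target=[0.0, 1.0, 4.0, 5.0],
    cursor_h=0.5, target_h=1.5, dt=0.5,
)
print(mean_error, time_on_target)
```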
Subjects were randomly allocated to the rewarded, punished, or neutral control training groups (Fig. 2). Each group practiced the same task over one session (80 trials total). After the end of every trial, the red cursor and the blue box disappeared for 0.5 s. All subjects then received visual feedback for 1 s specific to their training group: monetary reward, monetary punishment, or neutral information (Fig. 2). The range of the positive or negative monetary outcomes was +$0.00 to +$0.80 or −$0.80 to $0.00 per trial, respectively. The neutral information in the control group consisted of a sequence of characters (“#####”). Unbeknownst to the subjects, all groups ultimately earned a comparable amount of money (see main text). Monetary reward or punishment depended on the amount of time during which the subjects kept the red cursor in the blue target per trial, a measurement tightly correlated with mean error (S4).
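The feedback scheme can be illustrated with a short sketch. The per-trial range of $0.80, the 9 s trial duration, the 80 trials and the starting balances ($0 rewarded, $72 punished, flat $40 neutral) come from the text; the linear mapping from time on target to money is an assumption, because the exact payoff function is not given, and all function and constant names are hypothetical.

```python
TRIAL_S = 9.0          # trial duration in seconds (from the text)
MAX_PER_TRIAL = 0.80   # largest gain or loss per trial in dollars (from the text)
START = {"rewarded": 0.0, "punished": 72.0}  # starting balances (from the text)

def trial_outcome(group, time_on_target_s):
    """Money gained (+) or lost (-) on one trial, assuming a linear mapping."""
    frac_on = min(max(time_on_target_s / TRIAL_S, 0.0), 1.0)
    if group == "rewarded":
        return MAX_PER_TRIAL * frac_on            # earn for time on target
    if group == "punished":
        return -MAX_PER_TRIAL * (1.0 - frac_on)   # lose for time off target
    return 0.0                                    # neutral: no contingent money

def session_balance(group, times_on_target_s):
    """Balance after a training session for the given group."""
    if group == "neutral":
        return 40.0  # flat payment irrespective of performance
    return START[group] + sum(trial_outcome(group, t) for t in times_on_target_s)

# A hypothetical subject who is on target for half of every one of 80 trials.
half = [4.5] * 80
print(round(session_balance("rewarded", half), 2),
      round(session_balance("punished", half), 2))
```

Under this assumed linear mapping, a subject on target for half of each trial ends near $32 in the rewarded group and near $40 in the punished group, so the published starting balances alone would not equalize final payouts at every performance level; the text states only that the values were chosen from preliminary data to make the final amounts comparable.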
For all outcome measures, the assumptions of a normal distribution (Shapiro-Wilk test of normality) and homogeneity of variance (Mauchly’s Sphericity Test) were verified. Multiple pairwise comparisons with Bonferroni adjustments were performed to compare delta (mean error change) across groups at each delta time point. A repeated measures mixed-model ANOVA with factors GROUP (rewarded/punished/neutral) and TIME (baseline/immediate/6 hours/24 hours/30 days) on mean error was performed. Mean error was also compared with Bonferroni adjustments across groups at each level of time (baseline, immediate, or 30 days post-training) and compared between the immediate and the other post-training time points within each group. The time series of single-trial error during training was modeled by a single exponential decay function: Error (t) = a*exp(−b*t) + c, where t indicates the trial number. The decay parameter (term b) for each individual was calculated and then possible correlations between these decay parameters and retention at 30 days were tested for each group and for all subjects lumped together. Secondarily, time on target (see Testing paradigm) was also computed during testing and then analyzed with a repeated measures mixed-model ANOVA, followed by multiple pairwise comparisons with Bonferroni adjustments, with the same factorial design and post-hoc comparisons across groups as those used for mean error. All the analyses were done in SPSS 17.0 (SPSS Inc., Chicago, IL). Significance level was set to p < 0.05. All data are reported as mean ± SEM.
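As an illustration of the exponential model Error (t) = a*exp(−b*t) + c, the sketch below estimates a, b and c for one subject's training curve. It is not the authors' procedure (the analyses were run in SPSS): for each candidate decay rate b on a log-spaced grid it solves the ordinary least-squares problem for a and c, which is linear once b is fixed, and keeps the b with the smallest residual. The grid bounds, step count, function names and the synthetic noise-free data are assumptions made for this example.

```python
import math

def fit_linear_part(ts, ys, b):
    """For a fixed b, least-squares a and c in y = a*exp(-b*t) + c."""
    xs = [math.exp(-b * t) for t in ts]
    n = len(ts)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = my - a * mx
    sse = sum((a * x + c - y) ** 2 for x, y in zip(xs, ys))
    return a, c, sse

def fit_exponential(ts, ys, b_min=1e-3, b_max=1.0, steps=400):
    """Grid search over b (log-spaced); returns (a, b, c) with the lowest SSE."""
    best = None
    for i in range(steps + 1):
        b = b_min * (b_max / b_min) ** (i / steps)
        a, c, sse = fit_linear_part(ts, ys, b)
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best[0], best[1], best[2]

# Synthetic 80-trial training curve with known parameters (not study data).
trials = list(range(1, 81))
errors = [20.0 * math.exp(-0.05 * t) + 10.0 for t in trials]
a_hat, b_hat, c_hat = fit_exponential(trials, errors)
print(round(a_hat, 2), round(b_hat, 3), round(c_hat, 2))
```

The recovered b is only as precise as the grid spacing; a dedicated nonlinear least-squares routine would normally be preferred for real data, and the per-subject b values would then be correlated with retention at 30 days as described above.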
This work was supported by the Intramural Research Program of the National Institute of Neurological Disorders and Stroke (NINDS, NIH). We thank Drs Toyomi Abe, Nitzan Censor, Mark Hallett, Eran Dayan, Ethan Buch, John Krakauer and Pablo Celnik for critical comments on earlier versions of this manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.