Sequential reward-seeking actions are readily learned despite the temporal gap between the earliest (distal) action in the sequence and the reward delivery. Fast dopamine signaling is hypothesized to mediate this form of learning by reporting errors in reward prediction. However, such a role for dopamine release in voluntarily initiated action sequences remains to be demonstrated.
Using fast-scan cyclic voltammetry we monitored phasic mesolimbic dopamine release, in real-time, as rats performed a self-initiated sequence of lever presses to earn sucrose rewards. Prior to testing rats received either 0 (n=11), 5 (n=11) or 10 (n=8) days of action sequence training.
For rats acquiring the action sequence task at test, dopamine release was strongly elicited by response-contingent (but unexpected) rewards. With learning, a significant elevation in dopamine release preceded performance of the proximal action, and subsequently came to precede the distal action. This pre-distal dopamine release response was also observed in rats previously trained on the action sequence task, and the amplitude of this signal predicted the latency with which rats completed the action sequence. Importantly, the dopamine response to contingent reward delivery was not observed in rats given extensive pre-training. Pharmacological analysis confirmed that task performance was dopamine dependent.
These data suggest that phasic mesolimbic dopamine release mediates the influence that rewards exert over the performance of self-paced, sequentially-organized behavior and shed light on how dopamine signaling abnormalities may contribute to disorders of behavioral control.
A long-standing challenge in behavioral science, termed the distal reward (1) or credit assignment problem (2), has been to explain how action sequences are learned, given that one initiates a sequence by performing an action that never directly earns reward. There is growing evidence that dopamine is involved in mediating this influence of rewards over distal events. It is imperative to understand the role of dopamine in action sequence learning, and in the control of self-initiated behavior more generally, since these actions appear to be particularly disrupted by disorders involving alterations in striatal dopaminergic transmission, such as Parkinson’s disease (3) and addiction (4-6).
Studies recording dopamine cell activity have shown that reward presentation elicits a firing burst that shifts to reward-predictive cues (7-9), or discriminative stimuli (10, 11) with training (12). Fast-scan cyclic voltammetry studies have found that phasic mesolimbic dopamine release also backpropagates from reward to reward-predictive cues during Pavlovian learning (13, 14) or to a discriminative stimulus during discrete-trial instrumental tasks (15, 16). In paradigms involving a chain of reward-predictive cues dopamine cell activity also shifts from proximal to distal cues (17, 18). Such findings support the hypothesis that dopamine reports reward prediction errors, i.e., discrepancies between the observed and expected values of rewards and cues (19, 20). This signal is considered important for the acquisition of complex reward-seeking behaviors (21, 22). Interestingly, dopamine receptor blockade has recently been shown to reduce response likelihood during a self-initiated reward-seeking action sequence (23) and accumbal dopamine depletions disrupt action performance when many actions are required to obtain reward (24), which is consistent with a large literature implicating dopamine in incentive motivation (25, 26), the process that allows reward-predictive cues to invigorate reward-seeking actions. Such findings raise the possibility that task-related dopamine signaling also exerts a motivational influence over reward-seeking actions.
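In its simplest form, this hypothesis can be stated as a worked equation (a standard textbook formulation given here for clarity, not one taken from the cited studies): the prediction error on a given trial is the difference between the reward actually received and the reward expected,

```latex
\delta = r_{\mathrm{received}} - r_{\mathrm{expected}}
```

so that an unexpected reward produces a large positive \(\delta\), a fully predicted reward produces \(\delta \approx 0\), and an omitted expected reward produces a negative \(\delta\).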
Although the simplifying assumptions common to reinforcement learning theory make it difficult to generate predictions about the characteristics of reward prediction error (and phasic dopamine) signaling in self-paced, free-operant instrumental tasks, there are limited data suggesting that in such situations a similar pattern of dopamine responses emerges with learning. Indeed, both dopamine neuron firing (27, 28) and phasic mesolimbic dopamine release (29, 30) precede well-learned self-initiated actions, consistent with a backpropagating reward prediction error signal. However, studies of dopamine release during free-operant learning have typically used cocaine reward (29, 30), which may not generate normal prediction error signals (31), given that it elicits dopamine when predicted (29) and over time undoubtedly alters dopaminergic function (32). Whether phasic mesolimbic dopamine release shows the properties of a prediction error signal under naturalistic free-operant conditions therefore remains unanswered. Moreover, given the prominent role that modern reinforcement learning models attribute to phasic dopamine in action sequence learning and a reward’s ability to influence distal events, it is surprising how little is known about the characteristics of dopamine release in such situations. Uncovering this information is vital if modern reinforcement learning concepts are to be used to explain the neural mechanisms of normal and aberrant decision-making.
We used fast-scan cyclic voltammetry to provide an initial characterization of phasic dopamine release in the ventral striatum/nucleus accumbens of rats performing an unsignaled, self-paced multi-action sequence task for sucrose reward. Given the combined findings that phasic dopamine activity backpropagates from reward through a series of Pavlovian cues (17, 18) and can come to precede first-order self-initiated instrumental actions for cocaine reward (29, 30, 33), we assessed whether phasic dopamine release shifted with learning from the reward itself to increasingly more distal events. Given the large body of evidence implicating mesolimbic dopamine in incentive motivation (25, 34, 35), we also tested the relationship between task-related dopamine responses and action sequence performance.
Male Sprague Dawley rats (n=46, Charles River Laboratories, Wilmington, MA) served as the subjects for these experiments. For the fast-scan cyclic voltammetry experiments rats were trained on a sequence of lever-pressing actions to earn sucrose pellet rewards (Bioserv, Frenchtown, NJ). Briefly, the behavioral paradigm (see Supplement) required rats to perform a fixed sequence of two different lever press actions to earn sucrose pellets, such that one action was temporally distal and the other temporally proximal to reward delivery. The distal lever was continuously available and, when pressed, resulted in the insertion of the proximal lever into the chamber. Pressing the proximal lever resulted in the delivery of a pellet and caused that lever to be retracted. Importantly, this task was self-paced, in that the subject could control both the initiation of each sequence and the speed with which the sequence of actions was performed (i.e., trial latency). Prior to testing rats received either 0 (n=11), 5 (n=11) or 10 (n=8) days of action sequence training. During the test, phasic dopamine concentration changes in the ventral portion of the neostriatum/nucleus accumbens, a region previously implicated in reward-motivated behavior (36-38), were monitored with fast-scan cyclic voltammetry (see Supplement) while rats earned a total of 30 sucrose pellet rewards through their action sequence performance. All recording sites were verified with histological procedures (Figure 1).
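The contingency structure of this paradigm can be sketched as a minimal state machine. This is an illustration only; the class and method names below are hypothetical, and the actual task was implemented on operant-chamber hardware rather than in code.

```python
class SequenceTask:
    """Minimal sketch of the self-paced two-lever sequence task.

    Hypothetical illustration (names invented, not the authors' code).
    The distal lever is always available; pressing it inserts the
    proximal lever. Pressing the proximal lever delivers one sucrose
    pellet and retracts that lever.
    """

    def __init__(self, max_rewards=30):
        self.max_rewards = max_rewards
        self.rewards_earned = 0
        self.proximal_inserted = False

    def press_distal(self):
        # A distal press never earns reward directly; it only makes the
        # reward-earning proximal lever available.
        self.proximal_inserted = True
        return "proximal lever inserted"

    def press_proximal(self):
        if not self.proximal_inserted:
            return "no lever present"   # proximal lever retracted between trials
        self.proximal_inserted = False  # lever retracts after the press
        self.rewards_earned += 1
        return "pellet delivered"

    def session_over(self):
        # The recording session ended after 30 earned rewards.
        return self.rewards_earned >= self.max_rewards
```

As in the task itself, the distal press in this sketch never delivers reward directly, which is what creates the credit assignment problem described in the introduction.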
In a separate set of rats we examined the effects of flupenthixol (Sigma Aldrich, St. Louis, MO) on sequence performance after either 0, 5 or 10 days of action sequence training. Flupenthixol (0.5mg/kg/ml i.p.) or saline control was administered 1h prior to a test in which rats were allowed to respond on the action sequence to earn up to 30 sucrose pellet rewards.
For full methodological details see the Supplement.
Using several measures of task performance (distal and proximal action rate, task efficiency and the total time to complete the sequence) we found that rats given 5 or 10 days of training displayed similar levels of performance and that task performance was significantly better in these groups relative to rats that did not receive any pre-test sequence training. Analysis of rats’ distal lever press rate (Figure 2A) revealed a main effect of training (F2,27=3.32, p=0.05), indicating that distal action rate was higher in groups with more training. Similarly, there was a significant increase in proximal action rate across training groups (F2,27=5.88, p=0.007; Figure 2B). Importantly, the proximal/distal action ratio, a measure of task efficiency, also showed a significant improvement across training groups (F2,27=38.24, p<0.0001; Figure 2B-inset). Dunnett’s post hoc analyses of these data showed that task efficiency was significantly better in the 5-day training group (p<0.001) relative to the 0-day group, and that no further improvement occurred after 10 days of sequence training (5-day v. 10-day p>0.05). Not only did training result in an increase in task efficiency, there was also an effect of training on the average time it took rats to complete each sequence (F2,27=11.23, p=0.0003; Figure 2C). Post hoc analyses revealed that the average action sequence time was significantly longer in rats without sequence pre-training relative to rats trained for 5 days on the sequence prior to test (p<0.001), while rats trained for 10 days did not perform differently than those trained for 5 (p>0.05).
As mentioned above, mesolimbic dopamine release is hypothesized to backpropagate from reward to more distal elements in a sequence with learning. Our results appear to be consistent with this idea. Figure 4 shows the dopamine concentration change averaged across rats in the 5s prior to and after the distal (4A) and proximal (4B) lever press at the beginning (representative example shown in Figure 3, left panel), middle and end (representative example in Figure 3, second panel) of the acquisition session for rats in the 0-day training group. Early in training, when reward delivery was relatively unexpected, mesolimbic dopamine release peaked following reward delivery (which occurred immediately following the proximal lever press, marked “Reward” in Figure 4B). Although rats had no training on the full action sequence prior to test, they had received brief training on the proximal lever-reward contingency and as such the proximal lever insertion likely served as a reward-predictive cue and elicited a small dopamine response. In the middle of the acquisition session the maximal dopamine signal amplitude occurred earlier, coincident with the proximal press. By the end of the session, when the rats had presumably come to expect response-contingent reward delivery, there was a dopamine increase prior to the distal action.
Figure 4C presents the peak dopamine concentration change for each event in the sequence (preceding the distal action, following the lever insertion cue, preceding the proximal action, and following reward delivery), averaged within each session phase (the first 5, middle 5, or last 5 trials of the acquisition session) and then averaged across subjects. Analysis of these data revealed no significant effect of event (F3,30=1.30, p=0.29), but did show a significant session phase effect (F2,20=8.43, p=0.002), as well as a significant phase by event interaction (F6,60=13.08, p=0.01), demonstrating that the dopamine release in response to each task event was differentially altered by action sequence learning. Indeed, individual analysis of the dopamine signal associated with each event in the beginning of the session confirmed a significant effect of event (F3,30=5.18, p=0.005), with post hoc analysis revealing that the dopamine response to reward delivery was significantly greater than that preceding the distal action (p<0.01), that to the lever cue (p<0.05) and that preceding the proximal action (p<0.01). This was not the case in the middle and end of the session; there was only a marginally nonsignificant effect of event on dopamine peak concentration in the middle of the session (F3,30=2.6, p=0.07) and no effect of event at the end of the acquisition session (F3,30=0.66, p=0.58). Importantly, analysis of the dopamine response to each event individually in the beginning, middle and end of the acquisition session supports the notion that task-related dopamine signaling was modulated by training. There was a significant effect of time on the peak dopamine concentration change prior to the proximal response (F3,30=6.53, p=0.007), with post hoc analysis confirming that this response was significantly increased in both the middle (p<0.05) and end (p<0.01) of the session relative to the beginning of the session.
Similarly, there was a significant effect of time on the pre-distal dopamine response (F3,30=4.62, p=0.02); however, post hoc analyses here showed that, relative to the beginning, the amplitude of this response was increased only at the end of the session (p<0.05). Thus, it appears that for rats learning to perform a new sequence of actions, the mesolimbic dopamine response increased first during the period immediately before proximal action performance, and then during the period just before distal action performance, consistent with a backpropagating, reward-prediction error profile. Importantly, rather than dopamine being solely elicited by overt cues or events, it also came to precede the rats’ initiation of the action sequence, which is notable for a self-paced task in which rats’ reward seeking was voluntary.
This shift in dopamine to more distal elements, observed within the group acquiring the action sequence task, appeared to be followed by longer-term changes in task-related dopamine signaling, which were apparent when assessed across groups of rats with differing sequence training levels. Figures 5A and 5B show the dopamine concentration change in the 5s prior to and after each distal (5A) and proximal (5B) lever press averaged across the 30-trial session for each rat and then averaged across rats for subjects in the 0-, 5-, or 10-day training groups. As is clear from this figure and the representative examples shown in Figure 3, mesolimbic dopamine was elevated both prior to and after the distal and proximal actions in all 3 groups. However, the amplitude and pattern of these dopamine concentration changes differed across training groups; the phasic dopamine signal was more prominent in the end stages of the sequence (proximal lever press and following reward delivery) in rats acquiring the sequence at test (0-day group), and became preferentially associated with more distal elements of the sequence in extensively trained rats (10-day group).
Statistical analysis of the peak dopamine release associated with each event in the sequence (Figure 5C) further supports the notion that dopamine levels surrounding the major events within the sequence critically depended on the extent of training prior to testing (event x training group interaction: F6,81=46.42, p=0.005). Dopamine levels were significantly greater during the reward delivery compared to the pre-distal press period (p<0.05) for rats without sequence pre-training, whereas no such difference was observed in rats given 5 days of training prior to testing (p>0.05). Rats given extensive pre-training (10-day group) showed the opposite effect, exhibiting a larger dopamine response before performing the distal action than to reward delivery (p<0.05). There were no main effects of either event (F3,81=15.02, p=0.36), or training group (F2,81=1.29, p=0.29).
Rats were intermittently given non-contingent sucrose pellets before each test to compare dopamine responses to these unexpected rewards with those earned by lever pressing. This analysis (Figure 6) revealed a main effect of expectancy (F1,27=10.83, p=0.002), indicating that the dopamine response to the earned, and therefore expected, reward was smaller than that to unexpected reward delivery. Although there was no training group effect (F2,27=1.56, p=0.23) or interaction between training group and expectancy (F2,27=0.60, p=0.55), Bonferroni post hoc analysis revealed that the effect of expectancy was significant only in the 10-day training group (p<0.05), i.e., only in the most well-trained rats did earned reward elicit significantly less dopamine release than unexpected reward delivery. Importantly, no group differences were observed in the magnitude of the dopamine response to unexpected reward (F2,29=0.42, p=0.66), demonstrating that the group differences described above were specific to predictable rewards and depended on rats’ training history rather than other potential differences across groups (e.g., electrode sensitivity).
Taken together, these data show that the rapid, within-session modulation of task-related dopamine signaling observed in rats learning to perform the action sequence continues to occur over sessions, as animals become proficient in the task. Moreover, there appears to be an interim learning phase in which phasic dopamine is elevated to each event in the sequence (5-day group) prior to transitioning from the reward to more distal elements (10-day group).
Importantly, as can be seen in Figure 5A, the dopamine peak preceding the distal action is tightly time-locked to the actual distal lever press in both the 0- and 5-day training groups (see tick marks, Figure 5A-top and -middle). In both these groups, dopamine levels peaked between 3.8 and 0s prior to the distal lever press and for most rats the peak occurred within 1s prior to the press. The average time between these dopamine peaks and the distal lever press was not significantly different between the 0- and 5-day training groups (t20=0.31, p=0.76). In the group receiving extended training (10 days) on the action sequence, time-locking of the dopamine peak preceding the distal action was more variable across rats (ranging from 4.9 to 0.3s prior to the press, see tick marks, Figure 5A-bottom) and occurred much earlier, on average, than it did for the 5-day group that had less training but showed comparable behavioral performance (t17=2.17, p=0.04). This pattern explains the apparent slow rise in average dopamine levels prior to the distal action in the 10-day group (Figure 5A). Rather than reflecting a consistent pattern across subjects, the averaged data obscure individual differences in pre-response dopamine signaling. Thus, it appears that the temporal relationship between phasic dopamine transients and initiation of sequence performance became decoupled for rats receiving extensive training.
Our finding that mesolimbic dopamine release actually came to precede the rats’ initiation of the action sequence suggests that dopamine release may not simply be a response to overt task cues, but may also reflect a motivational component of task performance. Indeed, not only was there an evolution of phasic mesolimbic dopamine release prior to action sequence performance (i.e., before the distal press), the magnitude of this effect predicted task performance. Figure 7A presents the concentration of the dopamine peak prior to the distal action for each of the 30 total events (shaded to reflect 5-trial bins) relative to the average amount of time it took to complete the sequence, averaged across rats in the 0-day training group. The magnitude of the dopamine peak preceding each distal action was significantly negatively correlated with action sequence time (R30= -0.49, p=0.006). A similar relationship was also apparent between subjects across all three training groups (Figure 7B). Statistical analysis of these data, controlling for training group, also revealed a significant negative correlation (R27= -0.44, p=0.02). Thus, it appeared that the more dopamine released prior to initiation of the action sequence, the quicker the rat completed the sequence. This finding suggests that phasic dopamine is not solely serving as a prediction error signal during action sequence learning, but may also be related to the influence of incentive motivation on task performance. Interestingly, there were no significant correlations between the dopamine release amplitude to any of the action sequence elements and the rats’ response rate on either lever (Table S1 in the Supplement), indicating that the rats’ overt behavioral output level was not associated with phasic mesolimbic dopamine activity and is, in this sense, dissociable from their action sequence performance.
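The between-subjects correlation reported above was computed controlling for training group. One standard way to do this is to remove each group's mean from both variables and then correlate the residuals. The sketch below (pure Python, with made-up numbers purely to illustrate the logic; this is not the study's actual analysis code) shows why such a control matters: group means can drive a raw correlation in one direction while the within-group relationship runs the other way.

```python
def pearson_r(x, y):
    """Plain Pearson correlation coefficient (no dependencies)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def group_demean(values, groups):
    """Subtract each group's mean from its members (residualize on group)."""
    means = {}
    for g in set(groups):
        members = [v for v, gg in zip(values, groups) if gg == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for v, g in zip(values, groups)]

def partial_r(x, y, groups):
    """Correlation between x and y after removing group-mean differences,
    analogous to 'controlling for training group' in the text."""
    return pearson_r(group_demean(x, groups), group_demean(y, groups))

# Made-up illustration: a dopamine-peak variable and a latency variable
# for two hypothetical groups whose means happen to covary.
groups = [0, 0, 0, 1, 1, 1]
peak = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
latency = [3.0, 2.0, 1.0, 13.0, 12.0, 11.0]
# Raw r is positive (driven by group means); partial r is -1.0,
# the perfect within-group negative relation.
```

With these invented numbers, the raw correlation is positive only because the two group means covary, while the partial correlation recovers the negative within-group relationship, which is the kind of effect the analysis in the text is designed to isolate.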
To further explore the relationship between dopamine signaling and self-initiated action sequence performance, we conducted an experiment in a separate group of animals to assess the sensitivity of the current task to dopamine receptor blockade. Naive rats were given 10 days of training on the action sequence. Rats were pretreated with the non-specific dopamine receptor antagonist, flupenthixol (0.5mg/kg i.p.) or vehicle, prior to the 1st, 5th or 10th day of action sequence training. As Figure 8A demonstrates, we found that flupenthixol administered at test altered the time it took rats to complete the action sequence; there was a main effect of both drug (F1,28=8.58, p=0.01) and training (F2,28=11.24, p=0.0003), as well as an interaction between these factors (F2,28=7.83, p=0.002). Post hoc analysis found that rats treated with flupenthixol took significantly longer to complete each action sequence, on average, relative to vehicle-treated controls when administered on the initial day of action sequence learning (p<0.001), but not when administered after 5 or 10 days of training (p>0.05). Thus, this aspect of action sequence performance appears to be dopamine-dependent only during initial acquisition of the task, presumably because the underlying learning process requires dopamine signaling. However, flupenthixol also had a more persistent task performance effect, reducing the rate of responding on both levers irrespective of training (Figure 8B and C). For both lever-pressing actions there was a significant main effect of training (distal: F2,28=13.52, p<0.0001; proximal: F2,28=22.38, p<0.0001), as well as a main effect of drug (distal: F1,28=30.66, p<0.0001; proximal: F1,28=18.46, p=0.0007), with no significant interaction (distal: F2,28=1.50, p=0.24; proximal: F2,28=1.43, p=0.26), indicating that, with training, rats pressed faster on both levers and that flupenthixol attenuated press rate in a manner that did not depend on training history.
Taken together these results confirm that dopamine signaling plays a critical role in the acquisition and performance of self-initiated sequential actions.
This study characterized the pattern of phasic mesolimbic dopamine release during the acquisition and performance of a self-paced two-action sequencing task in rats. We found that dopamine release shifted from the reward to more distal elements of the sequence, a pattern detected both within-subjects, in rats acquiring the action sequence for the first time, and across groups of rats given varying amounts of pre-training on the task. Moreover, we found that the concentration of the dopamine transient preceding the initiation of the action sequence predicted the speed with which rats completed the sequence.
These results are generally consistent with the findings of previous studies showing that phasic mesolimbic dopamine signaling transitions from reward delivery to reward-predictive cues during passive Pavlovian learning (7, 13, 14, 39) and discrete-trial, discriminative stimulus-controlled instrumental tasks (10, 11, 15, 40). Our current findings significantly extend this work by demonstrating that phasic dopamine release shifts from reward delivery to precede a self-initiated free-operant instrumental action. In this respect our results are consistent with the few studies that have examined phasic mesolimbic dopamine release during free-operant behavior for drug reward (29, 30, 41) and importantly show that phasic mesolimbic dopamine signaling comes to precede a self-initiated action under naturalistic conditions in which the dopamine system is not pharmacologically altered (42, 43). Importantly, unlike these studies with drug reward, we show that the phasic dopamine signal to earned food reward diminishes with training. Moreover, our results add to previous studies to suggest that, with learning, the initiation of a self-paced action sequence can be preceded by phasic mesolimbic dopamine release and that performance of an action that has never directly earned reward can be accompanied by phasic dopamine signaling. Interestingly, we show that dopamine release does not transition immediately in our self-paced action sequence task from the reward to more distal sequence elements, but rather that there is an interim learning phase in which phasic dopamine is elevated to each event in the sequence, including the reward delivery. This result is consistent with electrophysiological recordings of midbrain dopamine neurons during cue-reward pairings (39).
Temporal difference models of reinforcement learning assume that learning is regulated by a reward prediction error signal (21, 22, 44-46), and there is now considerable evidence that this reward prediction error is mediated by phasic dopamine (7, 19). However, using a temporal difference algorithm to model free-operant conditioning is challenging due to the lack of unambiguous rules for defining model states. This clearly applies to the current task, since our rats were allowed to decide when to initiate the sequence. The main features of our data are nevertheless in line with the general themes of such models. We show that rewards earned by the performance of well-established self-initiated actions elicit significantly smaller phasic mesolimbic dopamine responses than unexpected reward deliveries. Whereas training attenuated the dopamine response to response-contingent rewards, it led to an increase in phasic dopamine release during the period before the distal lever press. This increase was apparent within a single learning session. While not a response to any overt cue, this dopamine response may have been elicited by environmental or internal cues unappreciated by the experimenter.
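The backward shift of the dopamine response is the signature behavior of a temporal difference update. In the standard textbook formulation (a sketch for clarity, not the authors' model), each state's value estimate is nudged by the prediction error generated when that state is visited:

```latex
\delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

Here \(r_t\) is the reward received, \(\gamma\) a discount factor, and \(\alpha\) a learning rate. Once the state preceding the proximal press acquires value, transitions into it from the distal state generate a positive \(\delta_t\), so value, and with it the predicted phasic dopamine response, propagates backward from reward delivery toward the distal action over repeated trials.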
In addition to the view that dopamine mediates reinforcement learning, a second popular hypothesis assumes that dopamine is responsible for mediating the incentive motivation that allows reward-paired cues to invigorate reward-seeking actions (25, 47-49). There have been several recent attempts to integrate the concept that dopamine mediates incentive motivation into the reinforcement learning framework (50-52). For example, McClure et al. (2003) posit that phasic dopamine’s role in mediating the direct incentive motivation effects of reward-predictive cues on action selection is dissociable from its role in reporting the reward prediction errors that support reinforcement learning (22, 44, 45), or action chunking (53, 54). While not providing a critical test of such theories, our finding that the transient dopamine release amplitude preceding the initiation of the action sequence predicts the speed with which that sequence will be completed is generally consistent with a role for phasic mesolimbic dopamine signaling in incentive motivation. Our pharmacological data support this correlational finding by showing that dopamine receptor blockade attenuates both action sequence learning and performance. These effects of dopamine receptor antagonism on action sequence performance are consistent with findings from nonhuman primate studies showing that dopamine transmission is preferentially involved during the early stages of action sequence learning (53, 54), when discrete actions are being integrated into sequence-level action chunks (55, 56), and with literature implicating dopamine in incentive motivation (25, 34, 35).
Taken together these data suggest that phasic mesolimbic dopamine release reflects the properties of a prediction error signal during the acquisition and performance of a self-paced sequence of actions and that such release is also associated with the incentive motivational properties of rewards and reward-paired cues, potentially providing a mechanism by which rewards come to exert influence over temporally distal actions.
This research was supported by grants DA09359 and DA05010 from NIDA to N.T.M., grant T32 DA024635 from NIDA and Hatos scholarship to K.M.W. and grant DA029035 to S.B.O. The authors would like to thank Katie McNutt for research assistance.
Financial Disclosures: All authors report no biomedical financial interests or potential conflicts of interest.