Reinforcement and punishment are fundamental processes that shape animal learning. Reinforcement maintains or increases, while punishment decreases, the probability of specific behavior
1,2. Dysfunction in these processes contributes to many psychiatric disorders. For example, addiction is characterized by heightened reinforcement from drug-paired stimuli, coupled with impaired punishment from negative consequences
3. In contrast, depression is marked by impaired reinforcement from positive stimuli, and heightened punishment from negative stimuli
4. While the striatum is implicated in both reinforcement and punishment, the specific roles of the two populations of striatal projection neurons are not well understood. Here, we tested the hypothesis that D1-expressing direct pathway medium spiny neurons (dMSNs) mediate reinforcement, while D2-expressing indirect pathway neurons (iMSNs) mediate punishment
5–9.
To selectively activate dMSNs or iMSNs
in vivo, we expressed channelrhodopsin-2 (ChR2) in the dorsomedial striatum using a Cre-dependent viral strategy
10. We first characterized the effects of ChR2 stimulation on dMSNs in awake, behaving mice with
in vivo electrophysiology, using microwire arrays that included an integrated optical fiber (
Fig. S1a, b). Each of 48 recorded neurons (n=3 mice) was illuminated at four laser intensities (0.1 mW, 0.3 mW, 1 mW, and 3 mW, 1 s constant illumination). We concluded that neurons expressed ChR2 (and were therefore dMSNs) if they exhibited a significant increase in firing within 10 ms of laser onset at any laser power (
Fig. S1 c-g). Spiking data from recorded neurons in this experiment are available for download at
http://uri.neuinfo.org/nif/nifstd/nlx_144028. Overall, 19 (40%) neurons were identified as dMSNs. Importantly, there were no significant differences between waveform characteristics or sorting quality of recording channels that contained ChR2-positive MSNs vs. those that did not (
Table S1). Both the average firing rate and the total number of ChR2-responsive dMSNs increased with laser intensity, indicating that higher intensities produced greater MSN activation (
Fig. S1 h-j).
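The identification criterion above (a significant increase in firing within 10 ms of laser onset, at any laser power) can be illustrated with a minimal sketch. The statistical test used in the study is not specified in this text, so the one-sided exact sign test below is an assumption chosen for simplicity. The function names, the 10 ms pre-laser baseline window, and the `alpha` threshold are likewise hypothetical, and no correction for testing several laser powers is applied.

```python
from math import comb

def count_in_window(spikes, start, stop):
    """Number of spike times (s) falling in the half-open window [start, stop)."""
    return sum(1 for t in spikes if start <= t < stop)

def sign_test_p(post, pre):
    """One-sided exact sign test: chance of seeing at least this many increases."""
    ups = sum(1 for a, b in zip(post, pre) if a > b)
    downs = sum(1 for a, b in zip(post, pre) if a < b)
    n = ups + downs  # tied trials carry no information and are dropped
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(ups, n + 1)) / 2 ** n

def is_laser_responsive(trials, window=0.010, alpha=0.05):
    """trials: list of spike-time lists, each aligned so that laser onset is 0 s.

    Assumption: baseline is the window of equal length just before onset.
    """
    post = [count_in_window(s, 0.0, window) for s in trials]
    pre = [count_in_window(s, -window, 0.0) for s in trials]
    return sign_test_p(post, pre) < alpha

def is_chr2_positive(trials_by_power):
    """trials_by_power: dict mapping laser power (mW) to a list of trials."""
    return any(is_laser_responsive(t) for t in trials_by_power.values())
```

With real recordings, a paired rank-based test or a permutation test on the same pre/post counts would be a reasonable substitute for the sign test.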
To investigate reinforcement, mice expressing ChR2 in dMSNs or iMSNs (termed dMSN-ChR2 or iMSN-ChR2 mice, respectively) received bilateral fiber optic implants targeting the dorsomedial striatum (fiber placements for all experiments in the main text are presented in
Fig. S2). The dorsomedial striatum was targeted due to its role in reinforcement and action selection
5,8,9,11. Mice were placed in an operant box that contained two capacitive touch sensors that served as triggers: one that activated a 1 mW laser (1 s constant illumination, delivered bilaterally) and one that was inactive (
Video S1). The capacitive touch sensors were crucial for this experiment because they are much more sensitive than lever-press or nose-poke manipulanda, which allowed us to observe both increments and decrements in responding. We tested three groups of naïve mice in this task: dMSN-ChR2 mice (n=8), iMSN-ChR2 mice (n=8), and control mice that expressed YFP in dMSNs (n=4) or iMSNs (n=4). Data from dMSN-YFP and iMSN-YFP control mice were combined as neither group showed any significant effects (
Fig. S3, Table S2; all p>0.25 when the analyses below were run on each group independently). All groups completed one 30-min training session each day for three consecutive days.
Within the first session, naïve dMSN-ChR2 mice exhibited a significant bias towards the laser-paired trigger whereas iMSN-ChR2 mice exhibited a significant bias away from the laser-paired trigger (,
Table S2, Videos S1 and S2). In contrast to the first 2 min of Day 1, trigger biases for both dMSN-ChR2 and iMSN-ChR2 mice were present within the first 2 min of Day 2, suggesting a learned behavior ( ). However, this effect was weaker in iMSN-ChR2 mice and was no longer significant at the beginning of Day 3 ( ). To further investigate this persistence, mice with at least 3 days of prior training underwent retraining for 30 min, followed by a 30-min extinction session, in which neither trigger elicited a laser pulse. dMSN-ChR2 mice continued to exhibit a significant bias toward the previously laser-paired trigger throughout the entire extinction session, while iMSN-ChR2 mice rapidly lost their behavioral preference (,
Table S2).
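One simple way to express a bias toward or away from the laser-paired trigger is a normalized contact index. This is only an illustrative sketch: the exact bias measure used in the study is not given in this text, and the function name and the convention of returning 0 for a session with no contacts are assumptions.

```python
def trigger_bias(active_contacts, inactive_contacts):
    """Normalized bias in [-1, 1]: positive means more laser-paired contacts.

    Assumption: a session with no contacts at all is scored as 0 (no bias).
    """
    total = active_contacts + inactive_contacts
    if total == 0:
        return 0.0
    return (active_contacts - inactive_contacts) / total
```

Under this convention, a reinforced animal would be expected to score above 0 and a punished animal below 0, and the index can be computed per session or per 2-min block.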
In light of these differences in persistence, we analyzed the time course of reinforcement and punishment following each laser pulse and noted differences on a shorter time scale as well. dMSN-ChR2 mice had a heightened probability of contacting the laser-paired trigger for at least 45 s following a laser pulse, relative to YFP control mice (
Fig. e, f). iMSN-ChR2 mice had a lower probability of contacting the laser-paired trigger in the initial 15 seconds following a laser pulse, but this effect was no longer significant in the period 15–30 seconds following the laser stimulation (
Fig. e, f). These findings are consistent with the diminished cross-day persistence of iMSN-mediated punishment. We considered that the amount of experience dMSN-ChR2 and iMSN-ChR2 mice had with the laser could explain these differences in learning. However, the persistence of trigger preference across sessions was not related to the number of contacts mice had with the laser-paired trigger on the previous day (
Fig. S4).
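The post-pulse time course described above can be sketched as the fraction of laser pulses that are followed by at least one contact with the laser-paired trigger in each successive time bin. How the probability was computed in the study is not stated here, so this definition, the function name, and the default of three 15 s bins are assumptions made for illustration.

```python
def contact_probability(pulse_times, contact_times, bin_s=15.0, n_bins=3):
    """Fraction of pulses followed by >= 1 contact in each post-pulse bin.

    Bin b covers delays in (b * bin_s, (b + 1) * bin_s] seconds, so the
    contact that triggered the pulse itself (delay 0) is not counted.
    """
    if not pulse_times:
        return [0.0] * n_bins
    probs = []
    for b in range(n_bins):
        lo, hi = b * bin_s, (b + 1) * bin_s
        hits = sum(
            1 for p in pulse_times
            if any(lo < c - p <= hi for c in contact_times)
        )
        probs.append(hits / len(pulse_times))
    return probs
```

Comparing these per-bin fractions between ChR2 and YFP control groups would give a time-resolved view of how long the effect of a single pulse lasts.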
As activation of these cell groups can induce motor changes
11, we tested whether motoric changes during the laser pulses might have contributed to our results. For example, dMSN-ChR2 activation might have induced stereotypies that caused multiple contacts during the laser-paired stimulation. Interestingly, however, dMSN-ChR2 stimulation did not produce changes in the animal’s velocity (,
Video S1), a difference from our previous study
11, which may reflect either the shorter duration (1 s vs. 30 s) or the operant nature of this stimulation. iMSN-ChR2 stimulation elicited brief freezing (consistent with our previous findings
11) followed by an aversive-like escape response, evidenced by an increase in velocity following the laser pulse (,
Video S2). However, these brief (<2 second) changes in motor behavior following stimulation are not sufficient to explain the decrease in probability of active trigger contacts that persists for >30 seconds after stimulation ( ).
To test whether the level of dMSN activation correlated with the magnitude of reinforcement, dMSN-ChR2 mice (same cohort as , fiber tip placements shown in
Fig. S2a) were placed in an operant box that used four capacitive touch sensors as operant triggers. A computer detected contacts with these triggers and controlled three lasers, which were calibrated to 0.3 mW, 1 mW, and 3 mW of output power per side (1 s constant illumination, delivered bilaterally,
Fig. S5a). Contacts with an “inactive” trigger were also counted, but had no consequences. dMSN-ChR2 mice preferred higher laser intensities (R² = 0.99, p<0.01, n=8,
Fig. S5b, c), demonstrating that the magnitude of reinforcement was correlated with the level of dMSN activation.
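The reported R² is the coefficient of determination from a fit of preference against laser intensity. The sketch below computes R² for a simple least-squares line. Whether the study fit raw or transformed intensities, and which preference measure it used, is not stated in this text, so the function and the example values in the usage note are purely hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line through (x, y).

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy * sxy / (sxx * syy)
```

For example, `r_squared([0.0, 0.3, 1.0, 3.0], fractions)` could be applied to a made-up list `fractions` holding the share of contacts at the inactive, 0.3 mW, 1 mW, and 3 mW triggers, treating the inactive trigger as 0 mW.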
Although we were directly activating MSNs, we considered the possibility that we might have also elicited striatal dopamine (DA) release. To examine whether DA itself was involved in the acquisition of trigger preference, we tested whether combined D1 and D2 receptor antagonists (0.02 mg/kg SCH23390 and 25 mg/kg sulpiride, co-injected IP) would impair the acquisition of the 2-trigger operant task in naïve dMSN-ChR2 and iMSN-ChR2 mice (, fiber placements shown in
Fig. S2b). DA antagonists significantly reduced overall movement, as compared to separate groups of mice that were injected with saline (). Importantly, DA antagonists did not significantly alter the total number of contacts with either trigger (), or prevent acquisition of trigger biases over 3 days of training (,
Table S3). To test whether DA was required for the expression of trigger bias, we injected the previously saline-treated groups with the same DA antagonists on a 4th day of training and found that expression of the previously-learned trigger preference was not impaired (,
Table S3).
To test whether this learning was specific to our operant task, we trained dMSN-ChR2 and iMSN-ChR2 mice in a real-time place preference task in which one half of a chamber was paired with pulsed laser stimulation (2 s of 1 mW laser/8 s off, cohort is a subset of mice used in ). Mice were trained for 30 min on two consecutive days, and the second training session was immediately followed by a 30-min test session with no laser stimulation (). Consistent with our above results, dMSN-ChR2 mice maintained their learned place preference throughout the entire test session, while iMSN-ChR2 mice showed no evidence of such persistence ().
Our results indicate that activation of striatal dMSNs is sufficient for persistent reinforcement, while activation of iMSNs is sufficient for transient punishment, in both an operant and a place preference task. The differences in time course that we observed are qualitatively similar to results from animals as diverse as invertebrates, rodents, and humans, in which reinforcement is more effective than punishment at modifying long-term behavior
2,12,13. These differences in time course may relate to differences in synaptic plasticity mechanisms in each pathway
14. While DA is known to influence both activity and plasticity of these cells under natural conditions
5,9,15, other neurochemicals play a role as well. Future therapies could target dMSNs or iMSNs independently to address specific dysfunctions in reinforcement or punishment associated with psychiatric disorders.