Dopamine signaling is implicated in reinforcement learning, but the neural substrates targeted by dopamine are poorly understood. Here, we bypassed dopamine signaling itself and tested how optogenetic activation of dopamine D1- or D2-receptor-expressing striatal projection neurons influenced reinforcement learning in mice. Stimulating D1-expressing neurons induced persistent reinforcement, whereas stimulating D2-expressing neurons induced transient punishment, demonstrating that activation of these circuits is sufficient to modify the probability of performing future actions.
Reinforcement and punishment are fundamental processes that shape animal learning. Reinforcement maintains or increases, while punishment decreases, the probability of specific behavior1,2. Dysfunction in these processes contributes to many psychiatric disorders. For example, addiction is characterized by heightened reinforcement from drug-paired stimuli, coupled with impaired punishment from negative consequences3. In contrast, depression is marked by impaired reinforcement from positive stimuli, and heightened punishment from negative stimuli4. While the striatum is implicated in both reinforcement and punishment, the specific roles of the two populations of striatal projection neurons are not well understood. Here, we tested the hypothesis that D1-expressing direct pathway medium spiny neurons (dMSNs) mediate reinforcement, while D2-expressing indirect pathway neurons (iMSNs) mediate punishment5–9.
In order to selectively activate dMSNs or iMSNs in vivo, we expressed channelrhodopsin-2 (ChR2) in the dorsomedial striatum using a Cre-dependent viral strategy10. We first characterized the effects of ChR2 stimulation on dMSNs in awake behaving mice with in vivo electrophysiology, utilizing microwire arrays that included an integrated optical fiber (Fig. S1a, b). Each of 48 recorded neurons (n=3 mice) was illuminated at four laser intensities (0.1mW, 0.3mW, 1mW, and 3mW, 1 s constant illumination). We concluded that neurons expressed ChR2 (and were therefore dMSNs) if they exhibited a significant increase in firing within 10msec of the laser onset at any laser power (Fig. S1 c-g). Spiking data from recorded neurons in this experiment is available for download at http://uri.neuinfo.org/nif/nifstd/nlx_144028. Overall, 19 (40%) neurons were identified as dMSNs. Importantly, there were no significant differences between waveform characteristics or sorting quality of recording channels that contained ChR2-positive MSNs vs. those that did not (Table S1). Average firing rates and total number of ChR2-responsive dMSNs increased with higher laser intensity, demonstrating that higher laser intensity caused more MSN activation (Fig. S1 h-j).
To investigate reinforcement, mice expressing ChR2 in dMSNs or iMSNs (termed dMSN-ChR2 or iMSN-ChR2 mice, respectively) received bilateral fiber optic implants targeting the dorsomedial striatum (Fiber placements for all experiments in the main text are presented in Fig. S2). The dorsomedial striatum was targeted due to its role in reinforcement and action selection5,8,9,11. Mice were placed in an operant box that contained two triggers, one that activated a 1mW laser (1 s constant illumination, delivered bilaterally) and one that was inactive (Video S1). The capacitive touch sensors were crucial for this experiment because they are much more sensitive than lever-press or nose-poke manipulanda, which allowed us to observe both increments and decrements in responding. We tested three groups of naïve mice in this task: dMSN-ChR2 mice (n=8), iMSN-ChR2 mice (n=8), and control mice that expressed YFP in dMSNs (n=4) or iMSNs (n=4). Data from dMSN-YFP and iMSN-YFP control mice were combined as neither group showed any significant effects (Fig. S3, Table S2) (all p>0.25 when the below analyses were run on each group independently). All groups completed one 30-min training session each day for three consecutive days.
Within the first session, naïve dMSN-ChR2 mice exhibited a significant bias towards the laser-paired trigger whereas iMSN-ChR2 mice exhibited a significant bias away from the laser-paired trigger (Fig. 1a, Table S2, Videos S1 and S2). In contrast to the first 2 min of Day 1, trigger biases for both dMSN-ChR2 and iMSN-ChR2 mice were present within the first 2 min of Day 2, suggesting a learned behavior (Fig. 1b). However, this effect was weaker in iMSN-ChR2 mice and was no longer significant at the beginning of Day 3 (Fig. 1b). To further investigate this persistence, mice with at least 3 days of prior training underwent retraining for 30 min, followed by a 30-min extinction session, in which neither trigger elicited a laser pulse. dMSN-ChR2 mice continued to exhibit a significant bias toward the previously laser-paired trigger throughout the entire extinction session, while iMSN-ChR2 mice rapidly lost their behavioral preference (Fig. 1c, d, Table S2).
In light of these differences in persistence, we analyzed the timecourse of reinforcement and punishment following each laser pulse and noted differences on a shorter time scale as well. dMSN-ChR2 mice had a heightened probability of contacting the laser-paired trigger for at least 45 s following a laser pulse, relative to YFP control mice (Fig. 1e, f). iMSN-ChR2 mice had a lower probability of contacting the laser-paired trigger in the initial 15 seconds following a laser pulse, but this effect was no longer significant in the period 15–30 seconds following the laser stimulation (Fig. 1e, f). These findings are consistent with the diminished cross-day persistence of iMSN-mediated punishment. We considered that the amount of experience dMSN-ChR2 and iMSN-ChR2 mice had with the laser could explain these differences in learning. However, the persistence of trigger preference across sessions was not related to the number of contacts mice had with the laser-paired trigger on the previous day (Fig. S4).
As activation of these cell groups can induce motor changes11, we tested whether motoric changes during the laser pulses might have contributed to our results. For example, dMSN-ChR2 activation might have induced stereotypies that caused multiple contacts during the laser-paired stimulation. Interestingly, however, dMSN-ChR2 stimulation did not produce changes in the animal’s velocity (Fig. 1g, Video S1), a difference from our previous study11, which may reflect either the shorter duration (1 s vs. 30 s) or the operant nature of this stimulation. iMSN-ChR2 stimulation elicited brief freezing (consistent with our previous findings11) followed by an aversive-like escape response, evidenced by an increase in velocity following the laser pulse (Fig. 1g, Video S2). However, these brief (<2 second) changes in motor behavior following stimulation are not sufficient to explain the decrease in probability of active trigger contacts that persists for >30 seconds after stimulation (Fig. 1e).
To test whether the level of dMSN activation correlated with the magnitude of reinforcement, dMSN-ChR2 mice (same cohort as Fig. 1, fiber tip placements shown in Fig. S2a) were placed in an operant box that utilized 4 capacitive touch sensors as operant triggers. A computer detected contacts with these triggers and controlled three lasers, which were calibrated to 0.3mW, 1mW and 3mW of output power per side (1 s constant illumination, delivered bilaterally, Fig. S5a). Contacts with an “inactive” trigger were also counted, but had no consequences. dMSN-ChR2 mice preferred higher laser intensities (R² = 0.99, p<0.01, n=8, Fig. S5b, c), demonstrating that the magnitude of reinforcement was correlated with the level of dMSN activation.
Although we were directly activating MSNs, we considered the possibility that we might have also elicited striatal dopamine (DA) release. To examine whether DA itself was involved in the acquisition of trigger preference, we tested whether combined D1 and D2 receptor antagonists (0.02mg/kg SCH23390 and 25 mg/kg sulpiride, co-injected IP) would impair the acquisition of the 2-trigger operant task in naïve dMSN-ChR2 and iMSN-ChR2 mice (Fig. 2a, fiber placements shown in Fig. S2b). DA antagonists significantly reduced overall movement, as compared to separate groups of mice that were injected with saline (Fig. 2a). Importantly, DA antagonists did not significantly alter the total number of contacts with either trigger (Fig. 2a), or prevent acquisition of trigger biases over 3 days of training (Fig. 2b, c, Table S3). To test whether DA was required for the expression of trigger bias, we injected the previously saline-treated groups with the same DA antagonists on a 4th day of training and found that expression of the previously-learned trigger preference was not impaired (Fig. 2d, Table S3).
To test whether this learning was specific to our operant task, we trained dMSN-ChR2 and iMSN-ChR2 mice in a real-time place preference task in which one half of a chamber was paired with pulsed laser stimulation (2 s 1mW laser/8 s off, cohort is a subset of mice used in Fig. 2). Mice were trained for 30 min for two consecutive days, and the 2nd training session was immediately followed by a 30-min test session with no laser stimulation (Fig. 3a). Consistent with our above results, dMSN-ChR2 mice showed a persistence of their learned place preference during the entire test session, while iMSN-ChR2 mice showed no evidence of such persistence (Fig. 3a-c).
Our results indicate that activation of striatal dMSNs is sufficient for persistent reinforcement, while activation of iMSNs is sufficient for transient punishment, in both an operant and a place preference task. The differences in time course that we observed are qualitatively similar to results from animals as diverse as invertebrates, rodents, and humans, demonstrating that reinforcement is more effective than punishment at modifying long-term behavior2,12,13. These differences in time course may relate to differences in synaptic plasticity mechanisms in each pathway14. While DA is known to influence both activity and plasticity of these cells under natural conditions5,9,15, other neurochemicals play a role as well. Future therapies could target dMSNs or iMSNs independently to address specific dysfunctions in reinforcement or punishment associated with psychiatric disorders.
Bacterial artificial chromosome (BAC) transgenic mouse lines that express Cre-recombinase under control of the Dopamine D1 receptor (D1-Cre) and A2A receptor (A2A-Cre) regulatory elements were obtained from GENSAT. Animals entered the study at ~6 weeks of age, weighing ~20 grams. All procedures were approved by the UCSF Institutional Animal Care and Use Committee.
We used double-floxed inverted (DIO) constructs to express ChR2-YFP fusions and YFP alone in Cre-expressing neurons, which virtually eliminates recombination in cells that do not express Cre-recombinase (Sohal et al., Nature, 2009). The double-floxed reverse ChR2-YFP or YFP cassette was cloned into a modified version of the pAAV2-MCS vector (Stratagene, La Jolla, CA) carrying the EF-1α promoter and the Woodchuck hepatitis virus posttranscriptional regulatory element (WPRE) to enhance expression. The recombinant AAV vectors were serotyped with AAV1 coat proteins and packaged by the viral vector core at the University of North Carolina. The final viral concentration was 4 × 10¹² virus molecules/mL (by Dot Blot, UNC vector core).
Anesthesia was induced with a mixture of ketamine and xylazine (100mg ketamine + 5mg xylazine per kg body weight co-injected IP), and maintained with 0.5 – 1.0% isoflurane through a nose cone mounted on a stereotaxic apparatus (Kopf Instruments). The scalp was opened and bilateral holes were drilled in the skull (+0.8mm AP, ±1.5mm ML from Bregma). 1µL of DIO ChR2-YFP virus was injected into the left and right dorsomedial striata (−3.0mm DV from top of brain) through a 33 gauge steel injector cannula (Plastics1) using a syringe pump (World Precision Instruments) over 10 min. The injection cannula was left in place for 5 min following the injection, and then slowly removed. After the viral injection, a plastic mount containing two fibers (105µm core/125µm cladding) mounted in 1.25mm zirconia ferrules was slowly lowered into the brain and cemented in place such that each fiber was aimed at the dorsomedial striatum on either side. To allow time for viral expression, animals were housed for at least 2 weeks following injection before any experiments were initiated. All surgical procedures were performed under aseptic conditions.
Anaesthesia was induced with a mixture of ketamine and xylazine (100mg ketamine plus 5mg xylazine per kilogram of body weight IP) and maintained with isoflurane through a nose cone mounted on a stereotaxic apparatus (Kopf Instruments). The scalp was opened and a hole was drilled in the skull (0.0 to +1.0mm AP, −1.0 to −2.0mm ML from bregma). Two skull screws were implanted in the opposing hemisphere. Dental adhesive (C&B Metabond, Parkell) was used to fix the skull screws in place and coat the surface of the skull. An array of 16 or 32 microwires (35-µm tungsten wires, 100-µm spacing between wires, 200-µm spacing between rows; Innovative Physiology) and one optical fiber in a ferrule was lowered into the striatum (3.0mm below the surface of the brain) and cemented in place with dental acrylic (Ortho-Jet, Lang Dental). After the cement dried, the scalp was sutured shut. Animals were allowed to recover for at least seven days before striatal recordings were made.
Anaesthesia was induced with a mixture of ketamine and xylazine (100mg ketamine plus 5mg xylazine per kilogram of body weight IP) and maintained with isoflurane through a nose cone mounted on a stereotaxic apparatus (Kopf Instruments). The scalp was opened and two holes were drilled in the skull (+0.8 mm AP, ±1.5 mm ML from bregma) where two ferrules attached to short fibers were lowered into the brain targeting the dorsomedial striatum (−3.0 mm DV from top of skull) and glued into place. Animals were given at least seven days of recovery before behavioral experimentation.
Voltage signals from each recording site on the microwire array were band-pass-filtered, such that activity between 150 and 8,000 Hz was analyzed as spiking activity. These data were amplified, processed and digitally captured using commercial hardware and software (Plexon). Single units were discriminated with principal component analysis (Offline sorter, Plexon). Two criteria were used to ensure quality of recorded units: (1) recorded units smaller than 100 µV (~3 times the noise band) were excluded from further analysis and (2) recorded units in which more than 1% of interspike intervals were shorter than 2 ms were excluded from further analysis. Average waveforms were exported with Offline sorter. During the recording we coupled the array to a laser and pulsed the laser at four intensities (0.1mW, 0.3mW, 1mW, and 3mW). Laser stimulation was run in a cyclical fashion, on for 1 second, and off for 3 seconds. Each neuron received 100 pulses at each laser intensity.
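The two unit-quality criteria above can be illustrated with a minimal Python sketch. It is not the Plexon Offline sorter workflow used in the study; the function name, argument names, and the assumption that each unit is summarized by a single peak amplitude in µV are hypothetical.

```python
def passes_quality(spike_times_s, peak_amplitude_uv,
                   min_amplitude_uv=100.0, refractory_s=0.002,
                   max_violation_fraction=0.01):
    """Return True if a sorted unit meets both exclusion criteria.

    Assumptions (not from the source): spike times are in seconds and the
    unit's size is a single peak amplitude in microvolts.
    """
    # Criterion 1: exclude units smaller than 100 uV.
    if peak_amplitude_uv < min_amplitude_uv:
        return False

    # Criterion 2: exclude units in which more than 1% of interspike
    # intervals are shorter than 2 ms.
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return True  # fewer than two spikes: no interval can violate
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(isis) <= max_violation_fraction
```

A unit passes only if it clears both thresholds, so a large-amplitude unit with frequent sub-2-ms intervals is still rejected.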
For all neurons, peri-event histograms were generated for each laser intensity independently. Neurons were classified as ChR2-expressing if, within 10 ms of laser onset, their firing rate rose more than 3 standard deviations above the firing rate in the 1 s preceding the laser pulse. Each neuron was tested independently at each laser power, and neurons that satisfied this criterion at any one power were defined as ChR2-expressing.
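One possible reading of this classification rule is sketched below in Python. The text does not spell out the exact computation, so several choices here are assumptions: the baseline is binned at 10 ms, rates are averaged across pulses, and the threshold is taken as the baseline mean plus 3 standard deviations of the binned baseline rate. All function names are hypothetical.

```python
from statistics import mean, pstdev


def binned_rate(spike_times, pulse_onsets, start, stop, bin_s=0.010):
    """Pulse-averaged firing rate (Hz) in bins spanning [start, stop) s
    relative to each laser onset (a simple peri-event histogram)."""
    n_bins = round((stop - start) / bin_s)
    counts = [0] * n_bins
    for onset in pulse_onsets:
        for t in spike_times:
            rel = t - onset
            if start <= rel < stop:
                idx = min(int((rel - start) / bin_s), n_bins - 1)
                counts[idx] += 1
    return [c / (len(pulse_onsets) * bin_s) for c in counts]


def is_chr2_responsive(spike_times, pulse_onsets, n_sd=3.0, bin_s=0.010):
    """Assumed criterion: rate in the first 10 ms after onset exceeds
    baseline mean + n_sd * SD, with baseline = the preceding 1 s."""
    baseline = binned_rate(spike_times, pulse_onsets, -1.0, 0.0, bin_s)
    evoked = binned_rate(spike_times, pulse_onsets, 0.0, bin_s, bin_s)[0]
    threshold = mean(baseline) + n_sd * pstdev(baseline)
    return evoked > threshold


def is_chr2_expressing(recordings_by_power):
    """recordings_by_power maps laser power (mW) to a
    (spike_times, pulse_onsets) pair; passing at any one power suffices."""
    return any(is_chr2_responsive(spikes, onsets)
               for spikes, onsets in recordings_by_power.values())
```

Testing each power separately and combining with `any` mirrors the statement that a neuron satisfying the criterion at any one power was counted as ChR2-expressing.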
Two 1-meter glass fibers (62.5µm core/125µm cladding, Ecablemart.com) were connectorized with LC connectors on one end, and an LC ferrule on the other. The LC connectorized ends were hooked up to a 50/50 splitter coming off a laser, such that equal laser light passed through each fiber. The end with the LC ferrule was attached to the mouse for experiments with a zirconia connection sleeve. After the optical fibers were connected, each mouse was placed in a 16” square operant box that contained either 2 or 4 capacitive touch sensors which were used as operant triggers. Contacts with the touch sensors were recorded by Ethovision 7.0 software, which controlled the illumination of lasers (1 second constant illumination, delivered bilaterally) via an I/O box (Noldus). A separate behavioral chamber was used for the place preference task, which contained two 8”×8” compartments, one of which had opaque white walls and floor, and the other opaque black walls and floor. The mouse’s position was calculated in real time with the Ethovision 7.0 software, and this position was used to control the illumination of the laser, which was counterbalanced between the white and black compartments within each group. The laser was illuminated in a 2-s on/8-s off cycle for as long as the mouse remained in the laser-paired compartment. For drug experiments, mice were injected with DA antagonists (0.02mg/kg SCH23390 and sulpiride 25mg/kg co-injected IP) or 0.9% saline and returned to their home cage for 30 min before beginning each 30-min session. All experimental sessions were 30 min long.
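The 2-s on/8-s off gating in the place preference task can be expressed as a small Python sketch. This is not Ethovision's interface; the function name is hypothetical, and the sketch assumes that the cycle restarts, beginning with the on phase, each time the mouse enters the laser-paired compartment, which the text does not specify.

```python
def laser_is_on(time_in_paired_side_s, on_s=2.0, off_s=8.0):
    """Decide whether the laser should be on.

    time_in_paired_side_s: seconds since the mouse entered the laser-paired
    compartment, or None while it is in the unpaired compartment.
    Assumption: the cycle restarts with the on phase at each entry.
    """
    if time_in_paired_side_s is None:
        return False  # no stimulation outside the paired compartment
    phase = time_in_paired_side_s % (on_s + off_s)
    return phase < on_s
```

Under these assumptions the laser is on for 20% of the time the mouse spends in the paired compartment and never on elsewhere.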
Statistics were first performed with repeated measures ANOVAs to test for effects of day (Day 1, 2, or 3), group (dMSN, iMSN, or YFP), and/or drug (antagonist or saline), followed up with post hoc one-sample t-tests to test whether specific conditions differed from the null hypothesis that 50% of behavior would be directed at the laser-paired and 50% at the inactive trigger or compartment.
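The post hoc comparison against the 50% null can be sketched as a one-sample t statistic in plain Python. The function name and the example percentages in the usage note are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(percent_on_paired, null_percent=50.0):
    """Return (t, degrees_of_freedom) for a one-sample t-test of whether
    the mean percentage of behavior directed at the laser-paired trigger
    (or compartment) differs from the null value of 50%."""
    n = len(percent_on_paired)
    if n < 2:
        raise ValueError("need at least two animals")
    sd = stdev(percent_on_paired)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t is undefined")
    standard_error = sd / sqrt(n)
    t = (mean(percent_on_paired) - null_percent) / standard_error
    return t, n - 1
```

For example, `one_sample_t([60.0, 70.0, 80.0])` gives t ≈ 3.46 with 2 degrees of freedom for these made-up values; the p-value would then be read from the t distribution with those degrees of freedom, for instance with a statistics library.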
Animals were sacrificed with a lethal dose of ketamine and xylazine (400 mg ketamine + 20 mg xylazine per kg body weight IP). Animals with microwire arrays received a current injection (10 µA for 5 s) through each microwire to lesion the wire tips. All animals were transcardially perfused with phosphate buffered saline (PBS), followed by 4% paraformaldehyde. Following perfusion, brains were left in 4% paraformaldehyde for 16–24 hours and then moved to a 30% sucrose solution in PBS for 2–3 days. Brains were then frozen and cut into 30 µm sections (either coronal or sagittal) with a sliding microtome (Leica Microsystems, Wetzlar, Germany, model SM2000R) equipped with a freezing stage (Physiotemp, Clifton, NJ). To identify fiber locations, relevant sections were identified and mounted on slides. Sections were then photographed in bright field and fluorescence on a Nikon D6 microscope with a 4× objective. From these photographs, fiber tip locations were identified and marked on a coronal schematic of the striatum at 0.8 mm anterior to bregma.
We thank the Nikon Imaging Center at UCSF for assistance with image acquisition, Karl Deisseroth for optogenetic constructs, and Kay Tye for helpful comments on the manuscript. A.C.K and co-workers are funded by the W.M. Keck Foundation, the Pew Biomedical Scholars Program, the McKnight Foundation and the NIH.
Author contributions: AVK and LDT jointly conducted experiments and analyzed data. AVK and ACK conceived of the study and wrote the manuscript.