Author contributions: S.M.F., P.E.M.P., and J.F.N. designed research; S.M.F. performed research; S.M.F., B.L.R., J.W., and J.F.N. contributed unpublished reagents/analytic tools; S.M.F., P.E.M.P., and J.F.N. analyzed data; S.M.F., P.E.M.P., and J.F.N. wrote the paper.
The dorsal striatum has been implicated in reward-based decision making, but the role played by specific striatal circuits in these processes is essentially unknown. Using cell phenotype-specific viral vectors to express engineered G-protein-coupled DREADD (designer receptors exclusively activated by designer drugs) receptors, we enhanced Gi/o- or Gs-protein-mediated signaling selectively in direct-pathway (striatonigral) neurons of the dorsomedial striatum in Long–Evans rats during discrete periods of training of a high versus low reward-discrimination task. Surprisingly, these perturbations had no impact on reward preference, task performance, or improvement of performance during training. However, we found that transiently increasing Gi/o signaling during training significantly impaired the retention of task strategies used to maximize reward obtainment during subsequent preference testing, whereas increasing Gs signaling produced the opposite effect and significantly enhanced the encoding of a high-reward preference in this decision-making task. Thus, the fact that the endurance of this improved performance was significantly altered over time, long after these neurons were manipulated, indicates that it is under bidirectional control of canonical G-protein-mediated signaling in striatonigral neurons during training. These data demonstrate that cAMP-dependent signaling in direct-pathway neurons plays a well-defined role in reward-related behavior; that is, it modulates the plasticity required for the retention of task-specific information that is used to improve performance on future renditions of the task.
Rational decision making is a complex process involving choices among actions based on the subjective evaluation of the relative costs and benefits of each option. The amount and likelihood of reward, the motivational state of the animal, and the amount of effort to be exerted are all factors that guide decisions regarding the optimal choice of an action. Reward harvesting can be improved by garnering predictive information about these attributes as well as configural information from the environment. Disruption of the fine-tuning of these neurocomputations is a prominent feature of many neuropsychiatric disorders, including drug addiction, schizophrenia, and obsessive-compulsive disorder, and is manifested as maladaptive decision making. In rodents, instrumental learning tasks can be used to assess decision making and have revealed an important role for the striatum in the mechanisms underlying decision-making processes, including action selection and initiation and rule learning (Salamone et al., 1994; Van Golf Racht-Delatour and El Massioui, 1999; Balleine et al., 2007; Walton et al., 2009; Gan et al., 2010; Kurniawan et al., 2011). For example, phasic dopamine transmission in the striatum as well as several intracellular signaling cascades associated with dopamine-mediated alterations in cAMP function have been found to regulate reward valuation and decision-making processes (Pittenger et al., 2006; Gan et al., 2010; Shiflett and Balleine, 2011a). However, the striatum is comprised of distinct neuronal populations that may differentially govern these processes (Gerfen et al., 1990; Smith et al., 1998). Indeed, human studies of striatal gene polymorphisms have implicated the two primary output populations of the striatum, the GABAergic medium spiny projection neurons (MSNs) of the direct and indirect pathways, in regulation of these behaviors (Frank and Fossella, 2011). 
Nonetheless, decision making has not been studied after controlled, selective perturbation of these neuronal populations. We hypothesized that intracellular signaling dynamics play a central role in the plasticity associated with developing and retaining decision-making strategies.
To address this issue, we used a novel chemical–genetic approach involving viral-mediated gene transfer to express engineered DREADD (designer receptor exclusively activated by a designer drug) receptors in one prominent population of striatal MSNs that contain dynorphin (DYN) and substance P and comprise the striatonigral, or direct, pathway. Because DREADD receptors are only activated by the otherwise pharmacologically inert synthetic ligand clozapine-N-oxide (CNO), this strategy produces transient alterations of G-protein-dependent signaling selectively in these neurons (Rogan and Roth, 2011). Activation of Gi/o-coupled DREADD (hM4Di) receptors increases Gi/o-mediated signaling, which results in a decrease in cAMP activity, GIRK channel activation, and membrane hyperpolarization (Armbruster et al., 2007; Ferguson et al., 2011), whereas activation of Gs-coupled DREADD (rM3Ds) receptors produces an augmentation in neuronal signaling through increases in cAMP production, Ca2+/cAMP-responsive element activity and dopamine- and cAMP-regulated neuronal phosphoprotein of 32 kDa phosphorylation (Brancaccio et al., 2013; Farrell et al., 2013). These viral vectors were used to examine the consequences of enhancing either Gi/o or Gs signaling in striatonigral (direct-pathway) MSNs of the dorsomedial striatum during the training or the performance phases of a rodent decision-making task to ascertain the function of canonical G-protein-dependent signaling cascades in these neurons in the acquisition and retention of decision-making behavior under a variable benefit condition.
All experimental procedures were approved by the University of Washington and Seattle Children's Research Institute Institutional Animal Care and Use Committees and were conducted in accordance with National Institutes of Health guidelines. Male Long–Evans rats (Charles River) weighing 250–274 g on arrival were housed two per cage and given a 1 week acclimation period before any experimental manipulation. The housing room was temperature- and humidity-controlled and maintained on a 12 h light/dark cycle, with water and food available ad libitum except as indicated. Five days before any behavioral procedure, rats began food restriction and continued on food restriction throughout training and testing unless otherwise specified. Rats were restricted to ~90% of their free-feeding weight and received a total of 11–22 g of rat chow per day, consisting of the 45 mg food pellets (BioServ) the rats earned during training and testing sessions and once daily feedings of rat chow in the animal colony room.
CNO (obtained from the National Institutes of Health as part of the Rapid Access to Investigative Drug Program funded by the National Institute of Neurological Disorders and Stroke) was dissolved in sterile water with 1% DMSO and administered by intraperitoneal injection in a volume of 1 ml/kg at doses of 1 or 3 mg/kg.
A modified herpes simplex virus (HSV) amplicon was used to express the hemagglutinin (HA)-tagged Gi/o-DREADD receptor transgene (hM4Di) under control of the DYN promoter as described previously (Ferguson et al., 2011). This vector system infects neurons, not glia, and transgene expression begins 7 d after viral infusion and lasts at least 3 weeks (Barot et al., 2007; Ferguson et al., 2011). To construct an HSV vector that expresses the Gs DREADD receptor transgene (rM3Ds) under control of the DYN promoter, the HA-tagged rM3Ds gene was excised from a pcDNA5 plasmid and blunt cloned into an HSV pDYN–GFP plasmid after removal of the GFP gene. Restriction mapping was used to identify successfully ligated and correctly oriented clones, and their sequences were confirmed by PCR. To prevent HSV promoter-driven “leakage” expression in nontargeted neurons, the promoter fragments and the transgenes were inserted in a reverse orientation with respect to the endogenous HSV promoter/origin of replication sequence, and two SV40 polyadenylation sequences were positioned between the end of the HSV promoter and the transgenes. The amplicons were packaged into viral vectors using replication-deficient helper virus as described previously (Clark et al., 2002).
Rats were removed from food restriction at least 24 h before surgery and were returned to food restriction a minimum of 48 h after surgery. Rats were anesthetized with 2–4% isoflurane (Webster Veterinary Supply) during the surgical procedure and were given buprenorphine analgesia (0.1 mg/kg, s.c.) or meloxicam (0.2 mg/kg, s.c.) preoperatively. Using standard stereotaxic procedures, 27-gauge stainless steel injectors were placed above targeted brain regions. The coordinates from bregma for dorsomedial striatum were as follows (in mm): anteroposterior, −0.3; mediolateral, ±3.2; dorsoventral, −4.8 from skull surface. Then, 3 μl of pDYN–hM4Di, pDYN–rM3Ds, or pDYN–GFP (~10^8 infectious units/ml in 10% sucrose) was infused bilaterally over a 15 min period at a flow rate of 0.2 μl/min. The injector was left in place an additional 5 min to minimize diffusion up the injector tract. Previous studies have found that rats expressing DREADD receptors and treated with vehicle do not differ from those expressing the GFP control transgene and treated with CNO (Ferguson et al., 2011); therefore, only vehicle-treated DREADD receptor groups were used as controls in the present experiments, with the exception of the experiment examining c-Fos expression in which GFP was infused into the control hemisphere of rats unilaterally infused with pDYN–rM3Ds to perform within-subject comparisons. For all experiments, accuracy of injection coordinates was confirmed by visualization of HA immunofluorescence or DAPI staining of the injection needle tracts in 40 μm tissue sections.
Rats were killed with Beuthanasia-D (19 mg of pentobarbital and 2.5 mg of phenytoin in 0.15 ml, i.p.; Schering-Plough), and the transcardial perfusion procedure proceeded once the rats were unresponsive to paw pinch and had an absence of corneal reflex. Perfusions were performed with 200 ml of PBS, followed by 200 ml of 4% paraformaldehyde; both solutions were pH 7.4 and kept on ice. Brains were removed and postfixed in 4% paraformaldehyde for 4 h and then placed into PBS. Tissue sections (40 μm) were made on a Leica VT1000S vibrating blade microtome and kept in PBS until processed for immunohistochemistry or cresyl violet staining.
Floating sections were washed in 0.5% Triton X-100/PBS for 10 min and then blocked in 5% normal goat serum (NGS)–0.25% Triton X-100/PBS for 1 h. Sections were then incubated in 2.5% NGS–0.25% Triton-X/PBS containing substance P (1:400; Millipore) and HA (1:200; Millipore) or c-Fos (1:400; Santa Cruz Biotechnology) with gentle agitation at 4°C for 48–72 h. Next, sections were rinsed four times in PBS and incubated in species-appropriate Alexa Fluor 488-conjugated (green) and/or Alexa Fluor 568-conjugated (red) goat secondary antibodies (1:500; Invitrogen) for 1 h. Sections were washed two times in PBS, mounted on slides, and coverslipped with Vectashield mounting medium with DAPI (Vector Laboratories). Images were captured with a Carl Zeiss confocal LSM701 microscope.
Twelve days after viral infusions, rats were transported to a novel test environment and given an injection of CNO (3 mg/kg), followed 30 min later by an injection of amphetamine (2 mg/kg). Two hours later, rats were perfused transcardially with PBS, followed by 4% paraformaldehyde. Brains were postfixed for 4 h in paraformaldehyde and transferred to PBS until processed for immunohistochemistry.
Behavioral testing was conducted in standard rat modular test chambers (Med Associates) equipped with two levers and two lights on either side of a pellet receptacle containing a light on one wall and a house light on the opposite wall. All chambers were housed in sound-attenuating boxes (Med Associates) equipped with fans providing temperature regulation and white noise.
Habituation and training methods are similar to those described previously (Walton et al., 2006; Gan et al., 2010). Briefly, rats were preexposed to food pellets (45 mg food pellets; BioServ) in the home cage 24 h before magazine training in the operant chamber. After a 30 min habituation to the operant chamber and food pellets, rats were trained to lever press for food pellets on a fixed ratio (FR) 1 schedule. During the 100-trial sessions (maximum session time of 90 min), the house light was illuminated, and either the right or left lever was continuously extended and its associated cue light was illuminated throughout the session. After two to three sessions, the protocol was modified so that completing the response requirement resulted in the lever retracting, the associated cue light extinguishing, delivery of a single food pellet into the magazine receptacle, and the magazine light illuminating. Six seconds after food delivery, the magazine light was extinguished, and the intertrial interval (ITI) began. The start of the subsequent trial was signaled by illumination of one of the two cue lights and extension of the associated lever. For these sessions, each trial was “forced” (i.e., only one of the two levers was extended), and the rats received 40 trials on each lever (total of 80 trials, maximum session time of 2 h). The response requirement was increased on each lever across six sessions from FR1 to FR16.
Rats trained as described above were introduced to new contingencies in which meeting the 16-press response requirement on one lever resulted in delivery of four food pellets [high-reward (HR) option], whereas 16 presses on the other lever resulted in delivery of one food pellet [low-reward (LR) option]. Assignment of the HR versus LR lever remained fixed for the entire session but was reversed daily to avoid perseverative behaviors on one lever. Rats do not appear to learn the pattern of lever assignment, because trained rats that are given a session in which the lever assignment is not reversed (i.e., it is the same as the previous session) do not differ in the number of trials needed to reach criterion (Fig. 1c; t(6) = 0.69, p = 0.52) or in preference for the HR lever (Fig. 1d; t(6) = 1.17, p = 0.28) compared with rats that have the reversed lever assignment (i.e., it is different from the previous session). HR and LR options were presented independently during forced trials or at the same time during “choice” trials. Each session consisted of repeated blocks of four forced trials (each option presented twice in pseudorandom order), followed by four choice trials for a maximum of 80 total trials per session (i.e., 40 forced and 40 choice trials) (Fig. 1a).
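The block structure described above can be sketched in a few lines of Python. This is only an illustration of the trial schedule, not the software used to run the chambers: the function name `build_session`, the trial labels, and the use of a seeded shuffle for the pseudorandom order are all assumptions. The sketch generates the full-length schedule and does not model early termination once the behavioral criterion is reached.

```python
import random


def build_session(max_trials=80, seed=None):
    """Return an ordered list of trial labels for one session.

    Each block holds four forced trials (HR and LR each presented twice,
    in shuffled order) followed by four choice trials. Names and labels
    are hypothetical, chosen only for this illustration.
    """
    rng = random.Random(seed)
    block_size = 8  # 4 forced + 4 choice trials
    trials = []
    while len(trials) + block_size <= max_trials:
        forced = ["forced_HR", "forced_HR", "forced_LR", "forced_LR"]
        rng.shuffle(forced)  # pseudorandom order within the forced half
        trials.extend(forced)
        trials.extend(["choice"] * 4)
    return trials


if __name__ == "__main__":
    session = build_session(seed=0)
    print(len(session), session.count("choice"))
```

With the default of 80 trials this yields 10 blocks, that is, 40 forced and 40 choice trials, matching the session maximum stated in the text.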
The start of each trial was signaled by illumination of the house light, presentation of the lever(s), and illumination of the associated cue light(s). During choice trials, the first press on a lever caused the other lever to retract and its cue light to extinguish. Completion of the response requirement on the selected lever resulted in delivery of the food reward associated with that lever. At that time, the lever was retracted, the cue light was extinguished, the magazine light was illuminated, and the food pellet(s) was delivered into the magazine. After 6 s, the house and magazine lights extinguished, and the ITI began. The ITI was calculated as 30 s minus the time taken to complete the response requirement for the trial. If a lever-press response did not occur within 10 s from the trial onset, all lights extinguished for a 30 s timeout, and that trial was scored as a “miss.” Before the first decision-making session, rats received a pre-session that was identical in makeup, except there were no timeouts given.
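The timing rules in this paragraph reduce to two small calculations, sketched here in Python. The function names are hypothetical, and one detail is an assumption not stated in the text: if the response requirement took longer than 30 s, the sketch sets the ITI to zero instead of letting it go negative.

```python
def intertrial_interval(response_time_s, trial_window_s=30.0):
    """ITI = 30 s minus the time taken to complete the response requirement.

    Assumption (not stated in the source): the ITI is floored at 0 s
    when the response requirement takes longer than the 30 s window.
    """
    return max(0.0, trial_window_s - response_time_s)


def is_miss(first_press_latency_s, limit_s=10.0):
    """A trial is a miss if no lever press occurs within 10 s of trial onset.

    Pass None when no press was recorded at all.
    """
    return first_press_latency_s is None or first_press_latency_s > limit_s


if __name__ == "__main__":
    print(intertrial_interval(12.5))  # response requirement met in 12.5 s
    print(is_miss(None), is_miss(3.2))
```

Subtracting the response time from a fixed 30 s window keeps the onset-to-onset spacing of trials roughly constant regardless of how quickly the rat completes the 16 presses.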
Rats learned the assignment of the reward options during each session, as evidenced by development of a preference for the HR lever during choice trials (calculated as the number of HR lever choices/the number of completed choice trials). This preference was also inferred by the rat reaching the behavioral criterion, defined as choosing one option on 75% of the last 12 choice trials (i.e., 9 of the last 12 choice trials were on the same lever). Once the criterion was reached for a session, rats were given an additional block of trials (i.e., four forced and four choice trials). The maximum number of trials for each session was 80 (maximum session time was 120 min). The performance of the rats improved during training, as measured by a decrease in the number of trials needed to reach the behavioral criterion over sessions and by an increase in preference for the HR lever over sessions. These measures were used as an indication that the rats learned the decision-making task during the training period. None of the experimental manipulations affected response latency or the number of omitted trials, so these data are not shown.
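The two performance measures defined above can be written out as a short Python sketch. The function names and the `"HR"`/`"LR"` labels are hypothetical, and the sketch makes one simplifying assumption: it counts completed choice trials only, whereas the reported trials-to-criterion values may also include forced trials.

```python
def hr_preference(choices):
    """Fraction of completed choice trials made on the HR lever.

    `choices` is an ordered list of "HR"/"LR" labels, one per completed
    choice trial. Returns None if no choice trial was completed.
    """
    if not choices:
        return None
    return choices.count("HR") / len(choices)


def choice_trials_to_criterion(choices, window=12, threshold=9):
    """Number of choice trials completed when the criterion is first met.

    Criterion: at least `threshold` of the last `window` choice trials
    (9 of 12, i.e., 75%) were on the same lever. Returns None if the
    criterion is never reached in the session.
    """
    for end in range(window, len(choices) + 1):
        recent = choices[end - window:end]
        if max(recent.count("HR"), recent.count("LR")) >= threshold:
            return end
    return None


if __name__ == "__main__":
    example = ["LR"] * 3 + ["HR"] * 9
    print(hr_preference(example), choice_trials_to_criterion(example))
```

Because the criterion is defined on either lever, a rat that settles on the LR lever would also reach it; the HR preference score is what distinguishes the two cases.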
In some experiments, rats were given a preference test that was identical to the decision-making sessions except all rats received 80 trials regardless of lever selection. When the preference test occurred 1 week after the last decision-making session, rats were removed from food restriction after the last session, and food restriction commenced 3 d before the preference test.
Rats were given 10 decision-making sessions and were removed from food restriction for 5 d before receiving infusions of the hM4Di viral vector. After a 2 d recovery period, food restriction was restarted. Five days after infusion, rats were given an additional decision-making session. Rats then received two decision-making sessions with vehicle treatment 20 min before each test session, followed by two decision-making sessions with 1 mg/kg CNO treatment 20 min before each test session.
Rats were given one FR16 training session and were removed from food restriction for 2 d before receiving infusions of the hM4Di viral vector. After a 2 d recovery period, food restriction was restarted. Five days after infusion, rats received a second FR16 training session. Rats then started the decision-making training sessions and received vehicle or CNO treatment (1 mg/kg) 20 min before each session in experiments 2 and 3 and immediately after each session in experiment 4. Rats received a preference test 1 week after the last training session in experiments 2 and 4 and 24 h after the last training session in experiment 3. Rats did not receive vehicle or CNO treatment before the preference test.
The experimental design was identical to experiment 2, except the rats received viral infusions of the rM3Ds viral vector.
Group differences in trials to reach criterion and preference for the HR lever were tested using unpaired t tests or one-way or two-way repeated-measures ANOVAs when applicable, followed by Bonferroni's post hoc tests. Differences in the number of c-Fos-positive cells were tested using paired t tests. For all comparisons, α ≤ 0.05. Ten of 92 rats were excluded from analysis because three of the rats had injection sites outside of the targeted brain region (Fig. 1b), two of the rats failed to lever press during the initial training sessions, and five rats perseverated on one lever. Data are graphed as mean ± SEM.
To determine whether cAMP pathway-dependent signaling in direct-pathway neurons in the dorsomedial striatum regulates the exhibition of a preference for a previously learned action, rats were first trained to perform a reward-discrimination decision-making task (Fig. 2). During training, rats showed a significant decrease in the number of trials needed to reach criterion over sessions (Fig. 2a; main effect of session, F(9,63) = 7.34, p < 0.0001) and a significant increase in preference for the HR lever (Fig. 2b; main effect of session, F(9,63) = 5.96, p < 0.0001), indicating that they learned to select and prefer the high-benefit outcome during this training period. However, there were no differences in the number of trials needed to reach criterion nor in the preference for the HR lever when the rats were given subsequent sessions after no injection, pretreatment with vehicle, or pretreatment with CNO (1 mg/kg, i.p.) (Fig. 2a,b; main effect of session not significant, F(2,14) = 0.34 and 0.69, p = 0.71 and 0.52), suggesting that cAMP-dependent signaling in direct-pathway MSNs is not required to exhibit preference for a previously learned strategy.
Next, we examined whether increasing Gi/o-mediated signaling in these neurons affects learning of this reward-discrimination decision-making task and development of an HR preference. In this experiment, animals were pretreated with vehicle or CNO before each training session. Both vehicle- and CNO-treated rats showed a significant decrease in the number of trials needed to reach criterion over sessions (Fig. 3a; main effect of session, F(7,91) = 7.70, p < 0.0001), as well as a significant increase in preference for the HR lever over sessions (Fig. 3b; main effect of session, F(7,91) = 5.29, p < 0.0001). However, there were no significant differences between groups on these two metrics (Fig. 3a,b; no main effect of treatment, F(1,13) = 0.57 and 0.61, p = 0.47 and 0.45; and no interaction between session and treatment, F(7,91) = 1.05 and 1.28, p = 0.4 and 0.27). Although these data suggest that cAMP pathway-dependent signaling in striatonigral MSNs in the dorsomedial striatum is not necessary to acquire this decision-making behavior, a preference test conducted 1 week after the last training session in the absence of CNO treatment revealed that rats that had received CNO treatment during the training sessions required a significantly higher number of trials to reach criterion (Fig. 3a; t(13) = 3.73, p = 0.003) and exhibited a significantly reduced preference for the HR lever (Fig. 3b; t(13) = 2.96, p = 0.01) compared with rats that had received vehicle treatment during the training sessions. These group differences were apparent in the first four trial blocks (Fig. 3c; main effect of pretreatment, F(1,13) = 7.72, p = 0.02), suggesting that task information was differentially retained in the two groups. Interestingly, when the preference test was conducted 24 h after the last training session, there were no group differences in the number of trials to reach criterion (Fig. 4a; t(11) = 0.03, p = 0.97) or in preference for the HR lever (Fig. 4b; t(11) = 0.70, p = 0.50). 
In addition, when the CNO treatment was given immediately after each training session, there were no group differences during the preference test in the number of trials to reach criterion (Fig. 5a; t(14) = 0.27, p = 0.79) or in preference for the HR lever (Fig. 5b; t(14) = 0.69, p = 0.50), suggesting that the effects of CNO were unlikely to be attributable to interference with reconsolidation processes. Together, these findings indicate that increased activation of Gi/o-mediated signaling in direct-pathway striatal neurons during training disrupted stable encoding of task parameters used to improve future performance.
To further explore this process, we developed a viral vector that expresses rM3Ds receptors under the control of the DYN promoter (Fig. 6a) to determine whether increased Gs-mediated signaling in direct-pathway neurons would produce opposite effects in this reward-discrimination task. Much like analogous vectors in which we used the DYN promoter to target gene expression to striatonigral neurons (Ferguson et al., 2011), we found that HA-tagged rM3Ds receptors were primarily expressed in DYN-containing MSNs (95% of HA-positive cells were substance P positive, 157 of 166 cells; Fig. 6b). To confirm that activation of pDYN–rM3Ds produces increases in intracellular signaling activity in the striatum, as has been reported previously in rM3Ds transgenic mice (Farrell et al., 2013), we tested whether CNO could augment the ability of amphetamine to stimulate expression of the immediate early gene c-Fos, which is activated through Gs-mediated signaling events and is routinely used as a marker of neuronal activity (Morgan and Curran, 1991). We found that activation of pDYN–rM3Ds receptors significantly increased the number of amphetamine-evoked c-Fos-positive cells in dorsomedial striatum by nearly 50% compared with the pDYN–GFP controls (Fig. 6c,d; t(3) = 5.83, p = 0.01), confirming that viral expression and activation of rM3Ds does indeed increase intracellular signaling of direct-pathway neurons. There was no effect of this manipulation on amphetamine-induced c-Fos expression in dorsolateral striatum (Fig. 6c; t(3) = 0.95, p = 0.41), suggesting that pDYN–rM3Ds receptor expression was confined to the dorsomedial aspects of the striatum.
Similar to the hM4Di experiments, we found that increasing Gs signaling in direct-pathway neurons did not affect the acquisition of an HR lever preference during training, as assessed by a decrease in the number of trials to reach criterion over sessions (Fig. 7a; main effect of session, F(7,196) = 8.21, p < 0.0001; no main effect of treatment, F(1,28) = 0.65, p = 0.43; and no interaction between session and treatment, F(7,196) = 1.79, p = 0.09) and an increase in preference for the HR lever over sessions (Fig. 7b; main effect of session, F(7,196) = 7.415, p < 0.0001; interaction between session and treatment, F(7,196) = 2.13, p = 0.04; no main effect of treatment, F(1,28) = 0.76, p = 0.39). However, a preference test conducted 1 week after the last training session in the absence of CNO treatment revealed that increasing Gs-mediated signaling in direct-pathway neurons during training resulted in a significant decrease in the number of trials needed to reach criterion on the preference test (Fig. 7a; t(28) = 2.25, p = 0.03) and the development of a significantly stronger preference for the HR lever (Fig. 7b; t(28) = 3.04, p = 0.005). These data further suggest that G-protein-dependent signaling cascades in direct-pathway neurons in dorsomedial striatum are particularly important for retaining previously learned information to improve subsequent performance, and these experiments are the first to use DREADD receptor technology in rats performing a complex cognitive task. Together, these findings indicate that activation of cAMP-dependent intracellular signaling cascades in direct-pathway neurons during training of a reward-discrimination task is required to retain configural task information for future use.
In the present set of experiments, we used the recently developed DREADD receptors and phenotype-specific viral vectors to examine the contribution of direct-pathway striatonigral neurons in the dorsomedial striatum in the acquisition and stability of strategies that maximize reward obtainment. Surprisingly, we found that cAMP pathway-dependent signaling in these neurons does not regulate reward preference, task performance, or improvement of performance during training. Instead, cAMP-dependent signaling in dorsomedial striatum direct-pathway neurons is critically involved in the endurance of this improved task performance. Specifically, increasing Gi/o signaling in direct-pathway neurons during training subsequently impaired the retention of task-specific information (i.e., decreased preference for the lever paired with HRs and increased the number of trials needed to reach criterion in the task), whereas increasing Gs signaling in these same neurons enhanced future retention (i.e., increased preference for the HR lever and decreased the number of trials needed to reach task criterion). These results suggest that decision-making processes in dorsomedial striatum are under bidirectional control of canonical G-protein-mediated signaling cascades in direct-pathway neurons.
During training, subjects acquire task-specific information that can be used to more quickly identify the HR lever on subsequent renditions of the task. In our task design, the lever that yielded the larger reward was not fixed from session to session, and we found that the rats did not learn to predict the pattern of lever assignment. Thus, the altered performance in experimental groups 1 week after training was not simply related to the memory of the higher-rewarded lever or the previously assigned lever as such but rather to retention of the task rules that are used to improve subsequent performance. Because these effects emerged 1 week, but not 1 d, after the final training sessions and were not apparent when CNO treatment occurred immediately after each training session, DREADD receptor activation during training was critical for alterations in the retention of task-specific information that was used after training. These findings are unlikely to result from state-dependent learning effects, because there were no group differences in responding during the CNO-free preference test given 24 h after training and rats that were previously trained in the absence of CNO performed the same when they were subsequently treated with CNO. Because the dose of CNO used in the present experiments is cleared from blood plasma by 2 h, the drug-induced state would have worn off by the test sessions that occurred 24 h later (Guettier et al., 2009). Finally, the observed results are not attributable to changes in response speed or motivation for the task, because the experimental manipulations had no effect on response latency or on the number of omitted trials (data not shown).
Although alterations in cAMP-dependent signaling in dorsomedial striatum were sufficient to alter decision-making performance over longer periods of time, they had no effect on day-to-day improvements in task performance. However, the striatum is a heterogeneous structure consisting of the ventral striatum (i.e., nucleus accumbens) and dorsomedial and dorsolateral aspects of dorsal striatum, and these subdivisions can subserve distinct functions in decision-making processes. For example, the ventral striatum is thought to be critical for learning to predict subsequent rewards, whereas the dorsal striatum is important for preserving action values to optimize future decisions (O'Doherty et al., 2004; Atallah et al., 2007; Kahnt et al., 2009). Thus, because our DREADD manipulations only affected a discrete area within dorsomedial striatum, it is possible that normal information processing within ventral striatum was sufficient to allow acquisition during training to proceed efficiently.
The mechanism for how dorsomedial striatonigral neurons may govern the retention of task-specific information is not yet clear, but there is evidence for mnemonic functions of the striatum from lesion studies (Packard et al., 1989; McDonald and White, 1993). Our findings suggest that cAMP-dependent signaling cascades in striatonigral neurons are particularly important, because both Gi/o and Gs DREADD receptors alter neuronal signaling through changes in cAMP production (Armbruster et al., 2007; Ferguson et al., 2011; Brancaccio et al., 2013; Farrell et al., 2013). Indeed, previous work has found that cytosolic mRNA for Arc is selectively increased in striatonigral neurons after a contingency reversal, implicating this immediate early gene—which is downstream of cAMP production—in response learning (Daberkow et al., 2007). In addition, other work has shown that dopaminergic inputs from the ventral tegmental area and substantia nigra to the striatum are important for linking the outcome of an action to the predictability that it will occur in the future (Gerfen and Surmeier, 2011; Aggarwal et al., 2012), and phasic dopamine transmission in the striatum has been found to selectively encode benefit options in a decision-making task with variable benefit conditions (Gan et al., 2010). Moreover, local pharmacological manipulations of the dopamine system in the striatum can bidirectionally influence memory retention (Packard et al., 1994; Setlow and McGaugh, 1999; White and Salinas, 2003; Kabai et al., 2004). However, unlike the current work, these manipulations were effective when administered after a training session (Packard et al., 1994; Setlow and McGaugh, 1999; White and Salinas, 2003).
Activation of dopamine D1 receptors engages multiple signaling pathways, such as CREB and ERK activation, in striatonigral neurons through increases in cAMP production, whereas activation of dopamine D2 receptors leads to downregulation of these signaling cascades in striatopallidal neurons. Importantly, these cascades have been implicated in learning and decision-making processes and are known regulators of cellular excitability and synaptic plasticity (Sgambato et al., 1998; Thomas and Huganir, 2004; Pittenger et al., 2006; Shiflett and Balleine, 2011b). Finally, striatal dopamine release leads to long-term potentiation at corticostriatal synapses on direct-pathway neurons (i.e., increases in neuron excitability) and long-term depression on indirect-pathway neurons (i.e., decreases in neuron excitability) (Gerfen and Surmeier, 2011; Aggarwal et al., 2012). Thus, it is likely that cAMP-mediated signaling is critical in modulating the stability of behavioral strategies by producing longer-lasting adaptations in cell function that are not yet well understood.
Interestingly, the present set of findings shares a common theme with our previous work probing the role of G-protein-mediated signaling in direct-pathway MSNs in another enduring form of experience-dependent behavioral plasticity: psychostimulant-induced locomotor sensitization. In those experiments, we found that, although behavioral sensitization did occur when Gi/o-mediated signaling was increased in striatonigral neurons during amphetamine treatment, the persistence of this phenomenon was prevented, as evident on a challenge test given 1 week later in the absence of CNO treatment (Ferguson et al., 2011). Together, this work provides convergent evidence that direct-pathway neurons in dorsomedial striatum regulate the adaptations that sustain plasticity in behaviors that regulate complex phenomena, such as decision making, as well as in the shared behavioral dysfunctions that are seen across neuropsychiatric disorders.
This work was supported by National Institutes of Health Grants K99 DA024762 (S.M.F.), R01 MH079292 (P.E.M.P.), U19MH82441 (B.L.R.), and R21 DA021273 and R01 DA030807 (J.F.N.) and National Institute of Mental Health Psychoactive Drug Screening Program (B.L.R.). We thank Dr. Michele Kelly and Hannah DeMeritt for packaging the viral vectors and Dr. Jerylin Gan for technical assistance.
The authors declare no competing financial interests.