The ventral striatum (VS) is thought to serve as a gateway whereby associative information from the amygdala and prefrontal regions can influence motor output to guide behavior. If VS mediates this ‘limbic-motor’ interface, then one might expect neural correlates in VS to reflect this information. Specifically, neural activity should reflect the integration of motivational value with subsequent behavior. To test this prediction, we recorded from single units in VS while rats performed a choice task in which different odor cues indicated that reward was available on the left or on the right. The value of reward associated with a leftward or rightward movement was manipulated in separate blocks of trials by varying either the delay preceding reward delivery or the reward size. Rats’ behavior was influenced by the value of the expected reward and the response required to obtain it, and activity in the majority of cue-responsive VS neurons reflected the integration of these two variables. Unlike similar cue-evoked activity reported previously in dopamine neurons, these correlates were observed only if the directional response was subsequently executed. Further, activity was correlated with the speed at which the rats executed the response. These results are consistent with the notion that VS serves to integrate information about the value of an expected reward with motor output during decision-making.
The ventral striatum (VS) is thought to serve as a ‘limbic-motor’ interface (Mogenson et al., 1980). This hypothesis has largely been derived from this area’s connectivity with decision/motor-related areas including the prefrontal cortex, limbic-related areas including the hippocampus, amygdala, orbitofrontal cortex and midbrain dopamine neurons, along with its outputs to motor regions, such as ventral pallidum (Groenewegen and Russchen, 1984; Heimer et al., 1991; Brog et al., 1993; Wright and Groenewegen, 1995; Voorn et al., 2004; Gruber and O’Donnell, 2009). Through these connections the ventral striatum is thought to integrate information about the value of expected outcomes with motor information to guide motivated behavior. Consistent with this proposal, manipulations of VS impair changes in response latencies associated with different quantities of reward (Hauber et al., 2000; Giertler et al., 2003) and impact other measures of vigor, salience and arousal thought to reflect the value of expected rewards (Berridge and Robinson, 1998; Cardinal et al., 2002a; Cardinal et al., 2002b; Di Chiara, 2002; Nicola, 2007).
From these and other studies (Wadenberg et al., 1990; Ikemoto and Panksepp, 1999; Di Ciano et al., 2001; Di Chiara, 2002; Salamone and Correa, 2002; Wakabayashi et al., 2004; Yun et al., 2004; Gruber et al., 2009), it has been suggested that VS is indeed critical for motivating behavior in response to reward-predicting cues. However, there is ample contradictory evidence (Amalric and Koob, 1987; Cole and Robbins, 1989; Robbins et al., 1990; Reading and Dunnett, 1991; Reading et al., 1991; Brown and Bowman, 1995; Giertler et al., 2004) and few single-unit recordings from VS in tasks designed to address this question directly (Hassani et al., 2001; Cromwell and Schultz, 2003). Specifically, most VS studies have not varied both expected reward and response direction. Further, no studies have examined how VS neurons respond when animals decide between differently valued rewards, which is needed to assess the relationship between cue-evoked activity and the decision.
To address these issues, we recorded from single neurons in VS while rats performed a choice task for differently valued rewards (Roesch et al., 2006; Roesch et al., 2007b; Roesch et al., 2007a). On every trial, rats were either instructed to go to, or chose between, two wells (left or right) to receive reward. In different trial blocks, we manipulated the value of the expected reward by increasing either the delay to or the size of the reward (10% sucrose solution). Here we report that cue-evoked activity in VS neurons integrated the value of the expected reward and the direction of the upcoming movement. Increased firing required that the response be executed and was not observed if the reward was available but the animal chose to execute a different response. Furthermore, increased firing was correlated with the speed at which the rats executed that response. These results are consistent with the notion that VS serves to integrate information about the value of an expected reward with motor output during decision-making.
Male Long-Evans rats were obtained at 175–200 g from Charles River Labs, Wilmington, MA. Rats were tested at the University of Maryland School of Medicine in accordance with SOM and NIH guidelines.
Surgical procedures followed guidelines for aseptic technique. Electrodes were manufactured and implanted as in prior recording experiments. Rats had a drivable bundle of ten 25-µm-diameter FeNiCr wires (Stablohm 675, California Fine Wire, Grover Beach, CA) chronically implanted in the left hemisphere dorsal to VS (n = 6; 1.6 mm anterior to bregma, 1.5 mm lateral, and 4.5 mm ventral to the brain surface). Immediately prior to implantation, these wires were freshly cut with surgical scissors to extend ~1 mm beyond the cannula and electroplated with platinum (H2PtCl6, Aldrich, Milwaukee, WI) to an impedance of ~300 kΩ. Cephalexin (15 mg/kg p.o.) was administered twice daily for two weeks post-operatively to prevent infection. Rats were approximately 3 months old at the time of surgery and were individually housed on a 12-h light/dark cycle; experiments were conducted during the light phase.
Recording was conducted in aluminum chambers approximately 18″ on each side with sloping walls narrowing to an area of 12″ × 12″ at the bottom. A central odor port was located above two adjacent fluid wells on a panel in the right wall of each chamber. Two lights were located above the panel. The odor port was connected to an air-flow dilution olfactometer to allow the rapid delivery of olfactory cues. Task control was implemented via computer. Port entry and licking were monitored by disruption of photobeams. Odors were chosen from compounds obtained from International Flavors and Fragrances (New York, NY).
The basic design of a trial is illustrated in Figure 1. Trials were signaled by illumination of the panel lights inside the box. When these lights were on, a nosepoke into the odor port resulted in delivery of the odor cue to a small hemicylinder located behind this opening. One of three different odors was delivered to the port on each trial. At odor offset, the rat had 3 seconds to make a response at one of the two fluid wells located below the port. One odor (Verbena Oliffac) instructed the rat to go to the left to get reward, a second odor (Camekol DH) instructed the rat to go to the right to get reward, and a third odor (Cedryl Acet Trubek) indicated that the rat could obtain reward at either well. Odors were presented in a pseudorandom sequence such that the free-choice odor was presented on 7/20 of trials and the left and right odors were presented in equal numbers (±1 over 250 trials). In addition, the same odor could be presented on no more than 3 consecutive trials. Odor identity did not change over the course of the experiment.
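The sequence constraints above (free-choice share of 7/20, left/right balance within ±1, at most 3 consecutive repeats) can be met by many generators; the text does not say which was used. As a minimal sketch, assuming a greedy draw weighted by remaining counts with a restart if it gets stuck (our assumption, not the authors' procedure), the hypothetical `make_odor_sequence` below produces a sequence satisfying all three constraints.

```python
import random

ODORS = ("left", "right", "free")


def make_odor_sequence(n_trials=250, free_num=7, free_den=20, max_run=3, seed=None):
    """Hypothetical generator for a pseudorandom odor sequence.

    Constraints taken from the task description:
      * the free-choice odor fills ~free_num/free_den of trials,
      * left and right forced-choice odors differ in count by at most 1,
      * no odor appears on more than max_run consecutive trials.
    The greedy weighted draw with restart is an assumption.
    """
    rng = random.Random(seed)
    n_free = (n_trials * free_num + free_den // 2) // free_den  # rounded share
    n_forced = n_trials - n_free
    template = {"free": n_free, "left": n_forced // 2, "right": n_forced - n_forced // 2}
    while True:  # restart if the draw paints itself into a corner
        remaining = dict(template)
        seq = []
        while len(seq) < n_trials:
            # block the odor that would extend a run past max_run
            run_full = len(seq) >= max_run and len(set(seq[-max_run:])) == 1
            blocked = seq[-1] if run_full else None
            options = [o for o in ODORS if remaining[o] > 0 and o != blocked]
            if not options:
                break
            # weight by remaining counts so the totals stay balanced
            odor = rng.choices(options, weights=[remaining[o] for o in options])[0]
            seq.append(odor)
            remaining[odor] -= 1
        if len(seq) == n_trials:
            return seq
```

Fixing the counts up front and drawing without replacement guarantees the proportions exactly, so only the run-length rule has to be enforced during the draw.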
Once the rats were shaped to perform this basic task, we introduced blocks in which we independently manipulated the size of the reward delivered at a given side and the length of the delay preceding reward delivery. Once the rats were able to maintain accurate responding through these manipulations, we began recording sessions. For recording, one well was randomly designated as short (500 ms) and the other as long (1–7 s) at the start of the session (Figure 1A: block 1). Rats were required to wait in the well in order to receive reward. In the second block of trials these contingencies were switched (Figure 1A: block 2). The length of the delay under long conditions was set by the following algorithm. The delay on the side designated as long started at 1 s and increased by 1 s every time that side was chosen until it reached 3 s. If the rat continued to choose that side, the delay increased by 1 s up to a maximum of 7 s. If the rat chose the side designated as long on fewer than 8 of the last 10 choice trials, the delay was reduced by 1 s to a minimum of 3 s. The reward delay for long forced-choice trials was yoked to the delay in free-choice trials during these blocks. In later blocks we held the delay preceding reward delivery constant (500 ms) while manipulating the size of the expected reward (Figure 1A). The reward was a 0.05 ml bolus of 10% sucrose solution. For big reward, an additional bolus was delivered after 500 ms. At least 60 trials per block were collected for each neuron. Rats were mildly water deprived (~30 min of free water per day) with free access on weekends.
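The long-delay titration rule above can be sketched as a small state machine. This is a hedged reading of the text, not the task-control code: the class name `LongDelaySchedule` is hypothetical, and two details are our assumptions, namely that the 8-of-10 check applies only once 10 choices have accumulated and that it takes priority over the increment on the same trial.

```python
from collections import deque


class LongDelaySchedule:
    """Hypothetical sketch of the long-delay titration rule (delays in s).

    Assumptions: updates happen on choice trials only; the 8-of-10 rule is
    checked only once the 10-trial window is full, and it takes priority
    over the increment.
    """

    def __init__(self, start=1, floor=3, ceiling=7, window=10, min_long=8):
        self.delay = start
        self.floor, self.ceiling, self.min_long = floor, ceiling, min_long
        self.history = deque(maxlen=window)  # True = long side chosen

    def update(self, chose_long):
        """Record one choice and return the delay for the next long trial."""
        self.history.append(chose_long)
        if self.delay < self.floor:
            # initial ramp: each long choice adds 1 s until the floor is reached
            if chose_long:
                self.delay += 1
        elif len(self.history) == self.history.maxlen and sum(self.history) < self.min_long:
            # long side chosen on fewer than 8 of the last 10 choices: shorten
            self.delay = max(self.delay - 1, self.floor)
        elif chose_long:
            # continued choice of the long side: lengthen up to the ceiling
            self.delay = min(self.delay + 1, self.ceiling)
        return self.delay
```

For example, ten consecutive long choices drive the delay from 1 s to the 7 s ceiling; once the long side falls below 8 of the last 10 choices, each further short choice shortens it by 1 s until it settles at 3 s.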
Procedures were the same as described previously (Roesch et al., 2006; Roesch et al., 2007a). Wires were screened for activity daily; if no activity was detected, the rat was removed, and the electrode assembly was advanced 40 or 80 µm. Otherwise, active wires were selected to be recorded, a session was conducted, and the electrode was advanced at the end of the session. Neural activity was recorded using two identical Plexon Multichannel Acquisition Processor systems (Dallas, TX), interfaced with odor discrimination training chambers. Signals from the electrode wires were amplified 20X by an op-amp headstage (Plexon Inc, HST/8o50-G20-GR), located on the electrode array. Immediately outside the training chamber, the signals were passed through a differential pre-amplifier (Plexon Inc, PBX2/16sp-r-G50/16fp-G50), where the single unit signals were amplified 50X and filtered at 150–9000 Hz. The single unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250–8000 Hz, digitized at 40 kHz and amplified 1–32X. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk by an associated workstation along with event timestamps from the behavior computer. Waveforms were not inverted before data analysis.
Units were sorted using Offline Sorter software from Plexon Inc (Dallas, TX), using a template-matching algorithm. Sorted files were then processed in NeuroExplorer to extract unit timestamps and relevant event markers. These data were subsequently analyzed in MATLAB (MathWorks, Natick, MA). To examine activity related to the decision, we analyzed firing from odor onset to odor port exit. Wilcoxon tests were used to measure significant shifts from zero in distribution plots (p < 0.05). T-tests or ANOVAs were used to measure within-cell differences in firing rate (p < 0.05). Pearson chi-square tests (p < 0.05) were used to compare the proportions of neurons.
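Per-trial firing rates in an event-defined epoch (here odor onset to odor port exit, and the baseline used in the Results, 1 s before nosepoke) reduce to counting timestamps between two event times. The sketch below is a hedged Python illustration, not the authors' MATLAB code; the function names and the half-open `[start, stop)` convention are assumptions.

```python
from bisect import bisect_left


def epoch_rate(spike_times, start, stop):
    """Firing rate (spikes/s) in [start, stop) from sorted spike timestamps (s)."""
    if stop <= start:
        raise ValueError("epoch must have positive duration")
    # two binary searches count the spikes inside the window
    n = bisect_left(spike_times, stop) - bisect_left(spike_times, start)
    return n / (stop - start)


def cue_and_baseline(spike_times, nosepoke, odor_onset, port_exit, baseline_len=1.0):
    """Hypothetical helper: (baseline rate, cue-sampling rate) for one trial."""
    baseline = epoch_rate(spike_times, nosepoke - baseline_len, nosepoke)
    cue = epoch_rate(spike_times, odor_onset, port_exit)
    return baseline, cue
```

Because the cue epoch ends at port exit, its duration varies from trial to trial; dividing by the epoch length puts trials with different reaction times on a common spikes-per-second scale.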
Rats were trained on a choice task illustrated in Figure 1A (Roesch et al., 2006; Roesch et al., 2007a). On each trial, rats responded at one of two adjacent wells after sampling an odor at a central port. Rats were trained to respond to three different odor cues: one odor that signaled reward in the right well (forced-choice), a second odor that signaled reward in the left well (forced-choice), and a third odor that signaled reward at either well (free-choice). Across blocks of trials we manipulated value by increasing either the length of the delay preceding reward delivery (Fig. 1A; Blocks 1–2) or the number of rewards delivered (Fig. 1A; Blocks 3–4). In total, there were four types of reward (short-delay, long-delay, big-reward and small-reward) and two response directions (left and right), resulting in eight conditions.
Rats’ behavior on both free- and forced-choice trials reflected manipulations of value. On free-choice trials, rats chose shorter delays and larger rewards over their respective counterparts (t-test; d.f. = 119; t’s > 16; p’s < 0.0001). Likewise, on forced-choice trials rats were faster and more accurate when responding for a more immediate or larger reward (t-test; d.f. = 119; t’s > 9; p’s < 0.0001). Thus rats perceived the differently delayed and sized rewards as having different values and were more motivated under short-delay and big-reward conditions than under long-delay and small-reward conditions, respectively.
We recorded 257 VS neurons across 75 sessions in 6 rats during performance of all four trial blocks. Recording locations are illustrated in Figure 2F. Because forced-choice trials provide a balanced neural dataset with equal numbers of responses to each well, we first address our hypothesis by analyzing data from these trials. That is, we ask whether neural activity in VS neurons reflects the value and direction of responding across blocks, particularly after learning (last 10 trials in each direction).
As has been reported previously (Carelli and Deadwyler, 1994; Nicola et al., 2004; Taha and Fields, 2006; Robinson and Carelli, 2008), many VS neurons were excited (n = 44; 17%) or inhibited (n = 76; 30%) during cue sampling (odor onset to port exit) relative to baseline (1 s before nosepoke; t-test comparing baseline to cue sampling over all trials collapsed across condition; p < 0.05). An example of the former is illustrated in Figure 2A–D. Consistent with the hypothesis put forth in the introduction, activity of this neuron reflected the integration of associative information about the value of the reward predicted by the cue and the subsequent response. Thus cue-evoked activity on forced-choice trials after learning was strongest for the cue that indicated reward in the left well, and this neural response was highest when the value predicted for that well was high (on short-delay and big-reward trials). To quantify this effect, we performed a two-factor ANOVA with value and direction as factors during the last 10 forced-choice trials in each block (p < 0.05). Of the 44 cue-responsive neurons, 21 (47%) showed a similar significant interaction between direction and value. This count was significantly above chance given our threshold for statistical significance in our unit analysis (chi-square; p < 0.0001), and there was no directional bias to the left or right across the population (Fig. 2E; p = 0.98). By contrast, and in keeping with the most rigorous form of the hypothesis that VS integrates value and direction information, only 5 (11%) showed a main effect of direction alone and only 3 (7%) showed a main effect of value alone (Figure 2E; ANOVA; p < 0.05); these counts did not exceed chance (chi-square; p’s > 0.05).
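One way to ask whether a count of significant neurons exceeds chance is a 1-d.f. chi-square goodness-of-fit test against the false-positive rate implied by the p < 0.05 threshold. The text does not give the exact form of its chi-square test, so the sketch below is an assumption; the function name is hypothetical. It uses the fact that the survival function of a 1-d.f. chi-square variable is `erfc(sqrt(x / 2))`.

```python
import math


def chance_chi_square(n_significant, n_total, alpha=0.05):
    """Hypothetical 1-d.f. chi-square goodness-of-fit test of an observed
    count of significant neurons against the count expected from
    false positives alone (n_total * alpha)."""
    expected_sig = n_total * alpha
    expected_non = n_total - expected_sig
    observed_non = n_total - n_significant
    chi2 = ((n_significant - expected_sig) ** 2 / expected_sig
            + (observed_non - expected_non) ** 2 / expected_non)
    # P(X > chi2) for a chi-square variable with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

Applied to the counts above (21, 5 and 3 of 44 neurons), this form gives p < 0.0001 for the interaction count and p > 0.05 for both main-effect counts. With an expected count of only 2.2 neurons, an exact binomial test would be a reasonable, more conservative alternative.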
The overall effect is illustrated in Figure 3, which plots the average activity across all cue-responsive neurons on forced-choice trials during the last ten trials for all eight conditions. For each cell, direction was referenced to its preferred response before averaging; thus, by definition, activity was higher in the preferred direction (left column). Like the single-cell example, population activity during cue sampling was stronger in the preferred direction when value was high. That is, activity was stronger prior to a response in the cell’s preferred direction (left column) when the expected outcome was either a short delay (blue) or a large reward (green) compared to a long delay (red) or a small reward (orange), respectively. Notably, although activity in these populations began to increase upon entry into the odor port, the difference in firing was present only during actual delivery of the odor (gray shading in Figure 3).
Distributions of delay and size indices for each neuron, defined as the difference between firing for high- and low-value outcomes divided by their sum, (high − low)/(high + low), are illustrated for each direction (preferred and nonpreferred) during the odor epoch (odor onset to port exit) in Figure 3E–F. Only when value was manipulated in the cell’s preferred direction was the index significantly shifted above zero, indicating higher firing rates for more valued outcomes (Wilcoxon; μ = 0.134; z = 3.56; p < 0.001). Cases in which neurons exhibited stronger firing for high-value reward (n = 16 [18%]) outnumbered those showing the opposite effect (n = 4 [5%]; chi-square; p < 0.008). Neither the shift in the distribution nor the difference in the number of cases in which activity was stronger for high or low value achieved significance in the nonpreferred direction (Fig. 3F; p’s > 0.4).
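The value index and the test for a shift of its distribution from zero can be sketched as follows. This is a hedged stdlib Python illustration: the Wilcoxon signed-rank test here uses the large-sample normal approximation, drops zeros and averages tied ranks without a variance tie correction, which may differ from the implementation the authors used. In practice `scipy.stats.wilcoxon` provides this test. Function names are hypothetical.

```python
import math


def value_index(high, low):
    """(high - low) / (high + low); positive when firing is stronger for the
    higher-valued outcome. Returns 0.0 when both rates are zero (assumption)."""
    total = high + low
    return 0.0 if total == 0 else (high - low) / total


def wilcoxon_signed_rank(values):
    """Two-sided Wilcoxon signed-rank test against zero (normal approximation).

    Zeros are dropped; tied |values| receive their average rank.
    Returns (z, p).
    """
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p
```

Normalizing by the sum bounds each index between −1 and +1, so neurons with very different baseline rates contribute comparably to the population distribution.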
VS is thought to motivate or invigorate behavior (Robbins and Everitt, 1996; Cardinal et al., 2002a). If the neural signal integrating value and directional response, identified above, relates to that function, then this activity should be correlated with the motivational differences between high- and low-value reward in our task. To address this question, we next examined the relationship between neural activity and reaction time (the speed with which the rat decided to move and exited the odor port). In previous sections, we showed that reaction times were shorter and activity was stronger (Fig. 3) when more valued reward (short-delay and big-reward) was at stake. To ask whether the two were correlated, we plotted a neural activity index, (high − low)/(high + low), against a reaction time index, (high − low)/(high + low), separately for the preferred and nonpreferred directions. There was a significant negative correlation between the two in the neuron’s preferred direction (Fig. 3G; p < 0.001, r² = 0.150). This relationship was not evident in the nonpreferred direction (Fig. 3H; p = 0.361; r² = 0.010).
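The correlation described above can be sketched by computing both contrasts per neuron and then a Pearson correlation across neurons. This is a minimal stdlib Python illustration with hypothetical names and made-up toy numbers; it returns r and r² only. For a p-value, `scipy.stats.pearsonr` returns both r and p.

```python
import math
from statistics import fmean


def contrast(high, low):
    """(high - low) / (high + low) for a pair of positive measures."""
    return (high - low) / (high + low)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def firing_vs_rt(neurons):
    """neurons: (rate_high, rate_low, rt_high, rt_low) tuples for one direction.
    Returns (r, r_squared) between the firing and reaction-time contrasts."""
    fr = [contrast(rh, rl) for rh, rl, _, _ in neurons]
    rt = [contrast(th, tl) for _, _, th, tl in neurons]
    r = pearson_r(fr, rt)
    return r, r * r
```

Because faster responding on high-value trials makes the reaction time contrast negative while stronger high-value firing makes the firing contrast positive, the hypothesis predicts a negative r, as reported for the preferred direction.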
To examine this phenomenon more closely, we divided sessions into those with a strong versus a weak motivational difference between high- and low-value reward. Given the correlation described above, we would expect activity to be stronger for higher-valued reward in sessions in which rats showed a strong difference between high- and low-value outcomes. To test this, we sorted sessions based on each rat’s reaction time difference between low- and high-value trial types (small minus big; long minus short). In the top half of the distribution, the average reaction times on high- and low-value trials were 156 ms and 285 ms, respectively (t-test; d.f. = 43; t = 17; p < 0.0001), whereas in the lower half, reaction times on high- and low-value trials were 207 ms and 234 ms, respectively (t-test; d.f. = 43; t = 5; p < 0.01). Although both halves exhibited significant differences between high- and low-value outcomes, the differences were significantly larger in the top half (t-test; d.f. = 43; t = 17; p < 0.0001).
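The session split amounts to ranking sessions by their low-minus-high reaction time difference and cutting the ranked list in half. A hedged sketch, assuming each session has already been reduced to one mean reaction time for low- and one for high-value trials (how delay and size blocks were pooled is not specified in the text); the dictionary keys and function names are hypothetical.

```python
from statistics import fmean


def split_sessions(sessions):
    """Hypothetical median split of sessions by reaction-time difference.

    sessions: dicts with mean reaction times (ms) on low- and high-value
    trials, e.g. {"id": "s1", "rt_low": 285, "rt_high": 156}.
    Returns (strong, weak), the half with the largest low-minus-high
    difference first; with an odd count the extra session goes to `weak`.
    """
    ranked = sorted(sessions, key=lambda s: s["rt_low"] - s["rt_high"], reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def mean_rts(group):
    """(mean high-value RT, mean low-value RT) across a group of sessions."""
    return fmean(s["rt_high"] for s in group), fmean(s["rt_low"] for s in group)
```

Using the raw difference (rather than a ratio) ranks sessions by how many milliseconds of speed-up the high-value reward produced.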
Remarkably, the neural signal identified above was evident only in sessions in which the rats were more strongly invigorated by high-value reward (Fig. 4A–D). This is illustrated in both delay and size blocks by higher firing rates during odor sampling for short-delay (blue) and big-reward (green) conditions than for long-delay (red) and small-reward (orange) conditions, respectively. Value index distributions were significantly shifted above zero in the preferred direction in these sessions (Fig. 4E; Wilcoxon; μ = 0.188; z = 3; p < 0.002). In sessions in which rats were less sensitive to the outcome (Fig. 4G–L), there was only a modest, nonsignificant difference in activity in the preferred direction (Wilcoxon; μ = 0.080, z = 1; p = 0.138).
Notably, the differences between sessions with strong and weak reaction time differences did not seem to reflect satiation, which has been shown to lead to slower overall reaction times (Holland and Straub, 1979; Sage and Knowlton, 2000). Overall speed of responding was not significantly different between sessions with strong and weak reaction time differences (220 ms vs. 221 ms; t-test; d.f. = 43; t = 0.02; p = 0.986), and value correlates were no more likely to be observed early in a session than late. The number of cells exhibiting value selectivity during the first two blocks of a session did not differ significantly from the number observed during the last two blocks (12 neurons or 27% vs. 11 neurons or 25%; chi-square; p = 0.85).
Rats also appeared to learn the contingencies similarly in the two session types; rats chose the more valuable reward on 69% of trials (strong = 69.1%; weak = 69.4%; t-test; d.f. = 43; t = 0.2; p = 0.852). This indicates that latency differences did not reflect a learning effect. Taken together, these data suggest that differences in reaction time did not result from satiation or insufficient learning. Instead, when rats were goal-oriented and strongly motivated by differences in expected value, activity in VS clearly reflected the animals’ motivational output.
Up to this point we have analyzed only forced-choice trials, in which odors instruct rats to respond to the left or the right well. We have assumed that this directional selectivity reflects the impending movement; however, directional selectivity might instead represent the identity of the odor, irrespective of whether or not that response is executed. This is because on forced-choice trials the odor and movement direction are confounded: one odor means go right and the other means go left.
To address this issue, we compared activity on forced-choice trials with that on free-choice trials. This comparison can resolve the confound because on free-choice trials a different odor from those used on forced-choice trials indicated the freedom to choose either direction (i.e., reward was available on each side). Moreover, as illustrated in Figure 1C, rats chose the lower-value direction on a significant number of free-choice trials. Thus, by comparing firing on free- and forced-choice trials, we can disambiguate odor from movement selectivity. If the directional signal identified on forced-choice trials reflects only the impending movement, then it should be identical on free- and forced-choice trials, provided the rat makes the same response. On the other hand, if the signal differs on free- and forced-choice trials when the rat makes the same response, then this would suggest that the proposed directional selectivity incorporates information about the sensory features of the odor.
For this analysis, we included all trials after learning (> 50% choice performance) and collapsed across delay and size blocks. This procedure allowed us to increase our sample of low-value free-choice trials, which were sparse at the end of trial blocks. To further control for any differences that might arise during learning (on free-choice trials rats typically chose low-value outcomes early in the block, whereas on forced-choice trials they were forced to choose low-value outcomes throughout the entire block), we paired each free-choice trial with the immediately preceding and following forced-choice trials of the same value.
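The pairing control described above can be sketched as a nearest-neighbour search in each direction along the trial sequence. A hedged Python sketch: the trial representation and function name are hypothetical, matching is on the value label only (within a block this also fixes the response direction), and skipping free-choice trials that lack a neighbour on either side is our assumption.

```python
def pair_free_with_forced(trials):
    """Hypothetical pairing of each free-choice trial with its nearest
    preceding and following forced-choice trials of the same value.

    trials: list of dicts such as {"type": "free", "value": "high"}.
    Returns (free_idx, prev_forced_idx, next_forced_idx) tuples; free trials
    without a matching neighbour on either side are skipped (assumption).
    """
    def same_value_forced(j, value):
        return trials[j]["type"] == "forced" and trials[j]["value"] == value

    pairs = []
    for i, trial in enumerate(trials):
        if trial["type"] != "free":
            continue
        v = trial["value"]
        prev_idx = next((j for j in range(i - 1, -1, -1) if same_value_forced(j, v)), None)
        next_idx = next((j for j in range(i + 1, len(trials)) if same_value_forced(j, v)), None)
        if prev_idx is not None and next_idx is not None:
            pairs.append((i, prev_idx, next_idx))
    return pairs
```

Because each free-choice trial is compared with forced-choice trials drawn from the same point in the session, slow drifts across the block affect both sides of the comparison equally.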
The results of this analysis are illustrated in Figure 5. Figures 5A and 5B show the average activity over all neurons that showed a significant interaction between direction and value when rats responded in the cell’s preferred (solid) and nonpreferred (dashed) direction for high- (black) and low-value outcomes (gray) during forced- and free-choice trials, respectively. As described previously, cue-evoked activity on forced-choice trials was stronger for high-value outcomes, but only in one direction (Fig. 5A). Activity during free-choice trials showed exactly the same pattern. Thus firing was higher on free-choice trials, but only when the rat chose the high-value outcome and only when that outcome was in a particular direction (Fig. 5B). This is quantified in Figure 5C, which plots the difference between the cell’s preferred outcome/response (e.g., high-value-left) and nonpreferred outcome/response (e.g., low-value-right) on forced-choice trials (x-axis) against the same calculation from free-choice trials (y-axis). By definition, values are all shifted above zero on the x-axis, since firing in these neurons was always higher for the preferred outcome/response on forced-choice trials. Importantly, values were also shifted above zero on free-choice trials (y-axis; Wilcoxon; μ = 0.2879; z = 4; p < 0.001). This indicates that neural activity was similar for a particular value and response, even though the two trial types (free and forced) involved different odors (Fig. 5C). This pattern suggests that neural signals in VS neurons reflect the value of a particular yet-to-be-executed motor response and are not cue-specific. It also indicates that signaling in VS reflects the value of the response that is going to be executed, since firing differed on free-choice trials when different responses were made, even though the high-value reward was always available to be selected.
Lesions or other manipulations of VS make animals more likely to abandon a larger, delayed or higher-cost reward in favor of a smaller, more immediate or lower-cost reward (Cousins et al., 1996; Cardinal et al., 2001; Cardinal et al., 2004; Winstanley et al., 2004; Bezzina et al., 2007; Floresco et al., 2008; Kalenscher and Pennartz, 2008). These studies suggest that VS may be important for maintaining information about reward after the decision has been made. Consistent with this, we found that activity in the cue-responsive VS neurons described above was also elevated during the delay in our task, especially on correct trials. This is apparent in Figures 3 and 5, which show that activity was higher after the response in the cell’s preferred direction on long-delay (red) compared to short-delay trials (blue). To quantify this effect, Figure 6A–B plots the distribution of delay indices, (short − long)/(short + long), during the 3 seconds (the minimum delay after learning) after the behavioral response in the cell’s preferred and nonpreferred direction. Delay indices were shifted significantly below zero, indicating higher firing after responding to the delayed well (Wilcoxon; μ = −0.158; z = 2.2; p < 0.024), and the number of neurons exhibiting this pattern (n = 24 [55%]) significantly outnumbered those showing the opposite effect (n = 6 [14%]). Notably, the increased firing after responding to the delayed well always preceded reward, since it occurred before the minimum delay after learning (3 s).
Interestingly, the difference in firing between short- and long-delay trials after the behavioral response was also correlated with reaction time (Fig. 6C; p < 0.005; r² = 0.190). However, the direction of this correlation was opposite to that between reaction times and cue-evoked activity described earlier. Thus slower responding on long-delay trials was associated with stronger firing after well entry and prior to reward delivery. If activity in VS during decision making reflects motivation, as we have suggested, then activity during this period may reflect the exertion of increased will to remain in the well to receive reward, or the expectation of reward, rather than other variables such as disappointment. Perhaps loss of this signal after lesions or inactivation of VS reduces the rat’s capacity to maintain motivation toward the delayed reward. If so, elevated VS firing during the delay may be necessary to keep the rat waiting in the well for reward. Unfortunately, there were too few trials in which the rat left the fluid well prematurely to test this hypothesis.
Finally, we asked whether the 76 neurons (30% of total neurons recorded) that were inhibited during odor sampling reflected motivational value. Inhibitions in VS activity during performance of behavioral tasks have been described previously (Carelli and Deadwyler, 1994; Nicola et al., 2004; Taha and Fields, 2006; Robinson and Carelli, 2008) and might reflect the inhibition of inappropriate behaviors during task performance (e.g., leaving the odor port or fluid well early), which might be more critical when a better reward is at stake. Here we address whether these neurons were modulated by expected reward value.
The average firing rates over these neurons are illustrated in Figure 7A–B. As defined in the analysis, activity was inhibited during odor sampling. As the rat moved down to the well, activity briefly returned to baseline, then quickly returned to an inhibited state upon entering the well, and subsequently returned to baseline upon well exit. As for excitatory neurons, we asked whether the motivational level of the animal modulated firing in these neurons. Distributions of value indices were not significantly shifted from zero (Fig. 7E–F; Wilcoxon; z’s < 2; p’s > 0.082), and roughly equal numbers of neurons fired more strongly and more weakly for high-value reward (Fig. 7E–F; black bars). Furthermore, activity in these neurons was not correlated with reaction time. Thus, unlike the excitations, the inhibitions observed during task performance were not modulated by value.
Previously, in rats performing this same task, we have shown that dopamine neurons fire more strongly at the beginning of trial blocks when an unexpected reward was delivered and less strongly in trial blocks when an expected reward was omitted (Roesch et al., 2007a). Such activity is thought to represent bidirectional prediction error encoding.
Of the 257 VS neurons recorded, activity in 41 was responsive to reward delivery (t-test comparing baseline to reward delivery (1 s) over all trials collapsed across condition; p < 0.05). Of those, 12 were also cue-responsive as defined above. Analysis of prediction errors revealed that few VS neurons appeared to signal errors in reward prediction. For example, the single cell illustrated in Figure 8A fired more strongly when reward was delivered unexpectedly; firing was maximal immediately after a new reward was instituted and diminished with learning. However, this example was the exception rather than the rule. This is illustrated across the population in Figure 8B–C, which shows the contrast in activity (early vs. late) for all of the reward-responsive VS neurons (n = 41). This contrast is plotted separately for blocks involving unexpected delivery and omission of reward. Neither distribution was shifted significantly above zero, indicating no difference in firing early, after a change in reward, compared to later, after learning (Figure 8B–C; Wilcoxon; z’s < 2; p’s > 0.2610).
Here we show that single neurons in VS integrate information regarding value and impending response during decision making, and that this activity is related to the motivational level associated with responding in a given direction. Cues predicting high-value outcomes had a profound impact on behavior, decreasing reaction time and increasing accuracy. This behavioral effect was correlated with integration of value and impending response during cue sampling in VS neurons. This result is broadly consistent with proposals that VS acts as a ‘limbic-motor’ interface (Mogenson et al., 1980) and with a number of recent reports showing that VS signals information about impending outcomes at the time a decision is made (Carelli, 2002; Setlow et al., 2003; Janak et al., 2004; Nicola, 2007; Ito and Doya, 2009; van der Meer and Redish, 2009).
Although these results are correlational in nature, they are in agreement with results from several studies in which pharmacological methods were used to show a more causal relationship between VS function and behavior (Berridge and Robinson, 1998; Cardinal et al., 2002a; Nicola, 2007). One set of studies in particular examined the impact of several different VS manipulations on rats’ latencies to respond for different quantities of reward (Hauber et al., 2000; Giertler et al., 2003). In this simple reaction time task, discriminative stimuli presented early in each trial predicted the magnitude of the upcoming reward. As in our task, rats were faster to respond when reward was larger. Manipulations of glutamate and dopamine transmission in VS disrupted changes in the speed of responding to stimuli predictive of the upcoming reward magnitude. This is consistent with correlations between reaction time and firing in VS reported above.
Interestingly, lesions or inactivation of VS have been reported to have no impact on these latency measures (Brown and Bowman, 1995; Giertler et al., 2004), suggesting that when VS is completely disrupted, other areas can motivate behavioral output. This may explain why, in some sessions of the current study, VS activity was not selective for the upcoming reward, yet a weak difference in response latencies remained. Notably, rats continued to choose the more preferred outcome during free-choice trials, consistent with reports that VS is not required for choosing a large over a small reward (Cousins et al., 1996).
Our results also suggest that VS may play multiple, potentially conflicting roles in delay discounting tasks. On one hand, activity during the decision was higher before an immediate reward and seemed to invigorate behavior toward the more valued reward. On the other hand, once a decision to respond for the delayed reward had been made, activity in VS neurons increased, as if maintaining a representation of the anticipated reward. Most of the delay discounting literature suggests that the latter function is the important one; lesions or other manipulations of VS make animals more likely to abandon a larger, delayed reward in favor of a smaller, more immediate reward (Cousins et al., 1996; Cardinal et al., 2001; Cardinal et al., 2004; Winstanley et al., 2004; Bezzina et al., 2007; Floresco et al., 2008; Kalenscher and Pennartz, 2008). However, we speculate that different training procedures might change the relative contributions of these two functions. For example, if animals were highly trained to reverse behaviors based on discounted reward (as in the recording setting used here), they might be less reliant on VS to maintain the value of the discounted reward. In this situation, the primary effect of VS manipulations might be to reduce the elevated motivation elicited by cues predicting more immediate reward.
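To make the trade-off between a larger, delayed reward and a smaller, immediate one concrete, the sketch below uses the hyperbolic discounting model V = A / (1 + kD), a standard description in the delay discounting literature that is assumed here rather than fitted to these data. Treating the effect of a VS manipulation as an increase in the discount parameter k is likewise an illustrative assumption; the amounts, delays and k values are hypothetical.

```python
def hyperbolic_value(amount, delay_s, k):
    """Assumed hyperbolic discounting: V = A / (1 + k * D).

    amount: reward size (arbitrary units); delay_s: delay in seconds;
    k: discount rate (larger k = steeper discounting).
    """
    if k < 0 or delay_s < 0:
        raise ValueError("k and delay must be non-negative")
    return amount / (1.0 + k * delay_s)


def prefers_delayed(large, delay_large, small, delay_small, k):
    """True if the larger, later reward has the higher discounted value."""
    return hyperbolic_value(large, delay_large, k) > hyperbolic_value(small, delay_small, k)


# Hypothetical choice: 2 units after 4 s versus 1 unit immediately.
patient = prefers_delayed(2.0, 4.0, 1.0, 0.0, k=0.1)    # 2 / 1.4 vs 1
impulsive = prefers_delayed(2.0, 4.0, 1.0, 0.0, k=0.5)  # 2 / 3.0 vs 1
```

With k = 0.1 the delayed option is worth about 1.43 and is preferred; with k = 0.5 it is worth about 0.67 and the immediate option wins, which is the kind of shift toward the smaller, sooner reward that the lesion literature describes.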
Another notable aspect of these data is that VS neurons integrated information regarding value (size and delay) and response during both forced- and free-choice behavior. Anticipation of differently valued rewards has previously been shown to affect firing in other regions of striatum. For example, many neurons in oculomotor regions of the caudate (dorsomedial striatum) encode both direction and motivational value and are thought to be critical in the development of response biases toward desired goals (Lauwereyns et al., 2002). These data differ from our results in several ways. First, neurons in caudate typically exhibit a contralateral bias, firing more strongly for saccades made in the direction opposite to the recording hemisphere. In VS, roughly equal numbers of neurons preferred leftward and rightward movement. These results are consistent with deficits observed after pharmacological manipulations of these areas (Carli et al., 1989). Second, activity in many caudate neurons has been reported to reflect available movement-reward associations even when the relevant response is not subsequently executed (Lauwereyns et al., 2002; Samejima et al., 2005; Lau and Glimcher, 2008). Such “action-value” or “response-bias” correlates were not present in VS. In this respect, our results are consistent with recent findings by Ito and Doya, which showed that representations of action-value are less dominant in rat VS than other types of information (Ito and Doya, 2009). Thus, while activity in dorsal striatum (DS) may be critical in representing the value of available actions (behaviorally independent action-value), activity in VS seems to be more closely tuned to representing the value of the upcoming response (behaviorally dependent action-value). Such activity may reflect an “action-specific reward value” (Samejima et al., 2005), because it reflects the value of only one of the two actions.
Practically speaking, such a representation could invigorate or motivate a specific behavior (left or right) through downstream motor areas via some form of winner-take-all mechanism (Pennartz et al., 1994; Redgrave et al., 1999; Nicola, 2007; Taha et al., 2007).
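The text leaves the downstream winner-take-all mechanism unspecified. As one hypothetical illustration, not the circuit proposed by the cited authors, two rectified rate units with mutual inhibition settle so that the unit receiving the stronger drive stays active and silences the other whenever the inhibitory weight exceeds 1. Here the drives stand in for value-weighted VS output favoring a left or right response; `w_inhib`, `dt` and `steps` are arbitrary choices.

```python
def winner_take_all(drive_left, drive_right, w_inhib=2.0, dt=0.1, steps=500):
    """Two rectified rate units with mutual inhibition (hypothetical parameters).

    dr_i/dt = -r_i + drive_i - w_inhib * r_j, with rates clipped at zero.
    For w_inhib > 1 the symmetric state is unstable, so the unit with the
    larger drive wins. Returns the final (r_left, r_right).
    """
    r_left = r_right = 0.0
    for _ in range(steps):
        # Compute both derivatives before updating (simultaneous Euler step).
        d_left = -r_left + drive_left - w_inhib * r_right
        d_right = -r_right + drive_right - w_inhib * r_left
        r_left = max(0.0, r_left + dt * d_left)
        r_right = max(0.0, r_right + dt * d_right)
    return r_left, r_right


# Slightly stronger drive for the leftward response.
r_l, r_r = winner_take_all(1.2, 1.0)
```

With drives of 1.2 and 1.0, the left unit converges to 1.2 and the right unit is silenced. With identical drives and no noise the two units stay exactly symmetric, so a realistic mechanism would need noise or a bias to break ties.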
Another possibility is that the correlates observed in VS incorporate information about the expected outcome itself. Such representations would allow behavior to change spontaneously in response to changes in the value of the outcome. This information might be acquired through inputs from orbitofrontal cortex or basolateral amygdala, both of which project to VS and are implicated in signaling information about expected outcomes (Hatfield et al., 1996; Schoenbaum et al., 1998; Gallagher et al., 1999; Gottfried et al., 2003; Ambroggi et al., 2008). Interestingly, data regarding the role of VS in these behavioral settings are sparse and often contradictory. Our own results are also somewhat ambiguous on this point: because we recorded during presentation of the differently valued outcomes (i.e., during learning), we cannot distinguish signaling of such outcome representations from cached estimates of response value.
Critically, such firing cannot represent “cue-value” because the signal integrating value and impending response in VS neurons is not present when the rats choose to respond in the opposite direction. Moreover, we have shown previously that responses to the low-value well on these trials are not mistakes; the rats’ response latencies on these trials indicate that they know they are responding for the less valuable outcome (Roesch et al., 2007a). As illustrated in Figure 5B, the elevated cue-evoked activity on trials in which the rats responded in the neuron’s preferred direction (bold lines) was not evident when the rat chose to go in the opposite direction (dashed lines). This was true even though on these trials the rats sampled the same odor and the same outcome was available in the preferred direction. Notably, this result differs from what we have previously reported for cue-evoked activity in dopamine neurons in this same task; these neurons signaled the value of the best available option on free-choice trials, even when it was not selected (Roesch et al., 2007a). Thus, firing in VTA dopamine neurons reflects the value of the better option during decision-making, whereas activity in VS neurons tracks the value of the action that is ultimately chosen.
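The dissociations described above (action-value versus chosen-action coding, and dopamine versus VS cue responses) can be summarized with toy tuning functions for a single free-choice trial. These are hypothetical caricatures for illustration, not fitted models: an action-value unit (caudate-like) tracks the value of its preferred direction whether or not that response is made; a chosen-action unit (VS-like) is value-modulated only when its preferred response is executed; and a best-option signal (dopamine-like) reports the value of the better option regardless of choice. The baseline, gain and value units are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class FreeChoiceTrial:
    value_left: float   # expected value of a leftward response (arbitrary units)
    value_right: float  # expected value of a rightward response
    choice: str         # response actually executed: "left" or "right"

    def value_of(self, direction: str) -> float:
        return self.value_left if direction == "left" else self.value_right


def action_value_unit(trial, preferred, baseline=5.0, gain=10.0):
    """Caudate-like caricature: value of the preferred action, chosen or not."""
    return baseline + gain * trial.value_of(preferred)


def chosen_action_unit(trial, preferred, baseline=5.0, gain=10.0):
    """VS-like caricature: value-modulated only if the preferred response is executed."""
    if trial.choice != preferred:
        return baseline
    return baseline + gain * trial.value_of(preferred)


def best_option_signal(trial):
    """Dopamine-like caricature: value of the better option, regardless of choice."""
    return max(trial.value_left, trial.value_right)


# The rat samples the same cue but chooses the less valuable right well.
trial = FreeChoiceTrial(value_left=1.0, value_right=0.2, choice="right")
```

On this trial the left-preferring action-value unit and the dopamine-like signal still report the high left value, whereas the left-preferring VS-like unit stays at baseline, mirroring the pattern described for Figure 5B.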
The finding that VS activity tracks the value of the chosen action is consistent with proposals that VS plays a critical role in an actor-critic architecture, optimizing long-term action selection through its connections with midbrain dopamine neurons. In such models, the Critic learns and stores the values of states, which in turn are used to compute the prediction errors necessary for learning and adaptive behavior. The Actor learns and stores a policy specifying which actions should be selected (Joel et al., 2002; Montague et al., 2004). Recently, the functions of Critic and Actor have been attributed to ventral and dorsolateral striatum, respectively, based on connectivity, pharmacology and fMRI (Everitt et al., 1991; Cardinal et al., 2002a; O’Doherty et al., 2004; Voorn et al., 2004; Balleine, 2005; Pessiglione et al., 2006).
Our single-unit results fit well with this hypothesis. Neurons in VS signal the value of the upcoming decision, which may in turn influence downstream dopamine neurons that subsequently modify both the Actor (DS) and the Critic (VS). In this regard, it is noteworthy that analysis of neural activity in VS during learning in this task revealed no evidence that VS neurons encode the actual reward prediction errors, which are proposed to stamp in associative information. This is consistent with recent suggestions that the strong error signal in VS often reported in human fMRI studies reflects input from other areas and is not an output signal from this region (Knutson and Gibbs, 2007).
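The actor-critic scheme discussed above can be sketched for a single-state, two-choice trial. This is the textbook formulation with a one-step prediction error δ = r − V(s) (there is no successor state, so the discounted next-state term drops out) and a softmax Actor; it is not a model fitted to these data, and the reward values, learning rates and random seed are hypothetical. In the mapping proposed in the text, `value` would correspond to the Critic (VS), `prefs` to the Actor (DS), and `delta` to the dopaminergic prediction error.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    m = max(prefs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(p - m) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def run_actor_critic(rewards, n_trials=2000, alpha_critic=0.1, alpha_actor=0.1, seed=0):
    """Single-state actor-critic on a two-choice task (hypothetical sketch).

    rewards: dict mapping action -> deterministic reward.
    Returns (Critic's learned state value, Actor's final policy).
    """
    rng = random.Random(seed)
    value = 0.0                        # Critic: value of the trial state
    prefs = {a: 0.0 for a in rewards}  # Actor: action preferences
    for _ in range(n_trials):
        policy = softmax(prefs)
        actions = list(policy)
        action = rng.choices(actions, weights=[policy[a] for a in actions])[0]
        # One-step trial: prediction error = obtained reward - predicted value.
        delta = rewards[action] - value
        value += alpha_critic * delta         # Critic update
        prefs[action] += alpha_actor * delta  # Actor update, driven by the same error
    return value, softmax(prefs)


value, policy = run_actor_critic({"left": 1.0, "right": 0.3})
```

With these hypothetical rewards, the Critic's value approaches the reward of the increasingly preferred option and the Actor comes to choose left on nearly every trial; the single error term drives both updates, which is the role the text assigns to dopamine.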
Finally, we also found that many neurons were inhibited during task performance. Previous studies have also reported long-lasting inhibitions during task performance and argued that these correlates may reflect inhibition of competing behaviors (e.g., locomotion, grooming, running away) (Nicola et al., 2004; Taha and Fields, 2005; Taha and Fields, 2006). Unlike the instructive signals described above, these inhibitory signals are thought to be modulated by appetitive behavior but to be independent of the specific response being performed (Taha and Fields, 2006). In our task, it is necessary for rats to remain stationary in the odor port and then in the well in order to receive reward. During these two periods, activity of many VS neurons was inhibited, perhaps reflecting the need to suppress competing behaviors. However, if this is the case, it seems odd that inhibition was no more pronounced on big-reward and short-delay trials than on small-reward and long-delay trials. This suggests that activity in these cells is not influenced by the value of the reward at stake, even though the rats attend better and are more motivated on these trials. Of course, maintaining hold in the odor port and then in the fluid well was not difficult, as evidenced by the low number of early unpokes. It is possible that increasing the requirement to remain still during these periods would provide more evidence for such a function.
This work was supported by grants from the NIDA (R01-DA015718, GS; K01DA021609, MR) and NIA (R01-AG027097, GS).