Rats were trained on a choice task illustrated in (Roesch et al., 2006; Roesch et al., 2007a). On each trial, rats responded to one of two adjacent wells after sampling an odor at a central port. Rats were trained to respond to three different odor cues: one odor that signaled reward in the right well (forced-choice), a second odor that signaled reward in the left well (forced-choice), and a third odor that signaled reward at either well (free-choice). Across blocks of trials we manipulated value by increasing the length of the delay preceding reward delivery (; Block 1–2) or by increasing (; Block 3–4) the number of rewards delivered. In total, there were four types of reward (short-delay, long-delay, big-reward and small-reward) and two response directions (left and right), yielding eight conditions.
Rats’ behavior on both free- and forced-choice trials reflected manipulations of value. On free-choice trials rats chose shorter delays and larger rewards over their respective counterparts (ttest; d.f. = 119; t’s > 16; p’s < 0.0001). Likewise, on forced-choice trials rats were faster and more accurate when responding for a more immediate or larger reward (ttest; d.f. = 119; t’s > 9; p’s < 0.0001). Thus rats perceived the differently delayed and sized rewards as having different values and were more motivated under short-delay and big-reward conditions than under long-delay and small-reward conditions, respectively.
We recorded 257 VS neurons across 75 sessions in 6 rats during performance of all four trial blocks. Recording locations are illustrated in . Since forced choice trials present an evenly balanced neural dataset with equal numbers of responses to each well, we will first address our hypothesis by analyzing data from these trials. Thus we will ask whether neural activity in VS neurons reflects value and direction of responding across blocks, particularly after learning (last 10 trials in each direction).
Activity in VS reflected the value and direction of the upcoming response
As has been reported previously (Carelli and Deadwyler, 1994; Nicola et al., 2004; Taha and Fields, 2006; Robinson and Carelli, 2008), many VS neurons were excited (n = 44; 17%) or inhibited (n = 76; 30%) during cue sampling (odor onset to port exit) vs. baseline (1 s before nosepoke; ttest comparing baseline to cue sampling over all trials collapsed across condition; p < 0.05). An example of the former is illustrated in . Consistent with the hypothesis put forth in the introduction, activity of this neuron reflected the integration of associative information about the value of the reward predicted by the cue and the subsequent response. Thus cue-evoked activity on forced-choice trials after learning was strongest for the cue that indicated reward in the left well, and this neural response was highest when the value predicted for that well was high (on short and big trials). To quantify this effect we performed a 2-factor anova with value and direction as factors during the last 10 forced-choice trials in each block (p < 0.05). Of the 44 cue-responsive neurons, 21 (47%) showed a similar significant interaction between direction and value. This count was significantly above chance given our threshold for statistical significance in our unit analysis (chi-square; p < 0.0001), and there was no directional bias to the left or right across the population (; p = 0.98). By contrast, and in keeping with the most rigorous account of the hypothesis that VS integrates value and direction information, only 5 (11%) showed a main effect of direction alone and only 3 (7%) showed a main effect of value alone (; anova; p < 0.05); these counts did not exceed chance (chi-square; p’s > 0.05).
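The comparison against chance can be sketched as a chi-square goodness-of-fit test with one degree of freedom. This is only an illustration under stated assumptions, not necessarily the exact procedure used here: it assumes that the chance rate equals the 0.05 threshold of the unit analysis, and the function name `chi2_vs_chance` is hypothetical. With only 2.2 expected cells out of 44, the chi-square approximation is rough, so an exact binomial test may be preferable in practice.

```python
import math

def chi2_vs_chance(n_significant, n_total, chance=0.05):
    """Chi-square goodness-of-fit (1 d.f.) of an observed count against a chance rate.

    Assumption: `chance` is the per-neuron false-positive rate (the 0.05 threshold).
    Returns (chi-square statistic, p-value).
    """
    expected_hit = chance * n_total
    expected_miss = n_total - expected_hit
    observed_miss = n_total - n_significant
    stat = ((n_significant - expected_hit) ** 2 / expected_hit
            + (observed_miss - expected_miss) ** 2 / expected_miss)
    # For 1 d.f., the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts reported in the text, out of 44 cue-responsive neurons.
for label, count in [("interaction", 21), ("direction only", 5), ("value only", 3)]:
    stat, p = chi2_vs_chance(count, 44)
    print(f"{label}: chi2 = {stat:.2f}, p = {p:.4g}")
```

Under these assumptions the interaction count lies far above chance, whereas the two main-effect counts do not reach the 0.05 criterion.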
The overall effect is illustrated in which plots the average activity across all cue-responsive neurons on forced choice trials during the last ten trials for all eight conditions. For each cell, direction was referenced to its preferred response before averaging, thus by definition, activity was higher in the preferred direction (left column). Like the single cell example, population activity during cue-sampling was stronger in the preferred direction when value was high. That is, activity was stronger prior to a response in the cell’s preferred direction (left column) when the expected outcome was either a short-delay (blue) or a large-reward (green) compared to a long-delay (red) or a small-reward (orange), respectively. Notably although activity in these populations did begin to increase upon entry into the odor port, the difference in firing was only present during actual delivery of the odor (gray shading in ).
Distributions of delay and size indices for each neuron, defined as the difference between firing on high- and low-value trials divided by the sum of the two, are illustrated for each direction (preferred and nonpreferred) during the odor epoch (odor onset to port exit) in . Only when value was manipulated in the cell’s preferred direction was the index significantly shifted above zero, indicating higher firing rates for more valued outcomes (wilcoxon; μ = 0.134; z = 3.56; p < 0.001). Cases in which neurons exhibited stronger firing for high-value reward (n = 16 [18%]) outnumbered those showing the opposite effect (n = 4 [5%]; chi-square; p < 0.008). Neither the shift in the distribution nor the difference in the number of cases in which activity was stronger for high or low value achieved significance in the nonpreferred direction (; p’s > 0.4).
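The value index described above can be written out as a short sketch. The firing rates below are hypothetical, the function name `value_index` is ours, and returning zero for a silent cell is an assumption. Note also that the counts reported in the text refer to significant effects, whereas this sketch simply tallies the sign of each index.

```python
def value_index(rate_high, rate_low):
    """Normalized contrast (high - low) / (high + low); ranges from -1 to 1."""
    total = rate_high + rate_low
    if total == 0:
        return 0.0  # assumption: a silent cell contributes no preference
    return (rate_high - rate_low) / total

# Hypothetical mean firing rates (spikes/s) during odor sampling:
# one (high-value, low-value) pair per neuron.
rates = [(12.0, 8.0), (5.0, 5.0), (3.0, 6.0), (9.0, 3.0)]
indices = [value_index(high, low) for high, low in rates]

# Simplified tally by sign only (no per-cell significance test).
n_positive = sum(index > 0 for index in indices)
n_negative = sum(index < 0 for index in indices)
print(indices, n_positive, n_negative)
```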
Activity in VS was correlated with motivational level
VS is thought to motivate or invigorate behavior (Robbins and Everitt, 1996; Cardinal et al., 2002a). If the neural signal integrating value and directional response, identified above, relates to that function, then one should expect this activity to be correlated with the motivational differences between high and low value reward in our task. To address this question, we next examined the relationship between neural activity and reaction time (the speed at which the rat made the decision to move and exited the odor port). In previous sections, we showed that the reaction time was faster and activity was stronger () when more valued reward (short-delay and big-reward) was at stake. To ask if the two were correlated we plotted a neural activity index ((high − low)/(high + low)) versus the corresponding reaction time index ((high − low)/(high + low)) independently for preferred and nonpreferred directions. We found a significant negative correlation between the two in the neuron’s preferred direction (; p < 0.001, r² = 0.150). This relationship was not evident in the nonpreferred direction (; p = 0.361; r² = 0.010).
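As a minimal sketch of this comparison, the code below correlates a neural index with a reaction-time index across cells. It assumes an ordinary Pearson correlation, which the text does not state explicitly, and all index values are hypothetical. Because reaction times are shorter on high-value trials, the reaction-time index is typically negative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (high - low) / (high + low) indices, one pair per neuron.
neural_index = [0.30, 0.10, 0.25, -0.05, 0.15]
rt_index = [-0.20, -0.05, -0.15, 0.02, -0.10]

r = pearson_r(neural_index, rt_index)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
```

A negative r here means that cells with a larger firing difference come from sessions in which responding for high-value reward was comparatively faster.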
To examine this phenomenon more closely, we divided sessions into those with a strong versus a weak motivational difference between high and low value reward. According to the correlation described above, we would expect activity to be stronger for higher valued reward in sessions where rats showed a strong difference between high and low value outcomes. To test this, we sorted sessions based on each rat’s reaction time difference between high and low value trial types (Small minus Big; Long minus Short). In the top half of the distribution, the average reaction times on high- and low-value trials were 156 ms and 285 ms respectively (ttest; d.f. = 43; t = 17; p < 0.0001), whereas in the lower half, reaction times on high- and low-value trials were 207 ms and 234 ms respectively (ttest; d.f. = 43; t = 5; p < 0.01). Although both halves exhibited significant differences between high and low value outcomes, the differences were significantly larger in the top half (ttest; d.f. = 43; t = 17; p < 0.0001).
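The sorting step can be sketched as a simple median split. The session labels and reaction-time differences below are hypothetical, and the handling of an odd number of sessions (the extra session goes to the weak half) is an assumption.

```python
def split_sessions(rt_differences):
    """Split sessions into strong and weak halves by reaction-time difference.

    rt_differences maps a session id to (low-value RT minus high-value RT) in ms.
    Returns (strong_half, weak_half), each a list of session ids.
    """
    ranked = sorted(rt_differences, key=rt_differences.get, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]

# Hypothetical reaction-time differences (ms) for four sessions.
sessions = {"s1": 130.0, "s2": 25.0, "s3": 90.0, "s4": 10.0}
strong, weak = split_sessions(sessions)
print(strong, weak)
```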
Remarkably, the neural signal identified above was only evident in sessions in which the rats were more strongly invigorated by high value reward (). This is illustrated in both delay and size blocks by higher firing rate during odor sampling for short-delay (blue) and big-reward (green) conditions over long-delay (red) and small-reward (orange) conditions, respectively. Value index distributions were significantly shifted above zero in the preferred direction in these sessions (; wilcoxon; μ = 0.188; z = 3; p < 0.002). In sessions in which rats were less concerned about the outcome (), there was only a modest nonsignificant difference in activity in the preferred direction (wilcoxon; μ = 0.080, z = 1; p = 0.138).
Notably the differences between sessions with strong and weak reaction time differences did not seem to reflect satiation, which has been shown to lead to slower overall reaction times (Holland and Straub, 1979; Sage and Knowlton, 2000). Overall speed of responding was not significantly different between sessions with strong and weak reaction time differences (220 ms vs. 221 ms; ttest; d.f. = 43, t = 0.02; p = 0.986), and value correlates were no more likely to be observed early in a session versus late. The number of cells exhibiting value selectivity during the first two blocks of a session did not significantly differ from those observed during the last two blocks of a session (12 neurons or 27% vs 11 neurons or 25%; chi-square; p = 0.85).
Rats also appeared to learn the contingencies similarly in the two session types; rats chose the more valuable reward on 69% of trials (strong = 69.1%; weak = 69.4%; ttest; d.f. = 43; t = 0.2; p = 0.852). This indicates that latency differences did not reflect a learning effect. Taken together, these data suggest that differences in reaction time did not result from satiation or insufficient learning. Instead, when rats were goal-oriented and strongly motivated by differences in expected value, activity in VS clearly reflected the animals’ motivational output.
Activity in VS reflected the value of the decision
Up to this point we have only analyzed forced-choice trials, in which odors instruct rats to respond to the left or the right well. We have assumed that this directional selectivity reflects the impending movement; however directional selectivity might also represent the identity of the odor, irrespective of whether or not that response is executed. This is because on forced-choice trials, the odor and movement direction are confounded, since one odor means go right and the other means go left.
To address this issue we compared activity on forced-choice trials with that on free-choice trials. This comparison can resolve the issue because on free-choice trials a different odor from those used on forced-choice trials indicated the freedom to choose either direction (i.e. reward was available on each side). Moreover as illustrated in , rats chose the lower value direction on a significant number of free-choice trials. Thus by comparing firing on free- and forced-choice trials, we can disambiguate odor from movement selectivity. If the directional signal identified on forced-choice trials reflects only the impending movement, then it should be identical on free- and forced-choice trials, provided the rat makes the same response. On the other hand, if the signal differs on free- and forced-choice trials when the rat makes the same response, then this would suggest that the proposed directional selectivity incorporates information about the sensory features of the odor.
For this analysis we included all trials after learning (> 50% choice performance) and collapsed across delay and size blocks. This procedure allowed us to increase our sample of low-value free choice trials which were sparse at the end of trial blocks. To further control for any differences that might arise during learning (rats typically chose low-value outcomes earlier on free-choice trials, but were forced to choose low-value outcomes throughout the entire block on forced-choice trials) we paired each free-choice trial with the immediately preceding and following forced-choice trial of the same value.
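The pairing procedure can be sketched as follows. This is an illustration under assumptions rather than the analysis code itself: the trial representation (dictionaries with hypothetical `kind` and `value` keys), the use of the chosen outcome as a free-choice trial's value, and the handling of trials without a matching neighbor (returned as `None`) are all ours.

```python
def is_match(candidate, value):
    """True if the candidate is a forced-choice trial of the given value."""
    return candidate["kind"] == "forced" and candidate["value"] == value

def pair_free_with_forced(trials):
    """Pair each free-choice trial with its nearest same-value forced-choice neighbors.

    trials is a list of dicts in session order, each with keys
    "kind" ("free" or "forced") and "value" ("high" or "low").
    Returns a list of (previous_index, free_index, next_index);
    an index is None when no matching forced-choice trial exists.
    """
    pairs = []
    for i, trial in enumerate(trials):
        if trial["kind"] != "free":
            continue
        value = trial["value"]
        before = range(i - 1, -1, -1)        # search backward from the free trial
        after = range(i + 1, len(trials))    # search forward from the free trial
        prev_idx = next((j for j in before if is_match(trials[j], value)), None)
        next_idx = next((j for j in after if is_match(trials[j], value)), None)
        pairs.append((prev_idx, i, next_idx))
    return pairs

# Hypothetical trial sequence.
trials = [
    {"kind": "forced", "value": "high"},
    {"kind": "forced", "value": "low"},
    {"kind": "free", "value": "low"},
    {"kind": "forced", "value": "high"},
    {"kind": "forced", "value": "low"},
    {"kind": "free", "value": "high"},
]
pairs = pair_free_with_forced(trials)
print(pairs)
```

Pairing in this way keeps the free- and forced-choice samples matched in time within a block, so that differences between them are less likely to reflect the stage of learning.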
The results of this analysis are illustrated in . represent the average activity over all neurons that showed a significant interaction between direction and value when rats responded in the cell’s preferred (solid) and nonpreferred (dashed) direction for high (black) and low value outcomes (gray) during forced- and free-choice trials, respectively. As described previously, cue-evoked activity on forced-choice trials was stronger for high-value outcomes, but only in one direction (). Activity during free-choice trials showed exactly the same pattern. Thus firing was higher on free-choice trials but only when the rat chose the high value outcome and only when that outcome was in a particular direction (). This is quantified in , which plots the difference between the cell’s preferred outcome/response (e.g. high-value-left) and nonpreferred outcome/response (e.g. low-value-right) on forced-choice trials (x-axis) versus the same calculation from data on free-choice trials (y-axis). By definition, values are all shifted above zero on the x-axis, since firing in these neurons was always higher for the preferred outcome/response on forced-choice trials. Importantly, values were also shifted above zero on free-choice trials (y-axis; wilcoxon; μ = 0.2879; z = 4; p < 0.001). This indicates that neural activity was the same for a particular value and response, even though the two trial types (free and forced) involved different odors (). This pattern suggests that neural signals in VS neurons reflect the value of a particular yet-to-be-executed motor response and are not cue-specific. This pattern also indicates that signaling in VS reflects the value of the response that is going to be executed, since firing differed on free-choice trials when different responses were made, even though the high-value reward was always available to be selected.
Activity after the decision was stronger in anticipation of the delayed reward
Lesions or other manipulations of VS make animals more likely to abandon a larger, delayed or higher cost reward in favor of a smaller, more immediate or lower cost reward (Cousins et al., 1996; Cardinal et al., 2001; Cardinal et al., 2004; Winstanley et al., 2004; Bezzina et al., 2007; Floresco et al., 2008; Kalenscher and Pennartz, 2008). These studies suggest that VS may be important for maintaining information about reward after the decision has been made. Consistent with this, we found that activity in the cue-responsive VS neurons described above was also elevated during the delay in our task, especially on correct trials. This is apparent in and , which show that activity was higher after the response in the cell’s preferred direction on long-delay (red) compared to short-delay trials (blue). To quantify this effect, plots the distribution of delay indices ((short − long)/(short + long)) during the 3 seconds (minimum delay after learning) after the behavioral response in the cell’s preferred and nonpreferred direction. Delay indices were shifted significantly below zero, indicating higher firing after responding to the delayed well (wilcoxon; μ = −0.158; z = 2.2; p < 0.024), and neurons exhibiting this pattern (n = 24 [55%]) significantly outnumbered those showing the opposite effect (n = 6 [14%]). Notably the increased firing after responding to the delayed well always preceded reward, since it occurred before the minimum delay after learning (3 s).
Interestingly, the difference in firing between short- and long-delay trials after the behavioral response was also correlated with reaction time (; p < 0.005; r² = 0.190). However the direction of this correlation was the opposite of that between reaction times and cue-evoked activity described earlier. Thus slower responding on long-delay trials was associated with stronger firing after well entry and prior to reward delivery. If activity in VS during decision making reflects motivation, as we have suggested, then activity during this period may reflect the exertion of increased will to remain in the well to receive reward or expectation of reward, rather than signaling of other variables such as disappointment. Perhaps loss of this signal after lesions or inactivation of VS reduces the rat’s capacity to maintain motivation toward the delayed reward. This would suggest that VS must fire more during the delay to keep the rat in the well waiting for reward. Unfortunately, there were too few trials in which the rat left the fluid port prematurely to test this hypothesis.
Inhibitory responses in VS were not correlated with motivation
Finally we asked whether the 76 neurons (30% of total neurons recorded) that were inhibited during odor-sampling reflected motivational value. Inhibitions in VS activity during performance of behavioral tasks have been described previously (Carelli and Deadwyler, 1994; Nicola et al., 2004; Taha and Fields, 2006; Robinson and Carelli, 2008) and might reflect the inhibition of inappropriate behaviors during task performance (i.e. leaving the odor port or fluid well early), which might be more critical when a better reward is at stake. Here we address whether or not these neurons were modulated by expected reward value.
The average firing rates over these neurons are illustrated in . As defined in the analysis, activity was inhibited during odor-sampling. As the rat moved down to the well, activity briefly returned to baseline, but it quickly returned to an inhibited state upon well entry and then recovered to baseline upon well exit. As for excitatory neurons, we asked if the motivational level of the animal modulated neural firing in these neurons. Distributions of value indices were not significantly shifted from zero (; wilcoxon; z’s < 2; p’s > 0.082) and roughly equal numbers of neurons fired more strongly and weakly for high value reward (; black bars). Furthermore, activity in these neurons was not correlated with reaction time. Thus inhibitions observed during task performance were not modulated by value as observed for excitations.
VS activity during reward delivery was not modulated by unexpected reward
Previously, in rats performing this same task, we have shown that dopamine neurons fire more strongly at the beginning of trial blocks when an unexpected reward was delivered and less strongly in trial blocks when an expected reward was omitted (Roesch et al., 2007a). Such activity is thought to represent bidirectional prediction error encoding.
Out of the sample of 257 VS neurons, activity in 41 neurons was responsive to reward delivery (ttest comparing baseline to reward delivery (1 s) over all trials collapsed across condition; p < 0.05). Of those, 12 were also cue-responsive as defined above. Analysis of prediction errors revealed that few VS neurons seemed to signal errors in reward prediction. For example, the single cell illustrated in fired more strongly when reward was delivered unexpectedly; firing was maximal immediately after a new reward was instituted and diminished with learning. However this example was the exception, rather than the rule. This is illustrated across the population in , which shows the contrast in activity (early vs late) for all of the reward-responsive VS neurons (n = 41). This contrast is plotted separately for blocks involving unexpected delivery and omission of reward. Neither distribution was shifted significantly above zero, indicating no difference in firing early, after a change in reward, compared to later, after learning (; wilcoxon; z’s < 2; p’s > 0.2610).
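One way to express an early versus late contrast is sketched below. The text does not give the exact formula or trial windows, so both the normalized form (early − late)/(early + late) and the five-trial window are assumptions made purely for illustration; the firing rates are hypothetical.

```python
def early_late_contrast(rates, n_trials=5):
    """Normalized contrast between the first and last n_trials of a block.

    rates holds one firing rate per trial, in trial order within a block.
    Assumed form: (early - late) / (early + late); positive values mean
    stronger firing early in the block, before learning.
    """
    if len(rates) < n_trials:
        raise ValueError("block is shorter than the averaging window")
    early = sum(rates[:n_trials]) / n_trials
    late = sum(rates[-n_trials:]) / n_trials
    total = early + late
    if total == 0:
        return 0.0  # assumption: no firing means no contrast
    return (early - late) / total

# Hypothetical reward-epoch firing rates (spikes/s) across one block.
block = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
print(early_late_contrast(block))
```

A cell signaling a positive prediction error would yield a positive contrast in blocks with unexpected reward delivery, whereas a distribution centered on zero indicates no systematic change with learning.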