As one of the two main sources of brain dopamine, the ventral tegmental area (VTA) is important for several complex functions, including motivation, reward prediction, and contextual learning. Although many studies have identified the potential neural substrate of VTA dopaminergic activity in reward prediction functions during Pavlovian and operant conditioning tasks, less is understood about the role of VTA neuronal activity in motivated behaviors and more naturalistic forms of context-dependent learning. Therefore, VTA neural activity was recorded as rats performed a spatial memory task under varying context conditions. In addition to reward- and reward predicting cue-related firing commonly observed during conditioning tasks, the activity of a large proportion of VTA neurons was also related to the velocity and/or acceleration of the animal’s movement. Importantly, movement-related activity was strongest when rats displayed more motivation to obtain reward. Furthermore, many cells displayed a dual code of movement- and reward-related activity. These two modes of firing, however, were differentially regulated by context information, suggesting that movement- and reward-related firing are two independently regulated modes of VTA neuronal activity and may serve separate functions.
Dopamine release from the ventral tegmental area (VTA) is considered to be important for several complex behaviors, such as reinforcement learning, spatial/contextual learning, and motivation (Fields, Hjelmstad, Margolis, & Nicola, 2007; Lisman & Grace, 2005; Schultz, 2002; Wise, 2004). One perspective hypothesizes that the fundamental role of VTA dopamine is to solidify stimulus-reward associations (Schultz, 2002). Accordingly, dopamine neurons fire upon presentation of unexpected rewards and conditioned cues that predict reward, and are inhibited when expected events do not occur (Schultz & Dickinson, 2000). These firing patterns may signal an error in the prediction of reward (Bayer & Glimcher, 2005; Hollerman & Schultz, 1998), representing a teaching signal that enables the use of flexible behaviors during learning (Schultz & Dickinson, 2000). Importantly, reward prediction errors are exclusively represented by short (~200 ms) ‘bursts’ of action potentials (Grace & Bunney, 1984a; Schultz, 2002). VTA neurons are also known to exhibit a more slowly changing, non-bursting firing pattern, the function of which is less well understood (Grace & Bunney, 1984b; Schultz, 2007).
The reward prediction error signal appears to take into account the behavioral context in which rewards are obtained (Nakahara, Itoh, Kawagoe, Takikawa, & Hikosaka, 2004), consistent with the importance of VTA in contextual learning (Ihalainen, Riekkinen, & Feenstra, 1999; Lisman & Grace, 2005). Despite the potential link between VTA neuronal activity and contextual information, most definitions of the term ‘context’ center on visuo-spatial features of the environment (Mizumori, Smith, & Puryear, 2007). As such, we tested the hypothesis that VTA reward-related activity is, at least in part, dependent on these very salient and important features of a context.
A second perspective contends that VTA dopamine facilitates motivated behaviors by providing incentive salience to environmental stimuli, such as cues that predict food, in order to invigorate goal-directed behaviors (Berridge & Robinson, 1998; Salamone & Correa, 2002). Indeed, animals with compromised dopaminergic systems are not willing to work to obtain foods they otherwise prefer (Cannon & Palmiter, 2003). Conversely, hyperdopaminergic mice (Zhuang et al., 2001) will run faster and are willing to work harder to obtain rewards (Cagniard, Balsam, Brunner, & Zhuang, 2006; Pecina, Cagniard, Berridge, Aldridge, & Zhuang, 2003). Interestingly, VTA neurons in these mice display increased non-burst firing rates, while burst firing remains normal (Cagniard, Beeler et al., 2006). This raises the possibility that the non-burst firing mode of VTA neurons may be selectively related to motivational state, and highlights the importance of considering both burst and non-burst firing modes in order to establish complete theories of VTA function. Since most VTA unit recording studies have been conducted in restrained or behaviorally confined animals, it is unclear whether VTA neurons also encode aspects of motivational state. Thus, we hypothesize that, by recording VTA neural activity in freely navigating animals, we may be able to highlight novel firing patterns that relate to naturalistic motivated behaviors, such as the animal’s movement to obtain rewards.
We report here that reward-related VTA activity in freely navigating rats is, indeed, dependent on several aspects of the context. In addition, we demonstrate that VTA neurons are sensitive to the movements made to obtain reward. This movement-related activity is independent of context information, and may relate to motivational state.
Nine male Long-Evans rats (4–6 months old) were obtained from Simonson Labs (Gilroy, CA). Rats were housed individually in Plexiglas cages in a temperature- and humidity-controlled environment and were maintained on a 12-hour light/dark cycle. Experiments were conducted during the light portion of the cycle. Food and water were provided ad libitum for 5 days as rats acclimated to the colony room prior to being handled daily and reduced to 85% of ad libitum feeding weights. All rats had unlimited access to water throughout the experiment. All animal care and use was conducted according to University of Washington’s Institutional Animal Care and Use Committee guidelines.
Details concerning the construction of recording stereotrodes and microdrives and surgical procedures can be found in previous reports (Puryear, King, & Mizumori, 2006; Smith & Mizumori, 2006). Briefly, stereotrodes were constructed by twisting together two 25 µm lacquer-coated tungsten wires (California Fine Wire, Grover Beach, CA) and passing them through a 30 ga stainless steel guide cannula. Four stereotrodes were then secured to each microdrive (one per hemisphere) with epoxy. Stereotrodes were cut with sharp surgical scissors to leave 2–3 mm of each stereotrode exposed at the tips of the guide cannulae and were electroplated (AgCl solution, Fisher Scientific, Pittsburgh, PA) as necessary to obtain a final impedance of 200–400 kΩ (tested at 1 kHz). Rats were anesthetized with isoflurane (5% mix with O2 for induction with 1–4% for maintenance of anesthesia) and given an antibiotic (Baytril; 5 mg/kg) and an analgesic (Ketofen; 5 mg/kg). The microdrive assemblies were implanted dorsal to VTA according to the following coordinates relative to bregma: −5.25 mm posterior, 0.7 mm lateral, 7 mm ventral (Swanson, 2003). A reference electrode (114 µm Teflon coated stainless steel wire) was implanted near the corpus callosum and a ground screw was implanted into the skull. Rats were then given one week of free feeding to fully recover from surgery before being placed back on food-restriction to begin recording experiments.
The final position of each stereotrode was marked by passing a 25 µA current through each recording wire for 25 s while rats were under 5% isoflurane anesthesia. Rats were then given an overdose of sodium pentobarbital and transcardially perfused with a 0.9% buffered saline solution, followed by 10% formalin. The electrodes were then retracted and the brain was removed and allowed to sink in a 30% sucrose-formalin solution. Forty-micron coronal sections were sliced through the midbrain with a cryostat. The sections were then stained with cresyl violet and imaged with a light microscope. Digital images were taken of the slices that contained the stereotrode tracks in order to determine the approximate location of each cell recorded. Since the lesions created at the stereotrode sites provided us with the final depth of each stereotrode, the approximate location of each cell was considered to be dorsal to the lesion by an amount equal to the distance driven down since the cell was recorded. Only cells determined to be located in VTA (Swanson, 2003) were considered for data analysis.
Prior to each session, rats were connected to the recording equipment by a pre-amplification headstage, which was equipped with a pair of infrared diode arrays to track the animal’s movements. All stereotrodes were checked daily for spontaneous neural activity. If no clear neural activity was encountered, stereotrodes were lowered in approximately 25 µm increments (up to 175 µm per day) until clear, isolatable units were observed. The animal’s position and electrophysiological data were recorded on a Neuralynx Cheetah data acquisition system (Neuralynx, Inc., Bozeman, MT). The position of the animal was monitored by an infrared video camera mounted on the ceiling above the maze. Multiunit activity was recorded simultaneously and independently on each wire of the stereotrode. Incoming signals were amplified (1,000–10,000 times), filtered between 600 Hz and 6 kHz and passed through a window discriminator that triggered a 2 ms sampling period (at 16 kHz) when an impulse from either channel passed a user-defined threshold.
Single units were isolated from the multiunit records using standard cluster-cutting software (MClust; A.D. Redish, University of Minnesota). In addition, a template-matching algorithm (written by C. Higginson) was used offline to facilitate separation of unique spike waveforms. To ensure a high degree of unit separation quality, we only included cells that had a signal-to-noise ratio of at least 3:1, exhibited stable clusters throughout the recording session, and exhibited a clear refractory period in the inter-spike interval histogram following cluster cutting.
Rats were habituated to the testing environment by allowing them to freely forage for chocolate milk on an eight-arm radial maze until they reliably drank the chocolate milk and continuously moved about the maze for the full 30 min session. Rats were then trained to perform a differential reward, win-shift spatial memory task using procedures reported previously (Pratt & Mizumori, 1998, 2001). Briefly, the end of each arm was baited prior to the start of each trial with either a large (5 drops) or small (1 drop) amount of chocolate milk on alternating arms. Maze arms containing large or small amounts of reward (counterbalanced across rats) were held constant for each rat throughout training. Each trial consisted of a Study and Test Phase. The Study Phase started with the individual and sequential presentation of four of the eight arms (two large and two small reward arms, randomly selected for each trial) in which the rat ran down to the end of each arm and consumed the chocolate milk reward. After presentation of the fourth arm, the Test Phase began by allowing the rat access to all maze arms. The rat was then required to collect the remaining rewards by choosing the four arms that were not presented during the Study phase. The trial ended once all eight arms were visited and the rat returned to the center of the maze, where it was confined for a 2 min inter-trial interval before a new trial started. Entries into previously visited arms were classified as errors. Once the rat performed 15 trials in approximately one hour for seven consecutive days, recording electrodes were surgically implanted into VTA.
In order to test the context-dependency of task performance and VTA unit activity, we employed a within-subjects design in which each recording session consisted of two blocks of five trials. During the first block of trials (Baseline trials), rats performed the task with the extra-maze cues and rewards in their familiar configuration (i.e., the configuration present during initial training). Following completion of the 5th trial, rats performed a second block of five trials under one of the following four context conditions: either 1) the same conditions as the Baseline trials (Control), 2) the maze room lights extinguished (Darkness), thereby eliminating the visuo-spatial cues in the maze environment, 3) the locations of the large and small rewards switched (Reward Location Switch), or 4) two rewards (1 large and 1 small reward, randomly chosen) were omitted from the study phase of each trial (Reward Omission). Importantly, the design of these experiments allowed us to test whether VTA neuronal activity was dependent on three different aspects of the context: visuo-spatial information, reward location, and reward probability, respectively. Although the context manipulation performed on a given day of testing was chosen randomly, care was taken to ensure that adequate numbers of cells were recorded for each type of manipulation.
All statistical tests were performed using SPSS 13.0 (Chicago, IL).
As mentioned above, animals performed the differential-reward, spatial memory task under varying contextual conditions. In order to test whether altering contextual information had any impact on task performance, we compared the average number of errors in each block of trials with Paired T-tests for each context manipulation (α = 0.0125, corrected for the number of tests performed).
We were also able to assess the subject’s discrimination of, and motivation for, large and small reward locations in two ways. First, the probability of the animal choosing a large reward arm during the first four arm choices of the test phase was calculated for each trial. Since subjects were food restricted during this study, it was expected that they would retrieve the large rewards before the small rewards when given the choice during the Test Phase of the trial. Therefore, we assessed whether there was a significant Spearman’s correlation (α = 0.05) between the first four arm choices and the probability that the choice would be a large reward arm for each block of trials. Furthermore, we evaluated whether subjects displayed different running speeds (calculation described below) on maze arms that contained large and small rewards. To do this, we separately calculated the average velocity of the animal from 2000 to 200 ms prior to the acquisition of large or small rewards (i.e., during its approach behavior). Significant differences in velocity during approach to large or small rewards were assessed with a Paired T-test (α = 0.05).
The basic electrophysiological properties (i.e., average firing rate, burst rate, non-burst rate, percent spikes in bursts, and intra-burst firing rate) for neurons in this study were calculated according to previously established methods (Grace & Bunney, 1984a, 1984b; Hyland, Reynolds, Hay, Perk, & Miller, 2002; Pan, Schmidt, Wickens, & Hyland, 2005; Robinson, Smith, Mizumori, & Palmiter, 2004). Accordingly, ‘burst’ activity was defined according to previously established criteria of ≤ 80 ms between successive spikes to signal the beginning of a burst and ≥ 160 ms between successive spikes to signal the end of a burst. Spiking activity occurring outside of bursts was considered to occur in the non-burst firing mode.
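The burst criteria above can be illustrated with a minimal Python sketch. The function names and the example spike train are hypothetical, and the code is only one plausible reading of the ≤ 80 ms onset / ≥ 160 ms offset rule, not the analysis code used in the study.

```python
def split_bursts(spike_times_ms, onset_isi=80.0, offset_isi=160.0):
    """Group spikes into bursts.

    A burst starts when two successive spikes are <= onset_isi apart and
    ends when the next inter-spike interval is >= offset_isi.
    Returns a list of bursts, each a list of spike times in ms.
    """
    bursts = []
    current = None  # spikes of the burst currently being built, if any
    for prev, nxt in zip(spike_times_ms, spike_times_ms[1:]):
        isi = nxt - prev
        if current is None:
            if isi <= onset_isi:
                current = [prev, nxt]      # burst onset
        elif isi >= offset_isi:
            bursts.append(current)         # burst offset
            current = None
        else:
            current.append(nxt)            # burst continues
    if current is not None:
        bursts.append(current)             # train ended inside a burst
    return bursts


def percent_spikes_in_bursts(spike_times_ms, bursts):
    """Percentage of all spikes that fall inside a burst."""
    n_burst_spikes = sum(len(b) for b in bursts)
    return 100.0 * n_burst_spikes / len(spike_times_ms)


# Hypothetical spike train (ms): two bursts and two isolated spikes.
spikes = [0, 50, 100, 400, 1000, 1060, 1200, 1500]
bursts = split_bursts(spikes)
```

Spikes that are not assigned to any burst (here, those at 400 and 1500 ms) would count toward the non-burst firing mode.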
Due to previously established reward-related activity of rodent VTA neurons (Hyland et al., 2002; Pan et al., 2005; Roesch, Calu, & Schoenbaum, 2007) we were interested in evaluating whether similar activity occurred during performance of the spatial memory task. To do this, the rewards were located in small metal cups mounted to the end of each maze arm, which served as ‘lick-detectors’ and were connected to the recording equipment (custom designed by Neuralynx, Inc.). An event marker was automatically inserted into the data stream when the rat licked the cup, providing an instantaneous measurement of the time the rat first obtained the reward. Peri-event time histograms (PETHs) were then constructed (50 ms bins, ± 2.5 s around each reward event) for all reward events, as well as separate PETHs for large and small reward events. A cell was considered to have a significant excitatory reward response if it passed the following two criteria: 1) the cell was observed to have a peak firing rate within ± 150 ms of reward acquisition and 2) the peak rate was >150% of its average firing rate for the block of trials. These criteria were applied to PETHs collapsed across reward amounts, and separately for large and small reward events. For neurons that exhibited significant reward-related activity, we determined the time point at which the population of these neurons exhibited maximal activity. This was achieved by calculating the average firing rates of the population of neurons in the 150 ms period around the reward event (5 ms bins). The bin with the highest average firing rate was considered the peak bin. In order to determine the degree to which the firing rate of a given cell differentiated amounts of reward, we calculated a Reward Discrimination Index (RDI) according to the following formula, where xlarge and xsmall are the average firing rates in the 150 ms period around the time of acquisition of large and small rewards, respectively:
In order to determine if reward responses were dependent on contextual information, we analyzed cells that exhibited a significant reward response in one of the two blocks of trials. First, we calculated the average firing rate in the ± 150 ms epoch around the time of reward acquisition, expressed as a percent change relative to the cell’s average firing rate for each block of trials. In order to directly compare the reward-related activity for each context manipulation, these values were normalized to the maximum value observed for each manipulation. The same calculations were performed for non-rewarded arms in order to address whether similar reward prediction errors observed in putative dopamine neurons (Hollerman & Schultz, 1998; Roesch et al., 2007; Tobler, Dickinson, & Schultz, 2003) occur during performance of this spatial memory task. We then created scatter plots of the normalized reward activity for each block of trials to determine whether the reward activity was similar across context conditions. The theory behind this is that, if VTA reward-related activity was independent of context information, the magnitude of the reward response should be similar in each block of trials, thereby lining up near the diagonal of the scatter plots. Therefore, we devised a Reward Activity Change Index (RACI), which calculated the distance of each data point to the diagonal line according to the following formula (a derivation of the Pythagorean Theorem for calculating the height of a right triangle), where x1 and x2 are the normalized reward responses in blocks 1 and 2, respectively:
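Geometrically, the perpendicular distance from a point (x1, x2) to the diagonal x2 = x1 is |x1 − x2| / √2, which is also the height of the right triangle formed by the point and its horizontal and vertical projections onto the diagonal. A minimal Python sketch, assuming the RACI is exactly this distance (the function name is hypothetical):

```python
import math


def distance_to_diagonal(x1, x2):
    """Perpendicular distance from the point (x1, x2) to the line x2 = x1.

    x1, x2: normalized reward responses in blocks 1 and 2.
    Assumption: the index equals the ordinary point-to-line distance.
    """
    return abs(x1 - x2) / math.sqrt(2)


# A cell with identical responses in both blocks lies on the diagonal.
unchanged = distance_to_diagonal(0.5, 0.5)
# A cell whose response changes strongly lies far from the diagonal.
changed = distance_to_diagonal(1.0, 0.0)
```

Under this reading, larger values indicate larger changes in reward-related firing between the two blocks of trials.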
Average RACI values were then compared across context manipulations with a one-way ANOVA and Bonferroni post hoc tests (α = 0.05). This analysis was carried out for reward responses collapsed across large and small reward amounts as well as for responses at large rewards. Unit responses at small rewards were not analyzed since no VTA neurons exhibited significant activity upon acquisition of only small rewards (see Results).
Rodent VTA neurons have also been shown to fire in response to cues that predict the availability of rewards (Pan et al., 2005; Roesch et al., 2007). During the course of these recordings, it was noticed that some neurons consistently fired when the first arm of each trial was presented (i.e., the ‘first arm’ cue signaling the start of the trial and availability of reward). Therefore, we began to manually insert event markers, online, into the data stream, indicating the time at which the first arm of each trial was made available. For this subset of cells, PETHs were also constructed (100 ms bins, ±2.5 s around each cue event time). A cell was considered to have a significant excitatory cue response if it passed the following two criteria: 1) the cell was observed to have a peak firing rate within ±150 ms of cue presentation and 2) the peak rate was >400% of its average firing rate for the block of trials. It was necessary to use a more conservative set of criteria since VTA neurons typically fired at lower rates during the inter-trial interval, which led to false identification of cue-related responses with less stringent criteria. Since we recorded a relatively small number of these types of neurons (see Results), we were unable to determine if these responses were modulated by contextual information.
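The two-part response criterion can be sketched in Python as follows. This is an illustrative reading rather than the original analysis code: the function names are hypothetical, and "peak within ±150 ms" is assumed to mean that the center of the peak bin lies within ±150 ms of the event. The defaults follow the cue criterion (100 ms bins, >400%); the reward criterion would use `bin_ms=50` and `threshold=1.5`.

```python
def peth(spike_times_ms, event_times_ms, bin_ms=100, window_ms=2500):
    """Peri-event time histogram as (bin centers in ms, rates in spikes/s)."""
    n_bins = (2 * window_ms) // bin_ms
    counts = [0] * n_bins
    for ev in event_times_ms:
        for t in spike_times_ms:
            offset = t - ev
            if -window_ms <= offset < window_ms:
                counts[int((offset + window_ms) // bin_ms)] += 1
    seconds_per_bin = len(event_times_ms) * bin_ms / 1000.0
    rates = [c / seconds_per_bin for c in counts]
    centers = [-window_ms + (i + 0.5) * bin_ms for i in range(n_bins)]
    return centers, rates


def is_excitatory_response(spike_times_ms, event_times_ms, mean_rate_hz,
                           bin_ms=100, threshold=4.0, peak_window_ms=150):
    """True if the PETH peak is near the event and exceeds threshold x mean."""
    centers, rates = peth(spike_times_ms, event_times_ms, bin_ms)
    peak = max(range(len(rates)), key=rates.__getitem__)
    near_event = abs(centers[peak]) <= peak_window_ms
    return near_event and rates[peak] > threshold * mean_rate_hz


# Hypothetical data: two events, spikes clustered just after each one.
events = [10000, 20000]
spikes = [10010, 10020, 10030, 12000, 18500, 20010, 20020, 20030]
responsive = is_excitatory_response(spikes, events, mean_rate_hz=1.0)
```

In this toy example the peak bin (0 to 100 ms after the event) reaches 30 spikes/s against an assumed 1 spike/s average, so both criteria are met.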
Given the affordance of free, unrestrained movement during performance of the task, we also sought to determine whether rodent VTA neurons were sensitive to the vigor of the animal’s movement (i.e., its velocity and acceleration). The position of the animal was sampled at 30 Hz and the ‘instantaneous’ velocity of the animal was calculated by dividing the distance between two position points by the inverse of the video sampling rate (or the inverse of the square of the sampling rate for calculating the animal’s acceleration). As described previously (Gill & Mizumori, 2006; Yeshenko, Guazzelli, & Mizumori, 2004), we assessed each neuron’s firing rate for a significant Pearson’s linear correlation with velocity or acceleration (α = 0.05). In order to avoid contamination with any reward-related activity (i.e., when the animal was not moving) and to ensure adequate sampling of these movement parameters, we limited the ranges of velocity and acceleration to 3–30 cm/s and 3–30 cm/s², respectively. In addition, we assessed whether there was a significant Pearson’s correlation (α = 0.05) between neuronal firing rates and velocity immediately prior to acquisition of reward. We restricted this analysis to only include time points between 2000 and 200 ms prior to the acquisition of reward to selectively assess neuronal activity related to reward approach behaviors, while excluding any reward-related firing that may occur once the animal obtained reward.
For cells with significant firing rate correlations with velocity and/or acceleration during one or more blocks of trials, we assessed whether these movement parameters were influenced by contextual information. Similar to the RACI values calculated for analysis of reward-related activity, we calculated an r Value Change Index (RVCI) by substituting the r values of the firing rate-velocity and firing rate-acceleration correlations for the normalized reward activity used in the reward analysis. Significant contextual effects on RVCI values were also assessed by means of one-way ANOVAs and Bonferroni post hoc tests (α = 0.05).
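The movement analysis can be sketched in Python as below. All function names are hypothetical. The speed calculation follows the description above (distance between successive 30 Hz samples divided by the sampling interval); taking acceleration as the change in successive speeds per sampling interval is an assumption of this sketch, and the significance test on r is omitted.

```python
import math


def speeds(positions_cm, fs_hz=30.0):
    """Instantaneous speed (cm/s) between successive (x, y) samples.

    Dividing by the sampling interval 1/fs is the same as multiplying by fs.
    """
    return [math.hypot(x1 - x0, y1 - y0) * fs_hz
            for (x0, y0), (x1, y1) in zip(positions_cm, positions_cm[1:])]


def accelerations(speed_cm_s, fs_hz=30.0):
    """Change in speed per second (cm/s^2); an assumed definition."""
    return [(v1 - v0) * fs_hz for v0, v1 in zip(speed_cm_s, speed_cm_s[1:])]


def restrict(values, rates, lo=3.0, hi=30.0):
    """Keep only (value, rate) pairs whose movement value lies in [lo, hi]."""
    kept = [(v, r) for v, r in zip(values, rates) if lo <= v <= hi]
    return [v for v, _ in kept], [r for _, r in kept]


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical track: 0.5 cm per frame along x corresponds to 15 cm/s.
track = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
v = speeds(track)
```

A cell's firing rate, sampled at the same time points as the speed values, would then be passed through `restrict` and `pearson_r` to obtain the r value for that cell.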
VTA neural activity was recorded while rats performed two blocks of five trials of a differential reward, spatial memory task on an eight-arm radial maze under varying context conditions (Pratt & Mizumori, 2001; Puryear et al., 2006) (see Methods and Fig. 1a).
We determined whether alterations of important contextual information produced changes in spatial memory abilities by comparing the average number of errors across blocks of trials (Fig. 1b). Consistent with previous reports (Puryear et al., 2006), rats made significantly more errors when tested in darkness (t13 = −4.36, p < 0.001). The decrement in spatial memory observed in darkness is likely not confounded by potential changes in motivational states of the animal across time, since performance levels did not differ across blocks of trials in the Control condition (t28 = −0.42, p > 0.6). This indicates that accurate performance of the task depends, at least in part, on the presence of visuo-spatial information. Interestingly, performance was significantly enhanced following Reward Location Switch manipulations (t13 = 3.75, p < 0.002), indicating that rats made fewer errors after the locations of large and small rewards were changed. Similar to Control manipulations, Reward Omission manipulations did not alter task performance (t14 = −0.72, p > 0.4).
Consistent with previous reports from our lab (Pratt & Mizumori, 1998, 2001), subjects readily discriminated the locations of large and small rewards during the first block of trials by preferentially selecting arms associated with large rewards (Fig. 1c). For Control sessions, there was a significant negative correlation between the first four arm choices and the probability that the arm chosen was a large reward arm during the test phases in both blocks of trials (Block 1: r = −0.38, p < 0.001, Block 2: r = −0.61, p < 0.001). Importantly, this demonstrates that, under constant context conditions, rats’ preference for large rewards remained stable across both blocks of trials. This discrimination, however, was influenced by alterations of contextual information. When tested in Darkness, rats were unable to differentiate large and small reward arms, as evidenced by a loss of correlation between arm choice number and the probability of choosing a large reward arm (Block 1: r = −0.70, p < 0.001, Block 2: r = −0.09, p > 0.5). Despite the fact that Reward Location Switch manipulations caused rats to perform the spatial memory task better, they were unable to adjust to the changes in the reward locations. This was manifested by a significant negative correlation between arm choice number and the probability of a large reward arm choice in the first block of trials (Block 1: r = −0.71, p < 0.001) and a significant positive correlation (Block 2: r = 0.57, p < 0.001) between arm choice number and the probability of a large reward arm choice in the second block of trials (i.e., the first and second arm choices tended to be arms that contained large rewards in Block 1). Similar to control conditions, Reward Omission manipulations did not alter rats’ preference for large rewards (Block 1: r = −0.76, p < 0.001, Block 2: r = −0.75, p < 0.001).
We recorded a total of 89 neurons that were histologically determined to be located in VTA (Fig. 2). The firing characteristics of VTA units (i.e., spike duration, average firing rate) as well as the burst firing characteristics (non-burst firing rate, burst rate, percent spikes in burst, and intra-burst firing rate) were similar to what has been reported previously in freely behaving rats (Hyland et al., 2002; Lee, Steffensen, & Henriksen, 2001). None of these basic spike parameters were significantly altered following changes to the context (data not shown). Although dopamine neurons recorded from freely moving rodents have historically been identified using electrophysiological and pharmacological methods (Grace & Bunney, 1983), there is now sufficient evidence to suggest that these methods may not be as reliable as once thought (Margolis, Lock, Hjelmstad, & Fields, 2006; Margolis, Mitchell, Ishikawa, Hjelmstad, & Fields, 2008). In accordance with these recent studies, cell types recorded in our study (i.e., dopamine vs. non-dopamine neurons) were not clearly separable using traditional methods (see Supplementary Fig. 1 and Supplementary text for a more detailed discussion of this issue). Of note, VTA neurons that displayed significant task- and behavior-related firing patterns (described below) exhibited a broad and overlapping range of spike widths, average firing rates, burst-firing characteristics, and pharmacological responses (Supplementary Fig. 1 and Supplementary Fig. 2). As such, the following analyses were conducted on the entire population of neurons recorded, and the cells were categorized according to their firing patterns relative to certain task-related events and behavioral parameters. What follows is a description of the most prominent task-related and behavioral correlates of VTA neuronal firing as rats performed the spatial memory task. 
For each type of neural-behavioral correlate, the baseline properties are described first, followed by a description of the context-dependency of this neural activity.
Of the 89 cells, 5 cells with very low firing rates (< 1 spike/s) were excluded from this analysis in order to avoid inclusion of any spurious correlations caused by low firing rates. Consistent with previous reports in rodents (Hyland et al., 2002; Pan et al., 2005; Roesch et al., 2007), we observed a large population of VTA neurons (49%, 41/84) that exhibited short-latency, excitatory responses upon reward acquisition when considering large and small rewards together (see Methods for classification criteria and Fig. 3).
Depicted in Figure 3a is an example of one neuron that exhibited excitatory responses at both large and small reward encounters. The cell depicted in Figure 3b only exhibited a significant response upon acquisition of large rewards. Similar to previous reports (Tobler, Fiorillo, & Schultz, 2005), consideration of large and small rewards separately indicated that 34% (14/41) of cells responded to both large and small rewards, whereas 61% (25/41) of cells only responded to acquisition of large rewards. No cells were found to selectively fire relative to acquisition of small rewards (see Fig. 3d for summary). As a population, neurons that did and did not differentiate reward amounts were maximally active ~35 ms and ~30 ms after the acquisition of reward, respectively, indicating that the reward-related activity of these cells was related to acquisition of the reward itself, not the movements required to obtain rewards (see Fig. 3a & b). All reward-related cells exhibited low average firing rates (< 10 spikes/s). Cells that responded to both reward amounts fired at an average rate of 3.54 ± 0.35 spikes/s (mean ± sem) and exhibited waveform durations of 1.41 ± 0.10 ms. Cells that responded to only large rewards had similar average firing rates and spike durations (3.41 ± 0.30 spikes/s and 1.48 ± 0.08 ms, respectively). A one-way ANOVA indicated that there were no significant differences in firing rates or spike durations of cells responding to all rewards and cells responding to only large rewards (Mean Rate: F[1,38] = 0.29, p > 0.50; Spike Duration: F[1,38] = 0.79, p > 0.70).
For neurons that exhibited significant reward-related activity in one or more blocks of trials, we determined whether the activity was gated by contextual information by computing the change in firing rates in the 150 ms period around the time of reward acquisition across blocks of trials (see Methods and Fig. 4).
A one-way ANOVA comparing an index of changes in reward activity (i.e., RACI, defined in Methods; larger RACI values correspond to larger changes in reward-related firing) for all (large and small) rewards revealed a significant main effect of context conditions (F[3,67] = 6.10, p < 0.002, Fig. 4e). Bonferroni post hoc tests revealed that RACI values were significantly larger for Darkness (p < 0.003) and Reward Location Switch (p < 0.04) manipulations compared to Control conditions. The mean RACI values for cells recorded in the Reward Omission condition were not different from Controls (p = 1.00). A similar result was obtained when only considering reward activity at large rewards. In this case, there was a significant main effect of context manipulation (F[3,73] = 4.14, p < 0.01, Supplementary Fig. 3), in which RACI values were significantly higher for Reward Location Switch compared to Control conditions (p < 0.01). Although there was an increase in RACI values for Darkness and Reward Omission manipulations, these increases were not significantly different from Control conditions (p = 0.16 and p = 1.0, respectively). Overall, these analyses indicate that the reward-related activity of VTA neurons is, at least in part, bound to information about the current context.
Since the Reward Omission manipulation created a 25% decrease in the probability of reward (2/8 rewards omitted from each trial), it was possible to determine whether reward prediction error-related activity observed in conditioning paradigms (Bayer & Glimcher, 2005; Hollerman & Schultz, 1998; Roesch et al., 2007) also occurs during performance of this spatial memory task. Positive prediction errors were detected by comparing the normalized reward activity at rewarded arms in the first and second block of trials. A paired samples T-test indicated that responses to rewards were significantly increased after the probability of obtaining reward was decreased to 75% (t21 = 2.68, p < 0.02). Negative prediction errors were detected by comparing responses at non-rewarded arms to the cell’s average firing rate for the block of trials. Although 5/15 cells tested in this condition exhibited a >50% decrease in firing rate when the rat did not receive an expected reward (see Fig. 4d for example), a paired samples T-test indicated that there was no significant decrease in activity at the population level (t13 = −1.37, p > 0.10) when reward was omitted.
We tested a total of 54 VTA neurons for significant responses when the first arm of each trial was presented to the animal (i.e., the first “cue” signaling the start of the trial and availability of reward). Of these cells, 3 were omitted from this analysis due to very low average firing rates (< 1 spike/s). Overall, 27% of cells tested (14/51) exhibited significant excitatory responses when the first arm was made available (Fig. 3c). As with reward-related neurons, all cue-responsive cells fired at low average rates (3.99 ± 0.40 spikes/s) and had waveform durations of 1.61 ± 0.08 ms. Interestingly, 64% (9/14) of cue-related cells also exhibited significant reward-related responses (see Fig. 3c for example). Fifty-six percent of these cells (5/9) only fired relative to large rewards, while 44% (4/9) responded to both large and small rewards (Fig. 3d). This analysis indicates that a large proportion of cue-responsive VTA neurons also exhibited some type of reward-related activity. Due to the relatively small sample size, we were unable to determine whether the cue-related responses were dependent on contextual information.
To investigate the potential relationship between the movement of the animal and firing properties of VTA neurons, we determined whether the firing rates of VTA cells correlated with the velocity and/or acceleration of the animal.
Overall, 66% of cells (59/89) exhibited significant firing rate correlations with either the velocity or acceleration of movement (see Fig. 5a & b for examples and population distributions). Of these cells, 39% (23/59) had firing rates correlated with only velocity, 22% (13/59) with only acceleration, and 39% (23/59) with both velocity and acceleration (Fig. 5d). Of note, both positive and negative firing rate correlations with velocity/acceleration were observed (see Fig. 5a & b population distributions). This indicates that VTA neurons could either increase or decrease their firing rates as the animal’s velocity/acceleration increased. Importantly, this movement-related activity did not include any reward- or cue-related activity (see Methods), and is represented by lower, more gradually changing non-burst firing rates. Although some neurons that exhibited significant velocity/acceleration-firing rate correlations fired at high average rates, most cells in the distribution had average firing rates similar to neurons that displayed reward- and cue-related firing (see Supplementary Figure 2), indicating that movement sensitivity of VTA neurons is not restricted to one cell type, such as high rate GABAergic neurons (Lee et al., 2001). In addition, 44% of VTA cells that exhibited significant firing rate correlations with velocity or acceleration (24/55, after removal of 4 very low rate neurons) also exhibited reward-related firing (Fig. 5c & d). To our knowledge, this is the first report of a conjunctive representation of movement (i.e., velocity and/or acceleration) and reward in VTA neurons of the freely behaving rodent, indicating that VTA neurons may encode both the acquisition of reward and some aspect of the movements made to obtain it.
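As an illustration of the two quantities used to describe movement-related activity, the Pearson r and the slope of the regression line relating firing rate to velocity, the following minimal sketch computes both from paired samples. It is not the analysis code used in this study: the function name and the example values are hypothetical, and the way velocity and firing rate were sampled and reward- and cue-related activity excluded is described in Methods and is not reproduced here.

```python
from math import sqrt
from statistics import mean


def rate_velocity_fit(velocity, firing_rate):
    """Return (r, slope) for firing rate regressed on velocity.

    Illustrative helper (hypothetical name), not the authors' code.
    """
    if len(velocity) != len(firing_rate):
        raise ValueError("inputs must have the same length")
    mx, my = mean(velocity), mean(firing_rate)
    sxx = sum((x - mx) ** 2 for x in velocity)
    syy = sum((y - my) ** 2 for y in firing_rate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(velocity, firing_rate))

    r = sxy / sqrt(sxx * syy)  # strength and sign of the relationship
    slope = sxy / sxx          # change in spikes/s per cm/s
    return r, slope


# Made-up samples: velocity in cm/s, firing rate in spikes/s.
velocity = [10.0, 20.0, 30.0, 40.0]
r_pos, slope_pos = rate_velocity_fit(velocity, [2.0, 4.0, 6.0, 8.0])
r_neg, slope_neg = rate_velocity_fit(velocity, [8.0, 6.0, 4.0, 2.0])
```

The second call illustrates a cell whose firing rate falls as velocity rises, which yields a negative r and a negative slope. The same function applies to acceleration if acceleration samples are passed in place of velocity.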
Given the novelty of the dual code for movement and reward in VTA neurons, it was of considerable interest to determine whether the movement-related responses of VTA neurons were related to the reward-related responses. First, we found a significant correlation between the degree of movement-related activity (i.e., r value and linear regression slope of the firing rate-velocity correlation) and the magnitude of response exhibited upon acquisition of rewards (Pearson’s r = 0.85, p < 0.001). This initially suggested that the degree to which the cell’s firing rate was tuned to the animal’s velocity predicted the magnitude of the response upon acquisition of reward. However, we also found a significant relationship between the average firing rate and the degree to which the firing rate was modulated by changes in velocity or acceleration (i.e., the slope of the firing rate-velocity/acceleration correlation regression line; Velocity: Pearson’s r = 0.73, p < 0.001; Acceleration: Pearson’s r = 0.87, p < 0.001). That is, the higher the average firing rate of the cell, the more tightly coupled it was to changes in velocity or acceleration (Supplementary Fig. 4). Furthermore, the average firing rate of the cell was significantly correlated with the magnitude of reward response (Pearson’s r = 0.41, p < 0.01). Overall, these results suggest that neurons with higher average firing rates are more likely to be involved in computations about aspects of the animal’s movement, and this activity does not appear to serve as a predictive cue of reward.
In order to determine whether movement-related activity was associated with differential responses to large and small rewards, we determined the r value and slope of the regression line of the firing rate-velocity correlations immediately prior to the acquisition of reward, and assessed whether these values correlated with the degree to which the cell responded differentially to large and small rewards (Reward Discrimination Index, or RDI, defined in Methods). This analysis only included cells with significant reward-related activity (n = 39) and indicated that there was no significant correlation between the degree to which the cell would differentiate amounts of reward and the degree of movement-related activity while approaching rewards (r value vs. RDI: Pearson’s r = −0.14, p > 0.3; slope vs. RDI: Pearson’s r = −0.02, p > 0.8). This indicates that the degree of movement-related activity does not predict whether the neuron will discriminate acquisition of large and small rewards. Together, these data suggest that the degree to which a given VTA neuron is sensitive to movement is more or less independent of the response of the neuron once the animal obtains reward and supports the hypothesis that burst and non-burst firing serve different functions (Fields et al., 2007; Schultz, 2007; Wise, 2004).
Next, we examined whether movement-related activity of VTA neurons may be related to the motivational state of the animal. Our behavioral results clearly show that animals preferred to obtain large rewards before small rewards. As such, during performance of this task, the only time at which there may be a clear difference in motivational state is when animals approached large versus small rewards. Consistent with this notion, rats ran significantly faster while approaching large rewards than small rewards (large reward = 27.72 ± 0.71 cm/s, small reward = 25.31 ± 0.72 cm/s; t(25) = 11.65, p < 0.001; see Fig. 6a for an example and Fig. 6b for group averages). We then assessed whether the firing rates of reward-related VTA neurons were differentially coupled to the animal’s velocity while approaching large or small rewards. We hypothesized that, if movement-related firing of VTA neurons was related to aspects of motivational state, the firing rate of these neurons should be significantly correlated with velocity while in a state of higher motivation (i.e., while approaching large rewards). Therefore, this analysis was limited to cells that displayed significant reward-related activity and significant firing rate-velocity correlations (positive or negative) during the approach to large rewards (n = 10 out of the 24 cells that displayed a dual code for reward and movement). We found that the average r values of the firing rate-velocity correlations of this subset of VTA neurons were significantly greater while the animals approached large rewards than while they approached small rewards (t(9) = 2.69, p < 0.03; see Fig. 6b). This raises the intriguing possibility that the changes in firing rate as rats increase and decrease velocity may be related to the animal’s motivational state.
To establish whether the neural coding of movement information is context-dependent, we determined whether the relationship between firing rate and movement measurements was consistent across context manipulations.
A one-way ANOVA comparing changes in the r values of the firing rate correlations with velocity and/or acceleration (i.e., RVCIs, defined in Methods) revealed significant differences across contextual conditions for cells with firing rate correlations with acceleration (F[3,75] = 3.09, p < 0.03, Fig. 7b), but not velocity (F[3,89] = 0.47, p > 0.70, Fig. 7a). Bonferroni post hoc tests on the apparent contextual modulation of acceleration-sensitive cells revealed only marginally significant differences when Reward Omission conditions were compared to Control and Reward Location Switch conditions (each p = 0.07). Further analysis revealed, however, that the average r values of the acceleration-firing rate correlations were significantly different during the first block of trials across manipulations (F[3,75] = 4.64, p < 0.01). Bonferroni post hoc tests indicated that the correlation r values were significantly lower in the Reward Omission condition when compared to Control and Reward Location Switch conditions (p < 0.03 and p < 0.01, respectively) in the first block of trials. This indicates that the population of cells recorded during Reward Omission conditions happened to have weaker relationships with acceleration during the first block of trials, and that this relationship became stronger in the second block of trials (Fig. 7c). Therefore, the apparent context-dependency of the relationship between firing rate and acceleration was solely due to differences in the population of cells sampled across context conditions, and not due to the change in context information (i.e., reward probability) per se. Overall, these data indicate that movement/motivation-related activity of VTA neurons is not overtly dependent on contextual information.
This study demonstrates that many of the basic components of VTA neuronal activity exhibited during conditioning experiments (Pan et al., 2005; Roesch et al., 2007) also exist during goal-directed navigation. Furthermore, by testing rats in a spatially extended environment, we discovered that the activity of the majority of neurons (66%) was also related to aspects of the animal’s movement. Moreover, reward- and movement-related activities appear independent of one another because they were differentially regulated by contextual information. These findings, particularly the conjunction of movement and reward coding, should inform theoretical models of the role of VTA in reinforcement learning (Lisman & Otmakhova, 2001; Montague, Dayan, & Sejnowski, 1996; Suri, 2002) such that they may be applied to more complex learning situations.
Reward-related activity in dopamine neurons is known to diminish after animals learn to predict rewards (Schultz, 2002). Since our study was performed in well-trained animals, one might have expected a paucity of reward-related activity. However, recent studies suggest that longer time intervals (8–16 sec) between the presentation of cues and rewards will cause dopamine neurons to continue to respond to reward, even when the cue-reward association is well learned (Fiorillo, Newsome, & Schultz, 2008; Kobayashi & Schultz, 2008). In our study, animals typically completed a trial in roughly 1–2 minutes. Thus, the relative lack of temporal precision of reward acquisition during the spatial navigation task likely maintained VTA reward activity. This could explain why negative prediction errors were not observed at the population level, while positive prediction errors were clearly observed in many VTA neurons. As such, the patterns of VTA reward-related neuronal activity during performance of the spatial memory task appear to be consistent with the reward prediction error hypothesis of dopamine neuron function.
This raises the following question: since spatial memory tasks more closely approximate everyday situations (such as finding a parking space in downtown Boston), would there be an advantage to their being inherently less predictable? Specifically, the relative lack of temporal predictability during spatial navigation tasks may be particularly useful, since these tasks depend on active exploration in order for animals to obtain the reward that they seek. Thus, the degree of uncertainty likely allowed VTA neurons to continue to respond to reward, resulting in increased exploratory behaviors that would maximize the total amount of reward obtained (Doya, 2008; Ikemoto, 2007; Redgrave & Gurney, 2006).
Despite some of the potential challenges of using spatial navigation tasks, perhaps one of their most important virtues is the affordance of free, unrestrained behaviors of the animals being tested. This allowed for the clear result that movement is a much larger contributor to the firing patterns of VTA neurons than previously appreciated (66% of all neurons recorded). Similar to the activity of identified GABAergic neurons (Lee et al., 2001), we found that the activity of a small population of high rate neurons was related to the velocity and acceleration of movement. Importantly, however, movement encoding was not limited to high rate neurons: a large proportion of low rate neurons were also sensitive to velocity and acceleration. Although a few studies have shown that the activity of some putative dopamine neurons may relate to specific behaviors (Kosobud, Harris, & Chapin, 1994; Schultz, 1986), the movement-related activity described here differs in two important ways. First, the activity of these neurons was not related to specific behavioral acts, but to how fast the animal moved. Second, about half of the neurons that responded to rewards also displayed significant movement-related activity, suggesting that individual VTA neurons can dually encode information about rewards and the movements made to obtain them.
The movement-related firing of VTA neurons is not likely a byproduct of a value representation (Tobler et al., 2005). One could argue, for example, that the movement-related activity actually represented the temporal derivative of the increase in chocolate milk odor concentration as the animal approached reward, which theoretically would have caused a gradual increase in firing rate. Although the current study cannot conclusively rule out this interpretation, we feel that it is unlikely since rats behaviorally demonstrated that they were not using odor to guide their choice behavior. If animals were able to smell the rewards from the center of the maze, and this odor information was driving VTA neural activity, the animals should have been able to immediately adapt their choice behavior when reward locations were switched. Furthermore, we observed both positive and negative correlations between firing rate and movement (see Fig. 5). Thus, about half of the movement-sensitive VTA neurons decreased firing rates as the animal approached rewards. If movement-related activity were a value signal, we should have only observed positive correlations between movement and firing rate. In addition, a value representation would necessarily be independent of the response of the animal, as it would simply reflect the value of the reward itself. The fact that VTA neuronal activity was found to vary according to how fast the animal moved to obtain the reward further suggests that this activity is not a representation of the reward.
One previous study described what was termed an ‘uncertainty signal’ of dopamine neurons (Fiorillo, Tobler, & Schultz, 2003). This activity consisted of a gradual increase in firing prior to acquisition of rewards that were poorly predicted and was independent of cue- and reward-related burst firing, as the sustained firing rates fell within the range of the non-burst firing mode of dopamine neurons (Grace & Bunney, 1984a, 1984b). Although the movement-related activity we describe here appears qualitatively similar, it likely does not represent an uncertainty signal. First, rats reliably collected large rewards before small rewards. As such, their choice behaviors indicated that there was a relatively high degree of certainty about which reward would be obtained. According to Fiorillo et al. (2003), in this state of high certainty, there should not have been any change in firing as rats moved toward rewards. Second, as mentioned above, positive and negative firing rate-velocity/acceleration correlations were observed. If this activity were an uncertainty signal, only positive correlations should have been observed.
Although many movement-related VTA neurons also exhibited reward-related activity (~44%), we provide support for the notion that burst and non-burst activity are relatively independent modes of firing with relatively independent functions (Floresco, West, Ash, Moore, & Grace, 2003; Ikemoto, 2007; Kitai, Shepard, Callaway, & Scroggs, 1999; Schultz, 2007). By definition, bursting activity encompasses short bouts of firing rates of at least 12.5 spikes/s (Grace & Bunney, 1984a) and any rates below that are considered non-burst firing (Grace & Bunney, 1984b). As such, the movement-related activity observed here was well within the range of non-burst firing rates of dopamine neurons (see Fig. 5 for examples). This mode of firing is considered important to drive motivated behaviors via more sustained extrasynaptic dopamine release in nucleus accumbens (Floresco et al., 2003; Ikemoto, 2007).
Furthermore, we show that burst and non-burst activity are independently modulated by contextual information. Consistent with the suggestion that reward prediction error signals are dependent on the ‘behavioral context’ (Nakahara et al., 2004), we show here that unexpected changes in reward contingencies (i.e., location and probability) are important factors in determining whether the neuron will burst upon reward acquisition. Moreover, we are able to extend the notion of context-dependent reward prediction to visuo-spatial features of the context. As such, a broader definition of context, one that includes reward and spatial information, appears appropriate when discussing the role of VTA in context-dependent mnemonic functions (Lisman & Grace, 2005).
Here, we report the novel finding that the degree of movement-related neural activity is stronger while animals approached large rewards, a condition in which motivation-related behaviors were enhanced. This is consistent with reports of hyperdopaminergic mice that exhibit increased motivation-related behaviors (i.e., response rates and movement velocity) and increased non-burst firing rates of VTA dopamine neurons (Cagniard, Beeler et al., 2006; Pecina et al., 2003). Although further studies will be required to conclusively rule out more general interpretations of the non-bursting, movement-related activity of VTA neurons, the current findings provide initial evidence that this activity may be related to the motivational state of the animal.
Regardless of the potential link between movement-related VTA neuronal activity and motivation, the finding that VTA neurons are sensitive to some aspects of the behavioral state of the animal is important when considered in the broader context of the neural networks involved in goal-related behaviors. Assuming that at least some of the neurons sampled in the current study are dopaminergic, this movement-related activity could play an important role in selecting populations of neurons in forebrain structures involved in generating behaviors appropriate for a given context. For example, the increases and decreases in firing rates of VTA neurons that occur while animals change speeds would lead to increases or decreases in dopamine release onto striatal neurons. This would then bi-directionally alter the excitability of striatal neurons (Mizumori, Puryear, & Martig, 2009; Murer, Tseng, Kasanetz, Belluscio, & Riquelme, 2002; Tseng, Snyder-Keller, & O'Donnell, 2007). This mechanism would simultaneously enable and inhibit specific striatal neural ensembles in order to generate the desired behaviors, while concurrently suppressing the non-desired behaviors.
It remains to be determined how VTA neuronal activity relates to spatial memory functions. These data show that the most dramatic changes in reward-related responses only occurred following context manipulations that challenged subjects’ spatial memory ability (i.e., Darkness and Reward Location Switch conditions). These two manipulations of context information, however, induced opposite behavioral effects. Impaired spatial memory in darkness is likely due to the importance of visuo-spatial information for accurate performance of this task (Puryear et al., 2006). On the other hand, receiving a reward that is different from what was expected (i.e., a small reward on an arm associated with large rewards) could have caused an increase in attention, thereby facilitating spatial memory via increased noradrenergic and/or cholinergic activity (Hasselmo & Giocomo, 2006; Sara, 2009). Interestingly, qualitatively similar changes in unit activity were observed in these two conditions. Consistent with a recent report (Roesch et al., 2007), reward- and cue-related activity of VTA neurons may not directly reflect the behavioral decisions made during performance of a spatial memory task. Rather, this activity may be important for signaling the presence of behaviorally important stimuli, thereby engaging downstream brain areas that enable the selection and execution of appropriate behaviors (Ikemoto, 2007; Redgrave & Gurney, 2006).
This research was supported by NIMH Grant MH 58755 to S.J.Y.M.
Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/bne