The dopamine system has been thought to play a central role in guiding behavior based on rewards. Recent pharmacological studies suggest that another monoamine neurotransmitter, serotonin, is also involved in reward processing. To elucidate the functional relationship between serotonin neurons and dopamine neurons, we performed single-unit recording in the dorsal raphe nucleus (DRN), a major source of serotonin, and the substantia nigra pars compacta, a major source of dopamine, while monkeys performed saccade tasks in which the position of the target indicated the size of an upcoming reward. After target onset, but before reward delivery, the activity of many DRN neurons was modulated tonically by the expected reward size with either large- or small-reward preference, whereas putative dopamine neurons had phasic responses and only preferred large rewards. After reward delivery, the activity of DRN neurons was modulated tonically by the received reward size with either large- or small-reward preference, whereas the activity of dopamine neurons was not modulated except after the unexpected reversal of the position-reward contingency. Thus, DRN neurons encode the expected and received rewards, whereas dopamine neurons encode the difference between the expected and received rewards. These results suggest that the DRN, probably including serotonin neurons, signals the reward value associated with the current behavior.
Many functions of the brain are modulated by various kinds of monoamine neurons. In particular, dopamine and serotonin appear to be the two major modulators of motivational and emotional behaviors (for review, Daw et al., 2002). The role of dopamine is particularly clear: dopamine neurons in the midbrain, in and around the substantia nigra pars compacta (SNc), are excited by a reward or by a sensory event that predicts the reward, either of which can change motivational or emotional states. More specifically, the activity of dopamine neurons encodes the difference between the expected reward and the actual reward, which is often called the reward prediction error. This signal is suggested to induce learning and modulate actions (Mirenowicz and Schultz, 1994; Montague et al., 1996; Schultz et al., 1997; Hollerman and Schultz, 1998; Schultz, 1998; Suri and Schultz, 1998).
Several lines of evidence suggest that serotonin is also related to reward-related behaviors (Rogers et al., 1999; Daw et al., 2002; Doya, 2002; Schweighofer et al., 2007), in addition to other functions such as the sleep-wake cycle (McGinty and Harper, 1976; Lydic et al., 1983; Guzman-Marin et al., 2000; Dugovic, 2001), appetite (Curzon, 1990), locomotion (Jacobs and Fornal, 1993), emotion and social behavior (Davidson et al., 2000; Graeff, 2004), stress-coping behavior (Deakin, 1991; Graeff et al., 1996), and learning and memory (Meneses, 1999). Notably, it has been proposed that there are opponent interactions between dopamine and serotonin (for review, Kapur and Remington, 1996). However, the physiological basis of the function of the serotonin system in cognitive and motivational behavior remains poorly understood. Electrophysiological studies of the raphe nuclei have focused mainly on the sleep-wake cycle and motor behavior (for review, Jacobs and Fornal, 1993). Specifically, it is unknown whether and how serotonin neurons in the raphe nuclei encode reward-related information.
In a series of studies using saccade tasks with a biased reward schedule, we have shown that the activity of neurons in the caudate (Kawagoe et al., 1998; Lauwereyns et al., 2002) and the substantia nigra pars reticulata (SNr) (Sato and Hikosaka, 2002), as well as putative dopamine neurons in SNc (Nakahara et al., 2004; Takikawa et al., 2004), was modulated depending on the expected reward. We also showed that the reward-dependent changes in saccade behavior depended on physiological dopamine release in the caudate (Nakamura and Hikosaka, 2006). Having found that these tasks engage the basal ganglia and the dopamine system, we hypothesized that they would also recruit the serotonin system. We therefore recorded from the dorsal raphe nucleus (DRN), the principal source of serotonergic innervation of the basal ganglia (van der Kooy and Hattori, 1980; Imai et al., 1986; Corvaja et al., 1993). For comparison, we also recorded from dopamine neurons using the same tasks in the same animals. We found that neurons in the DRN relay signals related to cognitive and motivational processes, but in a different manner from the dopamine system.
We used four hemispheres of two rhesus monkeys (Macaca mulatta; laboratory designations E, male, and L, female). Both animals had been implanted with scleral search coils for measuring eye position and a post for holding the head. The recording chambers were placed over the posterior cortices. All aspects of the behavioral experiment, including presentation of stimuli, monitoring of eye movements, monitoring of neuronal activity, and delivery of reward and electrical stimulation, were under the control of a QNX-based real-time experimentation data acquisition system (REX, Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, Maryland). Eye position was monitored by means of a scleral search coil system with 1 ms resolution. Stimuli generated by an active-matrix liquid crystal display projector (PJ550; ViewSonic, Walnut, CA) were rear-projected on a frontoparallel screen 25 cm from the monkey's eyes. Upon successful completion of each trial, drops of water or juice were delivered as reward through a spigot under the control of a solenoid valve. Magnetic resonance images were obtained to determine the position of the electrode. The activity of single neurons was recorded using tungsten electrodes (Frederick Haer Company, Bowdoinham, ME; diameter 0.25 mm, impedance 1–3 MΩ). The signal was amplified with a band-pass filter (200 Hz–5 kHz; BAK, Mount Airy, MD) and collected at 1 kHz via a custom-made window discriminator (MEX). We also collected the spike waveform for each recorded neuron. All procedures were approved by the Institutional Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals.
The animal performed a memory-guided saccade task with a biased reward schedule (one-direction rewarded memory-guided saccade task, '1DR-MGS', Fig 1A). The appearance of a central fixation point (FP, diameter 0.6 deg) signaled trial initiation. The monkeys were required to fixate on the FP and maintain fixation within a window of 3 deg. After fixation on the FP for 1000–1500 ms ('Fixation period'), a cue indicating the future target position (diameter 1.2 deg) was presented for 100 ms either to the right or left, 20 deg from the FP. The position of the target was chosen pseudorandomly such that within every 'sub-block' of four trials each of the two positions was chosen twice. The monkey had to keep fixating on the FP for another 800 ms until the FP went off. The disappearance of the FP was the cue for the monkey to make a saccade toward the memorized cue position. A correct saccade was signaled by the appearance of the target with a 100 ms delay. A liquid reward was delivered with an additional 100 ms delay. If the monkey broke fixation at any time during the fixation period or failed to make a saccade to the cued position, the trial was scored as an error, and the same trial was repeated until a correct saccade occurred. The inter-trial interval, which started at the time of reward offset and lasted until FP onset in the next trial, was 3 s.
The biased reward schedule was introduced in blocks (Kawagoe et al., 1998). In one block of 20–28 trials (10–14 trials for each direction), the amount of reward was always large (0.4 ml) for one direction of the target and small (0 or 0.01 ml) for the other direction (for example, left large-reward, right small-reward). In the next block, the position-reward contingency was reversed (i.e., left small, right large). These two kinds of blocks with opposite position-reward contingencies are called the left-large and right-large blocks, and they were alternated two or three times for each recording session (Fig 1C).
In a separate experiment, we also used a visually-guided saccade task (one-direction rewarded visually-guided saccade task, ‘1DR-VGS’, Fig 1B). After fixation on the FP for 1200 ms (‘Fixation period’), the FP disappeared and at the same time, the target (1.2 deg) appeared either to the right or left 20 deg from the FP. The monkey then had to make a saccade to the target immediately. The trial sequence and the reward schedule were the same as those in 1DR-MGS.
We used both 1DR-MGS and 1DR-VGS tasks for 64 DRN neurons, 1DR-MGS only for 20 neurons, and 1DR-VGS only for 103 neurons in two monkeys. For dopamine neuron recordings, we used only 1DR-VGS.
The location of the DRN was estimated using magnetic resonance imaging (MRI) and was later verified histologically (see below). A recording chamber, which was angled 38 deg (monkey E) or 35 deg (monkey L) posteriorly, was implanted over the midline of the parietal cortex in order to access the brain stem between the superior colliculi and the inferior colliculi. For electrophysiological recordings we used a grid system (Crist et al., 1988). A stainless-steel guide tube (outer diameter, 0.6 mm; inner diameter, 0.35 mm) was inserted through a grid hole and, after penetrating the dura, was lowered until its tip reached about 7 mm above the surface of the superior colliculi, as estimated from MR images. Through the guide tube we inserted an electrode to reach the DRN. The distance of the recording sites from the midline was 1 or 1.5 mm. The antero-posterior extent of the recording sites was 2 mm, which corresponded to 6–8 mm anterior to the level of the ear canals (Horsley-Clarke coordinates) in both monkeys.
The DRN is known to be a major source of serotonin neurons (Dahlstrom and Fuxe, 1964; Leger et al., 2001). It has traditionally been accepted that DRN serotonin neurons spontaneously fire slowly and regularly with broad spikes whereas non-serotonin neurons generally fire more rapidly and irregularly with narrow spikes (Aghajanian et al., 1978; Sawyer et al., 1985; Jacobs and Fornal, 1991; Hajos et al., 1998). Recent studies, however, report that serotonin neurons do not always differ significantly from non-serotonin neurons in terms of these electrophysiological features (Allers and Sharp, 2003; Kocsis et al., 2006). In this report, therefore, rather than choosing neurons with specific electrophysiological properties we studied all well-isolated neurons in the DRN whose activity changed during saccade tasks.
To record from putative dopamine neurons, we searched in and around the SNc. Dopamine neurons were identified by their irregular and tonic firing around 5 spikes/s with broad spike potentials. In this experiment, we focused on dopamine neurons that responded to reward-predicting stimuli with a phasic excitation.
At the conclusion of the experiments, we made electrolytic microlesions at selected recording sites in monkey L. The animal was then deeply anesthetized with pentobarbital and perfused with 10% formaldehyde. The brain was cut into 50 μm coronal sections and stained with cresyl violet (Fig 1D).
A neuron was judged to be task-related if there was a statistically significant difference in its firing rate across the following seven task periods (Kruskal-Wallis, p<0.007≈0.05/7): fixation point onset to cue (target) onset; 0–200 ms after target onset; 700–0 ms before fixation point offset (only for 1DR-MGS); 200 ms before to 200 ms after the saccade; and three post-reward periods, 0–400, 400–1200, and 1200–2000 ms after reward onset.
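As a schematic illustration of this criterion (our own sketch, not the authors' analysis code; the function name and the simulated data are assumptions), the seven-period comparison could be implemented as:

```python
import numpy as np
from scipy.stats import kruskal

def is_task_related(rates_by_period, alpha=0.05):
    """Judge a neuron task-related if its trial-by-trial firing rates
    differ across the task periods (Kruskal-Wallis test, with the
    Bonferroni-style threshold 0.05 / 7 periods ~= 0.007)."""
    _, p = kruskal(*rates_by_period)
    return p < alpha / len(rates_by_period)

# Simulated example: seven periods x 20 trials; one period is elevated.
rng = np.random.default_rng(0)
periods = [rng.poisson(5, 20).astype(float) for _ in range(6)]
periods.append(rng.poisson(15, 20).astype(float))  # strongly modulated period
print(is_task_related(periods))
```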
Since reward-related modulation of neuronal activity was found mainly during a period after target onset (which indicated the size of an upcoming reward) and during a period after reward delivery, we focused our analysis on neuronal activity during two task periods: (1) a 400 ms period after target onset, which we will call the 'pre-reward period', and (2) a 400 ms period starting 400 ms after reward onset, which we will call the 'post-reward period'. We analyzed the neuronal activity in each task period using a two-way ANOVA [reward (large or small) × direction (contralateral or ipsilateral target relative to the recording site)].
To examine changes in neuronal activity throughout the trial as a whole, we computed an ROC (receiver operating characteristic) value comparing the firing rate in a test window of 100 ms aligned with respect to a task-related event (e.g., target onset) to the firing rate in a control window of 400 ms before fixation onset. We repeated the ROC analysis on consecutive overlapping test windows (advanced in 20-ms steps), separately for the large-reward, small-reward, contraversive-saccade, and ipsiversive-saccade trials (see Fig. 3A–D). Similarly, to examine the changes in the reward and direction effects we computed an ROC value comparing the firing rates in the same test window of 100 ms between the large- and small-reward trials (reward effects, see Fig. 3E) and between the contraversive- and ipsiversive saccade trials (direction effects, see Fig. 3F).
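A minimal sketch of this sliding-window ROC analysis, under our own assumptions (the function names are ours, and the ROC area is computed from pairwise rate comparisons, equivalent to the area under the ROC curve):

```python
import numpy as np

def roc_area(test_rates, control_rates):
    """Area under the ROC curve: the probability that a randomly chosen
    test-window rate exceeds a randomly chosen control-window rate
    (ties count 0.5). 0.5 = no change; >0.5 excitation; <0.5 suppression."""
    test = np.asarray(test_rates, float)
    ctrl = np.asarray(control_rates, float)
    greater = (test[:, None] > ctrl[None, :]).sum()
    ties = (test[:, None] == ctrl[None, :]).sum()
    return (greater + 0.5 * ties) / (test.size * ctrl.size)

def sliding_roc(spike_times_by_trial, control_rates, t_start, t_stop,
                win=0.100, step=0.020):
    """ROC time course: 100-ms test windows advanced in 20-ms steps,
    each compared against the fixed control-window rates (spikes/s)."""
    centers, aucs = [], []
    t = t_start
    while t + win <= t_stop:
        rates = [np.sum((st >= t) & (st < t + win)) / win
                 for st in spike_times_by_trial]
        centers.append(t + win / 2)
        aucs.append(roc_area(rates, control_rates))
        t += step
    return np.array(centers), np.array(aucs)
```

Here `spike_times_by_trial` holds spike times (in seconds) aligned to the task event, one array per trial, and `control_rates` holds each trial's firing rate in the 400 ms control window before fixation onset.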
To examine the changes in the neuronal activity in the pre-reward and post-reward periods after the reversal of position-reward contingency, we normalized the firing rate in each trial by: (the firing rate in the trial – the mean firing rates across all trials) / (SD of the firing rate across all trials). We performed this calculation for each direction of saccades. Then we compared the firing rates for the i-th (e.g. the first and second) trials before and after the contingency reversal with the firing rates for the last five trials during the new block (Mann-Whitney U test, p<0.01, see Fig. 8).
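The normalization and trial-by-trial comparison could be sketched as follows (illustrative only; the function names are ours, and the second function stands in for the authors' exact pooling of trials across sessions):

```python
import numpy as np
from scipy.stats import mannwhitneyu

def normalize_rates(rates):
    """Z-score firing rates: (rate in each trial - mean over all trials)
    / (SD over all trials), computed separately per saccade direction."""
    rates = np.asarray(rates, float)
    return (rates - rates.mean()) / rates.std()

def differs_from_asymptote(rates_trial_i, rates_last_five, alpha=0.01):
    """Is activity on the i-th trial after the contingency reversal
    different from the last five trials of the new block?
    (Mann-Whitney U test, p < 0.01)."""
    _, p = mannwhitneyu(rates_trial_i, rates_last_five,
                        alternative='two-sided')
    return p < alpha
```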
We characterized the physiological properties of recorded neurons by (1) spike waveform, (2) baseline firing rate, and (3) irregularity of the firing pattern. The typical spike shape consisted of the following deflections in order: first, a sharp negative; second, a sharp positive; third, a long-duration negative; and fourth, a long-duration positive. We therefore measured the spike duration from the first, sharp negative deflection to the peak of the fourth, long-duration positive deflection (Kocsis et al., 2006). It ranged from 1.0 ms to 3.7 ms (mean 2.2 ms, SD 0.58 ms). Baseline firing rate was defined as the mean firing rate during the 1000 ms before the onset of the fixation point on the first trial of each experiment, because the activity during the inter-trial interval was often modulated tonically after the delivery of reward in the preceding trial. Finally, to quantify the irregularity of spike trains we used the irregularity metric 'IR' introduced by Davies et al. (2006). First, an inter-spike interval (ISI) was computed for each pair of consecutive spikes: if spike(i-1), spike(i), and spike(i+1) occurred in this order, the duration between spike(i-1) and spike(i) corresponds to ISIi, and the duration between spike(i) and spike(i+1) corresponds to ISIi+1. Second, the difference between adjacent inter-spike intervals was computed as |log(ISIi/ISIi+1)| and assigned to the time at which spike(i) occurred. Thus, small IR values indicate regular firing and large IR values indicate irregular firing. We then computed the median of all IR values over the whole task period for all correct trials. This measure has an advantage over traditional measures of irregularity, such as the coefficient of variation of the inter-spike intervals, which require a constant firing rate during the measurement period; this requirement was not met in our experiments because neural responses often changed during the task periods.
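The IR computation described above can be sketched in a few lines (our illustration, not the original analysis code):

```python
import numpy as np

def ir_metric(spike_times):
    """Median firing irregularity (after Davies et al., 2006): for each
    spike i, IR_i = |log(ISI_i / ISI_{i+1})| over adjacent inter-spike
    intervals. Small values indicate regular, clock-like firing."""
    isi = np.diff(np.sort(np.asarray(spike_times, float)))
    ir = np.abs(np.log(isi[:-1] / isi[1:]))
    return float(np.median(ir))

# A perfectly regular spike train (constant ISI) yields IR = 0.
print(ir_metric(np.arange(0, 10, 0.5)))  # → 0.0
```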
We analyzed IR values of DRN neurons, putative dopamine neurons, and putative projection neurons in the caudate (Supplementary Fig 1). The caudate data were obtained in separate experiments (Davies et al., 2006).
We analyzed the activity of DRN neurons using two tasks with biased reward schedules: a memory-guided saccade task (1DR-MGS, Fig 1A; 17 neurons from monkey E, 67 from monkey L) and a visually-guided saccade task (1DR-VGS, Fig 1B; 96 neurons from monkey E, 71 from monkey L). Because the biased reward schedule was introduced in blocks, on each trial the animal could predict the reward value based on the location of the target cue (Fig 1C). Indeed, saccadic reaction times were significantly shorter for large-reward than small-reward trials in both monkeys in both tasks (Supplementary Table 1, see also Fig. 8G).
The electrode was directed to the DRN through a recording chamber which was implanted over the midline of the parietal cortex. During the initial survey of DRN, the following brain structures were identified and used as landmarks: superior colliculus with receptive fields in the upper visual field with large eccentricities, inferior colliculus with auditory responses, mesencephalic trigeminal nucleus with responses to mouth movements, the locus coeruleus with phasic responses to salient sensory stimuli, and trochlear nucleus with increased firing during downward eye movements. We analyzed neurons located 0–2 mm anterior to the trochlear nucleus.
Traditionally, it has been accepted that serotonin neurons fire broad spikes spontaneously in a slow and regular 'clock-like' pattern (Aghajanian et al., 1978; Sawyer et al., 1985; Jacobs and Fornal, 1991; Hajos et al., 1998). We therefore computed the baseline firing rate, spike duration, and regularity of the sampled neurons (see Methods). The baseline firing rate across neurons ranged from 0 to 22 spikes/s with a mean of 4.9 spikes/s (SD 4.3, median 4.0). The spike duration ranged from 1.0 ms to 3.7 ms (mean 2.2 ms, SD 0.58 ms). Different methods have been used to quantify the regularity of neuronal firing (Shinomoto et al., 2003). In this paper we used the irregularity metric 'IR', the median of the differences between adjacent inter-spike intervals over the whole task period (Davies et al., 2006; see Methods). Smaller IR values indicate more regular firing. There was no significant difference in IR value between 1DR-MGS and 1DR-VGS (Wilcoxon signed rank test, p=0.79, Supplementary Fig 1A). The IR values for the DRN neurons we sampled were significantly smaller (i.e., more regular) than those for putative projection neurons in the caudate nucleus (p<0.0001) and putative dopamine neurons in the substantia nigra pars compacta (p=0.02, Supplementary Fig 1B). Among DRN neurons, there was no significant correlation between IR values and spike duration (p=0.4, Spearman rank correlation) or baseline firing rate (p=0.05).
DRN neurons exhibited task-related modulations with distinctive features during the performance of the 1DR-MGS. Most notably, DRN neurons often showed reward-dependent modulations in activity after reward onset. Fig 2A shows a representative example. This neuron was characterized by a long spike duration (2.76 ms), low baseline activity (2 Hz), and regular firing (median IR 0.31). The neuron exhibited an increase in activity after the onset of the fixation point (FPon), followed by regular and tonic firing until reward onset. The activity further increased after the onset of a large reward but ceased after the onset of a small reward. This modulation occurred regardless of the direction of the saccade, and lasted for 860 ms after reward onset (permutation test, p<0.05, see Methods). In other DRN neurons, such reward-dependent modulations during the post-reward period lasted even longer. For example, the neuron in Fig 2B was also characterized by a long spike duration (2.6 ms), low baseline activity (6 Hz), and a regular firing pattern (median IR 0.50). For both saccade directions, there was a long-lasting decrease in activity starting 400 ms after the onset of a large reward (permutation test, p<0.05). The activity of the neuron in Fig 2C (baseline firing rate 3 Hz, spike duration 1.9 ms, IR=0.47) was significantly stronger for large- than small-reward trials from 800 ms to 1500 ms after reward onset. The neuron in Fig 2D (baseline firing rate 10 Hz, spike duration 1.4 ms, IR=0.48) also exhibited a long-lasting reward effect starting around the time of reward offset. Note that in all of these examples the post-reward modulations of activity disappeared before the next trial started (Supplementary Fig 2).
In some neurons reward-dependent modulations were also observed before reward onset, during the delay period. The neuron in Fig 2C exhibited stronger activity on small-reward than large-reward trials (p=0.8×10⁻⁶). The neuron in Fig 2D also exhibited stronger activity on small- than large-reward trials, but only when leftward saccades were required (two-way ANOVA, reward effect, p=0.005; interaction, p=0.02). Such direction selectivity, however, was relatively rare among DRN neurons.
Reward-dependent modulations in activity during the delay and the post-reward periods, as shown in the example neurons in Fig 2, were commonly observed in the population of DRN neurons. Fig 3A–D illustrate the time course of these modulations using receiver operating characteristic (ROC) analysis, comparing each neuron's firing rate for each task condition to the baseline activity during the 400 ms before fixation onset. During the delay and post-reward periods of the task, many DRN neurons had tonic increases in activity (shown in warm colors) or decreases in activity (cool colors).
Fig 3E shows the time course of reward selectivity, using ROC analysis to compare each neuron’s activity between large- and small-reward trials. Fig 3F shows a similar analysis for direction selectivity, comparing contraversive- and ipsiversive-saccade trials. The reward effect was present in many neurons during both task periods before (mainly the delay period) and after reward, while direction effects were uncommon.
The data in Fig 3A and B reveal a notable difference in the reward-dependent modulations between the pre-reward period and the post-reward period. For each neuron, the changes in activity during the pre-reward period, compared with the baseline activity, tended to be in the same direction on both large- and small-reward trials (Fig 3A and B). In contrast, the changes in activity during the post-reward period, compared with the baseline activity, tended to be in opposite directions (Fig 3A and B). For example, for the neuron shown in Fig 2A, the pre-reward activity increased compared with the baseline on both large- and small-reward trials. On the other hand, the post-reward activity increased on large-reward trials, but was inhibited on small-reward trials.
The main cause of the reward effect during the pre-reward period was that the changes in activity tended to be stronger on large-reward trials than on small-reward trials, which is illustrated by the greater intensity of colors in Fig 3A than in Fig 3B. To quantify this trend, we computed the pre-reward activity as the firing rate during the 400 ms after target onset minus the baseline firing rate, and the results are shown in Fig 4A. Among 22 neurons (22/84, 26%) that showed significant reward effects during the pre-reward period, 20 neurons exhibited significant activity changes on large-reward trials whereas only 10 neurons did on small-reward trials. This tendency is illustrated by a wider distribution of the pre-reward activity on large-reward trials than on small-reward trials (marginal histograms in Fig 4A). When the firing rate in the pre-reward period was compared between the reward conditions, 16 neurons showed higher firing rates on large-reward trials than on small-reward trials; the other 6 neurons showed the opposite pattern (two-way ANOVA, p<0.01).
Reward-dependent modulations were clearer and more prevalent in post-reward activity. Among 42 neurons (42/84, 50%) that showed significant reward effects during the post-reward period, 24 neurons showed changes in activity in opposite directions between large- and small-reward trials (data points in the upper-left and lower-right quadrants in Fig 4B). When post-reward activity was compared between the reward conditions, 18 neurons showed a large-reward preference (i.e., higher firing rates on large-reward trials than on small-reward trials); the other 24 neurons showed a small-reward preference (two-way ANOVA, p<0.01).
As discerned from Fig. 3A–D, some DRN neurons also exhibited changes in activity (1) after fixation onset: increases for 23/84 (27.4%) or decreases for 12/84 (14.3%) neurons (comparison between activity during the 400 ms before and the 200 ms after fixation onset, Mann-Whitney U test, p<0.01), and (2) during the later fixation period: increases for 17/84 (20.2%) or decreases for 20/84 (23.8%) neurons (comparison between activity during the 400 ms before fixation onset and 800–400 ms before target onset, p<0.01).
To understand the functional significance of the reward-related activity of DRN neurons, we compared it to the activity of dopamine neurons in the same two monkeys. For this purpose, we used a visually-guided version of the biased-reward saccade task (Fig 1B, ‘1DR-VGS’). We recorded from 167 DRN neurons (96 from monkey E, 71 from monkey L) and 64 dopamine neurons (20 from monkey E, 44 from monkey L).
The characteristics of the reward-dependent modulations in the activity of DRN neurons in 1DR-VGS were similar to those found in 1DR-MGS. Thus, many DRN neurons exhibited increases or decreases in tonic activity (usually increases) after the onset of the fixation point. These changes became more evident during the pre-reward period, after the onset of the saccade target which indicated the size of the upcoming reward. As in 1DR-MGS, changes in pre-reward activity occurred in the same direction on both large- and small-reward trials (Fig 5A–B), but tended to be greater on large-reward trials (Fig. 6A), thus leading to differences in activity between the two reward conditions (Fig. 5E). Among 44 neurons (44/167, 26%) that showed significant reward effects during the pre-reward period, 34 exhibited significant activity changes on large-reward trials (29 increases and 5 decreases) whereas only 15 did on small-reward trials (13 increases and 2 decreases).
In the post-reward period, the same DRN neurons tended to exhibit opposite changes in activity (Fig. 5A and B). Among 74 neurons (74/167, 44%) that showed significant reward effects, 40 neurons changed their activity in opposite directions on large- and small-reward trials (Fig. 6B). About half (n=36) showed a large-reward preference, while the other 38 neurons showed a small-reward preference (two-way ANOVA, p<0.01). The direction of the reward preference was not always the same between the pre- and post-reward periods (Fig 6E).
The activity pattern of dopamine neurons was distinctively different from that of DRN neurons (Fig. 5C and D). Dopamine neurons exhibited a phasic increase in activity after fixation onset, as reported previously for 1DR-MGS (Takikawa et al., 2004). They also exhibited a phasic increase in activity after the onset of the target indicating an upcoming large reward (Fig. 5C) and a phasic decrease in activity after the onset of the target indicating an upcoming small reward (Fig. 5D), leading to a strong and transient large-reward preference in the pre-reward period (Fig 5F).
In contrast to the pre-reward period, changes in the post-reward period were less clear in dopamine neurons. Small increases in activity were observed in some neurons after a large reward (Fig. 5C), leading to weak reward effects (Fig. 5F). Whereas 53 of 167 DRN neurons (31.7%) exhibited significant activity modulation long after reward (600–1000 ms after reward onset, sign test, p<0.01), only 5 of 64 dopamine neurons (7.8%) did so. Thus the duration of the post-reward activity in dopamine neurons was shorter than that in DRN neurons (chi-square test, p<0.0001). Overall, most dopamine neurons showed a large-reward preference in the pre-reward period and some did so in the post-reward period (Fig. 6F).
Fig 7 shows the proportions of neurons that exhibited significant reward and direction effects for both DRN and dopamine neurons. Statistical significance was determined using a two-way ANOVA for each task period (p<0.01). In both DRN and dopamine neurons, reward effects were more prevalent than direction effects. For DRN neurons, the large-reward preference was more common than the small-reward preference in the pre-reward period, while these kinds of preferences were equally common in the post-reward period. The reward effect was more robust among dopamine neurons. They predominantly showed the large-reward preference in the pre-reward period and less commonly in the post-reward period. The ratio of large- vs. small- reward preference was significantly different between DRN neurons and DA neurons (chi-square, p<0.0001 for both pre- and post-reward periods).
In both of our tasks, the contingency between target position and reward value was fixed during one block of trials, but was then reversed with no external cue. This allowed us to examine how the monkey’s performance and neuronal activity changed adaptively to the new position-reward contingency. As in previous studies from our laboratory, the saccadic reaction time changed quickly after the reversal of the position-reward contingency (Fig 8G) (Lauwereyns et al., 2002; Watanabe and Hikosaka, 2005).
We therefore examined the time course of the changes in the activity of DRN and dopamine neurons (Fig. 8). We computed the mean normalized firing rates for the pre-reward period (0–400 ms after target onset) and the post-reward period (400–800 ms after reward onset for DRN neurons; 0–400 ms after reward onset for dopamine neurons) as a function of the trial number after the reversal. To assess the speed of activity change after the reversal, we tested whether the neuronal activity on each trial number was significantly different from the mean activity on the last five trials of the new block (Mann-Whitney U test, p<0.01). This analysis was restricted to neurons whose firing rates were significantly modulated by reward value (two-way ANOVA, p<0.01), and was performed separately for the pre- and post-reward periods.
The changes in pre-reward activity after the contingency reversal were qualitatively similar for DRN neurons and dopamine neurons (Fig. 8A, C, and E). In both DRN neurons and dopamine neurons, the activity on the first trial after the contingency reversal was not different from the last trial of the block before the reversal. This is not surprising because the changed reward had not yet been delivered when the activity occurred. Interestingly, however, the change in activity of DRN neurons was delayed by one trial after the reversal from large rewards to small rewards (Fig. 8A and C), unlike dopamine neurons (Fig. 8E).
The difference between DRN neurons and dopamine neurons was clearer in the post-reward period (Fig. 8B, D, and F). Unlike in the pre-reward period, the changed reward had already been delivered on the first trial after the contingency reversal. The activity of DRN neurons followed the size of the reward faithfully (Fig. 8B and D). In contrast, the activity of dopamine neurons only changed transiently on the first trial, and thereafter returned to a level close to baseline activity (Fig. 8F). Specifically, dopamine neurons decreased their activity on large-to-small reward reversals and increased their activity on small-to-large reversals. These transient changes in activity represent the ‘reward prediction error’, which is the difference between the expected reward value (e.g., small reward) and the actual reward value (e.g., large reward). This pattern of dopamine neuron activity has been shown previously using other tasks (Hollerman and Schultz, 1998; Takikawa et al., 2004). The results thus indicate that DRN neurons encode the actual reward value, not the reward prediction error.
In the present experiment we studied all well-isolated neurons in the DRN whose activity changed during the saccade tasks. It has traditionally been accepted that serotonin neurons in the DRN show slow, regular firing with broad spikes (Aghajanian et al., 1978; Sawyer et al., 1985; Jacobs and Fornal, 1991; Hajos et al., 1998), although recent studies may not agree with this characterization (Allers and Sharp, 2003; Kocsis et al., 2006). To examine whether such electrophysiological properties were correlated with reward-related modulation, we first grouped 71 DRN neurons (whose spike shapes were successfully recorded) based on their spike durations (shorter or longer than 2 ms) and baseline firing rates (higher or lower than 3 Hz) (Table 1). These criteria were chosen based on a previous study reporting that immunohistochemically identified serotonin neurons had a mean spike duration of 2.17 ms (range, 1.67–3.5) and a mean baseline firing rate of 1.67 Hz (range, 0.37–3.0) (Allers and Sharp, 2003). During both the pre-reward and post-reward periods, there was no tendency for neurons in specific categories to show specific types of reward modulation (chi-square test, p>0.5).
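The categorization cutoffs and the contingency test can be sketched as below. The counts in the table are illustrative placeholders that happen to sum to 71 neurons; they are not the paper's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

def classify(spike_duration_ms, baseline_rate_hz):
    """Place a neuron into one of four categories using the cutoffs in the
    text: spike duration longer/shorter than 2 ms, baseline firing rate
    lower/higher than 3 Hz."""
    return (spike_duration_ms > 2.0, baseline_rate_hz < 3.0)

# Illustrative 4 x 3 contingency table: rows are the four
# electrophysiological categories, columns are reward-modulation types
# (large-preferring, small-preferring, unmodulated).
table = np.array([[10, 8, 5],    # long spike, slow firing
                  [ 4, 6, 3],    # long spike, fast firing
                  [ 9, 7, 6],    # short spike, slow firing
                  [ 5, 4, 4]])   # short spike, fast firing
chi2, p, dof, expected = chi2_contingency(table)
```

A non-significant p here would correspond to the paper's conclusion that no electrophysiological category was associated with a particular type of reward modulation.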
We further examined whether the reward-related features of DRN neurons were correlated with any combination of the electrophysiological properties (Fig. 9). There was no significant difference between large- and small-reward preferring neurons in baseline firing rate, spike duration, or irregularity (Kruskal-Wallis test, p>0.05). Furthermore, multiple regression analysis indicated that the reward effects (ROC values) could not be significantly predicted by any linear combination of these three variables (pre-reward, p=0.17; post-reward, p=0.68).
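The overall F-test behind such a multiple regression (does any linear combination of the predictors explain the ROC values?) can be sketched generically as follows; `X` would hold one column each for baseline rate, spike duration, and irregularity. This is an illustration, not the authors' code.

```python
import numpy as np
from scipy import stats

def regression_p(X, y):
    """Overall F-test p-value for ordinary least squares of y on the
    columns of X (an intercept is added automatically)."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, float)])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)                 # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total sum of squares
    k = X.shape[1] - 1                            # number of predictors
    df_res = len(y) - X.shape[1]                  # residual deg. of freedom
    F = ((ss_tot - ss_res) / k) / (ss_res / df_res)
    return float(stats.f.sf(F, k, df_res))
```

A p-value above 0.05 from this test, as reported in the text, means the three electrophysiological variables jointly carry no detectable linear information about reward preference.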
Pharmacological and behavioral studies have suggested that the dorsal and median raphe nuclei (DRN and MRN) are important elements of the brain reward circuitry (Higgins and Fletcher, 2003; Liu and Ikemoto, 2007). However, it was unknown whether and how reward information is represented in the DRN. Our experiments now demonstrate that single neurons in the monkey DRN encode reward information before and after the delivery of reward.
Many serotonergic neurons in the brain (about 40% in cats, 60% in rats) are located in the DRN (Wiklund et al., 1981). The lateral component (the wings) of the DRN, best developed around the trochlear nucleus, is most prominent in primates (Jacobs and Azmitia, 1992), and this is where we sampled most of the neurons. Among the heterogeneous DRN neurons containing different neurotransmitters (for review, Michelsen et al., 2007), previous studies report that a substantial proportion of DRN neurons are serotonergic: about 30% in rats (Descarries et al., 1982), 70% of medium-sized DRN neurons in cats (Wiklund et al., 1981), and 70% in humans (Baker et al., 1991). Recent combined electrophysiological and immunochemical studies revealed that DRN neurons with ‘traditional’ electrophysiological characteristics, such as long spike duration and low, regular baseline firing, are not always serotonergic (Allers and Sharp, 2003; Kocsis et al., 2006). Nevertheless, as shown in Table 1, 52% of our sample neurons exhibited long spike durations (>2ms) and 50% exhibited low firing rates (<3Hz), consistent with the proportion of ‘classical’ serotonergic neurons in the DRN. These neurons did show reward-dependent modulation of activity, indicating that a group of classical serotonergic DRN neurons modulate their activity depending on reward information. In addition, 12% of neurons exhibited baseline firing rates higher than 10Hz, and they too showed reward-dependent modulation; such high-firing DRN neurons may be GABA neurons (Table 1; Allers and Sharp, 2003).
Reward-dependent modulations in the activity of DRN neurons were different from those observed in putative dopamine neurons. First, whereas the dopamine neurons predominantly responded to a reward-predicting sensory stimulus, DRN neurons responded to both the reward-predicting stimulus and the reward itself. Second, whereas dopamine neurons responded to a reward only when it was larger or smaller than expected, DRN neurons reliably coded the value of the received reward whether or not it was expected. Unlike DRN neurons, dopamine neurons responded to reward delivery only when the cue position-reward contingency was switched so that the reward was unexpectedly small or large (Fig. 8). In other words, dopamine neurons encoded reward prediction error, as suggested previously (Schultz, 1998; Satoh et al., 2003; Kawagoe et al., 2004), but DRN neurons did not. Third, whereas dopamine neurons invariably preferred larger rewards (i.e., were excited by larger rewards), the DRN contained both neurons preferring larger rewards and neurons preferring smaller rewards. Finally, whereas dopamine neurons exhibited phasic responses, DRN neurons typically exhibited tonic responses. Thus, whereas dopamine neurons provide phasic signals related to reward prediction error, DRN neurons provide tonic signals related to expected and received reward values.
The responses of DRN neurons were diverse, compared with the relatively stereotyped responses of dopamine neurons. This may be because DRN neurons are heterogeneous, containing different neurotransmitters such as GABA, dopamine, noradrenaline, substance P, nitric oxide, and acetylcholine (for review, Michelsen et al., 2007), in addition to serotonin neurons, which constitute 30–70% of DRN neurons (Descarries et al., 1982; Leger and Wiklund, 1982). In contrast, in the current experiment, putative dopamine neurons were selected based on their firing rates, spike shapes, and responsiveness to 1DR-saccade tasks, which may explain why their task-related activity was quite homogeneous.
The reward-related signals in DRN neurons may originate from the brain areas that project to the DRN (Aghajanian and Wang, 1977; Sakai et al., 1977; Behzadi et al., 1990; Peyron et al., 1998). Notable among them are (1) dopamine neurons in the substantia nigra pars compacta and the ventral tegmental area, and (2) the lateral habenula. The dopamine neurons, which project to both the DRN and MRN (Kitahama et al., 2000), may exert facilitatory effects on putative serotonin neurons in the DRN (Haj-Dahmane, 2001). Since the dopamine neurons are excited by the stimulus that predicts a large reward, DRN neurons would also be excited by the large reward-predicting stimulus. Indeed, during the pre-reward period, large-reward preference was more common than small-reward preference. In contrast, DRN neurons are inhibited by electrical stimulation of the lateral habenula (Wang and Aghajanian, 1977; Stern et al., 1979; Varga et al., 2003). Using the same reward-biased saccade tasks, a recent study from our laboratory showed that lateral habenula neurons exhibit strong small-reward preference (i.e., inhibited by stimuli that predict large rewards and excited by stimuli that predict small rewards) (Matsumoto and Hikosaka, 2007). Through this inhibitory connection, the small-reward preference of habenula neurons would be translated into the large-reward preference of DRN neurons.
In contrast, the post-reward responses of DRN neurons are unlikely to be derived from dopamine or habenula neurons because neither exhibits reliable post-reward responses. Possible origins of the post-reward information include the hypothalamus (Celada et al., 2002) and the medial prefrontal cortex (Hajos et al., 1998; Varga et al., 2003). Hypothalamic orexin neurons are activated by arousal, feeding, and rewarding stimuli (Mieda and Yanagisawa, 2002; Harris and Aston-Jones, 2006). They project to the DRN in addition to many other areas (Peyron et al., 1998) and facilitate serotonin release (Tao et al., 2006). Medial prefrontal cortex inputs to the DRN and MRN attenuate the increase in serotonin release in response to aversive stimuli (Amat et al., 1998).
In the post-reward period, about half of DRN neurons showed large-reward preference and the other half showed small-reward preference. One possible interpretation is that the two kinds of reward-related signals are represented in other brain areas such as the anterior cingulate cortex (Niki and Watanabe, 1979; Amiez et al., 2006) and are transmitted to the DRN (Arnsten and Goldman-Rakic, 1984). Another possibility is that reward information is transferred from one group of neurons to the other via inhibitory connections within the DRN. It has been suggested that the ventral medial prefrontal cortex inhibits serotonin neurons in the DRN by targeting local GABAergic interneurons (Varga et al., 2001). Thus, the activity modulation of some DRN neurons may be opposite in direction to that of others, depending on whether the cortical projection they receive is direct or indirect.
Among the widespread efferent projections of the DRN (Lavoie and Parent, 1990; Vertes, 1991), those to the basal ganglia structures, especially the striatum and the substantia nigra (van der Kooy and Hattori, 1980; Imai et al., 1986), may be particularly important because they are thought to control reward-dependent saccadic eye movements (Hikosaka et al., 2006).
Many lines of evidence suggest that inhibition of raphe neurons causes a rewarding effect and that this is mediated, at least partly, by the disinhibition of dopamine neurons. Electrical stimulation of the DRN and MRN causes inhibition of dopamine neurons, which is mediated by serotonin released in the substantia nigra (Dray et al., 1976; Tsai, 1989; Trent and Tepper, 1991). Self-administration of muscimol into the raphe nuclei is rewarding in behavior, and this effect depends on normal dopamine function (Liu and Ikemoto, 2007). It has been suggested that dopamine actions in the basal ganglia are antagonized by serotonin derived from the DRN or MRN (Kapur and Remington, 1996). Thus, inhibition of the DRN/MRN, followed by enhancement of dopaminergic transmission in the basal ganglia, appears to be rewarding (Fletcher et al., 1993).
The DRN may have a more direct route to influence saccadic eye movements, which is its projection to the substantia nigra pars reticulata (SNr) (Corvaja et al., 1993). The SNr is known to exert tonic GABAergic inhibition on the superior colliculus and to remove this inhibition in response to sensory, memory, and motivational demands (Hikosaka et al., 2006).
Characteristic features of DRN neuron activity were that (1) the reward-related responses were tonic, and (2) the modulation showed either large- or small-reward preference. Such activation patterns may be useful for integrating appetitive or aversive reward information over a substantial time, as suggested by Solomon and Corbit (1974). This may also explain the experimental results indicating that serotonin-depleted animals show impulsive tendencies. That is, systemic or local depletion of serotonin renders the animal likely to choose a small but immediate reward rather than a large but delayed reward (Wogar et al., 1993; Brunner and Hen, 1997; Harrison et al., 1997; Mobini et al., 2000b; Mobini et al., 2000a; Winstanley et al., 2004; Denk et al., 2005; Winstanley et al., 2006). The human DRN was activated when subjects learned to obtain large future rewards (Tanaka et al., 2004). Long-lasting DRN activity may have other functions as well, because impulsivity has been associated with other serotonin-related behavioral tendencies such as aggression (Mehlman et al., 1994; van Erp and Miczek, 2000) and obsession (Insel et al., 1990).
The coding of delayed rewards has been a long-standing issue in reinforcement learning theories (Cardinal et al., 2001). Recent studies have suggested that multiple neural systems may participate in the representation of rewards at different time scales (McClure et al., 2004; Tanaka et al., 2004). One hypothesis is that serotonin regulates the balance between immediate and delayed rewards (Doya, 2002). Daw et al. suggested that the current reward value is represented by the phasic activation of dopamine neurons, whereas the average value is represented by the tonic activation of serotonin neurons (Daw et al., 2002). Indeed, we found that half of the DRN neurons exhibited such reward-related tonic activation. However, our results do not completely support the theory because the tonic activation of DRN neurons did not seem to accumulate across trials. Further experiments using tasks involving long-term reward prediction will be necessary to test this hypothesis.
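One minimal way to express this hypothesized division of labor is a sketch in which the phasic term is the prediction error relative to a running average reward and the tonic term is that running average itself. This is a toy illustration with made-up parameter values, not the full model of Daw et al.:

```python
def opponent_signals(rewards, alpha=0.1):
    """Toy opponency sketch: 'phasic' carries the reward prediction error
    relative to a running average (dopamine-like), while 'tonic' carries
    the running average reward itself (serotonin-like).  alpha is an
    illustrative learning rate."""
    r_bar = 0.0
    phasic, tonic = [], []
    for r in rewards:
        delta = r - r_bar           # prediction error vs. average reward
        r_bar += alpha * delta      # update the running average
        phasic.append(delta)
        tonic.append(r_bar)
    return phasic, tonic
```

After a small-to-large reward reversal, the phasic signal in this sketch jumps on the first changed trial and then decays, while the tonic signal ramps gradually toward the new average, which is qualitatively like the transient dopamine response in Fig. 8F. The finding that DRN activity tracked the current reward immediately rather than accumulating across trials is the point at which the data depart from the model.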
In conclusion, our experiments demonstrate that many neurons in the monkey DRN encode expected and received rewards. They do so in a manner distinctly different from dopamine neurons. It remains to be determined whether and how the DRN signals are used for the reward-based modulation of motor behavior or learning.
Supp Fig. 1. Irregularity metric of DRN neurons. (A) Each data point indicates the irregularity metric (IR) of one DRN neuron during 1DR-MGS and 1DR-VGS (n=64). There was no significant difference in IR value between the tasks (Wilcoxon signed rank test, p=0.79). (B) Cumulative histograms of IR values for DRN neurons (n=167), dopamine neurons (n=87), and caudate projection neurons (n=428). Data on DRN and dopamine neurons were obtained from the same monkeys (monkey L and E); data on caudate neurons were obtained from monkey L and monkey S, the latter of which participated in different experiments. IR values for DRN neurons (median: 0.69) were significantly smaller than those for dopamine neurons (median: 0.79) (p<0.03, Mann-Whitney U test) and caudate neurons (median: 1.10) (p<0.0001).
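The text does not define the IR metric. Assuming one common definition of spiking irregularity, the mean absolute log ratio of consecutive interspike intervals, it could be computed as follows; the definition and function name here are assumptions, not necessarily the measure used in the paper.

```python
import numpy as np

def irregularity(spike_times):
    """Assumed irregularity measure: mean |log| ratio of consecutive
    interspike intervals.  Near 0 for clock-like regular firing; about
    2*ln(2) = 1.39 on average for a Poisson spike train."""
    isi = np.diff(np.sort(np.asarray(spike_times, float)))
    return float(np.mean(np.abs(np.log(isi[1:] / isi[:-1]))))
```

Under this definition, the smaller median IR of DRN neurons relative to dopamine and caudate neurons would indicate more regular spiking, consistent with the classical description of serotonin neurons.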
Supp Fig. 2. The long-lasting reward modulation disappeared before the next trial started. In (A) and (B), the data in Fig 2B and 2D are now aligned at the reward offset (left panels) and at the fixation point onset on the next trial (right panels). Blue lines: activity for the small reward trials; red lines: activity for the large reward trials.
Supp Table 1. Mean (SD) reaction times (ms) for 1DR-MGS and 1DR-VGS. P values indicate statistical significance by the Mann-Whitney U test.
This work was supported by the intramural research program of the National Eye Institute. We thank Dr. Long Ding and Ethan Bromberg-Martin for helpful comments. We thank GC AMERICA, INC for providing us with dental acrylic.