
Midbrain dopamine neurons signal preference for advance information about upcoming rewards


The desire to know what the future holds is a powerful motivator in everyday life, but it is unknown how this desire is created by neurons in the brain. Here we show that when macaque monkeys are offered a water reward of variable magnitude, they seek advance information about its size. Furthermore, the same midbrain dopamine neurons that signal the expected amount of water also signal the expectation of information, in a manner that is correlated with the strength of the animal’s preference. Our data show that single dopamine neurons process both primitive and cognitive rewards, and suggest that current theories of reward-seeking must be revised to include information-seeking.


Dopamine-releasing neurons located in the substantia nigra pars compacta and ventral tegmental area are thought to play a crucial role in reward learning (Wise, 2004). Their activity bears a remarkable resemblance to ‘prediction errors’ signaling changes in a situation’s expected value (Schultz et al., 1997; Montague et al., 2004). When a reward or reward-predictive cue is more valuable than expected, dopamine neurons fire a burst of spikes; if it has the same value as expected, they have little or no response; and if it is less valuable than expected, they are briefly inhibited. Based on these findings, many theories invoke dopamine neuron activity to explain human learning and decision-making (Holroyd and Coles, 2002; Montague et al., 2004) and symptoms of neurological disorders (Redish, 2004; Frank et al., 2004), inspired by the idea that these neurons could encode the full range of rewarding experiences, from the primitive to the sublime. However, their activity has almost exclusively been studied for basic forms of reward such as food and water. It is unknown whether the same neurons that process these basic, primitive rewards are involved in processing more abstract, cognitive rewards (Schultz, 2000).

We therefore chose to study a form of cognitive reward that is shared between humans and animals. When people anticipate the possibility of a large future gain – such as an exciting new job, a generous raise, or having their research published in a prestigious scientific journal – they do not like to be held in suspense about their future fate. They want to find out now. In other words, even when people cannot take any action to influence the final outcome, they often prefer to receive advance information about upcoming rewards. Here we define “advance information about upcoming rewards” as a cue that is available before reward delivery and is statistically dependent on the reward outcome. We do not mean information in the quantitative sense of mathematical information theory (Supplemental Note A). Related concepts have been arrived at independently in several fields of study. Economists have studied “temporal resolution of uncertainty” (Kreps and Porteus, 1978), and have shown that humans often prefer their uncertainty to be resolved earlier rather than later (Chew and Ho, 1994; Ahlbrecht and Weber, 1996; Eliaz and Schotter, 2007; Luhmann et al., 2008). Experimental psychologists have studied “observing behavior” (Wyckoff, 1952), and have shown that a class of observing behavior that produces reward-predictive cues can be a powerful motivator for rats, pigeons, and humans (Wyckoff, 1952; Prokasy, 1956; Daly, 1992; Lieberman et al., 1997). To date, however, there has not been a rigorous test of this preference in non-human primates, the animals in which the reward-predicting activity of dopamine neurons has been best described (Schultz, 2000; Schultz et al., 1997; Montague et al., 2004) (Supplemental Note B).

To this end, we developed a simple decision task allowing rhesus macaque monkeys to choose whether to receive advance information about the size of an upcoming water reward. We found that monkeys expressed a strong behavioral preference, preferring information to its absence and preferring to receive the information as soon as possible. Furthermore, midbrain dopamine neurons which signaled the monkey’s expectation of water rewards also signaled the expectation of advance information, in a manner that was correlated with the animal’s preference. These results show that the dopaminergic reward system processes both primitive and cognitive rewards, and suggest that current theories of reward-seeking must be revised to include information-seeking.


Behavioral preference for advance information

We trained two monkeys to perform a simple decision task (“information choice task”, Figure 1A). On each trial two colored targets appeared on the left and right sides of a screen, and the monkey had to choose between them by making a saccadic eye movement. Then, after a delay of a few seconds, the monkey received either a big or a small water reward. The monkey’s choice had no effect on the reward size – both reward sizes were always equally probable. However, choosing one of the colored targets produced an informative cue, a cue whose shape indicated the size of the upcoming reward. Choosing the other color produced a random cue, a cue whose shape was randomized and therefore had no meaning. The positions of the targets were randomized on each trial. To familiarize monkeys with the two options, we interleaved choice trials with forced-information trials and forced-random trials, in which only one of the targets was available.

Figure 1
Behavioral preference for advance information

After only a few days of training, both monkeys expressed a strong preference to view informative cues (Figure 1B). Monkey Z chose information about 80% of the time, and monkey V’s choice rate was even higher, close to 100%. Their preference for advance information cannot be explained by a difference in the amount of water reward, because information did not allow monkeys to obtain extra water from the reward-delivery apparatus (Figure S1), and had little effect on whether they completed a trial successfully (< 2% error rate for each target, Figure S2).

An important concern is that advance information might have allowed monkeys to extract a greater amount of subjective value from the water reward by physically preparing for its delivery – for instance, by tensing their cheek muscles to swish water around in their mouths in a more pleasurable fashion (Perkins, 1955). We therefore introduced a second task that equalized the opportunity for simple physical preparation (Mitchell et al., 1965) (“information delay task”, Figure 2A). Monkeys again chose between informative and random cues, but afterward a second cue appeared that was always informative on every trial. Thus, information was always available well in advance of reward delivery; the choice was between receiving the information immediately, or after a delay.

Figure 2
Behavioral preference for immediate delivery of information

Soon after being exposed to the new task, both monkeys expressed a clear preference for immediate information, comparable to their preference in the original task (Figure 2B). We then reversed the relationship between cue colors and information content, and monkeys switched their choices to the newly informative color (Figure 2B, Figure S3). We conclude that monkeys treated information about rewards as if it were a reward in itself, preferring information to its absence and preferring to receive it as soon as possible.

Dopamine neurons signal advance information

To understand the neural basis of the rewarding value of information, we recorded the activity of 47 presumed midbrain dopamine neurons while monkeys performed the information choice task shown in Figure 1. As in previous studies, we focused on neurons that were presumed to be dopaminergic based on standard electrophysiological criteria and that signaled the value of water rewards (henceforth referred to as “dopamine neurons”) (Methods). Figure 3 shows an example neuron that carried a strong water reward signal. On trials when the monkey viewed informative cues, the neuron was phasically excited by the cue indicating a large reward and inhibited by the cue indicating a small reward. In contrast, on trials when the monkey was forced to view uninformative random cues, the neuron had little response to the cues but was strongly responsive to the later reward outcome, excited when the reward was large and inhibited when it was small. Thus, consistent with previous studies, this neuron signaled changes in the monkey’s expectation of water rewards.

Figure 3
Dopamine neurons signal information

The same neuron also responded to the targets indicating the availability of information. On forced trials when only one target was available, the neuron was excited by the informative-cue target and inhibited by the random-cue target. On choice trials when both targets were available, the monkey always chose to receive information, and the neuron responded much as it did when the informative-cue target was presented alone. Thus, this dopamine neuron signaled changes in both the expectation of water and the expectation of information.

This pattern of responses was quite common in dopamine neurons. We measured each neuron’s discrimination between targets, cues, and rewards using the area under the receiver operating characteristic (Figure 4B–D, Methods). This measure ranges from 0.5 at chance levels to 0.0 or 1.0 for perfect discrimination. As in the example, neurons discriminated strongly between informative reward-predicting cues and between randomly-sized rewards, but only weakly between uninformative random cues and between fully predictable rewards (Figure 4C,D). The same neurons also discriminated between the targets, with clear preferential activation by the target that predicted advance information (Figure 4B). The discrimination was highly similar when measured using either forced-information or choice-information trials in independent data sets (rho = 0.68, P < 10^−4; Methods), indicating that the neural preference for information was reproducible and consistent across different stimulus configurations. The same pattern occurred in both monkeys (Figure S4) and could be seen in the population average firing rate (Figure 4A).

Figure 4
Analysis of the dopamine neuron population

There was also a tendency for neurons to have a weak initial excitation for each task event (Figure 4A, Figure S4). This nonspecific response is probably due to the animal’s initial uncertainty about the stimulus identity (Kakade and Dayan, 2002; Day et al., 2007) or stimulus timing (Fiorillo et al., 2008; Kobayashi and Schultz, 2008). We did not observe a predominant tendency for neurons to have anticipatory tonic increases in activity before the delivery of probabilistic rewards, a phenomenon that has been reported in one study (Fiorillo et al., 2003) but not others (Satoh et al., 2003; Morris et al., 2006; Bayer and Glimcher, 2005; Matsumoto and Hikosaka, 2007; Joshua et al., 2008). This may be due to differences in task design such as the size of the reward or the manner in which the reward was signaled (Fiorillo et al., 2003).

An important question is whether dopamine neurons signal the presence of information per se, or whether they truly signal how much it is preferred. In the latter case, there should be a correlation between the neural preference for information, expressed as the neural discrimination between the informative-cue target and the random-cue target, and the behavioral preference for information, expressed as a choice percentage. Such correlations were indeed present, both between-monkeys and within-monkey. Between-monkeys, monkey V expressed a stronger behavioral preference for information than monkey Z (Figure 1B), and also expressed a stronger neural preference (P = 0.02, Figure 5A). Within-monkey, during the sessions in which monkey Z’s behavioral preference was strongest, the neural preference was enhanced (rho = 0.44, P = 0.02, Figure 5D). On the other hand, behavioral preferences for information were not significantly correlated with neural discrimination between water-related cues or water rewards (all P > 0.25, Figure 5B,C,E,F). Thus, consistent with evidence that dopamine neurons signal the subjective value of liquid rewards (Morris et al., 2006; Roesch et al., 2007; Kobayashi and Schultz, 2008), they may also signal the subjective value of information.

Figure 5
Correlation between neural discrimination and behavioral preference


Discussion

Here we have shown that macaque monkeys prefer to receive advance information about future rewards, and that their behavioral preference is paralleled by the neural preference of midbrain dopamine neurons. Thus, the same dopamine neurons that signal primitive rewards like food and water also signal the cognitive reward of advance information.

Monkeys expressed a strong preference for advance information even though it had no effect on the final reward outcome. This is consistent with the intuitive belief that, all things being equal, it is better to seek knowledge than to seek ignorance. It also provides an explanation for the puzzling fact that the brain devotes a great deal of neural effort to processing reward information even when this is not required to perform the task at hand. For example, many studies use passive classical conditioning tasks in which informative cues are followed by rewards with no requirement for the subject to take any action. In these tasks the brain could simply ignore the cues and wait passively for rewards to arrive. Yet even after extensive training, many neurons continue to use the cue information to predict the size, probability, and timing of reward delivery (e.g. Tobler et al., 2003; Joshua et al., 2008). In other tasks, neurons persist in predicting rewards even when the act of prediction is harmful, causing maladaptive behavior that interferes with reward consumption (e.g. refusing to perform trials with low predicted value; Shidara and Richmond, 2002; Lauwereyns et al., 2002). These observations suggest that the act of prediction has a special status, an intrinsic motivational or rewarding value of its own. Our data provide strong evidence for this hypothesis. When given an explicit choice, monkeys actively sought out the advance information that was necessary to make accurate reward predictions at the earliest possible opportunity.

A limitation of our study is that it does not determine the precise psychological mechanism by which value is assigned to information. There are several possibilities. Theories from experimental psychology suggest that in our task the value of viewing informative cues would simply be the sum of the conditioned reinforcement generated by the individual big-reward and small-reward cues. In this view, the preference for information implies that the conditioned reinforcement is weighted nonlinearly, so that the benefit of strong reinforcement from the big-reward cue outweighs the drawback of weak reinforcement from the small-reward cue (Wyckoff, 1959; Fantino, 1977; Dinsmoor, 1983), akin to the nonlinear weighting of rewards that produces risk seeking (von Neumann and Morgenstern, 1944). On the other hand, theories in economics suggest that preference is not due to independent contributions of individual cues but instead comes from considering the full probability distribution of future events. In this view, information-seeking is due to an explicit preference for early resolution of uncertainty (Kreps and Porteus, 1978) or an implicit preference induced by psychological factors such as anticipatory emotions (Caplin and Leahy, 2001). In addition, just as the value assigned to conventional forms of reward (e.g. food) depends on the internal state of the subject (e.g. hunger), the value assigned to information is likely to depend on psychological factors such as personality (Miller, 1987), emotions like hope and anxiety (Chew and Ho, 1994; Wu, 1999) and attitudes toward uncertainty (Lovallo and Kahneman, 2000; Platt and Huettel, 2008).

Implications of information-seeking for attitudes toward uncertainty

In the framework of decision-making under uncertainty, advance information reduces the amount of reward uncertainty by narrowing down the set of potential reward outcomes. Our data therefore suggest that in our task rhesus macaque monkeys preferred to reduce their reward uncertainty at the earliest possible moment, as though the experience of uncertainty were aversive.

Interestingly, several previous studies using similar saccadic decision tasks came to a seemingly opposite conclusion: macaque monkeys appeared to prefer uncertainty, choosing an uncertain, variable-size reward instead of a certain, fixed-size reward (McCoy and Platt, 2005; Platt and Huettel, 2008). How can these results be reconciled? One possibility is that they can be explained by a common principle; for instance, perhaps monkeys treat the offer of a variable-size reward as a source of uncertainty to be confronted and resolved. An important point, however, is that a preference for reward variance can be caused by factors unrelated to uncertainty – most notably, it can be caused by an explicit preference over the probability distribution of reward outcomes, for instance due to disproportionate salience of large rewards (Hayden and Platt, 2007) or a nonlinear utility function (Platt and Huettel, 2008). In contrast, the choice of information has no influence on the reward outcome; it only affects the amount of time spent in a state of uncertainty before the reward outcome is revealed. In this sense the preference for advance information is a relatively pure measurement of attitudes toward uncertainty.

Information signals in the dopaminergic reward system

Dopamine neuron activity is thought to teach the brain to seek basic goals like food and water, reinforcing and punishing actions by adjusting synaptic connections between neurons in cortical and subcortical brain structures (Wise, 2004; Montague et al., 2004). Our data suggest that the same neural system also teaches the brain to seek advance information, selectively reinforcing actions that lead to knowledge about rewards in the future. Thus, the behavioral preference for information could be created by the dopaminergic reward system. At the neural level, neurons which gain sensitivity to rewards through a dopamine-mediated reinforcement process would come to represent both rewards and advance information about those rewards in a ‘common currency’, particularly neurons involved in reward timing, conditioned reinforcement, and decision-making under risk (Kim et al., 2008; Seo and Lee, 2009; Platt and Huettel, 2008). In turn, these signals could ultimately feed back to dopamine neurons to influence their value signals.

An important goal for future research will therefore be to discover how dopamine neurons measure information and assign its rewarding value. One possibility is that dopamine neurons receive information-related input from specialized brain areas, distinct from those that compute the value of traditional rewards like food and water. Indeed, signals encoding the amount and timing of reward information, and dissociated from preference coding of traditional rewards, have been found in several cortical areas (Nakamura, 2006; Behrens et al., 2007; Luhmann et al., 2008). How these information signals could be translated into a behavioral preference, and whether they are communicated to dopamine neurons, is unknown.

Another possibility is that dopamine neurons receive information signals from the same brain areas that contribute to their food- and water-related signals, such as the lateral habenula (Matsumoto and Hikosaka, 2007). In this case, dopamine neurons would receive a highly processed input, with different forms of rewards already converted into a ‘common currency’ by upstream brain areas. We are currently testing this possibility in further experiments.

Why do dopamine neurons treat information as a reward?

The preference for advance information, despite its intuitive appeal, is not predicted by current computational models of dopamine neuron function (Schultz et al., 1997; Montague et al., 2004), which are widely viewed as highly efficient algorithms for reinforcement learning. This raises an important question: could the information-predictive activity of dopamine neurons be a harmful ‘bug’ that impairs the efficiency of reward learning? Or is it a useful ‘feature’ that improves over existing computational models? Here we present our hypothesis that the positive value of advance information is a ‘feature’ with a fundamental role in reinforcement learning.

Specifically, modern theories of reinforcement learning recognize that animals learn from two types of reinforcement: “primary” reinforcement generated by rewards themselves, and “secondary” reinforcement generated predictively, by observing sensory cues in advance of reward delivery. Predictive reinforcement greatly enhances the speed and reliability of learning, as demonstrated most strikingly by temporal-difference learning algorithms (Sutton and Barto, 1998) which have produced influential accounts of animal behavior (Sutton and Barto, 1981) and dopamine neuron activity (Schultz et al., 1997; Montague et al., 2004). This implies that animals should treat predictive reinforcement as an object of desire, making an active effort to seek out environments where reward-predictive sensory cues are plentiful. If an animal was trapped in an impoverished environment where reward-predictive cues were unavailable, the consequences would be devastating: the animal’s sophisticated predictive reinforcement learning algorithms would be reduced to impotence. This can be seen clearly in our dopamine neuron data (Figure 4A). When an action produces informative cues, dopamine neurons signal its value immediately, a predictive reinforcement signal; but when an action produces uninformative cues, dopamine neurons must wait to signal its value until the reward outcome arrives, acting as little more than a primitive reward detector. Thus, predictive reinforcement depends entirely on obtaining advance information about upcoming rewards.
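To make this argument concrete, here is a minimal tabular TD(0) simulation of the task contingencies (our own sketch, not the authors' model; the reward sizes and learning rate are arbitrary). With informative cues, the prediction error appears at cue onset; with random cues, it can only appear at the outcome:

```python
# Minimal tabular TD(0) sketch: informative cues move the prediction-error
# signal from outcome time to cue time; random cues leave it at outcome time.
import random
from collections import defaultdict

ALPHA, BIG, SMALL = 0.1, 1.0, 0.0
V = defaultdict(float)  # state-value table; the terminal state stays at 0

def run_trial(informative: bool):
    """One trial: start state -> cue state -> reward; returns the TD errors."""
    reward = random.choice([BIG, SMALL])
    start = "start_info" if informative else "start_rand"
    if informative:
        cue = "cue_big" if reward == BIG else "cue_small"
    else:
        cue = "cue_rand"  # shape carries no information about reward size
    # TD error at cue onset: no reward yet, only a change in predicted value
    delta_cue = V[cue] - V[start]
    V[start] += ALPHA * delta_cue
    # TD error at outcome: reward delivered, transition to terminal (value 0)
    delta_outcome = reward - V[cue]
    V[cue] += ALPHA * delta_outcome
    return delta_cue, delta_outcome

random.seed(0)
for _ in range(5000):
    run_trial(informative=True)
    run_trial(informative=False)

# After learning, informative trials show a large |TD error| at the cue and
# almost none at the outcome; random trials show the reverse (cf. Figure 3).
for informative in (True, False):
    d_cue, d_out = zip(*[run_trial(informative) for _ in range(1000)])
    print("informative" if informative else "random    ",
          "mean |delta_cue| = %.2f" % (sum(map(abs, d_cue)) / len(d_cue)),
          "mean |delta_outcome| = %.2f" % (sum(map(abs, d_out)) / len(d_out)))
```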

In light of these considerations, we propose that any learning system driven by the ‘engine’ of predictive reinforcement must actively seek out its ‘fuel’ of advance information. In this view, current models of neural reinforcement learning present a curious paradox: their learning algorithms are vitally dependent on advance information, but they treat information as valueless and make no effort to obtain it. These models do include a form of knowledge-seeking by exploring unfamiliar actions, but they make no effort to obtain informative cues that would maximize learning from these new experiences. In fact, models using the popular TD(λ) algorithm (Sutton and Barto, 1998) are actually averse to advance information (Figure S5). Our data show that a new class of models is needed, models that assign information a positive value – perhaps representing the future reward the animal expects to receive, as a result of obtaining better ‘fuel’ for its learning algorithm. This would be akin to the concept of intrinsically motivated reinforcement learning (Barto et al., 2004), in that dopamine neurons would assign an intrinsic value to information because it could help the animal learn to better predict and control its environment (Barto et al., 2004; Redgrave and Gurney, 2006). Also, although dopamine neurons have been best studied in the realm of rewards, they can also respond to salient non-rewarding stimuli (Horvitz, 2000; Redgrave and Gurney, 2006; Joshua et al., 2008; Matsumoto and Hikosaka, 2009). This suggests that dopamine neurons might be able to signal the value of information about neutral and punishing events (Herry et al., 2007; Badia et al., 1979; Fanselow, 1979; Tsuda et al., 1989), as part of a more general role in motivating animals to learn about the world around them.
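As one hypothetical illustration of such a model, the sketch below adds a fixed intrinsic bonus to the reinforcement signal whenever an action produces an informative cue; the bonus size, softmax temperature, and update rule are illustrative assumptions, not a model from the paper. With equal expected water reward for both targets, the bonus alone drives a strong preference for the informative target, qualitatively matching Figure 1B:

```python
# Hypothetical "information bonus" agent in the spirit of intrinsically
# motivated RL (Barto et al., 2004). All parameter values are illustrative.
import math
import random

ALPHA, BETA, BONUS = 0.1, 10.0, 0.3
BIG, SMALL = 1.0, 0.0
Q = {"info": 0.0, "rand": 0.0}  # learned value of choosing each target

def choose():
    # Softmax over the two action values
    weights = [math.exp(BETA * Q[a]) for a in ("info", "rand")]
    return random.choices(("info", "rand"), weights=weights)[0]

random.seed(1)
picks = []
for t in range(5000):
    a = choose()
    reward = random.choice([BIG, SMALL])        # water is independent of choice
    intrinsic = BONUS if a == "info" else 0.0   # bonus for an informative cue
    Q[a] += ALPHA * (reward + intrinsic - Q[a])
    picks.append(a)

print("final Q:", Q)  # Q["info"] ~ 0.8, Q["rand"] ~ 0.5
print("late choice rate for information:",
      picks[-1000:].count("info") / 1000)  # strong preference, cf. Figure 1B
```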



Experimental Procedures

Subjects were two male rhesus macaque monkeys (Macaca mulatta), monkey V (9.3 kg) and monkey Z (8.7 kg). All procedures for animal care and experimentation were approved by the Institutional Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals. A plastic head holder, scleral search coils, and plastic recording chambers were implanted under general anesthesia and sterile surgical conditions.

Behavioral Tasks

Behavioral tasks were under the control of the REX program (Hays et al., 1982) adapted for the QNX operating system. Monkeys sat in a primate chair, facing a frontoparallel screen 31 cm from the monkey’s eyes in a sound-attenuated and electrically shielded room. Eye movements were monitored using a scleral search coil system with 1 ms resolution. Stimuli generated by an active matrix liquid crystal display projector (PJ550, ViewSonic) were rear-projected on the screen.

In the information choice task (Figure 1), each trial began with the appearance of a central spot of light (1° diameter), which the monkey was required to fixate. After 800 ms, the spot disappeared and two colored targets appeared on the left and right sides of the screen (2.5° diameter, 10–15° eccentricity). (On forced-information and forced-random trials, only a single target appeared.) The monkey had 710 ms to saccade to and fixate the chosen target, after which the non-chosen target immediately disappeared. At the end of the 710 ms response window, a cue of the chosen color (14° diameter) was presented. For the informative color, the cue was a cross on large-reward trials or a wave pattern on small-reward trials. For the random color, the cue’s shape was chosen pseudorandomly on each trial (see below). The colors were green and orange, chosen to have similar luminance and counterbalanced across monkeys. Monkeys were not required to fixate the cue. After 2250 ms of display time, the cue disappeared and simultaneously a 200 ms tone sounded and reward delivery began. The inter-trial interval was 3850–4850 ms beginning from the disappearance of the cue. Water rewards were delivered using a gravity-based system (Crist Instruments). Reward delivery lasted 50 ms on small-reward trials (0.04 ml) and 700 ms (0.88 ml, monkey V) or 825 ms (1.05 ml, monkey Z) on large-reward trials. To minimize the effects of physical preparation, licking the water spout was not required to obtain rewards; water was delivered directly into the mouth.
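For reference, the timing and stimulus parameters above can be collected into a single configuration object. The dictionary below simply restates the numbers from this paragraph; its structure and key names are our own sketch, not part of the task code:

```python
# Information choice task parameters, restated from the text above.
INFO_CHOICE_TASK = {
    "fix_spot_diameter_deg": 1,
    "fixation_ms": 800,
    "target_deg": {"diameter": 2.5, "eccentricity": (10, 15)},
    "response_window_ms": 710,
    "cue_diameter_deg": 14,
    "cue_duration_ms": 2250,
    "reward_tone_ms": 200,
    "inter_trial_interval_ms": (3850, 4850),  # from cue disappearance
    "reward": {
        "small": {"valve_ms": 50, "volume_ml": 0.04},
        "large_monkey_V": {"valve_ms": 700, "volume_ml": 0.88},
        "large_monkey_Z": {"valve_ms": 825, "volume_ml": 1.05},
    },
}
```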

The task proceeded in blocks of 24 trials, each block containing a randomized sequence of all 3×2×2×2 combinations of choice type (forced-information, forced-random, choice), reward size (large, small), random cue shape (cross, waves), and informative target location (left, right). Because the cue shapes were counterbalanced within each block rather than drawn independently on every trial, the “random” cues were actually quasi-random and could theoretically yield a small amount of information about reward size, but extracting that information would require a very difficult feat of working memory.
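As a concrete illustration, a sketch of how one such block could be generated (the function and label names are our own):

```python
# One 24-trial block: every combination of the four factors appears exactly
# once, in shuffled order.
import itertools
import random

def make_block(rng=random):
    trials = list(itertools.product(
        ("forced_info", "forced_random", "choice"),  # choice type
        ("large", "small"),                          # reward size
        ("cross", "waves"),                          # random-cue shape
        ("left", "right"),                           # informative target side
    ))
    assert len(trials) == 24  # 3 x 2 x 2 x 2
    rng.shuffle(trials)
    return trials

block = make_block()
# Counterbalancing within the block is why the "random" cues are only
# quasi-random: late in a block, the remaining shape/reward pairings are
# partly predictable in principle.
```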

If monkeys made an error (broke fixation on the central spot, or failed to choose a target, or broke fixation on the chosen target before the cue appeared), then the trial terminated, an error tone sounded, an additional 3 seconds were added to the inter-trial interval, and the trial was repeated (‘correction trial’). If the error occurred after the choice, only the chosen target was available on the correction trial.

The information delay task (Figure 2) was identical to the information choice task except that the cue colors and shapes were different, and a third set of always-informative gray cues lasting 1500 ms was appended to the cue period. (There were also minor differences in the task parameters for monkey Z: the duration of the first cue was 2000 ms, and the big reward volume was ~1.29 ml.) The 1500 ms duration of the always-informative cue was chosen to allow near-optimal physical preparation for rewards. With a shorter cue duration (e.g. < 750 ms), there might not be enough time to discriminate the cue and make a physical response (e.g. compare to the latency of anticipatory licking in (Tobler et al., 2003)). With a longer cue duration (e.g. > 2 seconds), physical preparation for reward delivery begins to be impaired by timing errors (e.g. compare to the timecourse of anticipatory licking in (Fiorillo et al., 2008; Kobayashi and Schultz, 2008)). To perform a reversal (vertical lines in Figure 2B), the colors of the informative and random cues were switched.

Neural Recording

Midbrain dopamine neurons were recorded using techniques described previously (Matsumoto and Hikosaka, 2007). A recording chamber was placed over fronto-parietal cortex, tilted laterally by 35 degrees, and aimed at the substantia nigra. The recording sites were determined using a grid system, which allowed recordings at 1 mm spacing. Single-neuron recording was performed using tungsten electrodes (Frederick Haer) that were inserted through a stainless steel guide tube and advanced by an oil-driven micro-manipulator (MO-97A, Narishige). Single neurons were isolated on-line using custom voltage-time window discrimination software (the MEX program (Hays et al., 1982) adapted for the QNX operating system).

Neurons were recorded in and around the substantia nigra pars compacta and ventral tegmental area. We targeted this region based on anatomical atlases and magnetic resonance imaging (4.7T, Bruker). During recording sessions, we identified this region based on recording depth and using landmarks including the somatosensory and motor thalamus, subthalamic nucleus, substantia nigra pars reticulata, red nucleus, and oculomotor nerve. Presumed dopamine neurons were identified by their irregular tonic firing at 0.5–10 Hz and broad spike waveforms. We focused our recordings on presumed dopamine neurons that responded to the task and appeared to carry positive reward signals. Occasional dopamine-like neurons that upon examination showed no differential response to the cues and no differential response to the reward outcomes were not recorded further. We then analyzed all neurons that were recorded for at least 60 trials and that had positive reward discrimination for both informative cues and random outcomes, or positive reward discrimination for cues and no discrimination for outcomes, or positive reward discrimination for outcomes and no discrimination for cues (P < 0.05, Wilcoxon rank-sum test). We were able to examine the response properties of 108 neurons, 84 of which met our criteria for presumed dopaminergic firing rate, pattern, and spike waveform, and 47 of which also met our criteria for trial count and significant reward signals. This yielded 20 neurons from monkey V (right hemisphere) and 27 neurons from monkey Z (left hemisphere) for our analysis.
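The inclusion logic described above can be summarized in a short sketch (a hypothetical re-implementation; the helper and variable names are ours, and we assume trial-by-trial firing rates for each condition are available as arrays):

```python
# Neuron-inclusion sketch: a neuron is kept if it was recorded for >= 60
# trials and shows positive reward discrimination for informative cues and
# random outcomes, or positive discrimination for one with no significant
# discrimination for the other (P < 0.05, Wilcoxon rank-sum test).
import numpy as np
from scipy.stats import ranksums

def reward_sign(big_rates, small_rates, alpha=0.05):
    """+1 positive, -1 negative, 0 no significant discrimination."""
    stat, p = ranksums(big_rates, small_rates)
    if p >= alpha:
        return 0
    return 1 if np.mean(big_rates) > np.mean(small_rates) else -1

def meets_criteria(cue_big, cue_small, out_big, out_small, n_trials):
    if n_trials < 60:
        return False
    cue_sign = reward_sign(cue_big, cue_small)  # informative-cue period
    out_sign = reward_sign(out_big, out_small)  # random-outcome period
    return ((cue_sign == 1 and out_sign == 1) or
            (cue_sign == 1 and out_sign == 0) or
            (out_sign == 1 and cue_sign == 0))
```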

Data Analysis

All statistical tests were two-tailed. The neural analysis excluded error trials and correction trials. We analyzed neural activity in time windows 150–500 ms after target onset (targets), 150–300 ms after cue onset (cues), and 200–450 ms after cue offset (rewards). These were chosen to include the major components of the average neural response. The neural discrimination between a pair of task conditions was defined as the area under the receiver operating characteristic (ROC), which can be interpreted as the probability that a randomly chosen single-trial firing rate from the first condition was greater than a randomly chosen single-trial firing rate from the second condition (Green and Swets, 1966). We observed the same results using other measures of neural discrimination such as the signal-to-baseline ratio and signal-to-noise ratio. Confidence intervals and significance of the population averages of single-neuron ROC areas (Figure 5A–C) were computed using a bootstrap test with 200,000 resamples (Efron and Tibshirani, 1993). Consistent with previous studies of reward coding (Schultz and Romo, 1990; Kawagoe et al., 2004; Roesch et al., 2007; Matsumoto and Hikosaka, 2007) we observed similar neural coding of behavioral preferences for both of the target locations on the screen (average ROC area, forced information vs. forced random: ipsilateral = 0.60, P < 10^−4, contralateral = 0.62, P < 10^−4; choice information vs. forced random: ipsilateral = 0.58, P < 10^−4, contralateral = 0.62, P < 10^−4), so for all analyses the data were combined. We could not analyze activity on choice-random trials due to their rarity (< 3 trials for most neurons). All correlations were computed using Spearman’s rho (rank correlation). To compare neural discrimination measured using either forced-information or choice-information trials in independent data sets, we calculated the correlation between two values, the discrimination between forced-information trials vs. even-numbered forced-random trials, and the discrimination between choice-information trials vs. odd-numbered forced-random trials (rho = 0.68, P < 10^−4). Significance of correlations, and of the difference in mean ROC area between the two monkeys (Figure 5), were computed using permutation tests (200,000 permutations) (Efron and Tibshirani, 1993).
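For concreteness, the core quantities of this analysis can be sketched as follows, assuming per-trial firing rates and per-neuron ROC areas are stored in arrays. This is our own illustration of the stated definitions, not the authors' code; the default resample counts match the 200,000 used here:

```python
# Sketch of the discrimination analysis: ROC area via the Mann-Whitney
# statistic, a bootstrap CI for a population mean of ROC areas, and a
# permutation test for a difference in mean ROC area between two groups.
import numpy as np
from scipy.stats import mannwhitneyu

def roc_area(rates_a, rates_b):
    """P(random trial from condition A has a higher rate than one from B)."""
    u, _ = mannwhitneyu(rates_a, rates_b, alternative="two-sided")
    return u / (len(rates_a) * len(rates_b))

def bootstrap_ci(roc_areas, n_resamples=200_000, ci=95, seed=0):
    """Percentile bootstrap CI for the mean of single-neuron ROC areas."""
    rng = np.random.default_rng(seed)
    vals = np.asarray(roc_areas)
    means = rng.choice(vals, size=(n_resamples, len(vals))).mean(axis=1)
    half = (100 - ci) / 2
    return tuple(np.percentile(means, [half, 100 - half]))

def permutation_pvalue(group1, group2, n_perm=200_000, seed=0):
    """Two-tailed permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group1, group2])
    observed = abs(np.mean(group1) - np.mean(group2))
    n1, count = len(group1), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(pooled[:n1].mean() - pooled[n1:].mean()) >= observed:
            count += 1
    return count / n_perm
```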

Supplementary Material



Acknowledgments

We thank M. Matsumoto, G. La Camera, B. J. Richmond, D. Sheinberg, and V. Stuphorn for valuable discussions, and G. Tansey, D. Parker, M. Lawson, B. Nagy, J.W. McClurkin, A.M. Nichols, T.W. Ruffner, L.P. Jensen, and M.K. Smith for technical assistance. This work was supported by the intramural research program at the National Eye Institute.




References

  • Ahlbrecht M, Weber M. The resolution of uncertainty: an experimental study. Journal of Institutional and Theoretical Economics. 1996;152:593–607.
  • Badia P, Harsh J, Abbott B. Choosing between predictable and unpredictable shock conditions: data and theory. Psychological Bulletin. 1979;86:1107–1131.
  • Barto AG, Singh SP, Chentanez N. Intrinsically motivated learning of hierarchical collections of skills. Proceedings of the Thirteenth Yale Workshop on Adaptive and Learning Systems; New Haven, CT, USA; 2004.
  • Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141.
  • Behrens TE, Woolrich MW, Walton ME, Rushworth MF. Learning the value of information in an uncertain world. Nat Neurosci. 2007;10:1214–1221.
  • Caplin A, Leahy J. Psychological expected utility theory and anticipatory feelings. The Quarterly Journal of Economics. 2001.
  • Chew SH, Ho JL. Hope: an empirical study of attitude toward the timing of uncertainty resolution. Journal of Risk and Uncertainty. 1994;8:267–288.
  • Daly HB. Preference for unpredictability is reversed when unpredictable nonreward is aversive: procedures, data, and theories of appetitive observing response acquisition. In: Gormezano I, Wasserman EA, editors. Learning and Memory: The Behavioral and Biological Substrates. Lawrence Erlbaum Associates; 1992. pp. 81–104.
  • Day JJ, Roitman MF, Wightman RM, Carelli RM. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci. 2007;10:1020–1028.
  • Dinsmoor JA. Observing and conditioned reinforcement. The Behavioral and Brain Sciences. 1983;6:693–728.
  • Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall/CRC; 1993.
  • Eliaz K, Schotter A. Experimental testing of intrinsic preferences for noninstrumental information. American Economic Review (Papers and Proceedings). 2007;97:166–169.
  • Fanselow MS. Naloxone attenuates rat's preference for signaled shock. Physiological Psychology. 1979;7:70–74.
  • Fantino E. Conditioned reinforcement: choice and information. In: Honig WK, Staddon JER, editors. Handbook of Operant Behavior. Englewood Cliffs, NJ: Prentice Hall; 1977.
  • Fiorillo CD, Newsome WT, Schultz W. The temporal precision of reward prediction in dopamine neurons. Nat Neurosci. 2008;11:966–973.
  • Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science. 2003;299:1898–1902.
  • Frank MJ, Seeberger LC, O'Reilly RC. By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science. 2004;306:1940–1943.
  • Green DM, Swets JA. Signal Detection Theory and Psychophysics. New York: Wiley; 1966.
  • Hayden BY, Platt ML. Temporal discounting predicts risk sensitivity in rhesus macaques. Current Biology. 2007;17:49–53.
  • Hays AV, Richmond BJ, Optican LM. A Unix-based multiple process system for real-time data acquisition and control. WESCON Conference Proceedings; 1982. pp. 1–10.
  • Herry C, Bach DR, Esposito F, Di Salle F, Perrig WJ, Scheffler K, Luthi A, Seifritz E. Processing of temporal unpredictability in human and animal amygdala. J Neurosci. 2007;27:5958–5966.
  • Holroyd CB, Coles MG. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol Rev. 2002;109:679–709.
  • Horvitz JC. Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience. 2000;96:651–656.
  • Joshua M, Adler A, Mitelman R, Vaadia E, Bergman H. Midbrain dopaminergic neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. J Neurosci. 2008;28:11673–11684.
  • Kakade S, Dayan P. Dopamine: generalization and bonuses. Neural Networks. 2002;15:549–559.
  • Kawagoe R, Takikawa Y, Hikosaka O. Reward-predicting activity of dopamine and caudate neurons: a possible mechanism of motivational control of saccadic eye movement. J Neurophysiol. 2004;91:1013–1024.
  • Kim S, Hwang J, Lee D. Prefrontal coding of temporally discounted values during intertemporal choice. Neuron. 2008;59:161–172.
  • Kobayashi S, Schultz W. Influence of reward delays on responses of dopamine neurons. J Neurosci. 2008;28:7837–7846.
  • Kreps DM, Porteus EL. Temporal resolution of uncertainty and dynamic choice theory. Econometrica. 1978;46:185–200.
  • Lauwereyns J, Takikawa Y, Kawagoe R, Kobayashi S, Koizumi M, Coe B, Sakagami M, Hikosaka O. Feature-based anticipation of cues that predict reward in monkey caudate nucleus. Neuron. 2002;33:463–473.
  • Lieberman DA, Cathro JS, Nichol K, Watson E. The role of S- in human observing behavior: bad news is sometimes better than no news. Learning and Motivation. 1997;28:20–42.
  • Lovallo D, Kahneman D. Living with uncertainty: attractiveness and resolution timing. Journal of Behavioral Decision Making. 2000;13:179–190.
  • Luhmann CC, Chun MM, Yi D-J, Lee D, Wang X-J. Neural dissociation of delay and uncertainty in intertemporal choice. Journal of Neuroscience. 2008;28:14459–14466.
  • Matsumoto M, Hikosaka O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature. 2007;447:1111–1115.
  • Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature. 2009.
  • McCoy AN, Platt ML. Risk-sensitive neurons in macaque posterior cingulate cortex. Nat Neurosci. 2005;8:1220–1227.
  • Miller SM. Monitoring and blunting: validation of a questionnaire to assess styles of information seeking under threat. Journal of Personality and Social Psychology. 1987;52:345–353.
  • Mitchell KM, Perkins NP, Perkins CC Jr. Conditions affecting acquisition of observing responses in the absence of differential reward. Journal of Comparative and Physiological Psychology. 1965;60:435–437.
  • Montague PR, Hyman SE, Cohen JD. Computational roles for dopamine in behavioural control. Nature. 2004;431:760–767.
  • Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions for future action. Nat Neurosci. 2006;9:1057–1063.
  • Nakamura K. Neural representation of information measure in the primate premotor cortex. J Neurophysiol. 2006;96:478–485.
  • Perkins CC Jr. The stimulus conditions which follow learned responses. Psychol Rev. 1955;62:341–348.
  • Platt ML, Huettel SA. Risky business: the neuroeconomics of decision making under uncertainty. Nat Neurosci. 2008;11:398–403.
  • Prokasy WF Jr. The acquisition of observing responses in the absence of differential external reinforcement. Journal of Comparative and Physiological Psychology. 1956;49:131–134.
  • Redgrave P, Gurney K. The short-latency dopamine signal: a role in discovering novel actions? Nat Rev Neurosci. 2006;7:967–975.
  • Redish AD. Addiction as a computational process gone awry. Science. 2004;306:1944–1947.
  • Roesch MR, Calu DJ, Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat Neurosci. 2007;10:1615–1624.
  • Satoh T, Nakai S, Sato T, Kimura M. Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci. 2003;23:9913–9923.
  • Schultz W. Multiple reward signals in the brain. Nat Rev Neurosci. 2000;1:199–207.
  • Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599.
  • Schultz W, Romo R. Dopamine neurons of the monkey midbrain: contingencies of responses to stimuli eliciting immediate behavioral reactions. J Neurophysiol. 1990;63:607–624.
  • Seo H, Lee D. Behavioral and neural changes after gains and losses of conditioned reinforcers. J Neurosci. 2009;29:3627–3641.
  • Shidara M, Richmond BJ. Anterior cingulate: single neurons related to degree of reward expectancy. Science. 2002;296:1709–1711.
  • Sutton RS, Barto AG. Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev. 1981;88:135–170.
  • Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; 1998.
  • Tobler PN, Dickinson A, Schultz W. Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J Neurosci. 2003;23:10402–10410.
  • Tsuda A, Ida Y, Satoh H, Tsujimaru S, Tanaka M. Stressor predictability and rat brain noradrenaline metabolism. Pharmacology, Biochemistry, and Behavior. 1989;32:569–572.
  • von Neumann J, Morgenstern O. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press; 1944.
  • Wise RA. Dopamine, learning and motivation. Nat Rev Neurosci. 2004;5:483–494.
  • Wu G. Anxiety and decision making with delayed resolution of uncertainty. Theory and Decision. 1999;46:159–198.
  • Wyckoff LB Jr. The role of observing responses in discrimination learning. Psychol Rev. 1952;59:431–442.
  • Wyckoff LB Jr. Toward a quantitative theory of secondary reinforcement. Psychol Rev. 1959;66:68–78.