Our data show that novelty enhances behavioral exploration in humans in the context of an appetitive reinforcement learning task. Participants' actual choices were best captured in a model that introduced higher initial values for novel stimuli than for prefamiliarized stimuli. This computationally defined novelty bonus was associated with activation of ventral striatum, suggesting that exploration of novelty shares properties with reward processing. Specifically, the observed overlap of novelty-related and reward-related neural components of prediction error signals supports this interpretation. The observation that activation by novelty bonuses in both striatal and midbrain areas correlated with individual novelty-seeking scores points to a functional contribution of the mesolimbic system to novelty-related enhancement of choice behavior.
All of these findings are consistent with a specific computational and neural mechanism (Kakade and Dayan, 2002), namely that a dopaminergic prediction error signal for reinforcement learning reports a novelty bonus encouraging exploration. Such a model had been originally advanced to explain dopaminergic neuron responses to novel stimuli in passive, nondecision tasks (Horvitz et al., 1997; Schultz, 1998), a response pattern that has also been suggested in humans (Bunzeck and Duzel, 2006; Wittmann et al., 2007). By linking a bonus-related neural signal to actual novelty-seeking behavior, the present study provides evidence to support a model of dopamine-driven novelty exploration. While it is not possible to identify definitively the neural source underlying fMRI signals, recent results support an inference that striatal prediction error signals have a dopaminergic basis, because they are modulated by dopaminergic drugs (Pessiglione et al., 2006; Yacubian et al., 2006). Also, given that fMRI does not allow inference of causality from correlations of brain activity with behavior, alternative explanations for our findings are possible. For instance, areas outside of the mesolimbic system could mediate the exploration effect of novelty, and the striatal activations might then reflect these choices. However, in directly contrasting exploratory to exploitative choices (as in Daw et al., 2006), we did not find novelty- or exploration-related activity in frontopolar cortex, a candidate region outside the midbrain (Daw et al., 2006).
Computational models stress the necessity to overcome exploitative tendencies in order to optimize decision making under uncertainty (Gittins and Jones, 1974). One solution to this is the introduction of an exploration bonus to guide decisions toward uncertain options (Gittins and Jones, 1974; Kaelbling, 1993). Here, we provide evidence for a specific version of such a bonus that uses novelty as a signal for uncertainty (Brafman and Tennenholtz, 2003; Kakade and Dayan, 2002; Ng et al., 1999). Notably, a bonus directed toward uncertainty per se was not evident, either neurally or behaviorally, in a previous study of gambling involving an n-armed bandit task, in which uncertainty arose due to a gradual change in the unknown payoffs but without accompanying perceptual novelty (Daw et al., 2006). The differences between the tasks may explain why, in the previous study but not the present one, exploratory choices were found to be accompanied by BOLD activity in frontopolar cortex, a region broadly associated with cognitive control. Psychologically, exploration in a familiar context, as in the earlier study, requires overriding not only a tendency to exploit known highly rewarding stimuli but also a tendency to avoid previously low-valued stimuli. However, novel options, like those used here, may not only be attractive due to a novelty bonus, but crucially have no history of negative feedback, perhaps reducing the demand for cognitive control to encourage their exploration.
Computationally, the present findings point to the likelihood that humans use perceptual novelty as a substitute for true choice uncertainty in directing exploration. This would explain why they had a greater tendency to explore perceptually novel options even when these were no more uncertain than familiar ones, and also why our previous study (Daw et al., 2006) did not detect exploration directed toward uncertainty without perceptual novelty. Such a scheme is common in artificial intelligence (Brafman and Tennenholtz, 2003; Ng et al., 1999), because it is so easily implemented by optimistic initialization. Additionally, it seems to be a plausible neural shortcut, because novelty is likely to be a reliable signal for uncertainty in the natural world. Physiologically, this appears to be implemented by recruiting the same system that processes the motivational aspects of standard reward.
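To make the idea of optimistic initialization concrete, the following is a minimal sketch of a delta-rule learner with softmax choice in which novel options simply start from a higher initial value than familiar ones. It is an illustration under stated assumptions, not the model fitted in this study: the function names, the number of options, and all parameter values (`q_novel`, `q_familiar`, `alpha`, `beta`) are hypothetical.

```python
import math
import random


def softmax_probs(values, beta):
    """Choice probabilities under a softmax rule with inverse temperature beta."""
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def update(q, reward, alpha):
    """Delta rule: move the value estimate toward the obtained reward."""
    return q + alpha * (reward - q)


def simulate(reward_probs, is_novel, q_novel, q_familiar,
             alpha=0.3, beta=5.0, n_trials=20, seed=0):
    """Simulate choices when novel options start at q_novel (assumed values)."""
    rng = random.Random(seed)
    # Optimistic initialization: the only place where novelty enters the model.
    q = [q_novel if nov else q_familiar for nov in is_novel]
    choices = []
    for _ in range(n_trials):
        probs = softmax_probs(q, beta)
        choice = rng.choices(range(len(q)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        q[choice] = update(q[choice], reward, alpha)
        choices.append(choice)
    return choices, q


if __name__ == "__main__":
    # Four options with identical payoff probability; only option 0 is novel.
    first_trial = softmax_probs([0.6, 0.4, 0.4, 0.4], beta=5.0)
    print("first-trial choice probabilities:", first_trial)
    choices, final_q = simulate([0.5] * 4, [True, False, False, False],
                                q_novel=0.6, q_familiar=0.4)
    print("choices:", choices)
```

In this sketch the novel option is favored on early trials although its true payoff is no different, and the advantage fades as feedback accumulates and the initial value is overwritten by experience.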
To be sure, on a rational analysis, the degree to which exploration is net beneficial depends on a number of circumstantial factors, including for instance how dangerous unexplored alternatives are likely to be. Computationally, this points to an important requirement that the degree of novelty seeking needs to be carefully tuned to appropriate levels (there are some proposals for the neural substrates for similar “metalearning” processes; Doya, 2002). Behaviorally, this point resonates with the fact that animals' novelty preferences exhibit a great deal of subtle contextual sensitivity (Hughes, 2007). Rats, for instance, avoid novel foods (presumably due to serious risk of illness), and fear-promoting stimuli such as electric shocks can also promote novelty avoidance on some tasks. Such phenomena are not inconsistent with our account of novelty seeking in the present (safe) context; indeed, we would infer that our approach could easily be extended to quantify the effects of factors such as fear.
Finally, while the novelty bonus may be a useful and computationally efficient heuristic in naturalistic environments, it clearly has a downside. In humans, increased novelty seeking is associated with gambling and addiction (Hiroi and Agatsuma, 2005; Spinella, 2003), disorders that are also closely linked to dopaminergic pathophysiology (Chau et al., 2004; Reuter et al., 2005). More generally, the substitution of perceptual novelty for choice uncertainty represents a distinct, albeit slight, departure from rational choice that, as in our task, introduces the danger of being sold old wine in a new skin.