A major finding of this study was that variation across participants in aberrant reward learning pertaining to cues associated with identical reward probabilities was strongly associated with differential DLPFC and MTG responses to those stimuli. We also found, consistent with previous studies (Knutson et al., 2001
), that presenting cues with a strong reward association elicited hemodynamic responses in the VTA, MD thalamus and ventral striatum, and that responses in the MD thalamus correlated with inter-individual variability in the explicit measure of adaptive reward learning. Finally, we demonstrate that simply presenting cues with a low relative to high probability of reward elicited robust hemodynamic responses in the lateral frontal pole and DLPFC, and that responses in the lateral frontal pole correlated negatively with inter-individual differences in the extent of explicit adaptive reward learning.
Presenting cues associated with a high probability of reward relative to those with a low probability of reward elicited hemodynamic responses in the midbrain (corresponding to the VTA), MD thalamus and ventral striatum (A). Furthermore, the magnitude of hemodynamic responses in the MD thalamus was strongly correlated with the degree to which participants learned to distinguish the reward probabilities of the high- and low-probability cues (B and C). Since we did not record arterial pulsation during the scan, we cannot exclude the possibility that the hemodynamic signal change in the midbrain was confounded by pulsation of brain-stem arteries. Nevertheless, the location of the peak voxel identified in the midbrain corresponded to the anatomical location of the VTA (Mai et al., 2003
), and the increased BOLD signal in the ventral striatum and MD thalamus, both innervated by dopamine, is consistent with an explanation in terms of increased input from the VTA.
The MD thalamus, which receives afferent inputs from the ventral striatum both directly and indirectly via the ventral pallidum, sends efferent projections to the ventral part of the prefrontal cortex (Alexander et al., 1986; Lawrence et al., 1998
). The ventral prefrontal cortex itself projects back to the ventral striatum, thus closing a circuit involved in the automatic processing of emotionally relevant environmental stimuli, the ‘affective’ cortico–striatal–thalamic loop (Öngür and Price, 2000
). Therefore, we suggest that the responses we identified reflect the arousing and invigorating effect of stimuli associated with reward, probably mediated by dopamine release in the ventral striatum (Berridge and Robinson, 1998
). We further speculate that the MD region of the thalamus plays a role in orienting attention toward motivationally salient stimuli, as demonstrated in previous studies (Small et al., 2005
), in a similar fashion to that demonstrated for the pulvinar nucleus of the thalamus in mediating attention toward visually salient stimuli (Robinson and Petersen, 1992; Morris et al., 1997).
Presenting cues with a low probability of reward relative to those with a high probability of reward elicited strong responses in the lateral frontal pole (A) and DLPFC. Furthermore, the magnitude of hemodynamic response in the lateral frontal pole was strongly correlated with the degree to which participants learned to distinguish the reward associations of the high-probability and low-probability cues (B). The involvement of DLPFC is consistent with previous findings implicating this region in associative learning (Corlett et al., 2004
). Responses in the more ventral lateral frontal pole area have been reported during the processing of internal representations as opposed to external stimuli, or ‘mind-wandering’ (Christoff et al., 2004; Burgess et al., 2007
). Furthermore, as we also found, such responses tend to occur under conditions where participants respond more slowly to stimuli (see Gilbert et al., 2006
for a meta-analysis). Responses in this region have also been reported during instrumental reinforcement learning when participants chose a low-probability reward stimulus over a high-probability reward stimulus (the ‘exploratory choice’ condition in Daw et al., 2006
). However, in the present study, simply presenting low-probability reward cues was sufficient to elicit responses in the lateral frontal pole.
By contrast, the presentation of cue features associated with identical reward probabilities (i.e. comparing the two levels of the task-irrelevant stimulus dimension) did not elicit consistent differential responses across participants. This is perhaps unsurprising, as on average participants exhibited very little aberrant reward learning. However, our covariate analysis revealed robust relationships between the degree of aberrant reward learning across participants and differential responses to irrelevant cue features in the DLPFC and MTG. Irrelevant cue features that were erroneously inferred to be associated with a higher probability of reward elicited smaller DLPFC responses and greater MTG responses, and these differential regional responses were expressed more strongly the higher participants scored on the explicit aberrant reward learning measure. One possibility is that the responses in these regions simply reflect the erroneous prediction of lower vs higher reward, given that the same DLPFC region was evident in the adaptive reward learning contrast (see above). However, this cannot be the complete explanation because there was a spatial dissociation between fictitious (aberrant) and veridical (adaptive) reward-related responses. Aberrant reward prediction signals were not evident in sub-cortical regions such as the striatum, thalamus and midbrain, which were apparent in the adaptive reward learning analysis, whereas the converse was true for the MTG. This suggests that the processing of fictitious and veridical value is qualitatively different and engages distinct (if overlapping) systems. We now consider the nature of this difference.
Our results speak to an ongoing discussion in the literature regarding which aspects of stimuli drive associative learning. On the one hand, Mackintosh (1975)
suggested that the associability of a stimulus is determined by how reliably
it predicts an outcome. Thus, highly predictive (i.e. low-uncertainty) cues should be most associable, while non-predictive cues come to be ignored over time; this type of mechanism would encourage adaptive reward learning on the SAT. On the other hand, Pearce and Hall (1980)
proposed that cues with uncertain
consequences are most associable, more strongly capturing attention; such a mechanism could possibly contribute to aberrant reward learning on the SAT.
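The contrast between the two accounts can be sketched computationally. In the Pearce–Hall scheme, a cue's associability tracks its recent absolute prediction error, so a non-predictive (50%) cue retains high associability while a reliable (87.5%) cue loses it. The simulation below is only a minimal illustration of that scheme, not the model fitted in this study; the function name, parameter values and trial counts are all arbitrary choices for demonstration.

```python
# Illustrative sketch of Pearce-Hall associability (not the study's model).
# alpha tracks a running average of absolute prediction error, so uncertain
# cues stay highly associable while reliable cues lose associability.
import random

def pearce_hall(p_reward, n_trials=2000, gamma=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    value, alpha = 0.0, 1.0  # predicted reward and associability
    for _ in range(n_trials):
        outcome = 1.0 if rng.random() < p_reward else 0.0
        error = outcome - value
        value += lr * alpha * error                       # associability gates learning
        alpha = gamma * abs(error) + (1 - gamma) * alpha  # Pearce-Hall update
    return value, alpha

v_pred, a_pred = pearce_hall(0.875)  # predictive, low-uncertainty cue
v_irr, a_irr = pearce_hall(0.5)      # non-predictive, high-uncertainty cue
print(round(a_pred, 2), round(a_irr, 2))  # the 50% cue retains higher associability
```

A Mackintosh-style mechanism would produce the opposite ordering, increasing associability for the cue that best predicts the outcome; it is this divergence that makes the two accounts separable on the SAT.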
One way to express uncertainty about possible outcomes is the information theoretic concept of entropy, which represents the average surprise over all possible outcomes, in our case either presence or absence of reward (e.g. see Strange et al., 2005
). On the SAT, the uncertainty about the outcome is equal for both ends of the task-relevant stimulus dimension: reward can be either present with 87.5% probability and absent with 12.5%, or vice versa, both corresponding to an entropy of 0.377 nats. Contrasting these cue features revealed robust responses in a well-characterized circuit innervated by dopaminergic projections as described above, suggesting an important role for the ‘affective’ cortico-striatal loop in associative learning from predictable cues (Mackintosh, 1975).
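The entropy values quoted here follow directly from the binary (Bernoulli) entropy formula in nats; the snippet below simply evaluates it for the two probability schedules.

```python
# Shannon entropy (in nats) of a binary reward outcome with probability p.
import math

def entropy(p):
    """Average surprise over the two outcomes: reward present or absent."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

print(round(entropy(0.875), 3))  # 0.377: task-relevant cue (87.5% / 12.5%)
print(round(entropy(0.5), 3))    # 0.693: task-irrelevant cue (50% / 50%), the maximum
```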
On the other hand, the two ends of the task-irrelevant stimulus dimension on the SAT are both associated with a highly uncertain indication of reward outcome (50% probability, corresponding to a maximal entropy of 0.693 nats). According to the Pearce and Hall (1980)
model, these cue features should be highly associable. When probing aberrant reward learning by comparing the two levels of the task-irrelevant stimulus dimension, we did not find any significant responses across the group as a whole; however, we did observe significant correlations between the degree of aberrant reward learning and responses to irrelevant cues in DLPFC and MTG. These regions have both previously been reported to be sensitive to changes in uncertainty (represented by entropy: Bischoff-Grethe et al., 2000
). Therefore, the correlations we observed across participants between responses in these regions and the degree of aberrant reward learning might reflect individual preferences in how uncertain cues are evaluated (see also Huettel et al., 2006; Chew et al., 2008).
It should be noted that this somewhat speculative interpretation also implies that the responses we observed in the adaptive reward prediction contrast should reflect associative learning under low uncertainty. Responses in some of these regions, however, were previously reported to correlate positively with the uncertainty of the reward outcome (Fiorillo et al., 2003; Preuschoff et al., 2006
). This apparent contradiction may be related to the fact that these previous studies assessed responses during a delay period immediately preceding reward delivery, while we modeled responses to the cue stimulus itself. Perhaps even more importantly, in these previous studies the probability distribution of outcomes was known to the subjects (due to overtraining or instruction, respectively), whereas in our study it was not (cf. the distinction between "risk" and "ambiguity" in the economics literature: Camerer and Weber, 1992; Huettel et al., 2006).
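The risk/ambiguity distinction can be made concrete with a toy example: under risk the outcome probability is given exactly (as for an instructed or overtrained cue), whereas under ambiguity only a belief about that probability exists and must be updated from experience. The sketch below uses a standard Beta–Bernoulli update; it is not a model from this study, and the outcome sequence is hypothetical.

```python
# Toy contrast between risk (known probability) and ambiguity (learned belief).
from fractions import Fraction

# Risk: the reward probability is known exactly, e.g. by instruction.
p_known = Fraction(7, 8)

# Ambiguity: p is unknown; a Beta(a, b) belief is updated from observed outcomes.
a, b = 1, 1                      # Beta(1, 1) = uniform prior over p
for outcome in [1, 1, 0, 1]:     # hypothetical reward outcomes on four trials
    a, b = a + outcome, b + (1 - outcome)

p_estimate = Fraction(a, a + b)  # posterior mean: 4 / 6 = 2/3
print(p_known, p_estimate)       # after four trials the belief is still diffuse
```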
In summary, we demonstrate that the extent of aberrant reward learning across individuals is strongly associated with the magnitude of differential MTG and DLPFC responses to cues erroneously inferred to differ in terms of reward association. By contrast, adaptive reward prediction responses were identified in a network of structures including regions of the thalamus, striatum and prefrontal cortex comprising the ‘affective’ cortico-striatal loop. Following this initial study in healthy volunteers, it will be important in future work to assess the neural mechanisms underpinning aberrant reward processing in patients with psychosis, since maladaptive reinforcement signaling has been posited as a central mechanism underlying psychotic symptoms (Kapur, 2003; Jensen et al., 2008; Murray et al., 2008
). In particular, it will be of interest to test whether DLPFC and MTG responses to irrelevant cues during aberrant reward learning correlate with the severity of psychotic symptoms.