Using dopaminergic signals in the ventromesial striatum (VS), organisms learn to approach stimuli that signal a high probability of reward. During instrumental learning, single-unit activity in the striatum shifts forward in time from reward delivery to the presentation of reward-predictive stimuli. Accordingly, human functional magnetic resonance imaging (fMRI) experiments show that regions of the striatum are activated by: 1) learned cues that signal eligibility to respond for uncertain reward in monetary incentive delay (MID) tasks [4–6], 2) the salience of a reward cue, 3) anticipation of maximally uncertain (50%) reward relative to more certain outcomes, and 4) reward delivery [9, 10] that is contingent on behavior.
Here, we used rapid, event-related fMRI to further characterize the extent to which human ventromesial striatal (VS) activation by reward anticipation depends on the interaction between reward delivery probabilities and instrumental response requirements. The distinction between instrumental and classical conditioning is muddled in many behavioral tasks because stimuli signaling the availability to respond for reward also inherently convey information about impending reward, as a classically conditioned (Pavlovian) cue would. VS activation has correlated directly with positive affective responses to reward availability [4, 5], suggesting a Pavlovian component to this activation. Moreover, populations of VS neurons fire in response to reward-predictive cues under Pavlovian conditions (i.e., when reward is not contingent on an instrumental response). If anticipatory VS activation in instrumental tasks is elicited primarily by positive affect itself, this activation should also be elicited by cues associated with reward delivered in a Pavlovian manner.
We designed a Factorial Reward Anticipation (FRA) task (a variant of the MID task) to examine activation by the linear contrast between anticipation of potential reward versus non-reward reported previously [4–6], but here under both Pavlovian and instrumental conditions. As a secondary objective, we wished to obtain evidence that reward-anticipatory activation previously elicited by MID tasks [4, 5] was also partially dependent on the uncertainty of reward for effort. Notably, Fiorillo et al. demonstrated that anticipatory single-unit activity in mesolimbic dopaminergic neurons is maximal when the probability of payoff for the instrumental response is maximally uncertain (50%). The FRA therefore included trials with an explicitly briefed 50% probability of payoff for a successful response, in which outcomes were not primarily a function of the subject’s successful behavior (as they were in previous MID tasks). We hypothesized that 1) anticipatory VS activation would depend on the requirement to respond, 2) VS activation would be more robust under uncertainty of reinforcement, and 3) VS activation would be most evident under the dual conditions of an instrumental requirement and uncertainty of reward delivery.
Ten men (mean age 34.2 ± 6.4 years) and 10 women (33.9 ± 6.4 years), free of any physical or mental illness, gave written informed consent to participate in this experiment. All procedures were reviewed and approved by the NIAAA Institutional Review Board. The FRA task stimuli were white on a black background, projected on a screen, and viewed with a head-coil mirror. The six trial types (n = 18 each) were presented pseudo-randomly and were separated by a jittered inter-trial interval (2–8 s) with a fixation crosshair. Each trial lasted 6 s and featured an instruction cue (500 ms), a target (500 ms), and feedback (2000 ms; see Figure 1). Response trials (square cue series) required the subject to respond on a button box while the subsequent target (a white square) was presented. A square enclosing a “$,” a “?,” or a “0” indicated a 1.0, 0.5, or 0 probability, respectively, of winning $1 for hitting the target. Non-response trials (circle cue series) instructed the subject to withhold a response while the subsequent target was presented. A circle enclosing a “$,” a “?,” or a “0” indicated a 1.0, 0.5, or 0 probability, respectively, of passive receipt of $1 after the following target was presented; reward presentation was not contingent on withholding a response to the target. The 500 ms target window was intended to promote attention to the task and to reduce variance in the timing of responses, and was twice the mean reaction time (RT) of the slowest subject in a previous study (235 ms; Bjork et al., 2004), promoting an ~1.0 probability of hits. During feedback, both trial and cumulative winnings were presented. Each subject was trained on the reward contingencies of the six instruction cues, performed a 4-minute practice session of the task, and was shown the cash they could win.
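The 2 (response requirement) × 3 (reward probability) trial structure above can be sketched as a schedule generator. This is an illustrative reconstruction, not the authors' stimulus code: the discrete ITI values and the seed are assumptions, since the text specifies only a jittered 2–8 s range.

```python
import random

# 2 cue shapes (square = respond, circle = withhold) x 3 reward probabilities
TRIAL_TYPES = [(shape, p) for shape in ("square", "circle") for p in (1.0, 0.5, 0.0)]

def make_schedule(n_per_type=18, seed=0):
    """Pseudo-randomly order the 6 trial types (18 each) with a jittered ITI.

    The ITI values below are a hypothetical discretization of the
    2-8 s jittered range described in the task design.
    """
    rng = random.Random(seed)
    trials = TRIAL_TYPES * n_per_type
    rng.shuffle(trials)
    return [(shape, p, rng.choice([2, 4, 6, 8])) for shape, p in trials]
```

Each run would then draw consecutive trials from this 108-trial schedule, with the ITI inserted as a fixation-crosshair period between trials.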
We used a 3 T scanner (General Electric, Milwaukee, WI) with a quadrature head coil. We collected 24 axial slices of 3.8 mm thickness with a 1 mm interslice gap; in-plane resolution was 3.75 × 3.75 mm. Functional scans were acquired with a T2*-sensitive echoplanar sequence (repetition time (TR) = 2000 msec; echo time (TE) = 40 msec; flip angle = 90°). Structural scans were acquired with a T1-weighted MP-RAGE sequence (TR = 100 msec; TE = 7 msec; flip angle = 90°) for co-registration of the functional data. Each subject’s head was immobilized by a deflatable head-restraint cushion.
We analyzed blood oxygen level-dependent (BOLD) signal time-locked to instruction cue presentation. Preprocessing and statistical analyses were conducted using Analysis of Functional Neural Images (AFNI) software as follows: (1) volumes were concatenated across the three task runs; (2) voxel time series were interpolated to correct for non-simultaneous slice acquisition within each volume; and (3) volumes were corrected for motion. Motion-correction estimates indicated that no participant’s head moved more than 1.5 mm between volumes. We then applied a 10 mm smoothing kernel, a de-spiking algorithm, and band-pass filtering that attenuated cyclical fluctuations in signal slower than 0.011 Hz or faster than 0.15 Hz.
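A minimal sketch of the band-pass step on a single voxel time series, assuming a zero-phase Butterworth filter (an assumption for illustration; AFNI's own band-pass implementation is not necessarily Butterworth):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_voxel(ts, tr=2.0, low=0.011, high=0.15, order=2):
    """Retain fluctuations between 0.011 and 0.15 Hz in one voxel time series.

    A zero-phase Butterworth filter stands in for the band-pass step;
    with TR = 2 s the Nyquist frequency is 0.25 Hz.
    """
    nyquist = 0.5 / tr
    b, a = butter(order, [low / nyquist, high / nyquist], btype="band")
    return filtfilt(b, a, ts)  # filtfilt: forward-backward pass, no phase shift
```

Applied voxel-wise, this removes slow scanner drift below 0.011 Hz while preserving task-frequency signal, e.g. a 0.05 Hz component survives while a 0.003 Hz drift is strongly attenuated.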
The regression model consisted of orthogonal regressors corresponding to presentation of the six instruction cues (anticipation), outcome notifications, residual motion following volume correction, and baseline and linear trends for each run. Regressors of interest were convolved with a gamma variate function that modeled a prototypical hemodynamic response function. Statistical maps were generated by the following linear contrasts (LC: area-under-curve activation), which were planned a priori to replicate previously reported contrasts and to extend them to Pavlovian conditions: 1) anticipation of responding for certain reward (p = 1.0) versus nonreward (p = 0), 2) anticipation of responding for uncertain reward (p = 0.5) versus nonreward, 3) anticipation of passive receipt of certain reward versus certain nonreward, and 4) anticipation of passive receipt of uncertain reward versus nonreward.
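The construction of an anticipation regressor can be sketched as a stick function at the cue onsets convolved with a gamma variate kernel. The shape parameters below (p = 8.6, q = 0.547) follow AFNI's commonly used default gamma variate; whether this exact kernel was used here is an assumption.

```python
import numpy as np

def gamma_hrf(tr=2.0, duration=16.0, p=8.6, q=0.547):
    """Gamma variate h(t) = (t/(p*q))**p * exp(p - t/q), peaking near t = p*q ~ 4.7 s."""
    t = np.arange(0.0, duration, tr)
    return (t / (p * q)) ** p * np.exp(p - t / q)

def cue_regressor(onsets_s, n_vols, tr=2.0):
    """Convolve unit impulses at the cue onsets (in seconds) with the HRF kernel."""
    stick = np.zeros(n_vols)
    stick[(np.asarray(onsets_s) / tr).astype(int)] = 1.0
    return np.convolve(stick, gamma_hrf(tr))[:n_vols]
```

One such regressor per instruction-cue type (plus outcome, motion, baseline, and trend terms) would then enter the multiple regression as described above.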
We used a higher-order LC to isolate activation by the combination of an incentive and the requirement to make an instrumental response. For each of the p = 0.5 and p = 1.0 trial types, this was accomplished with a higher-order LC that could be conceptualized in two ways: either 1) activation during anticipation of responding for reward versus anticipation of passively obtained reward, while masking out activation by the contrast (responding for nonreward versus passive receipt of nonreward), or 2) activation during anticipation of responding for reward versus anticipation of responding for nonreward, while masking out the contrast (passive receipt of reward versus passive nonreward). This comparison thus allowed us to identify the voxels most activated by the combination of anticipated reward and the need to generate a motor action to obtain it.
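Setting the masking step aside, both readings reduce to the same 2 × 2 interaction of incentive (reward vs. nonreward) by response requirement (respond vs. passive), as a toy check with hypothetical per-condition estimates shows:

```python
# Hypothetical activation estimates for one voxel (arbitrary units, for illustration).
betas = {"resp_rew": 1.2, "resp_non": 0.3, "pass_rew": 0.4, "pass_non": 0.35}

# Reading 1: (respond-for-reward vs. passive reward), minus the same contrast for nonreward.
read1 = (betas["resp_rew"] - betas["pass_rew"]) - (betas["resp_non"] - betas["pass_non"])

# Reading 2: (respond-for-reward vs. respond-for-nonreward), minus the same contrast when passive.
read2 = (betas["resp_rew"] - betas["resp_non"]) - (betas["pass_rew"] - betas["pass_non"])

# Both orderings yield the identical interaction term.
assert abs(read1 - read2) < 1e-9
```

The masks in the actual analysis additionally constrain which voxels survive, but the contrast weights themselves are those of this interaction.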
Individual subject maps of linear-contrast t-statistics were transformed into Z scores, warped to common Talairach space, and combined into a group map using a meta-analytic formula (average Z × √n) [4, 5]. For each contrast, activations were objectively detected using the AFNI programs AlphaSim, 3dmerge, and 3dExtrema, where: 1) each voxel exceeded a statistical significance threshold of p < .0001, and 2) activated voxels were part of a contiguous cluster of sufficient size to obtain a family-wise corrected type I error rate ≤ 0.05 by Monte Carlo simulation.
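The meta-analytic combination is a Stouffer-style pooling: averaging the subject Z scores and scaling by √n is algebraically identical to summing them and dividing by √n, so under the null the group statistic remains standard normal. A minimal sketch:

```python
import numpy as np

def group_z(subject_z):
    """Combine per-subject Z maps voxel-wise: mean(Z) * sqrt(n).

    Equivalent to Stouffer's sum(Z) / sqrt(n); rows are subjects,
    remaining axes are voxels.
    """
    z = np.asarray(subject_z, dtype=float)
    return z.mean(axis=0) * np.sqrt(z.shape[0])
```

For example, 20 subjects each contributing Z = 0.5 at a voxel combine to 0.5 × √20 ≈ 2.24, which is then compared against the voxel-wise threshold before cluster-size correction.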
Subjects hit a large majority of targets, such that intended reinforcement probabilities in response trials were not appreciably degraded by failures to hit the targets. Omission error rates (in response trials) were 3.3% in p = 1.0 trials, 3.3% in p = 0.5 trials, and 5.8% in p = 0 trials. Compound repeated-measures ANOVA revealed a significant increase in omission errors in the final run of the task (main effect of time (Runs 1–3), F(2,38) = 5.216, p < .01) and a trend toward greater omission errors in p = 0 trials (main effect of probability, F(2,38) = 2.940, p = .065). When subjects responded to a target, hit rates were 97.2%, 98.1%, and 96.4% in p = 1.0, p = 0.5, and p = 0 trials, respectively. There was a main effect of incentive probability on reaction time (RT) (F(2,38) = 6.325, p < .001). Simple-effect paired t-tests indicated that responses to both p = 1.0 targets (mean 267.2 ± 51.3 ms) and p = 0.5 targets (mean 267.8 ± 40.4 ms) were significantly faster than responses to p = 0 targets (mean 294.2 ± 56.3 ms; p < .01). There were no main or interactive effects of time (Runs 1–3) on RT. The incidence of commission errors (in non-response trials) was 1.4%.
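The simple-effects comparison on RT can be sketched with a paired t-test. The per-subject values below are synthetic, generated only to mirror the direction and rough size of the reported group means (~27 ms slowing in p = 0 trials); they are not the study's data.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n = 20  # 20 subjects, as in the study

# Synthetic per-subject mean RTs (ms), for illustration only.
rt_incentive = rng.normal(267, 45, n)          # p = 1.0 or p = 0.5 targets
rt_p0 = rt_incentive + rng.normal(27, 10, n)   # p = 0 targets, ~27 ms slower

# Paired (within-subject) comparison of the two conditions.
t_stat, p_val = ttest_rel(rt_incentive, rt_p0)
```

With a within-subject slowing this consistent, the paired test is reliably significant, matching the direction of the reported simple effects.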
Neither the LC between anticipation of passive receipt of uncertain (p = 0.5) reward versus nonreward, nor that between passive receipt of certain (p = 1.0) reward versus nonreward, activated any cortical or subcortical voxels. Anticipation of responding for uncertain (p = 0.5) reward (versus nonreward) activated multiple regions of motor cortex, and activated the putamen bilaterally, with activated voxels extending ventromesially into the nucleus accumbens (NAcc) (see Table 1 and Figure 2A). The LC between anticipation of responding for certain (p = 1.0) reward versus responding for nonreward activated left dorsal thalamus, left parietal cortex, left putamen, and cerebellar vermis, as well as insula and superior postcentral gyri bilaterally (see Table 1 and Figure 2B); this VS activation was centered in the putamen, with activated voxels extending ventromesially into the NAcc. A post hoc LC between certain (p = 1.0) versus uncertain (p = 0.5) reward activated mesial frontal cortex under both response and non-response conditions (Table 1).
For each of the p = 0.5 and p = 1.0 trial types, a higher-order LC revealed regions selectively activated by the interaction of the presence of potential gain (versus nongain) with the requirement to emit an instrumental response. The higher-order LC with p = 0.5 trials activated left precentral gyrus, as well as putamen and lentiform nuclei bilaterally (Table 2, Figure 3A); the putamen activation extended into the VS. The higher-order LC with p = 1.0 trials activated bilateral insula, cingulate motor area, left pre- and post-central gyrus, posterior putamen, and right cerebellum, but with no recruitment of rostral putamen or VS voxels (Table 2, Figure 3B).
We also examined activation time-locked to trial outcome notification. Across p = 0.5 trials (response and nonresponse conditions collapsed for statistical power), the LC between BOLD signal change time-locked to notifications of wins “+$1.00” versus nonwins “+$0.00” activated anterior and posterior cingulate cortices (Table 3). Certain gains contrasted with certain non-gains activated VS and mesofrontal cortex in non-response conditions, and activated several points of motor circuitry in response conditions.
These respective patterns of activation demonstrated general support for our hypotheses that: 1) anticipatory striatal activation in a MID task is dependent on the requirement to respond and not just on imminent, potential reward delivery itself, 2) VS activation in a MID task is enhanced by the uncertainty of reinforcement, and 3) VS activation by prospective reward is sensitive to an instrumental contingency together with uncertainty of reinforcement for successful response.
The absence of significant reward-anticipation activation in non-response trials is in apparent conflict with single-unit studies demonstrating that subpopulations of VS neurons fire in response to reward-predictive cues under Pavlovian conditions. However, we note that even under conditions where reward delivery is not contingent on an instrumental response, in these and other reports a forthcoming motor response is nevertheless signaled by the reward-predictive cue, in that the organism must prepare to lick and swallow a liquid reward. In contrast, non-response rewards in the FRA engendered no motor response. Alternatively, the lack of Pavlovian VS activation here may have resulted from the use of an abstract monetary reward, or because BOLD signal results predominantly from local field potentials (e.g., maintenance of gradients), not action potentials.
Interestingly, instrumental incentive-elicited activation in other nodes of the basal ganglia–thalamocortical circuit was more extensive in p = 1.0 trials than in p = 0.5 trials, indicating instead a direct relationship between activation and the expected value (EV; the product of reward magnitude and probability) of the instrumental response in those motor effector regions. In addition, the p = 1.0 versus p = 0.5 LC also activated mesofrontal cortex under both Pavlovian and instrumental conditions, in accord with mesofrontal activation by increasing payoff probability reported in a recent fMRI study of EV. Finally, this event-related experiment also replicated a previous block-design finding: that activation of putamen and other striatal regions by environmental cues for potential rewards is critically dependent on the requirement for an instrumental response.
Activation of multiple points in the motor circuit by learned incentives has been demonstrated in other reports. For example, Haruno and Kawato used a choice learning task to elicit incentive-dependent activation in bilateral superior parietal, dorsolateral prefrontal, dorsal premotor, and occipital cortices, thalamus, supplementary motor area, and right superior temporal sulcus; these activations specifically correlated with the degree to which a stimulus–action–reward association was learned. Similarly, Lau et al. reported that presentation of a visual cue to respond on one of two buttons to win an unspecified amount (versus nongain) activated the putamen and the cingulate motor area, with additional activation in left precentral sulcus and left postcentral gyrus when the proper choice of button was signaled to the subject in advance. Finally, the posterior mesial frontal cortex activated by instrumental reward anticipation in the FRA task included regions shown to activate when subjects prepare an intended motor response.
Dreher et al. recently used a slot-machine task to assess activation by reward anticipation and feedback, and reported bilateral putamen activation during outcome anticipation when maximally uncertain (50%) reward outcomes were contrasted with more certain (25%) reward outcomes. Our finding that more extensive VS voxels were activated by the prospect of responding for an uncertain reward versus nonreward (compared with certain reward versus nonreward, which activated only posterior regions of putamen) shares this directionality. It also suggests that uncertainty-specific activation need not require a learning context, in that subjects in both the present study and the Dreher et al. study had been explicitly briefed on the probability of rewarding outcomes.
Notification of monetary outcomes activated several mesial cortical regions in uncertain-outcome (p = 0.5) trials, and also activated mesofrontal cortex in the p = 1.0 versus p = 0 contrast in non-response conditions. We caution, however, that because outcomes always followed the anticipatory cue by 4 s, activations ostensibly attributed to reward notifications in p = 1.0 trials may instead have resulted from a protracted hemodynamic response to the reward-anticipatory cue. Temporal jittering between events within a trial would better distinguish anticipatory from feedback activations.
In conclusion, these findings retrospectively assist the interpretation of previous activations by MID tasks, and indicate that anticipatory VS activation by reward-predictive cues in these and similar incentive tasks is at least partially dependent on uncertainty of reward delivery, the requirement to mobilize an instrumental response, and on the interaction of these two factors.