In both the wild and the laboratory, animals' preferences for one course of action over another reflect not just reward expectations but also the cost in terms of effort that must be invested in pursuing the course of action. The ventral striatum and dorsal anterior cingulate cortex (ACCd) are implicated in the making of cost-benefit decisions in the rat but there is little information about how effort costs are processed and influence calculations of expected net value in other mammals including the human. We carried out a functional magnetic resonance imaging (fMRI) study to determine whether and where activity in the human brain was available to guide effort-based cost-benefit valuation. Subjects were scanned while they performed a series of effortful actions to obtain secondary reinforcers. At the beginning of each trial, subjects were presented with one of eight different visual cues which they had learned indicated how much effort the course of action would entail and how much reward could be expected at its completion. Cue-locked activity in the ventral striatum and midbrain reflected the net value of the course of action, signaling the expected amount of reward discounted by the amount of effort to be invested. Activity in ACCd also reflected the interaction of both expected reward and effort costs. Posterior orbitofrontal and insular activity, however, only reflected the expected reward magnitude. The ventral striatum and anterior cingulate cortex may be the substrate of effort-based cost-benefit valuation in primates as well as in rats.
Animals choose a course of action not simply on the basis of the expected reward but also on the actions' potential costs (Hull, 1943; Charnov, 1976; Stephens and Krebs, 1986). For example, birds' choices reflect not just the reward rate associated with courses of action but also metabolic costs of the actions themselves (Bautista et al., 2001). When deciding between two repetitive response options, both rats' and monkeys' choices reflect not just the expected reward magnitude but also the number of responses that comprise each action (Walton et al., 2006).
A seminal series of experiments emphasized the importance of dopamine, particularly in the ventral striatum, in mediating effort-related behavior in the rat (Salamone et al., 1994; Salamone et al., 2003). More recently medial frontal cortex, particularly the Cg1 and Cg2 fields of ACCd, has also been implicated in effort-based cost-benefit decision-making in the same species (Walton et al., 2002; Walton et al., 2003; Schweimer and Hauber, 2005).
Despite increasing knowledge of the neural basis of effort-based decision-making in rats and birds, little is known of how such decisions are made by primates including humans. Neuroimaging studies of human delay-based decision-making have been conducted but behavioral (Stevens et al., 2005), pharmacological (Denk et al., 2005), and lesion studies (Aoki et al., 2006a; Aoki et al., 2006b; Rudebeck et al., 2006) in other species suggest that delay- and effort-cost decisions depend on partially separable neural systems. In a direct test of this, it was demonstrated that ACCd and orbitofrontal cortex (OFC) lesions in rats produce dissociable deficits that can be interpreted respectively as impairments in reward-effort decision-making and the ability to sustain reward expectations across a delay (Rudebeck et al., 2006). The prediction for the present human fMRI experiment was, therefore, that while ACCd activity might reflect both anticipated reward and effort, activity in OFC, particularly caudal agranular OFC areas bordering the insula which most resemble rat OFC (Preuss, 1995), would reflect reward expectation in isolation.
To investigate this, we scanned human subjects while they performed a sequence of effortful actions to obtain secondary reinforcers associated with one of two amounts of money. The effortful action entailed the repeated cancelling of visual targets using a track-ball computer mouse. The number of targets to cancel, and hence effort, varied between four levels. Visual cues at the beginning of each trial indicated the levels of reward and of effort to be expected.
An influential account of ACCd implicates it in detection of response conflict (Botvinick, 2007) and so an important feature of the experiment was an absence of any opportunity for choosing between response alternatives. Instead the study focused on cue-locked activity to discern whether and where in the brain there was activity reflecting the effort costs and reward benefits of an option that could be used to guide the making of a decision had another alternative been available. The study therefore adopts an approach that proved fruitful when studying neural representations of reward magnitude and probability (Knutson et al., 2005).
Nineteen subjects participated in the experiment (10 females), all right-handed (ages 19-27). The data from three subjects were excluded from the fMRI analysis due to registration problems or scanner faults. For one of these three subjects, there were also insufficient behavioral data for analysis. All subjects gave written informed consent, and the study was carried out under permission from the Central Oxford Research Ethics Committee (COREC, 05-Q1606-8).
A schematic of the task is shown in Fig. 1a. The trial began with a fixation cross displayed for 2sec, followed by the appearance of one of the circular cost-benefit cues (70mm in diameter) in the centre of the screen which informed the subject which trial type they were experiencing. After a variable interval (2.5-4sec) the fixation cross reappeared in the centre of the cue indicating that the subject was required to make a button press response to move on to the effort investment period. The time they took to make the button press response after the appearance of this cross was designated the response time (RT). After subjects had responded, the cue and fixation cross remained on the screen for another 2 seconds, following which, after a brief delay where just the fixation cross was present (1sec), the effort period began. In the effort phase subjects were required to cancel white square targets (10×10mm in size), by moving a cursor to the target position using a track ball mouse. Targets appeared consecutively in random locations on a black screen (except that a target never appeared in a location which overlapped with that of the preceding target), each target appearing after the previous one had been cancelled. The number of targets appearing in a trial determined its associated effort level (low effort – 3 or 4 targets; high effort – 15 or 20 targets). All responses were made using a Logitech Trackman Wheel corded trackball mouse (Logitech UK Limited, Berkshire).
There were eight different types of trial defined by the combination of the four effort levels and the two reward levels (Table 1). Each up-coming trial type was signaled by the circular cost-benefit cue presented at the start (Fig. 1b). The cues were similar to those used in a previous study of reward anticipation (Knutson et al., 2005). The vertical and horizontal black lines on the cues indicated the amount of effort (left = low effort; right = high effort), and reward (top = high reward; bottom = low reward) to be expected respectively. The “effort” line was placed across the width of the circle according to a logarithmic scale to ensure that the difference between 3 and 4 clicks could be easily seen.
After all the targets had been cancelled, the fixation cross re-appeared for 2 seconds before subjects then received visual feedback about the amount of money they had won in the form of a reward “totaliser”. The totaliser showed the amount gained on the current trial (either high reward: 25p (£0.25), or low reward: 5p (£0.05)) as a red rectangle on top of the money previously gained in the block in white, and remained on the screen for 2sec. This was followed by an inter-trial interval of 2sec.
Subjects were never explicitly told of the reward and effort contingencies of the cost-benefit cues, but instead learned them during a training block outside of the scanner where they performed 12 practice trials of the task pseudo-randomly arranged so that they saw each type of stimulus at least once. We ensured that subjects assigned motivational significance to each cue during the fMRI experiment by telling them that they only had a limited time period in which to win money. In fact, subjects had unlimited time to finish the experiment.
In the fMRI experiment, subjects completed six blocks of 16 trials (pseudo-randomly arranged so that each cue type was presented twice in a block), with a brief rest period (20s) between each block. The task was controlled by Presentation software (Neurobehavioral Systems).
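The block structure described above can be illustrated with a minimal sketch. It assumes that "pseudo-randomly arranged" means a plain shuffle within each block (the original ordering constraints are not stated), and the function and variable names are hypothetical rather than taken from the Presentation scripts.

```python
import random

REWARDS = ("high", "low")    # 25p or 5p
EFFORTS = (3, 4, 15, 20)     # number of targets to cancel
CUE_TYPES = [(r, e) for r in REWARDS for e in EFFORTS]  # 8 cue types

def make_schedule(n_blocks=6, repeats_per_block=2, seed=None):
    """Return a list of blocks; each block shows every cue type twice."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = CUE_TYPES * repeats_per_block   # 16 trials per block
        rng.shuffle(block)                      # assumption: unconstrained shuffle
        schedule.append(block)
    return schedule

schedule = make_schedule(seed=0)
print(len(schedule), "blocks of", len(schedule[0]), "trials")
```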
We established whether subjects' behavior was influenced by the net value of the cost-benefit cues by examining RTs to the reappearance of the fixation cross prior to the effort period to see if they were influenced by the expected level of reward or effort. However, we also wanted an explicit measure of subjects' understanding of the reward and effort components of the cues. To obtain this, after the fMRI experiment, we performed a second behavioral experiment in which subjects were given choices between pairs of the cost-benefit cues (presented on the left and right of fixation). The task was identical to the fMRI experiment except that subjects were asked to choose between two alternative cost-benefit cues by clicking the left or right mouse button, with the object being to win as much money as possible. As in the fMRI experiment, there was no imposed time limit, although subjects were encouraged at the start to perform the task as quickly and accurately as possible. Choices were categorical such that, on each trial, the effort and reward experienced corresponded to the cue that had been selected and subjects were unable to change their choices to the other option during a trial. The patterns of choice on trials which differed only in terms of either the effort level or the reward magnitude were analyzed to see if subjects preferred low effort over high effort options and high reward over low reward options respectively.
We used a 3-Tesla Siemens MRI scanner (maximum gradient strength 40 mTm−1) with a 4-channel Nova birdcage coil to collect T2*-weighted echoplanar images (EPI) (41 × 3mm slices, TR = 3.0sec, TE = 30ms, matrix 64×64 voxels, FOV 192mm × 192mm). We used a slice angle of +15° from the horizontal plane as used by Deichmann et al. (2003) for optimizing scans of orbital and ventral frontal brain regions. A T1-weighted FLASH image was acquired for each subject (TR = 3ms, TE = 4.71ms, flip angle = 80°, voxel size 3×1×1mm).
Analysis was carried out using tools from FMRIB's Software Library (http://www.fmrib.ox.ac.uk/fsl). We discarded the first four fMRI volumes to allow for T1 equilibrium effects. We carried out Probabilistic Independent Components Analysis (ICA) (Beckmann and Smith, 2004) on the remaining images to identify and remove large motion artifacts, and an artifact related to an intermittent fault in the radiofrequency head coil which affected the data from one subject. We corrected the ICA-adjusted data for motion (Jenkinson et al., 2002). The data in each volume were spatially smoothed with an 8mm full-width half-maximum Gaussian kernel. We applied a high-pass temporal filter with a 75sec cutoff to the data to remove low-frequency noise which may arise from scanner drift. Local autocorrelation correction (Woolrich et al., 2001) was used instead of low-pass filtering. The 4D data set was normalised with grand mean scaling (Aguirre et al., 1998). Images were skull-stripped (Smith, 2002) and then co-registered using FMRIB's Linear Registration Tool, each subject's EPI images being registered with their high resolution structural image and transformed into standard (MNI) space using affine transformations (Jenkinson and Smith, 2001; Jenkinson et al., 2002).
The fMRI data analysis focused on three brain regions, the ACCd, the striatum, and the vicinity of the dopaminergic midbrain, that have all previously been implicated in effort-related cost-benefit decision-making. Data were analyzed using a univariate general linear model approach with cluster-based thresholding (clusters determined by Z=2.3 and a significance threshold of p<0.05, corrected for multiple comparisons; Worsley et al., 1992). Higher-level analysis was carried out using FMRIB's Local Analysis of Mixed Effects (Beckmann et al., 2003; Woolrich et al., 2004). The 8 types of cue were modeled separately in the analysis as explanatory variables (EVs) (see Fig. 1b) (note that all of the 8 cue EVs were orthogonal to each other). Also included in the model were two EVs for the movement of the mouse in the effort period (one modeled as a flat regressor consisting of each motor movement and the other as a linearly increasing ramp lasting the duration of the effort period on each trial), one EV for trial-by-trial response times to the reappearance of the fixation cross after the variable period of cue presentation, and one EV for the feedback period. These regressors were convolved with a hemodynamic response function (Gamma function of 6sec with standard deviation of 3sec). Temporal derivatives of the EVs were included as covariates of no interest to improve statistical sensitivity, along with each individual subject's motion parameters (Smith et al., 2004).
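To illustrate the convolution step, the following is a minimal sketch rather than the FSL implementation used in the study. It assumes that the "Gamma function of 6sec" refers to a gamma density with a mean lag of 6sec, samples the kernel coarsely at the 3sec TR, and uses hypothetical cue onsets; all function names are illustrative.

```python
import math

def gamma_hrf(t, mean_lag=6.0, sd=3.0):
    """Gamma density with the given mean and standard deviation (seconds)."""
    if t <= 0:
        return 0.0
    shape = (mean_lag / sd) ** 2    # k = 4 for 6 s / 3 s
    scale = sd ** 2 / mean_lag      # theta = 1.5 s
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (math.gamma(shape) * scale ** shape))

def convolve(stimulus, kernel):
    """Discrete convolution, truncated to the length of the stimulus."""
    out = [0.0] * len(stimulus)
    for i, s in enumerate(stimulus):
        if s == 0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out

TR = 3.0                                           # seconds per volume
kernel = [gamma_hrf(n * TR) for n in range(10)]    # HRF sampled from 0 to 27 s
stimulus = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]    # hypothetical cue onsets (volumes)
regressor = convolve(stimulus, kernel)
print([round(v, 3) for v in regressor])
```

In practice the kernel would be sampled at a much finer temporal resolution before downsampling to the TR; the coarse grid here only keeps the example short.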
The principal contrasts all concerned the anticipatory activity time-locked to the cost-benefit cues relating to appropriate combinations of the 8 EVs which correspond to the 8 types of cues. Signals relating to the cue period and the start of the effort period could be unconfounded as the interval between them was jittered to be between ~5.5-7sec (plus the RT on each trial). In order to test the prediction that ACCd, striatal and midbrain activity would be related to both reward and effort expectation, a contrast was employed in the first stage of analysis that indexed a cost-benefit value for each cue, assumed to be the net value of responding, i.e., the amount of reward anticipated on a trial (25p or 5p) divided by the amount of effort anticipated on each trial (3, 4, 15 or 20 responses) (net value contrast). Prior to inclusion as a regressor in the analysis, we log transformed the net values to take account of the fact that there is good evidence that animals may process costs and benefits on an internal logarithmic scale (Brunner et al., 1992; Bateson and Kacelnik, 1995), and subtracted the overall mean of the 8 cues' net value from the value of each individual cue. While there is precedent for using a logarithmic scale when considering costs and benefits and our response time (RT) data were generally better explained using log transformed effort levels than by non-log-transformed effort levels, it should be noted that our main results were comparable regardless of whether we used log-transformed or non-log-transformed data in our analyses (data not shown).
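The construction of the net value contrast can be sketched as follows. This is an illustration only: the use of the natural logarithm is an assumption (the base merely rescales the weights), and the names are hypothetical rather than drawn from the original analysis scripts.

```python
import math

REWARDS = [25, 5]           # pence, as stated in the text
EFFORTS = [3, 4, 15, 20]    # number of responses, as stated in the text

def net_value_weights(rewards=REWARDS, efforts=EFFORTS):
    """Return mean-centred log(reward / effort) for each of the 8 cue types."""
    raw = {(r, e): math.log(r / e) for r in rewards for e in efforts}
    mean = sum(raw.values()) / len(raw)
    return {cue: value - mean for cue, value in raw.items()}

weights = net_value_weights()
for (reward, effort), w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"reward={reward:>2}p  effort={effort:>2}  weight={w:+.3f}")
```

After mean-centring the weights sum to zero, with the high reward / lowest effort cue receiving the largest weight and the low reward / highest effort cue the smallest.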
While the main analysis centered on the net value contrast, the interpretation of the results of this contrast was aided by an additional pair of contrasts. In the first of these, signals relating to the high reward and low reward expectation cues regardless of anticipated effort were compared (reward contrast). The second additional contrast indexed the four different levels of effort anticipated on different trials regardless of reward expectation, looking principally for increasing signal with decreasing effort (and hence increasing net cost-benefit value) (effort contrast). The effort levels were also log-transformed and the overall mean of these for the 8 cues was subtracted from each score prior to their use as regressors or contrasts in the fMRI analysis.
Because the net value contrast was itself dependent on differences in either reward or effort anticipation it was necessary, in a second stage of analysis, to test whether the BOLD signal in a brain area was genuinely sensitive to both effort and reward anticipation factors rather than just to one factor. Percentage signal change for the effects of the 8 different cue types, measured against an implicit baseline representing the unexplained variance in each subject's timeseries that is not explicitly modeled as part of the general linear model (GLM) (e.g., the inter-trial intervals), was therefore calculated in regions of interest (ROIs) centered over net value contrast activation peaks in ACCd, ventral striatum, and in the vicinity of the dopaminergic midbrain (D'Ardenne et al., 2008). Despite the focus on these three areas, we also carried out additional ROI analyses in other peaks of activation identified by the reward or effort contrasts in order to check if the BOLD signal in any other area was modulated by both reward and effort or an interaction of both factors. In particular, we investigated whether there was any signal change in parts of medial and central orbitofrontal cortex which have recently been associated with measures of net value that look at how much money hungry people are willing to pay in order to receive a particular food item (Plassmann et al., 2007; Hare et al., 2008). Given that previous studies have reported distinct effects of ACCd and OFC lesions on cost-benefit decision-making in rats (Rudebeck et al., 2006) a region of BOLD signal change in the posterior OFC/insula was also investigated.
An advantage of the effort-based cost-benefit decision-making task is that the effortful course of action and the secondary reinforcing outcome were actually experienced by the subject during each trial. Single unit recording studies have demonstrated cells in several regions, including the OFC, ventral striatum, dopaminergic midbrain, and the ACCd, which respond at different points during an extended multi-step response schedule towards reward (Shidara et al., 1998; Shidara and Richmond, 2002; Ravel and Richmond, 2006). However, only ACCd units have been shown to exhibit characteristic progressive changes with proximity to the final reward (Shidara and Richmond, 2002). To investigate whether we could observe similar effects in the human brain, we also investigated whether there were any significant changes in signal as subjects progressed through the effort requirement for each of the regions showing significant cue-related BOLD activity from the above analyses. We were only able to perform this analysis on high effort trials because, when the effort level was low, the short duration of the effort period made it impossible to distinguish the effort investment period reliably from the reward delivery period.
ROIs to be subjected to further analysis were derived as spheres of radius 4mm centered on the peak voxel in each region identified in the initial whole-brain cluster analysis, transformed into the space of each individual subject's functional data. For each of these, the mean percentage change in the parameter estimate averaged across all the voxels in the ROI was calculated from an implicit baseline of 10000 (an arbitrary value representing the re-scaled overall mean for each time-series) (Smith et al., 2004). The results were then analyzed using a repeated-measures GLM.
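The ROI construction and percent-change calculation can be sketched as below. The voxel size is an assumption (2mm isotropic is used as the default purely for illustration), the parameter estimates are made up, and the function names are hypothetical.

```python
import itertools

def sphere_offsets(radius_mm=4.0, voxel_mm=2.0):
    """Voxel offsets whose centres lie within radius_mm of the peak voxel."""
    reach = int(radius_mm // voxel_mm)
    return [
        (i, j, k)
        for i, j, k in itertools.product(range(-reach, reach + 1), repeat=3)
        if voxel_mm ** 2 * (i * i + j * j + k * k) <= radius_mm ** 2
    ]

def roi_percent_change(param_estimates, baseline=10000.0):
    """Mean parameter estimate across ROI voxels as a percentage of baseline."""
    mean_pe = sum(param_estimates) / len(param_estimates)
    return 100.0 * mean_pe / baseline

print(len(sphere_offsets()))               # voxels in the sphere at 2 mm
print(roi_percent_change([50.0, 150.0]))   # hypothetical parameter estimates
```

Because the ROI is small relative to the voxel size, the number of voxels included depends strongly on the resolution of the space in which the sphere is drawn.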
We examined whether the mean response time taken to respond to the cue (Fig. 1c) depended on subjects' effort or reward expectations (including those for 2 subjects for whom it was not possible to analyze the imaging data owing to scanner or registration problems). Response times (RTs), excluding trials where subjects' responses were greater than 3 standard deviations away from their mean, were examined with a repeated-measures GLM with 2 within-subjects factors reflecting reward and log effort expectation. There was a significant linear effect of effort (F1, 17=8.58, p=0.009) and a linear interaction between reward and effort (F1, 17=6.97, p=0.017), with the main effect of reward also approaching significance (F1, 17=3.65, p=0.073). In summary, subjects were faster to respond to the reappearance of the fixation cross after cues that signaled lower levels of effort expectation and higher levels of reward expectation.
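The RT exclusion criterion can be illustrated with a short sketch. It assumes that the mean and (sample) standard deviation are computed over all of a subject's trials; the text does not say whether the criterion was applied per condition, and the RT values below are hypothetical.

```python
import statistics

def exclude_outliers(rts, n_sd=3.0):
    """Keep RTs within n_sd sample standard deviations of the subject's mean."""
    mean = statistics.mean(rts)
    sd = statistics.stdev(rts)
    return [rt for rt in rts if abs(rt - mean) <= n_sd * sd]

# Hypothetical RTs (ms) for one subject, with a single very slow response.
rts = [400, 410, 390, 405, 395] * 4 + [2000]
kept = exclude_outliers(rts)
print(len(rts), "trials before exclusion,", len(kept), "after")
```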
The significant effect of reward and effort expectations on RTs implicitly demonstrated that subjects appreciated the significance of the cue despite the lack of opportunity for making choices during the course of the fMRI experiment. After the fMRI experiment, however, subjects were given a second behavioral test in which they were given explicit choices between pairs of cue-signaled options. Subjects clearly understood the meaning of the cues as they chose the more advantageous option in terms of high reward when the effort levels were equal (95.5%) (t17=32.221, p<0.001) and low effort when the reward sizes were the same (88.9%) (t17=12.907, p<0.001) (Figure 1d).
The net value contrast, examining correlates of cue-related, effort-discounted reward value in the brain, identified several significant clusters of activation (Table 2) including those which we had predicted would be active based on studies of effort-based cost-benefit decision-making in rodents: left ACCd, striatum (including both ventral striatum and putamen) and midbrain regions (Figure 2a-c, table 2; approximately similar striatal and midbrain areas were identified in each hemisphere). Although evaluating the precise location of midbrain activations is problematic given the small size of the dopaminergic nuclei and the problems with group registration in this region (D'Ardenne et al., 2008), close inspection of the activated voxels suggests that the midbrain activations in both hemispheres likely included both the substantia nigra and ventral tegmental area (Figure 2c). If an area's activity is sensitive to reward expectations it may be identified by the net value contrast even if its BOLD signal is not altered by effort anticipation; this is a consequence of the net value contrast's partial dependence on differences in reward expectation. It was therefore necessary to test whether both effort and reward each affect the BOLD signal in each of these areas.
Further analysis of the percent signal change for the effects of the 8 cue types for each region (Figure 3a-f; the two different levels of both high and low effort have been collapsed for illustrative purposes) confirmed that both the anticipated effort and reward determined the BOLD changes bilaterally in the ventral striatum (left ventral striatum, main effect of reward: F1,15=6.011, p=0.027, linear main effect of effort: F1,15=6.116, p=0.026; right ventral striatum, main effect of reward:F1,15=7.926, p=0.013, linear main effect of effort:F1,15=4.682, p=0.047), and in the left midbrain (main effect of reward: F1,15=7.133, p=0.017, linear main effect of effort: F1,15=7.903, p=0.013) (the right midbrain showed a main effect of reward only: F1,15=8.435, p=0.011; main effect of effort: F1,15=3.070, p=0.100). In the left ACCd, there was an interaction of reward and effort (F1,15=6.618, p=0.021). However, the putamen signal only reflected the anticipated effort component (linear main effect of effort: F1,15=17.090, p=0.001; main effect of reward: F<2.8, p>0.1) (Figure 4a, c). In summary, BOLD signal changes in the ventral striatum, in the vicinity of the dopaminergic midbrain and in ACCd were modulated by both reward and effort expectation.
Using our calculation of net value which assumes no additional discounting owing to time differences, two pairs of cues (high effort / high reward pair and the low effort / low reward pair: “reference value” cue pairs) had an equivalent net cost-benefit value even though they were associated with different component reward levels and effort levels (see Table 1). If the activation in a region is caused by anticipation of the net effort-discounted reward value, it should be possible to demonstrate with post-hoc tests on the percentage signal change that the response to the cues with highest net value (low effort / high reward) is greater than that to either of the reference value cue pairs even though they signal either the same expected reward magnitude or the same anticipated effort expenditure. Similarly, the opposite should hold true (smaller signal change) when comparing the cues with lowest net value (high effort / low reward) with either of the reference value cue pairs. While none of the above regions showed this complete pattern, in the ventral striatum across both hemispheres three out of the four possible comparisons were significant (although only one after correcting for multiple comparisons) (high net value vs low reward/effort reference: F1,15=7.42, p=0.016; low net value vs low reward/effort reference: F1,15=8.99, p=0.009; low net value vs high reward/effort reference: F1,15=4.54, p=0.050) and the other approached significance (high net value vs high reward/effort reference: F1,15=3.97, p=0.065). This implies that the ventral striatum activation was being primarily driven by a net cue-related effort-discounted reward value, rather than one of the component aspects of the cue, reward expectation or effort expectation, in isolation.
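The equivalence of the reference value cue pairs follows directly from the reward/effort ratio defined earlier. As a quick check using the reward and effort levels stated in the Methods (variable names are illustrative):

```python
from fractions import Fraction

# Net value as reward (pence) divided by effort (number of responses).
net = {(r, e): Fraction(r, e) for r in (25, 5) for e in (3, 4, 15, 20)}

# Group the cues that share exactly the same net value.
groups = {}
for cue, value in net.items():
    groups.setdefault(value, []).append(cue)

for value, cues in groups.items():
    if len(cues) > 1:
        print(f"net value {value}: {cues}")
```

Exact fractions are used so that the equality of 25/15 and 5/3, and of 25/20 and 5/4, is not obscured by floating-point rounding.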
Persistent effort also contains a temporal component as well as a pure effort-based cost. To investigate whether our net value signals were related simply to the average amount of time it took subjects to perform the effortful phase of the task, and hence the expected time before the reward would be received, we looked for a negative correlation between average effort investment period duration and the BOLD signal across all of the effort levels (divided up by reward level). Even when treating each observation as an independent variable (i.e., 4 levels of effort × 16 subjects = 64 data points per region), none of the regions showing both reward and effort effects (ACCd, bilateral ventral striatum or midbrain) exhibited this pattern (all Rs > −0.13, ps > 0.3). Furthermore, we also calculated for each subject the linear coefficient of the best fit line for the BOLD signal in a particular region across different levels of effort (divided up by reward magnitude) and compared this with the linear coefficient for average duration of the effort investment period across different levels of effort (also divided up by reward magnitude). Again, a pure effect of the average time to complete the effort requirement on BOLD signal change should result in significant negative correlations between these measures. However, none of the correlations in any of the activated regions approached significance (all Rs > −0.18, ps>0.5). Taken together, this suggests that the temporal element of the effort was not clearly driving the net value-related signal change in response to the cues.
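The second, slope-based control analysis can be sketched as follows. The data values are entirely hypothetical, the use of log-transformed effort on the x-axis is an assumption, and the helper names are illustrative; the sketch only shows the shape of the computation (one slope per subject for each measure, then a correlation across subjects).

```python
import math

def ls_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

log_effort = [math.log(e) for e in (3, 4, 15, 20)]

# Hypothetical per-subject values at the four effort levels.
bold = [[0.30, 0.28, 0.15, 0.12], [0.22, 0.25, 0.10, 0.08], [0.18, 0.15, 0.12, 0.05]]
duration = [[4.1, 5.2, 17.8, 23.5], [3.8, 5.0, 18.5, 24.9], [4.5, 5.9, 19.0, 26.0]]

bold_slopes = [ls_slope(log_effort, b) for b in bold]
duration_slopes = [ls_slope(log_effort, d) for d in duration]
r = pearson_r(bold_slopes, duration_slopes)
print(f"correlation between BOLD slopes and duration slopes: {r:+.2f}")
```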
In addition to the regions of the ACCd, ventral striatum and midbrain, we extended our analyses to examine significant signal changes in other frontal and subcortical areas which were not part of our a priori hypotheses but which were identified by the net value contrast, to investigate whether these too might reflect both reward and effort expectation (i.e., the net cost-benefit value parameters as calculated here). Several premotor and motor regions, such as the region spanning the supplementary motor area (SMA) and cingulate motor area in the posterior medial frontal cortex (Figure 2b, Table 2), were identified by the net value contrast but, as for the putamen, further analysis of extracted percent signal change for the eight cue types demonstrated that the BOLD signal change was determined by effort expectation alone and was not modulated by reward expectation (Figure 4b, d, SMA/cingulate motor area: linear main effect of effort F1,15=25.633, p<0.001, main effect of reward and interaction of effort × reward F1,15<1.66, p>0.05). However, no region within the orbitofrontal cortex was activated at this threshold. We also extended our analyses to focus specifically on BOLD signal change in ROIs centered on the parts of central and medial OFC activated in two recent studies investigating the monetary price subjects were willing to pay for food items (Plassmann et al., 2007; Hare et al., 2008). Again, none of these OFC regions showed effects of either reward or effort (see supplementary Figure 1).
Finally, we considered the possibility that brain areas identified by either the reward or effort contrast might also contain information about the other factor that might not have been sufficient to reach significance when correcting for multiple comparisons or cluster size. These analyses also helped identify regions that responded uniquely to either expected reward magnitude or effort costs in the absence of any modulation by the other factor. A large activation was identified by the reward contrast in the anterior insula and posterior OFC (Figure 2d, e, table 3). Reward expectation, but not anticipated effort, however, was the sole significant determining factor in this region (Figure 5b, d, left insula/posterior OFC: main effect of reward: F1,15=6.631, p=0.021; right insula/posterior OFC: F1,15=6.484, p=0.022; main effect of effort or interaction between effort and reward: all Fs<1, ps>0.34). The effort contrast (increasing signal to smaller effort costs) identified several motor regions (see Table 4) that had also been identified by the net value contrast but as already explained further tests of the extracted time course from these areas failed to confirm the possibility of simultaneous reward and effort encoding.
Although both reward and effort modulated activity throughout much of the ventral striatum, the reward and effort contrasts highlighted that activity was sometimes more closely related to one factor than the other within certain regions of striatum (Figure 2f). As already described, the putamen signal was only significantly modulated by the anticipated effort and not by reward. By contrast, activity in a rostral ventral striatum ROI centered on the peak voxel from the reward contrast was found to be primarily driven by expected reward magnitude and not by effort anticipation (main effect of reward, left rostral ventral striatum: F1,15=5.98, p=0.027; right rostral ventral striatum: F1,15=10.766, p=0.005; main effect of effort: both Fs < 2.8, ps>0.12) (Figure 5a, c). In order to test directly whether reward and effort anticipation were differentially driving the activation in the two regions, we ran a single ANOVA with within-subjects factors of region (rostral ventral striatum vs putamen), reward, and effort. This demonstrated both a significant region × reward (F3,45=6.685, p=0.021) and a region × effort interaction (F3,45=9.268, p<0.001), indicating that increases in expected reward magnitude were preferentially activating the rostral ventral striatum over the putamen whereas decreases in anticipated effort expenditure were preferentially activating the putamen over the rostral ventral striatum.
We examined the BOLD signal as subjects engaged in the effortful course of action leading to the secondary reinforcer in regions that showed cue-related effects above. In addition to the previous cell recording findings from monkeys showing changes in firing rates in ACCd as animals progressed through a sequence of movements towards reward (Shidara and Richmond, 2002), the analysis was motivated by the finding that while there was clear effort/reward cue-locked activity in some areas, such as ACC (Figure 3d), in other areas such as the striatum and midbrain (Figures 3e-f, 4d) the cue regressors identified BOLD signal changes that did not swiftly return to the pre-cue baseline but remained tonically changed.
For the regions about which we had a priori hypotheses, while the BOLD signal remained relatively constant in the ventral striatum and midbrain region, at low and high levels respectively (Figures 3h, i), the signal in ACCd increased throughout the effort investment period as subjects engaged in the persistent sequence of actions towards the anticipated reward (Figure 3g). In order to test this effect, we examined the linear coefficients fitted to the data during the entire effort period using an ANOVA with a within-subjects factor of reward. This showed there was a significant positive slope in the ACCd signal across both the high and low reward signals (F1,15 = 6.389, p=0.013) but not in the signal in the ventral striatum, putamen or midbrain (all Fs<1.7, ps>0.2). When the above analysis was extended to the other regions investigated in the cue phase of the task (Figure 4e, f, and 5e, f), only the signal bilaterally in the insula/posterior OFC also showed a significant linear effect (left insula/posterior OFC: F1,15 = 12.205, p=0.003; right insula/posterior OFC: F1,15 = 4.98, p=0.041) (Figure 5f).
Prior to taking a course of action, BOLD activity in human ACCd, ventral striatum and a region including the dopaminergic midbrain reflects not just the expected level of reward but also the amount of effort that will be exerted to obtain reward. Activity in these regions, particularly in striatum and midbrain, varied continuously as a function of the net cost-benefit value of the intended course of action rather than only being present on trials when subjects evaluated options where reward followed small effort expenditure (cf. Kable and Glimcher, 2007). These areas are monosynaptically interconnected in other primate species (Kunishio and Haber, 1994; Williams and Goldman-Rakic, 1998; Haber et al., 2006), and diffusion-weighted imaging and tractography suggest that similar connections exist in humans (Croxson et al., 2005; Beckmann et al., 2009). The interconnected areas may constitute a human brain system for the evaluation of effort-related cost-benefit decisions about how hard it is worth working and the value of persisting with a course of action given the expected rewards.
Although our study was designed to examine responses to cost-benefit cues in the absence of an opportunity to make choices between options, it is notable that similar areas are implicated in making effort-related cost-benefit decisions in rodents (though additional regions may also be recruited when people make choices about how much effort to exert). Dopamine-depleting lesions of ventral striatum (Salamone et al., 1994; Mingote et al., 2005) and ACCd lesions (Walton et al., 2003; Schweimer and Hauber, 2005; Rudebeck et al., 2006; Floresco and Ghods-Sharifi, 2007) impair effort-related decision-making. If ACCd, striatum, and dopaminergic nuclei of the midbrain play similar roles during effort-related valuation and decision-making in rodents and primates such as humans, then it is tempting to speculate that the circuit is important for many types of mammal. The arcopallium intermedium may play an analogous role in birds (Aoki et al., 2006a).
The ACCd BOLD signal in the present study showed a phasic increase with increasing reward expectation and, for the high reward, decreased with increasing anticipated effort. However, recent single neuron data from monkey ACCd indicate that the number of cells that increase their firing rate as the number of movements before reward increases (low net value) is approximately equal to the number whose firing rate increases as response requirements decrease (high net value) (Kennerley et al., 2008). ACCd neuron activity, however, does tend to reflect the integrated value of a course of action: neurons usually show either increased activity with both greater reward expectation and decreased effort requirement, or the opposite pattern. ACCd cells also represent likelihood information, and the ACCd region activated in this study is close to that shown to be sensitive to reward probability (Knutson et al., 2005). Taken together, this suggests that ACCd's role in action valuation reflects not just the level of reward and likelihood associated with a course of action, but also the effort costs intrinsic to the action.
The anatomical connections of ACCd are unique in the primate in that there are connections with both the motor system and areas such as ventral striatum, putamen, and amygdala that process reward information (Van Hoesen et al., 1993; Morecraft et al., 2007). The pattern of connections in rodent ACC is comparable, though the specialization is arguably less pronounced, as adjacent medial frontal regions also share limbic and motoric connections (Heidbreder and Groenewegen, 2003; Gabbott et al., 2005). In macaque, ACCd has little role when decisions must be based on the expected values associated with stimuli, but it is critical when decisions are based on values associated with different actions (Kennerley et al., 2006; Rudebeck et al., 2008). It is also notable that, in addition to receiving information about action plans and reward, the same ACCd region is responsive to changes in internal energy metabolism and glucose levels (Teves et al., 2004). Two recent fMRI studies investigating how much hungry subjects are willing to pay to receive food items, however, have emphasized signals in the medial and central OFC rather than in ACCd or ventral striatum (Plassmann et al., 2007; Hare et al., 2008). One plausible interpretation is that the ACCd and ventral striatum process the net value of an action which can only be obtained after having exerted energy, whereas parts of OFC may be more concerned with how more abstract commodities, such as money, but not necessarily energy, should be spent to obtain reward.
Medial frontal cortex is important not just during reward-guided action selection but also when actions are selected under conditions of conflict (Botvinick et al., 2004). In the present study subjects had only one available option, meaning that ACCd activation cannot be attributed solely to action conflict. The current experiment was not optimized for data collection during reward delivery, but a recent study shows that ACCd and ventral striatal activity are also present at these later times, with outcome-related signals modulated by the amount of mental effort required to complete the task (Botvinick and Huffstetler, in press).
Neurons in striatum, dopaminergic midbrain and ACCd are also active as monkeys work their way through schedules of responses to obtain reward but ACCd is distinguished by the presence of neurons showing increased firing rates as animals progress through such schedules (Shidara and Richmond, 2002). Medial frontal cortex signal also correlates with goal proximity in navigation paradigms (Spiers and Maguire, 2007). In the current study, activity in human ACCd, but not in any other region showing correlations with net value, increased as subjects worked through the effort period towards reward. Such signals may reflect a continuous computation of net value which could be important to allow animals to persist with working through a sequence of actions to obtain a distant goal. ACCd lesions in monkeys have previously been shown to impair the use of outcome information for deciding when to persist (Kennerley et al., 2006).
The midbrain, striatum and interconnected regions such as ventral pallidum have a role in representing many parameters that pertain to decision-making and effort, including the amount of force that people will exert for money (Pessiglione et al., 2006). While both reward and effort modulated BOLD signal in some striatal voxels, there were also voxels in which either factor in isolation influenced activity. Such segregation/integration fits with the pattern of cortico-striatal connections, as posterior medial motor areas, which showed exclusively effort anticipation responses in the present study, are more likely to project to central and posterior putamen (Inase et al., 1996; Lehericy et al., 2004), whereas anterior insula/posterior OFC, which was only modulated by expected reward, has strong connections with rostral ventral striatum (Ferry et al., 2000; Croxson et al., 2005). The partial overlap of striatal reward and effort regions is reminiscent of a previous report of partially overlapping areas responsive to reward magnitude, probability and uncertainty (Tobler et al., 2007). In sum, while parts of the ventral striatum contain a combined cost-benefit net valuation of an option, adjacent striatal regions may also receive segregated information about its anticipated costs or benefits in isolation.
The ventral tegmental area and ventral striatum are also implicated in delay-based decisions in both rats and humans (Cardinal et al., 2001; Kable and Glimcher, 2007; Roesch et al., 2007). However, there are critical differences between delay- and effort-based decision-making. Whereas in effort-based tasks, animals have control over their response rate and therefore the average reward rate (responding faster, with its associated metabolic costs, will cause reward to occur more quickly), in delay-based tasks, reward rate is largely independent of animals' responding following a choice. Experiments with rats emphasize a pivotal role for OFC, but not ACCd, in delay-based decision-making (Cardinal et al., 2001; Kheramin et al., 2002; Winstanley et al., 2004; Rudebeck et al., 2006). Whereas ACCd integrates information about both reward benefits and effort costs of an action, rat OFC plays a central role in representing reward expectations across long delays without integrating information about effort (Schoenbaum and Roesch, 2005; Roesch et al., 2006).
Although a direct comparison of delay- and effort-based decision-making is beyond the scope of the current study, it was possible to look for brain areas where BOLD signal was modulated by reward expectation in the absence of effort sensitivity. Such activity was found in insula/posterior OFC. The agranular areas that constitute rodent OFC resemble the most posterior OFC and adjacent insula in primates (Preuss, 1995; Wise, 2008). Previous fMRI studies have also reported insula activity related to reward expectation, plus reward expectation uncertainty (Elliott et al., 2000; Tanaka et al., 2004; Knutson et al., 2005; Preuschoff et al., 2008). In birds there is also evidence that some brain areas integrate information about both reward and effort whereas others represent the parameters separately (Izawa et al., 2005; Aoki et al., 2006a; Aoki et al., 2006b). While an integrated representation of both reward and effort, as found in ACCd and striatum, may be a pre-requisite for decision-making, it is nevertheless equally important to represent other aspects of the expected reward independently of the action on which it is contingent (Daw et al., 2005).
Funded by the MRC UK and Wellcome Trust (PLC and MEW). We would like to thank Erie Boorman for helpful comments and support and Nils Kolling for help with data organization.