A fundamental question in decision neuroscience concerns the nature of the associative representations employed in the brain while subjects are making choices between different decision options. Although a number of studies have reported expected reward signals in diverse brain regions including ventromedial prefrontal cortex during decision making, the specific associations underlying such signals in the human brain have hitherto remained unclear. Here, we present evidence from a human fMRI study to indicate that ventromedial prefrontal cortex is involved in tracking expectations of future reward attributable to particular motor actions. More specifically, activity in this region was found to scale with value signals derived from a variant of a computational reinforcement learning model while subjects performed a reversal learning task involving a choice between 2 motor actions, in the absence of specific discriminative visual stimuli to denote those choices.
We also compared and contrasted activity in the action-based reversal task with that elicited during the stimulus-based reversal task. In the latter condition, decision options are denoted via presentation of specific discriminative stimuli; however, the 2 physical actions denoting the different choice options are randomly assigned (depending on random spatial position of the 2 discriminative stimuli). In common with the action-based reversal task, we also observed expected reward signals in vmPFC while subjects performed the stimulus-based task, consistent with a number of previous reports (
Daw et al. 2006;
Hampton et al. 2006;
Kim et al. 2006;
Valentin et al. 2007). Stimulus-based reversal is often assumed to depend on stimulus–outcome associations, and indeed neuronal activity in orbitofrontal cortex recorded during performance of such a task typically reveals stimulus-related neuronal activity, but not response-related signals (
Hoshi et al. 2005;
Schoenbaum et al. 1998;
Seo and Lee 2007;
Thorpe et al. 1983;
Wallis and Miller 2003). Consequently, the activity we observe in medial orbitofrontal cortex and adjacent medial PFC during the stimulus-based task may pertain to encoding of stimulus–outcome associations. According to this interpretation, expected reward signals in vmPFC could be driven by both stimulus–outcome and action–outcome associations depending on the task context. However, accumulating evidence from rodent lesion studies suggests that, at least in the rodent brain, regions of prefrontal cortex involved in mediating stimulus–outcome (or Pavlovian) learning are distinct and dissociable from those regions involved in mediating action–outcome (or goal-directed) learning (
Balleine et al. 2007), with rat orbitofrontal cortex implicated in the former (
Ostlund and Balleine 2007) and prelimbic cortex implicated in the latter (
Ostlund and Balleine 2005). If these findings can be extrapolated to the primate brain, then this would rule out the interpretation that vmPFC (part of which may be homologous to prelimbic cortex in the rodent brain) is involved in both goal-directed and Pavlovian learning.
An alternative possibility compatible with the rodent data is that activity in vmPFC during the stimulus-based reversal task is, in common with that in the action-based task, also driven by goal-directed action–outcome associations. Although in the stimulus-based task the particular physical motor response required to implement a specific decision varies on a trial-by-trial basis (depending on where the stimuli are presented), it is possible for associations to be learned between a combination of visual stimulus locations, responses, and outcomes. Thus, the common involvement of vmPFC in both action- and stimulus-based reversal could be attributable to the possibility that this region is generally involved in encoding values of chosen actions but that those action–outcome relationships are encoded in a more abstract and flexible manner than concretely mapping specific physical motor responses to outcomes. The more flexible encoding of “actions” that this framework would entail may have parallels with computational theories of goal-directed learning in which action selection is proposed to occur via a flexible forward model system, which explicitly encodes the states of the world, the transition probabilities between those states, and the outcomes obtained in those states. In this context, an “action” is any behavior by the subject causing a particular path to be implemented through those states; for example, an action could be “when in state x choose the action that leads me to state y,” irrespective of the specific physical motor act required to implement that action (
Daw et al. 2005). The fact that value-related activity in vmPFC is best captured by an extension of reinforcement learning that encodes the structure of the reversal learning problem (and hence the appropriate transition probabilities between states), as shown in
Hampton et al. (2006,
2007), and used in the present study, is also consistent with the notion that vmPFC is involved in model- or state-based inference of this sort.
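To make the idea of state-based inference concrete, the following is a minimal illustrative sketch of a two-state belief model for a reversal task. It is not the model fitted in the present study: the class name `ReversalBelief`, the reward probabilities (0.7 and 0.3), and the per-trial reversal probability (0.1) are all assumptions chosen for illustration. The sketch shows the key property of such a model, namely that an outcome observed after choosing one option also changes the expected reward of the unchosen option, because both depend on the same inferred hidden state.

```python
from dataclasses import dataclass


@dataclass
class ReversalBelief:
    """Belief that option "A" is currently the better option (hypothetical sketch)."""

    p_a_good: float = 0.5       # prior belief that A is the better option
    p_reward_good: float = 0.7  # assumed reward probability of the better option
    p_reward_bad: float = 0.3   # assumed reward probability of the worse option
    hazard: float = 0.1         # assumed per-trial probability of a reversal

    def expected_reward(self, choice: str) -> float:
        """Expected reward of a choice, averaged over the two hidden states."""
        p_good = self.p_a_good if choice == "A" else 1.0 - self.p_a_good
        return p_good * self.p_reward_good + (1.0 - p_good) * self.p_reward_bad

    def _likelihood(self, chosen_is_good: bool, rewarded: bool) -> float:
        """Probability of the observed outcome given the state of the chosen option."""
        p = self.p_reward_good if chosen_is_good else self.p_reward_bad
        return p if rewarded else 1.0 - p

    def update(self, choice: str, rewarded: bool) -> None:
        """Bayesian update on the outcome, then apply the reversal transition."""
        lik_a_good = self._likelihood(choice == "A", rewarded)
        lik_b_good = self._likelihood(choice == "B", rewarded)
        numerator = lik_a_good * self.p_a_good
        posterior = numerator / (numerator + lik_b_good * (1.0 - self.p_a_good))
        # With probability `hazard` the contingencies reverse before the next trial.
        self.p_a_good = posterior * (1.0 - self.hazard) + (1.0 - posterior) * self.hazard


if __name__ == "__main__":
    belief = ReversalBelief()
    belief.update("A", rewarded=True)
    print(belief.expected_reward("A"), belief.expected_reward("B"))
```

Under these assumed parameters, a single rewarded choice of A raises the expected reward of A and lowers that of B by the same amount, which a simple model that updates only the chosen option would not do.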
We also found an area of posterior amygdala extending into anterior hippocampus showing correlations with the value of the chosen option in both action- and stimulus-based reversal. Previous studies in both animals and humans have emphasized an important role for amygdala in encoding expected rewards (
Hampton et al. 2006;
Paton et al. 2006;
Schoenbaum et al. 1998). Moreover, interactions between this region and orbitofrontal cortex have been shown to be necessary for establishing expected reward representations in prefrontal cortex in both rodents and humans (
Hampton et al. 2007;
Schoenbaum et al. 1998, 2003). The results of the present study indicate that the amygdala may not be involved exclusively in encoding stimulus-reward value but might also be involved in encoding the value of chosen actions.
We found evidence for distinct encoding of value signals for specific physical motor actions (i.e., during the action-based task) compared with the stimulus-based task in more dorsal parts of the brain that, unlike vmPFC, are directly connected to primary motor cortex, such as the supplementary motor area (SMA) and midcingulate cortex. These areas are known to be involved in reward-based motor selection tasks in monkeys (
Shima and Tanji 1998) and more generally in response selection (
Picard and Strick 2001). These findings are compatible with the results of a recent monkey recording study in SMA, which reported an increase in neuronal spike rates in this area as the animal approached a rewarding target (
Sohn and Lee 2007). A specific role for dorsal anterior cingulate cortex in reward-based action selection has been proposed by
Rushworth et al. (2007), on the basis of a series of both monkey lesion and human fMRI studies. In one such fMRI study, activity in anterior cingulate was observed under situations where subjects actively chose what action to take in a reward-related response task compared with a situation in which subjects were instructed to take a specific action, in which case anterior cingulate was not involved (
Walton et al. 2004). Furthermore, lesions of monkey anterior cingulate cortex were found to impair action selection based on the monkey's history of past reinforcement, but not adjustments in behavior following errors (
Kennerley et al. 2006). Similarly, a recent single-unit recording study in monkeys reported neurons in dorsal anterior cingulate cortex that were modulated by the reward history in accordance with value signals derived from a reinforcement learning model (
Seo and Lee 2007). Although in the present study we found evidence for midcingulate involvement in both stimulus- and action-based reversal tasks, a part of this area was significantly more active during the action-based condition than during the stimulus-based condition. Thus, our findings are broadly consistent with the possibility that when choices need to be made between different physical motor responses, additional circuitry in supplementary motor cortex and dorsal midcingulate cortex is recruited.
While activity in midcingulate cortex was correlated with expected reward, we found activity in a more anterior region of pericingulate cortex to be correlated negatively with expected reward during the stimulus-based reversal task (see
Supplementary Fig. 2). In other words, the less rewarding a particular chosen option was (according to the model prediction), the greater the activity in this region. Previously we have shown using a multivariate classification approach that activity in a similar region of anterior cingulate cortex is highly predictive of subjects' subsequent behavioral decisions while subjects are performing stimulus-based reversal learning (
Hampton and O'Doherty 2007), with increasing activity in this area predictive of subsequent changes in behavior. Complicating matters even further, we found extensive correlations with expected reward during both action- and stimulus-based reversal in posterior cingulate cortex. These findings therefore suggest that different regions of cingulate cortex (anterior, middle, and posterior) may mediate quite distinct functions during reward-based decision making, both in terms of the nature of the signals being encoded (whether they are positively or negatively correlated with expected reward), and the type of decision task in which they are involved (action based, stimulus based, or both).
In addition to testing for regions involved in tracking the value of particular actions, we also tested for regions correlating with subjects' actual choice behavior. In reversal learning, the subject can implement one of 2 types of behavior: either maintaining their choice of the current decision option or switching their choice to the alternate option. We tested for regions showing activation after subjects receive an outcome on a given trial but before subjects make a choice on the subsequent trial, that is, activity that differs depending on whether subjects maintain their current choice (stay) or switch to the alternate option (switch). We found increased activity in anterior insula (frontoinsular cortex) extending into caudolateral OFC and bilaterally in the dorsolateral prefrontal cortex when, on subsequent trials, subjects switched their behavioral choice compared with when they maintained the current choice, replicating a number of previous findings that have implicated these areas in signaling behavioral switches during reversal (
Cools et al. 2002;
O'Doherty et al. 2003). Switch-related activity was present in these regions during both action-based and stimulus-based reversal, suggesting that these regions are involved in signaling changes in behavior irrespective of whether the behavioral change involves switching between specific motor actions or, more abstractly, switching between different decision options.
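The stay versus switch contrast described above depends on labeling each trial by the choice made on the following trial. A minimal sketch of that labeling step is given below; the function name `label_stay_switch` and the coding of choices as strings are assumptions made for illustration, not a description of the analysis code used in the study.

```python
from typing import List, Sequence


def label_stay_switch(choices: Sequence[str]) -> List[str]:
    """Label trial t as "stay" or "switch" according to the choice on trial t + 1.

    The final trial has no subsequent choice and therefore receives no label,
    so the result is one element shorter than the input.
    """
    return [
        "stay" if next_choice == current_choice else "switch"
        for current_choice, next_choice in zip(choices, choices[1:])
    ]


if __name__ == "__main__":
    print(label_stay_switch(["A", "A", "B", "B", "A"]))
```

The same labeling applies to both tasks: in the action-based task the entries would denote the physical action chosen, and in the stimulus-based task the stimulus chosen, regardless of its spatial position.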
While switch-related activity in the above regions was common to both action- and stimulus-based reversal tasks, differential switch-related activity between the 2 tasks was found in left intraparietal sulcus (IPS). This area was selectively involved in signaling a switch in subjects' behavioral choices during action-based but not stimulus-based reversal. Neurons in this area have previously been implicated in processes related to action-based decision making in nonhuman primates (
Dorris and Glimcher 2004;
Platt and Glimcher 1999;
Sugrue et al. 2004). A recent human fMRI study reported a change in activity in this region when subjects switched between exploratory and exploitative decision modes, such that activity in this region was higher when subjects decided to explore actions considered to have lower value than the best available option in order to gain more information about the rewards available on those actions (
Daw et al. 2006). Here, activity in this region appeared to be related to subjects' switching their choices, but only when those choices involved physical actions, not abstract options. Taken together, these findings support an important role for this brain region in action-based decisions.
It should be emphasized that the value signals we report in vmPFC and elsewhere correspond to the expected reward of the chosen option, whether it is the chosen physical action or the chosen discriminative stimulus. Such signals likely reflect the consequence of the decision process in the sense that the chosen option can only be encoded once the decision of what action to choose has been made. However, in order to make the decision itself, a different type of signal needs to be encoded, namely, the value of each individual option in the choice set, be it the value of specific actions or specific stimuli. On account of the anticorrelation between the action reward probabilities or stimulus reward probabilities in the reversal task used here, we cannot separately measure these prechoice action values and so cannot establish whether vmPFC also plays a role in encoding such signals. Nevertheless, the chosen value signals we do report in the action-based task likely depend on retrieval of learned action–outcome associations, suggesting that this region does process action–outcome information, even if this corresponds only to the value of the action ultimately chosen. It is notable that in addition to chosen values, vmPFC also contains signals related to the behavioral choice itself, with activity increasing in this area on trials where subjects decide to continue their current choice strategy as opposed to switching. The presence of behavioral choice signals in vmPFC alongside the value of chosen actions does appear to be consistent with an important role for this region in decision making, either by contributing directly to the decision or at the very least in reporting the consequences of the decision. Further studies will be needed to disambiguate these possibilities.
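The difficulty in separating prechoice option values can be illustrated with a short sketch. It assumes, purely for illustration, that the two options have mirrored reward probabilities (0.7 and 0.3) and that values are computed from a single belief about which option is currently better; the function names and the example belief sequence are hypothetical. Under these assumptions the two option values always sum to a constant, so regressors built from them are perfectly anticorrelated and cannot be estimated separately.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def option_values(
    beliefs: Sequence[float], p_good: float = 0.7, p_bad: float = 0.3
) -> Tuple[List[float], List[float]]:
    """Expected reward of options A and B given the belief that A is the better option."""
    value_a = [b * p_good + (1.0 - b) * p_bad for b in beliefs]
    value_b = [(1.0 - b) * p_good + b * p_bad for b in beliefs]
    return value_a, value_b


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical trial-by-trial beliefs that option A is the better option.
    beliefs = [0.5, 0.66, 0.8, 0.3, 0.1]
    value_a, value_b = option_values(beliefs)
    print(pearson(value_a, value_b))
```

A task in which the reward probabilities of the options vary independently would break this dependence and allow the individual option values to be tested separately from the chosen value.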
In conclusion, the main finding of this study is a role for ventromedial prefrontal cortex in a reward-related decision making task during which subjects are required to make a choice between different physical actions (button press vs tracker ball slide), in addition to the previously reported role for this region in decision making tasks in which decision options are denoted by different discriminative stimuli. In both cases, activity in this region correlated significantly with expected future reward derived from a computational model. The finding that vmPFC is involved in action-based reversal in which no discriminative stimulus is present to signal the different decision options suggests that vmPFC is involved in action–outcome learning by encoding the expected future reward attributable to particular physical actions. The present findings therefore demonstrate that human vmPFC is not merely involved in encoding the values assigned to particular discriminative stimuli but is also involved in encoding values assigned to particular physical motor responses. A parsimonious explanation for the present results is that vmPFC plays a general role in goal-directed learning, encoding action–outcome relationships irrespective of whether those actions correspond to specific physical motor actions or denote implementation of a decision option on a more abstract level.