Our findings implicate a network of brain regions involving the medial prefrontal cortex, medial orbitofrontal cortex and dorsal striatum (specifically anterior medial caudate nucleus) in computing the causal effectiveness of an individual’s own behavior (Balleine, 2005; Balleine and Ostlund, 2007). These findings suggest that this network of brain regions may be responsible for the adaptive control of action selection in situations where the temporal relationship between actions performed and rewards obtained varies over time. Sensitivity to the contingency between actions and reward delivery is indicative of goal-directed or action-outcome learning in rats (Balleine and Dickinson, 1998). Thus, the areas identified in the present study are also candidate regions for mediating goal-directed action selection in humans.
The results of the present study also demonstrate the utility of using a free operant paradigm to study human instrumental learning. Typically in the human and indeed nonhuman primate literature, action selection is studied in a trial-based manner, in which a single response is triggered following the onset of a cue. However, in the free operant case responding is unsignaled and self-generated, thereby allowing us to explore the means by which subjects can modulate their responses as a function of changes in reward contingencies over time, an issue not easily addressable through standard trial-based approaches. Furthermore, the degree of similarity between the free operant approach used here and that typically used in rodents makes it possible to build bridges between these two literatures and establish the degree of homology between the brain systems mediating instrumental learning in rodents and humans.
Our results suggest distinct contributions for different parts of prefrontal cortex and striatum in implementing goal-directed behavior. Whereas mOFC and dorsomedial striatum were more engaged by situations with high compared to low contingency, suggestive of a role for these regions in mediating control of behavior by the goal-directed system, the mPFC was also found to be sensitive to changes in local contingency between responding and reward delivery, suggesting that this region may play a direct role in the on-line computation of contingency. These findings raise the interesting possibility that the cortico-striatal circuitry involved in computing the causal efficacy of actions may be anatomically distinct from the circuits involved in using that knowledge to select and implement a course of action. The fact that mPFC contained representations of on-line causality, whereas dorsomedial striatum did not contain these representations but was nevertheless modulated by contingency, suggests that in this case signals in mPFC might be used to guide activity in its dorsomedial striatal target area. Similarly, an interaction has been described previously between prefrontal cortex and dorsomedial striatum in a rather different task context, albeit running in the converse direction to that proposed here (Pasupathy and Miller, 2005).
A number of previous studies have reported a role for dorsal striatum in processes related to contingency learning in humans. Delgado et al. (2005) used a trial-based approach to examine changes in neural activity over time while subjects learned instrumental associations. Activity in caudate at the time of choice was found to be present during initial learning of contingencies, but decreased over time as subjects learned the contingent relationship between responses and outcomes. Tricomi et al. (2004) reported an increase in activity in this area when subjects perceived an instrumental contingency compared to when no such contingency was perceived, even though subjects were in actuality always in a non-contingent situation. The present study demonstrates that caudate is directly modulated as a function of the degree of objective contingency: activity in this region is higher in situations where contingency is high than in situations where contingency is low.
Another important feature of our data is that we found both commonalities and differences in the brain systems exhibiting sensitivity to objective contingency and those responding to subjective causality judgments. The same region of medial prefrontal cortex was found to respond to both. However, areas such as dorsolateral prefrontal cortex and lateral orbitofrontal cortex that were found to be active in relation to subjective causality judgments did not show significant objective contingency effects, whereas dorsal striatum and medial orbitofrontal cortex, found in the objective contingency contrast, did not show up in the subjective causality contrast. The differences in the areas engaged in these two contrasts may relate to the fact that while subjective contingency is significantly correlated with objective contingency behaviorally, the correlation is by no means perfect. The differences in the results obtained may thus highlight differences between the network of brain regions responsible for evaluating subjective awareness of causality and those involved in computing objective contingencies. These findings suggest that the brain systems involved in mediating subjective awareness of contingencies may be at least partly dissociable from brain systems involved in using knowledge of those contingencies to guide behavior.
Another notable feature of our data is the overall decrease in activation in the RESPOND phase compared to the REST phase in mOFC and mPFC (but not in striatum). This effect might relate to the suggestion that vmPFC is part of a network of brain regions that increase in activation when subjects are at rest, the so-called “default” network (Gusnard et al., 2001). However, while this effect may account for the overall differences in activation between RESPOND and REST periods in these regions, it is unlikely that differences observed in these areas as a function of contingency within the RESPOND period across sessions could also be explained by this phenomenon: no significant differences were found in overall response rates or responses per reinforcer in high compared to low contingency conditions, suggesting that the degree of task-related effort exerted is equivalent across these conditions.
While neural responses in a number of brain regions, including orbitofrontal cortex as well as amygdala and ventral striatum, have previously been found to be related to expected future reward in relation to the presentation of particular cues or stimuli, these studies are likely to be probing brain systems involved in stimulus-outcome learning, in which associations between a given context and the reward presented in that context are learned, irrespective of whether an action is performed or not and, even if an action is performed, whether or not reward delivery is contingent on that action (Schoenbaum et al., 1998; Gottfried et al., 2002, 2003; Paton et al., 2006). Such stimulus-outcome processes may always be present during instrumental conditioning alongside action-outcome and stimulus-response learning components. However, the results of the present study are unlikely to be attributable to encoding of stimulus-outcome relationships; no discriminative stimuli were used to signal whether or not an outcome would be delivered at any given point in time, other than the performance of the actions themselves. Although in principle the interval between rewards could act as a form of temporal cue to reward delivery, the fact that no significant difference was found in the mean intervals between rewards in the high and low contingency conditions helps to rule out that explanation for the difference in activation observed between these two conditions.
Habitual or stimulus-response learning processes are also known to be engaged during instrumental conditioning (Dickinson and Balleine, 1993). However, when behavior is under control of the habitual system, rats become insensitive to changes in contingency between actions and outcome, such that responding persists on an action even if the outcome is no longer contingent on that action (Balleine and Dickinson, 1998). Thus, the areas identified in the present study most likely pertain to associative learning processes related to the encoding of action-outcome and not stimulus-response associations. This possibility is also supported by previous studies implicating neurons in these areas in discriminating between different action-outcome associations (Matsumoto et al., 2003; Schultz et al., 2003), in exhibiting sensitivity to reinforcer devaluation during reward-based action selection (Valentin et al., 2007), and in showing increased activity during the perception of a response-reward contingency compared to when no contingency is perceived (Tricomi et al., 2004).
To conclude, the present results highlight the brain systems involved in the adaptive control of behavior in humans. Activity in a network of brain regions including medial prefrontal cortex, medial orbitofrontal cortex and dorsomedial striatum was found to track changes in objective contingency. These findings in humans show remarkable parallels to previous results implicating medial frontal cortex and dorsomedial striatum in mediating similar functions in the rodent brain (Balleine and Dickinson, 1998; Killcross and Coutureau, 2003; Yin et al., 2005; Balleine et al., in press). Indeed, this similarity between species points to the important conclusion that the brain systems involved in controlling goal-directed action selection are heavily conserved across mammalian species.