Our findings complement the animal literature on the neural basis of habit learning and provide evidence for a habit learning system in humans. Although we all have anecdotal experience of performing outcome-inappropriate cue-driven behavior (e.g., stepping out of an elevator when the doors open, although it has stopped on the wrong floor), one might expect that humans would be able to suppress habitual tendencies more easily than other animals and would not repeatedly make outcome-inappropriate responses in a free-operant task. Our behavioral data show, however, a clear experience-dependent shift from goal-directed to habitual behavior in a free-operant task in humans. After minimal training, participants reduced their response rates during presentation of the fractal linked with the food they no longer wanted, whereas their response rates remained high during presentation of the fractal linked with the food they still found pleasant. After more extensive training, this outcome sensitivity was not present; response rates did not differ significantly during presentation of fractals whether linked with the valued or the devalued outcome.
In our experiment, we found a region in the posterior putamen extending into the globus pallidus that showed increasingly greater activation with experience to the onset of task-related stimuli relative to the onset of rest-related stimuli. In other words, this region became increasingly sensitive to stimuli that were associated with a particular behavioral response, consistent with a potential role in S-R learning. Based on our behavioral findings, 12 sessions of training on our task was enough to elicit habitual behavior, whereas after only two sessions of training, actions were still goal-directed. Our imaging results in this region correspond well with these behavioral results, in that this region showed a significant difference in task vs. rest cue sensitivity between the last two and the first two sessions of training, suggesting that the posterior putamen/globus pallidus region may play a central role in the development and/or control of habitual behavior in humans.
We also identified a similar area showing a within-day increase in task versus rest cue sensitivity. As the corresponding figure shows, during the last session of the second day of training, this sensitivity is almost as strong as during the last session of the last day of training. This sensitivity appears to be diminished at the beginning of the third day of training, and then it increases again over the course of training that day. This may reflect a potential resurgence in goal-directed responding relative to habitual responding at the beginning of task performance each day. However, the fact that there is also a significant difference between the first two sessions of training on the first day and the last two sessions on the last day indicates there is also a cumulative effect of multiple days of training on activation in this region, consistent with the notion that this area becomes more involved after successive days of training, mirroring the behavioral development of habits.
These data also underscore the point that the transition from goal-directed to habitual control of behavior is highly dynamic and that the early phase of the habit learning process occurs even while behavior is still demonstrably goal-directed (Graybiel, 2008). As these results make clear, it is not the case that the DLS is suddenly engaged at the moment that behavior becomes habitual. Rather, the recruitment of the DLS, and the degree to which S-R associations influence performance, increases gradually with training (Balleine & Ostlund, 2007). Note, for example, that there is a slight (nonsignificant) increase in sensitivity of the DLS to the block onsets in the second session compared to the first session of training for the 1-day group. This small increase, however, is not enough to result in habitual behavior following devaluation; only with extended training and further increases in the sensitivity of the DLS to task-relevant cues is there a significant shift toward responding habitually following outcome devaluation.
The results of this study suggest that S-R habit learning may be mediated separately from goal-directed learning; in contrast to habits, goal-directed learning is commonly thought to depend on a process of response-outcome association (Colwill & Rescorla, 1985; Dickinson & Balleine, 2002). This is in line with evidence from research on rodents that the development of S-R habits, but not response-outcome associations, relies on the DLS, which corresponds to the dorsal putamen in humans (Yin et al., 2004). The region we identified lies on the border of the putamen and globus pallidus, two basal ganglia subregions which are highly interconnected (Spooren et al., 1996), and which are thought to play an important role in the “motor loop” of the cortico-basal ganglia-thalamo-cortical pathway (Parent & Hazrati, 1995). Although this connectivity puts this area in a prime position to influence behavioral responses, the results of this study suggest that its role extends beyond motor control to a role in building up S-R associations. Indeed, the portion of the putamen that we found to show increasing sensitivity to response-linked cues is different from a left-lateralized, more anterior portion of the putamen that can be identified by a simple task versus rest comparison.
In contrast to the DLS, the vmPFC has been implicated in governing goal-directed action in humans (Valentin et al., 2007). This region may play a role in supporting goal-directed behavior by representing the value of the upcoming outcome (Schoenbaum et al., 1998; Tanaka et al., 2004; Daw et al., 2006; Hampton et al., 2006; Kim et al., 2006; Roesch & Olson, 2007). In our experiment, we found that the vmPFC shows activation that ramps up from the block onset or previous reward until the next reward is presented, which is consistent with the idea that this region is involved in anticipation of an upcoming reward. Although performance of well-trained actions on VI schedules is not controlled by reward expectation within the interval, the role of reward expectation (and of vmPFC) in performance in undertrained actions suggests that this ramping may play a role in goal-directed but not habitual performance. Indeed, since the effect in the vmPFC does not appear to diminish with training, habitual behavior may come about not because the outcome value is no longer represented in the vmPFC, but rather because regions such as the DLS may come to preferentially influence behavior. That is, it appears that circuits responsible for goal-directed and habitual behavior are simultaneously engaged, but may compete for control of behavior. Habitual behavior may be produced as relative engagement of the DLS increases, even while the individual remains aware of outcome value. Indeed, rodent lesion studies show that disruption of the habit system reinstates goal-directed behavior, suggesting that goal-related representations remain intact even once the habit system has come to control behavior (Coutureau & Killcross, 2003; Yin et al., 2006). Similarly, habit learning need not involve a reduction in brain processing related to reward receipt. Indeed, the response in the nucleus accumbens to reward presentation remained consistent throughout the three days of training, indicating that it may process reward-related information even once control of behavior shifts toward being governed by habit rather than by the goal of obtaining the reward.
Due to our a priori hypotheses based on rodent work indicating a role of the striatum in habit learning, we have focused on our results in the DLS. However, other regions in the cortex also showed a significant increase in sensitivity to the task-relevant fractals over the course of training (Table S1). For example, regions were identified in the temporal cortex in both our between-subjects and within-subjects analyses. The inferior temporal cortex has been implicated in the formation of visuomotor associations in monkeys (Mishkin et al., 1984) and is connected with the caudal putamen and tail of the caudate through the “visual” corticostriatal loop (Middleton & Strick, 1996). Although our interpretation of the role of these regions in our study must remain speculative, our results point to candidate regions for further study on corticostriatal networks involved in the development of habits in humans.
Our study provides evidence that stimulus-driven, outcome-insensitive habits are present in humans following overtraining on a VI reward schedule. The finding that persistent outcome-insensitive behavior can be induced even in healthy human subjects may have important implications for research into the etiology and treatment of a range of human neuropsychiatric diseases thought to involve impairments in habitual control, such as drug addiction, pathological gambling, and obsessive-compulsive disorder (Graybiel & Rauch, 2000; Goudriaan et al., 2004; Everitt & Robbins, 2005). Moreover, our finding that the development of these habits correlates with activity changes in the DLS identifies a specific neuroanatomical target for subsequent research into the neural mechanisms underlying habitual behavior in both adaptive and maladaptive contexts.