The goal of this study was to use fMRI to investigate neural circuits involved in learning via positive and negative reinforcers. Specifically, this experiment probed how the human striatum, a structure typically implicated in reward-related processes, was modulated during learning when the motivational context was driven by the presence of a negative reinforcer. Participants acquired an adaptive behavioral response (i.e., a correct button press) via positive (approach learning) or negative (avoidance learning) reinforcers separately, in a within-subjects design that allowed direct comparisons of learning under each motivational context. Participants showed greater subjective and physiological responses across learning (pre- vs. post-learning), particularly during trials that afforded the opportunity to either attain a monetary reward or avoid a monetary loss, compared to trials where the positive or negative outcome was fully predictable. Increased motivated behavior was also observed during approach and avoidance learning trials overall, as indexed by faster responses than those recorded during trials with certain outcomes. Activity within an independently defined ROI in the ventral striatum revealed an interaction between type of session (approach and avoidance) and learning phase (pre and post), highlighted by greater responses during the acquisition of a behavior aimed at avoiding a negative outcome. These results suggest that despite overlapping neural circuitry when approaching or avoiding a conditioned stimulus, negative reinforcers can exert a greater influence on ventral striatum signals that mediate reinforcing effects on behavior.
The striatum is a multi-faceted structure with several anatomical connections that facilitate goal-directed behavior (for review see Haber and Knutson, 2010). Across species, the striatum has been found to be important for affective learning, particularly in the context of predicting potential rewards (for review see Delgado, 2007; Montague and Berns, 2002; O'Doherty, 2004; Rangel, et al., 2008; Robbins and Everitt, 1996). For instance, signals corresponding to prediction errors, or the mismatch between expected and experienced rewards, often correlate with BOLD signals in the dorsal and ventral striatum (O'Doherty, et al., 2003; O'Doherty, 2004; van den Bos, et al., 2009), with stronger correlations associated with better behavioral performance during reward-learning tasks (Schonberg, et al., 2007). Further, striatum signals appear to be particularly important during the acquisition of reward contingencies, showing a decrement as associations become fully predictable (Delgado, et al., 2005; Haruno, et al., 2004; Pasupathy and Miller, 2005). Our findings are consistent with this literature, as striatum BOLD responses showed a main effect of learning phase during approach learning sessions, with greater responses during the initial acquisition of a behavioral action to attain a reward.
More recently, neuroimaging experiments have also implicated the human striatum in aversive learning. For instance, aversive prediction errors have been found to correlate with striatum BOLD signals during classical conditioning paradigms (Delgado, et al., 2008; Seymour, et al., 2004; Seymour, et al., 2007), with striatum activity tracking predictions of potentially negative outcomes regardless of whether an opportunity to avoid them existed (Jensen, et al., 2003). Furthermore, studies using active avoidance of negative outcomes have found striatal activation during the initial acquisition of avoidance contingencies (Delgado, et al., 2009) and during the expression of learned avoidance (Schlund and Cataldo, 2010; Schlund, et al., 2010). Taken together, these studies support a role for the striatum in learning with negative reinforcers, a role echoed in the current study.
Our study has two distinct features that help advance our understanding of the role of the striatum in affective learning and the processing of monetary incentives. First, it is one of the few studies in which learning takes place in both a positive and a negative context using the same reinforcer (money), thus allowing a within-subject comparison of the contribution of the striatum to affective learning with both types of reinforcer. Second, it presents a new way of comparing positive with negative contexts using monetary reinforcers that attempts to control for issues typically associated with this type of comparison. With respect to the first feature, BOLD signals within an independent, functionally defined ventral striatum ROI showed an interaction between type of session and learning phase, suggesting that learning signals within the striatum were greater when learning via negative, compared to positive, reinforcers. One plausible explanation for this finding is that the saliency of a stimulus can drive activity in the striatum (Zink, et al., 2004), an effect that can be exaggerated in a negative context using primary reinforcers such as shock (Jensen, et al., 2007). However, increases in striatum activity are not always modulated by the occurrence of salient events such as monetary loss (Delgado, et al., 2000), a gamble signifying loss (Tom, et al., 2007) or even shock itself (Seymour, et al., 2004). In the current study, the certain stimuli are examples of potentially salient stimuli, as they fully predict positive (approach CS+) or negative (avoidance CS+) outcomes. Previous studies have used CS+ stimuli to signal an outcome (e.g., Delgado, et al., 2009; Jensen, et al., 2003; Jensen, et al., 2007) and have seen robust neural responding to such stimuli, but many of these studies either had participants learn the nature of the CS (for a review see Phelps and LeDoux, 2005) or used primary reinforcers (Delgado, et al., 2009; Jensen, et al., 2003; Jensen, et al., 2007). In our experiment, little to no striatal activity was observed in response to these stimuli, potentially because they were fully predictable, a condition that has been shown to depend less on striatal responses (Berns, et al., 2001; Delgado, et al., 2005), and because participants had no control over their outcome (Tricomi, et al., 2004).
Another potential explanation for differences in striatum signals between avoidance and approach learning could be our choice of reinforcer (money). Specifically, when participants are presented with the avoidance learning sessions, they may be displaying behavioral tendencies akin to loss aversion, or a preference for avoiding losses over acquiring gains (Kahneman and Tversky, 1979). Consistent with this idea, neural signals in the ventral striatum have been found to correlate with individual differences in loss aversion (Tom, et al., 2007) and with value computations related to changes with respect to a reference point (Breiter, et al., 2001; De Martino, et al., 2009).
In the current paradigm, participants also acquire an experimental bank via a gambling task before each approach and avoidance learning session. This bank is essential for participants to feel that they are actually losing something they have earned, thus creating an endowment that may enhance the subjective value of accrued losses during the avoidance learning sessions (Delgado, et al., 2006; Tom, et al., 2007). In this experiment, the experimental banks were equated across approach and avoidance sessions to allow a direct comparison during learning, but one could envision a scenario in which gambling sessions are designed so that avoidance sessions start with either more or less than what was earned in the approach sessions. This contextual manipulation of endowment size is an interesting avenue for future studies.
A second distinct feature of our paradigm is the use of secondary reinforcers, such as monetary incentives, as a common reinforcer that can be either positive (reward) or negative (loss), unlike primary reinforcers such as shock or food, which are more difficult to equate. To adopt this type of incentive, we used a spinner procedure, described in detail in the methods, which kept the actual monetary value of a single trial ambiguous until the end of the experiment. The goal of this procedure was to ensure that the only thing that mattered to participants was the occurrence (or non-occurrence) of a reinforcer. This was important because the concept of marginal utility (the value of a gain decreases as an individual's assets increase) is known to influence reward-related circuitry, particularly the striatum (Tobler, et al., 2007). While others have elegantly tried to take absolute value out of the equation and primarily examine questions related to the magnitude of the incentive (Galvan, et al., 2005), our procedure allowed participants to treat positive and negative outcomes as just that, without any influence of actual value or magnitude. This procedure is promising for developmental studies that use monetary incentives, as a potential tool for isolating the affective meaning, rather than the value, of the presented incentives.
In this paradigm, the absolute value gained or lost is unknown, so participants presumably evaluate their actions based on internal tendencies associated with positive and negative reinforcers. For instance, people are more likely to avoid than approach social situations in which they can be evaluated, despite the possibility of forming rewarding relationships (Beck and Clark, 2009), while striatum responses to losses, but not monetary rewards, correlate with increased behavioral choices in some contexts, such as social competitions (Delgado, et al., 2008). The current study was limited to simple choices (i.e., finding the appropriate response); investigating the influence of negative contexts on more complex behavioral choices therefore becomes another interesting avenue for future investigation.
Within the striatum, we observed greater influences of negative reinforcers on more lateral regions of the ventral striatum. In contrast, more ventromedial striatum regions, including the ventral caudate nucleus, showed a main effect of learning phase irrespective of type of reinforcer. Further studies are necessary to fully understand this potential dissociation within the striatum, although given the vast connectivity of this structure (see Haber and Knutson, 2010 for review), it is not surprising that different regions within the striatum would express sensitivity to different task factors. Interestingly, no amygdala activation was observed in response to either approach or avoidance learning cues. Amygdala activity was apparent in the certain stimuli contrast, but not during the learning trials. The lack of amygdala activity contrasts with animal studies implicating this structure in avoidance learning (see Cain and LeDoux, 2008) and with human neuroimaging studies of avoidance learning using primary reinforcers (Delgado, et al., 2009) or contexts in which participants acquired stable avoidance responding prior to scanning (Schlund and Cataldo, 2010; Schlund, et al., 2010). Our design, on the other hand, used secondary reinforcers and had participants acquire the avoidance response during scanning, potentially creating a quick coping response that can be driven primarily by the striatum (for review see LeDoux and Gorman, 2001). Importantly, it is difficult to interpret a null result in neuroimaging, so the lack of amygdala activity during learning trials in this paradigm should be treated with caution.
Our paradigm and findings have implications for developmental studies of affective processing. First, as already discussed, the paradigm presents an opportunity to compare the influence of positive and negative reinforcers across development while attempting to control for the valuation of monetary reinforcers (also see Galvan, et al., 2005). Second, our results present an interesting complement to the influential triadic model of motivated behavior during adolescence (Ernst, et al., 2006; Ernst and Fudge, 2009). Briefly, this model suggests that increased reward responses (ventral striatum), decreased avoidance responses (amygdala) and poor regulation (prefrontal cortex) contribute to the aberrant behavior seen in adolescents. In the current experiment, young adults showed a propensity to learn from both positive and negative reinforcers, engaging the striatum irrespective of motivational context, but not the amygdala. Interestingly, behaviorally inhibited adolescents show an augmented response to both positive and negative outcomes of increasing value in both the striatum and amygdala (Guyer, et al., 2006). While our discussion of the amygdala is limited because it is a null finding, our study does raise questions about the role, if any, of the striatum in negative motivational contexts across development.
In conclusion, this study extends the growing literature implicating the striatum in learning from both positive and negative reinforcers. Our results further suggest that specific regions in the lateral ventral striatum are modulated in particular by learning from negative reinforcers. The results provide a direct comparison between the influence of positive and negative reinforcers on the acquisition of behaviors and on the human striatum, setting up future studies that further probe similarities and differences across development, which in turn can translate to clinical studies focusing on the acquisition and extinction of maladaptive behaviors (e.g., drug use) reinforced by positive or negative outcomes.