We observed distinct brain activation related to the unique cognitive control demands of initial reversal conditions relative to those of initial acquisition. Right lateral OFC, right inferior frontal regions, caudate, and midbrain showed greater responses to initial reversal errors than to acquisition errors, suggesting a response related to expectation violation following prepotent responding. The right lateral OFC also showed greater activation to correct responses during the initial postreversal trials than during the second acquisition trials. The involvement of the lateral OFC in both comparisons, each of which examines cognitive control related to S-R relearning in the face of an existing prepotent association, suggests that it helps detect contingency changes and maintain these changes online for subsequent modification of behavior.
When comparing reversal-specific responses to stopping, we did not observe overlap with the regions typically activated in assessments of response inhibition (RI), suggesting that the 2 forms of inhibition (of S-R associations and of motor responding) are served by distinct brain processes. However, postreversal activation in a subset of prefrontal regions associated with RI, namely, rIFG and ACC, correlated with the change in reversal performance accuracy, suggesting that these regions are involved in control processes important for flexible response execution.
Features of the Deterministic RL Task
Specific features of our novel deterministic RL task allowed observation of brain responses to 2 major behavioral processes involved in RL: inhibition of a previously established S-R association and formation of a new alternative association. Protracted acquisition periods facilitate the development of a stable prepotent response before reversal. Comparing initial stages of learning and relearning (i.e., before and after a prepotent response has been established) therefore offers insight into the brain processes involved in relearning S-R associations during RL, as distinguished from those generally involved in feedback-based learning.
Probabilistic RL tasks (PRLTs) used in most fMRI studies of RL (e.g., O'Doherty et al. 2001; Cools et al. 2002) aim to reduce reversal predictability and induce response perseveration by introducing unreliable feedback, such that the correct response is not always rewarded. In a typical PRLT, participants select between 2 simultaneously presented stimuli that appear in successive trials, and the correct response alternates between the 2 stimuli after each reversal (serial reversal). The task has been widely used to reveal neural substrates important for cognitive control processes involved in RL, including the relevance of dopaminergic activity (Cools et al. 2006).
Our goal in using a deterministic task was to increase the likelihood that participants learned prepotent S-R associations during acquisition prior to reversal. We achieved this by 1) pseudorandomly presenting stimuli such that the same stimulus did not appear on consecutive trials, which required participants to concurrently discriminate between several stimuli and their respective associated responses; 2) varying the number of stimulus repetitions so that not all stimuli reversed within the same time period, thus reducing the possibility of participants adopting a strategy of reversing all responses at once; 3) presenting some stimuli that never continued to a reversal stage, leaving it uncertain whether a particular stimulus encountered during acquisition would eventually reverse; and 4) changing the appropriate response for reversal stimuli only once (vs. the continuous alternation found in serial reversal paradigms) to capture reversal effects after a single prepotent S-R association has been established.
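The 4 design constraints above can be illustrated with a short trial-schedule generator. This is a minimal sketch, not the authors' stimulus-generation procedure: the stimulus labels, the 2-button response set, the 6 postreversal presentations, and the weighted greedy sampler with restarts are all hypothetical choices. Only the 6 vs. 12 acquisition repetitions and the presence of never-reversing stimuli are taken from the text.

```python
import random
from collections import Counter

# Hypothetical design: stimulus -> (acquisition presentations, postreversal presentations).
# A postreversal count of 0 marks a stimulus that never reverses (constraint 3);
# mixing 6 and 12 acquisition presentations staggers reversal times (constraint 2).
DESIGN = {"A": (6, 6), "B": (12, 6), "C": (6, 0), "D": (12, 0)}
BUTTONS = ("left", "right")  # hypothetical 2-button response set


def _try_order(totals, rng):
    """Draw one stimulus order with no immediate repeats (constraint 1).
    Returns None if the draw dead-ends."""
    remaining = dict(totals)
    order, prev = [], None
    while any(remaining.values()):
        candidates = [s for s, n in remaining.items() if n > 0 and s != prev]
        if not candidates:  # only the previous stimulus is left: restart
            return None
        weights = [remaining[s] for s in candidates]
        prev = rng.choices(candidates, weights=weights)[0]
        remaining[prev] -= 1
        order.append(prev)
    return order


def make_schedule(design, seed=None, max_tries=1000):
    """Return a list of (stimulus, phase, correct_button) tuples."""
    rng = random.Random(seed)
    totals = {s: acq + rev for s, (acq, rev) in design.items()}
    for _ in range(max_tries):
        order = _try_order(totals, rng)
        if order is not None:
            break
    else:
        raise RuntimeError("no valid order found; relax the design")

    # Each stimulus keeps one correct button during acquisition and switches
    # exactly once, to the other button, after reversal (constraint 4).
    initial = {s: rng.choice(BUTTONS) for s in design}
    seen = Counter()
    trials = []
    for s in order:
        seen[s] += 1
        if seen[s] <= design[s][0]:
            trials.append((s, "acquisition", initial[s]))
        else:
            switched = BUTTONS[1 - BUTTONS.index(initial[s])]
            trials.append((s, "reversal", switched))
    return trials
```

With this hypothetical design, `make_schedule(DESIGN, seed=1)` returns 48 trials. Restarting on a dead end, rather than backtracking, keeps the sampler short; it is adequate here because no single stimulus makes up more than half of the sequence.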
Our measure of reversal-specific activation was free of potential differences associated with comparing trials with incongruent feedback valence (i.e., errors vs. correct responses), as is sometimes the case in PRLT studies in which the critical comparison is between final reversal errors (negative feedback) and correct responses (positive feedback) (e.g., Cools et al. 2002; Kringelbach and Rolls 2003; O'Doherty et al. 2003; but for comparison with an “affectively neutral” baseline, see Remijnse et al. 2005). We separately compared errors and correct responses between acquisition and reversal to control for potential differences in rewards/punishments across comparison conditions. Moreover, our reversal-specific contrasts controlled for cognitive processes generally involved in goal-directed performance, such as integration of feedback to adjust behavior.
Another difference between our task and most PRLTs lay in the task component that involved reversal. In PRLTs and most object discrimination tasks, an alternative stimulus is rewarded at reversal (stimulus/object reversal), whereas our task used response (action) reversals, in which participants must make an alternate response (i.e., press an alternate button) to an individually presented stimulus. Although stimulus- and action-based reversals have not been directly compared using neuroimaging (for a comparison in nonhuman primates with OFC and ACC lesions, see Rudebeck et al. 2008), a prior fMRI study compared response and outcome reversal, in which the correct response to an individually presented stimulus was not coupled with a particular button press (Xue et al. 2008). Results from that study showed similar regions of activation for the 2 reversal conditions, including inferior frontal cortex and ACC, regions whose activation we found to correlate with successful reversal performance in this study. Thus, despite differences between response reversal and stimulus/object reversal tasks, these findings together suggest that the 2 recruit similar frontal regions.
Frontostriatal Function Specific to Reversal versus Acquisition
We observed greater activation during reversal errors than during initial acquisition errors in a subset of areas associated with RI (ventral and dorsal right inferior frontal regions) as well as the right lateral OFC, striatum, and midbrain. The essential behavioral difference in comparing initial reversal errors to initial acquisition errors is the prior existence of a prepotent S-R association. Thus, the main neural processes revealed by this contrast could be associated with several behavioral events, such as encountering violation of expectation when a prepotent response is incorrect, detecting contingency change, and making prospective error corrections for subsequent responding. The activated regions we observed may work in concert to perform these functions.
The midbrain activity for reversal errors observed in this study is consistent with a similar previous finding (Jocham et al. 2009). Although the spatial resolution of fMRI precludes us from identifying specific midbrain nuclei (e.g., substantia nigra pars compacta and ventral tegmental area), activation in our study, as well as in the study of Jocham et al. (2009), occurred in a region consistent with the location of midbrain dopamine (DA) cell groups. The DA system is known to exhibit negative prediction error (PE) signals, which appear as a reduction in phasic DA neuronal activity when an expected reward is omitted (“negative PE,” e.g., Schultz et al. 1997). Thus, one might postulate that the increase in midbrain activity for reversal errors indicates such a PE signal, potentially reflecting the inhibitory inputs that cause the negative PE. However, recent work has suggested that fMRI signals in the midbrain reflect positive PEs (D'Ardenne et al. 2008). Likewise, the fMRI signal in the ventral striatum (a major target of midbrain DA neurons) is known to correlate strongly with positive PEs (e.g., Pagnoni et al. 2002; Pessiglione et al. 2006). This difference in results suggests that the DA response in RL may differ from that in other forms of learning; further work is necessary to determine the specifics of this difference.
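The contrast between errors during initial acquisition and errors at reversal can be made concrete with a simple delta-rule (Rescorla–Wagner-style) prediction error, δ = r − V, followed by the update V ← V + αδ. This sketch is illustrative only and is not a model fitted in this study. The learning rate (0.3), the starting value of 0.5 (chance for 2 responses), and the reward coding (1 = correct, 0 = error) are all assumptions. It shows that the same negative feedback yields a much larger negative PE once a prepotent response has been reinforced.

```python
def delta_rule_pes(outcomes, alpha=0.3, v0=0.5):
    """Trial-by-trial prediction errors for one stimulus-response pair.

    delta_t = r_t - V_t, then V_{t+1} = V_t + alpha * delta_t.
    alpha and v0 are illustrative assumptions, not fitted values.
    """
    v = v0
    pes = []
    for r in outcomes:
        delta = r - v
        pes.append(delta)
        v += alpha * delta
    return pes


# Error on the very first acquisition trial: no prepotent expectation yet.
acq_error_pe = delta_rule_pes([0])[0]

# 12 rewarded acquisition trials for the prepotent response, followed by the
# first postreversal trial, on which that same response is now an error.
reversal_error_pe = delta_rule_pes([1] * 12 + [0])[-1]

print(f"initial acquisition error PE: {acq_error_pe:.3f}")
print(f"initial reversal error PE:    {reversal_error_pe:.3f}")
```

Under these assumptions the first acquisition error gives a PE of -0.5, whereas the reversal error gives a PE of about -0.99, roughly twice the magnitude. The sketch concerns only the sign and size of the PE itself; it does not address how such a PE would appear in the BOLD signal, which is the open question raised above.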
Our striatal findings are supported by lesion studies that indicate the importance of this structure for RL. Medial striatal lesions in nonhuman primates lead to reversal deficits despite intact acquisition (Clarke et al. 2008), and similarly, patients with striatal lesions (especially in dorsal striatum) relearn much more slowly than controls after reversal even though their acquisition performance is normal (Bellebaum et al. 2008). These findings suggest that the striatum is important for rapidly detecting changes in expected reward contingencies during reversal. This change detection could plausibly occur through phasic DA release, which is thought to support PE signals (White 1997; Schultz 2002).
In addition to the striatum, the OFC also shows sensitivity to expectation violations. Lateral OFC activation measured with positron emission tomography has been associated with breaches of expectation during visual attention tasks (Nobre et al. 1999). These findings parallel neurophysiological studies demonstrating sensitivity of OFC neurons to violations of reward expectation, potentially reflecting midbrain DA PE signaling (Tremblay and Schultz 2000; Takahashi et al. 2009). Importantly, temporary OFC lesions in rodents show that the OFC is necessary for learning from unexpected outcomes during RL (Takahashi et al. 2009), and, similar to our results, another RL fMRI study showed right caudolateral OFC activation on incorrect trials just prior to a response switch, when a change in reward contingency would be detected (O'Doherty et al. 2003). Activation in this region has also been observed with the emergence of unstable reward outcomes (Windmann et al. 2006).
Interestingly, we observed activation in the same right lateral OFC area for correct responses during initial postreversal versus initial acquisition trials. Nonhuman primate studies of lesions to the orbital inferior convexity (homologous to the lateral, posterior orbital region we observed) (Butter 1969; Iversen and Mishkin 1970; Jones and Mishkin 1972) and neuroimaging studies (Elliott et al. 2000; Arana et al. 2003) both associate this region with suppression of a previously learned response. That this same region also responded more to errors at reversal than to errors during initial acquisition is in line with the notion that lateral OFC neurons maintain outcome information to bias future responses (Frank and Claus 2006; Ragozzino 2007).
Although our study and other RL fMRI studies report OFC activations with a lateral locus, performance-weighted human lesion mapping has associated the greatest RL-specific deficits with left posteromedial OFC (Fellows and Farah 2003). A further study indicated that these patients (whose lesions encompassed ventromedial frontal [VMF] cortex) have difficulty learning from negative feedback, a problem that would lead them to perseverate on incorrect responses during reversal (Wheeler and Fellows 2008). We may not have observed VMF activations in our reversal-specific contrasts because we compared correct and incorrect trials separately, thus equating negative feedback across reversal and acquisition for incorrect trials. The discrepancies between these human lesion studies and fMRI studies merit further investigation, as it is unclear whether the medial–lateral inconsistencies reflect differences in RL task or performance strategy, the size and extent of lesions in patients, or the indirect nature of the fMRI BOLD signal as a measure of neural activity.
We designed our experiment to minimize the possibility that participants could predict reversal events over the course of the experiment (see description of task features above), and participants’ poor performance accuracy on reversal trials indicated that they were unable to predict reversals successfully. To assess whether the lateral OFC detected contingency changes before the reversal event (i.e., a sign of reversal prediction), we examined lateral OFC activation during the trial just prior to when a reversal might be expected (i.e., the sixth acquisition trial) across the 3 scanning runs (Supplementary Fig. S3). If participants had learned to predict reversals, one would expect the lateral OFC response on this trial to increase across runs as participants came to anticipate a contingency change. However, we found no change in activation across runs, suggesting that the lateral OFC did not reflect expectancy or prediction of reversals.
Regions Serving Flexible Updating of S-R Associations Are Distinct from Those Underlying RI
It has been suggested that brain regions associated with motor RI may provide a neural basis for a generalized inhibitory control mechanism that extends to inhibition of learned associations (e.g., Aron et al. 2004). With respect to the inhibitory control required to reverse a well-learned S-R association, our results suggest that the 2 forms of inhibition are largely served by distinct prefrontal brain regions; that is, different forms of inhibitory control (e.g., motor and cognitive) need not share the same neural substrates.
Although we compared 2 different tasks (stop signal and RL) that are assumed to have strong inhibitory control components, future studies could examine the degree of inhibitory control within a single task, for example, by comparing reversal of learned associations that are strongly or weakly reinforced. In the current study, the number of acquisition repetitions (12 vs. 6) did not have significant behavioral or brain effects on reversal. A stronger reinforcement manipulation that elicits varying levels of prepotent responding may be required to assess degrees of inhibitory control.
Prefrontal Regions Important for RI Underlie Change in RL Performance
Although we did not find overlapping prefrontal regions when comparing reversal-specific responding to stopping, we found that activation in a frontal subset of regions commonly activated by stopping (VLPFC/anterior insula, rIFG, and dACC) was positively correlated with the degree of change in performance accuracy during relearning. Specifically, activation for correct responses on the first postreversal trial correlated with the change in accuracy between the first and second postreversal trials. Activation in these regions may reflect the stability of subsequent correct responding, such that greater activation corresponds to a greater likelihood of correct responses on subsequent presentations. This relationship points to the importance of these regions in accelerating and stabilizing relearning, potentially via inhibition of prior incorrect responses. Therefore, in contrast to the potential role of the lateral OFC in detecting shifts in established reward contingencies and updating previously learned S-R associations, these regions appear to guide future actions (responses) so that they are consistent with current reward contingencies.
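The brain–behavior relationship described above amounts to a correlation, across participants, between activation on correct first-postreversal trials and the subsequent change in accuracy. The following is a minimal sketch. It assumes that per-participant activation estimates have already been extracted from a region of interest and that accuracy is coded 0/1 per reversal stimulus. The function names are hypothetical, and a simple Pearson correlation is not necessarily the statistical procedure used in the study, which may, for example, have been a voxelwise regression.

```python
from math import sqrt
from statistics import mean


def accuracy_change(first_post, second_post):
    """Accuracy on the second postreversal presentation minus accuracy on the
    first, for one participant (each list holds 0/1 per reversal stimulus)."""
    return mean(second_post) - mean(first_post)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired sequences of length >= 3")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def brain_behavior_r(roi_activation, first_post, second_post):
    """Correlate hypothetical per-participant ROI activation (correct
    first-postreversal trials) with each participant's accuracy change."""
    changes = [accuracy_change(f, s) for f, s in zip(first_post, second_post)]
    return pearson_r(roi_activation, changes)
```

A positive r, as reported above, would mean that participants with greater first-postreversal activation showed larger gains in accuracy on the next presentation. A full analysis would also test the correlation for significance, which this sketch omits.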
We believe that we have dissociated neural components for key features of cognitive control mechanisms serving adaptive learning. First, we have shown that, when controlling for cognitive control processes invoked during initial stages of feedback-based learning, reversal-specific responding is supported by a lateral OFC region. This area is likely involved both in detecting changes in stimulus–response contingencies and in updating S-R associations, possibly by inhibiting prior associations to allow formation of new ones. Although we found little overlap between reversal-specific regions of activation and those supporting motor RI (stopping), we have shown a relationship between activation in a subset of regions associated with RI (rIFG and dACC) and change in reversal performance, highlighting their role in guiding motor responses to fit current reward contingencies, a major component of RL behavior. Thus, we show a potential distinction between the lateral OFC, which detects and updates established S-R representations, and the combination of rIFG and dACC, which may use these representations to direct appropriate actions. Overall, the behavioral flexibility required to perform RL task components is likely supported by an interaction between these brain regions.