Cereb Cortex. 2010 August; 20(8): 1843–1852.
Published online 2009 November 13. doi:  10.1093/cercor/bhp247
PMCID: PMC2901019

Neural Components Underlying Behavioral Flexibility in Human Reversal Learning


The ability to flexibly respond to changes in the environment is critical for adaptive behavior. Reversal learning (RL) procedures test adaptive response updating when contingencies are altered. We used functional magnetic resonance imaging to examine brain areas that support specific RL components. We compared neural responses to RL and initial learning (acquisition) to isolate reversal-related brain activation independent of cognitive control processes invoked during initial feedback-based learning. Lateral orbitofrontal cortex (OFC) was more activated during reversal than acquisition, suggesting its relevance for reformation of established stimulus–response associations. In addition, the dorsal anterior cingulate (dACC) and right inferior frontal gyrus (rIFG) correlated with change in postreversal accuracy. Because optimal RL likely requires suppression of a prior learned response, we hypothesized that similar regions serve both response inhibition (RI) and inhibition of learned associations during reversal. However, reversal-specific responding and stopping (requiring RI and assessed via the stop-signal task) revealed distinct frontal regions. Although RI-related regions do not appear to support inhibition of prepotent learned associations, a subset of these regions, dACC and rIFG, guide actions consistent with current reward contingencies. These regions and lateral OFC represent distinct neural components that support behavioral flexibility important for adaptive learning.

Keywords: cognitive control, fMRI, orbitofrontal cortex, response inhibition, reversal learning


Adaptive control of behavior requires the ability to voluntarily inhibit or change established responses. For example, to safely cross the street in a foreign country in which people drive in a direction opposite to the one to which we are accustomed, we need to inhibit a well-learned, “prepotent” response in order to look in the appropriate direction of oncoming traffic. Reversal learning (RL) tasks measure this ability by providing a context to test participants’ capacity to change previously acquired behavior when environmental rules change. Typically, this task measures the acquisition of a discrimination (a stimulus or action among a set of competing alternatives associated with a desired outcome), followed by a reversal in which the associative structure changes and responses must be appropriately updated. The change in associative structure may take different forms. For example, participants may be required to select either the appropriate stimulus (e.g., a picture or physical object) or action (e.g., button press) that is associated with a rewarding outcome (e.g., food or positive feedback). Consequently, tasks of this type can be used to examine the cognitive control of prepotent responding.

Although frontal lobe damage is known to impair visuomotor stimulus–response learning (e.g., Petrides 1985, 1997), subregions within the frontal lobe, particularly in orbitofrontal cortex (OFC), are especially important for optimally modifying learned associations. Lesions to specific regions within the OFC in rodents and nonhuman primates lead to continued responding according to previously learned associations (perseveration) after reversal of stimulus–response contingencies despite feedback indicating that a change in response is required (Butter 1969; Iversen and Mishkin 1970; Jones and Mishkin 1972; Dias et al. 1996; Izquierdo et al. 2004; for reviews, see Schoenbaum et al. 2002; Murray et al. 2007; Ragozzino 2007). Patients with ventromedial (but not dorsolateral) prefrontal cortex (PFC) lesions make RL errors even though initial learning is intact (Fellows and Farah 2003; Hornak et al. 2004), and irregular OFC structure and function are found in neuropsychiatric disorders, such as obsessive compulsive disorder, in which perseverative behaviors are hallmarks (Cavedini et al. 2002; Pujol et al. 2004; Remijnse et al. 2006; Chamberlain et al. 2008). These effects have been explained in terms of OFC involvement in inhibiting prepotent responses (e.g., Jones and Mishkin 1972) or in facilitating rapid learning through negative feedback processing (Fellows 2007).

Although human functional magnetic resonance imaging (fMRI) studies have associated reversal of learned responses with striatal, ventrolateral PFC (VLPFC), ventromedial PFC, and OFC activations (O'Doherty et al. 2001, 2003; Cools et al. 2002; Kringelbach and Rolls 2003; Remijnse et al. 2005; Hampton et al. 2006), it is not clear which components of RL are indicated by these activations. For example, assuming that RL requires inhibition of a learned response, it is difficult to distinguish between activations that reflect the inhibition of a previously learned motor response or execution of a newly learned alternative. Additionally, most RL fMRI studies have not focused on distinguishing between brain activation associated with the control processes specific to RL versus those that are generally involved in feedback-driven learning (but see Budhani et al. 2007), even though patient studies show strong evidence for a distinction between these (e.g., Fellows and Farah 2003).

Studies of motor response inhibition (RI) may provide clues in determining the functional components of RL. A network reliably engaged during stopping a motor response (Aron and Poldrack 2006; Aron et al. 2007) has been hypothesized to serve more general behavioral control requirements such as inhibition of established stimulus-response (S-R) associations (Aron et al. 2004), a hypothesis that has not yet been directly examined. Although it is assumed that RI processes are involved in RL, it has not been established whether regions involved in a common form of RI, motor stopping, are also involved in inhibiting a prepotent S-R association. Investigating a potential link between brain regions serving motor RI and inhibition of a well-learned S-R association would help delineate the functional neural components underlying RL behavior.

We performed a blood oxygen level–dependent (BOLD) fMRI study using a novel RL task 1) to dissociate brain activation associated with postreversal relearning and initial acquisition, and 2) to test the hypothesis that S-R association inhibition is supported by regions serving motor RI. We designed a deterministic RL task to induce consistent responding during an acquisition phase and allow strong S-R associations to be formed prior to measuring reversal of these associations. To assess overlap between regions associated with S-R association inhibition and regions associated with motor RI, we directly compared our results with those from previously reported stop-signal task (SST) fMRI data. We examined reversal-specific brain responses by comparing post-RL with initial acquisition. The latter comparison assessed brain responses specifically related to reversal, controlling for learning effects (e.g., effortful vs. fluid performance) and cognitive processes involved in early stages of feedback-based learning (e.g., integration of feedback for error correction), occurring during initial periods of both acquisition and reversal. In addition, to determine brain regions important for successful reversal performance, we correlated brain activation during initial postreversal trials with change in performance on subsequent trials.

Materials and Methods


Sixteen right-handed adults (5 males and 11 females; age: 18–30 years; mean [M] = 23, standard deviation [SD] = 4) participated in the experiment. All had normal or corrected-to-normal vision and were screened to ensure no history of neurological or psychiatric disorders. All participants gave informed consent according to the University of California Los Angeles (UCLA) Institutional Review Board protocol and were paid $30 for their participation, as well as an amount earned on a trial-by-trial basis during the experiment (M = $22.17, SD = $1.11). One participant was excluded from fMRI analyses due to very poor task performance.


Forty-four abstract computer-generated images (ArtMatic Pro, U&I Software LLC) were used in the task (see example stimuli in Fig. 1). Twenty-two additional images were used in the postscan memory test.

Figure 1.
Trial structure used in the RL task. Participants are presented with an abstract image and have 1 s to make a category judgment (left or right key). If their response is correct, a blue frame appears around the image and they receive 1 point (the display ...

Stimulus presentation and timing of all stimuli and response events were achieved using MATLAB (Mathworks, Natick, MA) and the Psychtoolbox on an Apple PowerBook G4 running Mac OSX (Apple Computers, Cupertino, CA). Visual stimuli were presented using magnetic resonance imaging (MRI)–compatible goggles (Resonance Technologies, Van Nuys, CA).

Task and Design

During scanning, participants performed a deterministic, feedback-driven discrimination task. A schematic of the trial structure is presented in Figure 1. On each trial, they were presented with an abstract visual pattern and were asked to decide whether it was associated with a left or right key response. The picture was presented for 1 s, during which participants made their response. After this period, feedback appeared in the form of a colored square frame around the stimulus for 1 s. A blue frame indicated a correct response, and a red frame indicated an incorrect response. If participants did not respond within the 1-s stimulus presentation period, the phrase “no response recorded” appeared above the image (these trials were excluded from analyses and accounted for no more than 5% of the trials per participant). Participants received 1 point for a correct response and zero points for an incorrect response. A running total of points appeared beneath the stimulus during feedback presentation. Participants were informed prior to the scan that they would be given $0.10 for each point earned. Following presentation of feedback, a blank screen was displayed for a variable duration delay (interstimulus interval, ISI) of 0.5–16 s (sampled from an exponential distribution with a mean of 3 s) before the next trial.
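The jittered delay described above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code used in the study. It assumes that the 0.5–16 s bounds were enforced by redrawing out-of-range samples, which the text does not specify; the function name `sample_isi` and the seed are hypothetical.

```python
import random


def sample_isi(rng, mean_s=3.0, lo_s=0.5, hi_s=16.0):
    """Draw one ISI (in seconds) from an exponential distribution.

    Assumption: samples outside [lo_s, hi_s] are rejected and redrawn.
    """
    while True:
        isi = rng.expovariate(1.0 / mean_s)  # rate parameter = 1 / mean
        if lo_s <= isi <= hi_s:
            return isi


rng = random.Random(0)  # fixed seed so the schedule is reproducible
isis = [sample_isi(rng) for _ in range(1000)]
```

Note that rejecting short and long draws changes the realized mean of the retained delays, so the mean of `isis` need not equal 3 s exactly.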

Participants were encouraged to respond as quickly and as accurately as possible and were told that their goal should be to accrue as many points as possible. Prior to scanning, participants performed a practice session with a separate set of stimuli to become familiarized with the task. No reversals appeared in the practice session. Participants were not explicitly informed about the response reversals, but upon postscan debriefing, all participants stated that they became aware of the reversals during the experiment.

Our aim in designing this task was to minimize the potential for participants to predict reversals via rule following in order to emphasize S-R associative learning. To this end, in addition to systematically varying the number of stimulus repetitions, we introduced stimuli that never reversed, leaving in question for participants whether a given encountered stimulus would eventually require response reversal. Supplementary Figure S1 shows a sample sequence of trials.

Each of the 22 stimuli used in the experiment fell under 1 of 3 conditions: 1) 6 repetitions prior to reversal (“6 rep,” 8 stimuli), 2) 12 repetitions prior to reversal (“12 rep,” 6 stimuli), or 3) no reversal (“Norev,” 8 stimuli). Stimuli were organized in 12 “miniblocks” such that 4 stimuli were randomly presented 6 times within each miniblock (see Supplementary methods Table S1). Overall, a total of 8 reversals in the 6 rep condition and 6 reversals in the 12 rep condition were presented. These repetitions (12 and 6) were determined to be sufficient for reaching 90% accuracy in prior pilot testing.

The sequence of trials and ISIs were determined using an in-house algorithm that used a Monte Carlo method to optimize the general linear model design matrix for maximal statistical efficiency. Although the Monte Carlo procedure is not as efficient as other optimization methods (e.g., genetic algorithms or m-sequences), it offers flexibility in experimental design that other more constrained methods lack. Miniblocks were presented sequentially. Trials within a miniblock were pseudorandomized such that no stimulus repeated in succession. Each stimulus reversed only once and was phased out of the experiment once the assigned repetitions were completed. Participants were only required to keep 4 stimuli in mind at any given point in time. Working memory load across stimulus repetitions did not differ (i.e., the number of trials or “lag” between stimulus repetitions did not differ across stimuli [M = 4.52 trials, SD = 2.62; F(21, 244) = 0.741, mean standard error = 0.74, P = 0.107]).
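The general idea of a Monte Carlo search over designs can be illustrated with a small Python sketch. It is not the in-house algorithm, whose details are not given here. It assumes a toy design with two conditions, a single gamma-shaped response function, ISIs clamped to 0.5–16 s, and efficiency defined as 1 / (c (X'X)^-1 c') for the contrast c = [1, -1]. All function names and parameter values are hypothetical.

```python
import math
import random


def hrf(t):
    """Gamma-shaped response peaking near 5 s (illustrative only)."""
    return (t ** 5) * math.exp(-t) / 120.0 if t >= 0 else 0.0


def regressor(onsets, n_scans=240, tr=2.0):
    """Sum of responses to each onset, sampled once per TR, mean-centered."""
    col = [sum(hrf(i * tr - o) for o in onsets) for i in range(n_scans)]
    mean = sum(col) / n_scans
    return [v - mean for v in col]


def efficiency(x1, x2):
    """Efficiency of the contrast [1, -1] for a two-column design."""
    a = sum(v * v for v in x1)
    d = sum(v * v for v in x2)
    b = sum(u * v for u, v in zip(x1, x2))
    det = a * d - b * b
    # (X'X)^-1 = [[d, -b], [-b, a]] / det, so c (X'X)^-1 c' = (a + d + 2b) / det
    return det / (a + d + 2 * b)


def random_design(rng, n_trials=40, mean_isi=3.0):
    """Random condition order; each 2-s trial is followed by a jittered ISI."""
    labels = [0, 1] * (n_trials // 2)
    rng.shuffle(labels)
    onsets, t = ([], []), 0.0
    for lab in labels:
        onsets[lab].append(t)
        t += 2.0 + min(16.0, max(0.5, rng.expovariate(1.0 / mean_isi)))
    return onsets


def monte_carlo_search(n_iter=50, seed=0):
    """Generate n_iter random designs and keep the most efficient one."""
    rng = random.Random(seed)
    best_eff, best_onsets = None, None
    for _ in range(n_iter):
        onsets = random_design(rng)
        eff = efficiency(regressor(onsets[0]), regressor(onsets[1]))
        if best_eff is None or eff > best_eff:
            best_eff, best_onsets = eff, onsets
    return best_eff, best_onsets
```

Calling `monte_carlo_search(n_iter=50)` returns the best efficiency found and the corresponding onsets. Because candidates are drawn at random rather than evolved, extra constraints (such as forbidding immediate stimulus repeats) can be added by simply rejecting candidate designs, which is the flexibility referred to above.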

Each of the 3 runs included 324 trials intermixed with 36 “baseline task” trials in which participants were presented with a fixation cross for 1 s along with the words “press a key.” This task provided visuomotor control data independent from the classification task and served as a comparison baseline.


Imaging was performed using a 3-T Siemens AG (Erlangen, Germany) Allegra MRI scanner at the UCLA Ahmanson-Lovelace Brain Mapping Center. We acquired 240 functional T2*-weighted echoplanar images (EPI) (slice thickness, 4 mm; 34 slices; repetition time [TR], 2 s; echo time [TE], 30 ms; flip angle, 90°; matrix, 64 × 64; field of view [FOV], 200 mm). Two additional volumes were discarded at the beginning of each run to allow for T1 equilibrium effects. In addition, a T2-weighted matched bandwidth high-resolution anatomical scan (same slice prescription as EPI) and magnetization prepared rapid acquisition gradient echo (MP–RAGE) scan were acquired for each participant for registration purposes (TR, 2.3 s; TE, 2.1 ms; FOV, 256 mm; matrix, 192 × 192; sagittal plane; slice thickness, 1 mm; 160 slices). The orientation for matched bandwidth and EPI scans was oblique axial so as to maximize full brain coverage and to optimize signal from ventromedial prefrontal regions.

Data Analysis

fMRI image analysis was performed using the FSL (3.3.7) toolbox from the Oxford Centre for fMRI of the Brain (FMRIB). Each participant's image time course was first realigned to compensate for small head movements (Jenkinson et al. 2002). Images were denoised for motion-related artifacts using MELODIC independent components analysis within FSL. Motion-related components were identified manually using a set of heuristics (Tohka et al. 2008), and the data were then reconstructed after removing the motion-related components. Data were spatially smoothed using a 6-mm full-width-half maximum Gaussian kernel. Prior to registration, the MP–RAGE was “unwarped” using an algorithm that incorporates a scanner-specific description of gradient nonlinearities to reduce image distortion (Jovicich et al. 2006), and skull stripping was performed using Freesurfer software. Registration was conducted through a 3-step procedure, whereby EPI images were first registered to the matched bandwidth high-resolution structural image, then to the MP–RAGE structural image, and finally into standard (Montreal Neurological Institute [MNI]) space (MNI avg152 template) using 12-parameter affine transformations (Jenkinson and Smith 2001). Statistical analyses were performed in native space, with the statistical maps normalized to standard space prior to higher level analysis.

Whole-brain statistical analysis was performed using a multistage approach to implement a mixed-effects model treating participants as a random effect. Statistical modeling was first performed separately for each imaging run. Regressors of interest were created by convolving a delta function representing trial onset times with a canonical (double gamma) hemodynamic response function. Stimulus and feedback were modeled as a single event. For the primary whole-brain analysis, we separately analyzed correct and incorrect trials (correct/incorrect trial analysis). For region of interest (ROI) analyses, we used a second model in which each stimulus repetition was modeled separately (stimulus repetition analysis).
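The construction of a regressor from trial onsets can be sketched in Python as follows. This is an illustration rather than the FSL implementation: the double-gamma parameters (shapes 6 and 16, undershoot ratio 1/6) are a commonly used convention assumed here, and the function names, the 0.1-s convolution grid, and the 32-s kernel length are hypothetical choices.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Positive response minus a smaller, later undershoot (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def build_regressor(onsets_s, n_scans, tr=2.0, dt=0.1, hrf_len_s=32.0):
    """Convolve unit impulses at trial onsets with the HRF, then sample per TR."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_fine
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:
            sticks[idx] += 1.0  # delta function at the trial onset
    kernel = [double_gamma_hrf(i * dt) for i in range(int(hrf_len_s / dt))]
    conv = [0.0] * n_fine
    for i, s in enumerate(sticks):
        if s:
            for j, k in enumerate(kernel):
                if i + j < n_fine:
                    conv[i + j] += s * k
    step = int(round(tr / dt))
    return [conv[i * step] for i in range(n_scans)]


# One trial at time zero, 20 scans at TR = 2 s
reg = build_regressor([0.0], n_scans=20)
```

Performing the convolution on a fine time grid before sampling at the TR keeps onsets that fall between scans from being rounded to the nearest volume.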

In the correct/incorrect trial analysis, only the first 2 trials during both acquisition and reversal were divided into correct and incorrect accuracy conditions and separately analyzed. Because performance was high on this task, we could only include both correct and incorrect trials for the initial acquisition trial. For the second trial during acquisition and reversal, we only examined correct trials because the number of incorrect trials was insufficient for statistical analysis. For the same reason, only incorrect initial reversal trials were analyzed. All other trials were modeled in a single nuisance regressor.

In the stimulus repetition analysis, we modeled each stimulus repetition separately. To make matrix computations feasible, trials from the prereversal 6 rep, prereversal 12 rep (first 6 repetitions), and nonreversing conditions were combined into a single regressor. Only correct trials were analyzed with the exception of the first acquisition and reversal trials for which incorrect responses were examined. For ROI analyses, we extracted data from contrast images that modeled each repetition.

For all analyses, time series statistical analysis was carried out using FILM (FMRIB's improved linear model) with local autocorrelation correction (Woolrich et al. 2001) after high-pass temporal filtering (Gaussian-weighted least squares function straight line fitting, with sigma = 33.0 s).

For between-participant analyses, we used the FMRIB Local Analysis of Mixed-Effects module in FSL (Beckmann et al. 2003; Woolrich et al. 2004) and a 1-sample t-test performed at each voxel for each contrast of interest. Z (Gaussianised T) statistic images were thresholded using cluster-corrected statistics with a height threshold of Z > 2.3 (unless otherwise noted) and a cluster probability threshold of P < 0.05, whole-brain corrected using the theory of Gaussian random fields (Worsley et al. 1992). Anatomical locations of activations were confirmed using the sectional brain atlas by Duvernoy and Bourgouin (1999); activation locations along the prefrontal medial wall were verified by consulting Picard and Strick (2001).

We used conjunction analyses to compare our RL fMRI results with those from a RI task (the SST) used in the study by Aron and Poldrack (2006). The SST requires participants to respond to a cue on a majority of trials but to stop on other trials upon receiving a stop signal (e.g., auditory tone). The delay between the cue and the stop signal adaptively varies according to performance with the goal of achieving 50% accuracy on stop trials. Brain responses related to RI can be examined via the contrast of successful stopping versus go trials. Conjunction analyses were performed using the revised minimum statistic approach proposed by Nichols et al. (2005) and cluster-corrected statistics. For comparisons with SST, smoothness estimates were derived from each Z statistic image separately.
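The core of a minimum statistic conjunction can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the two maps are represented as flat lists of voxelwise Z values in the same space, the Z values in the usage example are made up, and the cluster-level correction used in the actual analysis is omitted. The function name is hypothetical.

```python
def conjunction_min_stat(z_a, z_b, z_thresh=2.3):
    """Voxelwise minimum of two Z maps, zeroed where the minimum is below threshold.

    A voxel survives only if BOTH maps exceed the threshold at that voxel.
    """
    if len(z_a) != len(z_b):
        raise ValueError("maps must have the same number of voxels")
    out = []
    for a, b in zip(z_a, z_b):
        m = min(a, b)
        out.append(m if m > z_thresh else 0.0)
    return out


# Made-up Z values for four voxels in a reversal map and a stopping map
reversal_z = [3.1, 2.5, 1.0, 4.0]
stopping_z = [2.8, 1.9, 3.5, 2.4]
conj = conjunction_min_stat(reversal_z, stopping_z)
```

In this example only the first and last voxels survive, because the second and third exceed the threshold in just one of the two maps.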


Learning Performance

Performance accuracy is shown in Figure 2. Participants reached above 75% accuracy during acquisition by the second stimulus repetition in both the 6 and 12 repetition conditions (6 rep: M = 0.76, SD = 0.17; 12 rep: M = 0.78, SD = 0.11). We compared accuracy measures between the reversal trials and first postreversal trials—the period where participants must modify their responses in reaction to the reversal. Postreversal accuracy reached 75% or above by the first postreversal trial in both 6 and 12 repetition conditions (6 rep: M = 0.75, SD = 0.16; 12 rep: M = 0.81, SD = 0.22), showing that participants switched to the reversed response most of the time by the first postreversal trial.

Figure 2.
RL performance accuracy. Plot shows mean proportion correct across 15 participants for images repeated 12 (blue line, circles) or 6 (red line, triangles) times prior to reversal (acquisition period) and images that repeated 6 times but were not presented ...

To assess the effect of 6 versus 12 stimulus repetitions on reversal performance, we compared both accuracy and response time measures during the first postreversal trial for these conditions but found no significant differences between the 2 repetition conditions in accuracy (t(15) = −1.03, P = 0.32) or response times (6 repetitions: 0.696 ± 0.067 s; 12 repetitions: 0.711 ± 0.083 s; t(15) = −0.67, P = 0.5). Because behavioral measures did not differ across the 6 and 12 repetition conditions, we collapsed across these conditions for greater statistical power in the fMRI analyses.

fMRI Results

Dissociating Reversal Learning from Acquisition

Many lesion and patient studies show impaired reversal performance with intact acquisition, suggesting neural responses that are unique to the cognitive control demands of RL. To determine brain responses specific to reversal, we asked what brain areas were uniquely activated in the initial phase of RL versus early in acquisition, stages of learning during which S-R representations are most labile. In one analysis, we compared correct trials between initial points in the 2 phases, and in another, we compared errors. These 2 contrasts allowed us to examine differences in reversal versus acquisition because the only major difference between the 2 conditions compared was whether a previous response association with the stimulus had already been established (i.e., without confounding positive and negative feedback presentations associated with correct and incorrect performance feedback and without differences in effortful versus fluent cognitive processing). In other words, we compared 2 initial stages of learning that only differed by the existence of a prior established prepotent response.

To assess brain regions that respond to cognitive control demands required for inhibiting more versus less of a prepotent response, we compared correct responses from the first postreversal trial with those during the second acquisition trial. This comparison mainly showed activation within right lateral OFC, primarily within the lateral and posterior orbital gyri, as well as in right superior and middle temporal, right inferior parietal, and posterior cingulate cortices (Fig. 3A, Table 1). The opposite contrast (acquisition vs. reversal phase) showed posterior activations in occipitotemporal areas, including fusiform gyrus, that most likely reflected decreases usually observed in these areas with visual object repetition (e.g., Grill-Spector et al. 2006).

Table 1
Locations of significant activation in comparison of first postreversal and second acquisition trials (correct trials only)
Figure 3.
Reversal-specific responses: comparison of early reversal versus early acquisition trials. (A) Warm colors—first postreversal trial > second acquisition trial (correct trials): right lateral OFC, right superior and middle temporal cortices, ...

Comparing errors at the reversal trial versus those at the first acquisition trial allowed us to examine responses to expectancy violations. This comparison revealed right lateral OFC activation in the same area as in the contrast above, right anterior insula, and right posterior inferior frontal gyrus (IFG), extending dorsally into the precentral sulcus, midbrain, caudate head, and posterior cingulate (Fig. 3B, Table 2). The opposite comparison showed occipitotemporal and cerebellar activation.

Table 2
Locations of significant activation in comparison of errors on reversal trials and errors on first acquisition trials

To determine the extent of commonality of the right lateral OFC activations to both of the above contrasts, we computed a conjunction map of the 2 thresholded statistical images (see Fig. 4A). The resulting image showed a large cluster located in the posterolateral OFC. A region of interest analysis in which we plotted each stimulus repetition during acquisition and reversal showed the greatest response difference at the reversal trial followed by the first postreversal trial (Fig. 4B).

Figure 4.
Reversal-specific right lateral orbitofrontal activation. (A) Conjunction of errors and correct trials for early reversal versus early acquisition reveals common activation in lateral OFC. (B) Region of interest analysis of lateral OFC response for each ...

Reversal-Specific Responding and Motor Response Inhibition Activate Distinct Prefrontal Regions

To evaluate the hypothesis that reversal-specific inhibition and motor RI (as indexed by the SST) are served by common brain regions, we compared reversal-specific group activation maps from this study with RI activation from a stop-signal fMRI study by Aron and Poldrack (2006). Specifically, we computed a conjunction of the statistical maps (whole-brain cluster corrected, Z = 1.96, P = 0.05) (Nichols et al. 2005) corresponding to the following contrasts: for RL, correct responding on the first postreversal trial versus that on the second acquisition trial (Fig. 3A) and for stop signal, successful stopping versus go trials. The latter contrast is a typical comparison used to show RI-related activations in the SST. The results revealed activation overlap in temporoparietal areas but little among regions typically associated with motor RI (i.e., VLPFC/insula, pre-supplementary motor area, dorsal anterior cingulate [dACC], subthalamic nucleus, and right IFG [rIFG]) (Fig. 5). These results suggest that brain areas supporting inhibitory processes specific to reversal (i.e., controlling for performance fluency and feedback valence, that is, correct/incorrect feedback) are distinct from those involved in motor RI. In particular, the lateral OFC appears to be especially involved in inhibiting a well-learned association.

Figure 5.
Conjunction of reversal-specific responding and stopping. Overlapping regions between first postreversal > second acquisition trials (correct trials) from the current study (Fig. 3A) and stop > go trials (from the SST used in a prior fMRI ...

RL Performance Correlates with Prefrontal Regions

To determine brain activation corresponding to successful RL performance, we performed a between-participant whole-brain correlation analysis with change in RL performance. Specifically, we correlated activation during the first postreversal trial (vs. baseline), presumably the point at which the greatest exertion of cognitive control occurs for successful reversal performance, with change in performance accuracy between the first and second postreversal trials. Several regions showed significant positive correlations with change in performance accuracy, namely, right anterior insula, rIFG (pars opercularis/triangularis), and dACC (Fig. 6). Notably, these regions are common to a subset of prefrontal regions that show activation for stopping in the SST.
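The logic of the between-participant correlation at a single voxel or region can be sketched in Python as follows. This is an illustration only: the whole-brain analysis was run in FSL with cluster correction, whereas this sketch computes a plain Pearson correlation. All numbers are made up for four hypothetical participants, and the function and variable names are not from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up accuracies on the first and second postreversal trials
acc_first = [0.70, 0.80, 0.75, 0.65]
acc_second = [0.90, 0.85, 0.95, 0.80]
# Change in accuracy = second minus first postreversal trial
accuracy_change = [b - a for a, b in zip(acc_first, acc_second)]

# Made-up activation estimates (first postreversal trial vs. baseline)
activation = [0.42, 0.10, 0.51, 0.30]

r = pearson_r(activation, accuracy_change)
```

A positive `r` would indicate that participants with stronger activation on the first postreversal trial showed a larger subsequent gain in accuracy.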

Figure 6.
Correlation of change in RL performance accuracy and brain activation at the first postreversal trial (correct responses only). Activation map shows regions that correlated with postreversal change in accuracy (difference of second and first postreversal ...


We observed distinct brain activation to the unique cognitive control demands of initial reversal conditions relative to those during initial acquisition. Right lateral OFC, right inferior frontal regions, caudate, and midbrain showed greater responses to initial reversal errors relative to acquisition errors, suggesting a response related to expectation violation following prepotent responding. The right lateral OFC also showed greater activation to correct responses during the initial postreversal trials versus the second acquisition trials. The commonality of lateral OFC to these comparisons, each of which aims to examine cognitive control related to S-R relearning in the face of an existing prepotent association, suggests its involvement in detecting contingency changes and maintaining these changes online for subsequent modification of behavior.

When comparing reversal-specific responses to stopping, we did not observe overlapping regions of activation typically found in assessments of response inhibition (RI), suggesting that the 2 forms of inhibition (S-R association and motor responding) are served by distinct brain processes. However, postreversal activation in a subset of prefrontal regions associated with RI, namely, rIFG and ACC, showed a correlation with change in reversal performance accuracy, suggesting their involvement in control processes important for flexible response execution.

Features of the Deterministic RL Task

Specific features of our novel deterministic RL task allowed observation of brain responses to 2 major behavioral processes involved in RL: inhibition of a previously established S-R association and formation of a new alternative association. Examination of the latter is achieved by allowing protracted acquisition periods that facilitate developing a stable prepotent response. Comparing initial stages of learning and relearning (i.e., before and after a prepotent response has been established) offers insight into the brain processes involved in relearning of S-R associations during RL as distinguished from those generally involved in feedback-based learning.

Probabilistic RL tasks (PRLT) used in most fMRI studies of RL (e.g., O'Doherty et al. 2001; Cools et al. 2002) aim to reduce reversal predictability and induce response perseveration by introducing unreliable feedback such that the correct response is not always rewarded. In typical PRLT, participants select between 2 simultaneously presented stimuli that appear in successive trials, and the correct response alternates between the 2 stimuli after each reversal (serial reversal). The task has been widely used to reveal neural substrates important for cognitive control processes involved in RL, including the relevance of dopaminergic activity (Cools et al. 2006, 2009).

Our goal in using a deterministic task was to increase the likelihood that participants learned prepotent S-R associations during acquisition prior to reversal. We achieved this by 1) pseudorandomly presenting stimuli such that the same stimulus did not appear sequentially; this required participants to concurrently discriminate between several stimuli and their respective associated responses, 2) varying the number of stimulus repetitions so that not all stimuli reversed within the same time period, thus reducing the possibility of participants adopting a strategy to reverse all responses at once, 3) presenting some stimuli that never continued to a reversal stage, leaving uncertain whether a particular stimulus encountered during acquisition would eventually reverse, and 4) changing the appropriate response for reversal stimuli only once (vs. continuous alternation found in serial reversal paradigms) to capture reversal effects after a single prepotent S-R association has been established.

Our measure of reversal-specific activation was free of potential differences associated with comparing trials with incongruous feedback valence (i.e., errors vs. correct responses), as is sometimes the case in PRLT studies in which the critical comparison is between final reversal errors (negative feedback) and correct responses (positive feedback) (e.g., Cools et al. 2002; Kringelbach and Rolls 2003; O'Doherty et al. 2003; but for comparison with an “affectively neutral” baseline, see Remijnse et al. 2005). We separately examined errors and correct responses between acquisition and reversal to control for potential differences of rewards/punishments across comparison conditions. Moreover, our reversal-specific contrasts controlled for cognitive processes generally involved in goal-directed performance, such as integration of feedback to adjust behavior.

Another difference between our task and most PRLTs lay in the particular task component that involved reversal. In PRLT and most object discrimination tasks, an alternative stimulus is rewarded at reversal (stimulus/object reversal), whereas our task used response (action) reversals during which participants must make an alternate response (i.e., pressing an alternate button) to an individually presented stimulus. Although stimulus- and action-based reversals have not been directly compared using neuroimaging (for a comparison in nonhuman primates with OFC and ACC lesions, see Rudebeck et al. 2008), a prior fMRI study compared response and outcome reversal, in which the correct response to an individually presented stimulus was not coupled with a particular button press (Xue et al. 2008). Results from that study showed similar regions of activation for the 2 reversal conditions, including inferior frontal cortex and ACC, regions we found to correlate with successful reversal performance in this study. Thus, despite differences between response reversal and stimulus/object reversal tasks, our results indicate that the 2 recruit similar frontal regions.

Frontostriatal Function Specific to Reversal versus Acquisition

We observed greater activation during reversal errors than during initial acquisition errors in a subset of areas associated with RI (ventral and dorsal right inferior frontal regions) as well as the right lateral OFC, striatum, and midbrain. The essential behavioral difference in comparing initial reversal errors to initial acquisition errors is the prior existence of a prepotent S-R association. Thus, the main neural processes revealed by this contrast could be associated with several behavioral events, such as encountering violation of expectation when a prepotent response is incorrect, detecting contingency change, and making prospective error corrections for subsequent responding. The activated regions we observed may work in concert to perform these functions.

The midbrain activity for reversal errors observed in this study is consistent with a similar previous finding (Jocham et al. 2009). Although the spatial resolution of fMRI precludes us from determining specific midbrain nuclei (e.g., substantia nigra pars compacta and ventral tegmental area), activation in our study as well as in the study of Jocham et al. (2009) occurred in a region consistent with the location of midbrain dopamine (DA) cell groups. The DA system is known to exhibit negative prediction error (PE) signals that appear as a reduction in phasic DA neuronal activity in the absence of an expected reward (“negative PE,” e.g., Schultz et al. 1997). Thus, one might postulate that the increase in midbrain activity for reversal errors would indicate such a PE signal, potentially reflecting the inhibitory inputs that cause the negative PE signal. However, recent work has suggested that fMRI signals in the midbrain reflect positive PEs (D'Ardenne et al. 2008). Likewise, fMRI signal in the ventral striatum (a major target of midbrain DA neurons) is known to strongly correlate with positive PEs (e.g., Pagnoni et al. 2002; Pessiglione et al. 2006, 2008). This difference in results suggests that the DA response in RL may differ from the response in other forms of learning; further work is necessary to determine the specifics of this difference.
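To make the sign convention for PEs concrete, the sketch below uses a simple delta rule (PE = outcome minus expected value). This is an illustrative assumption, not a model that was fit in this study; the learning rate, the 6 rewarded trials, and the function name `delta_rule` are hypothetical choices.

```python
def delta_rule(outcomes, alpha=0.3, v0=0.0):
    """Return the prediction error on each trial and the final expected value.

    outcomes: sequence of 1 (rewarded) or 0 (not rewarded) for one response.
    alpha: learning rate (an assumed value, not estimated from data).
    """
    value = v0
    errors = []
    for outcome in outcomes:
        delta = outcome - value   # prediction error: obtained minus expected
        errors.append(delta)
        value += alpha * delta    # move the expectation toward the outcome
    return errors, value

# Six rewarded acquisition trials, then the same response at reversal (no reward).
errors, value = delta_rule([1, 1, 1, 1, 1, 1, 0])
first_acquisition_pe = errors[0]   # positive: reward was not yet expected
reversal_pe = errors[-1]           # negative: an expected reward was omitted
```

Under this rule, the first reversal error yields a large negative PE precisely because acquisition has built up a strong reward expectation, which is why an increase rather than a decrease in midbrain signal for reversal errors calls for explanation.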

Our striatal findings are supported by lesion studies that indicate the importance of this structure for RL. Medial striatal lesions in nonhuman primates lead to reversal deficits despite intact acquisition (Clarke et al. 2008), and similarly, patients with striatal lesions (especially in dorsal striatum) show much slower relearning after reversal than controls even though their acquisition performance is normal (Bellebaum et al. 2008), suggesting the importance of the striatum for rapidly detecting changes in expected reward contingencies during reversal. It is plausible that this change detection would occur through phasic DA release that is thought to support PE signals (White 1997; Schultz 2002).

In addition to the striatum, the OFC also shows sensitivity to expectation violations. Lateral OFC positron emission tomography activation has been associated with breaches of expectation during visual attention tasks (Nobre et al. 1999). These findings parallel neurophysiological studies demonstrating sensitivity to reward expectation violations in OFC neurons, potentially reflecting midbrain DA PE signaling (Tremblay and Schultz 2000; Takahashi et al. 2009). Importantly, findings from temporary OFC lesions in rodents show that OFC is necessary for learning from unexpected outcomes during RL (Takahashi et al. 2009), and similar to our results, another RL fMRI study showed right caudolateral OFC activation to incorrect trials just prior to a response switch when a change in reward contingency would be detected (O'Doherty et al. 2003). Similarly, activation in this region has been observed with emergence of unsteady reward outcomes (Windmann et al. 2006).

Interestingly, we observed activation in the same right lateral OFC area for correct responses during initial postreversal versus initial acquisition periods. Nonhuman primate studies of lesions to the orbital inferior convexity (homologous to the lateral, posterior orbital region we observed) (Butter 1969; Iversen and Mishkin 1970; Jones and Mishkin 1972) and neuroimaging studies (Elliott et al. 2000; Arana et al. 2003) both associate this region with suppression of a previously learned response. The fact that this same region also responded to errors at reversal versus initial acquisition is in line with the notion that lateral OFC neurons maintain outcome information to bias future responses (Frank and Claus 2006; Ragozzino 2007).

Although our study and other RL fMRI studies report OFC activations with a lateral locus, performance-weighted human lesion mapping results have associated the greatest RL-specific deficits with left posteromedial OFC (Fellows and Farah 2003). A further study indicated that these patients (whose lesions encompassed ventromedial frontal [VMF] cortex) have difficulty learning from negative feedback, a problem that would lead them to perseverate on incorrect responses during reversal (Wheeler and Fellows 2008). We may not have observed VMF activations in our reversal-specific contrasts because we separately compared correct and incorrect trials, thus equating negative feedback across reversal and acquisition during incorrect trials. The discrepancies between these human lesion studies and fMRI studies bear further investigation, as it is unclear whether the medial–lateral inconsistencies reflect factors such as RL task or performance strategy differences, lesion size and extent in patients, or the indirect nature of the fMRI BOLD signal as a measure of neural activity.

We designed our experiment to minimize the possibility of participants predicting reversal events across the course of the experiment (see description of task features above), and participants’ poor performance accuracy on reversal trials indicated that they were unable to successfully predict reversals. In regard to whether the lateral OFC detected contingency changes prior to the reversal event (i.e., a sign of reversal prediction), we examined lateral OFC activation during the trial just prior to when a reversal might be expected (i.e., the sixth acquisition trial) across the 3 scanning runs (Supplementary Fig. S3). If participants learned to predict reversals, one may expect the lateral OFC response to increase when participants perceive that a contingency change is about to occur. However, we found no change in activation across runs, suggesting that the lateral OFC did not reflect expectancy or prediction of reversals.

Regions Serving Flexible Updating of S-R Associations Are Distinct from Those Underlying RI

It has been suggested that brain regions associated with motor RI may provide a neural basis for a generalized inhibitory control mechanism that extends to inhibition of learned associations (e.g., Aron et al. 2004). With respect to inhibitory control requirements involved in reversing a well-learned S-R association, our results suggest that the 2 forms of inhibition are largely served by distinct prefrontal brain regions. Different forms of inhibitory control (e.g., motor and cognitive) may not necessarily share the same neural substrates.

Although we have compared 2 different tasks (stop signal and RL) that are assumed to have strong inhibitory control components, further studies may examine the degree of inhibitory control within a single task by, for example, comparing reversal of learned associations that are strongly or weakly reinforced. In the current study, the number of acquisition repetitions (12 vs. 6) did not have significant behavioral or brain effects on reversal. A stronger reinforcement manipulation that elicits varying levels of prepotent responding may be required to assess degrees of inhibitory control.

Prefrontal Regions Important for RI Underlie Change in RL Performance

Although we did not find overlapping prefrontal regions when comparing reversal-specific responding to stopping, we found that a frontal subset of regions commonly activated by stopping, VLPFC/anterior insula, rIFG, and dACC, is positively correlated with the degree of change in performance accuracy during relearning. Specifically, activation corresponding to correct responses during the first postreversal trial correlated with the change in accuracy between the first and second postreversal trials. Activation in these regions may reflect stability of subsequent correct responding such that greater activation corresponds to a stronger likelihood for correct responses on subsequent presentations. This relationship points to the importance of these regions in accelerating and stabilizing relearning, potentially via inhibition of prior incorrect responses. Therefore, in contrast to the potential role of lateral OFC in detecting shifts in established reward contingencies and updating prior learned S-R associations, these regions appear to guide future actions (responses) such that they are consistent with current reward contingencies.
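The brain-behavior relationship described above can be sketched as a between-participant correlation. The code below is only an outline of that logic, assuming one activation estimate and two accuracy values per participant; it is not the analysis pipeline used in the study, and every number in it is a made-up placeholder rather than data from the experiment.

```python
from math import sqrt

def accuracy_change(first_trial_acc, second_trial_acc):
    """Per-participant change in accuracy from the 1st to the 2nd postreversal trial."""
    return [second - first for first, second in zip(first_trial_acc, second_trial_acc)]

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Placeholder values for four hypothetical participants (NOT study data).
activation = [0.1, 0.4, 0.2, 0.8]    # response on correct 1st postreversal trials
acc_first = [0.2, 0.3, 0.25, 0.3]    # accuracy on the 1st postreversal trial
acc_second = [0.4, 0.7, 0.5, 0.9]    # accuracy on the 2nd postreversal trial

change = accuracy_change(acc_first, acc_second)
r = pearson_r(activation, change)    # positive r: more activation, larger gain
```

A positive coefficient in such an analysis corresponds to the pattern reported here: participants with greater activation on correct first postreversal trials show a larger accuracy gain on the next presentation.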


We believe that we have dissociated neural components for key features of cognitive control mechanisms serving adaptive learning. First, we have shown that, when controlling for cognitive control processes invoked during initial stages of feedback-based learning, reversal-specific responding is supported by a lateral OFC region. This area is likely involved in both detecting change in stimulus–response contingencies and updating S-R associations by possibly inhibiting prior associations to allow formation of new ones. Although we found little overlap between reversal-specific regions of activation and those supporting motor RI (stopping), we have shown a relationship between activation in a subset of regions associated with RI (rIFG and dACC) and change in reversal performance, highlighting their role in guiding motor responses to fit current reward contingencies—a major component of RL behavior. Thus, we show a potential distinction between the lateral OFC that detects and updates established S-R representations and the combination of rIFG and dACC that may use these representations to direct appropriate actions. Overall, the behavioral flexibility required to perform RL task components is likely supported by an interaction between these brain regions.

Supplementary Material

Supplementary material can be found at


Whitehall Foundation to R.A.P.; National Science Foundation (BCS-0223843 to R.A.P.); National Institutes of Health Roadmap Initiative (1P20-RR020750 to R.M.B., J.D.J., R.A.P.); Consortium for Neuropsychiatric Phenomics (RL1DA024853, UL1RR024911 to R.M.B.); Human Translational Applications Core (PL1MH083271 to R.M.B., R.A.P.); Tennenbaum Center for the Biology of Creativity at UCLA to R.M.B., J.D.J., R.A.P.; National Institute on Drug Abuse Center for Translational Research on the Clinical Neurobiology of Drug Addiction (P20DA022539-02 to J.M., J.D.J., R.A.P.).



Thanks to Baldwin Way for helpful comments on earlier versions of the manuscript, Jeanette Mumford for helpful discussions on statistical analysis, and Anne C. Smith for her contributions to earlier analyses. Conflict of Interest: None declared.


  • Arana FS, Parkinson JA, Hinton E, Holland AJ, Owen AM, Roberts AC. Dissociable contributions of the human amygdala and orbitofrontal cortex to incentive motivation and goal selection. J Neurosci. 2003;23(29):9632–9638.
  • Aron AR, Behrens TE, Smith S, Frank MJ, Poldrack RA. Triangulating a cognitive control network using diffusion-weighted magnetic resonance imaging (MRI) and functional MRI. J Neurosci. 2007;27(14):3743–3752.
  • Aron AR, Poldrack RA. Cortical and subcortical contributions to stop signal response inhibition: role of the subthalamic nucleus. J Neurosci. 2006;26(9):2424–2433.
  • Aron AR, Robbins TW, Poldrack RA. Inhibition and the right inferior frontal cortex. Trends Cogn Sci. 2004;8(4):170–177.
  • Beckmann CF, Jenkinson M, Smith SM. General multilevel linear modeling for group analysis in FMRI. Neuroimage. 2003;20(2):1052–1063.
  • Bellebaum C, Koch B, Schwarz M, Daum I. Focal basal ganglia lesions are associated with impairments in reward-based reversal learning. Brain. 2008;131(3):829–841.
  • Budhani S, Marsh AA, Pine DS, Blair RJ. Neural correlates of response reversal: considering acquisition. Neuroimage. 2007;34(4):1754–1765.
  • Butter CM. Perseveration in extinction and in discrimination reversal tasks following selective frontal ablations in Macaca mulatta. Physiol Behav. 1969;4(2):163–171.
  • Cavedini P, Riboldi G, D'Annucci A, Belotti P, Cisima M, Bellodi L. Decision-making heterogeneity in obsessive-compulsive disorder: ventromedial prefrontal cortex function predicts different treatment outcomes. Neuropsychologia. 2002;40(2):205–211.
  • Chamberlain SR, Menzies L, Hampshire A, Suckling J, Fineberg NA, del Campo N, Aitken M, Craig K, Owen AM, Bullmore ET, et al. Orbitofrontal dysfunction in patients with obsessive-compulsive disorder and their unaffected relatives. Science. 2008;321(5887):421–422.
  • Clarke HF, Robbins TW, Roberts AC. Lesions of the medial striatum in monkeys produce perseverative impairments during reversal learning similar to those produced by lesions of the orbitofrontal cortex. J Neurosci. 2008;28(43):10972–10982.
  • Cools R, Altamirano L, D'Esposito M. Reversal learning in Parkinson's disease depends on medication status and outcome valence. Neuropsychologia. 2006;44(10):1663–1673.
  • Cools R, Clark L, Owen AM, Robbins TW. Defining the neural mechanisms of probabilistic reversal learning using event-related functional magnetic resonance imaging. J Neurosci. 2002;22(11):4563–4567.
  • Cools R, Frank MJ, Gibbs SE, Miyakawa A, Jagust W, D'Esposito M. Striatal dopamine predicts outcome-specific reversal learning and its sensitivity to dopaminergic drug administration. J Neurosci. 2009;29(5):1538–1543.
  • D'Ardenne K, McClure SM, Nystrom LE, Cohen JD. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science. 2008;319(5867):1264–1267.
  • Dias R, Robbins TW, Roberts AC. Dissociation in prefrontal cortex of affective and attentional shifts. Nature. 1996;380(6569):69–72.
  • Duvernoy HM, Bourgouin P. The human brain: surface, three-dimensional sectional anatomy with MRI, and blood supply. 2nd completely revised and enlarged ed. New York: Springer; 1999.
  • Elliott R, Dolan RJ, Frith CD. Dissociable functions in the medial and lateral orbitofrontal cortex: evidence from human neuroimaging studies. Cereb Cortex. 2000;10(3):308–317.
  • Fellows LK. The role of orbitofrontal cortex in decision making: a component process account. Ann N Y Acad Sci. 2007;1121(1):421–430.
  • Fellows LK, Farah MJ. Ventromedial frontal cortex mediates affective shifting in humans: evidence from a reversal learning paradigm. Brain. 2003;126(Pt 8):1830–1837.
  • Frank MJ, Claus ED. Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol Rev. 2006;113(2):300–326.
  • Grill-Spector K, Henson R, Martin A. Repetition and the brain: neural models of stimulus-specific effects. Trends Cogn Sci. 2006;10(1):14–23.
  • Hampton AN, Bossaerts P, O'Doherty JP. The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J Neurosci. 2006;26(32):8360–8367.
  • Hornak J, O'Doherty J, Bramham J, Rolls ET, Morris RG, Bullock PR, Polkey CE. Reward-related reversal learning after surgical excisions in orbito-frontal or dorsolateral prefrontal cortex in humans. J Cogn Neurosci. 2004;16(3):463–478.
  • Iversen SD, Mishkin M. Perseverative interference in monkeys following selective lesions of inferior prefrontal convexity. Exp Brain Res. 1970;11(4):376–386.
  • Izquierdo A, Suda RK, Murray EA. Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. J Neurosci. 2004;24(34):7540–7548.
  • Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage. 2002;17(2):825–841.
  • Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med Image Anal. 2001;5(2):143–156.
  • Jocham G, Klein TA, Neumann J, von Cramon DY, Reuter M, Ullsperger M. Dopamine DRD2 polymorphism alters reversal learning and associated neural activity. J Neurosci. 2009;29(12):3695–3704.
  • Jones B, Mishkin M. Limbic lesions and the problem of stimulus–reinforcement associations. Exp Neurol. 1972;36(2):362–377.
  • Jovicich J, Czanner S, Greve D, Haley E, van der Kouwe A, Gollub R, Kennedy D, Schmitt F, Brown G, Macfall J, et al. Reliability in multi-site structural MRI studies: effects of gradient non-linearity correction on phantom and human data. Neuroimage. 2006;30(2):436–443.
  • Kringelbach ML, Rolls ET. Neural correlates of rapid reversal learning in a simple model of human social interaction. Neuroimage. 2003;20(2):1371–1383.
  • Murray EA, O'Doherty JP, Schoenbaum G. What we know and do not know about the functions of the orbitofrontal cortex after 20 years of cross-species studies. J Neurosci. 2007;27(31):8166–8169.
  • Nichols T, Brett M, Andersson J, Wager T, Poline JB. Valid conjunction inference with the minimum statistic. Neuroimage. 2005;25(3):653–660.
  • Nobre AC, Coull JT, Frith CD, Mesulam MM. Orbitofrontal cortex is activated during breaches of expectation in tasks of visual attention. Nat Neurosci. 1999;2(1):11–12.
  • O'Doherty J, Critchley H, Deichmann R, Dolan RJ. Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J Neurosci. 2003;23(21):7931–7939.
  • O'Doherty J, Kringelbach ML, Rolls ET, Hornak J, Andrews C. Abstract reward and punishment representations in the human orbitofrontal cortex. Nat Neurosci. 2001;4(1):95–102.
  • Pagnoni G, Zink CF, Montague PR, Berns GS. Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci. 2002;5(2):97–98.
  • Pessiglione M, Petrovic P, Daunizeau J, Palminteri S, Dolan RJ, Frith CD. Subliminal instrumental conditioning demonstrated in the human brain. Neuron. 2008;59(4):561–567.
  • Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature. 2006;442(7106):1042–1045.
  • Petrides M. Deficits on conditional associative-learning tasks after frontal- and temporal-lobe lesions in man. Neuropsychologia. 1985;23(5):601–614.
  • Petrides M. Visuo-motor conditional associative learning after frontal and temporal lesions in the human brain. Neuropsychologia. 1997;35(7):989–997.
  • Picard N, Strick PL. Imaging the premotor areas. Curr Opin Neurobiol. 2001;11(6):663–672.
  • Pujol J, Soriano-Mas C, Alonso P, Cardoner N, Menchon JM, Deus J, Vallejo J. Mapping structural brain alterations in obsessive-compulsive disorder. Arch Gen Psychiatry. 2004;61(7):720–730.
  • Ragozzino ME. The contribution of the medial prefrontal cortex, orbitofrontal cortex, and dorsomedial striatum to behavioral flexibility. Ann N Y Acad Sci. 2007;1121:355–375.
  • Remijnse PL, Nielen MM, Uylings HB, Veltman DJ. Neural correlates of a reversal learning task with an affectively neutral baseline: an event-related fMRI study. Neuroimage. 2005;26(2):609–618.
  • Remijnse PL, Nielen MM, van Balkom AJ, Cath DC, van Oppen P, Uylings HB, Veltman DJ. Reduced orbitofrontal-striatal activity on a reversal learning task in obsessive-compulsive disorder. Arch Gen Psychiatry. 2006;63(11):1225–1236.
  • Rudebeck PH, Behrens TE, Kennerley SW, Baxter MG, Buckley MJ, Walton ME, Rushworth MF. Frontal cortex subregions play distinct roles in choices between actions and stimuli. J Neurosci. 2008;28(51):13775–13785.
  • Schoenbaum G, Nugent SL, Saddoris MP, Setlow B. Orbitofrontal lesions in rats impair reversal but not acquisition of go, no-go odor discriminations. Neuroreport. 2002;13(6):885–890.
  • Schultz W. Getting formal with dopamine and reward. Neuron. 2002;36(2):241–263.
  • Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275(5306):1593–1599.
  • Takahashi YK, Roesch MR, Stalnaker TA, Haney RZ, Calu DJ, Taylor AR, Burke KA, Schoenbaum G. The orbitofrontal cortex and ventral tegmental area are necessary for learning from unexpected outcomes. Neuron. 2009;62(2):269–280.
  • Tohka J, Foerde K, Aron AR, Tom SM, Toga AW, Poldrack RA. Automatic independent component labeling for artifact removal in fMRI. Neuroimage. 2008;39(3):1227–1245.
  • Tremblay L, Schultz W. Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J Neurophysiol. 2000;83(4):1877–1885.
  • Wheeler EZ, Fellows LK. The human ventromedial frontal lobe is critical for learning from negative feedback. Brain. 2008;131(Pt 5):1323–1331.
  • White NM. Mnemonic functions of the basal ganglia. Curr Opin Neurobiol. 1997;7(2):164–169.
  • Windmann S, Kirsch P, Mier D, Stark R, Walter B, Gunturkun O, Vaitl D. On framing effects in decision making: linking lateral versus medial orbitofrontal cortex activation to choice outcome processing. J Cogn Neurosci. 2006;18(7):1198–1211.
  • Woolrich MW, Behrens TE, Beckmann CF, Jenkinson M, Smith SM. Multilevel linear modelling for FMRI group analysis using Bayesian inference. Neuroimage. 2004;21(4):1732–1747.
  • Woolrich MW, Ripley BD, Brady M, Smith SM. Temporal autocorrelation in univariate linear modeling of FMRI data. Neuroimage. 2001;14(6):1370–1386.
  • Worsley KJ, Evans AC, Marrett S, Neelin P. A three-dimensional statistical analysis for CBF activation studies in human brain. J Cereb Blood Flow Metab. 1992;12(6):900–918.
  • Xue G, Ghahremani DG, Poldrack RA. Neural substrates for reversing stimulus-outcome and stimulus-response associations. J Neurosci. 2008;28(44):11196–11204.

Articles from Cerebral Cortex (New York, NY) are provided here courtesy of Oxford University Press