J Neurosci. Author manuscript; available in PMC 2011 November 18.
Published in final edited form as:
PMCID: PMC3114883

Dorsal Striatal D2-Like Receptor Availability Co-varies with Sensitivity to Positive Reinforcement during Discrimination Learning


Deviations in reward sensitivity and behavioral flexibility, particularly in the ability to change or stop behaviors in response to changing environmental contingencies, are important phenotypic dimensions of several neuropsychiatric disorders. Neuroimaging evidence suggests that variation in dopamine signaling through dopamine D2-like receptors may influence these phenotypes, as well as associated psychiatric conditions, but the specific neurocognitive mechanisms through which this influence is exerted are unknown. To address this question, we examined the relationship between behavioral sensitivity to reinforcement during discrimination learning and D2-like receptor availability in vervet monkeys. Monkeys were assessed for their ability to acquire, retain and reverse three-choice, visual-discrimination problems, and once behavioral performance had stabilized, they received positron emission tomography (PET) scans. D2-like receptor availability in dorsal aspects of the striatum was not related to individual differences in the ability to acquire or retain visual discriminations but did relate to the number of trials required to reach criterion in the reversal phase of the task. D2-like receptor availability was also strongly correlated with behavioral sensitivity to positive, but not negative, feedback during learning. These results go beyond electrophysiological findings by demonstrating the involvement of a striatal dopaminergic marker in individual differences in feedback sensitivity and behavioral flexibility, providing insight into the neural mechanisms that are affected in neuropsychiatric disorders that feature these deficits.

Keywords: Dopamine, receptors, reversal learning, cognitive control, striatum, PET


Impaired ability to update behaviors and actions rapidly in response to changes in environmental rules is present in individuals diagnosed with externalizing and impulse-control disorders, and this dysfunction may be related to deviations in behavioral sensitivity to reinforcement, to poor inhibitory control, or to both (Johansen et al., 2009; Jentsch and Taylor, 1999). Because behavioral inflexibility may reflect heritable factors that index risk for ADHD and addictions (Groman et al., 2008; Jentsch and Taylor, 1999; Ersche et al., 2010), understanding the biological mechanisms that mediate individual differences could illuminate the mechanistic basis of these neuropsychiatric disorders.

Sensitivity to reinforcing feedback and behavioral flexibility can be objectively studied by examining the ability to acquire and reverse discrimination problems. In these tasks, subjects select from an array of stimuli, each being associated with availability of or absence of positive reinforcement. Subjects progressively learn to direct their behavior to the stimuli associated with desirable outcomes. After achieving competency in the initial acquisition stage, the contingencies of the task are reversed, requiring that the subjects adapt their behavior. Both initial discrimination acquisition and reversal learning require sensitivity to reinforcement feedback, but the reversal-learning stage also involves a change from an established response pattern.

The ability to update behavior in response to rule reversal has been associated with integrity of the orbitofrontal cortex (McEnaney and Butter, 1969; Dias et al., 1996) and dorsomedial striatum (Castane et al., 2010; Clarke et al., 2008). This corticostriatal circuit is modulated by dopamine, which may act sub-cortically (O’Neill and Brown, 2007; Cools et al., 2009). Pharmacological studies have shown a specific involvement of the D2/D3 (D2-like) receptor system in reversal-learning performance across species (Herold et al., 2010; Boulougouris et al., 2009; Lee et al., 2007; Cools et al., 2009). Reversal-learning deficits are also found in human carriers of the A1 allele of the TaqIa polymorphism in the D2 receptor gene (Jocham et al., 2009), a variant associated with relatively lower striatal D2-like receptor availability (Pohjalainen et al., 1998). Furthermore, pharmacological perturbations to the D2-like receptor system have been reported to influence sensitivity to feedback, such that D2-like receptor agonists facilitate adjustments in behavior elicited by positive feedback, while D2-like antagonists have the opposite effect (Frank and O’Reilly, 2006). Moreover, TaqIa A1 allele carriers exhibit impaired learning in response to negative feedback (Frank and Hutchinson, 2009), suggesting that having relatively low levels of D2-like receptors may confer behavioral inflexibility by altering feedback sensitivity.

To examine the question of how naturally occurring variation in sensitivity to feedback during reversal learning relates directly to D2-like receptor density, we combined assessments of responding during acquisition, retention and reversal of discrimination problems with positron emission tomographic (PET) measures of D2-like receptor availability in non-human primates. On the basis of the available data, we hypothesized that D2-like receptor availability in the striatum would be correlated with negative-feedback sensitivity (Frank et al., 2009) and that this relationship would be exaggerated under reversal-learning conditions, when the demands to use feedback to guide behavior are greatest.



Twelve male vervet monkeys (Chlorocebus aethiops sabaeus from the UCLA Vervet Research Colony), ranging from 5 to 9 years of age, were included in this study. Monkeys were individually housed in a climate-controlled vivarium, where they had unlimited access to water and received twice-daily portions of standard monkey chow (Teklad, Madison, WI). All of the subjects were able to see, hear and communicate with other individuals in the room. Monkeys received half of their daily portion of allotted chow in the morning after behavioral testing was conducted (approximately 1100 h) and their second half in the afternoon (approximately 1500 h); the total amount of chow received was never reduced during the experiment to facilitate task performance.

All monkeys were maintained in accordance with the ‘Guide for the Care and Use of Laboratory Animals’ of the Institute of Laboratory Animal Resources, National Research Council, Department of Health, Education and Welfare Publication No. (NIH) 85-23, revised 1996. Research protocols were approved by the UCLA Chancellor’s Animal Research Committee.

Discrimination Acquisition, Retention and Reversal Learning

Monkeys were trained to move from their individual cages into a transport cart and were brought to a quiet testing room, where the transport cart was aligned to a Wisconsin General Testing Apparatus, which has been described elsewhere (Lee et al., 2007). The apparatus was equipped with an operable opaque screen that separated the monkeys from three equally spaced opaque boxes. Each box was equipped with a hinged opaque lid so that food rewards (small pieces of apple, banana, grape or orange) could be concealed inside. Moreover, each box lid could be fitted with a unique visual stimulus (clip art from the Microsoft Office® library that consisted of colored objects unfamiliar to the monkey) that the monkeys could easily view when sitting at the apparatus.

Testing sessions began when the opaque screen was raised to present the three boxes (each fitted with a unique stimulus) to the monkey. Only one response, in which the monkey opened a box fitted with a stimulus, was allowed per trial. A trial ended after a correct choice, an incorrect choice or an omission (no response for 2 min), and a 20-s intertrial interval followed. The next trial ensued with a different spatial box sequence, but with the reward associated with the same visual stimulus. Up to 80 trials per session were given.

Monkeys were trained to acquire, retain and reverse novel visual discriminations. The first session of a discrimination problem was a discrimination-acquisition phase and was held on a Monday or Thursday. The monkey was presented with three novel stimuli and had to learn which one was associated with reward, solely on the basis of trial and error. After a performance criterion (seven correct choices within ten consecutive trials) was reached, the session was terminated and the monkey was returned to his home cage. If a monkey did not reach criterion within 80 trials, the session ended but the same discrimination problem was presented the following day(s) until the performance criterion was met.

One day after reaching criterion, subjects were assessed in the retention phase, during which stimulus-reward contingencies were unchanged, until a criterion of four correct choices in five consecutive trials was met. The reversal phase then began immediately with no explicit signal that the transition between retention and reversal had occurred, other than the change in feedback experienced by the subject. During the reversal phase, the stimulus that was previously rewarded was no longer rewarded, and one of the two previously non-rewarded stimuli was rewarded. The reversal phase continued until the monkey achieved criterion (seven correct choices in ten consecutive trials) or until 80 trials had been completed, whichever occurred first. The number of trials required to reach criterion in the acquisition, retention and reversal phases were the primary dependent measures. For the reversal phase, the number of responses directed at the previously rewarded stimulus (perseverative responses) and the number of responses directed at the never rewarded stimulus (neutral responses) were also measured. The probability of a monkey making each response type was also calculated by dividing the number of correct, perseverative or neutral responses by the total number of trials in the reversal phase.
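The criterion rule and the response-type probabilities described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' scoring code: the function names are hypothetical, omissions are assumed to count as non-correct trials, and the criterion is assumed to be checked on every trial over a sliding window (so it could in principle be met before ten trials have elapsed).

```python
from collections import deque

def trials_to_criterion(outcomes, n_correct=7, window=10):
    """Return the 1-based trial on which criterion is first met, or None.

    outcomes: sequence of booleans, True for a correct choice.
    Criterion (assumed reading): at least n_correct correct choices
    within the most recent `window` consecutive trials.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` outcomes
    for trial, correct in enumerate(outcomes, start=1):
        recent.append(bool(correct))
        if sum(recent) >= n_correct:
            return trial
    return None  # criterion not reached (e.g. the 80-trial cap was hit)

def response_probabilities(responses):
    """Proportion of each response type among all reversal-phase trials.

    responses: list of 'correct', 'perseverative' or 'neutral' labels.
    """
    total = len(responses)
    return {kind: responses.count(kind) / total
            for kind in ("correct", "perseverative", "neutral")}
```

With the defaults the function applies the acquisition and reversal criterion; calling it with `n_correct=4, window=5` applies the retention criterion.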

Subjects acquired and reversed consecutive discrimination problems, each of which featured three novel visual stimuli. Due to technical delays in the acquisition of PET scans, the total number of discrimination problems completed and the number of days between completion of the last discrimination problem and the PET scans differed across subjects and are shown in Table 1; therefore, the analysis described here focused on the averages of the dependent measures collected across the last three problems, as these were closest in time to the subsequent PET scans.

Table 1
The total number of discrimination problems, the number of days between last completed discrimination session and assessment of D2-like receptor availability, and the average number of trials required to reach criterion in the last three reversal phases ...

Feedback Sensitivity Measures

Because behavioral sensitivity to positive and/or negative feedback can affect learning performance, we examined choice behavior on a trial-by-trial basis during the reversal phase. Here, we categorized trials according to whether the subject experienced positive or negative feedback on the preceding trial. This allowed calculation of the probability that after experiencing positive feedback, a subject would make either: a) another correct response, b) a response directed to the stimulus that was previously rewarded or c) a response directed at the stimulus that was never rewarded. The response to negative feedback was assessed by calculating the probability that a negative feedback event would be followed with either: a) the same incorrect response or b) a response directed at a different stimulus, irrespective of whether this response was correct or incorrect. We also performed a similar analysis of choice behavior for the data gathered during the discrimination acquisition; however, because perseverative responses were not possible in this phase, behavioral sensitivity to positive feedback was calculated by examining the probability of following a correct response with either: a) another correct response or b) an incorrect response.
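The trial-by-trial categorization for the reversal phase might be computed roughly as follows. This is a sketch under stated assumptions rather than the authors' procedure: the function name and label strings are hypothetical, and omitted trials are assumed to have been dropped from the sequence beforehand.

```python
from collections import Counter

def feedback_sensitivity(responses):
    """Conditional response probabilities given the preceding trial's feedback.

    responses: ordered reversal-phase labels, each one of
    'correct', 'perseverative' or 'neutral'.
    Returns two dicts of probabilities (None when no such trials exist).
    """
    after_pos = Counter()  # response type following a rewarded trial
    after_neg = Counter()  # 'same' or 'different' following an unrewarded trial
    for prev, curr in zip(responses, responses[1:]):
        if prev == "correct":
            after_pos[curr] += 1
        else:
            after_neg["same" if curr == prev else "different"] += 1

    n_pos = sum(after_pos.values())
    n_neg = sum(after_neg.values())
    pos = {k: (after_pos[k] / n_pos if n_pos else None)
           for k in ("correct", "perseverative", "neutral")}
    neg = {k: (after_neg[k] / n_neg if n_neg else None)
           for k in ("same", "different")}
    return pos, neg
```

For the acquisition phase, where perseverative responses are not defined, the same function could be reused by labeling every error with a single 'incorrect' label and reading only the 'correct' entry of the first dictionary.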

[18F]fallypride/PET Scans

A variable number of days after behavioral performance had stabilized (Table 1), D2-like receptor availability was assessed using a microPET Model P4 scanner (Concorde Instruments, Knoxville, TN). Dopamine transporter (DAT) availability was assessed, using [11C]WIN-35,428, in the same subjects for a larger study; however, DAT availability measures were not included in the hypothesized mechanism for our primary analyses and, therefore, are not described here. Monkeys received an intramuscular injection of ketamine hydrochloride (10 mg/kg) and glycopyrrolate (0.01 mg/kg). After monkeys were immobilized, an endotracheal tube was placed to provide inhalation of 2-3% isoflurane (in 100% O2) anesthesia throughout the duration of the experiment. Vital signs (heart rate, respiratory rate, oxygen saturation and temperature) were monitored and recorded every 15 min throughout the scan. A tail-vein catheter was placed, and the monkey was positioned on the scanning bed such that the imaging planes were parallel to the orbitomeatal line and the top of the head was at the front of the field of view. A 20-min 68Ge transmission scan was acquired before administration of the radioligand for attenuation correction. All subjects received a bolus injection of [11C]WIN-35,428 (1.0 mCi/kg), followed by a 5-mL saline flush, and data were acquired for 90 min. When radioactivity had fallen to baseline levels (~3 h after [11C]WIN-35,428 administration), a bolus injection of [18F]fallypride was delivered (0.3 mCi/kg), followed by a 5-mL saline flush. Dynamic data were acquired in list mode for 180 min. After the scan, animals were removed from the gas anesthesia and allowed to recover overnight before being returned to their home cages.

Reconstruction of PET images

Three-dimensional sinogram files were created by binning the data into a total of 33 frames (six 30-sec frames, seven 60-sec frames, five 120-sec frames, four 300-sec frames, nine 600-sec frames, one 1200-sec frame and one 1800-sec frame). We applied a previously validated algorithm to the transmission scan list-mode data to generate attenuation maps (Vandervoort and Sossi, 2008). This algorithm uses an analytical scatter correction, based upon the Klein-Nishina formula, for singles-mode transmission data. Following construction of the attenuation maps, emission list-mode files were reconstructed using Fourier rebinning and filtered back projection, and corrected for normalization, dead time, scatter and attenuation within software provided by the manufacturer (microPET Manager version ...). The resultant images had voxel dimensions of 0.949 mm × 0.949 mm × 1.212 mm and matrix dimensions of 128 × 128 × 63.
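As a quick consistency check, the frame schedule listed above can be expanded into frame boundaries. The sketch below assumes the frames are contiguous, start at time zero and are ordered as listed; the names are illustrative only.

```python
# (number of frames, frame duration in seconds), in the order given in the text
FRAME_SCHEDULE = [(6, 30), (7, 60), (5, 120), (4, 300),
                  (9, 600), (1, 1200), (1, 1800)]

def frame_boundaries(schedule):
    """Return a list of (start_s, end_s) tuples, one per frame."""
    bounds = []
    t = 0
    for count, duration in schedule:
        for _ in range(count):
            bounds.append((t, t + duration))
            t += duration
    return bounds

frames = frame_boundaries(FRAME_SCHEDULE)
print(len(frames))          # 33 frames
print(frames[-1][1] / 60)   # 180.0 minutes in total
```

The schedule sums to 33 frames and 180 min, which agrees with the stated duration of the [18F]fallypride acquisition.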

MRI Acquisition

Structural magnetic resonance (MR) images were acquired to allow for anatomically based demarcation of regions of interest (ROI). MR images were acquired one week after the PET scans. The monkeys received an intramuscular injection of ketamine hydrochloride (10 mg/kg) and atropine sulfate (0.01 mg/kg). Once the monkey was immobilized, an endotracheal tube was inserted to provide inhalation of 2-3% isoflurane gas (in 100% O2) for the remainder of the scan. Monkeys were positioned on the bed of a 1.5 T Siemens scanner, with the head in the gantry, surrounded by an 8-channel, high-resolution, knee-array coil (Invivo Corporation). Nine T1-weighted volumes with three-dimensional, magnetization-prepared, rapid-acquisition, gradient-echo (MPRAGE) images were acquired (TR=1900 ms, TE=4.38 ms, FOV=96 mm, flip angle 15 degrees, voxel size 0.5 mm, 248 slices, slice thickness 0.5 mm). Individual images were aligned to each other using Statistical Parametric Mapping 5 (Institute of Neurology, University College London, London, England), averaged together and resliced according to a previously developed MR template (Fears et al., 2009).

Data Processing

ROIs were drawn twice, referred to as replicates, on each subject’s structural MR image by a single experimenter blind to the subject identity using FSL View (FMRIB’s Software Library v4.0). ROIs included the whole caudate nucleus, putamen, ventral striatum and cerebellum.

  1. ROI-based determination of binding potential (BP): Reconstructed PET images were corrected for motion and coregistered to the subject’s MR image using the PFUS module within PMOD (version 3.15; PMOD Technologies). Using the ROIs, activity was extracted from the coregistered PET images and imported into the PMOD kinetic analysis program (PKIN). Time-activity curves were fit using the Simple Reference Tissue Model (SRTM) (Lammertsma et al., 1996) to provide an estimate of the k2’ value, the rate constant of tracer transfer from the reference region to plasma. The k2’ estimates of the high-activity areas in the caudate nucleus and putamen were averaged, and time-activity curves were refit with the SRTM2 model, using the average, fixed k2’ value applied to all brain regions (Wu and Carson, 2002). BP was then calculated by subtracting 1.0 from the product of tracer delivery (R1) and tracer washout (k2’/k2a). BPs from the left and right brain structures were averaged to create a single BP measurement for the caudate nucleus, putamen and ventral striatum. The BPs of the ROI replicates were highly correlated, so they were averaged to obtain our final ROI-based measurements of D2-like receptor availability in each of the brain regions.
  2. Generation of Whole Brain BP Maps: Parametric binding maps, showing BP, were generated for each subject in PXMOD (PMOD), using the SRTM2 model with the same fixed k2’ values used above. This modeling requires time-activity data for low- and high-activity regions to generate the initial parameters for modeling. We used the activity in the putamen and cerebellum ROIs as the high- and low-activity references, respectively. In order to perform voxel-wise statistical analyses with BP maps, we realigned all BP maps to a study-specific MR template, which was created by sequentially registering each subject’s skull-stripped MR scan (Multitracer, AIR version 5.0) using affine registration (FLIRT, FMRIB’s Software Library v4.0), and creating an average of the registered images. Individual skull-stripped MR scans were then registered to the study-specific template space using affine registration (FLIRT), and the resultant transformation matrix was applied to each individual subject’s parametric binding map, which was previously registered to the individual subject’s MRI. No additional smoothing was applied to the images.
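The BP arithmetic described in step 1 (BP = R1 × k2’/k2a − 1, followed by averaging across hemispheres and ROI replicates) can be written out as a small sketch. The kinetic fitting itself was done in PMOD and is not reproduced here; the function names and the numeric inputs are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean

def binding_potential(r1, k2_ref, k2a):
    """BP from SRTM2 parameters: R1 * (k2' / k2a) - 1.

    r1     : tracer delivery relative to the reference region (R1)
    k2_ref : fixed reference-region efflux rate constant (k2')
    k2a    : apparent efflux rate constant of the target region
    """
    return r1 * (k2_ref / k2a) - 1.0

def regional_bp(left_bps, right_bps):
    """Average left and right BP within each ROI replicate, then across replicates."""
    per_replicate = [mean(pair) for pair in zip(left_bps, right_bps)]
    return mean(per_replicate)

# Hypothetical parameter values, for illustration only
bp_example = binding_potential(r1=0.9, k2_ref=0.2, k2a=0.01)
```

Because the left/right and replicate averages are both simple means over equally sized groups, the order in which they are taken does not change the final regional value.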

Statistical Analyses

All statistical analyses were conducted using SPSS 15.0. Reliability of performance was examined by calculating Cronbach’s alpha, a coefficient of reliability, for the number of trials required to reach criterion in the acquisition, retention and reversal phases of the task during the first ten completed sessions. Paired-samples t-tests were conducted to examine the number of trials required to reach criterion in the acquisition and reversal phases, as well as the error types (neutral or perseverative) in the reversal phase of the task. Linear regressions were conducted to examine the relationships between D2-like receptor availability and our behavioral measures; though we found significant linear relationships (Y = a - bX), visual inspection suggested that for some relationships an inverse function (Y = a - b/X) was more appropriate for the data. The asymptote (a) and slope (b) of each curve were estimated using the curve-fitting tool in SPSS. Models were compared using the Akaike Information Criterion (AIC) to determine whether the linear or inverse function best fit the data. When the inverse function was identified as the AIC-preferred model, the independent variables were transformed accordingly and correlations performed with the transformed values to calculate the Pearson correlation coefficient and significance values.
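The model comparison described above can be illustrated with a minimal pure-Python sketch. It is not the SPSS procedure: it assumes ordinary least squares with Gaussian errors, for which AIC can be written (up to an additive constant) as n·ln(RSS/n) + 2k, and it fits the inverse model by regressing Y on 1/X. The fitted slope here corresponds to −b in the notation of the text, and all function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, residual_sum_of_squares).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss

def aic_least_squares(rss, n, k=2):
    """AIC for a least-squares fit with k estimated coefficients (assumed form)."""
    return n * math.log(rss / n) + 2 * k

def compare_linear_inverse(x, y):
    """Fit Y on X and Y on 1/X; return the AIC-preferred model name and both AICs."""
    n = len(x)
    _, _, rss_lin = fit_line(x, y)
    _, _, rss_inv = fit_line([1.0 / xi for xi in x], y)
    aic_lin = aic_least_squares(rss_lin, n)
    aic_inv = aic_least_squares(rss_inv, n)
    best = "inverse" if aic_inv < aic_lin else "linear"
    return best, aic_lin, aic_inv
```

Once the inverse model is preferred, the Pearson correlation reported for that relationship is simply the correlation between Y and the transformed predictor 1/X.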

To examine the anatomical distribution of the relationship between positive-feedback sensitivity and BP within the striatum, linear regressions were performed using the FSL RANDOMISE v2.1 tool (Permutation-based nonparametric inference, Oxford University, Oxford UK) with a variance smoothing of 5 mm (FWHM Gaussian). A binary, striatal mask was created and feedback-sensitivity measures transformed according to the model that best fit the data according to our initial ROI analysis (see above). Threshold-free cluster enhancement (TFCE) (Smith and Nichols, 2009) was used to detect significant clusters of activation; this method provides the ability to perform cluster-based inference without the need to specify an arbitrary cluster-forming threshold, as is necessary when using Gaussian random field theory. For each analysis, 10,000 randomization runs were performed. Statistical maps were thresholded at p<0.05 (two-tailed) and corrected for the search volume contained in the striatal mask.
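The logic of permutation-based inference can be illustrated with a much simpler stand-in: a permutation test on a single Pearson correlation. This sketch is not TFCE and does not reproduce FSL RANDOMISE, which operates voxel-wise with cluster enhancement and correction over the mask; it only shows how a null distribution is built by shuffling one variable. Function names, the seed and the add-one p-value convention are assumptions made for the example.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-tailed permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)          # fixed seed so the result is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # add-one convention keeps the estimated p-value strictly above zero
    return (hits + 1) / (n_perm + 1)
```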


Behavioral Performance

Discrimination performance across the first ten acquisition, retention and reversal sessions completed by each subject showed a high degree of internal consistency, as indicated by the reliability coefficient, Cronbach’s alpha, for acquisition (0.70), retention (0.76), and reversal performance (0.77). During the acquisition phase, the number of trials to reach criterion was 14.81 +/- 1.43 trials (mean +/- SE), which was significantly lower than the 25.69 +/- 3.86 trials (mean +/- SE) required to reach criterion during the reversal phase (t(11)=-3.508; p<0.01). Descriptive statistics for error type in the reversal phase indicated that the probability of making a response to the initially reinforced stimulus was significantly greater than that of making a response to the never rewarded stimulus (t(11)=5.551; p < 0.001). These results indicate that, although monkeys had been trained on multiple reversals, they still found the reversal phase of the task significantly more difficult than the acquisition of a novel stimulus-reward association. Performance during the first and the last completed reversal session were correlated (r10=0.613; p=0.03), indicating that despite the multiple reversal sessions monkeys performed, the ability to flexibly modify behavior was reasonably trait-like.
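For readers unfamiliar with the reliability coefficient reported above, Cronbach's alpha can be computed as in the following sketch. It assumes that subjects are treated as cases and the ten sessions as items, and it uses sample variances; the function name and data layout are illustrative, not taken from the authors' SPSS analysis.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects-by-items table.

    scores[s][i] is the score of subject s on item i (here, for example,
    trials to criterion in session i). Uses sample variances throughout.
    """
    k = len(scores[0])                                   # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])   # variance of row totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values near 1 indicate that subjects are ranked similarly from session to session, which is the sense in which the text describes performance as internally consistent.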

D2/D3 Receptor Availability and Reversal-Learning Performance

Because technical delays resulted in subjects completing different numbers of discrimination problems, we examined whether differences in the total number of discrimination problems completed by each subject were associated with either differences in D2-like receptor availability in the striatal regions of interest or average behavioral performance during the last three discrimination sets; no significant relationships were detected (all correlation |t|’s < 0.91). We also found no significant relationships between striatal D2-like receptor availability and variation in the number of days between completion of the last discrimination problem and when PET scans were acquired (all correlation |t|’s < 2.09).

We then examined the relationship between D2-like receptor availability in each of the three striatal regions and the average number of trials required to reach criterion for the last three acquisition, retention and reversal sessions completed prior to PET scans. Because D2-like receptor availability is negatively correlated with age in humans (Wang et al., 1995; Volkow et al., 1996), we initially included age in the model as a covariate; however, because it was not a significant predictor in our dataset (possibly because the variation in age was restricted), it was removed from the model(s) and all other analyses.

As hypothesized, no significant relationship was found between the average number of trials required to reach criterion in the acquisition or retention phases and D2-like receptor availability in any brain region assessed (all |t|’s < 1.29; Figures 1A and 1B). However, a relationship was found between the average number of trials required to reach criterion in the reversal session and receptor availability in the caudate nucleus (r10=-0.71; p=0.01) and the putamen (r10=-0.67; p=0.02), but not the ventral striatum (r10=0.28; p=0.38) (Figure 1C). Specifically, greater D2-like receptor availability in the caudate nucleus and putamen was associated with better reversal-learning performance, and this relationship was best modeled using an inverse function, as presented in Figure 1C (a solid line for the caudate nucleus and a dashed line for the putamen).

Figure 1
The relationship between D2-like receptor availability in the striatum (caudate nucleus - black circles, putamen - gray circles, and ventral striatum - open circles) and the average number of trials required to reach criterion during the acquisition phase ...

For the reversal phase, we examined whether D2-like receptor availability in the caudate nucleus and putamen was correlated with specific response types normalized to the number of trials required to reach criterion. No significant relationship was found between D2-like receptor availability in the caudate nucleus and the probability of making a correct response (r10=0.48; p=0.12), a perseverative response (r10=-0.31; p=0.32) or a neutral response (r10=-0.46; p=0.13). Similarly, no significant relationships were found with D2-like receptor availability in the putamen and the probability of making a correct response (r10=0.36; p=0.26), a perseverative response (r10=-0.20; p=0.54) or a neutral response (r10=-0.39; p=0.21) (data not shown).

To ensure that this relationship was not specific to the last three discrimination problems completed, we examined the relationship between D2-like receptor availability and the average number of trials required to reach criterion for all reversals completed for each subject. This assessment indicated that the average number of trials required to reach criterion across all the reversal sessions was correlated with D2-like receptor availability in the caudate nucleus (r10=-0.68; p=0.01) and putamen (r10=-0.56; p=0.05) which was best described with an inverse function. The relationship was present even in a specific examination of performance on the first reversal completed (r10=-0.756; p=0.004 for the caudate nucleus and r10=-0.778; p=0.003 for the putamen).

D2/D3 Receptor Availability and Feedback Sensitivity in the Reversal Phase

We next examined the relationship between D2-like receptor availability in the caudate nucleus, putamen and ventral striatum and the measures of behavioral sensitivity to feedback. The probability of following positive feedback with a correct response was correlated with D2-like receptor availability in the caudate nucleus (r10=0.74; p=0.006) and putamen (r10=0.74; p=0.006), but not in the ventral striatum (r10=0.37; p=0.24) (Figure 2A) (see statistical map, Figure 3). Correspondingly, the probability of following positive feedback with a perseverative response (regressive responding to the initially trained stimulus) was related to D2-like receptor availability in the caudate nucleus (r10=-0.61; p=0.04), but not in the putamen (r10=-0.47; p=0.12) (Figure 2B). These relationships were best modeled with the inverse function as presented in Figure 2A and 2B: solid and dashed curves represent the relationship between feedback sensitivity and D2-like receptor availability in the caudate nucleus and putamen, respectively. No significant correlations were found between D2-like receptor availability and the probability of following positive feedback with a response to the never-rewarded stimulus. D2-like receptor availability in the three striatal regions was not correlated with the probability of subjects following negative feedback with either the same incorrect response or a response to one of the two other stimuli (all correlation |t|’s < 0.45) (see Figure 2C).

Figure 2
The relationship between D2-like receptor availability in the three striatal regions (caudate nucleus - black circles, putamen - gray circles, and ventral striatum - open circles) and feedback sensitivity during the reversal phase. The probability of making a correct ...
Figure 3
Statistical maps (p values) from the voxel-wise regression of D2-like receptor binding potential on the reversal-learning measures. The relationship between D2-like receptor availability and the average number of trials required to reach criterion in ...

Maps of the Relationship between Reversal-Learning Performance and D2-like Receptor Availability

Voxel-wise comparison revealed a significant negative correlation between the number of trials required to reach criterion in the reversal phase and D2-like receptor availability that extended throughout the caudate nucleus and putamen (Figure 3A). A similar negative correlation was found in the dorsal striatum between the probability of following positive feedback with a perseverative response and D2-like receptor availability (Figure 3B). A moderate correlation was found between D2-like receptor availability and the probability of following positive feedback with a correct response (r10=0.47; p=0.12), but it did not survive the TFCE-corrected p < 0.05 threshold. Significant statistical maps were overlaid to visualize the anatomical distribution of the significant relationships in the coronal (Figure 3C) and the transverse section (Figure 3D).

Exploratory analyses

Although D2-like receptor availability was not correlated with the number of trials required to reach criterion during the acquisition of novel stimulus-reward associations, the strong correlation found with positive feedback-sensitivity measures warranted examination of the relationship that this receptor system may have with feedback sensitivity during acquisition. D2-like receptor availability in the caudate nucleus, but not the putamen or ventral striatum, was linearly related to our measure of positive feedback sensitivity (r10=0.574; p=0.05) (Figure 4), but not to negative feedback sensitivity (r10=0.182; p=0.572) (data not shown).

Figure 4
The relationship between D2-like receptor availability in the caudate nucleus (black circles), putamen (gray circles) or ventral striatum (open circles) and the probability of making a correct response following positive feedback during the acquisition ...


This study demonstrated that D2-like receptor availability within the dorsal aspects of the striatum was related to the ability to modify behavior during reversal learning, and to behavioral sensitivity to positive feedback. These results directly support the idea that the D2-like receptor system is involved in the ability to shift responding when the association between a stimulus and reward is changed, and suggest that variation in reversal-learning performance reflects individual differences in sensitivity to positive feedback. These relationships are maintained under the conditions of natural variation, rather than manipulation, and together with studies in humans and rodents (Cools et al., 2009; Boulougouris et al., 2009; Frank and O’Reilly, 2006), provide powerful convergent evidence that the D2-dependent dopamine signaling system is crucially involved in aspects of behavioral flexibility and reinforcement sensitivity.

D2-Like Receptors and Reversal Learning

Experimental perturbations of D2-like receptor signaling alter performance in tasks that require flexible modifications in behavior; these relationships hold in several species (Herold et al., 2010; Boulougouris et al., 2009; Lee et al., 2007; Cools et al., 2009) indicating that this receptor system represents a phylogenetically conserved mechanism for the rapid adjustment of behaviors. The findings presented here add an important dimension to prior experimental results by demonstrating that individual differences in the ability to update behavior in a reversal-learning task are related to natural variation in D2-like receptor availability.

Our results provide evidence that the relationship between D2-like receptor availability and reversal-learning performance is anatomically confined to the dorsal striatum, with no relationship being found in the ventral striatum. These results are supported by data showing that a lesion of the dorsal, but not ventral, striatum impairs reversal learning in rats (Castane et al., 2010) and monkeys (Clarke et al., 2008). Moreover, activation of the dorsal striatum is observed in human subjects, studied with functional MRI, during a discrimination reversal task (Ghahremani et al., 2010). Though striatal mechanisms may themselves be involved in reversal learning, there is also evidence that striatal D2-like receptor availability is positively correlated with glucose metabolism in the orbitofrontal cortex (Volkow et al., 2000). Therefore, it is possible that striatal D2-like receptor availability may mechanistically relate to molecular and/or functional integrity of the orbitofrontal cortex, which in turn contributes to the correlations reported here.

The radioligand used in this study ([18F]fallypride) has equal affinity for both D2 and D3 receptor subtypes (Mukherjee et al., 1999), precluding assignment of the contributions of specific dopamine receptor subtypes. However, the relationships reported here were restricted to the dorsal striatum, an area with modest D3 receptor expression relative to the ventral striatum (Bouthenet et al., 1991). Moreover, mice lacking the D3 receptor exhibit enhanced reversal-learning performance (Glickstein et al., 2005) and administration of a D3 agonist impairs reversal-learning performance in monkeys (Smith et al., 1999), suggesting that low D3 receptor density would be expected to relate to reversal-learning performance in a manner opposite to that observed here. Therefore, the relationships reported in the current study are most likely due to variation in the D2 receptor subtype. However, further studies using subtype-specific antagonists may help to clarify the validity of these hypotheses.

Taken with a host of pharmacological evidence from humans and rats (Cools et al., 2009; Boulougouris et al., 2009), these data suggest that individual differences in reversal-learning performance are a result of underlying variation in D2-like receptor availability within the dorsal striatum. However, we cannot totally exclude the possibility that training history affected D2-like receptor availability. It is also possible that variation in receptor availability detected in the current study is due to differences in endogenous dopamine levels acting in competition with the radioligand for the D2-like receptor binding site, thereby influencing receptor availability measurements. Although we cannot reject this possibility, we believe it cannot fully account for the current findings. Based on evidence that striatal dopamine synthesis is positively correlated with reversal-learning performance (Cools et al., 2009), a dominant influence of dopaminergic tone on D2-like receptor availability would lead to a positive relationship with the number of trials required to reach criterion, opposite to our current findings. Therefore, we believe that the relationships presented in the current study are most likely due to variation in receptor level, and not to variation in dopamine levels; however, future studies examining D2-like receptor availability in the absence of synaptic dopamine are needed to verify this hypothesis.

D2-Like Receptors and Feedback Sensitivity

The ability to learn or reverse a stimulus-response association requires an integration of both positive and negative feedback in order to refine subsequent choices. Several lines of evidence support a crucial role for the dopamine system in these abilities. Schultz et al. (1997) demonstrated that over the course of learning a stimulus-reward association, phasic firing of midbrain dopamine cells shifts from the time of reward presentation to the time of conditioned stimulus presentation. Subsequently, when a predicted reward is omitted, dopamine neuron activity declines below baseline (Hollerman and Schultz, 1998). Frank et al. (2004) have argued that dopamine, acting on specific receptor subtypes that exhibit a segregated distribution on striatal medium spiny neurons, exerts dissociable actions in response to positive and negative feedback during learning. This theory posits that phasic release of dopamine, acting on medium spiny neurons in the direct pathway that express D1 receptors, promotes learning from positive feedback, while declines in dopamine activity, locked to negative feedback, are hypothesized to release the D2-expressing medium spiny neurons in the indirect pathway from inhibition via D2 receptor signaling.

Here, however, we provide evidence that D2-like receptor availability within the dorsal striatum is selectively correlated with the ability of subjects to integrate positive, rather than negative, feedback in their ongoing choice behavior, which is surprising in light of the previously described theory. Notably, a recently developed neurocomputational model by Dreyer et al. (2010) suggested that both increases and decreases in dopamine-neuron activity affect D1- and D2-like receptor function, albeit possibly to different degrees. Therefore, it is possible that the association between D2-like receptor availability and positive feedback sensitivity stems from phasic dopamine release activating D2 receptors, as well as D1 receptors. Dopamine acting on D2-expressing medium spiny neurons may produce long-term depression in corticostriatal synapses on those neurons (Kreitzer and Malenka, 2007), reducing the strength of the indirect pathway that constrains behavior, resulting in an increase in the probability of making the same response on the following trial. Our results are consistent with deficits in positive feedback processing that have been reported in carriers of the A1 allele of the TaqIA polymorphism (Althaus et al., 2009; Jocham et al., 2009).

Here we report that D2-like receptor availability is correlated with positive feedback sensitivity not only during the reversal of a stimulus-reward association, but also during its initial acquisition. Although the strength of the correlation is greatest in the reversal stage of the task, where strong expectancy violations may magnify underlying deficits in behavioral sensitivity to feedback, D2-like receptors represent a principal substrate for explaining variation in positive feedback sensitivity.
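To make the feedback-sensitivity measures concrete, the sketch below computes them from a trial-by-trial record of outcomes. Positive feedback sensitivity follows the description in the Figure 4 legend (the probability of a correct response following positive feedback). The definition of negative feedback sensitivity used here, the probability of a correct response following an unrewarded trial, is an assumption made for illustration, as are the function name and the example data; the study's exact scoring may differ.

```python
def feedback_sensitivity(outcomes):
    """Return (positive, negative) feedback sensitivity for one session.

    outcomes: list of booleans, True = correct (rewarded) trial,
    False = incorrect (unrewarded) trial, in the order performed.
    """
    # Pair each trial with the trial that immediately follows it.
    pairs = list(zip(outcomes, outcomes[1:]))
    after_pos = [nxt for prev, nxt in pairs if prev]
    after_neg = [nxt for prev, nxt in pairs if not prev]

    # Proportion of correct responses following each type of feedback;
    # None when that type of feedback never occurred before another trial.
    pos = sum(after_pos) / len(after_pos) if after_pos else None
    neg = sum(after_neg) / len(after_neg) if after_neg else None
    return pos, neg

# Hypothetical eight-trial record, for illustration only.
example = [False, False, True, True, False, True, True, True]
pos, neg = feedback_sensitivity(example)
print(pos, neg)
```

Computing the two probabilities separately is what allows the dissociation discussed above: a subject can repeat rewarded choices reliably while still adjusting poorly after errors, or the reverse.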

Implications for Addictive Disorders

Relatively low D2-like receptor levels have been reported in several neuropsychiatric disorders, most prominently in substance abuse and dependence. Substance-dependent individuals have lower D2-like receptor availability (Volkow et al., 1993, 1996, 2001; Lee et al., 2009) and exhibit reversal-learning deficits (Salo et al., 2009; Fillmore and Rush, 2006; Ghahremani et al., 2011). Although animal studies have shown that chronic exposure to drugs can directly produce reductions in striatal D2-like receptor availability (Nader et al., 2006) and deficits in reversal learning (Jentsch et al., 2002), there is also evidence that preexisting lower D2-like receptor levels may confer risk for behavioral disinhibition (Dalley et al., 2007) and drug self-administration (Dalley et al., 2007; Nader et al., 2006). Further, D2-like receptor availability is correlated with known risk factors for substance dependence, such as impulsivity (Lee et al., 2009; Buckholtz et al., 2010) and novelty seeking (Zald et al., 2008; Huang et al., 2010), which are themselves associated with cognitive deficits (Cools et al., 2007; James et al., 2007).

We therefore propose that reversal-learning deficits, which measure behavioral inflexibility, represent an intermediary process between D2-mediated transmission and addictive behaviors. Further, pharmacological techniques that increase D2-mediated dopaminergic transmission, and improve behavioral flexibility, represent a principal treatment strategy for substance dependence, as behavioral flexibility is a known correlate of retention in a treatment program (Aharonovich et al., 2006; Moeller et al., 2001). Improving D2-like receptor transmission also constitutes a plausible intervention strategy for individuals at high risk for substance dependence who have cognitive impairments (Giancola et al., 1996) that are predictive of greater substance use (Aytaclar et al., 1999).


Variation in D2-like receptor availability in the dorsal striatum explains individual differences in behavioral flexibility and positive feedback sensitivity. Genetic influences that modulate D2-like receptor expression and function in the dorsal striatum are therefore expected to influence impulsivity-like phenotypes and ultimately syndromes that involve impulse-control disorders. In this sense, D2-dependent dopamine transmission may represent a final, common, biochemical pathway to manifestations of behavioral inflexibility across diagnostic categories.


These studies were supported by the Consortium for Neuropsychiatric Phenomics at UCLA; the Consortium is funded by PHS grants UL1-DE019580 and RL1-MH083270. Additional support was derived from PHS grants T32-DA024635, P20-DA022539, P50-MH077248 and F31-DA028812.


  • Aharonovich E, Hasin DS, Brooks AC, Liu X, Bisaga A, Nunes EV. Cognitive deficits predict low treatment retention in cocaine dependent patients. Drug Alcohol Depend. 2006;81:313–322. [PubMed]
  • Althaus M, Groen Y, Wijers AA, Mulder LJ, Minderaa RB, Kema IP, Dijck JD, Hartman CA, Hoekstra PJ. Differential effects of 5-HTTLPR and DRD2/ANKK1 polymorphisms on electrocortical measures of error and feedback processing in children. Clin Neurophysiol. 2009;120:93–107. [PubMed]
  • Aytaclar S, Tarter RE, Kirisci L, Lu S. Association between hyperactivity and executive cognitive functioning in childhood and substance use in early adolescence. J Am Acad Child Adolesc Psychiatry. 1999;38:172–178. [PubMed]
  • Boulougouris V, Castane A, Robbins TW. Dopamine D2/D3 receptor agonist quinpirole impairs spatial reversal learning in rats: investigation of D3 receptor involvement in persistent behavior. Psychopharmacology (Berl) 2009;202:611–620. [PubMed]
  • Bouthenet ML, Souil E, Martres MP, Sokoloff P, Giros B, Schwartz JC. Localization of dopamine D3 receptor mRNA in the rat brain using in situ hybridization histochemistry: comparison with dopamine D2 receptor mRNA. Brain Res. 1991;564:203–219. [PubMed]
  • Buckholtz JW, Treadway MT, Cowan RL, Woodward ND, Li R, Ansari MS, Baldwin RM, Schwartzman AN, Shelby ES, Smith CE, Kessler RM, Zald DH. Dopaminergic network differences in human impulsivity. Science. 2010;329:532. [PMC free article] [PubMed]
  • Castane A, Theobald DE, Robbins TW. Selective lesions of the dorsomedial striatum impair serial spatial reversal learning in rats. Behav Brain Res. 2010;210:74–83. [PMC free article] [PubMed]
  • Clarke HF, Dalley JW, Crofts HS, Robbins TW, Roberts AC. Cognitive inflexibility after prefrontal serotonin depletion. Science. 2004;304:878–880. [PubMed]
  • Clarke HF, Robbins TW, Roberts AC. Lesions of the medial striatum in monkeys produce perseverative impairments during reversal learning similar to those produced by lesions of the orbitofrontal cortex. J Neurosci. 2008;28:10972–10982. [PubMed]
  • Cools R, Sheridan M, Jacobs E, D’Esposito M. Impulsive personality predicts dopamine-dependent changes in frontostriatal activity during component processes of working memory. J Neurosci. 2007;27:5506–5514. [PubMed]
  • Cools R, Frank MJ, Gibbs SE, Miyakawa A, Jagust W, D’Esposito M. Striatal dopamine predicts outcome-specific reversal learning and its sensitivity to dopaminergic drug administration. J Neurosci. 2009;29:1538–1543. [PMC free article] [PubMed]
  • Dalley JW, Fryer TD, Brichard L, Robinson ES, Theobald DE, Laane K, Pena Y, Murphy ER, Shah Y, Probst K, Abakumova I, Aigbirhio FI, Richards HK, Hong Y, Baron JC, Everitt BJ, Robbins TW. Nucleus accumbens D2/3 receptors predict trait impulsivity and cocaine reinforcement. Science. 2007;315:1267–1270. [PMC free article] [PubMed]
  • De Steno DA, Schmauss C. A role for dopamine D2 receptors in reversal learning. Neuroscience. 2009;162:118–127. [PMC free article] [PubMed]
  • Dias R, Robbins TW, Roberts AC. Dissociation in prefrontal cortex of affective and attentional shifts. Nature. 1996;380:69–72. [PubMed]
  • Dreyer JK, Herrik KF, Berg RW, Hounsgaard JD. Influence of phasic and tonic dopamine release on receptor activation. J Neurosci. 2010;30:14273–14283. [PubMed]
  • Ersche KD, Turton AJ, Pradhan S, Bullmore ET, Robbins TW. Drug addiction endophenotypes: impulsive versus sensation-seeking personality traits. Biol Psychiatry. 2010;68:770–773. [PMC free article] [PubMed]
  • Fears SC, Melega WP, Service SK, Lee C, Chen K, Tu Z, Jorgensen MJ, Fairbanks LA, Cantor RM, Freimer NB, Woods RP. Identifying heritable brain phenotypes in an extended pedigree of vervet monkeys. J Neurosci. 2009;29:2867–2875. [PMC free article] [PubMed]
  • Fillmore MT, Rush CR. Polydrug abusers display impaired discrimination-reversal learning in a model of behavioural control. J Psychopharmacol. 2006;20:24–32. [PubMed]
  • Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science. 2003;299:1898–1902. [PubMed]
  • Frank MJ, Seeberger LC, O’Reilly RC. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 2004;306:1940–1943. [PubMed]
  • Frank MJ. Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. J Cogn Neurosci. 2005;17:51–72. [PubMed]
  • Frank MJ, O’Reilly RC. A mechanistic account of striatal dopamine function in human cognition: psychopharmacological studies with cabergoline and haloperidol. Behav Neurosci. 2006;120:497–517. [PubMed]
  • Frank MJ, Hutchison K. Genetic contributions to avoidance-based decisions: striatal D2 receptor polymorphisms. Neuroscience. 2009;164:131–140. [PMC free article] [PubMed]
  • Ghahremani DG, Monterosso J, Jentsch JD, Bilder RM, Poldrack RA. Neural components underlying behavioral flexibility in human reversal learning. Cereb Cortex. 2010;20:1843–1852. [PMC free article] [PubMed]
  • Ghahremani DG, Tabibnia G, Monterosso J, Hellemann G, Poldrack RA, London ED. Effect of modafinil on learning and task-related brain activity in methamphetamine-dependent and healthy individuals. Neuropsychopharmacology. 2011;36:950–959. [PMC free article] [PubMed]
  • Giancola PR, Martin CS, Tarter RE, Pelham WE, Moss HB. Executive cognitive functioning and aggressive behavior in preadolescent boys at high risk for substance abuse/dependence. J Stud Alcohol. 1996;57:352–359. [PubMed]
  • Glickstein SB, Desteno DA, Hof PR, Schmauss C. Mice lacking dopamine D2 and D3 receptors exhibit differential activation of prefrontal cortical neurons during tasks requiring attention. Cereb Cortex. 2005;15:1016–1024. [PubMed]
  • Groman SM, James AS, Jentsch JD. Poor response inhibition: at the nexus between substance abuse and attention deficit/hyperactivity disorder. Neurosci Biobehav Rev. 2009;33:690–698. [PMC free article] [PubMed]
  • Haluk DM, Floresco SB. Ventral striatal dopamine modulation of different forms of behavioral flexibility. Neuropsychopharmacology. 2009;34:2041–2052. [PubMed]
  • Herold C. NMDA and D2-like receptors modulate cognitive flexibility in a color discrimination reversal task in pigeons. Behav Neurosci. 2010;124:381–390. [PubMed]
  • Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci. 1998;1:304–309. [PubMed]
  • Huang HY, Lee IH, Chen KC, Yeh TL, Chen PS, Yang YK, Chiu NT, Yao WJ, Chen CC. Association of novelty seeking scores and striatal dopamine D2/D3 receptor availability of healthy volunteers: single photon emission computed tomography with [123I]iodobenzamide. J Formos Med Assoc. 2010;109:736–739. [PubMed]
  • James AS, Groman SM, Seu E, Jorgensen M, Fairbanks LA, Jentsch JD. Dimensions of impulsivity are associated with poor spatial working memory performance in monkeys. J Neurosci. 2007;27:14358–14364. [PubMed]
  • Jentsch JD, Taylor JR. Impulsivity resulting from frontostriatal dysfunction in drug abuse: implications for the control of behavior by reward-related stimuli. Psychopharmacology (Berl) 1999;146:373–390. [PubMed]
  • Jentsch JD, Olausson P, De La Garza R, 2nd, Taylor JR. Impairments of reversal learning and response perseveration after repeated, intermittent cocaine administrations to monkeys. Neuropsychopharmacology. 2002;26:183–190. [PubMed]
  • Jocham G, Klein TA, Neumann J, von Cramon DY, Reuter M, Ullsperger M. Dopamine DRD2 polymorphism alters reversal learning and associated neural activity. J Neurosci. 2009;29:3695–3704. [PMC free article] [PubMed]
  • Johansen EB, Killeen PR, Russell VA, Tripp G, Wickens JR, Tannock R, Williams J, Sagvolden T. Origins of altered reinforcement effects in ADHD. Behav Brain Funct. 2009;5:7. [PMC free article] [PubMed]
  • Kreitzer AC, Malenka RC. Endocannabinoid-mediated rescue of striatal LTD and motor deficits in Parkinson’s disease models. Nature. 2007;445:643–647. [PubMed]
  • Kruzich PJ, Mitchell SH, Younkin A, Grandy DK. Dopamine D2 receptors mediate reversal learning in male C57BL/6J mice. Cogn Affect Behav Neurosci. 2006;6:86–90. [PubMed]
  • Lammertsma AA, Bench CJ, Hume SP, Osman S, Gunn K, Brooks DJ, Frackowiak RS. Comparison of methods for analysis of clinical [11C]raclopride studies. J Cereb Blood Flow Metab. 1996;16:42–52. [PubMed]
  • Lee B, Groman S, London ED, Jentsch JD. Dopamine D2/D3 receptors play a specific role in the reversal of a learned visual discrimination in monkeys. Neuropsychopharmacology. 2007;32:2125–2134. [PubMed]
  • Lee B, London ED, Poldrack RA, Farahi J, Nacca A, Monterosso JR, Mumford JA, Bokarius AV, Dahlbom M, Mukherjee J, Bilder RM, Brody AL, Mandelkern MA. Striatal dopamine D2/D3 receptor availability is reduced in methamphetamine dependence and is linked to impulsivity. J Neurosci. 2009;29:14734–14740. [PMC free article] [PubMed]
  • McEnaney KW, Butter CM. Perseveration of responding and nonresponding in monkeys with orbital frontal ablations. J Comp Physiol Psychol. 1969;68:558–561. [PubMed]
  • Moeller FG, Dougherty DM, Barratt ES, Schmitz JM, Swann AC, Grabowski J. The impact of impulsivity on cocaine use and retention in treatment. J Subst Abuse Treat. 2001;21:193–198. [PubMed]
  • Nader MA, Morgan D, Gage HD, Nader SH, Calhoun TL, Buchheimer N, Ehrenkaufer R, Mach RH. PET imaging of dopamine D2 receptors during chronic cocaine self-administration in monkeys. Nat Neurosci. 2006;9:1050–1056. [PubMed]
  • O’Neill M, Brown VJ. The effect of striatal dopamine depletion and the adenosine A2A antagonist KW-6002 on reversal learning in rats. Neurobiol Learn Mem. 2007;88:75–81. [PubMed]
  • Ridley RM, Haystead TA, Baker HF. An analysis of visual object reversal learning in the marmoset after amphetamine and haloperidol. Pharmacol Biochem Behav. 1981;14:345–351. [PubMed]
  • Salo R, Nordahl TE, Galloway GP, Moore CD, Waters C, Leamon MH. Drug abstinence and cognitive control in methamphetamine-dependent individuals. J Subst Abuse Treat. 2009;37:292–297. [PMC free article] [PubMed]
  • Schultz W. Dopamine neurons and their role in reward mechanisms. Curr Opin Neurobiol. 1997;7:191–197. [PubMed]
  • Smith AG, Neill JC, Costall B. The dopamine D3/D2 receptor agonist 7-OH-DPAT induces cognitive impairment in the marmoset. Pharmacol Biochem Behav. 1999;63:201–211. [PubMed]
  • Thompson J, Thomas N, Singleton A, Piggott M, Lloyd S, Perry EK, Morris CM, Perry RH, Ferrier IN, Court JA. D2 dopamine receptor gene (DRD2) Taq1 A polymorphism: reduced dopamine D2 receptor binding in the human striatum associated with the A1 allele. Pharmacogenetics. 1997;7:479–484. [PubMed]
  • Vandervoort E, Sossi V. An analytical scatter correction for singles-mode transmission data in PET. IEEE Trans Med Imaging. 2008;27:402–412. [PubMed]
  • Volkow ND, Fowler JS, Wang GJ, Hitzemann R, Logan J, Schlyer DJ, Dewey SL, Wolf AP. Decreased dopamine D2 receptor availability is associated with reduced frontal metabolism in cocaine abusers. Synapse. 1993;14:169–177. [PubMed]
  • Volkow ND, Wang GJ, Fowler JS, Logan J, Gatley SJ, MacGregor RR, Schlyer DJ, Hitzemann R, Wolf AP. Measuring age-related changes in dopamine D2 receptors with 11C-raclopride and 18F-N-methylspiroperidol. Psychiatry Res. 1996;67:11–16. [PubMed]
  • Volkow ND, Wang GJ, Fowler JS, Logan J, Hitzemann R, Ding YS, Pappas N, Shea C, Piscani K. Decreases in dopamine receptors but not in dopamine transporters in alcoholics. Alcohol Clin Exp Res. 1996;20:1594–1598. [PubMed]
  • Volkow ND, Chang L, Wang GJ, Fowler JS, Ding YS, Sedler M, Logan J, Franceschi D, Gatley J, Hitzemann R, Gifford A, Wong C, Pappas N. Low level of brain dopamine D2 receptors in methamphetamine abusers: association with metabolism in the orbitofrontal cortex. Am J Psychiatry. 2001;158:2015–2021. [PubMed]
  • Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature. 2001;412:43–48. [PubMed]
  • Wang GJ, Volkow ND, Logan J, Fowler JS, Schlyer D, MacGregor RR, Hitzemann RJ, Gur RC, Wolf AP. Evaluation of age-related changes in serotonin 5-HT2 and dopamine D2 receptor availability in healthy human subjects. Life Sci. 1995;56:PL249–253. [PubMed]
  • Wu Y, Carson RE. Noise reduction in the simplified reference tissue model for neuroreceptor functional imaging. J Cereb Blood Flow Metab. 2002;22:1440–1452. [PubMed]
  • Zald DH, Cowan RL, Riccardi P, Baldwin RM, Ansari MS, Li R, Shelby ES, Smith CE, McHugo M, Kessler RM. Midbrain dopamine receptor availability is inversely associated with novelty-seeking traits in humans. J Neurosci. 2008;28:14372–14378. [PMC free article] [PubMed]