Deviations in reward sensitivity and behavioral flexibility, particularly in the ability to change or stop behaviors in response to changing environmental contingencies, are important phenotypic dimensions of several neuropsychiatric disorders. Neuroimaging evidence suggests that variation in dopamine signaling through dopamine D2-like receptors may influence these phenotypes, as well as associated psychiatric conditions, but the specific neurocognitive mechanisms through which this influence is exerted are unknown. To address this question, we examined the relationship between behavioral sensitivity to reinforcement during discrimination learning and D2-like receptor availability in vervet monkeys. Monkeys were assessed for their ability to acquire, retain and reverse three-choice, visual-discrimination problems, and once behavioral performance had stabilized, they received positron emission tomography (PET) scans. D2-like receptor availability in dorsal aspects of the striatum was not related to individual differences in the ability to acquire or retain visual discriminations but did relate to the number of trials required to reach criterion in the reversal phase of the task. D2-like receptor availability was also strongly correlated with behavioral sensitivity to positive, but not negative, feedback during learning. These results go beyond electrophysiological findings by demonstrating the involvement of a striatal dopaminergic marker in individual differences in feedback sensitivity and behavioral flexibility, providing insight into the neural mechanisms that are affected in neuropsychiatric disorders that feature these deficits.
Impaired ability to update behaviors and actions rapidly in response to changes in environmental rules is present in individuals diagnosed with externalizing and impulse-control disorders, and this dysfunction may be related to deviations in behavioral sensitivity to reinforcement, to poor inhibitory control, or to both (Johansen et al., 2009; Jentsch and Taylor, 1999). Because behavioral inflexibility may represent heritable factors that index risk for ADHD and addictions (Groman et al., 2008; Jentsch and Taylor, 1999; Ersche et al., 2010), understanding the biological mechanisms that mediate individual differences could illuminate the mechanistic basis of these neuropsychiatric disorders.
Sensitivity to reinforcing feedback and behavioral flexibility can be objectively studied by examining the ability to acquire and reverse discrimination problems. In these tasks, subjects select from an array of stimuli, each being associated with availability of or absence of positive reinforcement. Subjects progressively learn to direct their behavior to the stimuli associated with desirable outcomes. After achieving competency in the initial acquisition stage, the contingencies of the task are reversed, requiring that the subjects adapt their behavior. Both initial discrimination acquisition and reversal learning require sensitivity to reinforcement feedback, but the reversal-learning stage also involves a change from an established response pattern.
The ability to update behavior in response to rule reversal has been associated with integrity of the orbitofrontal cortex (McEnaney and Butter, 1969; Dias et al., 1996) and dorsomedial striatum (Castane et al., 2010; Clarke et al., 2008). This corticostriatal circuit is modulated by dopamine, which may act sub-cortically (O’Neill and Brown, 2007; Cools et al., 2009). Pharmacological studies have shown a specific involvement of the D2/D3 (D2-like) receptor system in reversal-learning performance across species (Herold et al., 2010; Boulougouris et al., 2009; Lee et al., 2007; Cools et al., 2009). Reversal-learning deficits are also found in human carriers of the A1 allele of the TaqIA polymorphism in the D2 receptor gene (Jocham et al., 2009), a variant associated with relatively lower striatal D2-like receptor availability (Pohjalainen et al., 1998). Furthermore, pharmacological perturbations to the D2-like receptor system have been reported to influence sensitivity to feedback, such that D2-like receptor agonists facilitate adjustments in behavior elicited by positive feedback, while D2-like antagonists have the opposite effect (Frank and O’Reilly, 2006). Moreover, TaqIA A1 allele carriers exhibit impaired learning in response to negative feedback (Frank and Hutchinson, 2009), suggesting that having relatively low levels of D2-like receptors may confer behavioral inflexibility by altering feedback sensitivity.
To examine the question of how naturally occurring variation in sensitivity to feedback during reversal learning relates directly to D2-like receptor density, we combined assessments of responding during acquisition, retention and reversal of discrimination problems with positron emission tomographic (PET) measures of D2-like receptor availability in non-human primates. On the basis of the available data, we hypothesized that D2-like receptor availability in the striatum would be correlated with negative-feedback sensitivity (Frank et al., 2009) and that this relationship would be exaggerated under reversal-learning conditions, when the demands to use feedback to guide behavior are greatest.
Twelve male vervet monkeys (Chlorocebus aethiops sabaeus from the UCLA Vervet Research Colony), ranging from 5 to 9 years of age, were included in this study. Monkeys were individually housed in a climate-controlled vivarium, where they had unlimited access to water and received twice-daily portions of standard monkey chow (Teklad, Madison, WI). All of the subjects were able to see, hear and communicate with other individuals in the room. Monkeys received half of their daily portion of allotted chow in the morning after behavioral testing was conducted (approximately 1100 h) and their second half in the afternoon (approximately 1500 h); the total amount of chow received was never reduced during the experiment to facilitate task performance.
All monkeys were maintained in accordance with the ‘Guide for the Care and Use of Laboratory Animals’ of the Institute of Laboratory Animal Resources, National Research Council, Department of Health, Education and Welfare Publication No. (NIH) 85-23, revised 1996. Research protocols were approved by the UCLA Chancellor’s Animal Research Committee.
Monkeys were trained to move from their individual cages into a transport cart, and were brought to a quiet testing room where the transport cart was aligned to a Wisconsin General Testing Apparatus, which has been described elsewhere (Lee et al., 2007). It was equipped with an operable opaque screen that separated the monkeys from three equally spaced opaque boxes. Each box was equipped with a hinged opaque lid so that food rewards (a small piece of apple, banana, grape or orange) could be concealed inside. Moreover, each box lid could be fitted with a unique visual stimulus (clip art from the Microsoft Office® library that consisted of colored objects unfamiliar to the monkey) that the monkeys could easily view when sitting at the apparatus.
Testing sessions began when the opaque screen was raised to present the three boxes (each fitted with a unique stimulus) to the monkey. Only one response, in which the monkey opened a box fitted with a stimulus, was allowed per trial. A trial ended after a correct choice, an incorrect choice or an omission (no response for 2 min), and a 20-s intertrial interval followed. The next trial ensued with a different spatial box sequence, but with the reward associated with the same visual stimulus. Up to 80 trials per session were given.
Monkeys were trained to acquire, retain and reverse novel visual discriminations. The first session of a discrimination problem was a discrimination-acquisition phase and was held on a Monday or Thursday. The monkey was presented with three novel stimuli and had to learn which one was associated with reward, solely on the basis of trial and error. After a performance criterion (seven correct choices within ten consecutive trials) was reached, the session was terminated and the monkey was returned to his home cage. If a monkey did not reach criterion within 80 trials, the session ended but the same discrimination problem was presented the following day(s) until the performance criterion was met.
One day after reaching criterion, subjects were assessed in the retention phase, during which stimulus-reward contingencies were unchanged, until a criterion of four correct choices in five consecutive trials was met. The reversal phase then began immediately with no explicit signal that the transition between retention and reversal had occurred, other than the change in feedback experienced by the subject. During the reversal phase, the stimulus that was previously rewarded was no longer rewarded, and one of the two previously non-rewarded stimuli was rewarded. The reversal phase continued until the monkey achieved criterion (seven correct choices in ten consecutive trials) or until 80 trials had been completed, whichever occurred first. The number of trials required to reach criterion in the acquisition, retention and reversal phases were the primary dependent measures. For the reversal phase, the number of responses directed at the previously rewarded stimulus (perseverative responses) and the number of responses directed at the never rewarded stimulus (neutral responses) were also measured. The probability of a monkey making each response type was also calculated by dividing the number of correct, perseverative or neutral responses by the total number of trials in the reversal phase.
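The criterion rule and the response-type probabilities described above can be illustrated with a short Python sketch. It is not the authors' scoring procedure: the function names are hypothetical, and it assumes that the criterion may be met before a full window of trials has elapsed and that omissions are scored as not correct, neither of which is stated in the text.

```python
from collections import Counter


def trials_to_criterion(outcomes, needed=7, window=10):
    """Return the 1-based trial on which the criterion is first met, else None.

    outcomes: sequence of booleans, True for a correct choice.
    Assumption: the criterion is met on the first trial t for which at least
    `needed` of the most recent `window` trials (fewer if t < window) were
    correct. Omissions are assumed to be coded as False.
    """
    for t in range(1, len(outcomes) + 1):
        recent = outcomes[max(0, t - window):t]
        if sum(recent) >= needed:
            return t
    return None


def response_probabilities(responses):
    """Fraction of reversal-phase trials of each response type.

    responses: sequence of labels, each 'correct', 'perseverative'
    (previously rewarded stimulus) or 'neutral' (never rewarded stimulus).
    """
    counts = Counter(responses)
    total = len(responses)
    return {kind: counts[kind] / total
            for kind in ("correct", "perseverative", "neutral")}
```

Calling `trials_to_criterion(outcomes)` applies the acquisition and reversal criterion (seven of ten), and `trials_to_criterion(outcomes, needed=4, window=5)` applies the retention criterion (four of five).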
Subjects acquired and reversed consecutive discrimination problems, each of which featured three novel visual stimuli. Due to technical delays in the acquisition of PET scans, the total number of discrimination problems completed and the number of days between completion of the last discrimination problem and the PET scans differed across subjects; both are shown in Table 1. The analysis described here therefore focused on the averages of the dependent measures collected across the last three problems, as these were closest in time to the subsequent PET scans.
Because behavioral sensitivity to positive and/or negative feedback can affect learning performance, we examined choice behavior on a trial-by-trial basis during the reversal phase. Here, we categorized trials according to whether the subject experienced positive or negative feedback on the preceding trial. This allowed calculation of the probability that after experiencing positive feedback, a subject would make either: a) another correct response, b) a response directed to the stimulus that was previously rewarded or c) a response directed at the stimulus that was never rewarded. The response to negative feedback was assessed by calculating the probability that a negative feedback event would be followed with either: a) the same incorrect response or b) a response directed at a different stimulus, irrespective of whether this response was correct or incorrect. We also performed a similar analysis of choice behavior for the data gathered during the discrimination acquisition; however, because perseverative responses were not possible in this phase, behavioral sensitivity to positive feedback was calculated by examining the probability of following a correct response with either: a) another correct response or b) an incorrect response.
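The trial-by-trial categorization for the reversal phase can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function name and stimulus labels are hypothetical, and the sketch assumes a sequence of choices with omission trials already removed.

```python
def feedback_sensitivity(choices, rewarded, previously_rewarded):
    """Conditional response probabilities given the preceding trial's feedback.

    choices: stimulus label chosen on each reversal-phase trial, in order.
    rewarded: the stimulus rewarded during the reversal phase.
    previously_rewarded: the stimulus rewarded before the reversal.
    Assumption: omission trials have been removed from `choices`.
    """
    after_pos = {"correct": 0, "perseverative": 0, "neutral": 0}
    after_neg = {"same": 0, "different": 0}

    # Pair each trial with the one that follows it.
    for prev, cur in zip(choices, choices[1:]):
        if prev == rewarded:  # positive feedback on the preceding trial
            if cur == rewarded:
                after_pos["correct"] += 1
            elif cur == previously_rewarded:
                after_pos["perseverative"] += 1
            else:
                after_pos["neutral"] += 1
        else:  # negative feedback on the preceding trial
            after_neg["same" if cur == prev else "different"] += 1

    def normalize(counts):
        total = sum(counts.values())
        # NaN marks a condition that never occurred in this sequence.
        return {k: (v / total if total else float("nan"))
                for k, v in counts.items()}

    return normalize(after_pos), normalize(after_neg)
```

For the acquisition phase, where perseverative responses are not possible, the same loop would reduce to two outcomes after positive feedback (correct or incorrect).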
A variable number of days after behavioral performance had stabilized (Table 1), D2-like receptor availability was assessed using a microPET Model P4 scanner (Concorde Instruments, Knoxville, TN). Dopamine transporter (DAT) availability was assessed, using [11C]WIN-35,428, in the same subjects for a larger study; however, DAT availability measures were not included in the hypothesized mechanism for our primary analyses and, therefore, are not described here. Monkeys received an intramuscular injection of ketamine hydrochloride (10 mg/kg) and glycopyrrolate (0.01 mg/kg). After monkeys were immobilized, an endotracheal tube was placed to provide inhalation of 2-3% isoflurane (in 100% O2) anesthesia throughout the duration of the experiment. Vital signs (heart rate, respiratory rate, oxygen saturation and temperature) were monitored and recorded every 15 min throughout the scan. A tail-vein catheter was placed, and the monkey was positioned on the scanning bed such that the imaging planes were parallel to the orbitomeatal line and the top of the head at the front of the field of view. A 20-minute 68Ge transmission scan was acquired before administration of the radioligand for attenuation correction. All subjects received a bolus injection of [11C]WIN-35,428 (1.0 mCi/kg), followed by a 5-mL saline flush, and data were acquired for 90 min. When radioactivity had fallen to baseline levels (~3 h after [11C]WIN-35,428 administration), a bolus injection of [18F]fallypride was delivered (0.3 mCi/kg), followed by a 5-mL saline flush. Dynamic data were acquired in list mode for 180 min. After the scan, animals were removed from the gas anesthesia and allowed to recover overnight before being returned to their home cages.
Three-dimensional sinogram files were created by binning the data into a total of 33 frames (six 30-sec frames, seven 60-sec frames, five 120-sec frames, four 300-sec frames, nine 600-sec frames, one 1200-sec frame and one 1800-sec frame). We applied a previously validated algorithm to the transmission scan list-mode data to generate attenuation maps (Vandervoort and Sossi, 2008). This algorithm uses an analytical scatter correction, based upon the Klein-Nishina formula, for singles-mode transmission data. Following construction of the attenuation maps, emission list-mode files were reconstructed using Fourier rebinning and filtered back projection, and corrected for normalization, dead time, scatter and attenuation within software provided by the manufacturer (microPET Manager version 18.104.22.168). The resultant images had voxel dimensions of 0.949 mm × 0.949 mm × 1.212 mm and matrix dimensions of 128 × 128 × 63.
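As a consistency check on the framing scheme listed above, the following Python sketch expands the schedule into individual frames and confirms that it yields 33 frames spanning the 180-min acquisition. The variable names are illustrative only.

```python
from itertools import accumulate

# (number of frames, frame duration in seconds), as listed in the text
frame_schedule = [(6, 30), (7, 60), (5, 120), (4, 300),
                  (9, 600), (1, 1200), (1, 1800)]

# Expand the schedule into one duration per frame.
durations = [seconds for count, seconds in frame_schedule for _ in range(count)]

# Start time of each frame, in seconds from the start of acquisition.
frame_starts = [0] + list(accumulate(durations))[:-1]

n_frames = len(durations)
total_minutes = sum(durations) / 60

print(n_frames, total_minutes)  # 33 180.0
```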
Structural magnetic resonance (MR) images were acquired to allow for anatomically based demarcation of regions of interest (ROI). MR images were acquired one week after the PET scans. The monkeys received an intramuscular injection of ketamine hydrochloride (10 mg/kg) and atropine sulfate (0.01 mg/kg). Once the monkey was immobilized, an endotracheal tube was inserted to provide inhalation of 2-3% isoflurane gas (in 100% O2) for the remainder of the scan. Monkeys were positioned on the bed of a 1.5 T Siemens scanner, with the head in the gantry, surrounded by an 8-channel, high-resolution, knee-array coil (Invivo Corporation). Nine T1-weighted volumes with three-dimensional, magnetization-prepared, rapid-acquisition, gradient-echo (MPRAGE) images were acquired (TR=1900 ms, TE=4.38 ms, FOV=96 mm, flip angle 15 degrees, voxel size 0.5 mm, 248 slices, slice thickness 0.5 mm). Individual images were aligned to each other using Statistical Parametric Mapping 5 (Institute of Neurology, University College London, London, England), averaged together and resliced according to a previously developed MR template (Fears et al., 2009).
ROIs were drawn twice, referred to as replicates, on each subject’s structural MR image by a single experimenter blind to the subject identity using FSL View (FMRIB’s Software Library v4.0). ROIs included the whole caudate nucleus, putamen, ventral striatum and cerebellum.
All statistical analyses were conducted using SPSS 15.0. Reliability of performance was examined by calculating Cronbach’s alpha, a coefficient of reliability, for the number of trials required to reach criterion in the acquisition, retention and reversal phases of the task during the first ten completed sessions. Paired-samples t-tests were conducted to examine the number of trials required to reach criterion in the acquisition and reversal phases, as well as the error types (neutral or perseverative) in the reversal phase of the task. Linear regressions were conducted to examine the relationships between D2-like receptor availability and our behavioral measures; though we found significant linear relationships (Y = a - bX), visual inspection suggested that for some relationships an inverse function (Y = a - b/X) was more appropriate for the data. The asymptote (a) and slope (b) of each curve were estimated using the curve-fitting tool in SPSS. Models were compared using the Akaike Information Criterion (AIC) to determine whether the linear or inverse function best fit the data. When the inverse function was identified as the AIC-preferred model, the independent variables were transformed accordingly and correlations performed with the transformed values to calculate the Pearson correlation coefficient and significance values.
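The reliability coefficient and the comparison of linear and inverse fits can be sketched in plain Python. This is a minimal illustration, not a reproduction of the SPSS procedures: the function names are hypothetical, the fits use ordinary least squares written as y = a + b·x (so a negative b corresponds to the a - bX form in the text), and the AIC is computed with the common least-squares expression n·ln(RSS/n) + 2k, which may differ from the SPSS value by a constant.

```python
import math
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects-by-sessions table of scores.

    scores: list of rows, one per subject, each holding k session scores.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def ols(x, y):
    """Least-squares intercept, slope and residual sum of squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def aic(rss, n, n_params=2):
    """AIC for a least-squares fit (assumed form: n*ln(RSS/n) + 2k)."""
    return n * math.log(rss / n) + 2 * n_params


def compare_linear_inverse(x, y):
    """Fit y against x and against 1/x; report the AIC-preferred model."""
    n = len(x)
    rss_linear = ols(x, y)[2]
    rss_inverse = ols([1 / xi for xi in x], y)[2]
    result = {"linear": aic(rss_linear, n), "inverse": aic(rss_inverse, n)}
    result["preferred"] = min(("linear", "inverse"), key=lambda m: result[m])
    return result
```

When the inverse model is preferred, a Pearson correlation between y and the transformed predictor 1/x corresponds to the transformation step described above.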
To examine the anatomical distribution of the relationship between positive-feedback sensitivity and BP within the striatum, linear regressions were performed using the FSL RANDOMISE v2.1 tool (Permutation-based nonparametric inference, Oxford University, Oxford UK) with a variance smoothing of 5 mm (FWHM Gaussian). A binary, striatal mask was created and feedback-sensitivity measures transformed according to the model that best fit the data according to our initial ROI analysis (see above). Threshold-free cluster enhancement (TFCE) (Smith and Nichols, 2009) was used to detect significant clusters of activation; this method provides the ability to perform cluster-based inference without the need to specify an arbitrary cluster-forming threshold, as is necessary when using Gaussian random field theory. For each analysis, 10,000 randomization runs were performed. Statistical maps were thresholded at p<0.05 (two-tailed) and corrected for the search volume contained in the striatal mask.
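The logic of permutation-based inference can be illustrated for a single correlation with the sketch below. It is deliberately simplified and is not the FSL RANDOMISE procedure: it involves no voxel-wise maps, variance smoothing or TFCE, and the function names, the default seed and the two-sided test on |r| are assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    The labels of y are shuffled n_perm times; the p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|, counting
    the observed arrangement itself as one permutation.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

In a voxel-wise analysis the same shuffling is applied to every voxel at once, and correction over the search volume uses the distribution of the maximum statistic across voxels rather than a per-voxel p-value.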
Discrimination performance across the first ten acquisition, retention and reversal sessions completed by each subject showed a high degree of internal consistency, as indicated by the reliability coefficient, Cronbach’s alpha, for acquisition (0.70), retention (0.76), and reversal performance (0.77). During the acquisition phase, the number of trials to reach criterion was 14.81 +/- 1.43 trials (mean +/- SE), which was significantly lower than the 25.69 +/- 3.86 trials (mean +/- SE) required to reach criterion during the reversal phase (t(11)=-3.508; p<0.01). Comparison of error types in the reversal phase indicated that the probability of making a response to the initially reinforced stimulus was significantly greater than that of making a response to the never rewarded stimulus (t(11)=5.551; p < 0.001). These results indicate that, although monkeys had been trained on multiple reversals, they still found the reversal phase of the task significantly more difficult than the acquisition of a novel stimulus-reward association. Performance during the first and the last completed reversal sessions was correlated (r10=0.613; p=0.03), indicating that, despite the multiple reversal sessions monkeys performed, the ability to flexibly modify behavior was reasonably trait-like.
Because technical delays resulted in subjects completing different numbers of discrimination problems, we examined whether differences in the total number of discrimination problems completed by each subject were associated with either differences in D2-like receptor availability in the striatal regions of interest, or average behavioral performance during the last three discrimination sets; no significant relationships were detected (all correlation |t|’s < 0.91). We also found no significant relationships between striatal D2-like receptor availability and variation in the number of days between completion of the last discrimination problem and when PET scans were acquired (all correlation |t|’s < 2.09).
We then examined the relationship between D2-like receptor availability in each of the three striatal regions and the average number of trials required to reach criterion for the last three acquisition, retention and reversal sessions completed prior to PET scans. Because D2-like receptor availability is negatively correlated with age in humans (Wang et al., 1995; Volkow et al., 1996), we initially included age in the model as a covariate; however, because it was not a significant predictor in our dataset (possibly because the variation in age was restricted), it was removed from the model(s) and all other analyses.
As hypothesized, no significant relationship was found between the average number of trials required to reach criterion in the acquisition or retention phases and D2-like receptor availability in any brain region assessed (all |t|’s < 1.29; Figures 1A and 1B). However, a relationship was found between the average number of trials required to reach criterion in the reversal session and receptor availability in the caudate nucleus (r10=-0.71; p=0.01) and the putamen (r10=-0.67; p=0.02), but not the ventral striatum (r10=0.28; p=0.38) (Figure 1C). Specifically, greater D2-like receptor availability in the caudate nucleus and putamen was associated with better reversal-learning performance, and this relationship was best modeled using an inverse function, as presented in Figure 1C (a solid line for the caudate nucleus and a dashed line for the putamen).
For the reversal phase, we examined whether D2-like receptor availability in the caudate nucleus and putamen was correlated with specific response types normalized to the number of trials required to reach criterion. No significant relationship was found between D2-like receptor availability in the caudate nucleus and the probability of making a correct response (r10=0.48; p=0.12), a perseverative response (r10=-0.31; p=0.32) or a neutral response (r10=-0.46; p=0.13). Similarly, no significant relationships were found with D2-like receptor availability in the putamen and the probability of making a correct response (r10=0.36; p=0.26), a perseverative response (r10=-0.20; p=0.54) or a neutral response (r10=-0.39; p=0.21) (data not shown).
To ensure that this relationship was not specific to the last three discrimination problems completed, we examined the relationship between D2-like receptor availability and the average number of trials required to reach criterion for all reversals completed for each subject. This assessment indicated that the average number of trials required to reach criterion across all the reversal sessions was correlated with D2-like receptor availability in the caudate nucleus (r10=-0.68; p=0.01) and putamen (r10=-0.56; p=0.05) which was best described with an inverse function. The relationship was present even in a specific examination of performance on the first reversal completed (r10=-0.756; p=0.004 for the caudate nucleus and r10=-0.778; p=0.003 for the putamen).
We next examined the relationship between D2-like receptor availability in the caudate nucleus, putamen and ventral striatum and the measures of behavioral sensitivity to feedback. The probability of following positive feedback with a correct response was correlated with D2-like receptor availability in the caudate nucleus (r10=0.74; p=0.006) and putamen (r10=0.74; p=0.006), but not in the ventral striatum (r10=0.37; p=0.24) (Figure 2A) (see statistical map, Figure 3). Correspondingly, the probability of following positive feedback with a perseverative response (regressive responding to the initially trained stimulus) was related to D2-like receptor availability in the caudate nucleus (r10=-0.61; p=0.04), but not in the putamen (r10=-0.47; p=0.12) (Figure 2B). These relationships were best modeled with the inverse function as presented in Figure 2A and 2B: solid and dashed curves represent the relationship between feedback sensitivity and D2-like receptor availability in the caudate nucleus and putamen, respectively. No significant correlations were found between D2-like receptor availability and the probability of following positive feedback with a response to the never-rewarded stimulus. D2-like receptor availability in the three striatal regions was not correlated with the probability of subjects following negative feedback with either the same incorrect response or a response to one of the two other stimuli (all correlation |t|’s < 0.45) (see Figure 2C).
Voxel-wise comparison revealed a significant negative correlation between the number of trials required to reach criterion in the reversal phase and D2-like receptor availability that extended throughout the caudate nucleus and putamen (Figure 3A). A similar negative correlation was found in the dorsal striatum between the probability of following positive feedback with a perseverative response and D2-like receptor availability (Figure 3B). A moderate correlation was found between D2-like receptor availability and the probability of following positive feedback with a correct response (r10=0.47; p=0.12), but it did not survive the TFCE-corrected p < 0.05 threshold. Significant statistical maps were overlaid to visualize the anatomical distribution of the significant relationships in the coronal (Figure 3C) and the transverse section (Figure 3D).
Although D2-like receptor availability was not correlated with the number of trials required to reach criterion during the acquisition of novel stimulus-reward associations, the strong correlation found with positive feedback-sensitivity measures warranted examination of the relationship that this receptor system may have with feedback sensitivity during acquisition. D2-like receptor availability in the caudate nucleus, but not the putamen or ventral striatum, was linearly related to our measure of positive feedback sensitivity (r10=0.574; p=0.05) (Figure 4), but not to negative feedback sensitivity (r10=0.182; p=0.572) (data not shown).
This study demonstrated that D2-like receptor availability within the dorsal aspects of the striatum was related to the ability to modify behavior during reversal learning, and to behavioral sensitivity to positive feedback. These results directly support the idea that the D2-like receptor system is involved in the ability to shift responding when the association between a stimulus and reward is changed, and suggest that variation in reversal-learning performance reflects individual differences in sensitivity to positive feedback. These relationships are maintained under the conditions of natural variation, rather than manipulation, and together with studies in humans and rodents (Cools et al., 2009; Boulougouris et al., 2009; Frank and O’Reilly, 2006), provide powerful convergent evidence that the D2-dependent dopamine signaling system is crucially involved in aspects of behavioral flexibility and reinforcement sensitivity.
Experimental perturbations of D2-like receptor signaling alter performance in tasks that require flexible modifications in behavior; these relationships hold in several species (Herold et al., 2010; Boulougouris et al., 2009; Lee et al., 2007; Cools et al., 2009) indicating that this receptor system represents a phylogenetically conserved mechanism for the rapid adjustment of behaviors. The findings presented here add an important dimension to prior experimental results by demonstrating that individual differences in the ability to update behavior in a reversal-learning task are related to natural variation in D2-like receptor availability.
Our results provide evidence that the relationship between D2-like receptor availability and reversal-learning performance is anatomically confined to the dorsal striatum, with no relationship being found in the ventral striatum. These results are supported by data showing that a lesion of the dorsal, but not ventral, striatum impairs reversal learning in rats (Castane et al., 2010) and monkeys (Clarke et al., 2008). Moreover, activation of the dorsal striatum is observed in human subjects, studied with functional MRI, during a discrimination reversal task (Ghahremani et al., 2010). Though striatal mechanisms may themselves be involved in reversal learning, there is also evidence that striatal D2-like receptor availability is positively correlated with glucose metabolism in the orbitofrontal cortex (Volkow et al., 2000). Therefore, it is possible that striatal D2-like receptor availability may mechanistically relate to molecular and/or functional integrity of the orbitofrontal cortex, which in turn contributes to the correlations reported here.
The radioligand used in this study ([18F]fallypride) has equal affinity for both D2 and D3 receptor subtypes (Mukherjee et al., 1999), precluding assignment of the contributions of specific dopamine receptor subtypes. However, the relationships reported here were restricted to the dorsal striatum, an area with modest D3 receptor expression relative to the ventral striatum (Bouthenet et al., 1991). Moreover, mice lacking the D3 receptor exhibit enhanced reversal-learning performance (Glickstein et al., 2005) and administration of a D3 agonist impairs reversal-learning performance in monkeys (Smith et al., 1999), suggesting that low D3 receptor density would be expected to relate to reversal-learning performance in a manner opposite to that observed here. Therefore, the relationships reported in the current study are most likely due to variation in the D2 receptor subtype. However, further studies using subtype-specific antagonists may help to clarify the validity of these hypotheses.
Taken with a host of pharmacological evidence from humans and rats (Cools et al., 2009; Boulougouris et al., 2009), these data suggest that individual differences in reversal-learning performance are a result of underlying variation in D2-like receptor availability within the dorsal striatum. However, we cannot totally exclude the possibility that training history affected D2-like receptor availability. It is also possible that variation in receptor availability detected in the current study is due to differences in endogenous dopamine levels acting in competition with the radioligand for the D2-like receptor binding site, thereby influencing receptor availability measurements. Although we cannot reject this possibility, we believe it cannot fully account for the current findings. Based on evidence that striatal dopamine synthesis is positively correlated with reversal-learning performance (Cools et al., 2009), a dominant influence of dopaminergic tone on D2-like receptor availability would lead to a positive relationship with the number of trials required to reach criterion, opposite to our current findings. Therefore, we believe that the relationships presented in the current study are most likely due to variation in receptor level, and not to variation in dopamine levels; however, future studies examining D2-like receptor availability in the absence of synaptic dopamine are needed to verify this hypothesis.
The ability to learn or reverse a stimulus-response association requires an integration of both positive and negative feedback in order to refine subsequent choices. Several lines of evidence support a crucial role for the dopamine system in these abilities. Schultz et al. (1997) demonstrated that over the course of learning a stimulus-reward association, phasic firing of midbrain dopamine cells shifts from the time of reward presentation to the time of conditioned stimulus presentation. Subsequently, when a predicted reward is omitted, dopamine neuron activity declines below baseline (Hollerman and Schultz, 1998). Frank et al. (2004) have argued that dopamine, acting on specific receptor subtypes that exhibit a segregated distribution on striatal medium spiny neurons, exerts dissociable actions in response to positive and negative feedback during learning. This theory posits that phasic release of dopamine, acting on medium spiny neurons in the direct pathway that express D1 receptors, promotes learning from positive feedback, while declines in dopamine activity, locked to negative feedback, are hypothesized to release the D2-expressing medium spiny neurons in the indirect pathway from inhibition via D2 receptor signaling.
Here, however, we provide evidence that D2-like receptor availability within the dorsal striatum is selectively correlated with the ability of subjects to integrate positive, rather than negative, feedback into their ongoing choice behavior, which is surprising in light of the theory described above. Notably, a recently developed neurocomputational model by Dreyer et al. (2010) suggested that both increases and decreases in dopamine-neuron activity affect D1- and D2-like receptor function, albeit possibly to different degrees. Therefore, it is possible that the association between D2-like receptor availability and positive feedback sensitivity stems from phasic dopamine release activating D2 receptors as well as D1 receptors. Dopamine acting on D2-expressing medium spiny neurons may produce long-term depression at corticostriatal synapses on those neurons (Kreitzer and Malenka, 2007). This would reduce the strength of the indirect pathway that constrains behavior and thereby increase the probability of making the same response on the following trial. Our results are consistent with deficits in the processing of positive feedback that have been reported in carriers of the A1 allele of the TaqIA polymorphism (Althaus et al., 2009; Jocham et al., 2009).
Here we report that D2-like receptor availability is correlated with positive feedback sensitivity not only during the reversal of a stimulus-reward association, but also during its initial acquisition. The correlation is strongest in the reversal stage of the task, where strong expectancy violations may magnify underlying deficits in behavioral sensitivity to feedback. Nevertheless, its presence in both stages suggests that D2-like receptors represent a principal substrate for explaining variation in positive feedback sensitivity.
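For readers unfamiliar with trial-by-trial feedback measures, the following sketch shows one common way to quantify them. The exact definition used for the analyses in this study is not restated in this section, so the sketch assumes a conventional one: positive feedback sensitivity as the probability of repeating a choice after a rewarded trial (win-stay), and negative feedback sensitivity as the probability of switching after an unrewarded trial (lose-shift). The function name and the example data are hypothetical.

```python
def feedback_sensitivity(choices, rewarded):
    """Return (p_win_stay, p_lose_shift) for one subject's trial sequence.

    choices  : sequence of chosen stimulus labels, one per trial
    rewarded : sequence of booleans, True if that trial was rewarded
    The last trial has no following choice and is therefore not scored.
    """
    win_n = win_stay = lose_n = lose_shift = 0
    # Pair each trial's choice and outcome with the choice on the next trial.
    for prev_choice, prev_rewarded, next_choice in zip(choices, rewarded, choices[1:]):
        if prev_rewarded:
            win_n += 1
            win_stay += next_choice == prev_choice
        else:
            lose_n += 1
            lose_shift += next_choice != prev_choice
    p_win_stay = win_stay / win_n if win_n else float("nan")
    p_lose_shift = lose_shift / lose_n if lose_n else float("nan")
    return p_win_stay, p_lose_shift


# Hypothetical three-choice session (stimuli labeled A, B, and C).
example_choices = ["A", "A", "A", "B", "B", "C", "B"]
example_rewarded = [True, False, False, True, True, False, True]
p_win_stay, p_lose_shift = feedback_sensitivity(example_choices, example_rewarded)
```

Because the task offers three choices, a shift is scored here as a move to either of the two other stimuli. Each probability could then be correlated with receptor availability across subjects.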
Relatively low D2-like receptor levels have been reported in several neuropsychiatric disorders, most prominently in substance abuse and dependence. Substance-dependent individuals have lower D2-like receptor availability (Volkow et al., 1993, 1996, 2001; Lee et al., 2009) and exhibit reversal-learning deficits (Salo et al., 2009; Fillmore and Rush, 2006; Ghahremani et al., 2011). Although animal studies have shown that chronic exposure to drugs can directly produce reductions in striatal D2-like receptor availability (Nader et al., 2006) and deficits in reversal learning (Jentsch et al., 2002), there is also evidence that preexisting lower D2-like receptor levels may confer risk for behavioral disinhibition (Dalley et al., 2007) and drug self-administration (Dalley et al., 2007; Nader et al., 2006). Further, D2-like receptor availability is correlated with known risk factors for substance dependence, such as impulsivity (Lee et al., 2009; Buckholtz et al., 2010) and novelty seeking (Zald et al., 2008; Huang et al., 2010), which are themselves associated with cognitive deficits (Cools et al., 2007; James et al., 2007).
We therefore propose that reversal-learning deficits, which measure behavioral inflexibility, represent an intermediary process between D2-mediated transmission and addictive behavior. Further, pharmacological techniques that increase D2-mediated dopaminergic transmission, and thereby improve behavioral flexibility, may represent a principal treatment strategy for substance dependence, as behavioral flexibility is a known correlate of retention in a treatment program (Aharonovich et al., 2006; Moeller et al., 2001). Improving D2-like receptor transmission also constitutes a plausible intervention strategy for individuals at high risk for substance dependence who have cognitive impairments (Giancola et al., 1996) that are predictive of greater substance use (Aytaclar et al., 1999).
Variation in D2-like receptor availability in the dorsal striatum explains individual differences in behavioral flexibility and positive feedback sensitivity. Genetic influences that modulate D2-like receptor expression and function in the dorsal striatum are therefore expected to influence impulsivity-like phenotypes and ultimately syndromes that involve impulse-control disorders. In this sense, D2-dependent dopamine transmission may represent a final, common, biochemical pathway to manifestations of behavioral inflexibility across diagnostic categories.
These studies were supported by the Consortium for Neuropsychiatric Phenomics at UCLA; the Consortium is funded by PHS grants UL1-DE019580 and RL1-MH083270. Additional support was derived from PHS grants T32-DA024635, P20-DA022539, P50-MH077248 and F31-DA028812.