Disordered dopamine neurotransmission is implicated in mediating impulsiveness across a range of behaviors and disorders including addiction, compulsive gambling, attention-deficit/hyperactivity disorder, and dopamine dysregulation syndrome. Whereas existing theories of dopamine function highlight mechanisms based on aberrant reward learning or behavioral disinhibition, they do not offer an adequate account of the pathological hypersensitivity to temporal delay that forms a crucial behavioral phenotype seen in these disorders. Here we provide evidence that a role for dopamine in controlling the relationship between the timing of future rewards and their subjective value can bridge this explanatory gap. Using an intertemporal choice task, we demonstrate that pharmacologically enhancing dopamine activity increases impulsivity by amplifying the diminishing influence of increasing delay on reward value (temporal discounting) and its corresponding neural representation in the striatum. This leads to a state of excessive discounting of temporally distant, relative to sooner, rewards. Thus, our findings reveal a novel mechanism by which dopamine influences human decision-making that can account for behavioral aberrations associated with a hyperfunctioning dopamine system.
The characteristic loss of self-control and impulsivity associated with aberrant dopamine function is exemplified by disorders such as addiction, attention-deficit/hyperactivity disorder (ADHD), and dopamine dysregulation syndrome (Winstanley et al., 2006; Dagher and Robbins, 2009; O'Sullivan et al., 2009). In the last of these, dopamine replacement therapy in the treatment of Parkinson's disease (PD) renders some patients prone to compulsive behavior, which manifests itself as excess gambling, shopping, eating, and other shortsighted behaviors. However, the broad phenotype of impulsivity that characterizes these behaviors subsumes a diversity of distinct decision-making processes that can be dissociated neurobiologically and pharmacologically (Evenden, 1999; Ho et al., 1999; Winstanley et al., 2004a, 2006; Dalley et al., 2008). These include a lack of inhibition of prepotent motor responses, overweighting of rewards relative to losses, failure to slow down in the face of decision-conflict, and a propensity to choose smaller–sooner over larger–later rewards.
In principle, some of the aforementioned deficits can be related to dopaminergic effects by way of dopamine's established role in reward learning (Redish, 2004; Frank et al., 2007; Dagher and Robbins, 2009). However, temporal (or choice) impulsivity—the preference for smaller–sooner over larger–later rewards, due to excessive discounting of future rewards (Ainslie, 1975; Evenden, 1999; Ho et al., 1999; Cardinal et al., 2004)—is much harder to account for in terms of learning, although it remains an important feature of putative dopaminergic impulsivity. Indeed, laboratory tests of intertemporal choice indicate that addicts and a subgroup of ADHD patients appear to have abnormally high temporal discount rates, strongly preferring smaller–sooner rewards (Sagvolden and Sergeant, 1998; Bickel and Marsch, 2001; Solanto et al., 2001; Winstanley et al., 2006; Bickel et al., 2007). This poses the question of whether dopamine has a specific role in computing how the temporal proximity of a reward relates to its subjective value (i.e., rate of temporal discounting), independent of its established contribution to reward learning.
To investigate whether dopamine modulates time-dependent coding of value, we administered the dopamine precursor l-dopa, the dopamine antagonist haloperidol, and placebo to healthy volunteers performing an intertemporal choice task. The task required subjects to make genuine choices between differing amounts of money, offered over variable time periods, mostly involving the choice between smaller–sooner versus larger–later monetary rewards. Such choices are well characterized by models that incorporate both the discounting effects of time and the discounting effects of increasing reward magnitude (diminishing marginal utility) (Pine et al., 2009). Accordingly, the discounted utility or subjective value of a delayed reward is determined by the product of the discount factor (a number between zero and one) and the utility of the reward. If dopamine modulates an individual's choice in this task, it might reflect a change in either the discount rate or the utility concavity/convexity (see Materials and Methods)—a distinction that we were able to probe here at both behavioral and neurophysiological levels, using functional magnetic resonance imaging (fMRI). Additionally, we assessed whether dopamine had any effect on the rate of slowing down engendered by decision–conflict (Frank et al., 2007; Pochon et al., 2008) to distinguish global from discrete influences on impulsivity.
We used fMRI while subjects chose between two serially presented options of differing magnitude (from £1 to £150) and delay (from 1 week to 1 year) (Fig. 1). Each subject performed the task on three separate occasions (relating to the three drug conditions). These choices were often smaller–sooner versus larger–later options. One of the subjects' choices was selected at random at the end of the experiment (in each experimental session) and paid for real (i.e., at the specified future date) by bank transfer. We used subjects' choices to assess the extent of discounting for both magnitude and time. We assessed a model that combined a utility function (converting magnitude to utility) with a standard hyperbolic discounting function. In simple terms, the function for the discounted utility (subjective value) of a delayed reward (V) is equal to D × U where D is a discount factor between 0 and 1 and U is undiscounted utility. D is typically a hyperbolic function of the delay to the reward and incorporates the discount rate parameter (K), which determines how quickly one devalues future rewards. U is (typically) a concave function of the magnitude of a reward and depends on an individual parameter (r) that determines the concavity/convexity of the function, or the rate of diminishing marginal utility for gains and consequently the instantaneous value of the larger relative to the smaller reward. The greater K or r, the more the individual is likely to choose the sooner option and therefore the more impulsive is the individual (Ho et al., 1999; Pine et al., 2009). In accordance with utility theory, choice is determined by the principle of utility maximization whereby the option with the greatest discounted utility is selected.
Fourteen right-handed, healthy volunteers were included in the experiment (6 males; 8 females; mean age, 21; range, 18–30). Subjects were preassessed to exclude those with a prior history of neurological or psychiatric illness. All subjects gave informed consent and the study was approved by the University College London ethics committee. One subject dropped out of the study after the first session and was not included in the results. Another did not complete the final (placebo) session in the scanner, but their behavioral data from all sessions and imaging data from two sessions were included in the results.
Each subject was tested on three separate occasions. Upon arrival on each occasion, subjects were given an instruction sheet to read explaining how the drug blinding would be implemented. They then completed a visual analog scale (Bond and Lader, 1974) that measured subjective states such as alertness, and were subsequently given an envelope containing two pills that were either 1.5 mg of haloperidol or placebo. One and a half hours after taking the first set of pills, subjects were given another envelope containing two pills that were either Madopar (containing 150 mg of l-dopa) or placebo. The placebo tablets (vitamin C or multivitamins) were indistinguishable from the drugs. In all, each subject received one dose of Madopar on one session, one dose of haloperidol on another, and on one session both sets of tablets were placebo. The order of each drug condition in relation to the testing session was counter-balanced across subjects and was unknown to the experimenter to achieve a double-blind design. Testing commenced 30 min after ingestion of the second set of tablets. The timings were aimed to achieve a peak plasma concentration of the drug approximately halfway through the testing. After testing, subjects completed another (identical) visual analog scale. No two testing sessions occurred within 1 week of each other.
The behavioral task was mostly as described by Pine et al. (2009). Each trial consisted of a choice between a smaller–sooner reward and a larger–later reward. The choice was presented serially, in three stages (Fig. 1). The first two stages consisted of presentation of the details of each option, i.e., the magnitude of the reward in pounds and the delay to its receipt in months and weeks. After presentation of the options, a third screen prompted the subject to choose between option 1 (the option presented first) or option 2, by means of a button-box, using their right hand. A 3 s delay followed each of the three phases. The choice could only be made during the 3 s following presentation of the choice screen. Once a choice had been made, the chosen option was highlighted in blue. Providing there was sufficient time, the subject could change his/her mind. There was a jittered delay of 1–4 s following the choice phase, followed by presentation of a fixation cross for 1 s.
The experiment consisted of a total of 200 trials. Option 1 was the smaller–sooner reward in 50% of trials. In addition, we included a further 20 “catch” trials, where one of the options was both greater in value and available sooner than the other one. These catch trials occurred approximately every tenth trial and enabled us to ascertain how well the subjects were concentrating on the task, under the assumption that the norm was to prefer the larger–sooner reward in these choices. Each subject was given the same array of choices on each testing session (i.e., each drug condition) with the exception of the first two subjects who were given a different set of choices on their first testing session. The option values were created using randomly generated magnitudes varying from £1 to £150 in units of £1 and delays ranging from 1 week to 1 year in units of single weeks (but presented as a number of months and weeks), again with a random distribution. This random nature of the values helped in orthogonalising magnitude and delay. To create choices between smaller–sooner and larger–later rewards, we introduced the constraint that the option with greater magnitude should be delayed more than the smaller, and vice versa for the catch trials. Subjects were assigned to one of two choice arrays depending on their responses within practice trials in their first session. This was done to match the presented choices to the level of impulsivity of the subject.
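The construction of the choice array described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure actually used to build the arrays: the function names, the use of 52 weeks for 1 year, the uniform sampling, and the placement of one catch trial after every ten regular trials are all assumptions, and the matching of arrays to each subject's impulsivity is not modeled.

```python
import random

def make_trial(rng, small_first, catch=False):
    """Draw one two-option trial; each option is (magnitude_gbp, delay_weeks)."""
    # Distinct magnitudes (GBP 1..150) and distinct delays (1..52 weeks, assumed = 1 year)
    small, large = sorted(rng.sample(range(1, 151), 2))
    soon, late = sorted(rng.sample(range(1, 53), 2))
    if catch:
        # Catch trial: the larger reward is also the sooner one
        a, b = (small, late), (large, soon)
    else:
        # Regular trial: smaller-sooner versus larger-later
        a, b = (small, soon), (large, late)
    options = (a, b) if small_first else (b, a)
    return {"options": options, "catch": catch}

def make_array(seed=0, n_regular=200, n_catch=20):
    """Build 200 regular trials plus 20 catch trials (220 in total)."""
    rng = random.Random(seed)
    # The smaller-magnitude option is presented first in exactly half of the regular trials
    order = [True] * (n_regular // 2) + [False] * (n_regular - n_regular // 2)
    rng.shuffle(order)
    block = n_regular // n_catch  # one catch trial after every `block` regular trials
    trials = []
    for i, small_first in enumerate(order, start=1):
        trials.append(make_trial(rng, small_first))
        if i % block == 0:
            trials.append(make_trial(rng, rng.random() < 0.5, catch=True))
    return trials

trials = make_array(seed=1)
print(len(trials), sum(t["catch"] for t in trials))
```

Because magnitudes and delays are drawn independently before the ordering constraint is applied, the two dimensions remain only weakly coupled across the array, which is the property the random sampling is meant to provide.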
Payment was performed using a lottery to select one trial from each testing session. To impose ecological validity, we used a payment system that ensured that all the choices would be made in a realistic manner, with realistic consequences. Crucial to this design was the random selection of one of the choices made during the experiment, with real payment of the option chosen for that choice. This was achieved by way of a bank transfer made at the time associated with, and consisting of the amount of the selected option. Payment selection was implemented using a manual lottery after completion of all testing. The lottery contained 220 numbered balls, each representing a single trial from the task. The ball that was selected corresponded to the rewarded trial for that testing session. The magnitude and delay of the option that the subject chose in the selected trial was determined and awarded using a bank transfer. Thus, the payment each subject received was determined by a combination of the lottery and the choices that they made—a manipulation that ensured subjects treated all choices as real. The payment system was designed so that on average each subject would receive £75 per session. No other payment was awarded for participation in the experiment.
Before subjects were taken into the scanner, they were shown the lottery machine and given an explanation as to how the bank transfer would be implemented, to reassure them that the payment and selection system was genuine. After a short practice of six trials, they were taken into the scanner where they performed two sessions of 110 trials each, lasting in total ~50 min.
Functional imaging was conducted by using a 3-tesla Siemens Allegra head-only MRI scanner to acquire gradient echo T2*-weighted echo-planar images (EPI) with blood oxygenation level-dependent (BOLD) contrast. We used a sequence designed to optimize functional sensitivity in the orbitofrontal cortex (Deichmann et al., 2003). This consisted of tilted acquisition in an oblique orientation at 30° to the anterior commissure–posterior commissure (AC–PC) line, as well as application of a preparation pulse with a duration of 1 ms and amplitude of −2 mT/m in the slice selection direction. The sequence enabled 36 axial slices of 3 mm thickness and 3 mm in-plane resolution to be acquired with a repetition time (TR) of 2.34 s. Subjects were placed in a light head restraint within the scanner to limit head movement during acquisition. Functional imaging data were acquired in two separate 610-volume sessions. A T1-weighted structural image and fieldmaps were also acquired for each subject after the testing sessions.
To obtain an overall measure of impulsive choice, we counted the number of sooner options chosen out of the 220 trials, under each drug condition, for each subject. Trials where a response was not made were excluded from this sum in all three drug conditions. For example, if one subject did not respond in time for trial number 35 in the placebo condition, this trial was excluded from the count in the other two conditions for that subject. This ensured that the comparisons were made on a trial-by-trial basis (as the same array of trials was given in each testing session) and any effect of drug on this measure was not related to the number of choices made in each condition. A repeated-measures ANOVA was used to look for any differences in this overall measure across drug conditions.
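The trial-wise exclusion rule described above can be illustrated with a short sketch. The data layout (one list of responses per drug condition, in the same trial order, with `None` marking a missed response) and the function name are assumptions made for illustration.

```python
def sooner_counts(choices):
    """Count 'sooner' choices per condition, using only trials answered in every condition.

    choices: dict mapping condition name -> list of 'sooner', 'later', or None
             (None = no response), with identical trial order across conditions.
    """
    conditions = list(choices)
    n_trials = len(choices[conditions[0]])
    # A trial is kept only if the subject responded to it in all conditions
    valid = [t for t in range(n_trials)
             if all(choices[c][t] is not None for c in conditions)]
    return {c: sum(choices[c][t] == "sooner" for t in valid) for c in conditions}

# Hypothetical four-trial example: trials 3 and 4 are dropped for all conditions
example = {
    "placebo":     ["sooner", "later",  None,     "sooner"],
    "l-dopa":      ["sooner", "sooner", "sooner", "later"],
    "haloperidol": ["later",  "later",  "sooner", None],
}
print(sooner_counts(example))
```

The resulting per-condition counts are then the inputs to the repeated-measures comparison across drug conditions.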
We implemented the softmax decision rule to assign a probability (PO1 for option 1) to each option of the choice given the value of the option (VO1 for option 1) whereby
VOi represents the value of an option (i.e., a delayed reward) according to a particular model of option valuation (see below). The β parameter represents the degree of stochasticity of the subject's behavior (i.e., sensitivity to the value of each option).
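For two options, a softmax rule of this kind can be sketched as follows. The placement of β as a temperature that divides the option values (so that larger β means more stochastic choice) is an assumption here, as is the function name; the sketch uses the algebraically equivalent logistic form of the difference in values to avoid numerical overflow.

```python
import math

def softmax_p_option1(v1, v2, beta):
    """Probability of choosing option 1 given option values v1, v2.

    Assumes beta acts as a temperature:
        P(O1) = exp(v1/beta) / (exp(v1/beta) + exp(v2/beta))
    which equals the logistic function of (v1 - v2) / beta.
    """
    x = (v1 - v2) / beta
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return e / (1.0 + e)

# Equal values give indifference; a larger beta flattens the preference
print(softmax_p_option1(5.0, 5.0, 1.0))
print(softmax_p_option1(10.0, 5.0, 1.0), softmax_p_option1(10.0, 5.0, 10.0))
```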
We used a discounted utility model of option valuation, which we previously reported (Pine et al., 2009) as providing an accurate fit to subjects' choices in this task. This model states that the discounted utility (V) of a reward of magnitude (M) and with a delay (d) can be expressed as follows:
D can be thought of as the discount factor—the delay-dependent factor (between 0 and 1) by which the utility is discounted in a standard hyperbolic fashion (Mazur, 1987). The discount rate parameter K quantifies an individual's tendency to discount the future such that a person with a high K quickly devalues rewards as they become more distant. U is undiscounted utility and is governed by the magnitude of each option and r, a free parameter governing the curvature of the relationship. The greater the value of r, the more concave the utility function, and where r is negative, the utility function is convex. The greater r (above zero), the greater the rate of diminishing marginal utility and the more impulsive is the individual in choice. Note that according to traditional models of intertemporal choice valuation, which do not take into account the discounting of magnitude (Mazur, 1987), impulsivity, defined by the propensity to choose the smaller–sooner option, is solely a function of K and so the two might be expected to correlate perfectly. Hence, K is often considered a measure of this trait. However, since the discounting of magnitude has also been shown to determine choice outcome in animals and humans (Ho et al., 1999; Pine et al., 2009), we prefer to equate impulsivity with choice behavior as the temporal discount rate does not perfectly correlate with this key measure.
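The structure of the model, V = D × U, can be made concrete with a short sketch. The hyperbolic form of D follows the standard formulation cited above (Mazur, 1987). The exponential form used for U is an assumption: it is one function with the stated properties (concave for r > 0, convex for r < 0, linear as r approaches 0) and is not confirmed by the text reproduced here. Function names are illustrative.

```python
import math

def discount_factor(delay_weeks, K):
    """Hyperbolic discount factor D, between 0 and 1 for K >= 0 and delay >= 0."""
    return 1.0 / (1.0 + K * delay_weeks)

def utility(magnitude, r):
    """Undiscounted utility U (assumed exponential form).

    Concave for r > 0, convex for r < 0, and linear in the limit r -> 0.
    """
    if abs(r) < 1e-12:
        return float(magnitude)
    return (1.0 - math.exp(-r * magnitude)) / r

def discounted_utility(magnitude, delay_weeks, K, r):
    """Discounted utility V = D x U of a delayed reward."""
    return discount_factor(delay_weeks, K) * utility(magnitude, r)

# Larger K devalues a delayed reward more steeply (illustrative values only)
for K in (0.01, 0.03, 0.10):
    print(K, round(discounted_utility(150, 26, K, r=0.002), 2))
```

Under this form, raising either K or r lowers the value of a larger–later option relative to a smaller–sooner one, which is why both parameters bear on impulsive choice.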
Maximum likelihood estimation was used to obtain the best-fitting parameters for each model as well as a measure of the fit. Each of the parameters (including β) was allowed to vary freely. For each subject, the probability of each of the 220 options chosen from the 220 choices (including catch trials) was calculated using the softmax formula, and the fit was implemented with optimization functions in Matlab (MathWorks). The log-likelihood was calculated using the probability of the option chosen at trial t (PO(t)) from Eq. 1 such that

$$\ln L = \sum_{t} \ln P_{O(t)}$$
A repeated-measures ANOVA was used to test for any differences in the discount rate (K) and the utility concavity (r) across drug conditions.
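The parameter estimation step can be sketched as follows. This is an illustration under stated assumptions and not the original Matlab implementation: the utility function takes an assumed exponential form, β is treated as a softmax temperature, and a coarse grid search stands in for a numerical optimizer. All names and the example data are hypothetical.

```python
import math
from itertools import product

def value(m, d, K, r):
    """Discounted utility V = D x U (assumed exponential utility, hyperbolic discount)."""
    u = float(m) if abs(r) < 1e-12 else (1.0 - math.exp(-r * m)) / r
    return u / (1.0 + K * d)

def log_likelihood(trials, K, r, beta):
    """Sum over trials of ln P(chosen option) under a two-option softmax.

    trials: list of ((m1, d1), (m2, d2), chosen) with chosen in {1, 2}.
    """
    total = 0.0
    for (m1, d1), (m2, d2), chosen in trials:
        x = (value(m1, d1, K, r) - value(m2, d2, K, r)) / beta
        if chosen == 2:
            x = -x
        # ln of the logistic function of x, computed in a numerically stable way
        if x >= 0:
            total += -math.log1p(math.exp(-x))
        else:
            total += x - math.log1p(math.exp(x))
    return total

def fit(trials, K_grid, r_grid, beta_grid):
    """Return the (K, r, beta) grid point with the highest log-likelihood."""
    return max(product(K_grid, r_grid, beta_grid),
               key=lambda p: log_likelihood(trials, *p))

# Hypothetical choices: GBP 50 in 1 week versus GBP 100 at increasing delays
data = [((50, 1), (100, 10), 2), ((50, 1), (100, 20), 2),
        ((50, 1), (100, 30), 1), ((50, 1), (100, 40), 1),
        ((50, 1), (100, 50), 1)]
print(fit(data, [0.001, 0.01, 0.05, 0.2], [0.0], [5.0]))
```

In practice a continuous optimizer with several starting points would replace the grid, since the grid resolution limits the precision of the estimates.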
For the purposes of the imaging and reaction time analyses, a further estimation was performed whereby all the choices from each subject in each condition were grouped together (as if made by one subject) and modeled as a canonical subject to estimate canonical parameter values (using the fitting procedure above, Parameter estimation). This was performed to reduce the noise associated with the fitting procedure at the single-subject level. In addition, we did not wish to build the behavioral differences into our regression models when analyzing the fMRI data, as we sought independent evidence for our behavioral findings.
Image analysis was performed using SPM5 (www.fil.ion.ucl.ac.uk/spm). For each session, the first five images were discarded to account for T1 equilibration effects. The remaining images were realigned to the sixth volume (to correct for head movements), unwarped using fieldmaps, spatially normalized to the Montreal Neurological Institute (MNI) standard brain template, and smoothed spatially with a three-dimensional Gaussian kernel of 8 mm full-width at half-maximum (FWHM) (and resampled, resulting in 3 × 3 × 3 mm voxels). Low-frequency artifacts were removed using a 1/128 Hz high-pass filter and temporal autocorrelation intrinsic to the fMRI time series was corrected by prewhitening using an AR(1) process.
Single-subject contrast maps were generated using parametric modulation in the context of the general linear model. We performed an analysis, examining variance in regional BOLD response attributable to different regressors of interest: U, D, and V for all options over all drug conditions. This allowed us to identify regions implicated in the evaluation and integration of different components of value (in the placebo condition) and to look for any differences in these activations across drug conditions.
U, D, and V for each option (two per trial) were calculated using the canonical parameter estimates (K and r) in the context of our discounted utility model and convolved with the canonical hemodynamic response function (HRF) at the onset of each option. All onsets were modeled as stick functions and all regressors in the same model were orthogonalized (in the orders stated above) before analysis by SPM5. To correct for motion artifacts, the six realignment parameters were modeled as regressors of no interest in each analysis. In an additional analysis, we removed any potential confound relating to the orthogonalization of the regressors in our fMRI analysis by implementing another regression model but now removing the orthogonalization step. Here regressors were allowed to compete for variance such that in this more conservative model any shared variance components were removed, revealing only unique components of U, D, and V. Under this model, we again observed the same differences in D and V across drug conditions and no difference in U, although the magnitude of the differences was reduced.
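The effect of serial orthogonalization on the parametric regressors can be illustrated with a short sketch. This is a generic Gram-Schmidt-style procedure written for illustration, not SPM5's own code; the function names and the toy regressor values are assumptions. Each regressor keeps only the variance not already explained by the regressors entered before it, which is why the order of entry matters.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(regressors):
    """Serially orthogonalize regressors in the order given.

    Each regressor has its projection onto every earlier (already
    orthogonalized) regressor removed; the first is left unchanged.
    """
    out = []
    for col in regressors:
        resid = list(col)
        for prev in out:
            denom = dot(prev, prev)
            if denom > 0:
                coef = dot(resid, prev) / denom
                resid = [x - coef * p for x, p in zip(resid, prev)]
        out.append(resid)
    return out

# Toy values for U, D, and V over four option onsets (hypothetical)
U = [1.0, 2.0, 3.0, 4.0]
D = [1.0, 0.0, 1.0, 0.0]
V = [2.0, 1.0, 0.0, 3.0]
U_o, D_o, V_o = orthogonalize([U, D, V])
print(dot(U_o, D_o), dot(U_o, V_o), dot(D_o, V_o))
```

Without this step, correlated regressors compete for shared variance, so only the variance unique to each is attributed to it, as in the more conservative model described above.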
At the second level (group analysis), regions showing significant modulation by each of the regressors specified at the first level were identified through random-effects analysis of the β images from the single-subject contrast maps. We included the change in impulsivity measure (difference in number of sooner chosen) as a covariate when performing the contrast relating to differences in l-dopa and placebo trials. We report results for regions where the peak voxel-level t value corresponded to p < 0.005 (uncorrected), with minimum cluster size of five. Coordinates were transformed from the MNI array to the stereotaxic array of Talairach and Tournoux (1988) (http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach).
The structural T1 images were coregistered to the mean functional EPI images for each subject and normalized using the parameters derived from the EPI images. Anatomical localization was performed by overlaying the t maps on a normalized structural image averaged across subjects and with reference to the anatomical atlas of Mai et al. (2003).
To examine the effect of decision conflict (choice difficulty) on decision latency, we calculated a measure of difficulty for each of the 220 choices by calculating the difference in discounted utility (ΔV) of the two options. This measure was calculated using the discounted utility model and the canonical parameter estimates (for the same reason they were used in the fMRI analyses). A linear regression was then performed to model the relationship between the decision latency for each choice and the difficulty measure. The parameter estimates (βs) were then used as a summary statistic and a second level analysis was performed by means of a one-sample t test comparing the βs against zero. This was performed separately for the group in each drug condition. To test for any differences in the relationship between conflict and latency across drug conditions, we used paired samples t tests.
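The two-level summary-statistic approach described above can be sketched as follows: a regression slope of latency on the difficulty measure is estimated for each subject, and the slopes are then tested against zero across subjects. Function names and the example values are hypothetical, and the sketch returns only the t statistic (the p value would be obtained from a t distribution with n − 1 degrees of freedom).

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def one_sample_t(values, mu=0.0):
    """One-sample t statistic for the mean of `values` against mu."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    return (m - mu) / math.sqrt(var / n)

# Hypothetical data: per subject, (difference in discounted utility, latency in s)
subjects = [
    ([5.0, 20.0, 40.0, 60.0], [2.1, 1.8, 1.5, 1.2]),
    ([3.0, 15.0, 35.0, 70.0], [2.4, 2.2, 1.9, 1.4]),
    ([8.0, 25.0, 45.0, 55.0], [1.9, 1.7, 1.6, 1.3]),
]
slopes = [ols_slope(dv, rt) for dv, rt in subjects]
print(slopes, one_sample_t(slopes))
```

A negative mean slope corresponds to longer latencies as the difference in value between the options shrinks, that is, slowing with increasing decision conflict.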
We first analyzed the effects of the drug manipulation on behavior by considering the proportion of smaller–sooner relative to larger–later options chosen, of a total of 220 choices, made in each condition. These data revealed a marked increase in the number of sooner options chosen in the l-dopa condition relative to the placebo condition (mean 136 vs 110, p = 0.013) (Table 1, Fig. 2). Strikingly, this pattern was observed in all subjects where this comparison could be made. There was no significant difference between haloperidol and placebo conditions on this disposition. Note, the task consisted of the same choice array in each condition.
We next used maximum likelihood estimation to find the best-fitting parameters (K and r) for the discounted utility model, for each subject in each condition, to determine whether a specific effect on either of these parameters mediated the observed increase in behavioral impulsivity. By comparing the estimated parameters controlling the discount rate and utility concavity across conditions, a specific effect of l-dopa on the discount rate was found, with no effect on utility concavity (Table 1, Fig. 2, and supplemental Table 1, available at www.jneurosci.org as supplemental material). Thus, under l-dopa, a higher discount rate was observed relative to placebo (p = 0.01), leading to a greater devaluation of future rewards. By way of illustration, using a group canonical parameter estimate to plot a discount function for each drug condition, it can be seen that under placebo it required a delay of ~35 weeks for a £150 reward to have a present (subjective) value of £100; under l-dopa, however, the same devaluation took place with a delay of just 15 weeks (Fig. 2). Canonical parameter estimates used for the imaging analyses were 0.0293 for K and 0.0019 for r (all values of K reported are calculated from time units of weeks).
In accordance with Pine et al. (2009), parameter estimates for each subject (across conditions) were greater than zero, revealing both a significant effect of temporal discounting (p < 0.001) and nonlinearity (concavity) of instantaneous utility (p < 0.05). Note that unlike traditional models of intertemporal choice (Mazur, 1987), where choice outcome is solely a function of K, the model used here entails that the number of sooner options chosen also depends on the r parameter (see Materials and Methods) (Pine et al., 2009) and hence K is not in itself a pure measure of choice impulsivity. Further, the accuracy of estimated parameters depends on both the stochasticity and consistency of subjects' responses. For example, the estimated parameters in subject 13's placebo trial were anomalous in relation to the rest of the data (supplemental Table 1, available at www.jneurosci.org as supplemental material), indicating this subject could have made inconsistent choices in this session. When comparing across subjects, note that the number of sooner choices made is also dependent on the choice set the subject received (one of two).
Additionally, we examined whether a slowing down in decision latencies was apparent as choices became increasingly difficult—consequent upon increasing closeness in option values—and whether any group differences were apparent on this measure. We performed a regression to assess the relationship between decision latency and the difficulty of each choice as measured by the difference in discounted utility (ΔV) between the two choice options, calculated using the estimated parameter values. In placebo (p < 0.001), l-dopa (p < 0.001), and haloperidol (p < 0.001) conditions, subjects' decision latencies increased as ΔV got smaller, that is, as the difference in subjective value between the options got smaller. However, no overall difference was observed in this measure across drug conditions. This indicates that, unlike the choice outcome, dopamine manipulation did not influence the amount of time given to weigh a decision, or ability to “hold your horses,” and corroborates the suggestion that impulsivity is not a unitary construct (Evenden, 1999; Ho et al., 1999; Winstanley et al., 2004a; Dalley et al., 2008). This observation accords with a previous finding that dopamine medication status in PD was not associated with change in decision latencies in a different choice task (Frank et al., 2007).
Subjective effects were analyzed by comparing changes in the three factors identified by Bond and Lader (1974), namely, alertness, contentedness, and calmness, relative to the change in scores observed in the placebo condition. Differences were found in the haloperidol versus placebo conditions, where subjects were less alert under haloperidol (p < 0.05).
To establish how enhanced impulsivity under l-dopa was represented at a neural level, we applied three (orthogonalized) parametric regressors, U, D, and V, associated with the presentation of each option, as dictated by our model, to the brain imaging data. The regressors were created for each subject, in each condition, using canonical parameter values estimated from all subjects' choices over all sessions, in a test of the null hypothesis that brain activity does not differ between conditions.
In a preliminary analysis, we examined correlations for these three regressors in the placebo condition to replicate previous findings (Pine et al., 2009). Our results (supplemental Results, available at www.jneurosci.org as supplemental material) were consistent with those shown previously, in that D, U, and V all independently correlated with activity in the caudate nucleus (among other regions). This supports a hierarchical, integrated view of option valuation where subcomponents of value are dissociably encoded and then combined to furnish an overall value used to guide choice.
The critical fMRI analyses focused on the key behavioral difference in option valuation under l-dopa compared with placebo conditions. When comparing neural activity for U, D, and V, significant differences were found for both D and V, a finding that matches the behavioral result. Specifically, we observed enhanced activity in regions relating to the discount factor D under l-dopa relative to placebo conditions (Fig. 3a and supplemental Results, available at www.jneurosci.org as supplemental material) and no effect of haloperidol (that is, the regression coefficients in the placebo and haloperidol condition did not differ significantly). These regions included the striatum, insula, subgenual cingulate, and lateral orbitofrontal cortices. These results show that the characteristic decrease in activity of these regions as rewards become more delayed (or increase as they become temporally closer) (McClure et al., 2004; Tanaka et al., 2004; Kable and Glimcher, 2007; Pine et al., 2009) (see also supplemental Results for placebo, available at www.jneurosci.org as supplemental material) is more marked in the l-dopa relative to placebo conditions, in a manner that parallels the behavioral finding, where l-dopa increased preference for sooner rewards by increasing the discount rate, thereby rendering sooner rewards more attractive relative to later rewards. Moreover, just as there was no significant difference in the estimated r parameter across these trials, we observed no significant difference in U activity between l-dopa and placebo trials, indicating that l-dopa did not affect the encoding of reward utility.
Previous studies (Kable and Glimcher, 2007; Pine et al., 2009), as well as an analysis of the placebo group alone, implicate striatal regions, among others, in encoding discounted utility (V). When comparing regions correlating with V, decreased activity was observed in caudate, insula, and lateral inferior frontal regions, in l-dopa compared with placebo conditions (Fig. 3b and supplemental Results, available at www.jneurosci.org as supplemental material). This result indicates that for a reward of a given magnitude and delay, reduced activity in regions encoding subjective value (discounted utility) was engendered by l-dopa. This reduction was associated with the enhanced temporal discounting, and led to an increase in the selection of smaller–sooner (impulsive) options in this condition relative to placebo.
Because the fMRI data used the same single set of canonical parameters (across all conditions, testing the null hypothesis that they are all the same), these findings accord with the behavioral results whereby increasing the discount rate under l-dopa leads to a reduction in D, leading to a corresponding reduction in V and, hence, an increased relative preference for sooner rewards. Note that if dopamine encoded discounted utility alone, one would predict the opposite result, with greater activity in the l-dopa condition.
Inspection of the behavioral results (Table 1, Fig. 2) revealed that an increase in impulsivity following l-dopa was expressed to a greater extent in some subjects than in others. On this basis, we performed a covariate analysis on the previous contrasts by calculating a difference score of the number of sooner options chosen in the placebo and l-dopa trials. The larger this metric, the greater the increase in impulsivity (discount rate) induced by l-dopa. By regressing this quantity as a covariate in the contrast comparing D in l-dopa minus placebo conditions (Fig. 3a), we found a significant correlation with activity in the amygdala (bilaterally) (Fig. 4). Because the difference in choice score across subjects may have been partially affected by the fact that subjects were assigned to one of two possible choice sets, and to increase power (being able to include more subjects), we repeated this analysis, this time using the difference in estimated K values from placebo to l-dopa trials. The result of this analysis (see supplemental Results, available at www.jneurosci.org as supplemental material) again demonstrated a strong positive correlation between amygdala activity and degree of increase in K from placebo to l-dopa trials. These results suggest that individual subject susceptibility to impulsivity under the influence of l-dopa is modulated by the degree of amygdala response to temporal proximity of reward.
Existing theories of dopamine focus on its role in reward learning, where dopamine is thought to mediate a prediction error signal used to update the values of states and actions that allow prediction and control, respectively, during decision-making. These models have been used to illustrate how abnormal dopamine processing might lead to impulsive and addictive behaviors, on the basis of experience (i.e., through learning) (Redish, 2004; Frank et al., 2007; Dagher and Robbins, 2009). Here, a distinct aspect of impulsivity was explicitly probed, based on the relationship of the timing of rewards and their utility, independently of feedback and learning. In intertemporal choice, decision-makers must choose between rewards of differing magnitude and delay. This is achieved by discounting the value of future amounts of utility (in accordance with their delay) to compare their present values. Within this framework, dopamine could potentially increase impulsive choice in two distinct ways (Pine et al., 2009), as follows: as a result of an increased rate of diminishing marginal utility for gains (which would decrease the subjective instantaneous value of larger magnitude relative to smaller magnitude rewards), or through enhanced temporal discounting of future rewards. Our results suggest that dopamine selectively impacts on the discount rate, without any significant effect on the utility function. Moreover, these behavioral results were independently supported by the fMRI data in that the key difference engendered by l-dopa was a modulation of neural responses in regions associated with the discounting of rewards and, consequently, their overall subjective value, with no effects evident for the actual utility of rewards. In summary, this study provides evidence that dopamine controls how the timing of a reward is incorporated into the construction of its ultimate value. This suggests a novel mechanism through which dopamine controls human choice and, correspondingly, traits such as impulsiveness.
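The two candidate mechanisms described above can in principle be told apart, as the following minimal sketch suggests. It assumes hyperbolic discounting with rate K and an exponential utility function with concavity r; both functional forms and all parameter values are illustrative assumptions. The quantity of interest is the value of a smaller reward relative to a larger one.

```python
import math


def value(magnitude, delay_weeks, k, r):
    """Assumed V = U / (1 + K * delay), with U = (1 - exp(-r * M)) / r."""
    u = (1.0 - math.exp(-r * magnitude)) / r
    return u / (1.0 + k * delay_weeks)


def relative_value(small, large, k, r):
    """Ratio V(small) / V(large); options are (magnitude, delay in weeks)."""
    return value(*small, k, r) / value(*large, k, r)


# Hypothetical options: same magnitudes, with equal or unequal delays.
small_same_delay, large_same_delay = (20, 4), (40, 4)
small_sooner, large_later = (20, 1), (40, 12)

# Hypothetical baseline and elevated parameters.
K0, K1 = 0.02, 0.2    # discount rate
R0, R1 = 0.01, 0.05   # utility concavity

# Equal delays: the discount factor cancels, so only r matters.
print(relative_value(small_same_delay, large_same_delay, K0, R0))
print(relative_value(small_same_delay, large_same_delay, K1, R0))  # same as above
print(relative_value(small_same_delay, large_same_delay, K0, R1))  # larger

# Unequal delays: raising either K or r favors the smaller-sooner option.
print(relative_value(small_sooner, large_later, K0, R0))
print(relative_value(small_sooner, large_later, K1, R0))
print(relative_value(small_sooner, large_later, K0, R1))
```

Under these assumptions, a change in utility concavity alters the relative value of smaller rewards even when delays are matched, whereas a change in the discount rate acts only through differences in delay. This is what allows a selective effect on the discount rate to be distinguished from an effect on the utility function.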
Our results add weight to the suggestion that impulsivity is not a unitary construct and, moreover, that different subtypes of impulsiveness can be dissociated pharmacologically and neurobiologically (Evenden, 1999; Ho et al., 1999; Winstanley et al., 2004a; Dalley et al., 2008). The effects of dopamine were observable only in impulsive choice, as measured by choice outcome/preference, and did not impact on the deliberation (“holding your horses”; Frank et al., 2007) that occurs when options are closely valued, engendering decision–conflict (Botvinick, 2007; Pochon et al., 2008), which is also related to reflection or preparation impulsiveness (Evenden, 1999; Clark et al., 2006).
No human study has as yet demonstrated dopamine's propensity to enhance temporal impulsiveness. Previous dopamine manipulations in rodents have shown inconsistent effects in intertemporal choice, with some showing that dopamine enhancement leads to a decrease in impulsive choice or that dopamine attenuation leads to an increase (Richards et al., 1999; Cardinal et al., 2000; Wade et al., 2000; Isles et al., 2003; Winstanley et al., 2003; van Gaalen et al., 2006; Bizot et al., 2007; Floresco et al., 2008), whereas others demonstrate the opposite, a dose-dependent effect, or no effect (Logue et al., 1992; Charrier and Thiébot, 1996; Evenden and Ryan, 1996; Richards et al., 1999; Cardinal et al., 2000; Isles et al., 2003; Helms et al., 2006; Bizot et al., 2007; Floresco et al., 2008). A number of factors may contribute to these discrepancies, namely, whether the manipulation occurs prelearning or postlearning, whether a cue is present during the delay, presynaptic versus postsynaptic drug effects, the paradigm used, the drug used/receptor targeted, the involvement of serotonin, and particularly the drug dosage. Human studies of intertemporal choice have observed an increase in self-control (de Wit et al., 2002) or no effect (Acheson and de Wit, 2008; Hamidovic et al., 2008) when enhancing dopamine function. Most of these studies are complicated by their use of monoaminergic stimulants such as amphetamine or methylphenidate, which are often thought to decrease impulsivity. These studies could be confounded by the concomitant release of serotonin (Kuczenski and Segal, 1997), which is also implicated in the modulation of intertemporal choice. 
Specifically, it has been shown that enhancing serotonin function can reduce impulsiveness in intertemporal choice or vice versa (Wogar et al., 1993; Richards and Seiden, 1995; Poulos et al., 1996; Ho et al., 1999; Mobini et al., 2000) and that destruction of serotonergic neurons can block the effects of amphetamine (Winstanley et al., 2003). Furthermore, extensive evidence suggests that moderate doses of amphetamine reduce dopamine neurotransmission via presynaptic effects, which may explain its dose-dependent effects in many previous studies as well as its therapeutic efficacy (in moderate doses) in putatively hyperdopaminergic ADHD (Seeman and Madras, 1998, 2002; Solanto, 1998, 2002; Solanto et al., 2001; de Wit et al., 2002). l-Dopa has not previously been used to affect impulsive choice, and perhaps offers more compelling and direct evidence for dopamine's role. Although l-dopa can lead to increases in noradrenaline and its precise mode of action is not well understood, noradrenaline is not thought to play a major role in the regulation of intertemporal choice (van Gaalen et al., 2006). Additionally, it is possible that l-dopa could have caused subjective effects that were not picked up by the subjective scales used here.
Our failure to find a corresponding reduction in impulsivity relative to placebo with administration of the putative dopaminergic antagonist haloperidol is likely to reflect a number of factors. These include haloperidol's nonspecific and widespread pharmacological effects or dosage—some studies indicate haloperidol may paradoxically boost dopamine in small doses, due to presynaptic effects on the D2 autoreceptor (Frank and O'Reilly, 2006). Additionally, the subjective effects caused by the drug, including reduction in alertness, may have made the data noisier. Further studies should use more specific dopamine antagonists to assess whether a reduction in dopamine function can decrease impulsivity in humans.
Dopamine is known to have a dominant effect on primitive reward behaviors such as approach and consummation (Parkinson et al., 2002). Such effects are consistent with a broad role in the construction of incentive salience (Berridge, 2007; Robinson and Berridge, 2008) and are more difficult to account for in terms of learning, per se. The mediation of unconditioned and conditioned responses by dopamine relates to the concept of Pavlovian impulsivity, where responses associated with primary, innate values form a simple, evolutionarily specified action set operating alongside, and sometimes in competition with, other control mechanisms, such as habit-based and goal-directed action (Dayan et al., 2006; Seymour et al., 2009). Importantly, these “Pavlovian values and actions” are characteristically dependent on spatial and temporal proximity to rewards and, as such, provide one possible mechanism via which dopamine could control the apparent rate of temporal discounting. If such a process underlay dopamine-induced impulsivity in this task, then it would suggest that this innate (Pavlovian) response system operates in a much broader context than currently appreciated, since the rewards in this task are secondary rewards occurring at a minimum of 1 week. This explanation stands in contrast to the idea of a selective dopaminergic enhancement of a system (based in limbic areas) that only values short-term rewards (McClure et al., 2004). Such a dual-system account would be difficult to reconcile with previous studies (Kable and Glimcher, 2007; Pine et al., 2009), which suggest that limbic areas value rewards at all delays.
Such an account raises important questions about the amygdala-dependent susceptibility to dopamine-induced impulsivity that we observed in our data. Here, amygdala activity in response to D covaried with the degree to which behavior became more impulsive following l-dopa. In Pavlovian–instrumental transfer (PIT), a phenomenon dependent on connectivity between amygdala and striatum (Cardinal et al., 2002; Seymour and Dolan, 2008), and whose expression is known to be modulated by dopamine (Dickinson et al., 2000; Lex and Hauber, 2008), appetitive Pavlovian values increase responding for rewards. Notably, individual susceptibility to this influence correlates with amygdala activity (Talmi et al., 2008), suggesting that the amygdala might modulate the extent to which primary conditioned and unconditioned reward values influence instrumental (habit-based and goal-directed) choice. If this is indeed the case, then it predicts that concurrent and independent presentation of reward-cues during intertemporal choice might elicit enhanced temporal impulsivity via an amygdala-dependent mechanism. We note evidence that basolateral amygdala lesions increase choice impulsivity in rodents (Winstanley et al., 2004b), an observation opposite to what we would expect based on the current data. In contrast, amygdala activity has previously been reported to correlate with the magnitude of temporal discounting in an fMRI study (Hoffman et al., 2008). These issues provide a basis for future research that can systematically test these divergent predictions in humans.
Lastly, these results speak to a wider clinical context and offer an explanation as to why an increase in impulsive and risky behaviors is observed in dopamine dysregulation syndrome, addiction, and ADHD, all of which are associated with hyperdopaminergic states caused by striatal dopamine flooding or sensitization (Solanto, 1998, 2002; Seeman and Madras, 2002; Berridge, 2007; Robinson and Berridge, 2008; Dagher and Robbins, 2009; O'Sullivan et al., 2009). In support of this thesis, Voon et al. (2009) found that dopamine medication status in PD patients with impulse-control disorders was associated with increased rates of temporal discounting. In conclusion, the results presented here demonstrate dopamine's ability to enhance impulsivity in humans and offer a novel insight into its role in modulating impulsive choice in the context of temporal discounting. These findings suggest that humans may be susceptible to temporary periods of increased impulsivity when factors that increase dopamine activity, such as the sensory qualities of rewards, are present during decision-making.
This work was funded by a Wellcome Trust Programme Grant to R.J.D., and A.P. was supported by a Medical Research Council studentship. We thank K. Friston, J. Roiser, and V. Curran for help with planning and analyses, and for insightful discussions.