Negative reinforcement results in behavior to escape or avoid an aversive outcome. Withdrawal symptoms are purported to be negative reinforcers in perpetuating substance dependence, but little is known about negative reinforcement learning in this population. The purpose of this study was to examine reinforcement learning in substance dependent individuals (SDI), with an emphasis on assessing negative reinforcement learning. We modified the Iowa Gambling Task to separately assess positive and negative reinforcement. We hypothesized that SDI would show differences in negative reinforcement learning compared to controls and we investigated whether learning differed as a function of the relative magnitude or frequency of the reinforcer.
Thirty subjects dependent on psychostimulants were compared with 28 community controls on a decision making task that manipulated outcome frequencies and magnitudes and required an action to avoid a negative outcome.
SDI did not learn to avoid negative outcomes to the same degree as controls. This difference was driven by the magnitude, not the frequency, of negative feedback. In contrast, approach behaviors in response to positive reinforcement were similar in both groups.
Our findings are consistent with a specific deficit in negative reinforcement learning in SDI. SDI were relatively insensitive to the magnitude, not frequency, of loss. If this generalizes to drug-related stimuli, it suggests that repeated episodes of withdrawal may drive relapse more than the severity of a single episode.
Failure to learn from negative feedback may be manifested as poor decision making in the face of choices that involve ambiguity and risk. A laboratory task used to assess such decision making is the Iowa Gambling Task (IGT; Bechara et al., 1994). In this task, participants decide on each trial which one of four decks of cards to play. The decks vary in the magnitude and probability of short-term and long-term monetary gains and losses, such that over time, two decks result in a net gain and two result in a net loss. To be successful, participants must learn to play the advantageous decks instead of the disadvantageous decks; this involves focusing on the long-term expected value of the decks, not just on short-term rewards. Substance dependent individuals (SDI) often fail to learn to ignore the decks with negative long-term consequences (Bechara et al., 2001; Bechara and Damasio, 2002; Bolla et al., 2003; Grant et al., 2000; Verdejo et al., 2004, 2006, 2007). Individuals dependent on stimulants perform worse than other SDI (Gonzalez et al., 2007) and pharmacologic therapies for drug dependence may further influence performance (Pirastu et al., 2006). (See Buelow and Suhr (2009) for a review of studies on the IGT in SDI.)
The IGT payment schedule is complex (Fellows, 2007), with the contingencies of each deck confounded by the size and frequency of gains and losses in an intermittent reinforcement paradigm. Previous research suggests that magnitude and frequency information may be important variables. For example, van den Bos et al. (2006) found that manipulating relative reward magnitude of good and bad decks led to different choice behavior in control subjects. Mathematical models have also been used to better understand factors that influence performance on the IGT; for example, Stout et al. (2004) found cocaine abusers to be less influenced by losses and more sensitive to gains, and Fridberg and colleagues (2010) found that cannabis users were under-influenced by loss magnitude compared to controls. Further, Frank and Claus (2006) proposed that performance on the IGT relies on the integration of magnitude and frequency information, which is represented across distinct neural regions. However, no empirical study to date has used a procedure that disentangles the influences of magnitude and frequency of gains and losses on the IGT.
The original IGT evaluates the effects of positive reinforcement and punishment on behavior, but another form of learning, negative reinforcement, may play an important role in the persistence of addiction. While initial drug use is largely driven by positive reinforcement, the positive reinforcing effects of drugs decrease over time (Ahmed et al., 2002; Volkow et al., 1997). SDI may persist or reinitiate using drugs, not because of positive effects, but rather to escape or avoid withdrawal symptoms and/or negative affective states. In this scenario, unpleasant physical or psychological conditions serve as negative reinforcers to reinitiate drug use (Koob and Le Moal, 2001). This theory suggests that SDI's continued maladaptive behavior may be influenced significantly by negative reinforcement, in which a behavior is acquired and maintained by the escape or avoidance of an aversive consequence. Examples of negatively reinforced behaviors in everyday life include wearing a seatbelt to avoid an aversive sound, stopping at a Stop sign to avoid a ticket or accident, and saying no to a “get rich quick” scheme to avoid losing one's money. This type of learning has not been studied in SDI so it is not known whether SDI are similar to, worse than, or better than normal controls in their negative reinforcement learning.
The purpose of this study was to examine the effects of negative and positive reinforcement in SDI while controlling for frequency and magnitude information using a decision making task based on the IGT. We adopted a modified version of the IGT (mIGT) used in previous studies that required an active response to avoid cards from disadvantageous decks (Cauffman et al., 2010; Tanabe et al., 2007). Subjects had to learn to increase specific behaviors to minimize loss (negative reinforcement) and maximize gain (positive reinforcement). We modified the task further so that two decks varied only in the magnitude of gains and losses, and two decks varied in the frequency of gains and losses.
Secondarily, we compared this modified IGT with other measures commonly used in the IGT research literature (general intelligence, self-reported impulsivity, delay discounting, risk taking, and executive function) to explore whether the relationships between these measures and our mIGT were similar to those found with the IGT. IGT performance usually does not correlate with general intelligence (Bechara et al., 1994; Grant et al., 2000) or executive function as exemplified by the Wisconsin Card Sorting Test (Brand et al., 2006), but typically is correlated with self-reported impulsivity (Franken et al., 2008; Sweitzer et al., 2008). Studies of risk taking (e.g., Balloon Analogue Risk Task; BART) and delay discounting have usually shown poorer performance in SDI (Crowley et al., 2006; Madden et al., 1997; Petry, 2002; Reynolds, 2006), but correlations with IGT have been mixed (Hammers and Suhr, 2010; Monterosso et al., 2001; Reynolds, 2006; Sweitzer et al., 2008).
Study participants were 30 SDI and 28 community controls (CTL). All SDI were involved in residential treatment programs at the University of Colorado School of Medicine. Twenty-eight were enrolled in the Addiction Research and Treatment Service (ARTS) program and two were at the Center for Dependency, Addiction, and Rehabilitation (CeDAR).
Male and female SDI were recruited if they met DSM-IV criteria for dependence upon cocaine, methamphetamine, or both (APA, 2000). Twenty-nine participants were also dependent upon other drugs, which is typical of the polysubstance abuse seen in this population (see Table 1). SDI were at ARTS for at least 60 days before becoming eligible for the study, during which time they were monitored for drug use by observation and toxicology screening. CeDAR is a 30-day residential program, with a minimum of 14 days abstinence prior to recruitment. Self-reported abstinence was 1.34 years (SD=0.98); many SDI entered treatment directly from jail or prison, but we have verification only for the time each subject was in treatment prior to participating in this study.
Control participants (CTL) were recruited from the community via newspaper ads, flyers, and a research firm that provided names of individuals living in the same neighborhoods as the SDI. CTL were excluded if they met criteria for dependence upon alcohol or any drug except tobacco. Seven CTL were dependent on tobacco.
All candidates were excluded if they had a history of head trauma with loss of consciousness greater than 15 minutes, neurological illness, schizophrenia, bipolar disorder, or current major depression.
Participants completed diagnostic structured interviews, cognitive and behavioral tasks, and an impulsivity questionnaire. All measures were administered according to standardized procedures by a trained research assistant; sessions ranged from 2–4 hours, with breaks as needed. All participants provided written informed consent approved by the Colorado Multiple Institutional Review Board.
This computerized structured interview (Cottler et al., 1989, 1995) was administered to characterize the substance dependence diagnoses of the SDI and to ensure that CTL did not meet criteria for any dependence diagnosis other than tobacco. Results provided DSM-IV diagnoses and symptom counts for tobacco, alcohol, and nine other drug categories (stimulants, cocaine, marijuana, hallucinogens, opioids, inhalants, sedatives, club drugs, and PCP).
This computerized structured interview provides information about psychiatric diagnoses according to the DSM-IV (Robins et al., 1995). Three modules were administered to exclude subjects with a history of schizophrenia or bipolar disorder, or current major depression.
Participants completed a modified IGT (mIGT) incorporating four changes, but maintaining an intermittent reinforcement schedule similar to the IGT. First, rather than allowing participants to choose a card from any of the decks on each trial, the computer presented a card from one of the four decks for the participant to “Play” or “Pass”; either choice required a button press. In that way, learning to avoid a bad deck required an active response. If the subject pressed neither Play nor Pass within 1.2 seconds of stimulus onset, “No Response” was recorded and the next card was presented. Second, decks were presented in a pseudo-random order to ensure that participants received identical outcomes after a given number of “Play” responses, thus allowing participants to learn the nature of the decks at a similar rate. Third, the outcome was a single positive or negative monetary value rather than a gain that was intermittently accompanied by a loss (Cauffman et al., 2010; Peters and Slovic, 2000; Tanabe et al., 2007). A fourth change separated decks on the basis of type of feedback provided. Two decks were advantageous and two decks were disadvantageous, but in our mIGT two decks differed only in the magnitude of gain and loss (keeping the frequency of gain/loss constant), while the other two decks differed only in the frequency of gain and loss (keeping the magnitude of gain/loss constant). (See Table 2 for payout structure).
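The deck design described above can be illustrated with a minimal sketch. The payouts and probabilities below are hypothetical and are not the values in Table 2; the `Deck` class and the deck names are likewise assumptions made for illustration. The sketch shows only that one pair of decks can differ in magnitude alone, and another pair in frequency alone, while the good decks (and the bad decks) share the same expected value per played card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deck:
    """One deck whose cards each show a single gain or a single loss."""
    gain: float    # dollars won on a gain card (hypothetical)
    loss: float    # dollars lost on a loss card, as a positive number (hypothetical)
    p_gain: float  # probability that a played card is a gain (hypothetical)

    def expected_value(self) -> float:
        """Expected dollars per played card."""
        return self.p_gain * self.gain - (1.0 - self.p_gain) * self.loss


# Hypothetical payouts chosen for illustration; NOT the study's Table 2.
decks = {
    # Magnitude pair: gain/loss frequency held at 50/50, amounts differ.
    "good_magnitude": Deck(gain=100, loss=50, p_gain=0.5),
    "bad_magnitude": Deck(gain=50, loss=100, p_gain=0.5),
    # Frequency pair: amounts held constant, gain/loss frequency differs.
    "good_frequency": Deck(gain=100, loss=100, p_gain=0.625),
    "bad_frequency": Deck(gain=100, loss=100, p_gain=0.375),
}

expected = {name: deck.expected_value() for name, deck in decks.items()}
```

Under these assumed values, a participant who tracked only expected value could not tell the two bad decks apart, so any behavioral difference between them would have to come from how magnitude and frequency information are processed.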
Participants started with a $2000 credit. If the subject pressed Play, a monetary outcome (gain or loss) was displayed, and this amount was added to or subtracted from the running total. If the subject pressed Pass, the running total remained the same. The task was programmed in E-prime 2.0 (Psychology Software Tools, 2010) and given during functional MRI (fMRI) scanning, the results of which will be reported separately. Each deck was presented 50 times for a total of 200 trials and interspersed with fixation trials. Subjects were told that they could earn an extra $10.00 if they did well on the game. In fact, all subjects received the $10.00 regardless of their performance.
To examine reinforcement learning, we compared the active responses (Pass or Play) made during the first half of the task [Pass-1 or Play-1] with responses made during the second half [Pass-2 or Play-2]. The primary variables of interest were number of Pass responses on Bad Decks to measure negative reinforcement and number of Play responses on Good Decks to measure positive reinforcement.
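The half-split scoring can be sketched as follows. This is an illustrative reconstruction, not the study's scoring code: the function names, the response labels ("play", "pass", "none"), and the choice to split at the midpoint of the trial sequence with fixation trials already removed are all assumptions.

```python
from collections import Counter


def half_split_counts(trials):
    """Count responses by (deck, response, half).

    trials: sequence of (deck, response) pairs in presentation order,
    with fixation trials already removed (assumption). Half 1 is the
    first half of the sequence, half 2 the second half.
    """
    trials = list(trials)
    midpoint = len(trials) // 2
    counts = Counter()
    for index, (deck, response) in enumerate(trials):
        half = 1 if index < midpoint else 2
        counts[(deck, response, half)] += 1
    return counts


def change_across_halves(counts, deck, response):
    """Second-half count minus first-half count for one deck and response."""
    return counts[(deck, response, 2)] - counts[(deck, response, 1)]


# Toy sequence of eight trials (hypothetical data).
toy_trials = [
    ("bad_magnitude", "play"), ("bad_magnitude", "play"),
    ("good_magnitude", "play"), ("bad_magnitude", "pass"),
    ("bad_magnitude", "pass"), ("bad_magnitude", "pass"),
    ("good_magnitude", "play"), ("bad_magnitude", "none"),
]
toy_counts = half_split_counts(toy_trials)
pass_bad_change = change_across_halves(toy_counts, "bad_magnitude", "pass")
```

A positive change in Pass responses on a bad deck corresponds to the negative reinforcement learning measure; a positive change in Play responses on a good deck corresponds to the positive reinforcement measure.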
The Barratt Impulsiveness Scale, version 11 (BIS-11), is a 30-item self-report questionnaire that provides a measure of impulsivity (Patton et al., 1995).
The two-subtest version (Vocabulary and Matrix Reasoning) of the Wechsler Abbreviated Scale of Intelligence (WASI) was used to estimate general intelligence (Psychological Corporation, 1999).
Participants completed this computerized risk task (Lejuez et al., 2002), in which they earn hypothetical money by increasing the size of a balloon, but if the balloon “pops” (which can happen at any time), earnings for that balloon are lost. Each trial requires a decision between increasing earnings and “collecting” money already earned. The dependent variable was the average number of pumps, excluding balloons that popped (Lejuez et al., 2002).
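The dependent variable can be computed as in the short sketch below. The function name and the input layout (one `(pumps, popped)` pair per balloon) are assumptions made for illustration.

```python
def adjusted_average_pumps(balloons):
    """Mean number of pumps across balloons that did not pop.

    balloons: list of (pumps, popped) pairs, one per balloon.
    Popped balloons are excluded because the pop cuts the participant's
    intended pumping short.
    """
    kept = [pumps for pumps, popped in balloons if not popped]
    if not kept:
        raise ValueError("no unpopped balloons to average")
    return sum(kept) / len(kept)


# Hypothetical session: the 30-pump balloon popped and is excluded.
example_score = adjusted_average_pumps([(10, False), (30, True), (20, False)])
```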
Participants completed a computerized discounting task in which they made decisions to choose a hypothetical $1000 reward at some time in the future or a lesser amount now. There were seven delays ranging from 1 day to 10 years and 30 possible immediate amounts ranging from $1 to $999 (Green et al., 1994, 1996). To assess the rate of discounting of delayed reward, we used two approaches: (1) estimating the discounting rate from the hyperbolic equation V = A / (1 + kD), where V is the current subjective value of the delayed reward, A is the amount of the delayed reward, D is the delay to the reward, and k is a free parameter representing the rate of devaluation of the delayed reward; and (2) computing area under the curve (AUC) for each subject's response trajectory.
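Both discounting measures can be sketched in a few lines. The hyperbolic function follows the equation given above. The AUC function sums trapezoids over normalized delays and normalized subjective values; anchoring the curve at delay 0 with a value of 1.0 is an assumption of this sketch, as are the function names and the intermediate delays (the source states only that seven delays ranged from 1 day to 10 years).

```python
def hyperbolic_value(amount, delay, k):
    """Subjective value V = A / (1 + k * D) of a delayed reward."""
    return amount / (1.0 + k * delay)


def discounting_auc(delays, subjective_values, amount):
    """Area under the normalized discounting curve (0 to 1).

    Delays are expressed as a proportion of the longest delay and values
    as a proportion of the delayed amount. The curve is anchored at
    (0, 1.0), which is an assumption of this sketch.
    """
    points = sorted(zip(delays, subjective_values))
    max_delay = points[-1][0]
    xs = [0.0] + [d / max_delay for d, _ in points]
    ys = [1.0] + [v / amount for _, v in points]
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
    )


# Hypothetical delays in days (1 day to 10 years); intermediate values assumed.
delays_days = [1, 7, 30, 180, 365, 1825, 3650]
shallow = [hyperbolic_value(1000, d, k=0.001) for d in delays_days]
steep = [hyperbolic_value(1000, d, k=0.1) for d in delays_days]
auc_shallow = discounting_auc(delays_days, shallow, amount=1000)
auc_steep = discounting_auc(delays_days, steep, amount=1000)
```

A larger k produces a smaller AUC, so the two measures point in the same direction: steeper discounting appears as a higher k and a lower AUC. Unlike k, the AUC does not require the data to follow any particular curve.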
Participants completed the WCST (Heaton et al., 1993), a standardized test that requires utilization of feedback to shift cognitive sets. The variable of interest was number of perseverative errors.
Groups' demographics were compared with chi-square and independent t-tests. Dependent variables were inspected for homogeneity of variance and normal distribution. For variables that were not approximately normally distributed, non-parametric analyses, i.e., Mann-Whitney U tests, were performed. For normally-distributed variables, t-tests or analyses of variance (ANOVA) were calculated. If a demographic variable differed by group and correlated significantly with a normally-distributed dependent measure, it was included as a covariate in subsequent analyses.
For the mIGT, ANOVA was performed on two variables of interest (Pass Bad Decks, Play Good Decks) with initial models evaluating the between-subject effect of group (SDI, CTL) and within-subjects effects of time (1, 2) and type of feedback (magnitude, frequency) as well as all interactions. Nonsignificant interactions were removed sequentially, beginning with the three-way group by time by type interaction, followed by the least significant two-way interaction, and so on; the model was re-run after each removal until only main effects and significant interactions remained in the final model. For finer-grained qualitative analysis of behavior change, cumulative Pass response fractions as a function of deck and card number were calculated and graphed. Specific correlations between our primary measure of negative reinforcement learning (Pass Bad 2) and other dependent measures were performed using Pearson r for normally distributed variables and Spearman's rho for variables that were not. Analyses were performed with SPSS (SPSS, 2010).
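The cumulative Pass response fraction for a single deck can be sketched as below. The function name and response labels are hypothetical, and counting every presented card in the denominator (including cards with no response) is an assumption; the source does not state how missed trials were handled in this calculation.

```python
def cumulative_pass_fraction(responses):
    """Running fraction of Pass responses for one deck.

    responses: list of "pass", "play", or "none", in the order the deck's
    cards were presented. Entry i is the fraction of the first i + 1
    cards that were passed (all presented cards counted; assumption).
    """
    fractions = []
    passes = 0
    for card_number, response in enumerate(responses, start=1):
        if response == "pass":
            passes += 1
        fractions.append(passes / card_number)
    return fractions


# Hypothetical responses to the first four cards of one deck.
example_fractions = cumulative_pass_fraction(["play", "pass", "pass", "play"])
```

Plotting these fractions against card number for each deck and group gives a curve that rises when participants learn to pass a deck and stays flat when they do not.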
There were no differences in gender or IQ. (See Table 3.) The groups differed in age [t(56)=−2.32, p<.025] and education [t(56)=−3.31, p<.002]; CTL were older (Mean=37.29, SD=8.5) than SDI (Mean=32.53, SD=7.08) and had more years of education (Mean=13.18; SD=1.8) than SDI (Mean=11.5, SD=2.14). Correlations between these variables and the dependent measures were performed. Age correlated with No Response-1 on the mIGT (rho=−.344, p<.01). Education correlated with No Response-1 (rho=−.270, p<.05), No Response-2 (rho=−.454, p<.005), and BIS (r=−.440, p<.001). Education was entered as a covariate in analysis of BIS. “No Response” variables were not normally-distributed and were analyzed with Mann-Whitney U tests.
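As a worked check, an independent-samples t statistic with pooled variance can be recomputed from the reported means, standard deviations, and group sizes (30 SDI, 28 CTL). The sketch below assumes the pooled-variance form, which is consistent with the reported 56 degrees of freedom; the function name is hypothetical. Because the inputs are rounded summary values, small discrepancies from the reported statistics are to be expected.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance.

    Returns (t, degrees_of_freedom) for group 1 minus group 2.
    """
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Age: SDI (Mean=32.53, SD=7.08, n=30) vs. CTL (Mean=37.29, SD=8.5, n=28).
t_age, df_age = pooled_t(32.53, 7.08, 30, 37.29, 8.5, 28)
```

For age, this reproduces the reported value of about −2.32 on 56 degrees of freedom.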
ANOVA on Passing Bad Decks with group as the between-subjects variable and time (first half/second half) and feedback type (magnitude/frequency) as the within-subjects variables revealed a three-way interaction (group × feedback type × time) [F(1,56)=7.75, p=.007]. To determine which factors were driving the interaction, subsequent analyses compared groups over time on feedback type separately.
ANOVA on number of Passes on the Bad Magnitude deck revealed a group × time interaction [F(1,56)=9.67, p=.003]. CTL passed cards on the bad magnitude deck significantly more during the second half of the task [Mtime1=8.07, (SD=2.14) vs. Mtime2=12.21, (SD=6.30)] while SDI did not [Mtime1=6.67, (SD=1.90) vs. Mtime2=6.67, (SD=4.48)], suggesting greater learning via negative reinforcement in CTL. (See Figures 1 and 2.)
ANOVA of Play responses revealed no interactions among group, time, and/or feedback type effects; the only significant difference was a main effect of time [F(1,56)=7.53, p=.008] with fewer good cards played in the second half [Mtime2=32.72 (SD=7.19)] compared to the first half [Mtime1=34.67 (SD=5.22)]. (See Table 3.)
SDI had more No Responses than CTL during the first half (Mann-Whitney U=285.5, p=.034) and second half of the mIGT (Mann-Whitney U=257.0, p=.010). (See Table 3 for medians, means, and standard deviations.)
ANCOVA on BIS-11 scores with education as the covariate revealed a significant group difference [F(1,54)=20.36, p<.001], with SDI obtaining higher scores (Mean=75.31, SD=12.3) than CTL (Mean=59.60, SD=7.4).
No group differences were found on the BART.
After using Johnson and Bickel's (2008) procedure to exclude subjects with nonsystematic data, we analyzed data for 20 SDI and 18 CTL, using the k parameter as the measure of rate of discounting. A Mann-Whitney U test revealed a significant group difference, with the SDI discounting at a higher rate than CTL (U=96, p=.014). Because 19 subjects were excluded for the hyperbolic curve analysis, we also evaluated group differences of the entire sample utilizing area under the discounting response curve (AUC; Myerson et al., 2001). After normalizing delay and subjective values, a t-test revealed that the SDI group had a significantly lower AUC, consistent with greater discounting, than CTL (t(1,47.5)=−2.24, p=.03).
No group difference was found for perseverative errors.
Number of cards Passed on the Bad Decks during the second half of the mIGT (Pass Bad 2), the primary measure of negative reinforcement learning, was correlated with the four other measures: BIS, BART, delay discounting (DD), and WCST. No significant correlations were found between Pass Bad 2 and BART, DD, or WCST. A correlation was found between Pass Bad 2 and BIS in the SDI group (r=−.439, p=.017), but this correlation was not significant when Bonferroni correction was applied.
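The Bonferroni step can be made explicit with a short sketch. It assumes a family-wise alpha of .05 divided across the four correlations named above; the source does not state the number of comparisons used, so both values are assumptions, as is the function name.

```python
def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value is below the Bonferroni-adjusted threshold alpha / n_tests."""
    return p_value < alpha / n_tests


# BIS correlation in SDI: p = .017 against an assumed threshold of .05 / 4 = .0125.
bis_uncorrected = bonferroni_significant(0.017, n_tests=1)
bis_corrected = bonferroni_significant(0.017, n_tests=4)
```

Under these assumptions the BIS correlation passes the uncorrected .05 threshold but not the corrected one, which matches the pattern reported above.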
The purpose of the present study was to assess SDI performance on a decision making task that evaluated the influence of negative reinforcement on decision making behavior. The mIGT required subjects to actively press a “pass” button on bad decks as well as to actively press a “play” button on good decks in order to maximize long-term gain.
Increasing Passing behavior on bad decks was the way to avoid significant losses. This is different from the IGT in which participants are “punished” for Playing bad decks, and learning involves a decrease in Plays. In our task, participants were forced to confront each deck repeatedly, thus enabling us to evaluate their learning to Pass (or say “No”) on the bad decks. An analogy in the real world would be saying “no” to an alcoholic beverage before driving a car. This behavior is neither rewarded nor punished, but rather it occurs to avoid potentially adverse consequences such as an accident or ticket for driving under the influence (DUI).
Our findings revealed an interesting interaction in which SDI did not increase “passing” on the bad deck over time (learn to say “no”) when feedback was based exclusively on the magnitude of gain/loss, while CTL did. Neither group increased Passing on the Bad Frequency deck, although there was a trend in that direction. Our results are partially consistent with van den Bos et al. (2006) who found that manipulating magnitude significantly influenced control subjects' behavior on the original IGT. Magnitude feedback may require more working memory to track the sequence of payouts for each deck. If SDI are impaired in working memory, this might explain the group difference.
The overall expected outcome value, the variable thought to lead to successful performance on the IGT, was identical for the frequency and magnitude decks; thus our findings in the control group suggest that expected outcome value was not the only contributing factor to task performance. This finding is consistent with previous studies that explored individual deck selection in the original IGT reporting that controls differed in their selection of one bad deck over the other (Dunn et al., 2006; Lin et al., 2007).
Although the number of No Responses was small in both groups, SDI failed to respond within the 1.2 second time limit significantly more often than controls. This could reflect an inability to resolve conflict over a pre-potent response, difficulty focusing attention consistently, and/or less interest in the task among SDI. The number of no responses was small and the data were highly skewed, so we did not have the power to examine the number of no responses for each deck separately.
Impulsivity is a trait implicated in the pathogenesis and maintenance of addictive disorders (Ersche et al., 2010; Thompson et al., 2006). Similar to Sweitzer et al. (2008) using the original IGT, we found a correlation between BIS and mIGT in SDI with higher BIS scores associated with poorer mIGT performance. Impulsivity among SDI may have contributed to failing to learn to Pass on the Bad Decks. This would be consistent with Franken et al.'s (2008) findings that impulsivity was related to decision making task performance only when there was a learning component.
Consistent with Reynolds (2006), we found steeper rates of DD in SDI but performance did not correlate with mIGT. In contrast, other studies have found an association between DD and IGT (Monterosso et al., 2001; Sweitzer et al., 2008).
BART, a risk taking task, has previously differentiated SDI from controls (Crowley et al., 2006). In our study, while mean pumps was higher in SDI than controls, the difference was not significant. This is consistent with Hammers and Suhr (2010), who found poorer IGT performance in substance abusing college students compared to controls, but did not find group differences on the BART. As expected and consistent with research using the original IGT (Bechara et al., 1994; Brand et al., 2006) performance on the mIGT was not related to IQ or the Wisconsin Card Sorting Test.
In general, our findings regarding the relationships among the mIGT and other commonly used measures such as DD and BART coincide with the findings of previous studies with the original IGT, but our inclusion of smokers in the control group may have weakened the likelihood of finding group differences. Heavy smokers show higher DD rates than non-smoking controls (Businelle et al., 2010) and may take more risks on BART (Lejuez et al., 2003), although Dean et al. (2011) did not find increased risk taking in smokers after controlling for other substance abuse and psychiatric disorders. Repeating the analyses with the seven CTL removed did not change the results.
Although not intended, our design was insensitive to positive reinforcement. Our major goal was to examine effects of negative reinforcement, but we did not anticipate the finding that both groups showed a decrease in playing good decks over time. Our design may have resulted in a “ceiling” effect at the beginning of the task when participants tended to play cards from all decks as an investigative strategy; this tendency was also noted by Peters and Slovic (2000) when they first developed the Play/Pass variant. As a result, there was little room for an increase in playing any given deck as the task proceeded. Both groups played approximately two fewer cards in the second half compared to the first. It is not known whether the decrease is clinically meaningful.
Our design of single deck presentation per trial with the choice of playing or passing has the advantage of exposing all subjects to the four decks equally, removing search strategy as a potential factor in achieving success on the task. This method may have made certain distinctions more difficult, however. For example, frequency of gain and loss on a particular deck of the original IGT is easily ascertained if one focuses on that deck exclusively by selecting it on consecutive trials. Presenting decks in a fixed order where one deck is always followed by another deck may have made distinguishing the decks on the basis of frequency or magnitude more difficult. Therefore, results from our study cannot be directly compared to the original IGT or other modified versions.
Our findings suggest that negative reinforcement is a valuable construct to study in substance dependent individuals. SDI did not change their responding when presented with large magnitude losses, while CTL did. Further research is needed to determine if this insensitivity to the magnitude of negative reinforcers generalizes to drug-related phenomena. Behavioral management strategies remain the most effective treatments for cocaine and stimulant dependence. Consequently, ascertaining the effects of frequency and magnitude as potentially salient types of negative reinforcers deserves further study and may contribute to behavioral treatments of addiction.
We thank the ARTS for its support, particularly staff from the programs who helped with subject recruitment.
Role of Funding Source Funding for this study was provided by NIDA Grants R21 DA 024104 and R01 DA 027748; NIDA had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.
Author Disclosure Form – Negative Reinforcement Learning is Affected in Substance Dependence
Contributors Drs. Jody Tanabe, Eric Claus, Marie Banich, Thomas Crowley, and Laetitia Thompson designed the study and Dr. Tanabe wrote the protocol. Drs. Thompson and Tanabe managed the literature searches and summaries of previous related work. Theodore Krmpotich cleaned and managed the data base. Drs. Tanabe, Thompson, David Miller, and Susan Mikulich-Gilbertson, and Mr. Krmpotich undertook the statistical analysis. Drs. Thompson, Tanabe, Claus, and Banich and Mr. Krmpotich were involved in the interpretation of the data. Dr. Thompson wrote the first draft of the manuscript, and all coauthors edited the manuscript. All authors contributed to and have approved the final manuscript.
Conflict of Interest Dr. Thomas Crowley receives travel support from the American Psychiatric Association to participate in revising the Diagnostic and Statistical Manual of Mental Disorders and from the National Institute on Drug Abuse for serving on its National Advisory Council. All other authors declare that they have no conflicts of interest.