Rewards and punishments may make distinct contributions to learning via separate striato-cortical pathways. We investigated whether fronto-striatal dysfunction in schizophrenia (SZ) is characterized by selective impairment in either reward- (Go) or punishment-driven (NoGo) learning.
We administered two versions of a Probabilistic Selection task (Frank et al., 2004) to 40 SZs and 31 controls, using difficult-to-verbalize stimuli (Exp 1) and nameable objects (Exp 2). In an acquisition phase, participants learned to choose between three different stimulus pairs (AB, CD, EF) presented in random order, based on probabilistic feedback (80%, 70%, 60%). We used ANOVAs to assess the effects of group and reinforcement probability on two measures of contingency learning. To characterize the preference of subjects for choosing the most rewarded stimulus and avoiding the most punished stimulus, we subsequently tested participants with novel pairs of stimuli involving either A or B, providing no feedback.
Controls demonstrated superior performance during the first 40 acquisition trials in each of the 80% and 70% conditions versus the 60% condition; patients showed similarly impaired (<60%) performance in all three conditions. In novel test pairs, patients showed decreased preference for the most rewarded stimulus (A; t=2.674; p=0.01). Patients were unimpaired at avoiding the most negative stimulus (B; t=0.737).
The results of these experiments provide additional evidence for the presence of deficits in reinforcement learning in SZ, suggesting that reward-driven (Go) learning may be more profoundly impaired than punishment-driven (NoGo) learning.
Cognitive deficits are widely recognized as central features of schizophrenia (SZ; Barch 2005; Wilk et al 2005). Of the impairments documented in the literature, deficits involving the use of feedback to guide decision-making and learning are highly reliable and sometimes clinically dramatic. Patients' poor performance on many of these tasks, such as the Wisconsin Card Sorting Test (WCST; Goldberg et al 1987) and conditional associative learning paradigms (Gold et al 2000), is often interpreted as evidence of dysfunction in either dorsolateral regions of prefrontal cortex (DLPFC; Weinberger et al 1986), or lateral and medial areas of ventral prefrontal cortex (Boettiger and D'Esposito 2005), also called orbitofrontal cortex (OFC).
In contrast, several (but not all) studies of procedural, or habit, learning (Keri et al 2000; Keri et al 2005; Weickert et al 2002) have documented surprisingly normal learning among SZ patients. These tasks also employ feedback to guide learning, but tend to involve gradual learning of difficult-to-discern probabilistic response-outcome relationships. Both functional imaging and studies of patient populations such as Parkinson's disease (PD) suggest that the basal ganglia (BG) play a critical role in this gradual learning of stimulus-response mappings (Knowlton et al 1996; Seger and Cincotta 2005).
Explaining the differential impairment of these learning processes in schizophrenia is difficult, given evidence that brain dopamine (DA) systems play a critical role in both PFC-mediated and BG-dependent reinforcement learning processes. One possible explanation for the relative sparing of habit learning in SZ is that some DA pathways in the BG are largely intact. To investigate this question, we adopted the experimental methods and computational framework of Frank and colleagues (2004), who examined learning performance in a group of PD patients studied both on and off L-Dopa. Frank et al. (2004) used a Probabilistic Stimulus Selection (PSS) task, where subjects are initially presented with three different stimulus pairs (AB, CD, EF) and have to learn to choose the most-frequently reinforced stimulus from each pair using probabilistic feedback (see Figure 1). After achieving the learning criterion in this “acquisition phase,” subjects are then presented with the original stimuli in novel pairings in a “post-acquisition test phase.” This design provides a means of studying the contributions of positive and negative feedback to probabilistic learning, in that it enables the assessment of whether subjects have a bias for choosing frequently-reinforced stimuli, or for avoiding frequently-punished stimuli.
Frank et al. (2004) demonstrated that unmedicated PD patients showed considerable impairment in learning driven by positive feedback, when compared with their performance in the medicated state. Importantly, their learning driven by negative feedback was superior to that in the medicated state. These results were interpreted in the context of computational models of reward-based learning (Frank 2005; Frank et al 2001) that formalize ideas about the role of dopaminergic signaling in the BG. These signals are thought to communicate information about reward contingencies in the environment that guide action selection and learning. A degree of functional segregation characterizes pathways in the BG, such that activity in the “direct” pathway sends a “Go” signal to facilitate the execution of a response considered in cortex, whereas activity in the “indirect” pathway sends a “NoGo” signal to suppress inappropriate responses (see Figure 2; Centonze et al 2001; Nishi et al 1997). Furthermore, dopaminergic innervation of these pathways is thought to be relatively distinct, such that the direct pathway is excited via D1 receptors by bursting activity in dopamine neurons, while the indirect pathway is tonically inhibited via D2 receptors. Phasic DA bursts are thought to support “Go” learning to reinforce rewarding choices by enhancing neural activity and plasticity in the direct (D1) pathway following reinforcement and enhancing inhibition of the indirect (D2) pathway. Transient cessations of DA cell firing, following negative feedback, are thought to have the opposite effect: they release inhibition of the indirect pathway and cause reductions of activity in the direct pathway, thereby supporting “NoGo” learning to avoid unrewarding choices (Frank 2005; O'Reilly and Frank 2006). These authors concluded that, in unmedicated PD patients, DA depletion attenuates the impact of DA bursts. In medicated PD patients, the impact of DA “dips” is attenuated due to overall increased levels of synaptic DA.
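The Go/NoGo account can be illustrated with a toy simulation. The following is a minimal sketch under strong simplifying assumptions and is not the neural network model of Frank (2005): each stimulus carries one "Go" weight, strengthened after reward (a DA burst), and one "NoGo" weight, strengthened after non-reward (a DA dip). The function name `simulate`, the learning rates `alpha_go` and `alpha_nogo`, and the epsilon-greedy choice rule are all hypothetical and chosen only for illustration.

```python
import random


def simulate(alpha_go, alpha_nogo, n_trials=120, epsilon=0.2, seed=0):
    """Toy two-weight learner for a single 80:20 pair (A vs. B).

    alpha_go   -- assumed impact of DA bursts on Go weights
    alpha_nogo -- assumed impact of DA dips on NoGo weights
    """
    rng = random.Random(seed)
    p_reward = {"A": 0.8, "B": 0.2}
    go = {"A": 0.0, "B": 0.0}
    nogo = {"A": 0.0, "B": 0.0}
    for _ in range(n_trials):
        # Net support for each stimulus: Go facilitation minus NoGo suppression.
        net = {s: go[s] - nogo[s] for s in go}
        if rng.random() < epsilon or net["A"] == net["B"]:
            choice = rng.choice(["A", "B"])
        else:
            choice = max(net, key=net.get)
        rewarded = rng.random() < p_reward[choice]
        if rewarded:
            # Positive feedback ("burst"): strengthen Go for the chosen stimulus.
            go[choice] += alpha_go * (1.0 - go[choice])
        else:
            # Negative feedback ("dip"): strengthen NoGo for the chosen stimulus.
            nogo[choice] += alpha_nogo * (1.0 - nogo[choice])
    return go, nogo


# Attenuated bursts (alpha_go = 0) leave Go weights flat, while NoGo learning proceeds.
go_weights, nogo_weights = simulate(alpha_go=0.0, alpha_nogo=0.1)
```

In this sketch, setting `alpha_go` to zero loosely mimics attenuated burst signaling, and setting `alpha_nogo` to zero loosely mimics attenuated dips; neither setting is fitted to any data reported here.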
While learning at the level of the basal ganglia is thought to occur on a gradual time scale, Go and NoGo signals emanating from the BG are hypothesized to impact the rapid learning of changing reinforcement contingencies in the frontal cortex, via parallel striato-cortical circuits, by updating the working memory (WM) representations required for representing differences in relative magnitude of reinforcement online (Frank et al 2001; O'Reilly and Frank 2006). This idea extends earlier computational work emphasizing the role of phasic DA in driving the updating of PFC WM representations (Braver and Cohen 2000; Cohen et al 1996). The idea that OFC figures critically in the online representation of reward and punishment magnitudes, and thus subserves a kind of working memory, is supported by recent evidence (Rolls et al 2003; Schoenbaum and Roesch 2005). Simulations by Frank and Claus (2006) have shown that models capable of instantaneously updating WM representations of reward value in OFC and using them to bias behavior via efferent projections to the BG and motor cortical areas show rapid acquisition of probabilistic contingencies, whereas models with OFC damage exhibit much slower learning, because they can only acquire probabilistic contingencies via changes in synaptic weights in the BG.
This framework has the potential to offer a differentiated account of feedback-driven learning deficits in SZ. Whereas PD involves mainly BG hypofunction brought on by dopamine depletion, SZ may be characterized by DA dysfunction in both PFC and the BG. While the severity and consequences of PFC hypofunction in schizophrenia appear to be profound (Weinberger 1987; Weinberger and Berman 1988), BG dysfunction in schizophrenia may be milder, based on findings of relatively intact procedural learning (Keri et al 2005; Kern et al 1997; Weickert et al 2002).
We tested three specific hypotheses by applying the paradigm used by Frank and colleagues (2004) in their study of PD patients. We expected the relative severity of prefrontal cortical vs. basal ganglia dysfunction in schizophrenia to have two specific effects on the performance of probabilistic reinforcement learning tasks. First, due to PFC hypofunction, we expected that patients would show marked deficits in the initial learning of the most favorable stimulus in each pair, which is critically dependent on the rapid updating of reward value representations. Second, based on prior studies showing relatively intact procedural learning in SZ, we expected that patients would show delayed but eventual acquisition of the stimulus pairs. Finally, we speculatively hypothesized that SZ patients would show the pattern of reduced “Go” learning seen in unmedicated PD patients. This might occur if the fidelity of burst-driven phasic signaling is reduced in SZ, but the sensitivity of D2 receptors (which are sensitive to small decreases in dopamine; Frank and O'Reilly 2006) is enhanced (Curran et al 2004; Seeman et al 2005).
Forty outpatients with a diagnosis of schizophrenia, based on the Structured Clinical Interview for DSM-IV (SCID-I; First et al 1997), were recruited from the Maryland Psychiatric Research Center (MPRC; Table 1). All patients were clinically stable, as determined by their treating clinician. All patients were tested while receiving stable medication regimens (no changes in type or dose within 4 weeks of study). Most patients (28/40) were on antipsychotic monotherapy, while twelve patients were taking two antipsychotics (almost all clozapine with risperidone; Table 1).
Thirty-one healthy control subjects, recruited through newspaper advertisements and random phone number dialing, participated in the study. They were extensively screened for Axis I and II disorders using the SCID-I (First et al 1997) and the Structured Interview for DSM-III-R Personality Disorders (SIDP-R; Pfohl et al 1989). Subjects were also screened for family history of psychosis and medical conditions that might impact cognitive performance, including drug use. All control subjects were free of any significant personal psychiatric and medical history, had no history of severe mental illness in first-degree relatives, and did not meet criteria for current substance abuse or dependence.
After explanation of study procedures, all subjects provided written informed consent. Before signing consent documents, patients had to demonstrate adequate understanding of study demands, risks, and means of withdrawing from participation in response to structured probe questions. All subjects were compensated for study participation.
Data collection occurred through a battery of standard and experimental neuropsychological tests. Tests included measures of word reading, word list learning, and working memory. Patients were also characterized using the Brief Psychiatric Rating Scale (BPRS; Overall and Gorham 1962), the Scale for the Assessment of Negative Symptoms (SANS; Andreasen 1984), and the Calgary Depression Scale (CDS; Addington et al 1992).
We used the PSS task from Frank et al. (2004), described above (see Figure 1). Blocks consisted of 60 trials (20/condition), in which three different stimulus pairs (AB, CD, EF) were presented in pseudo-random order, and the acquisition phase was terminated when participants achieved criterion on all three stimulus pairs in the same block, or when subjects had completed 360 trials, whichever occurred first. The discontinuation criterion was 65% correct in the AB (80:20) condition, 60% in the CD (70:30) condition, and 50% in the EF (60:40) condition. This liberal criterion was intended to prevent over-learning of contingencies prior to the post-acquisition test phase.
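The stopping rule for the acquisition phase can be sketched as follows. This is a minimal illustration, assuming that the criterion is met when accuracy is at or above each threshold (the text does not say whether the comparison is strict); the names `CRITERIA`, `block_meets_criterion`, and `blocks_administered` are hypothetical.

```python
# Per-pair accuracy thresholds and trial limits, as stated in the text.
CRITERIA = {"AB": 0.65, "CD": 0.60, "EF": 0.50}
TRIALS_PER_BLOCK = 60
MAX_TRIALS = 360


def block_meets_criterion(block_accuracy):
    """block_accuracy maps pair name -> proportion correct within one block."""
    # Assumption: "at or above" threshold counts as meeting criterion.
    return all(block_accuracy[pair] >= threshold
               for pair, threshold in CRITERIA.items())


def blocks_administered(accuracy_by_block):
    """Return the number of acquisition blocks run before the phase ends.

    The phase ends at the first block in which all three pairs meet
    criterion, or after MAX_TRIALS trials, whichever occurs first.
    """
    max_blocks = MAX_TRIALS // TRIALS_PER_BLOCK  # six blocks of 60 trials
    for index, accuracy in enumerate(accuracy_by_block[:max_blocks], start=1):
        if block_meets_criterion(accuracy):
            return index
    return min(len(accuracy_by_block), max_blocks)


example = [
    {"AB": 0.60, "CD": 0.70, "EF": 0.50},  # AB below 65%: continue
    {"AB": 0.70, "CD": 0.60, "EF": 0.55},  # all three pairs at criterion: stop
]
n_blocks = blocks_administered(example)
```

Note that all three thresholds must be met within the same block; meeting them in different blocks does not end the phase.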
In the post-acquisition test phase, no feedback was provided. To examine whether subjects had preferentially learned through the use of positive or negative feedback, we analyzed performance on the 32 test trials involving novel combinations of stimulus pairs involving either an A or a B. All subjects who failed to demonstrate initial learning of the AB contingency by choosing A at least three times in the four AB test trials (8 of 40 patients and 4 of 28 controls) were excluded from the analysis of transfer performance.
We performed two experiments using this paradigm. All 40 patients completed both experiments, while 28 of 31 controls who completed Experiment 1 also completed Experiment 2. In Experiment 1, we used the Hiragana characters from Frank et al. (2004). In Experiment 2, we used clip art images of common objects (flashlight, clock, etc). The second experiment was initiated after approximately half of the participants had finished the first experiment, because we had found extremely poor learning in the patient group, with few meeting criterion for the transfer analysis. Using images of common objects was intended to address the possibility that poor performance was the result of difficulty encoding the Hiragana characters. Thus, all subjects received the Hiragana version first, followed by the clip art version, with the two testing occasions separated by up to 9 months (Mean = 3.18 months). In Experiment 2, all subjects performed at least two training blocks in order to facilitate examination of early acquisition, with the same discontinuation criteria applied to subsequent blocks.
In comparing the acquisition of contingencies between patients and controls, we performed two-way analyses of variance (ANOVAs), as well as appropriate post-hoc tests, on subjects' proportion of correct responses in the first two blocks of the acquisition phase of each experiment, with factors of group and reinforcement probability. For Experiment 1, proportion-correct scores from the first acquisition block (first 20 trials of each type) were carried forward to the second block for 8 controls and 2 patients who reached the acquisition criterion after only one block. In the results, we term the first two blocks (the first 40 trials of each type) of the acquisition phase “early acquisition”.
The learning of probabilistic contingencies was also assessed at the post-acquisition test using ANOVAs, with factors of group and reinforcement probability. These scores reflect learning of contingencies after up to six blocks of training, rather than just the first two. Whereas acquisition was assessed using the three training pairs repeated during the test phase, group differences in transfer performance in the test phase were assessed using t-tests for measures of Go and NoGo learning generated from subjects' cumulative test scores on the four novel pairs involving A (Go) and the four novel pairs involving B (NoGo).
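The transfer measures can be sketched as follows. This is an illustrative scoring routine rather than the analysis code used in the study; it assumes test trials are recorded as (pair, choice) tuples, and the name `transfer_scores` and the returned keys are hypothetical.

```python
def proportion(flags):
    """Proportion of True values, or None if there are no trials."""
    return sum(flags) / len(flags) if flags else None


def transfer_scores(test_trials):
    """Score post-acquisition test trials given as (pair, choice) tuples.

    pair is a two-letter string such as "AC"; choice is the chosen stimulus.
    Go score   = proportion of novel pairs containing A on which A was chosen.
    NoGo score = proportion of novel pairs containing B on which B was avoided.
    """
    choose_a, avoid_b, ab_correct = [], [], []
    for pair, choice in test_trials:
        stimuli = set(pair)
        if stimuli == {"A", "B"}:
            # Training pair: used only for the inclusion check.
            ab_correct.append(choice == "A")
        elif "A" in stimuli:
            choose_a.append(choice == "A")
        elif "B" in stimuli:
            avoid_b.append(choice != "B")
    return {
        "go": proportion(choose_a),
        "nogo": proportion(avoid_b),
        # Inclusion rule from the text: A chosen on at least 3 of 4 AB test trials.
        "included": sum(ab_correct) >= 3,
    }


example_trials = [
    ("AB", "A"), ("AB", "A"), ("AB", "A"), ("AB", "B"),
    ("AC", "A"), ("AD", "D"),
    ("BC", "C"), ("BD", "B"),
]
scores = transfer_scores(example_trials)
```

Keeping the AB training pair out of both scores means the Go and NoGo measures reflect only generalization to novel pairings.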
We interpret performance on the post-acquisition test items to reflect the gradual, habit-like acquisition of contingencies, largely dependent on the BG (Frank et al 2006; Frank et al 2004). A second kind of reinforcement learning involves the (PFC-dependent) ability to represent and integrate feedback online to rapidly learn contingencies. In order to directly assess the contribution of online feedback integration to the rapid acquisition of probabilistic contingencies, we computed “win-stay” and “lose-shift” scores for each reinforcement condition in Block 1 (i.e., early in the acquisition phase). The “win-stay” score was the proportion of repeated stimulus selections in a given condition that followed reinforced choices. The “lose-shift” score was the proportion of switched stimulus selections in a given condition that followed non-reinforced choices. We then generated total “win-stay” and “lose-shift” scores by averaging scores across conditions for each, and between-group differences in mean scores were assessed using t-tests. We also computed effect sizes (Cohen's D scores) to characterize between-group differences in means by dividing each mean difference by the pooled standard deviation.
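The win-stay, lose-shift, and effect-size computations can be sketched as follows. This is one plausible reading of the definitions above, not the code used in the study: it assumes that "stay" and "shift" are judged on consecutive presentations of the same stimulus pair, and that the pooled standard deviation weights each group's sample variance by its degrees of freedom. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def win_stay_lose_shift(trials):
    """trials: list of (choice, rewarded) for ONE stimulus pair, in order.

    Returns (win_stay, lose_shift); either is None when undefined.
    """
    stay_after_win, shift_after_loss = [], []
    for (prev_choice, prev_rewarded), (choice, _) in zip(trials, trials[1:]):
        if prev_rewarded:
            stay_after_win.append(choice == prev_choice)
        else:
            shift_after_loss.append(choice != prev_choice)
    win_stay = sum(stay_after_win) / len(stay_after_win) if stay_after_win else None
    lose_shift = sum(shift_after_loss) / len(shift_after_loss) if shift_after_loss else None
    return win_stay, lose_shift


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = ((n_a - 1) * variance(group_a)
                       + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# One pair, four presentations: win -> stay, loss -> shift, win -> shift.
ws, ls = win_stay_lose_shift([("A", True), ("A", False), ("B", True), ("A", True)])
d = cohens_d([2, 4, 6], [1, 3, 5])
```

Total scores per subject would then be the average of the per-condition values, as described in the text.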
We used Pearson correlation analyses to assess relationships between probabilistic selection performance and three types of characterizing variables: symptom ratings, antipsychotic medication doses (converted to haloperidol equivalent units; see Supplementary Table 1), and standard neuropsychological measures. To do so, we created a summary measure of probabilistic selection performance by averaging the proportion of correct responses from all three conditions in the first two acquisition blocks. To separately assess psychotic and disorganized symptoms from the BPRS, sub-scores were grouped into reality distortion, disorganization, negative symptom, and anxiety/depression clusters based on the 4-factor model of McMahon et al. (2002).
In our first experiment, patients demonstrated dramatic impairment in the acquisition of probabilistic contingencies, whereas healthy subjects demonstrated clear learning of the two most-frequently rewarded stimuli. Two-way ANOVAs for data from both early acquisition and post-acquisition test revealed main effects of group (see supplementary text and Supplementary Figure 1 for details), indicating that patients performed worse than controls in Experiment 1, regardless of reinforcement condition. Because fully 50% of patients failed to reach criterion, however, we did not analyze the transfer results from Experiment 1 due to concern that the patients who did meet criterion were unrepresentative of the total group. Thus, Experiment 1 provided robust evidence of marked reward processing impairments in patients, but we were unable to address whether this impairment resulted from a more selective deficit in the processing of positive or negative outcomes.
Our entire sample of subjects performed better during the acquisition phase in Experiment 2 than in Experiment 1, reflecting greater ease of encoding verbalizable stimuli (see supplementary online data). Proportions of correct responses given by subjects in the first two blocks of the acquisition phase and the post-acquisition test phase in Experiment 2 are shown in Figure 3. The ANOVA for acquisition measures from Experiment 2 revealed no main effect of group [F(1,66)=1.43], a main effect of reward contingency [F(2,132)=10.27; p<0.001], and a significant group × reward contingency interaction [F(2,132)=4.54; p=0.012]. Post-hoc tests revealed that controls performed better than patients in the 80% reward probability condition [t(132)=2.32; p=0.022], while there was a trend in the direction of controls performing better than patients in the 70% reward probability condition [t(132)=1.80; p<0.10]. In the 60% condition, there was a trend for the patients to perform better than controls [t(132)=1.78; p<0.10]. Controls in Experiment 2 demonstrated robust early acquisition performance in the 80% and 70% conditions that was clearly superior to that in the 60% condition (t>2.75 for both the 80%-60% and 70%-60% comparisons). In contrast, patients showed no difference in performance among the three conditions, with <70% accuracy in all conditions.
The ANOVA of training-pair performance in the post-acquisition test phase of Experiment 2 (see Figure 3B) also failed to show a main effect of group [F(1,66)=1.92; p>0.10], although it revealed a main effect of reward contingency [F(2,132)=10.00; p<0.001], and a significant group × reward contingency interaction [F(2,132)=3.60; p=0.030]. This interaction resulted from the fact that both groups performed similarly on the easiest [80%; t(132)=0.85] and most difficult [60%; t(132)=1.07] pairs, while controls continued to outperform patients on the 70% pairs [t(132)=3.06, p=0.003]. Within-group analyses revealed that controls performed significantly better on AB and CD pairs than on the EF pairs [t(132)>3 for both], whereas patients only showed significantly better performance on AB pairs relative to EF pairs [t(132)=2.84, p<0.005; t(132)=0.24 for the 70:30 vs. 60:40 comparison]. Thus, patients were only able to discriminate the easiest from the hardest pairs.
As a test of the influence of general neuropsychological functioning on experimental task performance, we used subjects' scores on the Wechsler Test of Adult Reading (WTAR; the standard neuropsychological measure showing the strongest association with probabilistic learning scores) as a covariate in analyses of covariance (ANCOVAs), with factors of group and reinforcement probability. We found that, although there was evidence of an association between WTAR scores and experimental task performance, the use of WTAR scores as a covariate in an ANCOVA did not substantially alter the effects of group and reward contingency on task performance (see supplementary data). These results argue against the possibility that group differences in experimental task performance primarily reflect differences in global neuropsychological functioning.
Feedback had a greatly reduced impact on the subsequent choices of patients, relative to those of controls, in Block 1 of the acquisition phase. For data from Experiment 1, t-tests revealed that patients were much less likely to repeat reinforced stimulus selections (“win-stay”) than controls [t(69)=4.06, p<0.001, Cohen's D = 0.969; see Supplementary Figure 2]. Patients were also much less likely than controls to choose the alternative stimulus after being told they were incorrect [“lose-shift”; t(69)=3.74, p<0.001, D = 0.880]. For data from Experiment 2, t-tests revealed trends for both of these effects [t(66)=1.90 for “win-stay” comparison, D = 0.482; t(64)=1.79 for “lose-shift”; D = 0.436; see Figure 4A]. Thus, we observed large effect sizes for Experiment 1, and medium effect sizes for Experiment 2, indicating that patients still had difficulty using feedback to rapidly modify choice behavior.
In Experiment 2, we included 32 patients (80%) and 24 controls (86%) in the transfer analysis who met the 75% correct criterion on the AB test trials. A t-test revealed that controls more consistently chose A (the most-frequently rewarded stimulus) when presented in novel pairs than did patients [82±3%; vs. 70±3%; t(54)=2.852; p=0.01; see Figure 4B]. This result is consistent with our operationalization of impaired Go learning. By contrast, patients (70±3%) did not show a decreased avoidance of the least frequently rewarded stimulus when presented in novel pairs, when compared with controls [72±4%; t(54)=0.397], consistent with our operationalization of intact NoGo learning.
Note that the lose-shift results described above appear to contradict this evidence of intact NoGo learning. However, if one views the measures as assessments of two different kinds of feedback-driven learning, it is entirely plausible that a between-group difference might be evident in reward- or punishment-driven learning in one case, but not the other. We interpret this result to indicate that SZ patients can gradually integrate negative outcomes to generalize and avoid poor choices over many trials (BG-dependent), whereas they are impaired at the online/cortical-dependent use of a single instance of negative feedback to modify behavior in the very next trial.
Pearson correlation analyses between performance measures from the PSS paradigm and clinical and standard neuropsychological ratings revealed a moderate relationship between total proportion correct during acquisition phase and total score on the SANS (r=−0.372, p=0.020). Correlations between our combined measure of probabilistic selection performance and total scores on the Calgary Depression Scale (−0.063) and BPRS (−0.161) did not achieve significance. Only the negative symptom sub-score of the BPRS correlated with the total proportion correct during early acquisition, at the trend level (r=−0.299, p=0.061). None of the reality distortion (r=0.067), disorganization (r=0.007), or depression (r=−0.021) sub-scores of the BPRS showed any evidence of a systematic relationship with our combined measure of probabilistic selection performance. No correlations between PSS performance and standard neuropsychological measures were significant, with Pearson coefficients ranging from 0.197 for our spatial short-term memory span measure to 0.242 for the Wechsler Test of Adult Reading (p > 0.10). This result further suggests that patients' poor performance on the probabilistic learning task is not simply a product of impaired neuropsychological performance, in general.
We examined the performance of patients and controls on two probabilistic learning and transfer tasks. In the first version, using Hiragana characters, patients exhibited profound impairment in the acquisition of probabilistic contingencies. This seemed to reflect impairments in the use of feedback to modify behavior on a trial-by-trial basis, consistent with models of PFC/OFC dysfunction. In the second experiment (using clip-art stimuli), patients showed impairment in the early acquisition stages of the task, but demonstrated eventual learning of the easiest (80:20) discrimination. However, even the patients who learned the 80:20 discrimination showed a less robust preference than controls for the 80% stimulus when it was presented in new pairings. Patients exhibited normal performance in avoiding the least-frequently rewarded stimulus when it was presented in novel pairings, successfully generalizing from repeated exposure to negative outcomes.
Thus, patients did not exhibit a simple failure in generalization, but rather more selective difficulty in learning from positive outcomes. This dissociation cannot be easily explained by the presence of generally lower levels of neuropsychological performance in patients, relative to controls, as no standard neuropsychological measure correlated significantly with probabilistic contingency acquisition, and none of the main effects or interactions from the ANOVAs for acquisition data were modified substantially by the inclusion of WTAR scores as a covariate in ANCOVAs.
Within the context of the computational model described above, the deficit exhibited by patients may result primarily from dysfunction of the “direct” (“Go”) BG pathway linking the dorsal striatum and the globus pallidus interna, which is thought to be driven largely by activity at D1 receptors, whereas the intact “NoGo” learning exhibited by patients can be interpreted as evidence of preserved function of the “indirect” BG pathway, which is driven largely by activity at D2 receptors (Aubert et al 2000; Hernandez-Lopez et al 2000). Thus, it is possible that SZ patients have a compromised ability to use dopamine bursts to drive behavior in habit-learning type tasks, but a surprisingly intact ability to use momentary cessations of dopamine cell firing (“dips”) that may signal the absence of expected reinforcement. A possible consequence of disrupted reward-driven learning, as McClure et al. (2003) have noted, can be the inappropriate attribution of incentive salience to a stimulus. Consistent with this formulation, Juckel et al (2006b) found evidence of reduced activity in the BG during reward processing and anticipation in unmedicated SZ patients.
The patient deficits in early learning and failure to generalize from positive feedback after training may represent two different types of DA dysfunction. In the model, cortical DA hypofunction should be expected to interfere with rapid learning of relative reward value of different responses—precisely what we observed in the first two blocks of both experiments. This behavioral finding is consistent with a broad body of evidence suggestive of PFC dysfunction in schizophrenia (Heinz et al 2003; Weinberger et al 2001), with D1 hypofunction widely considered to be a critical contributor to the deficit (Goldman-Rakic 1994; Weinberger 1987). The finding of a significant correlation between the early acquisition of probabilistic contingencies and ratings of negative symptoms, also thought to reflect prefrontal dysfunction (Kirkpatrick and Buchanan 1990), further supports the claim that intact PFC is critical for the rapid learning of changing reinforcement contingencies.
The impairment in Go learning observed using the transfer measure may reflect a second, albeit related, abnormality of DA function: excessive DA release in the neostriatum (Abi-Dargham et al 1998; McGowan et al 2004). Several reports (Bertolino et al 2000; Meyer-Lindenberg et al 2002), in fact, point to a systematic relationship between PFC hypofunction and striatal hyperactivity. Based on evidence that tonic DA levels regulate the level of phasic DA release via inhibitory presynaptic D2 autoreceptors (O'Donnell and Grace 1998), it has been proposed that elevated levels of tonic DA in SZ reduce the effectiveness of phasic DA signals (Bilder et al 2004; Grace 2000). Such a mechanism might provide an explanation for why schizophrenia has been associated with both tonic hyperactivity in the BG in blood flow studies (Abi-Dargham et al 1998; McGowan et al 2004) and reduced stimulus-evoked activity in the BG during task performance in MRI studies (Juckel et al 2006b; Kumari et al 2002).
Given the large body of work showing impaired use of error information in SZ, what might explain the lack of a NoGo-learning deficit in patients in this experiment? Many feedback-driven learning tasks, such as the WCST, require the on-line representation of feedback in WM and its use in resolving response conflicts, and appear to rely on regions of prefrontal cortex. The selection task used here involves learning probabilistic discriminations over many trials by integrating the overall frequency of reinforcement. Rapid updating after unexpected feedback would actually serve to impair performance. One possibility is that the dissociation between the ability to rapidly use single instances of negative feedback and the more gradual learning of the most and least advantageous choices reflects the properties of different learning systems, as suggested by several theoretical accounts (Rolls 2004; Schoenbaum and Roesch 2005).
Importantly, our patients did not exhibit a complete inability to learn the three training pairs, especially in Experiment 2, where most patients reached criterion in all three reinforcement conditions. Rather, patients showed evidence of delayed acquisition of probabilistic reward contingencies. As noted above, several findings (Keri et al 2000; Weickert et al 2002) indicate that SZ patients have a relatively intact ability to learn in this fashion. Importantly, these studies did not necessarily demonstrate fully normal performance compared with controls, but only roughly normal improvement from initial, impaired performance.
How might cessations of dopamine activity still be effective in providing a learning signal in schizophrenia? One possible answer is that D2-receptors are supersensitive in patients with chronic schizophrenia (Curran et al 2004; Seeman et al 2005). It is also possible that the D2-blocking effects of antipsychotic medications (when chronically administered) actually increase activity and plasticity in the “indirect” BG pathway (Centonze et al 2004), facilitating NoGo learning (Amtage and Schmidt 2003). We do not argue that D2-blockade might be beneficial for reward processing and procedural learning, in general. Indeed, the results of several recent studies (Beninger et al 2003; Juckel et al 2006a) indicate that D2-blockade, especially by conventional antipsychotics, has an overall detrimental effect. We suggest only that blocking D2-transmission may benefit NoGo learning, thought to depend on “dips” in DA levels.
One important issue to address in future studies would be the effects that different antipsychotic medications have on reward- and punishment-driven learning. Antipsychotic medications vary widely in their affinities for different receptor types, with second-generation antipsychotics generally having lower affinities for D2-receptors than first-generation antipsychotics, but greater affinities for D1 and serotonin receptors (Kapur and Seeman 2001). Unfortunately, we were not able to study medication effects in this study in a systematic way, as medication effects are fully confounded with the patient clinical characteristics that lead clinicians to choose specific drugs. For clinical historical reasons, subgroups of patients on similar medication regimens are not well-matched in terms of demographics or symptom profiles.
Furthermore, fewer than 20% of patients at MPRC are taking one of the first-generation antipsychotics. Patients taking conventional agents have largely chosen to do so because they are doing well clinically and are unwilling to risk the instability that would follow a treatment change. Only 8 of our patients were on typical neuroleptics alone, and these patients, in fact, tended to be the youngest (mean age=39.71±3.39) and most treatment-responsive patients in our sample (mean BPRS=28.00±4.59). They also experienced the least severe negative symptoms (mean SANS=22.63±5.46). Therefore, it was impossible in our study to determine whether typical neuroleptics, relative to second-generation antipsychotics, had a more severe impact on BG function, as Beninger et al (2003) and Juckel et al (2006a) have shown. The unique effects that individual antipsychotic medications have on aspects of reinforcement learning need to be studied in the context of controlled clinical trials.
The study of unmedicated patients would certainly also be of interest, as it would clarify the extent to which the deficits documented here can be attributed to illness as opposed to treatment effects. Nevertheless, we argue that the current results, observed in medicated patients, are clinically relevant. Almost all patients with schizophrenia are treated with antipsychotics that block D2 receptors, and the clinical challenge facing the field is to develop novel treatment approaches to the reward processing deficits observed in patients receiving medications that block DA receptors.
Nonetheless, our results indicate that the primary BG-dependent learning impairment in SZ is a deficit in Go learning (learning in response to positive feedback), which may not be remedied by D2-blockade. Furthermore, the results of Frank et al. (2004) indicate that treatment with a dopamine precursor reverses a Go learning deficit. Given that dopamine agonists have been associated with mild improvement of various cognitive impairments in SZ (Barch and Carter 2005; Goldberg et al 1991), it is possible that performance on probabilistic reinforcement learning tasks would benefit from treatment with dopamine agonists, albeit with the risk of exacerbating positive symptoms (Levy et al 1993; van Kammen et al 1982). Thus, the key to adequately treating both the positive symptoms and cognitive deficits of schizophrenia may lie in the relative agonistic and antagonistic properties of drugs acting at different dopamine receptor types in the BG-PFC action selection system.
This research was made possible by Grant # P30 MH068580-01 and Grant # 1 R24 MH72647-01A1 from the National Institute of Mental Health. Mary Ramsey, Pablo Diego, Sharon August, and Kimberly Warren assisted with the collection of experimental and characterizing data. These data were presented, in part, at the 61st annual convention of the Society of Biological Psychiatry in Toronto, Ontario, Canada, on May 19th, 2006.
We have no conflicts of interest, financial or otherwise, to report.