PSS task performance
We examined the performance of patients and controls on two probabilistic learning and transfer tasks. In the first experiment, which used Hiragana characters, patients exhibited profound impairment in the acquisition of probabilistic contingencies. This impairment appeared to reflect a deficit in using feedback to modify behavior on a trial-by-trial basis, consistent with models of PFC/OFC dysfunction. In the second experiment, which used clip-art stimuli, patients were impaired in the early stages of acquisition but eventually learned the easiest (80:20) discrimination. However, even the patients who learned the 80:20 discrimination showed a less robust preference than controls for the 80% stimulus when it was presented in new pairings. In contrast, patients performed normally in avoiding the least-frequently rewarded stimulus when it was presented in novel pairings, successfully generalizing from repeated exposure to negative outcomes.
Thus, patients did not exhibit a simple failure of generalization, but rather a more selective difficulty in learning from positive outcomes. This dissociation cannot easily be explained by generally lower neuropsychological performance in patients relative to controls: no standard neuropsychological measure correlated significantly with probabilistic contingency acquisition, and none of the main effects or interactions in the ANOVAs on acquisition data changed substantially when WTAR scores were included as a covariate in ANCOVAs.
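The two transfer measures described above, preference for the most-rewarded stimulus and avoidance of the least-rewarded stimulus when each appears in novel pairings, can be scored with a short routine. This is an illustrative sketch, not the scoring procedure used in these experiments: the stimulus labels A through F (A = 80% rewarded, B = 20% rewarded, with C/D and E/F as the other two training pairs) and the (left, right, chosen) trial format are hypothetical assumptions.

```python
from typing import Iterable, Tuple

# Hypothetical labels (assumption): "A" is the 80%-rewarded stimulus,
# "B" the 20%-rewarded stimulus; AB, CD, EF are the training pairs.
BEST, WORST = "A", "B"
TRAINING_PAIRS = {frozenset("AB"), frozenset("CD"), frozenset("EF")}


def transfer_scores(trials: Iterable[Tuple[str, str, str]]) -> Tuple[float, float]:
    """Return (choose_best, avoid_worst) accuracy over novel test pairings.

    Each trial is a hypothetical (left, right, chosen) tuple. Pairs seen
    during training (including A-B itself) are skipped, so only novel
    pairings contribute. Raises ZeroDivisionError if no novel pairing
    contains A or B.
    """
    best_hits = best_n = worst_hits = worst_n = 0
    for left, right, chosen in trials:
        pair = frozenset((left, right))
        if pair in TRAINING_PAIRS:
            continue  # not a novel pairing
        if BEST in pair:
            best_n += 1
            best_hits += chosen == BEST  # picked the 80% stimulus
        if WORST in pair:
            worst_n += 1
            worst_hits += chosen != WORST  # avoided the 20% stimulus
    return best_hits / best_n, worst_hits / worst_n
```

For example, a participant who picks A against C but not against D scores 0.5 on the first measure, independently of how often B is avoided; the two scores are computed from disjoint sets of trials, which is what allows them to dissociate.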
Within the context of the computational model described above, the deficit exhibited by patients may result primarily from dysfunction of the “direct” (“Go”) BG pathway linking the dorsal striatum and the globus pallidus interna, which is thought to be driven largely by activity at D1 receptors. Conversely, the intact “NoGo” learning exhibited by patients can be interpreted as evidence of preserved function of the “indirect” BG pathway, which is driven largely by activity at D2 receptors (Aubert et al 2000; Hernandez-Lopez et al 2000). Thus, it is possible that SZ patients have a compromised ability to use dopamine bursts to drive behavior in habit-learning tasks, but a surprisingly intact ability to use momentary cessations of dopamine cell firing (“dips”) that may signal the absence of expected reinforcement. As McClure et al (2003) have noted, one possible consequence of disrupted reward-driven learning is the inappropriate attribution of incentive salience to a stimulus. Consistent with this formulation, Juckel et al (2006b) found evidence of reduced activity in the BG during reward processing and anticipation in unmedicated SZ patients.
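The Go/NoGo distinction can be caricatured with a reduced reinforcement-learning model in which positive prediction errors (DA bursts) and negative prediction errors (DA dips) are scaled by separate learning rates. This is a simplification swapped in for illustration, not the neural network model discussed above; the function names, the parameters alpha_gain and alpha_loss, and the neutral starting value of 0.5 are all assumptions.

```python
import random


def update(q: float, reward: float, alpha_gain: float, alpha_loss: float) -> float:
    """One delta-rule update with separate rates for 'bursts' and 'dips'.

    A positive prediction error (outcome better than expected) is scaled by
    alpha_gain, a rough stand-in for Go learning; a negative prediction
    error is scaled by alpha_loss, a rough stand-in for NoGo learning.
    """
    delta = reward - q
    alpha = alpha_gain if delta > 0 else alpha_loss
    return q + alpha * delta


def train(p_reward: float, alpha_gain: float, alpha_loss: float,
          n_trials: int = 200, seed: int = 0) -> float:
    """Value learned for one stimulus rewarded with probability p_reward.

    Hypothetical helper: rewards are 1.0 or 0.0, and the same seed gives the
    same reward sequence, so parameter settings can be compared directly.
    """
    rng = random.Random(seed)
    q = 0.5  # neutral starting expectation (assumption)
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        q = update(q, reward, alpha_gain, alpha_loss)
    return q


normal = train(0.8, alpha_gain=0.1, alpha_loss=0.1)
weak_go = train(0.8, alpha_gain=0.02, alpha_loss=0.1)
```

On the same reward sequence, lowering alpha_gain while holding alpha_loss fixed always yields a lower learned value for a frequently rewarded stimulus, a crude analogue of weakened Go learning with spared NoGo learning. In expectation the estimate converges to p·alpha_gain / (p·alpha_gain + (1 − p)·alpha_loss) rather than to the true reward probability p.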
The patients' deficits in early learning and their failure to generalize from positive feedback after training may reflect two different types of DA dysfunction. In the model, cortical DA hypofunction would be expected to interfere with rapid learning of the relative reward value of different responses, which is precisely what we observed in the first two blocks of both experiments. This behavioral finding is consistent with a broad body of evidence suggestive of PFC dysfunction in schizophrenia (Heinz et al 2003; Weinberger et al 2001), with D1 hypofunction widely considered to be a critical contributor to the deficit (Goldman-Rakic 1994; Weinberger 1987). The significant correlation between early acquisition of probabilistic contingencies and ratings of negative symptoms, which are also thought to reflect prefrontal dysfunction (Kirkpatrick and Buchanan 1990), further supports the claim that an intact PFC is critical for the rapid learning of changing reinforcement contingencies.
Given the large body of work showing impaired use of error information in SZ, what might explain the lack of a NoGo-learning deficit in patients in this experiment? Many feedback-driven learning tasks, such as the WCST, require the on-line representation of feedback in WM and its use in resolving response conflict, and appear to rely on regions of prefrontal cortex. In contrast, the selection task used here involves learning probabilistic discriminations over many trials by integrating the overall frequency of reinforcement; rapid updating after unexpected feedback would actually impair performance. One possibility is that the dissociation between the ability to rapidly use single instances of negative feedback and the more gradual learning of the most and least advantageous choices reflects the properties of different learning systems, as suggested by several theoretical accounts (Rolls 2004; Schoenbaum and Roesch 2005).
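The argument that rapid updating would hurt performance on this task can be checked with a toy delta-rule learner. The sketch below is an illustration under stated assumptions, not part of the reported analyses: the function name and parameters are hypothetical, and it assumes, unlike the actual task (where only the chosen stimulus is reinforced), that an outcome is observed for both stimuli on every trial.

```python
import random


def prefer_best_rate(alpha: float, p_best: float = 0.8, p_worst: float = 0.2,
                     n_trials: int = 1000, burn_in: int = 100,
                     seed: int = 1) -> float:
    """Fraction of post-burn-in trials on which the better stimulus has the
    strictly higher value estimate under a simple delta rule.

    Simplifying assumption (hypothetical): both stimuli yield an outcome on
    every trial (full feedback).
    """
    rng = random.Random(seed)
    q_best = q_worst = 0.5  # neutral starting estimates (assumption)
    prefer = 0
    for t in range(n_trials):
        r_best = 1.0 if rng.random() < p_best else 0.0
        r_worst = 1.0 if rng.random() < p_worst else 0.0
        q_best += alpha * (r_best - q_best)
        q_worst += alpha * (r_worst - q_worst)
        if t >= burn_in:
            prefer += q_best > q_worst  # strict preference for the better stimulus
    return prefer / (n_trials - burn_in)


fast = prefer_best_rate(alpha=1.0)   # each estimate = most recent outcome
slow = prefer_best_rate(alpha=0.05)  # estimates integrate reward frequency
```

With alpha = 1.0 each estimate simply equals the most recent outcome, so the 80% stimulus is strictly preferred only when it was just rewarded and the 20% stimulus was not (about 0.8 × 0.8 = 64% of trials). With a small rate such as 0.05, the estimates approach the underlying reward frequencies and the preference is nearly constant.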
Importantly, our patients did not exhibit a complete inability to learn the three training pairs, especially in Experiment 2, where most patients reached criterion in all three reinforcement conditions. Rather, patients showed delayed acquisition of probabilistic reward contingencies. As noted above, several studies (Keri et al 2000; Weickert et al 2002) indicate that SZ patients have a relatively intact ability to learn in this gradual fashion. Note, however, that these studies did not necessarily demonstrate fully normal performance relative to controls, but only roughly normal improvement from initially impaired performance.
How might cessations of dopamine activity still provide an effective learning signal in schizophrenia? One possible answer is that D2 receptors are supersensitive in patients with chronic schizophrenia (Curran et al 2004; Seeman et al 2005). It is also possible that the D2-blocking effects of chronically administered antipsychotic medications actually increase activity and plasticity in the “indirect” BG pathway (Centonze et al 2004), facilitating NoGo learning (Amtage and Schmidt 2003). We do not argue that D2-blockade is beneficial for reward processing and procedural learning in general. Indeed, the results of several recent studies (Beninger et al 2003; Juckel et al 2006a) indicate that D2-blockade, especially by conventional antipsychotics, has an overall detrimental effect. We suggest only that blocking D2 transmission may benefit NoGo learning, which is thought to depend on “dips” in DA levels.
One important issue for future studies is the effect that different antipsychotic medications have on reward- and punishment-driven learning. Antipsychotic medications vary widely in their affinities for different receptor types, with second-generation antipsychotics generally having lower affinities for D2 receptors than first-generation antipsychotics, but greater affinities for D1 and serotonin receptors (Kapur and Seeman 2001). Unfortunately, we were not able to study medication effects systematically, as medication effects are fully confounded with the clinical characteristics that lead clinicians to choose specific drugs. For clinical and historical reasons, subgroups of patients on similar medication regimens are not well matched in terms of demographics or symptom profiles.
Furthermore, fewer than 20% of patients at MPRC are taking one of the first-generation antipsychotics. Patients taking conventional agents have largely chosen to do so because they are doing well clinically and are unwilling to risk the instability that would follow a treatment change. Only 8 of our patients were on typical neuroleptics alone, and these patients in fact tended to be the youngest (mean age = 39.71 ± 3.39) and most treatment-responsive (mean BPRS = 28.00 ± 4.59) patients in our sample. They also had the least severe negative symptoms (mean SANS = 22.63 ± 5.46). Therefore, our study could not determine whether typical neuroleptics, relative to second-generation antipsychotics, had a more severe impact on BG function, as Beninger et al (2003) and Juckel et al (2006a) have shown. The unique effects that individual antipsychotic medications have on aspects of reinforcement learning need to be studied in the context of controlled clinical trials.
While studying unmedicated patients would certainly help clarify the extent to which the deficits documented here reflect illness rather than treatment, we argue that the current results in medicated patients are clinically relevant. Almost all patients with schizophrenia are treated with antipsychotics that block D2 receptors, and the clinical challenge facing the field is to develop novel treatment approaches for the reward processing deficits observed in patients receiving medications that block DA receptors.
Nonetheless, our results indicate that the primary BG-dependent learning impairment in SZ is a deficit in Go learning (learning from positive feedback), which may not be remedied by D2-blockade. Furthermore, the results of Frank et al (2004) indicate that treatment with a dopamine precursor reverses a Go learning deficit. Given that dopamine agonists have been associated with mild improvement of various cognitive impairments in SZ (Barch and Carter 2005; Goldberg et al 1991), it is possible that performance on probabilistic reinforcement learning tasks would benefit from treatment with dopamine agonists, albeit at the risk of exacerbating positive symptoms (Levy et al 1993; van Kammen et al 1982). Thus, the key to adequately treating both the positive symptoms and the cognitive deficits of schizophrenia may lie in the balance of agonistic and antagonistic properties of drugs acting at different dopamine receptor types in the BG-PFC action selection system.