As noted above, it is thought that the basal ganglia play a critical role in the acquisition of skills, habits, procedures and the longer term learning of optimal response choices from feedback.48
The performance of patients with schizophrenia on these types of learning tasks has been examined by a number of investigators, with surprising evidence of normal or near-normal performance on tasks ranging from motor learning using the pursuit rotor and motor sequence learning using serial reaction time tasks49–51
to cognitive skill learning using variants of the Tower of London task,52
to reinforcement learning using tasks such as the Weather Prediction task,53,54
and to S-R learning using the Rutgers Acquired Equivalence task.55
It should be acknowledged that this literature contains numerous findings of deficits on tasks of supposedly implicit/procedural learning: deficits have been noted in both motor learning and a variety of reinforcement learning paradigms,56–58
and there is some evidence that behavioral performance and neurophysiology may be modulated by antipsychotic medications.22,55,59
Expecting completely consistent, fully normal behavioral performance from schizophrenia patients in order to identify areas of relatively preserved function may set the bar at an unrealistically high level. That is, many of the tasks used to assess slow or habit learning also require the use of other cognitive systems (perception, response generation, working memory, etc.) that are likely to be impaired in a substantial portion of patients. Thus, the reader of this literature is left to decide whether the slow-learning glass is half full or half empty: evidence can be marshaled on both sides, and a consistent, principled explanation of the overall pattern of results has yet to emerge. Even recognizing that the literature has not been unanimous, the fact that patients are sometimes able to perform at, or near, normal levels on this class of tasks is remarkable, in our view, given the profound deficits that patients demonstrate on episodic memory tasks and in using working memory to guide action selection, as discussed above.
We have done several experiments looking at different aspects of gradual reinforcement learning, seeking to test the limits of this system and to begin to address possible mechanisms that are implicated. Morris et al60
examined learning-related changes in behavior and brain activity in 26 patients with schizophrenia and 27 controls performing a probabilistic learning task. In this task, subjects saw color photographs of everyday objects on a computer screen and had to decide whether to make a left- or right-hand button press for each stimulus based on feedback. Six stimuli were included in each of 4 blocks of 300 trials. In each block, correct responses to 2 of the stimuli (one for the left hand, one for the right) were reinforced on 100% of trials. Correct responses to 2 other stimuli were reinforced on 80% of trials, and incorrect choices were reinforced on 20% of trials. Responses to the remaining 2 stimuli were reinforced randomly. Patients did not differ from controls in overall response accuracy for either the 80% or 100% stimuli. They did, however, take longer to meet the criterion used for determining when picture-response pairs had been learned (3 consecutive correct responses for both stimuli at a given probability level), a difference that was significant in the 100% condition. Thus, this study provides evidence of intact gradual learning, coupled with suggestive evidence that this learning may be delayed, a finding we interpret as resulting from an impairment in the rapid learning system. Interestingly, patients also showed a marked reduction in the amplitude of the feedback negativity, an event-related brain potential that occurs following the receipt of negative feedback. This reduction was most marked during the early acquisition phase for the stimuli in the 100% probability condition (ie, the condition in which patients showed impaired learning), suggesting a reduction in the magnitude of error signals in schizophrenia, a potential mechanism implicated in rapid learning.
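For concreteness, the feedback contingencies and learning criterion of this task can be sketched in a few lines of Python. This is our own illustrative reconstruction under the probabilities described above; the stimulus labels and function names are hypothetical, not taken from Morris et al:

```python
import random

# Per-stimulus probability that the correct response yields positive feedback
# (labels are hypothetical placeholders, one stimulus per hand and condition).
CONTINGENCIES = {
    "det_left": 1.0, "det_right": 1.0,    # 100% condition
    "prob_left": 0.8, "prob_right": 0.8,  # 80% condition
    "rand_a": 0.5, "rand_b": 0.5,         # random (uninformative) condition
}

def feedback(stimulus, chose_correctly, rng=random):
    """Positive feedback with probability p for correct choices and
    probability 1 - p for incorrect choices."""
    p = CONTINGENCIES[stimulus]
    return rng.random() < (p if chose_correctly else 1.0 - p)

def met_criterion(outcomes, n=3):
    """Learning criterion used in the study: n consecutive correct responses."""
    return len(outcomes) >= n and all(outcomes[-n:])
```

Under these contingencies the random stimuli carry no signal, so only the 80% and 100% items can be learned; the criterion function mirrors the 3-consecutive-correct rule described above.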
Our next experiment addressed the question of how this type of gradual learning might be mediated. Note that when learning from positive and negative feedback, it can be equally effective to learn what to do because it is rewarded (choose the right-hand response, it is often rewarded) as to learn what not to do because it is punished (do not choose the left-hand response, it is often punished). To address this issue, Waltz et al61
adopted the behavioral methods and computational framework developed by Frank.62,63
In brief, Frank's computational model includes multiple cortical-striatal-thalamic-cortical loops. Critical for positive reinforcement learning is the direct pathway, where phasic dopamine release primarily stimulates D1 receptors and facilitates learning responses associated with positive reinforcement (termed “Go” learning). In contrast, in the indirect pathway, D2 receptors predominate, and transient decreases in dopamine cell firing serve to enhance the learning of which responses should be inhibited because they are associated with negative outcomes (termed “NoGo” learning).
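The division of labor in this model can be caricatured algorithmically. The sketch below is a deliberately simplified abstraction of the Go/NoGo distinction (our own construction, not Frank's actual neural-network model): separate weights accumulate evidence from rewarded and nonrewarded outcomes, standing in for D1-mediated direct-pathway and D2-mediated indirect-pathway plasticity.

```python
class GoNoGoLearner:
    """Toy Go/NoGo learner: Go weights grow after positive outcomes
    (direct pathway, phasic dopamine bursts); NoGo weights grow after
    negative outcomes (indirect pathway, dopamine dips)."""

    def __init__(self, lr_go=0.1, lr_nogo=0.1):
        self.lr_go, self.lr_nogo = lr_go, lr_nogo
        self.go, self.nogo = {}, {}  # per-response weights

    def value(self, response):
        # Net propensity to emit a response: Go support minus NoGo opposition.
        return self.go.get(response, 0.0) - self.nogo.get(response, 0.0)

    def update(self, response, rewarded):
        if rewarded:  # dopamine burst -> strengthen Go for this response
            self.go[response] = self.go.get(response, 0.0) + self.lr_go
        else:         # dopamine dip -> strengthen NoGo for this response
            self.nogo[response] = self.nogo.get(response, 0.0) + self.lr_nogo
```

Selective damage to either pathway can then be mimicked by lowering lr_go or lr_nogo, which is the intuition behind dissociating learning from positive vs negative outcomes.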
In order to distinguish between these 2 kinds of learning, Frank63
devised a novel probabilistic stimulus selection task. In this task, subjects are presented with pairs of stimuli and must learn from feedback which stimulus is the optimal choice. The pairs vary in the degree of discrepancy between the reinforcement values of each choice. One pair combines an 80% item with a 20% item (ie, the correct choice is reinforced 80% of the time but not on the remaining 20% of trials, and vice versa), another pair is 70% and 30%, and the third is 60% and 40%. During the acquisition phase of the experiment, subjects perform up to 360 trials to learn the initial pairings. Given the need to weigh multiple reinforcement values and update them rapidly, these early acquisition trials should present a substantial challenge to patients, as this phase of the task imposes a heavy load on the rapid learning system. This is precisely what we found in 2 versions of the task using different stimulus materials—patients showed minimal ability to learn the advantageous member of any pair over the first 120 trials. Further, we found that patients made fewer appropriate "win stay" (repeat a reinforced choice) and "lose shift" (try a different response after a loss) responses than did controls, further evidence of difficulty using working memory representations to guide action selection.
Frank introduced a transfer, or generalization, phase to the experiment that offers novel information. In this phase, the stimulus pairs are recombined. Thus, the 80% stimulus is now paired with the 70%, 60%, 40%, and 30% stimuli, and the same recombinations are made with the 20% stimulus. As a result, if subjects learned that the 80% stimulus was truly the best, it should be consistently preferred when paired with all the other stimuli, including the 70% and 60% stimuli that were the "winners" in their original presentation. Similarly, if subjects learned that the 20% stimulus was to be avoided, it should be avoided when paired with all other items. As seen in figure 6, patients with schizophrenia learned to avoid the 20% stimulus at the same level as controls but failed to pick the 80% stimulus as frequently as did controls. In terms of the model, patients demonstrated intact NoGo learning but reduced Go learning (ie, reduced learning from positive reinforcement). Note, patients did learn to choose the 80% stimulus when paired with the 20% stimulus at a similar level as controls, but they did not generalize from this experience to the same degree as did controls when the 80% stimulus was encountered in novel pairs. We do not claim that these findings suggesting intact NoGo but impaired Go learning can provide a seamless, integrated account of all the findings in the "slow" or habit-learning literature in schizophrenia. However, these data and this computational framework open up new questions for investigation and may provide a means of organizing a complex set of findings.
Fig. 6. Performance of Schizophrenia (SZ) Patients and Controls on Transfer Measures from the Frank Probabilistic Selection Paradigm (Waltz et al61). Patients showed significant impairment, relative to controls, on the measure of procedural Go learning (choosing the 80% stimulus).
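The structure of the acquisition pairs and the 2 transfer measures can be made concrete as follows. The letter labels (A = 80%, B = 20%, C = 70%, D = 30%, E = 60%, F = 40%) follow a common convention for this task but are our assumption here, as are the function names:

```python
from itertools import combinations

RATES = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3, "E": 0.6, "F": 0.4}
TRAINING_PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]  # acquisition phase

def transfer_pairs():
    """Novel recombinations presented in the transfer phase."""
    return [p for p in combinations(RATES, 2) if p not in TRAINING_PAIRS]

def choose_a_accuracy(choices):
    """Go measure: how often A (the 80% stimulus) is picked in novel pairs.
    `choices` maps each pair to the stimulus the subject selected."""
    pairs = [p for p in choices if "A" in p]
    return sum(choices[p] == "A" for p in pairs) / len(pairs)

def avoid_b_accuracy(choices):
    """NoGo measure: how often B (the 20% stimulus) is avoided in novel pairs."""
    pairs = [p for p in choices if "B" in p]
    return sum(choices[p] != "B" for p in pairs) / len(pairs)
```

In this scheme, the pattern reported above corresponds to patients scoring normally on avoid_b_accuracy but below controls on choose_a_accuracy.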
Additional evidence was recently published by Heerey et al38
using an adaptation of the reward sensitivity paradigm developed by Pizzagalli et al.64
In this task, subjects see brief displays containing a simple outline drawing of a face and are asked to decide whether the face has a "short" mouth or a "long" mouth. In fact, the 2 mouths differ only slightly, and the discrimination is difficult for many subjects. The critical manipulation is that selection of the mouths is reinforced in an asymmetric fashion: 30 of 40 reinforcers occur in response to correct choices of mouth A, whereas 10 of 40 reinforcers occur in response to mouth B. Because it is a difficult discrimination, subjects are uncertain and must guess on some proportion of trials. The question of interest is whether patients develop a bias toward picking the most frequently rewarded choice, as do healthy subjects. Pizzagalli et al64
reported that depressed subjects fail to develop this reward-seeking bias and have proposed this task as a method to assess anhedonia or insensitivity to reward. We observed 2 key findings in patients with schizophrenia. First, they tended to have more difficulty performing the discrimination than controls, an expected finding given evidence for low-level perceptual deficits in schizophrenia.65,66
Second, and most surprisingly, patients showed the same reward-seeking bias as did controls. Interestingly, postexperiment debriefing revealed that none of the patients was aware of the differential stimulus-reward contingencies. Thus, the evidence of intact reward sensitivity occurred completely outside of awareness, suggesting that this performance was accomplished without use of working memory representations that are available for introspection and report.
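The reward-seeking bias in this paradigm is conventionally quantified with a signal-detection response-bias statistic (log b). A minimal version, assuming simple correct/incorrect response counts for the frequently rewarded ("rich") and rarely rewarded ("lean") stimulus:

```python
import math

def response_bias(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """log b: positive values indicate a bias toward reporting the rich
    (frequently rewarded) stimulus; zero indicates no bias."""
    return 0.5 * math.log(
        (rich_correct * lean_incorrect) / (rich_incorrect * lean_correct)
    )
```

A subject who drifts toward guessing the rich mouth when uncertain will accumulate rich hits and lean errors, pushing log b above zero, which is the signature that both controls and patients showed here. (In practice, a small correction is typically added to each cell to avoid division by zero.)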
One question that arises immediately is how to reconcile the evidence from Waltz et al61
that Go learning is impaired in schizophrenia with the findings from the Pizzagalli reward sensitivity task which appears to show a normal reward-seeking bias.38
Further, how do we reconcile our findings that patients are less concerned than controls with potential losses when making decisions, yet do learn from negative outcomes over time? These findings may not be contradictory. For example, if lack of reinforcement in the reward sensitivity paradigm is considered to be negative feedback, then NoGo learning would lead to an apparent preference for the more frequently rewarded stimulus: subjects would "avoid" guessing the negative stimulus and choose to guess the more frequently rewarded stimulus when uncertain. Alternatively, it is possible that the contrast between the 2 reinforcement rates is so large that even a compromised Go learning system is able to discriminate effectively between them. The design of the reward sensitivity task does not provide a means of discerning how learning was accomplished, and it will remain for future work to address this question in a more definitive fashion.
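The first possibility, that NoGo learning alone could mimic a reward-seeking bias, can be illustrated with a toy simulation (our own construction with illustrative parameters, not an analysis from any of these studies): an agent that updates only on negative outcomes still comes to choose the richly rewarded option more often, because the lean option accumulates avoidance weight faster.

```python
import random

def simulate_nogo_only(trials=4000, p_rich=0.75, p_lean=0.25, lr=0.05,
                       explore=0.1, seed=1):
    """Agent that learns ONLY from nonreward (NoGo); returns choice counts."""
    rng = random.Random(seed)
    nogo = {"rich": 0.0, "lean": 0.0}   # accumulated avoidance weights
    counts = {"rich": 0, "lean": 0}
    for _ in range(trials):
        if rng.random() < explore:       # occasional random exploration
            choice = rng.choice(["rich", "lean"])
        else:                            # otherwise avoid the more-punished option
            choice = min(nogo, key=nogo.get)
        counts[choice] += 1
        p_reward = p_rich if choice == "rich" else p_lean
        if rng.random() >= p_reward:     # nonreward -> NoGo update only
            nogo[choice] += lr
    return counts
```

Because the lean option goes unrewarded 3 times as often here, its NoGo weight grows roughly 3 times as fast per choice, so the agent settles into preferring the rich option without any Go learning at all.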
The finding that patients fail to consider the possibility of losses when making decisions, but appear to benefit from negative feedback in the longer term, can be understood as resulting from the independence of the rapid learning system, heavily dependent on working memory resources, and the slow learning system that integrates reinforcement signals over longer temporal intervals. The clinical implications of these results are potentially quite interesting: patients may seek rewards but fail to adequately learn about the relative value of stimuli and responses from positive outcomes. In contrast, patients may not encode discrete instances of negative feedback when making decisions but may be able to use nondeclarative systems to learn from repeated punishments. In both instances, there is a mismatch between slow learning performance and decision-making biases, likely limiting adaptive functioning.
Furthermore, we do not mean to imply that the frontal and striatal systems involved in reinforcement learning are entirely functionally segregated. There are clear instances where they interact—rapid reversal learning is one example.42
As with other areas of cognitive research in schizophrenia, it is extremely difficult to isolate the function of one specific system with a specific task and be confident that the potential impact of other impairments has been controlled. Nonetheless, we argue that circuits in dorsal striatum and prefrontal cortex make distinct contributions to reinforcement learning and may be differentially disturbed in schizophrenia. Our interpretive confidence is enhanced by converging evidence, and several such findings have emerged from these experiments, summarized in . One such finding is that the “slow” learning system appears to be surprisingly intact—given enough learning trials, patients are able to use feedback to guide response and stimulus selection in many experimental paradigms. It remains for future experiments to determine whether both positive and negative feedback are equally effective in guiding such learning.
In contrast, there appears to be a highly reliable impairment in the rapid learning system. These deficits can be detected in the early phases of learning tasks, when subjects must shift response selection in the face of changing outcomes, and, perhaps, more generally when they are confronted by negative feedback. These feedback-driven learning systems are the basic mechanisms that facilitate adaptive behavior. That is, even if evoked emotional experience is surprisingly normal in patients with schizophrenia, the impact of this experience on subsequent behavior depends on the fidelity of learning mechanisms and representational systems. As seen in our studies of decision making, the degree to which patients undervalue future rewards or the possibility of punishment is correlated with working memory performance, suggesting that certain basic cognitive capacities may form a critical computational substrate for aspects of affective and motivational function.