Patients with schizophrenia demonstrate deficits in motivation and learning that suggest impairment in different aspects of the reward system. In this article, we present the results of 8 converging experiments that address subjective reward experience, the impact of rewards on decision making, and the role of rewards in guiding both rapid and long-term learning. All experiments compared the performance of stably treated outpatients with schizophrenia and demographically matched healthy volunteers. Results to date suggest (1) that patients have surprisingly normal experiences of positive emotion when presented with evocative stimuli, (2) that patients show reduced correlation, compared with controls, between their own subjective valuation of stimuli and action selection, (3) that decision making in patients appears to be compromised by deficits in the ability to fully represent the value of different choices and response options, and (4) that rapid learning on the basis of trial-to-trial feedback is severely impaired whereas more gradual learning may be surprisingly preserved in many paradigms. The overall pattern of findings suggests compromises in the orbital and dorsal prefrontal structures that play a critical role in the ability to represent the value of outcomes and plans. In contrast, patients often (but not always) approach normal performance levels on the slow learning achieved by the integration of reinforcement signals over many trials, thought to be mediated by the basal ganglia.
The reward system is an attractive target for translational research in schizophrenia for a number of reasons. First, there is an enormous basic neuroscience literature to draw from that details the functional neuroanatomy and neurochemistry of the system, as well as a rich set of behavioral paradigms that have been extensively studied in a range of mammalian species.1,2 There is extensive evidence suggesting a high degree of conservation of this system across species, including humans.3 Second, the reward system has also been the subject of a growing body of human functional neuroimaging and behavioral studies, as well as sophisticated, biologically inspired computational modeling approaches to reinforcement learning.4–6 Importantly, human neuroimaging studies have shown that the same reward circuitry is involved when subjects receive symbolic feedback as when primary reinforcers are at stake, suggesting a general role of this system in outcome-based learning.7,8 Third, there is a great deal of evidence that the dopamine system plays a critical role in reward processing, a system that is clearly implicated in schizophrenia and is the primary target of current pharmacological treatment approaches.9,10 Fourth, patients with schizophrenia have marked motivational and learning deficits that may reflect impairments in different aspects of reward processing. Thus, there are many reasons to suspect that the study of reward-processing impairments in schizophrenia may shed new light on the functional disability and neurobiology of the illness.
Abnormalities in reward processing are likely relevant to several different aspects of schizophrenia. For example, reward receipt is associated with hedonic experience, and an abnormality in this aspect of reward function bears a straightforward relationship with anhedonia, long considered to be a critical feature of the illness in many patients.11,12 Further, diminished hedonic experience would undermine motivation for goal-directed behavior as the achievement of behavioral goals would result in an attenuated reward experience. Similarly, rewards function as teaching signals about which stimuli and which responses are associated with valued outcomes. There is a large literature documenting deficits in patients with schizophrenia in the performance of tasks where feedback is provided to guide performance.13–15 Thus, several of the critical cognitive and affective deficits in schizophrenia might reflect impairments in specific aspects of reward processing, in addition to or instead of primary cognitive impairments.
For the past several years, our group at the University of Maryland has been conducting a series of behavioral studies addressing different aspects of reward processing in schizophrenia. We refer the reader to several recent reviews on emotion, motivation, and negative symptoms in schizophrenia that provide coverage of the broader literature.16–19 Our initial studies have investigated subjective reward experience, the impact of rewards on decision making, and the role of rewards in guiding both rapid and long-term learning. This article offers an integrative account of 8 separate experiments that provide an initial systems-level perspective on areas of preserved and impaired reward function in schizophrenia, with findings summarized in table 1. Such behavioral evidence is needed to guide and constrain the interpretation of functional imaging studies of reward processing in schizophrenia that are beginning to appear in the literature.20–22 All the studies included clinically stable, medicated outpatients diagnosed with schizophrenia, typically in their 30s and 40s, and demographically matched controls. Many subjects participated in multiple experiments.
The literature on hedonic experience in schizophrenia reveals a consistent pattern of contradictory findings that appears to be explained at least in part by methodological differences. Clinician ratings of anhedonia as well as patient self-report using trait questionnaire measures, such as the Chapman Physical and Social Anhedonia scales,23 provide consistent evidence that a substantial proportion of patients with schizophrenia have clinically significant anhedonia.24,25 The experimental laboratory literature provides a largely consistent body of evidence, suggesting that patients with schizophrenia have normal, or nearly normal, experience of positive emotion when presented with emotionally evocative stimuli. These studies have included a wide variety of stimuli ranging from visual images to movie clips, to flavored drinks, and to valenced words.24,26–28
Indeed, the similarity of patient and control ratings of emotionally evocative stimuli is often quite striking. An example of this kind of result is shown in figure 1, adapted from the findings of Heerey and Gold.29 In this study, 41 patients with schizophrenia and 31 controls were asked to make pleasantness and arousal ratings for 42 slides, each containing 3 images from the International Affective Picture System (IAPS).30 The 3 images in each slide had similar valence and arousal ratings based on normative IAPS data. As seen in the figure, there is remarkable similarity between patient and control ratings, and both groups closely resemble the normative values, ruling out the possibility that the null result can be attributed to a nonrepresentative control group. In a recent report from Herbener et al26 using similar stimuli, the correlation between affect ratings made by patients and healthy controls was greater than .90. Although not every study reports this degree of agreement between patient and control affect ratings, the overall literature suggests that the subjective experience of evoked positive emotion is surprisingly normal in patients with schizophrenia. We acknowledge that the question of whether this normative subjective experience is accompanied by normal neurophysiology remains an unresolved and interpretively complex issue in the literature.31–34 However, the behavioral literature provides a consistent set of surprising findings that are also interpretively challenging given common clinical thinking about the role of anhedonia in the illness.
The clear contradiction of findings across methods (questionnaire-based self-report and clinician ratings vs laboratory experiments) presents a quandary. Have clinicians been asking the wrong questions and providing the wrong answers about anhedonia? Or, might the experimental approaches to this issue be so artificial that the evidence for normative experience can be quickly discounted? Or, might both approaches be providing reliable evidence about different phenomena? We suspect that the answer to the last question is “yes” and that the discrepancy reveals something important about reward/emotion/motivation deficits in schizophrenia.
Clinician ratings of anhedonia are influenced by what patients do and how they report their experience. That is, if patients report a lack of reward-seeking behavior, it would be easy for a clinician to infer that this reflects a lack of interest in, or enjoyment of, pleasurable activities. In light of the fact that many patients have very limited opportunities to pursue rewarding activities, an interview querying for positive emotional experience and response vigor is likely to yield little evidence of normative experience of, or motivation by, the pursuit of pleasurable activity. The critical point is that clinical ratings of anhedonia are likely substantially influenced by environmental opportunities as well as broader motivational issues having to do with the generation of goal-directed behavior, not only with the experience of positive emotion per se. In contrast, the experimental approach may tap primarily into the in-the-moment experience of hedonia and may not capture these critical motivational and environmental influences. Seen in this way, the conflicting findings between methods might be more apparent than real, as these methods address different aspects of patient behavior and experience.
The more challenging contradiction comes between the laboratory literature and patient self-report on instruments like the Chapman Anhedonia scales.24,35 That is, when asked to make true/false judgments in response to probes such as “The beauty of a sunset is greatly overrated,” patients routinely report less pleasure than controls. However, if one were to show a patient a “real” sunset, the experimental literature would suggest that patients would report the same amount of pleasure as controls. One way of understanding the contradiction is that the laboratory measures involve the presentation of an evocative stimulus whereas the questionnaire measures require that the subject invoke an internal representation of the experience in question and make a judgment about the affective value of that internally generated and maintained representation. If patients with schizophrenia have difficulty generating and evaluating representations of affective value, then one might expect them to respond in an anhedonia-consistent fashion on scales such as the Chapman and other self-report scales, even though their actual in-the-moment pleasure might approximate that of healthy subjects.35
We have addressed the relationship between the ability to represent the value of an experience and decision making with 3 different experimental approaches. In the same experiment in which subjects produced the affect ratings shown in figure 1, they were also asked to make speeded button presses to indicate whether they wished to see the stimulus again in a later condition. This button pressing began 3 s after stimuli were removed from the computer screen. In a second experimental condition, subjects used speeded button presses to either increase or decrease their viewing times while the stimuli were in view. Thus, in one condition, the button pressing required a memory representation of the stimuli whereas the other condition did not.
Two critical aspects of the results can be seen in figure 2. First, whether responding to the actual stimulus or to a memory representation of it, patients showed reduced differentiation in their response rate as a function of stimulus valence. That is, the difference between patients' response rates to neutral stimuli and to stimuli they rated as having a positive or negative valence was smaller than it was for controls. Second, when we examined the correlation between each subject's pleasantness ratings and his or her rate of speeded button pressing, we found that healthy subjects had a significantly higher degree of correspondence between their ratings and button-pressing rates and that this between-group difference was significantly magnified in the representational condition. That is, the volitional behavior of controls was more tightly predicted by their own ratings of affective value than was the behavior of patients. Among patients, the degree of correspondence between value and behavior was correlated with standard working memory measures, suggesting that the ability to represent value may draw upon the same cognitive substrate that is used to maintain and manipulate recent perceptual experience. The results from this study suggest that even when affective values are assigned in a normative fashion, these assignments have reduced impact on the modulation of motivated behavior in patients with schizophrenia, potentially due in part to difficulty generating, accessing, or maintaining internal representations of affective value (see table 1).
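The rating-behavior correspondence described above can be illustrated with a per-subject Pearson correlation between pleasantness ratings and button-press rates. This is a minimal sketch, assuming Pearson's r as the index of correspondence (the text does not say which coefficient was used); the ratings and press rates are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired values (e.g. ratings and press rates)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pleasantness ratings for 5 stimuli.
ratings = [2, 3, 5, 7, 8]
# Two hypothetical press-rate profiles paired with those ratings: one that
# tracks the ratings closely and one that is only weakly modulated by them.
tracking_rates = [1.0, 1.5, 2.5, 3.5, 4.0]
weak_rates = [2.4, 2.0, 2.6, 2.2, 2.8]

print(round(pearson_r(ratings, tracking_rates), 2))
print(round(pearson_r(ratings, weak_rates), 2))
```

A lower r for the second profile corresponds to the weaker coupling between stated value and volitional behavior reported for patients.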
The results from this first experiment raise the possibility that perceptually available experiences may be valued differently from those represented in memory. To extend that result, we examined delay discounting to investigate how patients weigh the relative value of immediate vs delayed rewards.36 In this experiment, subjects were asked to choose between smaller immediate rewards and larger delayed rewards (eg, “Would you prefer $36 today or $80 in 59 days?”), using methods developed by Kirby et al.37 The term delay discounting refers to the reduction in the subjective value of a reward as the delay to its receipt increases. Thus, an individual might prefer to receive $10 in a week rather than $1 today but might choose the $1 today over $10 promised in 1 year's time. By presenting hypothetical choices that span different time intervals and involve varying relative magnitude differences, it is possible to estimate an overall discounting rate.
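Delay discounting is commonly modeled with the hyperbolic function V = A / (1 + kD), where A is the delayed amount, D the delay, and k the discount rate; Kirby-style questionnaires infer k from which items a subject accepts. The sketch below assumes that hyperbolic form and uses a simple consistency-based estimate (pick the candidate k that agrees with the most choices), which approximates but is not necessarily the exact scoring used in the study. Apart from the $36/$80 question quoted above, the items and the simulated subjects are hypothetical.

```python
import math


def indifference_k(immediate, delayed, delay_days):
    """Discount rate k at which immediate == delayed / (1 + k * delay_days)."""
    return (delayed / immediate - 1.0) / delay_days


def discounted_value(amount, delay_days, k):
    """Hyperbolic present value of an amount received after delay_days."""
    return amount / (1.0 + k * delay_days)


def estimate_k(choices):
    """Candidate k most consistent with choices of (immediate, delayed, days, chose_delayed).

    Candidates are geometric midpoints between the sorted item indifference ks,
    plus one value below and one above the range spanned by the items.
    """
    item_ks = sorted(indifference_k(i, d, t) for i, d, t, _ in choices)
    candidates = [item_ks[0] / 2]
    candidates += [math.sqrt(a * b) for a, b in zip(item_ks, item_ks[1:])]
    candidates.append(item_ks[-1] * 2)

    def n_consistent(k):
        return sum((discounted_value(d, t, k) > i) == chose
                   for i, d, t, chose in choices)

    return max(candidates, key=n_consistent)


# Hypothetical item set; the first item is the question quoted in the text.
items = [(36, 80, 59), (54, 55, 117), (25, 60, 14)]


def simulate(true_k):
    """Choices of an idealized subject who discounts hyperbolically with true_k."""
    return [(i, d, t, discounted_value(d, t, true_k) > i) for i, d, t in items]


print(round(indifference_k(36, 80, 59), 4))  # about 0.0207 per day
print(estimate_k(simulate(0.01)), estimate_k(simulate(0.5)))
```

The steeper simulated discounter (larger true k) receives the larger estimated k, which corresponds to accepting smaller immediate rewards more often, the pattern reported for patients.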
As seen in figure 3, patients discounted the value of future rewards far more steeply than did controls. That is, they were willing to accept a much smaller immediate reward in place of a larger delayed reward than were healthy volunteers. As in the previously described experiment, performance on cognitive measures was related to value-based choice: patients with better episodic and working memory showed less severe discounting of future rewards, suggesting again that aspects of decision making and affective processing may be related to broader aspects of cognitive functioning in schizophrenia.
In delay discounting tasks, subjects are asked to weigh the relative value of immediate and delayed rewards. Might the delay discounting deficit in patients reflect a difficulty in integrating multiple features of a decision? That is, in delay discounting, 2 different reward magnitudes must be weighed while also considering the time interval involved, whereas ratings of evoked experiences do not involve the integration and comparison of multiple stimuli and features. To examine whether patients have difficulty incorporating multiple features in making decisions, Heerey et al38 presented subjects with a probabilistic decision-making task in which subjects had to choose which of 2 gambles they wanted to play. The 2 gambles differed in the magnitude of the potential reward (ranging from $3 to $17) and the probability of winning. Further, on some trials, losing the chosen gamble incurred a loss (ranging from $3 to $17), and on other trials, no loss was possible (lose $0). These features were presented using simple visual stimuli that were explained to subjects prior to beginning the task. The experiment produced 3 critical findings. First, healthy controls made more optimal choices than patients: the choices of controls more closely matched the actual differences in expected value between the 2 gambles (expected value = probability of winning multiplied by value of the win, minus the probability of losing multiplied by the value of the loss). Second, a logistic regression model that estimated the extent to which the subjective value of potential gains and of potential losses guided decision making showed that patients weighted potential gains just as controls did but failed to weigh possible losses to the same degree as controls when choosing which gamble to play.
Third, consistent with the results of the prior 2 experiments, working memory ability correlated with the ability to optimally weigh potential outcomes. That is, patients with better working memory made more optimal decisions, and when the impact of working memory on decision making was statistically controlled, the patient vs control difference was no longer significant.
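The expected-value formula and the idea of weighting potential gains and losses separately can be sketched as follows. A gamble is represented as (probability of winning, win amount, loss amount), assuming the probability of losing is 1 minus the probability of winning. The logistic choice rule with separate gain and loss weights (`w_gain`, `w_loss`) is a hypothetical form of the regression model mentioned in the text, not its published specification; in practice the weights would be estimated by fitting such a model to each subject's trial-by-trial choices. The two gambles are invented, with amounts inside the $3 to $17 range used in the task.

```python
import math


def expected_value(gamble):
    """EV = p_win * win - p_lose * loss, assuming p_lose = 1 - p_win."""
    p_win, win, loss = gamble
    return p_win * win - (1 - p_win) * loss


def p_choose_first(g1, g2, w_gain, w_loss, bias=0.0):
    """Logistic probability of choosing gamble g1 over gamble g2.

    Expected gains and expected losses enter with separate weights, so a
    w_loss near 0 describes a chooser who ignores potential losses.
    """
    gain_diff = g1[0] * g1[1] - g2[0] * g2[1]
    loss_diff = (1 - g1[0]) * g1[2] - (1 - g2[0]) * g2[2]
    logit = bias + w_gain * gain_diff - w_loss * loss_diff
    return 1.0 / (1.0 + math.exp(-logit))


risky = (0.5, 10, 10)  # hypothetical: larger win, but a $10 loss is possible
safe = (0.5, 6, 0)     # hypothetical: smaller win, no loss possible

print(expected_value(risky), expected_value(safe))                      # 0.0 3.0
print(round(p_choose_first(risky, safe, w_gain=1.0, w_loss=1.0), 3))   # 0.047
print(round(p_choose_first(risky, safe, w_gain=1.0, w_loss=0.0), 3))   # 0.881
```

With equal weights the chooser reliably takes the higher-EV safe gamble; setting the loss weight to zero, while leaving the gain weight unchanged, flips the preference toward the lower-EV risky gamble, a toy version of intact gain valuation combined with underweighted losses.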
Generalizing across experiments, it appears that the decision-making and response generation/selection deficits in schizophrenia occur in the context of normative evoked emotional experience and normal valuation (and potentially overvaluation) of potential immediate rewards. The failure of normal emotional experience to exert its expected influence on behavior and decision making may be a consequence of a broader deficit in the ability to mentally represent the expected value of multiple response options, a form of working memory for value. We do not mean to suggest that the patient deficit is directly attributable to working memory capacity per se. For example, in the delay discounting and gambling paradigms, the subject makes choices based on visually available stimuli, so there is no explicit memory demand. The performance of decision-making tasks, however, imposes a demand to simultaneously represent and consider the multiple cognitive/affective attributes associated with different choices. Based on our correlational evidence, we suggest that affective “representational complexity” is related to more conventional measures of working memory capacity and that the working memory capacity limits in schizophrenia may impair decision making and the ability to finely tune response selection/generation to affective valuation. These results suggest that the working memory and affective functions typically attributed to the dorsal and orbital prefrontal cortex, respectively, and typically considered to be dissociable, may both be implicated in the behavioral deficits of patients with schizophrenia. Whether this occurs on the basis of shared abnormal inputs to both regions (eg, dopamine) or intrinsic anatomic pathology of both regions remains a question for future research using different methods.
Reinforcement learning occurs on multiple time scales in order to optimize adaptive performance. On the one hand, long-term knowledge structures, skills, and habits develop over extended intervals in order to correspond to the long-range statistical regularities, constraints, and affordances of the environment. On the other hand, rapid learning systems are needed to deal with unexpected changes in reward contingencies and with novel situations. These learning systems are complementary and interactive, providing for both behavioral stability and plasticity. There is a great deal of evidence that different neural systems are implicated in these 2 kinds of learning. The basal ganglia appear to play a critical role in the “slow” learning system, integrating longer term reinforcement outcomes.6 It is thought that the phasic activity of dopaminergic neurons plays a critical role in this process, with phasic increases in activity coding that an outcome was better than expected, whereas phasic decreases in activity may correspond to outcomes that were worse than expected.39,40 Learning in such cases is often slow because many trials are needed to develop reliable predictions of less than certain outcomes.
In contrast, the prefrontal cortex, particularly the orbital frontal cortex, plays a critical role in the “rapid” learning system, updating representations of the relative value of different stimuli and response alternatives on a trial-to-trial basis and maintaining this information on-line so that it can guide behavior.6,41,42 This type of rapid learning is critical for behavioral flexibility in the face of changing outcomes as in the case of reversal learning. Both systems process the same dopaminergic reinforcement signals but respond on different time scales. The prefrontal system, driven by sustained neural firing, has a limited temporal horizon. In contrast, basal ganglia–based learning is thought to involve longer term structural changes in synaptic connection strength, a less flexible but also less fallible process.43 These important differences suggest that these systems, while densely interactive, may be partially dissociable, and we believe that there is evidence suggesting this is the case in schizophrenia.
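A common way to formalize the prediction-error account above is the delta rule (Rescorla-Wagner update): the prediction error is the obtained outcome minus the current value estimate, and the estimate moves a fraction alpha (the learning rate) toward the outcome. The sketch below is a textbook abstraction, not a model from the studies reviewed here. It contrasts a small learning rate, which integrates over many trials and stands in loosely for the slow basal ganglia system, with a large one dominated by the last few outcomes. Treating the working-memory-based prefrontal system as simply a high learning rate is an assumption made only for illustration, and the reward schedule is hypothetical.

```python
def delta_rule(outcomes, alpha, v0=0.0):
    """Delta-rule (Rescorla-Wagner) value learning for a single stimulus.

    For each outcome (1 = rewarded, 0 = not rewarded) the prediction error is
    outcome - value: positive when the outcome is better than expected,
    negative when it is worse. The value then moves by alpha * error.
    Returns the value after every trial and the list of prediction errors.
    """
    value, values, errors = v0, [], []
    for outcome in outcomes:
        error = outcome - value
        value += alpha * error
        errors.append(error)
        values.append(value)
    return values, errors


# Hypothetical schedule: rewarded on 4 of every 5 trials (80%) for 200 trials,
# then never rewarded for 10 trials, as after a contingency reversal.
schedule = [1, 1, 1, 1, 0] * 40 + [0] * 10

slow_values, slow_errors = delta_rule(schedule, alpha=0.05)  # integrates many trials
fast_values, _ = delta_rule(schedule, alpha=0.5)             # tracks recent trials

print("end of 80% phase:", round(slow_values[199], 2), round(fast_values[199], 2))
print("after 10 misses: ", round(slow_values[-1], 2), round(fast_values[-1], 2))
```

With these settings the slow learner settles near the 80% reward rate but, after 10 unrewarded trials, has moved only partway toward the new contingency (to roughly half), while the fast learner drops to near zero within a few trials at the cost of estimates that swing sharply after every single omission, even during the stable 80% phase.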
The Wisconsin Card Sorting Test (WCST) has played a significant role in the schizophrenia research literature and is widely regarded as a measure of executive function, set shifting, problem solving, and behavioral flexibility that draws on working memory to integrate performance feedback.14 The task requires subjects to sort cards using one of 3 potential sorting rules (matching by color of the stimuli, the form of the stimuli, or the number of forms on each card), using feedback to discover which dimension is currently reinforced. After 10 consecutive correct responses, the sorting rule, unbeknownst to the subject, changes. This change becomes evident only when previously reinforced responses are suddenly incorrect. Perseverative responding to the previously correct dimension is considered the critical behavioral evidence of an inability to shift set in the face of changing feedback.
The focus on set shifting cannot, however, account for patients’ frequent inability to achieve even one set, which appears to implicate a much more basic difficulty in feedback processing. In order to examine the role of the rapid learning system reflected in participants’ trial-by-trial use of feedback to guide action selection, Prentice et al44 performed an analysis of archival WCST data from 145 patients and 80 controls who had performed the task for a variety of protocols. The analysis focused on the first 4 trials of the test, where participants’ use of negative feedback should lead them, by process of elimination, to the first sorting rule. As seen in figure 4a, we found that the majority of patients with schizophrenia were less accurate than healthy controls on these early cards. After an initial guess on Card 1 resulted in negative feedback (both groups showed a predisposition to sort Card 1 according to Form, though the correct answer was Color), patients were significantly less able than healthy controls to use that negative feedback to redirect them to a different answer on Card 2. The 2 groups maintained a significant accuracy difference across Cards 2, 3, and 4. Figure 4b shows the remarkable extent to which accuracy on the first 4 cards foreshadows subsequent performance, represented here by the number of categories achieved across the entire 128-card task. A comparison of patients who were able to achieve 3 or more categories against those who achieved fewer than 3 showed significant differences in their accuracy on Cards 2, 3, and 4. That is, patients who had difficulty using negative feedback to make correct choices over the first 4 cards were largely unable to master the task, whereas several correct responses during the first 4 trials were highly predictive of later task success.
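The process-of-elimination logic on the first 4 cards can be sketched as an idealized sorter. The sketch assumes, for simplicity, that each sort tests exactly one of the 3 rules, so negative feedback rules out exactly that dimension; actual WCST response cards can match the key cards on more than one dimension, so real feedback is less clean. The function names are hypothetical. The sketch shows that a sorter who uses negative feedback reaches the correct rule by Card 3 at the latest, whereas one who ignores negative feedback stays wrong.

```python
from itertools import permutations

RULES = ("color", "form", "number")


def elimination_sorter(correct_rule, first_guess, n_cards=4, order=RULES):
    """Per-card accuracy of a sorter that uses feedback by elimination.

    Assumes each sort tests a single rule, so negative feedback rules out
    exactly the dimension that was tried.
    """
    candidates = list(order)
    guess = first_guess
    accuracy = []
    for _ in range(n_cards):
        correct = guess == correct_rule
        accuracy.append(correct)
        if not correct:
            candidates.remove(guess)  # lose-shift: drop the disconfirmed rule
            guess = candidates[0]     # try the next remaining rule
        # after positive feedback the current rule is kept (win-stay)
    return accuracy


def feedback_insensitive_sorter(correct_rule, first_guess, n_cards=4):
    """Hypothetical sorter that ignores negative feedback and keeps its first guess."""
    return [first_guess == correct_rule] * n_cards


# The situation described above: Card 1 sorted by form, correct rule is color.
print(elimination_sorter("color", "form"))
print(feedback_insensitive_sorter("color", "form"))

# Worst case over every rule order, correct rule, and first guess.
worst = max(elimination_sorter(c, g, order=o).index(True)
            for o in permutations(RULES) for c in RULES for g in RULES)
print("latest card of first correct sort:", worst + 1)
```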
Schizophrenia patients’ impaired accuracy on early WCST cards cannot be attributed to traditional perseveration (a failure to abandon a previously rewarded response when negative feedback indicates one should do so), because at this point participants have not yet made any rewarded responses. Among the minority of patients who did generate correct responses to Cards 2, 3, or 4, most were able to use positive feedback as a cue to stay with their rewarded response for the subsequent card. Thus, the locus of impairment in patients’ rapid trial-to-trial learning appears to be in the ability to use negative feedback, or error information, to alter behavior. With the WCST, however, one cannot rule out the possibility that other cognitive deficits contribute to patients’ difficulties with the first few cards; that is, poor performance could arise from deficits outside the rapid learning system. For example, if the idea of matching by color or number did not occur to a subject, he or she might continue to sort by form despite knowing that it was wrong simply because no other rule was available. While we consider that to be an unlikely explanation, only an experiment that did not require the formation of abstract categories could directly address the question.
Waltz and Gold45 turned to a probabilistic reversal learning paradigm to isolate the ability to rapidly use negative feedback.42 In a forced-choice response task, 37 patients and 25 controls were presented with pairs of abstract fractal patterns. Responses to one of the patterns were reinforced on 80% of trials and responses to the other pattern were reinforced on 20% of trials. If subjects learned to choose the 80% stimulus on 9 of 10 consecutive trials, then the contingencies were reversed. If subjects learned the new outcome by choosing the new 80% stimulus on 9 of 10 trials, the contingencies reversed back to the original probabilities, allowing for up to 2 reversals for each stimulus pair. This procedure was repeated for 2 additional stimulus pairs, allowing for a maximum of 6 reversals. Performance on this kind of reversal task provides a critical test of the rapid learning system, as a previously reinforced response must be abandoned on the basis of changing reinforcement outcomes. As seen in figure 5, similar proportions of both groups achieved the 3 initial discriminations. However, controls achieved significantly more successful reversals than did patients. Thus, the patient deficit is in the ability to reverse previously learned associations, not in the ability to acquire such associations when these are presented in a simple fashion such that if one choice is correct, the other is incorrect.
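The reversal criterion described above can be expressed as a small scoring routine for one stimulus pair. This is a sketch under stated assumptions: "9 of 10 consecutive trials" is read as at least 9 choices of the currently better stimulus among the last 10 trials, the count restarts after each reversal, and the probabilistic 80%/20% feedback itself is not simulated. The labels "A" and "B" and the choice sequences are hypothetical.

```python
from collections import deque


def score_reversals(choices, initial_best="A", window=10, needed=9, max_reversals=2):
    """Return (acquired, n_reversals) for one stimulus pair.

    choices: sequence of "A"/"B" picks. Each time the currently better
    stimulus is chosen on at least `needed` of the last `window` trials, the
    contingencies reverse, up to max_reversals reversals.
    """
    best = initial_best
    recent = deque(maxlen=window)  # rolling record of "chose the better stimulus"
    criteria_met = 0
    for choice in choices:
        recent.append(choice == best)
        if len(recent) == window and sum(recent) >= needed:
            criteria_met += 1
            if criteria_met > max_reversals:  # acquisition plus all reversals done
                break
            best = "B" if best == "A" else "A"  # reverse the contingencies
            recent.clear()                      # assumption: count restarts
    return criteria_met >= 1, max(0, criteria_met - 1)


perseverator = ["A"] * 40
flexible = ["A"] * 10 + ["B"] * 10 + ["A"] * 10
print(score_reversals(perseverator))
print(score_reversals(flexible))
```

A chooser who keeps picking the originally reinforced stimulus acquires the discrimination but completes no reversals, an extreme version of the patient pattern of intact acquisition with fewer successful reversals.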
These results may appear to differ from multiple studies using the intradimensional/extradimensional shifting task from the Cambridge Neuropsychological Test Automated Battery where reversal deficits appear to be confined to a very small percentage of patients.46,47 However, the reversal learning on that task is based on pairs of totally deterministic stimuli (ie, one is always right, the other always wrong until they switch). The probabilistic task puts a greater stress on working memory as multiple outcomes associated with a response must be held “on-line” in order to determine when contingencies have actually shifted. Consistent with our prior experiments, the patient deficit emerges when multiple outcomes, values, or S-R contingencies must be integrated, compared, and evaluated to determine the optimal response. Thus, the experiments on decision making and rapid learning converge on the idea that the ability to fully represent the affective value of different stimulus features or response alternatives may be implicated across a range of behaviors (see table 1).
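The argument that probabilistic reversal learning requires integrating several outcomes can be made concrete with binomial probabilities. Assuming trial outcomes are independent, the sketch computes how often a stimulus that is still reinforced on 80% of trials would produce a given number of unrewarded trials in a short window, compared with a stimulus that has switched to 20%; the 5-trial window is an arbitrary choice for illustration.

```python
from math import comb


def p_at_least_losses(n_trials, min_losses, p_reward):
    """Probability of at least `min_losses` unrewarded trials in `n_trials`,
    when each choice is rewarded independently with probability `p_reward`."""
    p_loss = 1.0 - p_reward
    return sum(comb(n_trials, k) * p_loss ** k * p_reward ** (n_trials - k)
               for k in range(min_losses, n_trials + 1))


# Deterministic pair: a loss on the still-correct stimulus cannot happen.
print(p_at_least_losses(1, 1, p_reward=1.0))
# 80/20 pair, nothing has changed: at least 1 loss in 5 trials is common.
print(round(p_at_least_losses(5, 1, p_reward=0.8), 3))
# At least 2 losses in 5 trials: still-80% stimulus vs one that switched to 20%.
print(round(p_at_least_losses(5, 2, p_reward=0.8), 3),
      round(p_at_least_losses(5, 2, p_reward=0.2), 3))
```

Under deterministic contingencies a single unrewarded choice of the correct stimulus is impossible, so one error signals a switch. Under 80/20 contingencies at least one loss occurs in about two thirds of 5-trial windows even when nothing has changed, so a subject must keep track of how many losses have accumulated before abandoning a response.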
As noted above, it is thought that the basal ganglia play a critical role in the acquisition of skills, habits, and procedures and in the longer term learning of optimal response choices from feedback.48 The performance of patients with schizophrenia on these types of learning tasks has been examined by a number of investigators, with surprising evidence of normal or near-normal performance on tasks ranging from motor learning using the pursuit rotor and motor sequence learning using serial reaction time tasks49–51 to cognitive skill learning using variants of the Tower of London task,52 to reinforcement learning using tasks such as the Weather Prediction task,53,54 and to S-R learning using the Rutgers Acquired Equivalence task.55 It should be acknowledged that this literature also contains numerous findings of deficits on tasks of supposedly implicit/procedural learning: deficits have been noted in motor learning as well as in a variety of reinforcement learning paradigms,56–58 and there is some evidence that behavioral performance and neurophysiology may be modulated by antipsychotic medications.22,55,59 To require consistently and fully normal behavioral performance among patients with schizophrenia before identifying an area of relatively preserved function may be setting the bar unrealistically high. That is, many of the tasks used to assess slow or habit learning also require the use of other cognitive systems (perception, response generation, working memory, etc.) that are likely to be impaired in a substantial portion of patients. Thus, the reader of this literature is left to decide whether the slow learning glass is half-full or half-empty, as evidence can be marshaled on both sides, and a consistent, principled explanation of the overall pattern of results has yet to emerge.
Even recognizing that the literature has not been unanimous, the fact that patients are sometimes able to perform at, or near, normal levels on this class of tasks is remarkable, in our view, given the profound deficits that patients demonstrate on episodic memory tasks and in using working memory to guide action selection as discussed above.
We have done several experiments looking at different aspects of gradual reinforcement learning, seeking to test the limits of this system and to begin to address the mechanisms that may be implicated. Morris et al60 examined learning-related changes in behavior and brain activity in 26 patients with schizophrenia and 27 controls performing a probabilistic learning task. In this task, subjects saw color photographs of everyday objects on a computer screen and had to decide, based on feedback, whether to make a left- or right-hand button press for each stimulus. Six stimuli were included in each of 4 blocks of 300 trials. In each block, correct responses to 2 of the stimuli (one for the left hand, one for the right) were reinforced on 100% of trials. Correct responses to 2 other stimuli were reinforced on 80% of trials, and incorrect responses to them were reinforced on 20% of trials. Responses to the remaining 2 stimuli were reinforced randomly. Patients did not differ from controls in overall response accuracy for either the 80% or 100% stimuli. They did, however, take longer to meet the criterion used for determining when picture-response pairs had been learned (3 consecutive correct responses for both stimuli at a given probability level), a difference that was significant in the 100% condition. Thus, this study provides evidence of intact gradual learning, coupled with suggestive evidence that this learning may be delayed, a finding we interpret as resulting from an impairment in the rapid learning system. Interestingly, patients also showed a marked reduction in the amplitude of the feedback negativity, an event-related brain potential that occurs following the receipt of negative feedback. This reduction was most marked during the early acquisition phase for the stimuli in the 100% probability condition (ie, the condition in which patients showed impaired learning), suggesting a reduction in the magnitude of error signals in schizophrenia, a potential mechanism implicated in rapid learning.
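The learning criterion above (3 consecutive correct responses for both stimuli at a given probability level) can be sketched as a trials-to-criterion function. This reflects our reading that streaks are counted separately over each stimulus's own presentations; the stimulus names and the trial sequence are hypothetical.

```python
def trials_to_criterion(trials, pair, run_length=3):
    """1-based trial number at which every stimulus in `pair` has been answered
    correctly on `run_length` consecutive presentations, or None if never.

    trials: sequence of (stimulus, correct) in presentation order. Streaks are
    counted over each stimulus's own presentations; other stimuli are skipped.
    """
    streak = {stimulus: 0 for stimulus in pair}
    for trial_number, (stimulus, correct) in enumerate(trials, start=1):
        if stimulus not in streak:
            continue  # a stimulus from another probability level
        streak[stimulus] = streak[stimulus] + 1 if correct else 0
        if all(count >= run_length for count in streak.values()):
            return trial_number
    return None


# Hypothetical trials for a 100% pair ("cup", "key") interleaved with another stimulus.
trials = [("cup", True), ("key", False), ("cup", True), ("lamp", True),
          ("key", True), ("cup", False), ("key", True), ("cup", True),
          ("key", True), ("cup", True), ("cup", True)]
print(trials_to_criterion(trials, ("cup", "key")))
```

An error on either stimulus resets only that stimulus's streak, so a subject can respond accurately overall yet still take longer to reach criterion, the dissociation reported above.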
Our next experiment addressed the question of how this type of gradual learning might be mediated. Note that when learning from positive and negative feedback, it can be equally effective to learn what to do because it is rewarded (choose the right-hand response, it is often rewarded) as to learn what not to do because it is punished (do not choose the left-hand response, it is often punished). To address this issue, Waltz et al61 adopted the behavioral methods and computational framework developed by Frank.62,63 In brief, Frank's computational model includes multiple cortical-striatal-thalamic-cortical loops. Critical for positive reinforcement learning is the direct pathway, where phasic dopamine release primarily stimulates D1 receptors and facilitates learning responses associated with positive reinforcement (termed “Go” learning). In contrast, in the indirect pathway, D2 receptors predominate, and transient decreases in dopamine cell firing serve to enhance the learning of which responses should be inhibited because they are associated with negative outcomes (termed “NoGo” learning).
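Frank's model is a neural network, but its Go/NoGo distinction is often summarized more abstractly with a delta-rule learner that has separate learning rates for positive prediction errors (a stand-in for Go learning) and negative prediction errors (a stand-in for NoGo learning). The sketch below uses that reduced abstraction, not Frank's network; the learning rates and reward schedules are hypothetical.

```python
def update(value, reward, alpha_gain, alpha_loss):
    """One delta-rule update with separate rates for better- and worse-than-expected outcomes."""
    error = reward - value
    rate = alpha_gain if error > 0 else alpha_loss  # "Go" vs "NoGo" stand-ins
    return value + rate * error


def learned_value(schedule, alpha_gain, alpha_loss, v0=0.5):
    """Value of one stimulus after a sequence of outcomes (1 = reward, 0 = none)."""
    value = v0
    for reward in schedule:
        value = update(value, reward, alpha_gain, alpha_loss)
    return value


# Hypothetical deterministic 80% and 20% reward schedules.
good = [1, 1, 1, 1, 0] * 20
bad = [0, 0, 0, 0, 1] * 20

for label, a_gain in (("balanced", 0.1), ("reduced Go", 0.02)):
    print(label,
          round(learned_value(good, a_gain, 0.1), 2),
          round(learned_value(bad, a_gain, 0.1), 2))
```

Lowering only the positive-error learning rate leaves the 20% stimulus valued near the bottom of the scale while pulling the value of the 80% stimulus well below what balanced learning produces, a toy version of reduced Go with intact NoGo learning.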
In order to distinguish between these 2 kinds of learning, Frank63 devised a novel probabilistic stimulus selection task. In this task, subjects are presented with pairs of stimuli and must learn from feedback which stimulus is the optimal choice. The pairs vary in the discrepancy between the reinforcement values of their members. One pair combines an 80% item with a 20% item (ie, choosing the first item is reinforced on 80% of trials and not reinforced on the remaining 20%, and vice versa for the second item), another pair is 70% and 30%, and the third is 60% and 40%. During the acquisition phase of the experiment, subjects perform up to 360 trials to learn the initial pairings. Because subjects must weigh multiple reinforcement values and update them rapidly, this phase imposes a substantial load on the rapid learning system, and these early acquisition trials should therefore be especially challenging for patients. This is precisely what we found in 2 versions of the task using different stimulus materials: patients showed minimal ability to learn the advantageous member of any pair over the first 120 trials. Further, patients made fewer appropriate "win stay" (repeat a reinforced choice) and "lose shift" (try a different response after a loss) responses than did controls, additional evidence of difficulty using working memory representations to guide action selection.
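Win-stay and lose-shift rates can be computed directly from a trial sequence. The sketch below assumes they are scored over successive presentations of the same stimulus pair and that a choice is identified by the stimulus picked; the function name and data layout are hypothetical, not taken from Waltz et al.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay, lose_shift) proportions for one stimulus pair.

    choices: chosen stimulus on each presentation of the pair, in order.
    rewards: matching booleans (True if that choice was reinforced).
    Either proportion is None when there were no wins (or no losses) to score.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    wins = stays = losses = shifts = 0
    # Each trial is scored by what the subject did on the NEXT presentation.
    for prev_choice, prev_reward, next_choice in zip(choices, rewards, choices[1:]):
        if prev_reward:
            wins += 1
            stays += next_choice == prev_choice
        else:
            losses += 1
            shifts += next_choice != prev_choice
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift


if __name__ == "__main__":
    print(win_stay_lose_shift(["A", "A", "B", "B", "A"],
                              [True, False, True, False, True]))
```

The last trial of a sequence has no following choice, so it contributes to neither proportion.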
Frank also introduced a transfer, or generalization, phase that provides additional information. In this phase, the stimulus pairs are recombined: the 80% stimulus is now paired with the 70%, 60%, 40%, and 30% stimuli, and the same recombinations are made with the 20% stimulus. If subjects learned that the 80% stimulus was truly the best, it should be consistently preferred when paired with all the other stimuli, including the 70% and 60% stimuli that were the "winners" in their original pairs. Similarly, if subjects learned that the 20% stimulus was to be avoided, it should be avoided when paired with all other items. As seen in figure 6, patients with schizophrenia learned to avoid the 20% stimulus at the same level as controls but chose the 80% stimulus less frequently than did controls. In terms of the model, patients demonstrated intact NoGo learning but reduced Go learning (ie, reduced learning from positive reinforcement). Note that patients did learn to choose the 80% stimulus over the 20% stimulus at a level similar to controls, but they did not generalize from this experience to the same degree as controls when the 80% stimulus was encountered in novel pairs. We do not claim that these findings of intact NoGo but impaired Go learning provide a seamless, integrated account of all the findings in the "slow" or habit-learning literature in schizophrenia. However, these data and this computational framework open up new questions for investigation and may provide a means of organizing a complex set of findings.
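The logic of the transfer phase reduces to two scores, here called "choose A" and "avoid B." In this sketch the letter mapping (A = 80%, B = 20%, C = 70%, D = 30%, E = 60%, F = 40%) is an assumption, and the 80% versus 20% pair is excluded from both scores because it was already seen during training.

```python
from itertools import combinations

# Assumed labels: reinforcement probability of each training stimulus.
STIMULI = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3, "E": 0.6, "F": 0.4}
TRAINING_PAIRS = {frozenset("AB"), frozenset("CD"), frozenset("EF")}


def novel_pairs():
    """All test pairings of the six stimuli that were not seen in training."""
    return [pair for pair in combinations(sorted(STIMULI), 2)
            if frozenset(pair) not in TRAINING_PAIRS]


def transfer_scores(test_trials):
    """Return (choose_a, avoid_b) from a list of ((left, right), chosen).

    choose_a: proportion of novel A-pairs on which A was chosen (Go learning).
    avoid_b:  proportion of novel B-pairs on which B was NOT chosen (NoGo learning).
    """
    a_choices = [c for pair, c in test_trials if "A" in pair and "B" not in pair]
    b_choices = [c for pair, c in test_trials if "B" in pair and "A" not in pair]
    choose_a = sum(c == "A" for c in a_choices) / len(a_choices) if a_choices else None
    avoid_b = sum(c != "B" for c in b_choices) / len(b_choices) if b_choices else None
    return choose_a, avoid_b


if __name__ == "__main__":
    print(len(novel_pairs()), "novel pairs")
    trials = [(("A", "C"), "A"), (("A", "D"), "D"),
              (("B", "C"), "C"), (("B", "E"), "E"), (("A", "B"), "A")]
    print(transfer_scores(trials))
```

A profile with a normal "avoid B" score but a low "choose A" score is the pattern the text interprets as intact NoGo but reduced Go learning.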
Additional evidence was recently published by Heerey et al38 using an adaptation of the reward sensitivity paradigm developed by Pizzagalli et al.64 In this task, subjects see brief displays containing a simple outline drawing of a face and are asked to decide whether the face has a "short" or a "long" mouth. In fact, the 2 mouths differ only slightly, and the discrimination is difficult for many subjects. The critical manipulation is that choices of the mouths are reinforced asymmetrically: 30 of 40 reinforcers follow correct choices of mouth A, whereas only 10 of 40 follow correct choices of mouth B. Because the discrimination is difficult, subjects are uncertain and must guess on some proportion of trials. The question of interest is whether patients, like healthy subjects, develop a bias toward the more frequently rewarded choice. Pizzagalli et al64 reported that depressed subjects fail to develop this reward-seeking bias and have proposed this task as a method to assess anhedonia, or insensitivity to reward. We observed 2 key findings in patients with schizophrenia. First, they tended to have more difficulty performing the discrimination than controls, an expected finding given evidence for low-level perceptual deficits in schizophrenia.65,66 Second, and most surprisingly, patients showed the same reward-seeking bias as controls. Interestingly, postexperiment debriefing revealed that none of the patients was aware of the differential stimulus-reward contingencies. Thus, this evidence of intact reward sensitivity emerged entirely outside of awareness, suggesting that the performance was accomplished without the use of working memory representations that are available for introspection and report.
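Reward-seeking bias in this kind of signal-detection task is commonly quantified with a response-bias index, log b, alongside a discriminability index, log d. The sketch below assumes the standard formulas from the signal-detection literature (base-10 logarithms, with 0.5 added to every cell to avoid division by zero); it is illustrative and is not taken from Heerey et al's analysis. "Rich" denotes the mouth that receives 30 of the 40 reinforcers (mouth A) and "lean" the mouth that receives 10 (mouth B).

```python
import math


def _cells(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    # Add 0.5 to every cell so that a zero count cannot produce a
    # division by zero or log(0) (a common correction for these indices).
    return tuple(x + 0.5 for x in (rich_correct, rich_incorrect,
                                   lean_correct, lean_incorrect))


def response_bias(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """log b: positive values indicate a bias toward the rich response."""
    rc, ri, lc, li = _cells(rich_correct, rich_incorrect, lean_correct, lean_incorrect)
    return 0.5 * math.log10((rc * li) / (ri * lc))


def discriminability(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """log d: higher values indicate better short/long mouth discrimination."""
    rc, ri, lc, li = _cells(rich_correct, rich_incorrect, lean_correct, lean_incorrect)
    return 0.5 * math.log10((rc * lc) / (ri * li))


if __name__ == "__main__":
    # Hypothetical counts: more "rich" answers on both rich and lean trials.
    print(round(response_bias(45, 5, 35, 15), 3))
    print(round(discriminability(45, 5, 35, 15), 3))
```

Separating the two indices matters here: a subject can discriminate the mouths poorly (low log d) yet still develop a normal bias toward the more frequently rewarded response (positive log b).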
One question that arises immediately is how to reconcile the evidence from Waltz et al61 that Go learning is impaired in schizophrenia with the findings from the Pizzagalli reward sensitivity task, which appears to show a normal reward-seeking bias.38 Similarly, how can our finding that patients are less concerned than controls with potential losses when making decisions be reconciled with evidence that they do learn from negative outcomes over time? These findings may not be contradictory. For example, if the absence of reinforcement in the reward sensitivity paradigm is treated as negative feedback, then NoGo learning alone would produce an apparent preference for the more frequently rewarded stimulus: when uncertain, subjects would avoid guessing the less frequently rewarded stimulus and would therefore tend to guess the more frequently rewarded one. Alternatively, the contrast between the 2 reinforcement rates may be so large that even a compromised Go learning system can discriminate between them. The design of the reward sensitivity task does not provide a means of determining how learning was accomplished, and it remains for future work to address this question more definitively.
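The second possibility can be checked in a toy model. Assume each response's value V moves toward 1 by alpha_go × (1 − V) after a reward and toward 0 by alpha_nogo × V after a non-reward (hypothetical parameters standing in for Go and NoGo learning). Its long-run value is then V* = p·alpha_go / (p·alpha_go + (1 − p)·alpha_nogo), where p is the reward probability for that response, so shrinking alpha_go lowers every value but, as long as alpha_go stays above zero, leaves their ordering intact. The reward probabilities below are illustrative, not the task's actual rates.

```python
def asymptotic_value(p_reward, alpha_go, alpha_nogo):
    """Long-run value of one response in a toy Go/NoGo learner.

    V moves toward 1 by alpha_go * (1 - V) after a reward and toward 0 by
    alpha_nogo * V after a non-reward. Setting the expected change,
    p * alpha_go * (1 - V) - (1 - p) * alpha_nogo * V, to zero gives V*.
    """
    if not 0.0 <= p_reward <= 1.0:
        raise ValueError("p_reward must be a probability")
    gain = p_reward * alpha_go
    loss = (1.0 - p_reward) * alpha_nogo
    if gain + loss == 0.0:
        raise ValueError("no learning occurs with these parameters")
    return gain / (gain + loss)


if __name__ == "__main__":
    # Hypothetical reward probabilities for the rich and lean responses.
    for label, alpha_go in (("intact Go", 0.3), ("reduced Go", 0.02)):
        rich = asymptotic_value(0.75, alpha_go, 0.3)
        lean = asymptotic_value(0.25, alpha_go, 0.3)
        print(f"{label}: rich={rich:.3f} lean={lean:.3f}")
```

In this toy model a weakened Go system still ranks the rich response above the lean one, which is all a guessing bias requires.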
The finding that patients fail to consider the possibility of losses when making decisions, but appear to benefit from negative feedback in the longer term, can be understood as reflecting the independence of 2 systems: a rapid learning system that depends heavily on working memory resources, and a slow learning system that integrates reinforcement signals over longer temporal intervals. The clinical implications of these results are potentially quite interesting: patients may seek rewards but fail to learn adequately from positive outcomes about the relative value of stimuli and responses. Conversely, patients may not encode discrete instances of negative feedback when making decisions but may be able to use nondeclarative systems to learn from repeated punishments. In both instances, there is a mismatch between slow learning performance and decision-making biases, which likely limits adaptive functioning.
Furthermore, we do not mean to imply that the frontal and striatal systems involved in reinforcement learning are entirely functionally segregated. There are clear instances in which they interact; rapid reversal learning is one example.42 As in other areas of cognitive research in schizophrenia, it is extremely difficult to isolate the function of one specific system with a specific task and be confident that the potential impact of other impairments has been controlled. Nonetheless, we argue that circuits in the dorsal striatum and prefrontal cortex make distinct contributions to reinforcement learning and may be differentially disturbed in schizophrenia. Our confidence in this interpretation is strengthened by converging evidence, and several convergent findings from these experiments are summarized in table 1. One is that the "slow" learning system appears to be surprisingly intact: given enough learning trials, patients are able to use feedback to guide response and stimulus selection in many experimental paradigms. It remains for future experiments to determine whether positive and negative feedback are equally effective in guiding such learning.
In contrast, there appears to be a highly reliable impairment in the rapid learning system. These deficits can be detected in the early phases of learning tasks, when patients must shift response selection in the face of changing outcomes, and perhaps more generally when they are confronted by negative feedback. These feedback-driven learning systems are the basic mechanisms that support adaptive behavior. That is, even if evoked emotional experience is surprisingly normal in patients with schizophrenia, the impact of this experience on subsequent behavior depends on the fidelity of learning mechanisms and representational systems. As seen in our studies of decision making, the degree to which patients undervalue future rewards or the possibility of punishment is correlated with working memory performance, suggesting that basic cognitive capacities may form a critical computational substrate for aspects of affective and motivational function.
To a large degree, our motivation to study different aspects of reward processing was based on the idea that this system might be critically involved in the pathophysiology of negative symptoms. For example, if patients were unable to enjoy rewarding experiences, their lack of motivated, goal-directed behavior would be easy to understand. Similar deficits in motivated behavior could arise if patients were unable to learn from rewarding outcomes. Thus, the idea that abnormalities in different aspects of reward processing contribute to this critical clinical feature of the illness has very high "face validity." Here, however, our findings have been consistent and disappointing. We have examined correlations between performance on this collection of experimental tasks and ratings of negative symptoms assessed using subscales of the Scale for the Assessment of Negative Symptoms67 and the Brief Psychiatric Rating Scale Anergia factor.68,69 Results have been uniformly modest. We have occasionally observed correlations in the .3–.4 range but have more often found considerably lower values. In essence, our highest correlations fall in the same effect-size "ballpark" typically observed between negative symptoms and conventional neuropsychological tasks that are not thought to involve reward or emotional processing, and our correlations are often smaller.
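Squaring a correlation gives the proportion of variance two measures share, which puts these values in perspective:

```latex
\[
r = .30 \;\Rightarrow\; r^{2} = .09, \qquad
r = .40 \;\Rightarrow\; r^{2} = .16
\]
```

Even the strongest correlations observed therefore correspond to roughly 9% to 16% of variance shared between task performance and negative-symptom ratings.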
The lack of a relationship is difficult to explain, but several possibilities may be considered. First, clinical ratings of negative symptoms may be too imprecise, reflecting variance from a number of sources: reward system–related variance, variance attributable to positive symptoms and depression, medication side-effect variance, and variance due to environmental deprivation or limited life opportunities. This criterion "contamination" may make it difficult to isolate the relationship with specific aspects of reward processing. This possibility might be explored by using different approaches to quantifying and defining negative symptoms and determining whether the relationship with reward-processing measures becomes stronger.
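The contamination argument can be made concrete with a standard attenuation result. Assume, for illustration, that a negative-symptom rating S is the sum of a reward-related component R and other sources E (positive symptoms, depression, medication effects, deprivation), with E uncorrelated with both R and task performance T:

```latex
\[
\operatorname{Cov}(T,S) = \operatorname{Cov}(T,R), \qquad
\sigma_S^{2} = \sigma_R^{2} + \sigma_E^{2}
\]
\[
\rho_{TS} = \frac{\operatorname{Cov}(T,R)}{\sigma_T\,\sigma_S}
          = \rho_{TR}\,\frac{\sigma_R}{\sqrt{\sigma_R^{2} + \sigma_E^{2}}}
\]
```

For example, if reward-related variance made up half of the rating variance (a hypothetical figure), a true correlation of .50 would be observed as about .50 × √.5 ≈ .35.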
Second, the relationship between reward processing and negative symptoms may be altered in medicated patients (such as those studied here). This could occur if antipsychotic medications induce "secondary" negative symptoms but do not affect experimental task performance, or if medications interfere with experimental task performance but not with clinical symptom ratings. That is, for medication effects to obscure a real relationship, they would have to affect one side of the equation but not the other. If symptoms and task performance were similarly affected by medication, the essential relationship would be preserved: the intercept might change, but the slope of the line relating these 2 constructs would not. Our available data cannot address whether this complex scenario underlies the failure to document robust relationships between aspects of reward processing and negative symptoms in the experiments discussed above. Note, however, that the motivational problems of schizophrenia cannot plausibly be attributed largely or entirely to antipsychotic medications: these types of deficits were well described in the preneuroleptic era.
We acknowledge that dopamine-blocking medications are a potentially critical confound for the study of reward processing, and one that is not easily addressed. For example, there would be clear value in studying medication-free first-episode patients. Note, however, that such studies are typically done at initial clinical presentation, at a time of extreme stress and severe psychosis, states associated with acute and likely transient changes in dopaminergic transmission.70,71 Whether studies conducted in such an extreme state generalize to more trait-like features of the illness is unknown. The study of chronically medicated patients has nearly opposite limitations: clinical stability and "trait-like" deficits are most likely to be in evidence, but the observed deficits may reflect a mix of iatrogenic effects and trait features of the illness. In our view, both types of studies offer useful but different information, and only by combining results across these types of samples, perhaps complemented by studies of unaffected family members, will it be possible to determine the extent to which reward abnormalities are core features of the illness, are related to clinical state, or are medication-related side effects.
Third, we may have yet to investigate the dimension of reward processing that is powerfully related to clinically observed negative symptoms. Recent basic behavioral neuroscience research from Rushworth's laboratory points to an intriguing possibility. This work suggests that animals make an additional, critical value–related decision: is the pursuit of a particular reward worth the effort required to obtain it?72–74 Thus, even if animals can discriminate the difference in hedonic value between 2 choices, they still need to decide whether the higher "expected value" is worth the cost of the work needed to obtain it. There is suggestive evidence that such effort-based decision making is mediated in part by the cingulate cortex. This work may be particularly relevant to schizophrenia given other evidence of structural and functional abnormalities in the cingulate, and it may provide a means to reconcile largely intact hedonic experience with deficient sustained goal/reward–directed behavior.75–79 Could it be that patients like what they like but, because of cingulate dysfunction, fail to pursue goals because the response cost appears prohibitively high? This interesting question remains to be investigated.
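The cost-benefit idea can be illustrated with a linear effort discount. The functional form, the option names, and the numbers are all hypothetical; the point is only that identical hedonic values can yield different choices once the subjective cost of effort changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    reward: float  # hedonic value of the outcome (arbitrary units)
    effort: float  # work required to obtain it (arbitrary units)


def net_value(option: Option, effort_cost: float) -> float:
    """Toy assumption: effort discounts value linearly."""
    return option.reward - effort_cost * option.effort


def choose(options, effort_cost: float) -> Option:
    """Pick the option with the highest effort-discounted value."""
    return max(options, key=lambda o: net_value(o, effort_cost))


OPTIONS = [
    Option("work for large reward", reward=10.0, effort=6.0),
    Option("rest for small reward", reward=4.0, effort=1.0),
]

if __name__ == "__main__":
    # Same hedonic values in both cases; only the perceived cost of effort differs.
    for effort_cost in (0.5, 2.0):
        print(effort_cost, choose(OPTIONS, effort_cost).name)
```

With a low effort cost the large reward wins; with an inflated effort cost the same options favor the low-effort alternative, even though the rewards themselves are valued identically.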
This emphasis on the "effort" involved in seeking rewards converges with the recent proposal by Kring and colleagues28,35 that patients with schizophrenia have impaired anticipatory pleasure, resulting in reduced behavioral activation by goals, whereas consummatory pleasure may be relatively intact, as suggested above in the discussion of hedonics. The results of Heerey and Gold could potentially be interpreted in this light. Either a deficit in anticipatory pleasure or an exaggerated estimate of response cost would likely lead to the same outcome: limited pursuit of valued goals and enjoyable experiences.
This set of converging behavioral experiments suggests that patients with schizophrenia show islands of preserved and impaired reward processing. Hedonic experience appears to be intact, suggesting that the conventional clinical understanding of anhedonia as a negative symptom needs to be reconsidered and more carefully defined. Similarly, there is suggestive evidence that patients are able to integrate feedback over extended learning, a form of learning thought to be mediated by the basal ganglia. Impairments are reliably observed when patients must use feedback on a trial-by-trial basis to guide response selection, or when multiple representations of the value of response options must be weighed to guide decision making. This impairment in the ability to represent value appears to be related to more general working memory abilities and implicates dorsal and orbital frontal cortex as major loci of reward-processing abnormalities in schizophrenia.
National Institute of Mental Health (grant # R24 MH72647 to J.M.G.); National Institutes of Health (grants P30 MH068580 and K12 RR023250 to J.A.W. and T32MH067533 to J.A.W. and E.A.H.); a VA Rehabilitation and Development Service (grant D3540V to S.E.M.).
This work was informed by the intellectual contributions of Paul Shepard, Greg Elmer, Elliot Stein, Julie Schweitzer, Robert Buchanan, Robert Conley, and Clay Holroyd who work on different aspects of the grant R24 MH72647. We gratefully acknowledge the contributions of Kimberly Bell-Warren, Sharon August, Pablo Diego, Mary Beth Ramsey, Rebecca Wilbur, Ben Robinson, and Sara Mitchell to the conduct of the experiments. Most importantly, we are grateful to our volunteers who suffer from schizophrenia and give so generously of their time and effort.