While dopamine systems have been implicated in the pathophysiology of schizophrenia and psychosis for many years, how dopamine dysfunction generates psychotic symptoms remains unknown. Recent theoretical interest has been directed at relating the known role of midbrain dopamine neurons in reinforcement learning, motivational salience and prediction error to the abnormal mental experience of psychosis. However, this theoretical model has yet to be explored empirically. To examine a link between psychotic experience, reward learning and dysfunction of the dopaminergic midbrain and associated target regions, we asked a group of first-episode psychosis patients suffering from active positive symptoms and a group of healthy control participants to perform an instrumental reward conditioning experiment. We characterized neural responses using functional magnetic resonance imaging. We observed that patients with psychosis exhibit abnormal physiological responses associated with reward prediction error in the dopaminergic midbrain, striatum and limbic system, and we demonstrated subtle abnormalities in the ability of psychosis patients to discriminate between motivationally salient and neutral stimuli. This study provides the first evidence linking abnormal mesolimbic activity, reward learning and psychosis.
Why does a biochemical disturbance in brain dopamine systems lead to delusional ideas and other phenomena of psychosis? Psychotic symptoms are thought to be caused by disturbance in the function of the mesolimbic dopamine system:1,2 it is established that administration of dopaminergic drugs can cause psychosis in healthy individuals,3,4 that patients with schizophrenia show abnormal striatal dopaminergic responses to amphetamine challenge,5,6 and that dopamine D2 receptor blockade is critical in reducing psychotic experiences such as delusions and hallucinations.7 Yet there remains an explanatory gap between what we understand about the neurobiology of psychosis and what we understand about its subjective experience.
There have been attempts to bridge this gap,8-11 although until recently the normal function of the mesolimbic dopamine system may have been insufficiently understood to explain the psychological consequences of its dysfunction. However, recent evidence has demonstrated that dopamine neurons that extend from the tegmental midbrain to the ventral striatum code reward prediction error and thus serve as an important ‘teaching signal’ by which animals can learn about stimulus-outcome associations.12,13 Further evidence indicates that subcortical dopamine contributes causally to the attribution of incentive salience, the process by which a stimulus grabs attention and motivates goal-directed behaviour because of associations with reward or punishment.14-17 Given that theories of delusion formation emphasize the emergence of abnormal associations as the progenitors of irrational beliefs,18 this work has provided a new theoretical framework within which to consider the neurobiology of psychosis. It has been proposed that dysregulated midbrain dopamine neuron firing could result in an individual maladaptively attributing importance to innocuous stimuli or events, that is, experiencing abnormal referential ideas.10,11,19,20 At present, this conceptualization of psychosis remains largely theoretical, yet it implies a number of predictions that can be tested empirically. In particular, it predicts that patients with psychosis would show impaired ability to distinguish, both in terms of their neurophysiological responses in the midbrain and ventral striatum and in their overt behaviour, between stimuli that are high and low in motivational salience.
To determine whether psychotic experiences occur in the context of dysfunction of the dopaminergic midbrain, and to establish a link between psychotic experiences, the mesolimbic system and reward processing, we asked a group of patients experiencing active psychotic symptoms and a group of healthy control participants to perform an instrumental reward conditioning experiment (similar to O’Doherty et al.21). We characterized mesolimbic responses using fMRI; we applied a standard action-value learning computational model to subjects’ behavioural choices22 and used the ensuing values of reward prediction errors over the course of the experiment as individual-specific regressors in the image analysis.23 In doing so, we were able to establish the relationship between reward prediction error and mesolimbic activity in healthy and psychotic individuals. We predicted that behavioural data would demonstrate impaired ability of psychosis patients to discriminate between rewarding and neutral stimuli, and that their midbrain and ventral striatal physiological responses associated with reward prediction errors would be correspondingly disturbed.
The study was approved by the local research ethics committee. Thirteen individuals (nine men) with current positive psychotic symptoms were recruited from the Cambridge first-episode psychosis service, CAMEO. Study inclusion criteria were (1) age between 17 and 35 years and (2) current psychotic symptoms as reflected by the presence of delusions or hallucinations. Twelve healthy volunteers (nine men) were recruited as control subjects, matched in age, gender, handedness and estimated premorbid IQ as measured using the National Adult Reading Test.24 After complete description of the study to the participants, written informed consent was obtained. A telephone screening interview followed by an interview in person ascertained that control subjects were without a history of psychiatric illness, physical illness, head injury, or drug or alcohol dependence. Both patient and control subjects were without contraindications for fMRI scanning. Five of the 13 patients were not taking antipsychotic medication; the other 8 were taking atypical antipsychotic medication (of these 8, the median duration of treatment was 2 months, and the mean chlorpromazine equivalent dose was 181±70 mg/day25). The mean ages were 26 years (s.d. 3 years) for both groups; mean NART scores were 116 (5) for controls, 113 (11) for patients. Twelve months following data collection a psychiatrist (GM) assigned DSM-IV diagnoses to patients using all available clinical information, including case-note review and structured clinical interview for DSM-IV: one patient met criteria for bipolar disorder, one for psychosis not otherwise specified and the other eleven for schizophrenia.
Patients had predominantly positive symptoms compared to negative symptoms at the time of scanning; the mean score of Brief Psychiatric Rating Scale (BPRS)26 hallucinations, unusual thought content and suspiciousness was 3.9 (moderate severity), while the mean score of BPRS self-neglect, blunted affect and emotional withdrawal was 1.9 (very mild severity).
Subjects performed an instrumental learning task involving monetary gains that required choosing between two visual stimuli displayed on a computer screen, so as to maximize payoffs (see Figure 1, Supplementary Figure 1 and Participant Instructions in Supplementary material). On each trial, the participant chose one of the two stimuli on the screen, and feedback was either provided or not in a probabilistic manner. The 160 trials were divided into two trial types, randomly interspersed: reward and neutral, each involving a different pair of stimuli. The reward stimulus pair was potentially associated with rewarding feedback (20 pence or no feedback), whereas the neutral stimulus pair was associated with no financial outcomes (there would either be feedback of a neutral image about the same size as a 20 pence coin or no feedback). The feedback was probabilistic: each trial type had a high probability stimulus (which gave feedback on 60% of occasions) and a low probability stimulus (feedback on 30% of occasions). Therefore, to win money participants had to learn, by trial and error, to select the stimulus that was more likely to produce a reward (see participant instructions, Supplementary material). Participants were not explicitly informed that one pair of stimuli signalled the potential for a reward, and that the other signalled the potential for neutral feedback; rather they learnt this over the course of the experiment. Participants were also unaware of the fact that on any given trial, the probability of their receiving feedback if they chose the high probability stimulus (60%) was independent of the probability of their receiving feedback if they chose the low probability stimulus (30%). Stimuli were variously coloured blocks; the relationship of a given block to feedback was counterbalanced across subjects. Stimulus selection was by button press (left or right). 
Participants were informed that any money they won in the experiment would be paid to them in cash at the end of the experiment.
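The task structure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' stimulus-presentation code: the 80/80 split between reward and neutral trials, the function names and the use of a seeded random generator are all hypothetical; only the 160 trials, the 60%/30% feedback probabilities, their independence and the 20 pence reward come from the text.

```python
import random

# Values taken from the task description.
P_HIGH, P_LOW = 0.6, 0.3   # feedback probabilities for the two stimuli in a pair
N_TRIALS = 160
REWARD_PENCE = 20

def make_trials(n_trials=N_TRIALS, seed=0):
    """Return randomly interspersed trial types.

    Assumption: an equal split between 'reward' and 'neutral' trials;
    the text does not state the exact proportion.
    """
    rng = random.Random(seed)
    trials = ["reward"] * (n_trials // 2) + ["neutral"] * (n_trials // 2)
    rng.shuffle(trials)
    return trials

def draw_feedback(rng):
    """Draw feedback independently for the high- and low-probability stimulus."""
    return {"high": rng.random() < P_HIGH, "low": rng.random() < P_LOW}

def run_task(choose, seed=0):
    """Simulate one session; `choose(trial_type)` must return 'high' or 'low'."""
    rng = random.Random(seed + 1)
    log = []
    for trial_type in make_trials(seed=seed):
        choice = choose(trial_type)
        feedback = draw_feedback(rng)[choice]
        # Only feedback on reward trials carries money; neutral feedback is worth 0.
        pence = REWARD_PENCE if (feedback and trial_type == "reward") else 0
        log.append((trial_type, choice, feedback, pence))
    return log
```

For example, `run_task(lambda trial_type: "high")` simulates a participant who always picks the high-probability stimulus, which gives an upper bound on expected winnings under these assumptions.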
A mixed model analysis of variance (ANOVA) was used to assess effects of Valence (Reward or Neutral), and Diagnosis (Psychosis or Control) on the proportion of high-probability stimuli selected (after arcsine transformations to enable parametric analysis). Previous studies have indicated that, on trials where there is a potential for reward, reaction times are faster than in trials where there will be no reward,21,23,27 reflecting increased motivation to obtain rewards. We therefore performed a further ANOVA, this time using mean reaction time as the dependent variable.
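As a brief illustration of the transformation step, the sketch below applies the common variance-stabilising form arcsin(√p) to a proportion. The text says only that arcsine transformations were used, so the exact variant is an assumption, as is the function name.

```python
import math

def arcsine_transform(p):
    """Variance-stabilising transform for a proportion p in [0, 1].

    Assumption: the arcsin(sqrt(p)) form; the paper does not specify the variant.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))
```

The transformed values range from 0 to π/2 and would then be entered into the ANOVA in place of the raw proportions.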
A psychiatrist (GM) interviewed participants directly following the scanning session, and rated psychopathology on the Brief Psychiatric Rating Scale.26 To approximate the value placed on the reward by participants, we asked participants to rate the amount of money they earned on a scale of 1-5 as an amount in relation to the amount of time spent, and on a separate scale as an absolute amount (also 1-5). These scores were then summed to create an overall value measure. In addition, we asked, using a visual analogue scale: ‘if you see 20 pence lying on the street, how likely are you to pick it up?’
We fitted a standard reinforcement learning algorithm to each subject’s sequence of choices. We used a basic Q learning algorithm, which has been shown previously to offer a good account of instrumental choice in both humans and primates.22 For each pair of stimuli A and B, the model estimates the expected values of choosing A (Qa) and choosing B (Qb), on the basis of individual sequences of choices and outcomes. This value, termed a Q value, is essentially the expected reward obtained by taking that particular action. These Q values were set at zero before learning, and after every trial t > 0 the value of the chosen stimulus (say A) was updated according to the rule
$$Q_a(t+1) = Q_a(t) + \alpha\,\delta(t).$$
The prediction error was
$$\delta(t) = R(t) - Q_a(t),$$
where R(t) is defined as the reinforcement obtained as an outcome of choosing A at trial t. In other words, the prediction error δ(t) is the difference between the actual outcome (that is, R(t)) and the expected outcome (that is, Qa(t)). The reinforcement magnitude R was +1 for feedback and 0 for ‘nothing’ outcomes. Given the Q values, the associated probability of selecting each action was estimated by implementing the softmax rule, for example, for choosing A,
$$P_a(t) = \frac{\exp\left(Q_a(t)/\beta\right)}{\exp\left(Q_a(t)/\beta\right) + \exp\left(Q_b(t)/\beta\right)}.$$
This is a standard stochastic decision rule that calculates the probability of taking one of a set of actions according to their associated values. The constants α (learning rate) and β (temperature) were adjusted to maximize the probability (or likelihood) of the actual choices under the model. To compare the accuracy of fit between diagnoses and conditions, we used negative log likelihood, which can be summed across trials, sessions and subjects. The learning model was fitted with a single set of parameters across all subjects in both groups, since for our imaging analysis we test the null hypothesis that there is no difference between groups.23 It was then used to create a statistical regressor corresponding to the modelled outcome prediction error in the imaging data. For additional (purely behavioural) analysis, we estimated the model parameters α and β for each individual participant, and tested whether these differed across groups.
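The model-fitting procedure just described can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a delta-rule update with learning rate α, a two-option softmax with temperature β, and a simple grid search over candidate parameter values (the text does not say which optimiser was used). Function names and the choice labels 'A' and 'B' are hypothetical.

```python
import math

def negative_log_likelihood(choices, rewards, alpha, beta):
    """Negative log-likelihood of a choice sequence for one stimulus pair.

    choices: sequence of 'A' or 'B'; rewards: 1 for feedback, 0 for nothing.
    Returns (nll, deltas), where deltas are the trial-wise prediction errors.
    """
    q = {"A": 0.0, "B": 0.0}          # Q values start at zero before learning
    nll, deltas = 0.0, []
    for choice, r in zip(choices, rewards):
        other = "B" if choice == "A" else "A"
        # Two-option softmax, written as a logistic of the Q-value difference.
        p_choice = 1.0 / (1.0 + math.exp((q[other] - q[choice]) / beta))
        nll -= math.log(p_choice)
        delta = r - q[choice]          # prediction error: actual minus expected
        q[choice] += alpha * delta     # update only the chosen stimulus
        deltas.append(delta)
    return nll, deltas

def fit_grid(choices, rewards, alphas, betas):
    """Grid search (an assumed optimiser) for the best-fitting (alpha, beta)."""
    best = None
    for a in alphas:
        for b in betas:
            nll, _ = negative_log_likelihood(choices, rewards, a, b)
            if best is None or nll < best[0]:
                best = (nll, a, b)
    return best  # (nll, alpha, beta)
```

Because negative log-likelihoods add, the values for the reward pair and the neutral pair, and for different subjects, can simply be summed when a single parameter set is fitted across the whole sample.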
A Bruker MedSpec 30/100 (Ettlingen, Germany) operating at 3T was used to collect imaging data. Gradient-echo, T2*-weighted echo planar images depicting BOLD contrast were acquired from 21 non-contiguous near axial planes: TR = 1.1 s, TE = 27.5 ms, flip angle = 66°, in-plane resolution = 3.1 × 3.1 mm, matrix size 64 × 64, field of view 20 × 20 cm, bandwidth 100 kHz. A total of 750 volumes per subject were acquired (21 slices each of 4 mm thickness, interslice gap 1 mm). The first six volumes were discarded to allow for T1 equilibration effects. fMRI data were analysed using statistical parametric mapping in the SPM2 programme (Wellcome Department of Cognitive Neurology, London, UK). Images were realigned, spatially normalized to a standard template and spatially smoothed with a Gaussian kernel (6 mm at full-width half-maximum). The time series in each session were high-pass filtered (to a maximum of 1/120 Hz) and serial autocorrelations were estimated using an AR(1) model.
We used a single statistical linear regression model for all our analyses as follows. Each trial was modelled as a delta function set at the time of the feedback display. Separate regressors were created for reward and neutral trials. Prediction errors generated by the Q learning model were then used as parametric modulators of these regressors. All regressors of interest were convolved with a canonical haemodynamic response function with a temporal derivative.28 Linear contrasts of regression coefficients were computed at the individual subject level and then taken to a group level random effects analysis of variance. We carried out contrasts of reward versus neutral prediction error in all subjects together, within each group separately, and between groups.
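To make the parametric modulation step concrete, the sketch below builds one prediction-error regressor in plain Python. It is an illustration under several assumptions rather than a reproduction of the SPM2 pipeline: the haemodynamic response is approximated by a commonly used double-gamma shape (peak near 5 s, undershoot scaled by 1/6), the modulator is mean-centred, feedback onsets are snapped to the nearest scan, and the temporal derivative is omitted. All function names are hypothetical.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(t) - t - math.lgamma(shape))

def double_gamma_hrf(t):
    """Assumed double-gamma approximation to a canonical HRF (t in seconds)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def modulated_regressor(onsets, deltas, n_scans, tr=1.1, hrf_length=32.0):
    """Prediction-error-weighted stick function convolved with the HRF.

    onsets: feedback times in seconds; deltas: model prediction errors.
    """
    mean = sum(deltas) / len(deltas)
    centred = [d - mean for d in deltas]      # mean-centre the modulator
    sticks = [0.0] * n_scans
    for onset, weight in zip(onsets, centred):
        idx = int(round(onset / tr))          # snap onset to the nearest scan
        if 0 <= idx < n_scans:
            sticks[idx] += weight
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_length / tr) + 1)]
    out = [0.0] * n_scans
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for k, h in enumerate(kernel):        # discrete convolution
            if i + k < n_scans:
                out[i + k] += s * h
    return out
```

One such regressor would be built for reward trials and one for neutral trials, so that their coefficients can be contrasted within and between groups.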
We performed these analyses in an a priori hypothesized region of interest, and in the whole brain. Significance level for activation was set at a false discovery rate (FDR) of P < 0.05.29 For the a priori region of interest, activations were considered significant at P < 0.05 corrected using appropriate small volume corrections for the location of predicted peaks. The region of interest comprised the union of a midbrain and ventral striatal region (see Figure 3D). The midbrain region was a sphere of radius 15 mm centred at MNI coordinates 0, -15, -9 [x, y, z], and encompassed the entire midbrain, including substantia nigra, ventral tegmental area (VTA) and other structures.30 The ventral striatal region was hand drawn in MRIcro31 following the definition of ventral striatum by Laruelle et al.32 For the whole brain analyses, in addition to the FDR threshold of P < 0.05, we stipulated a further threshold of cluster size greater than 100 voxels. We have also reported results at lower thresholds in Supplementary Tables.
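The spherical midbrain region can be expressed as a simple membership test on MNI coordinates. This sketch covers only the sphere (centre 0, -15, -9 and radius 15 mm, as given in the text); the hand-drawn ventral striatal region cannot be reproduced from the description, and the function name is hypothetical.

```python
import math

MIDBRAIN_CENTRE_MM = (0.0, -15.0, -9.0)   # MNI [x, y, z], from the text
MIDBRAIN_RADIUS_MM = 15.0

def in_midbrain_sphere(xyz_mm, centre=MIDBRAIN_CENTRE_MM, radius=MIDBRAIN_RADIUS_MM):
    """Return True if an MNI coordinate (in mm) lies inside the spherical ROI."""
    return math.dist(xyz_mm, centre) <= radius
```

For instance, a coordinate such as (-8, -20, -6) lies about 9.9 mm from the centre and therefore falls inside the sphere.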
The ANOVA of behavioural choice showed a significant main effect of Valence: subjects chose the high probability stimulus more frequently on reward trials than neutral trials (F(1,23) = 22.2, P < 0.001, see Figure 2a). While controls chose the high probability stimulus on reward trials more frequently than patients, this difference was not significant: there was no significant main effect of Diagnosis (F(1,23) = 1.04, P = 0.3) or Diagnosis by Valence interaction (F(1,23) = 1.6, P = 0.22). The ANOVA of response latency also confirmed a significant effect of Valence (F(1,23) = 41, P < 0.001) with faster reaction times on reward trials than on neutral trials (see Figure 2b). In addition, there was a significant Diagnosis by Valence interaction (F(1,23) = 7.1, P = 0.014), as the difference between reward and neutral trials was smaller in patients than in controls (t(23) = 2.6, P = 0.014), and the patients were significantly faster than controls on the neutral trials (t(23) = 3.3, P = 0.003). Response latencies stratified by high/low probability stimulus choice for each group are presented in Supplementary Figures 2 (reward trials) and 3 (neutral trials).
Patients and controls did not differ on financial ratings (P = 0.32 on visual analogue rating of likelihood of picking up a 20 pence coin in the street, P = 0.11 on experiment earning rating). When the computational model constants α (learning rate) and β (temperature) were adjusted to maximize the probability (or likelihood) of the actual choices under the model, we found α = 0.04, and β = 0.2 (see Supplementary Figure 4). There was no significant difference between patients and controls in goodness of fit of the computational model to behavioural choices (t(23) = 1.4, P = 0.17).
In additional analysis of behavioural data, we estimated individual α and β parameters for each participant (Supplementary Figures 5, 6); these did not differ significantly across groups (α: Mann-Whitney U = 77, P = 0.96; β: Mann-Whitney U = 54, P = 0.15).
When both groups were analysed together, reward prediction error was associated with increased activity, compared to neutral prediction error, in the ventral striatum on whole brain analysis (P < 0.000001 uncorrected) and in the ventral striatum and midbrain on region of interest analysis (P < 0.05 FDR-corrected). See Table 1, Figure 3 and Supplementary Table 1.
In the control subjects, reward prediction error was associated with activity in the midbrain, approximately localized to ventral tegmental and substantia nigra areas of dopamine neuron origin, in addition to several target regions of dopamine neuron output: the striatum, cingulate and temporal cortex (see Table 2, Supplementary Figure 7).
In the psychosis patient group, no reward prediction error activations survived correction for multiple comparison. However, at a reduced threshold (P < 0.005, uncorrected), we observed a small cluster of 12 voxels in the ventral striatum and 11 voxels in the anterior cingulate cortex that were active in the patient group for the contrast of prediction error: reward versus neutral (see Supplementary Table 2).
There were significant differences between cases and controls in bilateral midbrain and right ventral striatum (Z = 2.76 at 22, 20, -10 [x, y, z]) on region of interest analysis (Figures 4a and b). The differing midbrain activations between the two groups were driven by a combination of attenuated response to reward prediction error in psychosis together with an augmented response to neutral prediction error in psychosis (see Figure 4c, and Supplementary Figure 9). In addition, on whole brain analysis there were case-control differences in bilateral midbrain and a number of limbic regions including hippocampus, insula and cingulate cortex in addition to putamen and ventral pallidum (P < 0.05, FDR-corrected; see Table 3, Figure 4d and Supplementary Figure 8). The statistics we present are from two-tailed tests (that is, greater activity in patients compared to controls or controls compared to patients), but we note there were no regions with greater activation for these contrasts in psychosis.
To exclude the possibility that the differences between patients and controls were secondary to medication effects, we repeated the case-control comparison with the medicated patients excluded. There were still significant differences between cases and controls in bilateral midbrain on region of interest analysis, even after adjustment for multiple comparisons (Z = 4.64 at -8, -20, -6 [x, y, z]; Z = 3.37 at 12, -22, -4 [x, y, z]). In the patients who were taking medication, there was no relationship between brain reward prediction errors and medication dose (chlorpromazine equivalents), either in the whole brain analysis or region of interest at the relaxed threshold of P = 0.1 (FDR-corrected).
Having established that midbrain group differences were not secondary to medication, we went on to test whether the group differences were solely driven by unmedicated patients by comparing controls against patients taking antipsychotics. On whole brain analysis, there were still significant bilateral midbrain differences, robust to correction for multiple comparisons, in addition to differences in various limbic regions (see Supplementary Table 3).
Having established differences in midbrain activation between groups, we went on to examine whether, within patients, the fMRI midbrain parameter estimates correlated with the level of psychotic symptoms. There was no significant correlation (r = -0.23, P = 0.5).
Our findings demonstrate abnormal responses to reward prediction error in the midbrain and key target regions (striatum, hippocampus, cingulate, insula) in patients with psychosis. They provide direct empirical support for a model of psychosis that invokes abnormal dopamine-dependent motivational salience as a key underlying disturbance. While patients successfully learnt the required contingencies, suggesting that their abnormal brain responses were not secondary to impaired task performance, these disrupted neural responses were accompanied by significant behavioural differences, notably, a tendency to show rapid reaction times even to stimuli that predicted neutral feedback. Previous reinforcement learning experiments using paradigms similar to ours have reported faster reaction times in response to rewarding stimuli than neutral stimuli: this phenomenon has been termed ‘reinforcement related speeding’.21,23,27 Such reinforcement related speeding is attributed to the anticipation of a potential reward on such trials leading to enhanced motivation and hence faster responding. In our study, both patients and controls were significantly faster on reward trials than neutral trials, in accordance with previous data, but the difference between latencies on reward and neutral trials was attenuated in patients. Patients were significantly faster than controls on neutral trials, consistent with the theory that they found such trials inappropriately motivationally significant. It is not unprecedented that psychosis patients perform rapidly on cognitive tests: it has previously been shown that deluded patients are faster than controls when making decisions during probabilistic reasoning tasks.33
Our results suggest that, at the behavioural level, psychotic patients are failing to make the distinction between events that are motivationally salient (that is, in this case, signalling a potential for reward) and those that are not. This maladaptive behaviour is consistent with their abnormal midbrain activations. Here, patients failed to show the normal differential activity related to reward and neutral prediction error. In controls, the distinction was reflected in the responses of a number of regions (midbrain, striatum, cingulate, insula) that have been previously implicated in reward processing in both human30,34,35 and animal studies.13 Furthermore, reward processing and reward prediction error are mediated by dopamine in both humans23,36,37 and animals.38 We suggest that the midbrain activation in controls, and its aberration in individuals with psychosis, are related to dopamine activity, though we acknowledge that this experimental design only provides indirect evidence in this regard.
While the results from the neuroimaging analysis show very striking differences between groups, the behavioural differences were more subtle; this may reflect the increased sensitivity of functional MRI compared with behavioural analysis. Controls did choose the high probability stimulus more often than patients, although this difference was not statistically significant. Perhaps, on a more difficult reward learning test, there would have been more pronounced differences between groups in choice behaviour; this possibility requires empirical investigation in future studies.
Some of the patients were taking atypical antipsychotic dopamine receptor antagonist medication. However, there are several reasons why the group differences we observed are unlikely to be secondary to medication: the midbrain VTA/substantia nigra group differences remained significant when the analysis was restricted to unmedicated patients; our analysis did not reveal any effect of medication on brain activity in patients taking antipsychotics; and a previous study by Juckel and colleagues39 provided evidence that atypical antipsychotics, rather than inducing abnormal brain responses, in fact normalize physiological responses to reward expectation in schizophrenia.
Although several previous authors have hypothesized that dysfunctional dopamine-mediated reinforcement processing is implicated in the pathology of psychotic illnesses,10,11,19,40-43 few empirical studies have addressed the issue. To our knowledge, this is the first study to examine brain reward prediction error in any psychiatric or neurological disorder. In a reward anticipation task that robustly elicits ventral striatal signal change, patients with schizophrenia displayed abnormal ventral striatal activation compared with controls, though that study did not examine learning or prediction error.44 Previous behavioural studies have demonstrated disturbances in the classic dopamine-dependent associative learning processes of Kamin blocking and latent inhibition in early psychosis.45 More recent evidence for a model of disrupted error-dependent learning in psychosis comes from Corlett and colleagues,46 who showed that right prefrontal prediction error signal during causal learning predicts subsequent vulnerability to the psychotogenic effects of ketamine in healthy volunteers. Our study provides subtle behavioural and more prominent physiological evidence of reinforcement learning abnormality in psychosis, a psychological process that, it is theorised, is important in both the positive and negative symptoms of schizophrenia and other psychotic disorders.
Graham Murray was supported by a Department of Health Research Capacity Development Award. Paul Fletcher is supported by the Wellcome Trust. The work was completed within the University of Cambridge Behavioural and Clinical Neuroscience Institute, supported by a joint award from the Wellcome Trust and Medical Research Council. CAMEO received pump priming funding from the Stanley Medical Research Institute and GlaxoSmithKline, and now receives support from the UK National Health Service. We are grateful to staff from CAMEO and the Wolfson Brain Imaging Centre for their help with recruitment and data collection, and to the participants.
Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)