Prediction about outcomes constitutes a basic mechanism underlying informed economic decision making. A stimulus constitutes a reward predictor when it provides more information about the reward than the environmental background. Reward prediction can be manipulated in two ways, by varying the reward paired with the stimulus, as done traditionally in neurophysiological studies, and by varying the background reward while holding stimulus-reward pairing constant. Neuronal mechanisms involved in reward prediction should also be sensitive to changes in background reward independently of stimulus-reward pairing. We tested this assumption on a major brain structure involved in reward processing, the central and basolateral amygdala. In a 2 × 2 design, we examined the influence of rewarded and unrewarded backgrounds on neuronal responses to rewarded and unrewarded visual stimuli. Indeed, responses to the unchanged rewarded stimulus depended crucially on background reward in a population of amygdala neurons. Elevating background reward to the level of the rewarded stimulus extinguished these responses, and lowering background reward again reinstated the responses without changes in stimulus-reward pairing. None of these neurons responded specifically to an inhibitory stimulus predicting less reward compared with background (negative contingency). A smaller group of amygdala neurons maintained stimulus responses irrespective of background reward, possibly reflecting stimulus-reward pairing or visual sensory processes without reward prediction. Thus in being sensitive to background reward, the responses of a population of amygdala neurons to phasic stimuli appeared to follow the full criteria for excitatory reward prediction (positive contingency) rather than reflecting simple stimulus-reward pairing (contiguity).
A stimulus that conveys specific information about a reward is conventionally called a reward predictor. When there is more reward with the stimulus compared with the reward occurring irrespective of the stimulus (background), the stimulus elicits behavioral reactions and constitutes an excitatory predictor. When there is less reward with the stimulus than the background, the stimulus reduces approach behavior and constitutes an inhibitory predictor. Thus the notion of reward prediction concerns the amount of information conveyed by the stimulus that is not available without the stimulus. This information reflects the dependency, or contingency, of the reward on the stimulus (Egger and Miller 1962; Gallistel 2003). Thus it is not just the stimulus reward but the relationship between stimulus and background reward that determines the prediction.
Given that reward prediction is based on the relationship between stimulus and background reward, there are two experimental ways to vary reward prediction. The standard way is to change the reward paired with the stimulus, but one can also vary the background reward while keeping stimulus-reward pairing constant. The relationship, and hence the predictive information, changes with either variation. However, these tests allow one important distinction. Whereas variation of stimulus reward influences both stimulus-reward pairing and prediction, variation of background reward changes reward prediction without modifying stimulus-reward pairing. For example, a stimulus becomes uninformative and loses its prediction when there is as much reward during the background as with the stimulus despite unchanged stimulus-reward pairing. This is the “truly random” control procedure that relates Pavlovian reinforcer prediction to stimulus-reinforcer contingency rather than stimulus-reinforcer pairing (Rescorla 1967).
According to lesion and psychopharmacological studies, the amygdala is involved in Pavlovian reward conditioning (Baxter et al. 2000; Everitt et al. 1991, 2003; Gaffan et al. 1993; Han et al. 1997; Hatfield et al. 1996; Malkova et al. 1997; Setlow et al. 2002). Although these studies tested stimulus-reward pairings, amygdala-dependent conditioning is also sensitive to background reward (Ostlund and Balleine 2008). Single amygdala neurons in monkeys and rats respond to Pavlovian-conditioned stimuli after pairing with reward (Carelli et al. 2003; Ono et al. 1995; Paton et al. 2006; Pratt and Mizumori 1998; Schoenbaum et al. 1999; Sugase-Miyamoto and Richmond 2005; Tye and Janak 2007). However, it is unknown whether these neuronal responses were the result of stimulus-reward pairing or reflected reward prediction according to the criteria outlined in the preceding text. To address the issue, it would be helpful to test the stimulus responses by varying the background reward while keeping the stimulus reward constant. We hypothesized that the stimulus response would change opposite to variations in background reward, namely decreasing when higher background reward reduces reward prediction by the stimulus and increasing when background reward falls below stimulus reward.
In a Pavlovian task, a reward can occur in relation to the stimulus or irrespective of the stimulus in the background. We implemented this distinction by using two time epochs, a stimulus period and an interstimulus interval (background). A reward that occurs during the stimulus and not during the background depends, or is contingent, on the stimulus (Fig. 1A1), and the stimulus constitutes an “excitatory” reward predictor (B1). However, when the same reward occurs also during the background, the contingency of the reward on the stimulus is abolished (Fig. 1A2), the stimulus contains no more reward information than the background, and the reward prediction is extinguished (B2). By contrast, more reward during the background compared with the stimulus produces a negative relationship between the reward and the stimulus (Fig. 1A3). Such a stimulus predicts less reward (“inhibitory” prediction, Fig. 1B3). Thus manipulation of background reward affects the reward prediction by the stimulus despite unchanged stimulus reward.
As reward value can be adequately described by a probability distribution of reward magnitudes, both probability and magnitude determine reward value. Our main experiment varied reward probability. In a 2 × 2 design, we used two reward probabilities for the background (PB = 0.0 and PB = 0.9) and two reward probabilities for the stimuli (CS+, rewarded, PS = 0.9, and CS−, unrewarded, PS = 0.0). All reward magnitudes were constant 0.4 ml. Increasing background reward probability from PB = 0.0 to PB = 0.9 extinguished the reward prediction of the stimulus (CS+) while making the unrewarded stimulus (CS−) an inhibitory predictor (Fig. 1, C1 and C1′). The scheme of Fig. 1C1 represents the “truly random” control procedure of Rescorla (1967). Decreasing background reward probability again reinstated the reward-predicting function of the CS+ and extinguished the inhibitory prediction of the CS− (Fig. 1, C2 and C2′).
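The four cells of the 2 × 2 design can be summarized by the sign of the difference between stimulus and background reward probability. The following is a minimal sketch, assuming that contingency is expressed as the difference PS − PB; the function name and this difference measure are illustrative and are not taken from the study.

```python
def classify_prediction(p_stimulus, p_background):
    """Label a stimulus by the sign of P(R|S) - P(R|B).

    Illustrative only: using this difference as the contingency
    measure is an assumption of the sketch.
    """
    delta_p = p_stimulus - p_background
    if delta_p > 0:
        return "excitatory"   # more reward with the stimulus than background
    if delta_p < 0:
        return "inhibitory"   # less reward with the stimulus than background
    return "none"             # stimulus adds no reward information

# The four cells of the 2 x 2 design, keyed by (PS, PB).
design = {
    (ps, pb): classify_prediction(ps, pb)
    for ps in (0.9, 0.0)
    for pb in (0.0, 0.9)
}
```

Under this reading, raising PB from 0.0 to 0.9 moves the CS+ from "excitatory" to "none" and the CS− from "none" to "inhibitory", without any change in PS.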
In a separate test, we reduced reward magnitude during the background from 0.4 to 0.2 ml while keeping stimulus reward constant at 0.4 ml (Fig. 1D). All reward probabilities were constant P = 0.9. This manipulation increased the reward contingency on the stimulus and made the stimulus a positive reward predictor.
Two adult male rhesus monkeys (Macaca mulatta) weighing 4.4 and 6.7 kg, respectively, participated in the experiment. All animal procedures conformed to U.S. National Institutes of Health Guidelines and were approved by Project License and Personal Licenses from the Home Office of the United Kingdom.
Each behavioral trial consisted of a fixed visual stimulus period of 2.0 s and a fixed interstimulus background period of 4.0 s (Fig. 1E). The task required the animal to maintain its hand on an immobile, touch-sensitive key that was placed conveniently in front of it to allow effortless contact throughout entire trial blocks. Each trial started with an ocular fixation spot that consisted of a small red dot of 1.3° of visual angle shown at the center of the computer monitor. The fixation spot did not provide specific reward information, was part of the background, and was considered an event of no interest. An infrared optical system tracked eye position with 5-ms resolution (Iscan). At 1,150 ms plus a mean of 500 ms (truncated exponential distribution) after fixation spot onset, one of two specific visual stimuli of 7.0° appeared for 2.0 s at the center of the computer monitor. The two stimuli were associated with reward probabilities of P = 0.0 (unrewarded stimulus, CS−) and P = 0.9 (rewarded stimulus, CS+), respectively. Animals kept their gaze on the fixation spot at stimulus center within 2–4°. Apart from the fixation spot, the only visual stimulation during the background consisted of the uniformly gray surface of the computer monitor, which was identical in all trial types irrespective of stimulus and background reward. Given the fixed background duration of 4.0 s, each fixation spot appeared on average 2,350 ms after offset of the stimulus of the preceding trial.
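The mean interval of 2,350 ms follows from the timing values stated for the trial, on the reading that the fixation-to-stimulus interval falls within the 4.0-s background. A short check of the arithmetic (the variable names are illustrative):

```python
BACKGROUND_MS = 4000       # fixed interstimulus background period
FIXATION_LEAD_MS = 1150    # fixed part of the fixation-to-stimulus interval
MEAN_JITTER_MS = 500       # mean of the truncated exponential component

# Mean time from fixation spot onset to stimulus onset.
mean_fixation_to_stimulus_ms = FIXATION_LEAD_MS + MEAN_JITTER_MS

# Mean time from stimulus offset to the next fixation spot.
mean_offset_to_fixation_ms = BACKGROUND_MS - mean_fixation_to_stimulus_ms
```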
Failure of key touch, or fixation break during fixation or stimulus periods, was considered an error and resulted in cancellation of the trial and repetition of the same trial type. More than three sequential errors led to a pause in behavioral testing.
The two stimuli (CS+ and CS−) alternated pseudorandomly between trials. The two differently rewarded backgrounds were used in separate blocks of 20–50 trials. Task breaks of 60 s signaled block transitions to the animal. We counterbalanced the first trial block with each neuron between low and high background reward. We repeated the first after the second trial block to study the reversible nature of background reward influence.
The conditioning of the two explicit reward-predicting stimuli (CS+ and CS−) should be considered as Pavlovian, as reward occurred irrespective of any specific behavioral reaction to this stimulus. Ocular fixation and key touch constituted operants that were not of interest for the study of reward-predictive stimuli. The operant licking response was required for the consumption but not the occurrence of reward.
The collection of reproducible electrophysiological data from many individual neurons in a small number of monkeys required standardized testing during stable and reproducible behavioral performance. We trained each animal for 3–4 mo prior to neuronal recordings with the two stimuli and the different background reward probabilities and magnitudes (300–400 trials/day, 5 days/wk). The animals were overtrained at the time of neuronal recordings and showed no behavioral signs of further learning.
The repeated changes in background reward probabilities constituted switches between acquisition and extinction of reward prediction in the otherwise stable and well-established task. Thus the occurrence or absence of background reward before the first stimuli in a new trial block signaled the background reward situation for the whole block.
A computer-controlled solenoid valve delivered juice reward from a spout in front of the animal's mouth (valve opening time of 120 ms, corresponding to 0.4 ml). For the magnitude test, a valve opening time of 60 ms resulted in 0.2 ml juice during the background. The animal's tongue interrupted an infrared light beam below the adequately positioned spout. An optosensor monitored licking behavior with 0.5-ms resolution (STM Sensor Technology), and the summed durations of beam interruptions during specific trials and task periods provided a measure of licking.
Stated reward probabilities refer to mean frequency of reward per 2-s period during the stimulus (1 period) or background (2 periods). Thus a reward probability of P = 0.9 resulted in an average of 0.9 reward/2.0 s of stimulus and, respectively, 1.8 reward/4.0 s of background. These probabilities are the conditional probabilities P (R|S) and P (R|B) and are stated as PS and PB; R for reward, S for stimulus, B for background.
Pavlovian tasks often deliver reward at stimulus offset. However, time points of reward delivery should be comparable between stimuli and backgrounds, and similar fixed reward delivery times during the background might produce temporal confounds. Therefore we delivered reward at pseudorandom times during rewarded stimulus (PS = 0.9) and background (PB = 0.9) periods, aiming for flat hazard rates to obtain minimal variations in temporal reward prediction. Hazard rate is defined as the conditional probability of event occurrence given that the event has not yet occurred (Luce 1986). To calculate the time of reward delivery within intervals of 2.0 s, the computer chose at every time step of 50 ms an equally probable random number between 1 and 40 and marked that time step when number 1 occurred. We applied the reward probability schedule (P = 0.9) for that 2.0-s period and delivered one unit of reward at the marked time step if so assigned. The 4.0-s background period was split into two segments of 2.0 s for which reward occurrence was determined separately, thus reducing demotivating unrewarded stretches with rewarded background. The delivered number of rewards matched the number determined by the probability within 10 trials for both stimulus and background periods. To further assure cooperation by the animal, we avoided large variations by recalculating reward occurrences for 2.0-s periods producing more than three rewards. Thus the actual hazard function approximated the intended function but was not necessarily identical to it. In setting the hazard rate, we did not correct for possible deviations in the animals' subjective perception of hazard rate due to inferred, time-weighted temporal uncertainty (Janssen and Shadlen 2005).
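The pseudorandom timing procedure can be sketched as follows. This is one possible reading of the description, not the original task code: it assumes that every 50-ms step whose draw equals 1 is marked and that each marked step delivers reward with the scheduled probability. The function names are hypothetical, and the matching of delivered rewards to the schedule within 10 trials is not modeled.

```python
import random

STEP_MS = 50
STEPS_PER_PERIOD = 40   # 2.0 s / 50 ms
MAX_REWARDS = 3         # periods producing more rewards are recalculated

def reward_times(p_reward, rng):
    """Return reward times (ms from period onset) for one 2.0-s period.

    Assumed reading: a step is marked when a uniform draw from 1..40
    equals 1 (P = 0.025 per step); a marked step is rewarded with
    probability p_reward.
    """
    while True:
        times = []
        for step in range(STEPS_PER_PERIOD):
            marked = rng.randint(1, STEPS_PER_PERIOD) == 1
            if marked and rng.random() < p_reward:
                times.append(step * STEP_MS)
        if len(times) <= MAX_REWARDS:
            return times
        # More than three rewards: recalculate the whole period.

def background_reward_times(p_reward, rng):
    """4.0-s background: two 2.0-s segments determined separately."""
    first = reward_times(p_reward, rng)
    offset = STEP_MS * STEPS_PER_PERIOD
    second = [t + offset for t in reward_times(p_reward, rng)]
    return first + second
```

Under these assumptions the expected number of rewards per 2.0-s period before recalculation is 40 × 0.025 × 0.9 = 0.9; recalculating periods with more than three rewards lowers the realized mean slightly, consistent with the statement that the actual hazard function only approximated the intended one.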
Thus for rewarded stimulus (PS = 0.9) or background (PB = 0.9) periods, the occurrence of reward was predicted by the presence or absence of the stimulus, respectively, whereas the precise time of reward delivery during these periods was based on a uniform and very low temporal prediction (P = 0.025/50 ms).
The occurrence of the conditioned stimulus and, separately, the pseudorandomly timed, temporally unpredicted reward could generate a reward-prediction error. To assess whether a neuronal response might reflect such a prediction error would require comparison with a fully predicted reward that would not generate a prediction error. We therefore used a control task in separate trial blocks in which a specific visual stimulus predicted a reward of fixed magnitude with a probability of P = 1.0 at the end of a fixed stimulus-reward interval of 2.0 s. We applied this task on neurons suspected to respond to the pseudorandomly timed reward during the stimulus or background in the main task. A prediction error coding neuron would respond to the pseudorandomly timed reward but not be affected by the fully predicted reward at stimulus end in the control task. In a variation of the control task, a different stimulus predicted three reward magnitudes of 0.23, 0.36, and 0.56 ml with equal probability of P = 1/3, the stimulus being identical for the three magnitudes.
A head holder and a recording chamber were fixed to the skull under general anesthesia and aseptic conditions. Before neuronal recordings, we located the amygdala from bone marks on coronal and sagittal radiographs taken with a guide cannula and electrode inserted at a known coordinate in reference to the stereotaxically implanted chamber. The anteroposterior position of the amygdala was between the sphenoid bone (rostral) and the posterior clinoid process at and above the dorsoventral position of the posterior clinoid (Aggleton and Passingham 1981). We recorded activity from single amygdala neurons from extracellular positions during task performance, using standard electrophysiological techniques including on-line visualization and threshold discrimination of neuronal impulses on oscilloscopes. We recorded from one neuron at a time; this permitted varied exploratory tests during early experimental phases and specific control tests tailored to the response properties of individual neurons. We aimed to record representative neuronal samples from the central, lateral, and basolateral amygdala nuclei. Our recording tracks followed the vertical stereotaxic direction and started in the central nucleus but did not always reach the bottom of the basolateral and lateral nuclei, which precluded specific statistical analysis of neuronal locations.
After completion of data collection, recording sites were marked with small electrolytic lesions (15–20 μA × 20–60 s). The first animal received an overdose of pentobarbital sodium (90 mg/kg iv) and was perfused with 4% paraformaldehyde in 0.1 M phosphate buffer through the left ventricle of the heart. Recording positions were reconstructed from 50-μm-thick, stereotaxically oriented coronal brain sections stained with cresyl violet. The histological reconstructions also validated the radiographically assessed anatomical position of the amygdala in agreement with the earlier report (Aggleton and Passingham 1981). As histological reconstruction was not available for the second animal for reasons of ongoing recordings, we reconstructed its approximate recording positions from the radiographic images. We collapsed recording sites from both monkeys spanning 3 mm in the anterior-posterior dimension onto the same coronal section.
Animals performed ≥10 trials of each type during neuronal recordings. We analyzed licking behavior and neuronal responses only in correct trials, the minimum for analysis being eight trials. We counted neuronal impulses in each neuron relative to the different task events with standard time windows that were fixed across all neurons and trial types. We employed the Wilcoxon test to compare activity in the standard time windows with control activity during 1.0 s immediately preceding onset of the conditioned stimulus in each neuron (P < 0.01). In the rare cases of responses to the fixation spot, the control period was the 1.0 s preceding this spot. Task-related neurons showed significant activity changes in reference to at least one task event.
Our final analysis focused on the stimulus responses. Its standard time window extended from 100 to 400 ms following stimulus onset except in a few neurons in which an obviously later response peak required a later 300-ms window. We calculated percent changes of activity during the 300-ms poststimulus period relative to the 1.0-s control period in single neurons and neuronal populations. We compared responses between the rewarded and unrewarded stimulus and between the rewarded and unrewarded background, including their changes after background reward changes, with two-way ANOVA, one-way ANOVA, Wilcoxon test, and Mann-Whitney test. Analysis of background activity used these tests on average activity during the 1.0-s period immediately preceding the stimulus.
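The percent change measure compares impulse rates, not raw counts, because the response window (300 ms) and the control period (1.0 s) differ in length. A minimal sketch, assuming that the change is computed on mean rates in impulses/s (the function name and argument layout are illustrative):

```python
def percent_change(response_count, control_count,
                   response_window_s=0.3, control_window_s=1.0):
    """Percent change of activity in the response window relative to control.

    Counts are converted to rates (impulses/s) before comparison.
    """
    response_rate = response_count / response_window_s
    control_rate = control_count / control_window_s
    if control_rate == 0:
        raise ValueError("control rate is zero; percent change is undefined")
    return 100.0 * (response_rate - control_rate) / control_rate
```

For example, 6 impulses in the 300-ms window (20 impulses/s) against 8 impulses in the 1.0-s control period (8 impulses/s) corresponds to an increase of about 150%.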
We measured the durations of stimulus responses for Fig. 3 with a sliding time window procedure (Schultz and Romo 1992) on neurons showing significant stimulus responses in the Wilcoxon test. The procedure employed the Wilcoxon test (P < 0.01) between the standard 1.0-s control period and a 100-ms time window that started at the standard test period (see preceding text) and was moved in steps of 100 ms through the stimulus period until significance was lost in two consecutive windows.
To quantitatively assess differences in neuronal activity, we performed receiver operating characteristic (ROC) analysis to calculate the probability with which an ideal observer would distinguish two different neuronal response distributions (Britten et al. 1996; Green and Swets 1966). We measured in each trial the neuronal activity in impulses/s and established the distributions of the numbers of trials with specific activity. We then transformed the two distributions to be compared into probability distributions, represented them against each other as an x-y two-dimensional plot, calculated the area under the curve and expressed it as P value, which reflected the probability of discrimination in the interval of P = 0.5 (chance = complete overlap of distributions) and P = 1.0 (perfect discrimination = no overlap). When differences in depressant responses resulted in values of P < 0.5, we transformed them by 1.0-P to obtain P > 0.5 for averaging. We used a two-tailed permutation test with 1,000 iterations to define statistical significance as the probability of the original ROC value (area under the curve) being below or above a given percentile of the probability distribution of shuffled ROCs. For example, a permutation test result of P < 0.01 indicated an ROC value <0.5% or >99.5% of the distribution of 1,000 shuffled ROCs.
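The ROC measure and its permutation test can be sketched as follows. The area under the ROC curve equals the probability that a randomly chosen trial from one distribution shows higher activity than a randomly chosen trial from the other, with ties counted as one half. The two-tailed P value below doubles the smaller tail of the shuffled distribution, which is one common convention and an assumption of this sketch; all function names are hypothetical.

```python
import random

def roc_area(sample_a, sample_b):
    """Area under the ROC curve for two samples of per-trial activity."""
    wins = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5   # ties contribute one half
    return wins / (len(sample_a) * len(sample_b))

def folded(area):
    """Map values below 0.5 to 1.0 - area so that areas can be averaged."""
    return max(area, 1.0 - area)

def permutation_p(sample_a, sample_b, iterations=1000, rng=None):
    """Two-tailed permutation P value for the observed ROC area."""
    rng = rng or random.Random()
    observed = roc_area(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_or_above = 0
    at_or_below = 0
    for _ in range(iterations):
        rng.shuffle(pooled)   # reassign trials to the two groups at random
        shuffled = roc_area(pooled[:n_a], pooled[n_a:])
        at_or_above += shuffled >= observed
        at_or_below += shuffled <= observed
    return min(1.0, 2.0 * min(at_or_above, at_or_below) / iterations)
```

With this convention, P < 0.01 corresponds to an observed area lying beyond the 0.5% or 99.5% point of the shuffled distribution, as in the example given in the text.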
Anticipatory licking suggested significant discrimination between the two differently rewarded stimuli (Fig. 2, A, left vs. right, and B, 4 leftmost columns). Note that all licking durations were measured only in the subset of trials in which the trial type or probabilistic reward schedule produced no rewards; this measure of licking did not capture reactions to the rewards but assessed reward anticipation. Furthermore, animals licked very rarely during trial blocks with low background reward probability (PB = 0.0) and significantly more during high background reward (PB = 0.9; Fig. 2, A, top vs. bottom, and B, 4 rightmost columns). Animals detected changes in background reward immediately; licking reached its final level already after the first trial (Fig. 2C). Licking during backgrounds varied insignificantly between trials using differently rewarded stimuli (Fig. 2, A left vs. right, and B, 2 rightmost shaded columns). Anticipatory licking was similar when stimulus and background reward probabilities were the same (Fig. 2, B and D, ratio of 0.5), indicating that licking reflected reward probability irrespective of stimulus presence. These data suggest that the stimuli failed to evoke differential reward predictions when reward rates were equal during stimuli and backgrounds (although reward predictions apparently differed between the two differently rewarded stimuli).
Behavioral errors consisted of breaks of ocular fixation, almost always during the stimuli and very rarely during the fixation spot period preceding the stimuli. Such errors occurred during the rewarded stimulus set against unrewarded (10%) or rewarded background (15%) and during the unrewarded stimulus set against unrewarded (27%) or rewarded background (17%). Although the errors were most common in entirely unrewarded trials (27%) and least frequent during the rewarded stimulus set against unrewarded background (10%), the variations in errors were insignificant (P > 0.05; χ² = 2.926).
We tested 850 radiographically and histologically located amygdala neurons (Aggleton and Passingham 1981) (Fig. 2E) with unrewarded (PS = 0.0) and rewarded (PS = 0.9) stimuli set against unrewarded (PB = 0.0) backgrounds. A total of 373 amygdala neurons showed statistically significant activity changes in relation to at least one task event. They were located in the central nucleus (97 neurons) or the basolateral and lateral nuclei (276 neurons) of the amygdala. Of the 373 task-related neurons, 140 showed stimulus responses. To explore all possible background influences, we investigated responses that were significantly higher to the rewarded stimulus (PS = 0.9; 89 neurons; P < 0.01, Mann-Whitney test), higher to the unrewarded stimulus (PS = 0.0; 16 neurons), or differed insignificantly between the two stimuli (35 neurons). These responses were similar to the stimulus responses reported before (Paton et al. 2006; Sugase-Miyamoto and Richmond 2005). The remaining 233 of the 373 neurons showed task relationships of no interest for the present study, such as responses to the fixation spot or to reward delivery, or flat tonic activations during the stimuli or background.
We investigated the influence of background reward on the 140 amygdala neurons showing stimulus responses. Using a cutoff threshold of P < 0.05 (2-way ANOVA), we identified 75 of these 140 neurons (54%) in which changes in background reward probability influenced the stimulus responses. These 75 neurons were the principal subjects of this report. Their responses were significantly higher to the rewarded compared with the unrewarded stimulus (68 neurons) or differed insignificantly between the two stimuli (7 neurons); none of them showed higher responses to the unrewarded compared with the rewarded stimulus. The 75 neurons were located in the central nucleus (n = 16 neurons) and in the basolateral (n = 32) and lateral nuclei (n = 27) of the amygdala (Fig. 2E). Their responses were predominantly phasic and showed a median duration of 450 ms (Fig. 3). Almost all responses lasted less than the stimulus (2,000 ms), and the few responses that lasted during the whole stimulus duration showed early phasic peaks that were followed by lower sustained activity.
The experiment tested the influence of background reward in three steps. In the conventional test for (excitatory) reward prediction, reward occurred only during the stimulus, following the schemes of Fig. 1, A1 and B1. The rewarded stimulus (CS+, PS = 0.9) set against the unrewarded background (PB = 0.0) induced activations (71 neurons) or depressions (4 neurons; Fig. 4A, left top). Then we elevated reward probability during the background to the level during the stimulus (PB = PS = 0.9; Fig. 1C1), thus extinguishing the (excitatory) reward prediction of the stimulus according to the schemes of Fig. 1, A2 and B2. The 75 neurons entirely lost their stimulus responses, which fell to a level insignificantly different from prestimulus control (P > 0.1, Wilcoxon test; Fig. 4A, left middle). Finally, we lowered the background reward back to PB = 0.0 (Fig. 1C2), thus reinstating the (excitatory) reward prediction of the stimulus according to the schemes of Fig. 1, A1 and B1. The stimulus response regained the previous level (Fig. 4A, left bottom), whereas background activity remained unchanged (P > 0.1). Thus these amygdala responses to the rewarded stimulus were substantially affected by background reward.
The unrewarded stimulus (CS−, PS = 0.0) evoked a significantly lower response compared with the rewarded stimulus, or completely failed to evoke a response, in 68 of the 75 neurons (Fig. 4A, right top). Delivering reward during the background (PB = 0.9; Fig. 1, C1′ and C2′) made the unrewarded stimulus an inhibitory predictor according to the schemes of Fig. 1, A3 and B3 (note that background reward did not continue during the unrewarded stimulus). However, despite extensive and systematic testing, we failed to find statistically significant depressions or activations that were specifically related to this inhibitory prediction in any of the 75 neurons or in the remaining 65 of the 140 neurons with stimulus responses (Fig. 4A, right middle). Reducing background reward again (PB = 0.0) failed to induce any changes in the stimulus response (Fig. 4A, right bottom). Thus we failed to find responses reflecting inhibitory predictions in our sample of amygdala neurons. Consequently, the response sensitivity of amygdala neurons to background reward seemed to be restricted to excitatory reward prediction.
The 75 neurons had been identified by their significant variations in stimulus responses between rewarded and unrewarded stimuli and across the three background reward situations. A two-way ANOVA for the example neuron of Fig. 4A showed rejection of the null hypothesis with P < 0.0001 for both factors [factor 1: left vs. right: F(1,66) = 55.106; factor 2: top vs. middle vs. bottom: F(2,66) = 16.739]. Post hoc analysis located the differences in Fig. 4A at top: left versus right, bottom: left versus right, left: top versus middle, and left: middle versus bottom (P < 0.001, Tukey test). The response to the rewarded stimulus varied insignificantly before versus after the increase of background reward (P > 0.05; Fig. 4A, left: top vs. bottom), suggesting complete reinstatement of stimulus response after reduction of background reward.
Next we measured the differences of the responses to the rewarded stimulus between the two different background rewards (Fig. 4A left, top vs. middle) and compared them with the response differences between rewarded and unrewarded stimuli set against unrewarded background (Fig. 4, A top, left vs. right). We found these differences to be statistically indistinguishable in all 75 neurons (P > 0.4; Mann-Whitney test). This result indicates that the influence of background reward on stimulus responses was comparable to the influence of stimulus reward.
The observed influences of background reward probability may have been due to nonspecific effects of probabilistically occurring events rather than changes in reward value. Reward value can be defined by a probability distribution of reward magnitudes. To distinguish between reward value and a potential nonspecific probability confound, we varied background reward magnitude separately from probability in 13 of the 75 neurons with background reward probability sensitive stimulus responses (Fig. 4B, left). We aimed to reinstate the (excitatory) reward prediction (Fig. 1B1) by reducing background reward magnitude to 0.2 ml instead of 0.4 ml (Fig. 1D) while keeping background reward probability constant at PB = 0.9. The reduction in background reward magnitude increased the response to the rewarded stimulus in all 13 neurons (Fig. 4B, right), replicating the influence of lowered background reward probability (Fig. 4A, left bottom). The similar effects of reward magnitude and probability suggest that background reward influenced stimulus responses via changes in reward value rather than through nonspecific effects of probabilistically occurring events.
Averaged population responses confirmed the substantial influence of background reward probability seen with individual neurons. In particular, the changes in background reward affected the average response to the rewarded stimulus when background reward was elevated to the level of stimulus reward (Fig. 5, left, top vs. middle). The effect was reversible when background reward was lowered again (Fig. 5, left, middle vs. bottom). Elevating background to PB = 0.9 failed to induce a response to the unrewarded stimulus indicative of inhibitory prediction (Fig. 5, right, middle).
The stimulus responses shown in Fig. 5 differed significantly between rewarded and unrewarded stimuli (left vs. right) and across the three background reward situations (top vs. middle vs. bottom) in a two-way ANOVA [factor 1: left vs. right: P < 0.0001, F(1,420) = 93.498; factor 2: top vs. middle vs. bottom: P < 0.004, F(2,420) = 5.818]. Post hoc analysis located the differences in Fig. 5 at top: left versus right, bottom: left versus right, left: top versus middle, and left: middle versus bottom (at P < 0.001, Tukey test). By contrast, the response difference in Fig. 5 between left: top versus bottom was insignificant (P > 0.05), confirming complete response reinstatement in the population.
Mean changes of responses to the rewarded stimulus amounted to >150% when background reward varied in probability between PB = 0.0 and PB = 0.9 [Fig. 6A; P < 0.0001, F(2,210) = 22.010, 1-way ANOVA; P < 0.001 for PB = 0.0 vs. PB = 0.9, left, and PB = 0.9 vs. PB = 0.0, right; Tukey test; P > 0.05 for PB = 0.0 left vs. right]. Similar changes were seen for variations in magnitude between 0.2 and 0.4 ml [Fig. 6B; P < 0.002, F(2,36) = 8.274, 1-way ANOVA; P < 0.05 for PB = 0.0 vs. PB = 0.9 left; Tukey test; P < 0.001 for 0.4 vs. 0.2 ml at PB = 0.9; P > 0.05 for PB = 0.0–0.4 ml, left, vs. PB = 0.9–0.2 ml, right]. The difference in response to the rewarded stimulus set against the two background reward probabilities was significant also in individual neurons (P < 0.05 to P < 0.00001, 2-way ANOVA, Fig. 6C). By contrast, neuronal activity during the background varied insignificantly between different reward probabilities (P > 0.05 in all neurons).
ROC analysis suggested that the differently rewarded background resulted in good discrimination of the response to the stimulus despite its identical physical appearance (Fig. 6D, ■; mean area under the curve P = 0.81; 75 neurons; P < 0.01 in 66 of 75 neurons; 1,000 permutations). By contrast, discrimination of neuronal activity was poor during the background (Fig. 6D; mean P = 0.55; 75 neurons; P < 0.01 in 6 of 75 neurons).
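The ROC measure used above can be illustrated with a minimal sketch. It assumes that the area under the curve is computed as the probability that a response under one background exceeds a response under the other, and that significance is assessed by shuffling the background labels. The exact procedure behind Fig. 6D is not specified here, so the function names `roc_auc` and `permutation_p` and the spike counts below are hypothetical.

```python
import random

def roc_auc(a, b):
    """Area under the ROC curve: probability that a value drawn from `a`
    exceeds a value drawn from `b` (ties count as 0.5)."""
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))

def permutation_p(a, b, n_perm=1000, seed=0):
    """Two-sided permutation P value: fraction of label-shuffled AUCs that
    lie at least as far from 0.5 as the observed AUC."""
    rng = random.Random(seed)
    observed = abs(roc_auc(a, b) - 0.5)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(roc_auc(perm_a, perm_b) - 0.5) >= observed:
            count += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (count + 1) / (n_perm + 1)

# Hypothetical spike counts per trial (made-up numbers, not data from the study).
low_background = [12, 15, 14, 18, 16, 13, 17, 15]   # stimulus response, PB = 0.0
high_background = [6, 8, 7, 9, 5, 10, 8, 7]         # stimulus response, PB = 0.9

auc = roc_auc(low_background, high_background)
p_value = permutation_p(low_background, high_background)
print(f"AUC = {auc:.2f}, permutation P = {p_value:.4f}")
```

An area near 0.5 indicates that the two backgrounds cannot be told apart from the neuronal activity, whereas values approaching 1.0 indicate good discrimination.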
The decrease in the response to the rewarded stimulus was already significant in the first trial after the increase in background reward probability [Fig. 7A, single black line vs. circled lines; P < 0.01, Tukey test after P < 0.0001, F(5,420) = 16.526, 1-way ANOVA]. The response remained weak during subsequent trials without further significant changes (P > 0.1; Fig. 7B). Decreasing the background reward again led to an immediate, significant increase in stimulus response, which remained high without further significant changes [Fig. 7, C and D; P < 0.01, Tukey test after P < 0.0001, F(5,420) = 8.422, 1-way ANOVA]. These results suggest that the stimulus response switches within the first or the first few trials after background reward change.
None of the 373 task-related amygdala neurons, including the 75 neurons sensitive to background reward, responded to offset of the stimulus even when this event signaled a transition from low stimulus reward to high background reward, or vice versa, irrespective of reward varying in probability or magnitude (Fig. 4). Thus the background reward did not induce responses of its own but exerted an influence on existing responses to phasic, newly presented stimuli.
The pseudorandom time of reward delivery during stimulus and background periods resulted in very low temporal reward prediction. Hence a substantial positive temporal prediction error occurred every time a reward was delivered. However, none of the 75 neurons whose stimulus responses were sensitive to background reward responded to the delivery of reward during stimulus or background, including the 13 neurons tested with two different reward magnitudes (Fig. 8). Together with the poor discrimination revealed by the ROC analysis, this result suggested that the influence of background reward on stimulus responses was unlikely to be due simply to neuronal reward responses during the background.
Of the 140 amygdala neurons tested with different background rewards, 75 showed the background reward-sensitive stimulus responses. Of the remaining 65 neurons, 15 showed inconsistent changes, and 50 neurons showed stimulus responses that were unaffected by changes in background reward, including 13 neurons with significantly higher responses to the rewarded compared with the unrewarded stimulus (Fig. 9A; P > 0.01, Mann-Whitney test). The responses in the 50 neurons may have reflected the pairing of the stimulus with the reward (contiguity) or the known visual properties of amygdala neurons (Paton et al. 2006). Further tests would be necessary to differentiate between these possibilities.
Of the 65 neurons, 5 responded to every stimulus associated with a difference in reward probability against background (Fig. 9B, top left and bottom right). Thus the response occurred indiscriminately to an excitatory reward predictor (left) and an inhibitory predictor (right), following the schemes of Fig. 1, A1 and B1, A3 and B3. None of the 65 neurons responded predominantly to the unrewarded stimulus set against a rewarded background, thus failing to reflect specific inhibitory prediction.
Of the 65 neurons, 28 responded also to the pseudorandomly timed delivery of reward in the main task. To demonstrate potential prediction error coding, we tested these neurons further in the control task with the fully predicted reward delivered at the end of the fixed stimulus-reward interval. Of the 28 neurons, 14 failed to respond to the fully predicted reward (Fig. 9C), suggesting temporal reward-prediction error coding. The remaining 14 of the 28 neurons responded to the predicted reward. These 14 prediction-insensitive neurons were subsequently tested with three reward magnitudes, and 7 of them showed a significant dependence of the response on reward magnitude with single linear regression (P < 0.01; t-test vs. 0 slope). The variations in reward magnitude response in these background reward-insensitive neurons contrasted with the lack of reward response in the background reward-sensitive neurons.
In this experiment, we changed the reward prediction of a stimulus without modifying the reward paired with that stimulus. The manipulation of the reward during the absence of the stimulus (background) affected the relationship between stimulus and background reward and thus the amount of predictive information conveyed by the stimulus relative to the background. Neuronal responses in a group of amygdala neurons followed this change in reward prediction for excitatory predictors. The stimulus response dropped when background reward probability increased to the level during the stimulus, which rendered the stimulus uninformative and extinguished the reward prediction. The response recovered when background reward decreased again and reinstated the stimulus as a valid reward predictor. Similar changes occurred also with variations in reward magnitude, indicating that the expected reward value derived from a probability distribution of reward magnitudes was the crucial parameter for the background effect on stimulus responses. However, none of the responses reflected inhibitory prediction, thus restricting reward contingency in amygdala neurons to positive predictors (conditioned excitation). Furthermore, the data confirm that reward-predictive amygdala responses can occur independently of visual stimulus properties. Taken together, the observed sensitivity to background reward in a group of amygdala neurons reflected the positive contrast between stimulus and background reward as the essence of excitatory reward prediction. These responses reflected reward prediction rather than simple stimulus-reward pairing and may provide a neuronal correlate for the role of amygdala in Pavlovian responding. Apparently these amygdala neurons processed information from events that were not directly paired with a stimulus but crucially affected its predictive properties.
Several control procedures helped to relate the influence of background reward to reward prediction rather than to less specific events. Background reward magnitude influenced stimulus responses in a similar way as background reward probability. Thus the neuronal sensitivity to background reward was not simply due to unspecific events such as solenoid sounds or somatosensory stimulation by the liquid. The reduction in stimulus responses by increased background reward was fully reversible: when background reward dropped again, stimulus responses reappeared immediately. Thus the reduced stimulus responses were unlikely to be related to nonspecific loss of responsiveness or satiation of the animal due to higher background reward, although more local satiation cannot be ruled out completely.
The influence of background reward was evident with changes in both reward probability and reward magnitude. This result suggests that the common variable determining the sensitivity to background reward might be the general measure of reward value proposed by Blaise Pascal in 1654, namely the expected value of the probability distribution of reward magnitudes. Thus increasing the background reward value irrespective of particular probability-magnitude combinations would reduce reward contingency and consequently the reward-predicting stimulus responses. Indeed neurophysiological studies have shown that reward value defined as probability distribution is a viable parameter determining reward responses in prefrontal, parietal, striatal, and dopamine neurons (Cromwell and Schultz 2003; Leon and Shadlen 1999; Musallam et al. 2004; Platt and Glimcher 1999; Tobler et al. 2005). It would now be interesting to investigate whether the background contingency effect also operates in these other reward value systems.
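The notion of expected value as a common currency for probability and magnitude can be made concrete with a short sketch. The function name `expected_value` and the probability-magnitude pairs are illustrative assumptions chosen so that two different combinations yield the same expected reward; they are not the exact schedules of the experiment.

```python
import math

def expected_value(distribution):
    """Expected reward (ml) of a list of (probability, magnitude_ml) outcomes."""
    total_p = sum(p for p, _ in distribution)
    if not math.isclose(total_p, 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * m for p, m in distribution)

# Illustrative backgrounds (hypothetical values): different combinations of
# probability and magnitude that yield the same expected reward.
frequent_small = [(0.9, 0.2), (0.1, 0.0)]    # 0.2 ml with P = 0.9
rare_large = [(0.45, 0.4), (0.55, 0.0)]      # 0.4 ml with P = 0.45
unrewarded = [(1.0, 0.0)]                    # no background reward

for name, dist in [("frequent_small", frequent_small),
                   ("rare_large", rare_large),
                   ("unrewarded", unrewarded)]:
    print(f"{name}: expected value = {expected_value(dist):.2f} ml")
```

If expected value is the relevant variable, the first two backgrounds should reduce the stimulus response to the same degree even though they differ in both probability and magnitude.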
Our neurons failed to respond to stimulus offset, even when this involved a transition from low stimulus reward to high background reward. Apparently the primary function of background reward for the observed neuronal responses was to set a relationship to stimulus reward and thus determine the reward prediction by the stimulus. However, due to our focus on stimulus responses in a typical Pavlovian task, the result should not indicate that all amygdala neurons are insensitive to reward transitions between tonic reward levels.
The observed background reward-sensitive stimulus responses may be interesting in terms of amygdala state value coding described recently (Belova et al. 2008). As internal states are importantly determined by current and future reinforcers (Sutton and Barto 1998), the employed Pavlovian conditioned, reward-predicting stimuli and “primary” rewards would have been able to induce state transitions and set state values. The phasic response to the rewarded stimulus may reflect the transition from lower state value during unrewarded background to higher state value during the stimulus. Importantly for the present experiment, the observed lack of stimulus responses set against the rewarded background would reflect an absence of state transition. Thus the possible state value coding seems to adhere to the same concept of reward contingency as the underlying Pavlovian reward prediction. These neurons would code state value in particular ways, and relative to particular stimuli rather than indiscriminately; they showed only phasic responses rather than sustained activity during high stimulus or background reward states, and they failed to respond to the state transition between unrewarded stimulus and rewarded background.
According to the temporal difference (TD) reinforcement model, appearance of a reward-predicting stimulus set against low background reward may induce a “higher-order” reward-prediction error (Sutton and Barto 1998). Previous studies on amygdala neurons reported coding of “primary” reward-prediction errors (Belova et al. 2007; Sugase-Miyamoto and Richmond 2005). We confirmed such responses in the background-insensitive neurons (Fig. 9C) but failed to find evidence for “primary” reward-prediction error coding in our main group of neurons (Fig. 8). Without the coding of “primary” reward-prediction errors, the sensitivity of stimulus responses to background reward may be difficult to explain by reward-prediction errors.
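For orientation, the “higher-order” prediction error of the TD model can be sketched with the standard TD error term, delta = r + gamma * V(next) - V(current). This is only an illustration of what the model would predict at stimulus onset; the state values and the discount factor are hypothetical assumptions, and, as argued above, such an account may not apply to the main group of neurons.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """Standard TD error: reward plus discounted next-state value
    minus the value of the current state."""
    return reward + gamma * value_next - value_current

# Hypothetical state values (arbitrary units), assumed to track reward probability.
V_STIMULUS = 0.9          # rewarded stimulus
V_BACKGROUND_LOW = 0.0    # unrewarded background
V_BACKGROUND_HIGH = 0.9   # background raised to the level of the stimulus

# Stimulus onset (no reward delivered at that moment) against each background.
onset_low = td_error(0.0, V_STIMULUS, V_BACKGROUND_LOW)
onset_high = td_error(0.0, V_STIMULUS, V_BACKGROUND_HIGH)
print(onset_low, onset_high)
```

Under these assumptions the model yields a positive error at stimulus onset against the unrewarded background and no error against the equally rewarded background.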
Neuronal responses to rewards and reward-related stimuli can, in principle, reflect their attentional rather than rewarding components (Maunsell 2004). Stimulus-driven attention would arise with the occurrence of primary rewards and conditioned reward-predicting stimuli. Our main neurons were differentially activated by the rewarded and not by the unrewarded stimulus, but they failed to respond to primary rewards. However, the pseudorandom occurrence of the “primary” reward would most likely elicit stronger attention than the regularly occurring reward-predicting stimulus. Hence these stimulus responses do not seem to reflect stimulus-driven attention. Only a few background reward-insensitive neurons showed unselective stimulus responses that might reflect attention (Fig. 9B). Another form of attention arises when conditioned stimuli generate reward expectations and thus set particular attentional states. Such states might lead to sustained elevated discharge rates or to graded influences on existing responses, as shown in visual and, possibly, amygdala neurons (Goldberg and Wurtz 1972; Sugase-Miyamoto and Richmond 2005). However, our response changes were almost all phasic and showed a much more all-or-none character (Figs. 3–5). Taken together, attentional coding is difficult to rule out completely, but it is unlikely to constitute the major factor underlying the sensitivity of stimulus responses to background reward.
Manipulations of background reinforcers have been used for investigating basic behavioral mechanisms underlying appetitive and aversive Pavlovian predictions (Baker 1977; Bouton et al. 1993; Delamater 1995; Dwek and Wagner 1970; Rescorla 1967–1969). The tests assessed the reinforcer prediction by the stimulus by changing the dependency, or contingency, of the reinforcer (US) on the stimulus (CS) without affecting the temporal stimulus-reinforcer pairing (contiguity). The crucial “truly random” control procedure dissociated contingency from contiguity (Rescorla 1967). It reduced contingency by increasing reinforcer probability during the background while leaving contiguity constant. The distinction holds for both aversive and appetitive reinforcers (Delamater 1995). Rescorla (1967) concluded that “the contingency between CS and US, rather than the pairing of CS and US, is the important event in conditioning.” Thus acquisition of reinforcer prediction correlates with induction of contingency rather than stimulus-reinforcer pairing. This designed dissociation between contingency and contiguity constituted the essential method in our experiment. However, rather than studying learning in this first neurophysiological contingency experiment with background reward variations, we employed the simpler and more easily interpretable switches between well-known reward probabilities and magnitudes during fully established task performance. Future work may examine whether more finely scaled background reinforcer variations in learning situations, as used by Rescorla (1968), would lead to more graded changes in stimulus responses. Taken together, similar to behavioral reinforcer prediction, our neuronal responses to reward-predicting stimuli seemed to reflect reward contingency rather than stimulus-reward pairing. In the case of the amygdala, which may not be involved in conditioned inhibition (Falls and Davis 1995), the parallelism may hold only for excitatory reward predictors.
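The dissociation between contingency and contiguity can be illustrated with a simple difference measure, P(reward | stimulus) - P(reward | background). This is one common way of formalizing contingency and is used here only as a sketch; it is not presented as the measure employed in the experiment. The function names are hypothetical, and the stimulus probability of 0.9 is assumed from the statement that background reward was elevated to the level of stimulus reward.

```python
def delta_p(p_reward_stimulus, p_reward_background):
    """Contingency as the difference between reward probability during the
    stimulus and reward probability during the background."""
    return p_reward_stimulus - p_reward_background

def classify_predictor(p_reward_stimulus, p_reward_background, tolerance=1e-9):
    """Label the stimulus according to the sign of the contingency."""
    d = delta_p(p_reward_stimulus, p_reward_background)
    if d > tolerance:
        return "excitatory"
    if d < -tolerance:
        return "inhibitory"
    return "uninformative"

# The four cells of a 2 x 2 design (stimulus probability 0.9 is an assumption).
for p_stim in (0.9, 0.0):
    for p_bg in (0.0, 0.9):
        label = classify_predictor(p_stim, p_bg)
        print(f"P_stimulus={p_stim}, P_background={p_bg}: {label}")
```

Note that the pairing of stimulus and reward (contiguity) is identical in the first two cells, yet the contingency differs, which is the dissociation the truly random control is designed to produce.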
Informational learning theories postulate that a stimulus predicts a reinforcer when it contains more information about the reinforcer than the background (Egger and Miller 1962; Gallistel 2003). For reinforcers occurring at unpredictable times during stimuli or backgrounds, the mutual information I conveyed by a stimulus varies according to I = log2 (λS/λB), with λS and λB as stimulus and background reinforcers per unit time, respectively (Balsam and Gallistel 2008; Balsam et al. 2006). Thus prediction is a function of the respective rates of reinforcer related to stimulus and background. The observed speed of stimulus response changes within a single trial followed closely the changes in background reward and suggests that rapid comparisons between stimulus and background reward took place. These data may correspond to the behavioral comparator and timing theories that explain the effects of background reward on reward prediction during established task performance (Gibbon and Balsam 1981; Miller and Schachtman 1985). Our data obtained with established task performance cannot easily address the competitive effects of prediction errors between stimulus and background during learning (Rescorla and Wagner 1972).
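As a worked illustration of I = log2 (λS/λB), the following sketch computes the information for a few combinations of stimulus and background reward rates. The function name, the example rates, and the handling of zero rates (infinite information for a zero background rate, zero information when both rates are zero) are assumptions made for this sketch.

```python
import math

def stimulus_information(rate_stimulus, rate_background):
    """Information in bits conveyed by the stimulus: log2(rate_S / rate_B)."""
    if rate_stimulus < 0 or rate_background < 0:
        raise ValueError("rates must be non-negative")
    if rate_stimulus == 0 and rate_background == 0:
        return 0.0            # assumption: no reward anywhere, nothing to predict
    if rate_background == 0:
        return math.inf       # reward occurs only during the stimulus
    if rate_stimulus == 0:
        return -math.inf      # reward occurs only during the background
    return math.log2(rate_stimulus / rate_background)

# Hypothetical reward rates (rewards per unit time).
examples = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.9, 0.0)]
for lam_s, lam_b in examples:
    bits = stimulus_information(lam_s, lam_b)
    print(f"lambda_S={lam_s}, lambda_B={lam_b}: I = {bits} bits")
```

Positive values correspond to excitatory prediction, zero to an uninformative stimulus (equal rates), and negative values to inhibitory prediction.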
Our earlier blocking study on dopamine neurons used differently rewarded stimuli instead of differently rewarded backgrounds to investigate their influence on neuronal responses to reward-predicting target stimuli (Waelti et al. 2001). The target stimulus failed to elicit behavioral and neuronal responses when the simultaneously presented other stimulus predicted as much reward as the target stimulus and thus made the reward noncontingent on the target stimulus. That design also tested reward contingency, but without technically following the truly random control of Rescorla (1967). The combined results may in general suggest similar contingency effects for dopamine and amygdala neurons.
Previous neurophysiological experiments on amygdala neurons studied reward contingency by manipulating stimulus-reward pairing. The studies used differentially rewarded stimuli (Carelli et al. 2003; Ono et al. 1995; Paton et al. 2006; Pratt and Mizumori 1998), stimulus-reward reversals (Paton et al. 2006; Schoenbaum et al. 1999), and variations in reward delays (Sugase-Miyamoto and Richmond 2005). Our experiments built on these studies. After confirming amygdala responses to differentially rewarded stimuli, we manipulated the background reward to dissociate contingency from stimulus-reward pairing. Indeed the stimulus responses in our main group of amygdala neurons were sensitive to background reward despite constant stimulus-reward pairing. By contrast, stimulus responses in other amygdala neurons were insensitive to background reward. These latter responses might reflect stimulus pairing with reward (contiguity) without reward prediction or the known visual sensory processes in the amygdala (Paton et al. 2006). A distinction between these possibilities would require additional tests such as stimulus-reward reversals.
Previous lesion studies identified the role of the amygdala in Pavlovian conditioning (Hatfield et al. 1996; Malkova et al. 1997). The current neurophysiological study used concepts from classical and informational learning theory (Balsam and Gallistel 2008; Balsam et al. 2006; Rescorla 1967) to distinguish reward prediction from simple stimulus-reward pairing. The current finding of background reward-sensitive stimulus responses suggests a neuronal contingency mechanism underlying the function of the amygdala in reward prediction.
It is often assumed that neuronal responses to reward-predicting stimuli arise from influences of phasic reward “teaching” signals on neuronal responses to the external stimuli. Phasic reward signals might drive novel and reversal learning in dopamine neurons, striatum, prefrontal cortex, and amygdala (Belova et al. 2007; Histed et al. 2009; Montague et al. 1996; Schultz et al. 1993; Sugase-Miyamoto and Richmond 2005). The current background reward manipulations revealed that amygdala responses to reward-predicting stimuli depend in addition on the comparison between stimulus reward and background reward. As background rewards are likely to occur over longer time periods than stimulus rewards, the underlying neuronal computations would involve relatively slow sampling of background reward information. The information about background reward should be temporally extended and maintained until the stimulus reward occurs and comparisons can take place. Such traces could involve sustained neuronal impulse activity or intracellular mechanisms similar to reinforcement eligibility traces (Houk et al. 1995; Suri and Schultz 1999; Sutton and Barto 1998) that may occur in the amygdala or originate from direct or indirect inputs from frontal cortical areas (Sesack et al. 1989). The traces could be used for updating reward predictions and eliciting appropriate neuronal and behavioral changes. The observed rapid switches of neuronal and behavioral responses may reflect local mechanisms in the amygdala or be derived from frontal cortical inputs crucially involved in reward learning (Saddoris et al. 2005) and switching of reward-related behavior (Dias et al. 1996; Shima and Tanji 1998; White and Wise 1999). Hippocampal afferents to the amygdala (Maren and Fanselow 1995; Pitkänen et al. 2000) might also contribute to contextual background effects but may predominantly serve affective episodic memory (Phelps 2004; Richardson et al. 2004; Smith et al. 2006).
Taken together, whereas we have fairly detailed concepts for neuronal reinforcement processes in stimulus-reward pairing (Montague et al. 1996; Sutton and Barto 1998), the mechanisms underlying reward contingency via background reward remain to be further elucidated.
Many amygdala neurons respond to visual stimulus features, as shown by failed stimulus-reward reversals (Sanghera et al. 1979), whereas responses in other amygdala neurons follow reversals typical for reward relationships (Paton et al. 2006). Our data provide further evidence for this distinction. The strong background reward effect on the stimulus response occurred although the stimulus itself remained identical, which suggests reward rather than visual object relationships. By contrast, the unaffected stimulus responses in the background-insensitive amygdala neurons might well have reflected visual processes. These data suggest that different groups of amygdala neurons are engaged in visual and reward processing, respectively.
This study tested all possible combinations of stimulus and background reward, even on neurons that responded similarly or more to unrewarded than rewarded stimuli. However, we failed to find specific activating or depressing stimulus responses when background reward exceeded stimulus reward and made the stimulus an inhibitory predictor. The spontaneous activity in our amygdala neurons was high enough (10–20 impulses/s; see Fig. 7B) to reveal depressions. A few neurons responded when stimulus reward was below background reward, but they also responded when stimulus reward was higher than background reward and thus were unspecific (Fig. 9B). The lack of response to conditioned inhibitors with background reward manipulations may relate to the failure of amygdala lesions to induce deficits in conditioned fear inhibition (Falls and Davis 1995) and may argue against a general function of amygdala in conditioned inhibition. However, more studies are needed to resolve a potential role of amygdala in conditioned inhibition.
We acknowledge financial support by the Wellcome Trust, the Behavioural and Clinical Neuroscience Institute Cambridge, and the Human Frontier Science Program.
We thank A. Dickinson, S. Kobayashi, and M. Lengyel for suggestions and comments and M. Arroyo for expert histology.