Behavioral Tasks and Performance
Two macaque monkeys (one rhesus and one bonnet macaque) performed an object-place associative learning task in which they learned to associate different object-place combinations with either an early or a late bar release response. Animals initiated each trial by holding a bar and fixating a central fixation point for 500 ms. They were then shown one of four possible object-place combinations for 500 ms. Each combination was composed of one of two possible visual objects in one of two possible spatial locations. Both the objects and the spatial locations changed daily. Following a 700 ms delay interval during which nothing but the fixation point was shown on the screen, the animals could make a bar release response either during the presentation of an orange circle shown for 500 ms (early response) or during the 500 ms presentation of a green circle shown immediately after the orange circle (late response). If the response was correct, a “positive” auditory feedback tone was played 15 ms (+/− 1 ms) after the bar release. Except on “omit reward” trials (see below), the auditory tone served as a reliable signal of the future delivery of reward. For the vast majority of trials, correct responses were rewarded with two to four juice drops starting 30 ms (+/− 1 ms) after bar release and ending 933 ms (+/− 1 ms) after bar release if four drops were given. Through trial and error, animals learned to associate each object-place combination with either the early or the late bar release response. For correct trials, juice reward was followed by a 2000 ms inter-trial interval before the initiation of the next trial. For error trials, a 2000 ms inter-trial interval started immediately after an incorrect bar release response. Fixation was required from the time the animal initiated the trial until the time of the bar release response.
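The trial timeline described above can be laid out as a short sketch. This is only an illustration of the stated durations, not the task-control code: the constant and function names are hypothetical, times are measured from the start of the 500 ms fixation period, and treating each window as closed at its start and open at its end is an assumption.

```python
# Epoch durations in ms, taken from the task description.
FIXATION_MS = 500   # fixation before the object-place combination appears
CUE_MS = 500        # object-place combination on screen
DELAY_MS = 700      # fixation point only
RESPONSE_MS = 500   # duration of each response cue (orange, then green)

# Response-window boundaries, in ms from the start of fixation.
ORANGE_ON = FIXATION_MS + CUE_MS + DELAY_MS   # 1700 ms
GREEN_ON = ORANGE_ON + RESPONSE_MS            # 2200 ms
GREEN_OFF = GREEN_ON + RESPONSE_MS            # 2700 ms


def classify_release(release_ms):
    """Return "early", "late", or None for a bar release time (ms from fixation onset)."""
    if ORANGE_ON <= release_ms < GREEN_ON:
        return "early"   # released while the orange circle was shown
    if GREEN_ON <= release_ms < GREEN_OFF:
        return "late"    # released while the green circle was shown
    return None          # outside both windows; how such trials were handled is not stated here


def is_correct(release_ms, rewarded_response):
    """True when the release falls in the window assigned to this object-place combination."""
    return classify_release(release_ms) == rewarded_response
```

For example, `is_correct(2300, "late")` evaluates a release 100 ms into the green circle for a combination assigned to the late response.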
[Figure: Object-place task, recording sites, and cell population]
To characterize behavioral learning of individual object-place combinations, we used a Bayesian state-space model for analyzing learning of multiple problems simultaneously (Smith et al., 2007; Supplementary Methods 1 and Supplementary Figures 1 and 2). Of the 156 sessions, there were 55 sessions during which animals did not learn any association to criterion. For the remaining sessions, animals learned 1 of 4 (17 sessions), 2 of 4 (20 sessions), 3 of 4 (17 sessions) or 4 of 4 (17 sessions) associations to criterion. Learning criterion was defined as the trial number at which the estimated probability of correct performance was significantly above chance (Smith et al., 2007; Supplementary Methods 1). Monkey M required an average of 68.60 +/− 4.34 total (i.e., consecutive) interleaved trials to learn an individual association. This corresponds to an average of 14 +/− 0.75 trials for each individual association. Monkey E required an average of 82.4 +/− 5.08 total interleaved trials to learn an individual association, and an average of 17 +/− 4.02 trials for each individual association.
To administer several key control experiments, animals also performed a “fixation only” task given in blocks of trials at the end of the recording sessions. The fixation only task was identical to the object-place task except that animals were only required to maintain fixation on the central fixation cross through the object-place presentation and delay periods of the trial to receive a reward (i.e., no associative learning was required). Animals successfully completed an average of 73.57 +/− 1.52% of the fixation only trials (i.e., completed the trial without a break of fixation). In 30% of the fixation only trials, we examined the effect of omitting reward following successful fixation (omit reward trials; the auditory tone was played on these omit reward trials). On other fixation only trials, we tested the effect of giving random, unexpected reward at variable time points throughout the trial (random reward control trials).
Outcome selective neuronal activity
We recorded the activity of 165 hippocampal neurons from 2 monkeys (109 from monkey M and 56 from monkey E) during learning of novel object-place-response associations. Recordings were made throughout the full anterior-posterior extent of the hippocampus and, based on MRI reconstructions, appeared to include neurons from all hippocampal subdivisions. We did not attempt to select cells based on their firing properties and instead recorded from the first well-isolated hippocampal cells encountered. To examine how cells signaled information about trial outcome, we focused on neural activity during the 2000 ms following the bar release response. We chose this time period because it was the longest common post-response period available for analysis for both correct and error trials. For correct trials, this period included both the time during which reward was being given (from 30 ms to 933 ms following bar release if 4 drops of reward were given) and the initial part of the inter-trial interval. For error trials, this 2000 ms period included the entire inter-trial interval. Our initial analysis revealed a heterogeneous mix of sustained and more transient responses during this 2000 ms period. To characterize these responses with higher temporal resolution, we split the post-release period into two consecutive 1000 ms time periods. Compared to the baseline firing rate, defined as activity during the 500 ms fixation period at the start of each trial, 77% (127 of 165) of the hippocampal neurons responded with significant activity in one or both of the two consecutive 1000 ms periods analyzed (t-test, p<0.05). Moreover, 83 of the 127 responsive neurons (65% of the responsive cells or 50% of the total population) were outcome-selective in that they differentiated between correct and error trials (t-test comparing responses on correct vs. error trials, p<0.05).
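The two-step screen described above (responsive relative to baseline, then outcome-selective between correct and error trials) can be sketched as follows. This is a minimal illustration and not the analysis code used in the study. The function names, the input layout, and the pooling of correct and error trials for the baseline comparison are assumptions. A label-shuffling permutation test stands in for the t-test so that the example needs only the Python standard library.

```python
import random
from statistics import fmean

ALPHA = 0.05                        # significance threshold used in the text
WINDOWS = ("0-1000", "1000-2000")   # the two post-release periods, in ms


def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means (stand-in for a t-test)."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign trials to the two groups at random
        if abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def screen_neuron(baseline, correct, error):
    """Return (responsive, outcome_selective) for one neuron.

    baseline: firing rates (Hz) in the fixation period, one value per trial.
    correct, error: dicts mapping each window name to a list of rates (Hz).
    """
    # Assumption: responsiveness is tested on correct and error trials pooled.
    responsive = any(
        permutation_p(correct[w] + error[w], baseline) < ALPHA for w in WINDOWS
    )
    # Outcome selectivity is only assessed for responsive neurons.
    outcome_selective = responsive and any(
        permutation_p(correct[w], error[w]) < ALPHA for w in WINDOWS
    )
    return responsive, outcome_selective
```

A neuron that is flagged in the second step would then be labeled by the direction of the difference, for example as a correct up cell when its rate is higher on correct than on error trials.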
Further analysis revealed two distinct subpopulations of outcome-selective neurons. The first population (30 of 83 outcome-selective neurons), termed “correct up” cells, increased their activity on correct trials compared to error trials in the first (n=11), second (n=7) or both 1000 ms periods (n=12) following bar release (Supplementary Figures 4A and C). A sliding window calculation (see Experimental Procedures) revealed that the correct up cells started differentiating correct from error trials 342 +/− 75 ms after bar release (p<0.05, t-test). A second subpopulation, termed “error up” cells (38 of 83 outcome-selective neurons), increased their firing rate on error trials relative to correct trials during the first (n=3), second (n=15) or both 1000 ms periods (n=20) after bar release. Among error up cells, 14 exhibited a significant decrease in response following a correct trial, though the response following errors was significantly above baseline. Error up cells were also characterized by clear motor-related activity that peaked at the time of bar release (Supplementary Figures 4B and 4D; see Supplementary Information 2 and Supplementary Figure 5 for a further discussion of the motor-related activity of the error up cells). Error up cells differentiated between correct and error trials starting 489 +/− 63 ms after the animal’s bar release response (p<0.05, t-test). Smaller populations of cells decreased their activity on correct trials (“correct down” cells; n=5) or on error trials (“error down” cells; n=7) relative to baseline, or responded significantly to both correct and error trials relative to baseline activity (“mixed” outcome cells; n=3). Because of the relatively small number of cells in these latter three categories, they were not examined further. An analysis of the ratio of spike height vs. spike width showed that the majority of our isolated neurons were putative pyramidal cells and that the correct up and error up cell categories included both putative principal cells and putative fast spiking neurons (Supplementary Information 1 and Supplementary Figure 3).
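The sliding window latency estimate mentioned above is specified in the Experimental Procedures, which are not reproduced in this section. The sketch below shows only the general idea. The window width, the step size, the use of the window start as the reported latency, and the fixed threshold on a Welch t statistic (a rough stand-in for p<0.05) are all assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(a, b):
    """Welch t statistic for two samples (each needs at least two values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Degenerate case: no trial-to-trial variability in either group.
        return 0.0 if fmean(a) == fmean(b) else float("inf")
    return (fmean(a) - fmean(b)) / se


def count_spikes(spike_times, start, stop):
    """Number of spikes with start <= t < stop (times in ms from bar release)."""
    return sum(start <= t < stop for t in spike_times)


def divergence_latency(correct_trials, error_trials,
                       win=200, step=50, t_max=2000, t_crit=2.0):
    """Start (ms) of the first window in which correct and error counts differ.

    Each trial is a list of spike times in ms aligned to bar release.
    Returns None when no window exceeds the threshold.
    """
    for start in range(0, t_max - win + 1, step):
        c = [count_spikes(tr, start, start + win) for tr in correct_trials]
        e = [count_spikes(tr, start, start + win) for tr in error_trials]
        if abs(welch_t(c, e)) > t_crit:
            return start
    return None
```

Because a window is flagged as soon as it overlaps the change in firing, the reported value depends on whether the window start, center, or end is taken as the latency, so that convention should be stated alongside any latency obtained this way.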
[Figure: Population histograms for the correct up and error up populations]
Correct up cells
Given that the correct up signal is not expressed until well after the first drop of juice is delivered 30 ms after the bar release, one possibility is that these cells may simply provide a delayed signal of reward delivery (Ranck, 1973; Smith and Mizumori, 2006). To test the dependence of the correct up signal on aspects of reward delivery, we examined the effect of several different reward manipulations. To test whether correct up cells respond to delivery of reward per se, we examined responses on a subset of both standard and fixation only trials in which random rewards were given. Specifically, we compared neural activity during the 500 ms following the first reward drop in correct trials to the activity during the 500 ms following a random reward drop using a t-test (p<0.05). None of the correct up cells tested in these conditions (n = 8) showed a reliable response to random reward delivery.
While the correct up cells did not respond to random drops of juice, we next tested the possibility that they might be sensitive to the timing of reward delivery. To address this possibility, we examined neural activity on standard trials in which the delivery of reward was delayed from 30 ms to 518 ms after the bar release response but the auditory feedback signal continued to be given immediately (i.e., 15 ms +/− 1 ms) following a correct response, as in standard conditions (see Experimental Procedures). Neural activity was recorded following several days of habituation to the delayed reward delivery. We found that correct up cells continued to respond with the same latency to the auditory feedback sound even when the reward itself was delayed by 488 ms. Next, we eliminated the feedback sound for these delayed reward trials and saw a significant increase in the latency of the differential correct/error signal relative to the standard condition. These findings suggest that the response latency of the correct up cell signal is not controlled by the timing of reward delivery alone. Instead, correct up cells appear to be sensitive to information about trial outcome, whether it is signaled by an auditory feedback signal or by the delivery of reward.
To determine if correct up cells differentiate between correct and error trials for other tasks, we examined the responses of these cells on the fixation only task. Unlike the differentiation observed during the object-place trials, the correct up cells tested in fixation only trials (n = 12) did not discriminate between correctly executed fixation only trials and unrewarded break fixation trials (two sample t-test, p>0.05). This suggests that the correct up cells signal correct/rewarded outcome more specifically during a learning context and do not convey general information about successful trial completion.
Because our previous studies showed that a subset of hippocampal neurons could change their firing rate in correlation with new associative learning (Wirth et al., 2003; Yanike et al., 2008), we next asked if the magnitude of the correct up signal might change over the course of the learning session. To address this question, we used a one-way ANOVA with time period (i.e., early, middle and late periods of the session) as a main factor to examine the amplitude of the correct up signal over time. We analyzed the first 1000 ms following bar release and the second 1000 ms following bar release separately and found no difference in the amplitude of the correct up responses during either time bin over the course of the learning session. For the first 1000 ms, the average rate for each consecutive third of the session was: 14.41 +/− 3.26; 14.18 +/− 3.06; 14.90 +/− 3.06 (F[2,87]=0.01, p=0.98). For the second 1000 ms time bin following bar release, the average rate for each consecutive third of the session was: 14.70 +/− 2.87; 15.09 +/− 2.79; 15.85 +/− 2.85 (F[2,87]=0.04, p=0.95). Thus, the response of the correct up cells during the reward and ITI periods of correct trials remained stable over the course of learning.
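The one-way ANOVA above compares mean response amplitude across the three session periods. A minimal sketch of the F statistic is given below, assuming one value per cell per period. That layout is an inference from the reported degrees of freedom (30 correct up cells in 3 periods gives 90 observations and F[2,87]) and is not stated in the text. The function name is hypothetical, and the p-value, which requires the F distribution, is left out.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Note: raises ZeroDivisionError if there is no within-group variability.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Passing three lists of 30 rates each (early, middle and late thirds of the session) returns degrees of freedom of 2 and 87, matching the values reported for the correct up cells.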
Error up cells
Error up cells (38/83 outcome-selective cells) increased their activity on error trials relative to correct trials in the 2000 ms following bar release for the object-place association task. To test the hypothesis that the error up cells signal the absence of a possible reward, we examined neural activity during correctly executed fixation only trials where reward was occasionally omitted (omit reward trials; n = 13). We hypothesized that if the error up cells provided a general signal of the absence of a possible reward, we should see a similar increase in activity following the omit reward trials as we saw on the error trials. Consistent with this prediction, we found that the error up cells increased their activity on omit reward trials relative to rewarded fixation only trials (n = 13, t-test, p<0.05).
If the error up cells signal the absence of a possible reward, then we predicted that they might also be sensitive to manipulations of the timing of reward delivery. To address this question, we examined the response of error up cells on trials in which we delayed the delivery of reward from 30 ms to 518 ms following bar release (no auditory feedback signal given; n = 19 error up cells; no trials were available with the auditory feedback together with delayed reward). Following habituation to the delayed reward signal, we recorded the activity of error up cells and found that delaying the reward resulted in a significant increase in the latency of the differential correct/error signal relative to the standard object-place trials with no delay in reward delivery (mean latency with no delay = 489 +/− 63 ms, n = 19; mean latency with delay = 800 +/− 66 ms, n = 19; two sample t-test, p<0.05). Thus, error up cells are sensitive to the latency of reward delivery and may use this information as a cue to signal the absence of a possible reward.
To determine if error up cells differentiate between correct and error trials on other tasks, we compared activity during the object-place associative task to activity during the fixation only task. Unlike the correct up cells, the error up cells exhibited a similar response in both tasks, increasing their response following erroneous break fixation trials relative to correctly executed responses (t-test, p<0.05). The cells discriminated between break fixation trials and correctly completed trials 426 +/− 86 ms after the end of the trial when reward was not delayed (n = 9) and 783 +/− 81 ms after the end of the trial when reward was delayed (n = 13, data not shown). These findings support the idea that error up cells provide a general signal of the absence of a possible reward in multiple task situations.
As with the correct up cells, we also asked if the magnitude of the neural responses of the error up cells changed over the course of the session. To address this question, we used a one-way ANOVA with time period as a main factor to compare the amplitude of the error up signal averaged over two or three time periods of the session (i.e., early, middle and late for 29 sessions, and early and middle for 7 sessions in which there were no more error trials during the last third of the session) for the population data. We analyzed the first 1000 ms following bar release and the second 1000 ms following bar release separately and found no difference in the amplitude of the error up responses during either time bin over the course of the learning session. The average firing rate for each consecutive third of the session for the first 1000 ms following bar release was: 10.87 +/− 1.81; 11.7 +/− 2.32; 9.69 +/− 1.95 (F[2,98]=0.32, p=0.72). During the second 1000 ms time bin following bar release the values were: 11.88 +/− 2.15; 11.93 +/− 2.29; 11.11 +/− 2.23 (F[2,98]=0.15, p=0.85). Thus, the response of error up cells during the ITI period of error trials remained stable over the course of the learning session.
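The degrees of freedom reported above are consistent with each session contributing one mean rate per available period (three periods for 29 sessions, two periods for 7 sessions). This reading is an inference from the reported numbers rather than a statement from the methods; with N the total number of observations and k the number of periods, the bookkeeping would be:

```latex
N = 29 \times 3 + 7 \times 2 = 101, \qquad
df_{\text{between}} = k - 1 = 3 - 1 = 2, \qquad
df_{\text{within}} = N - k = 101 - 3 = 98
```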
The role of outcome-selective cells in learning
While correct up and error up cells both convey information about trial outcome, another important question concerns how these populations of cells might use this information to influence new learning of object-place associations. Numerous previous studies have shown that neurons in both the medial temporal lobe (changing cells: Wirth et al., 2003) and cortex (Baker et al., 2002; Kobatake et al., 1998; Sigala et al., 2002) change their stimulus-selective responses in parallel with learning. Given these previous data, we tested the hypothesis that the correct up or error up cell populations might also convey information about learning through shifts in their stimulus-selective response properties. To address this question, for sessions during which significant behavioral learning was seen (21 correct up cells and 24 error up cells), we calculated the neural selectivity of the population of correct up and error up cells during the cue and delay periods of the task both before and after behavioral learning was achieved. We also examined the selectivity of a control population of 82 non-outcome-selective cells (including 38 non-responsive cells and 44 responsive but not outcome-selective cells) in the same manner. To determine whether the shifts in selectivity were specific to learning, we calculated selectivity on sessions during which no learning occurred (9 correct up cells, 14 error up cells and 32 non-outcome-selective cells) for the first 60 trials and for the remaining trials (60 corresponds to the average number of trials to learn). We used a two-way ANOVA applied to the selectivity measures during the cue and delay periods of the task before and after learning (two levels of repeated measures) using cell category and learning as the two main factors. The ANOVA revealed a significant interaction between cell category and learning status for the selectivity measures before and after learning (F[2,144]=4.5, p=0.0012). Post hoc comparisons showed a significant increase in selectivity after learning relative to before learning in correct up cells, and only in sessions where significant learning was found (Newman-Keuls; p<0.001). In contrast, no change in selectivity was seen in either the error up cells or the control non-outcome-selective cells.
Differences in excitability between learning and no learning sessions could not explain the striking increase in selectivity seen in the correct up cells (Supplementary Information 3). We also asked if there were learning-related changing cells in this hippocampal population (Wirth et al., 2003; Yanike et al., 2008). We identified a subset of hippocampal changing cells during the object-place associative learning task, but showed that the changing cells (which also exhibit increased selectivity with learning) were not driving the increase in selectivity exhibited by the correct up cells (Supplementary Methods 2 and Supplementary Information 4). To better illustrate the distribution of selectivity in these different populations of cells, we calculated the differential selectivity between trials before and after learning. The population of correct up cells also exhibited a wider distribution of selectivity differences, with more cells showing increases, relative to the other populations of cells (F-test, p<0.01). Thus, these findings suggest that the correct up cells, but not the error up cells or the control cells, convey information about learning by shifting their stimulus-selective response properties during the cue and delay periods of the task in parallel with behavioral learning.
Selectivity following correct or error trials