The animals completed an average of 6.3 ± 0.2 (mean ± SEM) learning blocks within each recording session. Behavioral responses for two separate learning blocks are depicted in , and the overall population response in . The monkeys’ behavior indicated mastery of the object-direction pairing for familiar objects. Performance on novel objects improved over the course of a learning block as the monkeys learned the correct associations. Because each object was mapped to a unique direction, presentation of a familiar object before a novel object at the start of a new block reduced the number of possible responses to the novel object. The animals’ behavior indicated awareness of this feature of the task: the population learning curves began slightly above 25% ().
Across the two animals, a total of 460 learning blocks were completed, encompassing 920 novel cue-movement associations. The animals achieved criterion in 72% of blocks, on trial number 9.3 ± 0.8 (mean ± SEM, counting preceding incorrect and correct trials). Across all trials, reaction time (time between go cue and initiation of movement) was 276 ± 2 ms (mean ± SEM) for familiar objects and 352 ± 2 ms for novel objects. Movement time (time between initiation of movement and acquisition of target) was 111 ± 0.01 ms for familiar objects and 107 ± 0.02 ms for novel objects.
We recorded 73 individual GPi neurons as the animals performed the associative learning task. Recording sites are shown in . We bisected each axis to define anterior vs. posterior, medial vs. lateral, and dorsal vs. ventral subdivisions. Forty-three neurons (59%) were recorded in the skeletomotor region of the GPi, as defined by location in the posterior-lateral-ventral region of the GPi.
Figure 2 Location of GPi recording sites. Recording site location was determined by confirmation between stereotactic coordinates and physiological characteristics of deep nuclei and white matter boundaries. Coronal sections anterior to the inter-aural plane (noted (more ...)
To determine task responsiveness of the neuronal population, we calculated average firing rates of each neuron within the six relevant task epochs (). Firing rates differed significantly across task epochs (p<10⁻³, Kruskal-Wallis test). Individual comparisons of the epochs showed that firing rates first differed significantly from fixation at the go cue (p<0.01, Tukey-Kramer post-hoc test). Because the composition of GPi neurons is known to be heterogeneous, we divided the population based on whether an individual cell tended to increase or decrease its firing during the trial. The distribution of cells that significantly changed their firing rate during the presentation, go cue, and movement are indicated by the colored bars in . Whereas a few individual cells increased (N=5) or decreased (N=2) firing at the presentation (), a much larger number did so (N=30 and 9, respectively) at the go cue (), and these changes persisted into the movement ().
Figure 3 Population task responsiveness. (a) Boxplot of population firing rates across the six task epochs: fixation (Fix), presentation (Pres), go cue (Go), movement (Move), feedback sound (Sound), and reward (Rew). The central line in each box represents the (more ...)
Because the first significant change in neuronal firing at the population level occurred at the go cue, we examined firing patterns of individual neurons during this epoch over the course of learning. For example, depicts peri-go cue rasters of a neuron’s firing during trials in which one of the novel objects was being learned (), and during trials in the same block in which one of the familiar objects was being presented (). This neuron consistently decreased firing at the go cue on every trial, but this pattern did not modulate with learning over successive trials as the animal learned the correct association (, left panel). Its pattern over the course of the block was relatively stationary, and similar for both novel and familiar objects. depicts a neuron that increased its firing at the go cue during novel and familiar object trials, also in a pattern unrelated to learning.
Figure 4 Example firing pattern of two learning-unrelated neurons. Rasters and peri-stimulus time histograms (PSTHs) for sequential trials (bottom to top) aligned to the go cue (t=0) are shown for two example neurons, over the course of a single learning block. (more ...)
On the other hand, depicts two example neurons whose firing changed over the course of learning a novel object. In the first example (), the peri-go cue firing rate was relatively higher at the beginning and end of the learning block, but was lower in the middle for several trials. In the second example (), the decrease in firing occurred earlier in the learning block. Examination of the animal’s behavioral performance in these blocks revealed a similar difference. In the first case, criterion performance was achieved after 19 trials (counting previous correct and incorrect trials). In the second, criterion was achieved earlier, after 11 trials. Firing in these cells correlated significantly with behavioral performance, as measured by the probability of a correct response (p<0.05, Pearson’s linear correlation). This modulation in firing over the course of a learning block did not take place during concurrent familiar object trials (, respectively), indicating an effect specific to novel object learning.
Figure 5 Example firing patterns of two learning-related neurons. Rasters and PSTHs aligned to the go cue are shown for two neurons. (a, b) Example firing pattern of a learning-related neuron during presentation of a novel object (a) and a concurrently presented (more ...)
In other neurons whose firing transiently decreased for several trials over the course of learning, we observed a similar relationship between the timing of the change and the rate at which learning criterion was reached. In order to better study these dynamic learning-related changes, we aligned trials to the trial at which criterion learning performance was attained. Alignment to the criterion trial allowed us to compare activity across similar stages of learning, compensating for different learning rates across blocks. Only correct trials were included in this analysis. We sought to identify the subset of neurons most involved in learning by choosing those whose activity correlated with the learning curve (Williams and Eskandar, 2006). Twenty-one of the 73 neurons (29%) showed a significant positive correlation (Pearson’s test, p<0.05) with the learning curve, including the described transient decrease in firing lasting several trials, and were included in subsequent analyses. The two neurons depicted in are representative examples of these learning-related neurons.
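The criterion alignment and correlation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names (`align_to_criterion`, `average_by_offset`, `pearson_r`), the data layout, and the example firing rates are all hypothetical, and the significance test on the correlation is omitted.

```python
import math

def align_to_criterion(rates, criterion_trial):
    """Re-index one block's per-trial firing rates relative to the criterion trial.

    rates: per-trial firing rates in trial order (hypothetical layout).
    criterion_trial: 0-based index of the trial at which criterion was reached.
    Returns {offset: rate}, where offset 0 is the criterion trial.
    """
    return {i - criterion_trial: r for i, r in enumerate(rates)}

def average_by_offset(blocks):
    """Average firing rate at each criterion-aligned offset across blocks.

    blocks: list of (rates, criterion_trial) pairs.
    """
    pooled = {}
    for rates, criterion_trial in blocks:
        for offset, rate in align_to_criterion(rates, criterion_trial).items():
            pooled.setdefault(offset, []).append(rate)
    return {off: sum(v) / len(v) for off, v in sorted(pooled.items())}

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Two made-up blocks with the same two-trial dip just before criterion,
# but reached at different ordinal trials (fast vs. slow learning).
fast_block = ([10, 4, 4, 10, 10], 3)
slow_block = ([10, 10, 10, 4, 4, 10, 10], 5)
aligned = average_by_offset([fast_block, slow_block])
```

In this toy case the dip falls on ordinal trials 1–2 in one block and 3–4 in the other, so it would be smeared by ordinal averaging, whereas after alignment it lands on offsets -2 and -1 in both blocks. The same `pearson_r` helper could be applied to a neuron's aligned firing rates and the corresponding probability of a correct response.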
To determine whether location was a significant factor in identifying learning-related neurons, we compared the fraction of learning neurons identified in the posterior vs. anterior region, medial vs. lateral region, and dorsal vs. ventral region (). None of these comparisons were significant (p>0.05, Fisher’s Exact test). Four of the 73 neurons showed a significant negative correlation (Pearson’s test, p<0.05) with the learning curve.
The population peri-go cue activity in this subset of learning-related neurons is shown in . There was no modulation in firing rates across trials when the animals were presented with familiar objects. During novel association learning, however, we observed a consistent transient decrease in peri-go cue firing rate starting 8 trials before criterion was reached. This suppression was significant compared to post-criterion points (p<0.05, two-tailed t-test) and lasted for 4 trials. By the time learning criterion was reached, activity had returned to the higher baseline firing rate, and was indistinguishable from the activity observed in familiar trials. This difference between novel and familiar object trials suggests that the effect seen during novel object trials was specifically related to the learning required during those trials, and absent during contemporaneous familiar object trials, in which there was no active learning. This effect was not present when rates were aligned to the fixation or presentation.
Figure 6 GPi firing encodes a facilitation window. (a) Normalized firing rate for novel (red) object trials as a function of criterion-aligned trial number for the subset of learning-related neurons. Familiar object trials (blue) are shown as a function of ordinal (more ...)
The rate of learning did not differ between sessions during which learning-related and learning-unrelated neurons were recorded. The number of trials to criterion was 9.2 ± 0.3 in the former, and 9.4 ± 0.3 in the latter (p=0.71, two-tailed t-test), and the shape of the learning curves was also identical (). The population activity of all 52 learning-unrelated neurons did not exhibit a similar decrease in GPi firing in the peri-go cue period ().
To ensure that this change in neuronal firing was not simply a reflection of variations in movement parameters over the course of learning, we plotted reaction times and movement times as a function of criterion-aligned trial number (). There was no significant difference between pre-criterion and post-criterion reaction times (p=0.15, t-test), nor between pre- and post-criterion movement times (p=0.63). We also confirmed that the direction of the impending movement did not influence the exploration-related changes by first calculating peri-go cue directional preferences for each cell. We then examined the changes in neuronal firing over learning after separating each trial based on whether the impending movement was in the cell’s preferred or anti-preferred direction. At no point was there a significant difference between these curves (p>0.05), suggesting that the direction of movement did not influence the exploration-related changes.
To determine whether this decrease in GPi firing was specifically related to a particular phase of learning, we aligned the trials to the first presentation of the novel object, rather than to the criterion trial. If this effect were simply a function of stimulus novelty, ordinal trial alignment would make it even more prominent. Aligning to the first rather than criterion trial, however, made the effect disappear (). The presence of the decrease in firing depended upon correction for the pace of learning, suggesting that its timing was related to a particular early process in learning (occurring 5–8 trials prior to criterion), rather than other non-specific aspects of the task occurring soon after a block change.
Figure 7 The facilitation window is not an effect of stimulus novelty or reward schedule. (a) The same firing rates of learning-related neurons for novel (red) and familiar (blue) object trials as depicted in , aligned to ordinal trial number, starting (more ...)
Early in the learning block, the fraction of correct trials was relatively low. It is therefore also possible that the observed decrease in firing rate was simply a reflection of the sparser reward frequency early in the block. To rule out that possibility, we ran a control task while recording 17 of the 73 neurons. The control task was similar in design to the main learning task, except that the response in each trial was indicated by a change in the color of the target, such that no learning was required (see Methods). If the effect were simply a function of the changing amount of reward encountered over the course of a block, it would be identical between the learning block and adjacent control block. Removing the requirement to dynamically learn associations, however, eliminated the transient decrease in firing rate (). This period of decreased GPi activity is therefore also not simply a general appetitive effect of reward schedule or reinforcement.
Relationship Between Neuronal and Behavioral Data
To understand the functional significance of the observed brief decrease in GPi activity, we sought to identify an explicit relationship between peri-go cue GPi firing and behavioral choice. We constructed receiver operating characteristics (ROCs) to evaluate the statistical relationships between changes in GPi firing and the animal’s choice of action. The ROC analysis approximates the likelihood that an ideal observer would be able to predict the behavioral outcome in an individual trial from the neuronal activity (Britten et al., 1996). We tested two alternative hypotheses: 1) The firing rate in an individual trial predicts a choice different from the previous choice (“exploration” model); 2) The firing rate in an individual trial predicts an impending correct choice (“exploitation” model). The “exploration” model describes a behavioral paradigm in which the animal is actively exploring the parameter space, intentionally choosing a response different from the last, despite the fact that the previous answer may have been correct. The “exploitation” model describes a strategy optimized to consistently choose the response most recently identified as correct, thereby maximizing the chance of obtaining reward.
Discrimination values (area under the ROC curve) for both models were calculated in sliding increments relative to the go cue for the twenty-one learning-related neurons. Example discrimination values for two neurons are shown in for both exploratory (blue) and exploitive (red) hypotheses. Significance was estimated using a bootstrap analysis (thick lines). Discrimination values tended to decrease significantly below chance (0.5) during exploration trials, and increase above chance during exploitation trials.
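A discrimination value of this kind can be illustrated with a short sketch. The code below is an assumption-laden example rather than the procedure used in the study: `roc_auc` computes the area under the ROC curve as the probability that a trial from one class has a higher firing rate than a trial from the other, and `shuffle_p_value` uses label shuffling as a simple stand-in for the bootstrap analysis, whose details are not given here. Function names and firing rates are hypothetical.

```python
import random

def roc_auc(rates_positive, rates_negative):
    """Area under the ROC curve for two sets of single-trial firing rates.

    Equals the probability that a randomly drawn positive-class trial has a
    higher rate than a randomly drawn negative-class trial (ties count 0.5).
    0.5 is chance; values below 0.5 mean lower rates predict the positive class.
    """
    wins = 0.0
    for p in rates_positive:
        for n in rates_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(rates_positive) * len(rates_negative))

def shuffle_p_value(rates_positive, rates_negative, n_shuffles=2000, seed=0):
    """Two-sided p-value for the AUC's departure from chance.

    Assumption: label shuffling is used here in place of the bootstrap
    analysis mentioned in the text.
    """
    rng = random.Random(seed)
    observed = abs(roc_auc(rates_positive, rates_negative) - 0.5)
    pooled = list(rates_positive) + list(rates_negative)
    k = len(rates_positive)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if abs(roc_auc(pooled[:k], pooled[k:]) - 0.5) >= observed:
            extreme += 1
    return (extreme + 1) / (n_shuffles + 1)

# Made-up peri-go cue rates (spikes/s): trials where the choice differed from
# the previous one ("exploration") versus all other trials.
explore_rates = [3, 4, 5, 4]
other_rates = [8, 9, 7, 10]
auc = roc_auc(explore_rates, other_rates)
p = shuffle_p_value(explore_rates, other_rates)
```

Here every exploration trial has a lower rate than every other trial, so `auc` is 0.0, the extreme case of a discrimination value below chance. Repeating the calculation in successive time windows around the go cue would give a sliding-window curve.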
Figure 8 GPi firing predicts a behavioral shift from exploration to exploitation. ROC discrimination values were calculated for “exploration” (blue) and “exploitation” (red) hypotheses in a sliding window 400 ms wide stepped in (more ...)
Population ROC discrimination values for the learning-related neurons are shown in . Prior to the go cue, discrimination values for both models remained near chance and overlapped in distribution. Nearly contemporaneous with the go cue (50 ms prior), however, the models began to diverge significantly (t-test, p<0.05) from each other and chance. The lower values for the “exploration” model indicate that lower firing rates predicted exploratory behavior, and the higher values for the “exploitation” model indicate that higher firing rates predicted exploitive behavior.
To relate the ROC analysis results back to the learning process, ROC discrimination values were calculated as a function of trial number. This analysis was performed in a time window starting at the point at which the two ROC curves diverged significantly (50 ms before go cue) and ending at the mean reaction time for novel objects (350 ms after go cue), thereby including only peri-go cue firing. The averaged firing rate within this window was paired with the animal’s choice on that trial and submitted to the same two-hypothesis population ROC to generate discrimination values as a function of learning. Significance of the ROC discrimination values was again determined by a bootstrap analysis. Thus at every point in learning we arrived at a discrimination value for both exploration and exploitation hypotheses. By comparing the average firing rates for the learning-related neurons (from ) to the discrimination values, we could thereby determine which of the two behavioral strategies is favored at various stages of learning. Because both the normalized firing rates and ROC values were constrained between 0 and 1, a low firing rate (as occurred 5–8 trials prior to criterion) would predict an exploratory behavior on that trial if the peri-go cue discrimination value for exploration were significantly low whereas that for exploitation were significantly high. This relationship was quantified by taking the absolute magnitude of the difference between significant discrimination values and normalized firing rates at each trial, and assigning that trial’s behavioral preference to the behavior with the smaller difference.
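The assignment rule at the end of this paragraph can be written out as a small sketch. It is illustrative only: the function name `preferred_behavior` and the example values are hypothetical, and two details are assumptions not stated in the text, namely that a non-significant discrimination value is excluded from the comparison and that an exact tie goes to exploration.

```python
def preferred_behavior(norm_rate, auc_explore, auc_exploit,
                       sig_explore, sig_exploit):
    """Assign a trial's behavioral preference from firing rate and ROC values.

    norm_rate: normalized firing rate on this trial, between 0 and 1.
    auc_explore, auc_exploit: ROC discrimination values for the two hypotheses.
    sig_explore, sig_exploit: whether each discrimination value is significant.
    Returns "exploration", "exploitation", or None if neither is significant.
    """
    differences = {}
    if sig_explore:
        differences["exploration"] = abs(auc_explore - norm_rate)
    if sig_exploit:
        differences["exploitation"] = abs(auc_exploit - norm_rate)
    if not differences:
        return None  # neither value differs from chance
    # The behavior whose discrimination value lies closest to the firing rate
    # wins; on an exact tie, min() keeps the first key (an arbitrary choice).
    return min(differences, key=differences.get)

# Made-up values: a low rate with a low exploration AUC and a high exploitation AUC.
low_rate_trial = preferred_behavior(0.2, 0.3, 0.7, True, True)
high_rate_trial = preferred_behavior(0.8, 0.3, 0.7, True, True)
no_signal_trial = preferred_behavior(0.5, 0.5, 0.5, False, False)
```

With these values the low-rate trial is assigned to exploration (difference 0.1 versus 0.5) and the high-rate trial to exploitation, matching the logic that a low firing rate predicts exploratory behavior when the exploration discrimination value is low and the exploitation value is high.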
These results are displayed in . Ten trials prior to criterion, neither ROC discrimination value was significantly different from chance (white marker). On the next trial, the firing rate predicted exploitive choices (red marker), possibly due to carryover effects from the previous learning block. For the next seven trials, during the prominent decrease in GPi firing, the firing rate predicted exploratory choices (blue markers). In the vicinity of the criterion trial, the predictions alternated briefly, before settling on predictions of exploitive behavior for all but one of the post-criterion trials.