The ability to learn new actions and perfect them with practice allows us to master amazing skills like playing the piano or riding a bicycle. Learning these skills usually implies moving faster, more accurately, and less variably
5. However, mastering other types of skills, like playing board games or controlling neuroprosthetic devices, often does not directly involve changes in physical movement
1,6. Cortico-basal ganglia circuits have been implicated in the learning, selection and execution of physical skills
2–4,7,8. In particular, plasticity in the motor cortices and the striatum, the major input region of the basal ganglia, has been shown to accompany the learning of physical skills
2,9. The motor cortex and frontal cortices have also been implicated in the learning of abstract skills
10–13, and in learning to control neuroprosthetic devices irrespective of physical movement
14–17. Some studies suggest that not only cortical areas, but also the striatum, are involved in learning abstract skills
18–20. However, it is still unclear if the striatum is required for abstract skill learning, and if corticostriatal circuits undergo plasticity during the learning of such skills as they do during the learning of physical skills. Here, we use a novel behavioral paradigm in conjunction with electrophysiology and genetic manipulations in rodents to investigate the role of corticostriatal circuits and corticostriatal plasticity in the learning of intentional neuroprosthetic actions, i.e. actions performed with disembodied actuators based on the modulation of specific neural activity and irrespective of physical movement
6.
We developed a novel operant brain-machine interface (BMI) task in which rodents were required to modulate activity in primary motor cortex (M1), rather than perform a physical movement, to obtain reward (). Modulation of M1 ensemble activity resulted in changes in the pitch of an auditory cursor, which provided constant auditory feedback to rodents about task performance. Reward was delivered when rodents precisely modulated M1 activity to move this auditory cursor to one of two target tones, and a trial was marked incorrect if no target had been hit within a set time limit (30 s). One of these targets was associated with a reward of sucrose solution, while the other target was associated with a pellet reward (see Methods). Two neural ensembles consisting of 2–4 well-isolated units each were used to control the auditory cursor (
Supplementary Figs. 1–2). The action of these two ensembles opposed each other, such that increased activity in one ensemble produced increases in cursor pitch, while increased activity in the other ensemble caused decreases in cursor pitch. Thus, in order to achieve a high-pitched target, rodents had to increase activity in the first ensemble and decrease activity in the second; the opposite was required to hit a low-pitched target (;
Supplementary Fig. 3). These firing rate modulations had to be maintained for several time bins (200 ms bin size) for a target to be hit (
Supplementary Methods). Hence, in this operant task, rodents had to bring the two M1 ensembles into a desired state irrespective of motor output.
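The opposing-ensemble rule described above can be illustrated with a minimal sketch. It is not the decoder used in the study (that is specified in the Supplementary Methods); the thresholds, the number of bins to hold, and the function names `cursor_states` and `first_target_hit` are hypothetical choices made only to show the logic of "ensemble 1 minus ensemble 2, held for several 200 ms bins".

```python
def cursor_states(e1_counts, e2_counts):
    """Cursor state per time bin: summed spikes of ensemble 1 minus ensemble 2.

    Each argument is a list of bins; each bin is a list of per-unit spike counts.
    Positive values push the cursor toward high pitch, negative toward low pitch.
    """
    return [sum(a) - sum(b) for a, b in zip(e1_counts, e2_counts)]


def first_target_hit(states, high_thresh, low_thresh, hold_bins):
    """Return (target, bin_index) for the first target held for hold_bins bins.

    Thresholds and hold_bins are illustrative assumptions, not values from the paper.
    Returns None if no target is reached (an incorrect trial).
    """
    run_high = run_low = 0
    for i, s in enumerate(states):
        run_high = run_high + 1 if s >= high_thresh else 0
        run_low = run_low + 1 if s <= low_thresh else 0
        if run_high >= hold_bins:
            return ("high", i)
        if run_low >= hold_bins:
            return ("low", i)
    return None


# Hypothetical spike counts for two 2-unit ensembles over four 200 ms bins.
e1 = [[1, 1], [3, 2], [4, 3], [4, 4]]
e2 = [[1, 1], [0, 1], [0, 0], [1, 0]]
states = cursor_states(e1, e2)
hit = first_target_hit(states, high_thresh=4, low_thresh=-4, hold_bins=3)
```

In this toy trial ensemble 1 increases while ensemble 2 stays low, so the high-pitched target is reached once the state has stayed above threshold for three consecutive bins. Swapping the two ensembles reaches the low-pitched target instead.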
We trained six male Long-Evans rats on the task, and verified that they exhibited marked improvement in the percentage of correct trials over the course of 11 days (). As typically observed in motor skill learning
21, there was a phase of rapid improvement followed by a phase of slower learning, representing early (days 2–4) and late (days 8–11) phases of learning. The percentage of correct trials increased significantly from early to late in learning (;
P < 0.001), resulting in performance well above chance (;
P < 0.001; see
Supplementary Methods), while mean time-to-target decreased (
Supplementary Fig. 4). Analyses of M1 firing rates further showed that rats were producing the desired neuronal ensemble rate modulations during task performance (,
Supplementary Fig. 3). Furthermore, sensory feedback was critical for animals to learn this task: when rats were not given auditory feedback during training (although they still received a reward if they modulated neural activity correctly), the percentage of correct trials did not increase over the course of 9 days of training (
Supplementary Fig. 5).
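As a small worked illustration of the behavioral measure, the sketch below computes the percentage of correct trials per day and averages it over the early (days 2–4) and late (days 8–11) phases defined above. The trial outcomes are hypothetical toy values, not data from the study, and the helper names are assumptions.

```python
from statistics import mean

EARLY_DAYS = (2, 3, 4)        # early learning, as defined in the text
LATE_DAYS = (8, 9, 10, 11)    # late learning, as defined in the text


def percent_correct(outcomes):
    """outcomes: list of booleans, True if a target was hit within the time limit."""
    return 100.0 * sum(outcomes) / len(outcomes)


def phase_mean(daily_pct, days):
    """Average the daily percent-correct values over the given training days."""
    return mean(daily_pct[d] for d in days)


# Hypothetical toy sessions: 24 trials per day, with 2 * day correct trials.
daily_pct = {
    day: percent_correct([True] * (2 * day) + [False] * (24 - 2 * day))
    for day in range(1, 12)
}
early = phase_mean(daily_pct, EARLY_DAYS)
late = phase_mean(daily_pct, LATE_DAYS)
```

The comparison of `early` and `late` mirrors the early-versus-late contrast reported above; the statistical test itself is described in the Supplementary Methods and is not reproduced here.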
We next investigated if animals were performing physical movements that would modulate the activity of those particular M1 ensembles. First, we monitored overall rodent movement with an accelerometer mounted on the recording headstage, which allowed us to measure if the animals produced any body or head movement during target achievement
22. Accelerometer traces exhibited no changes before and during target reaching, but did show prominent deflections after target reaching as the animals retrieved the reward (, see also ), demonstrating that rodents were not relying on gross motor behavior to perform the task. We also monitored movements of the vibrissae with electromyographic (EMG) recordings of the mystacial pad (electrodes targeted M1 areas controlling vibrissae movement,
Supplementary Fig. 2, Supplementary Methods), and observed no significant EMG signals before target achievement, although there were clear EMG signals afterwards as animals retrieved and consumed the reward (,
Supplementary Fig. 6b). Importantly, there was no correlation between EMG activity and the spiking of the M1 neurons controlling the auditory cursor: the correlation coefficient for all trials in a behavioral session was 0.092 ± 0.003 (mean ± s.e.m.), and the distribution of correlation coefficients across a session was not significantly different from zero (
Supplementary Fig. 6a;
P = 0.57). This was observed across all training days, including during early learning (
Supplementary Fig. 7). These data suggest that rats do not rely on physical movements to learn the task, although it is difficult to exclude the possibility that animals use some movement to generate neural activity to drive the auditory cursor during exploratory phases of the task in early learning. Nonetheless, the data show that animals eventually learn to perform the task in the absence of overt movement. To further demonstrate that rats did not require vibrissae movements to control M1 activity, we injected lidocaine into the whisker pad to locally inactivate sensory and motor nerve endings during a session in late learning (see
Supplementary Methods). There was no significant change in performance during the temporary inactivation (;
P > 0.9), with rats achieving 78.1 ± 2.2% correct with lidocaine (mean ± s.e.m.) vs. 78.8 ± 6.5% without lidocaine. Taken together, these data indicate that rodents were able to learn to operantly control M1 activity irrespective of any overt movement.
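The EMG analysis above rests on a per-trial correlation between EMG activity and M1 spiking, summarized across a session as mean ± s.e.m. A minimal sketch of that kind of summary follows. It assumes both signals have already been reduced to one value per time bin; the binning, the helper names, and the toy numbers are hypothetical, and the authors' exact procedure is the one in the Supplementary Methods.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def summarize(r_values):
    """Return (mean, s.e.m., one-sample t statistic against zero)."""
    m = mean(r_values)
    sem = stdev(r_values) / math.sqrt(len(r_values))
    return m, sem, m / sem


# Hypothetical per-bin values for one trial: rectified EMG and M1 spike counts.
emg = [0.2, 0.1, 0.3, 0.2, 0.1, 0.3]
spikes = [3, 5, 4, 6, 4, 2]
r_trial = pearson_r(emg, spikes)

# Hypothetical per-trial coefficients for a session.
session_mean, session_sem, t_stat = summarize([0.1, -0.1, 0.1, -0.1])
```

A session whose per-trial coefficients scatter symmetrically around zero, as in the toy list, gives a t statistic near zero, which is the pattern expected if cursor-controlling spikes are unrelated to whisker-pad muscle activity.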
Goal-directed actions are sensitive to changes in the relation between performing the action and obtaining a reward (contingency) and to changes in the expected value of the reward, while habits are not
23,24. We asked if these neuroprosthetic actions were performed intentionally because the animal volitionally controlled M1 activity to get the outcome (goal-directed), or habitually due to the reinforcement history. To test this, we first degraded the contingency between executing the action and obtaining the outcome; that is, the auditory cursor was still under control of M1 ensemble activity, but the probability of obtaining reward was similar irrespective of target achievement, so that reaching a target had no effect on the rate of reward. Following two days of contingency degradation, rats markedly diminished their responding and the percentage of correct trials decreased significantly (;
P < 0.001). When contingency was reinstated, rats resumed responding and the percentage of correct trials returned to plateau levels seen in late learning ().
To further investigate the intentional nature of these neuroprosthetic skills, we performed a test where each of the outcomes was devalued using sensory-specific satiety. Rats were given free access to either sucrose solution or pellets for one hour before the behavioral session, thereby reducing the expected value of that outcome
25. After specific devaluation of each outcome/reward, rats chose the target leading to that reward much less than the target leading to the reward that was not devalued (;
P < 0.001), indicating that their actions were sensitive to changes in outcome value. Importantly, there were no significant differences in reward preference during normal task performance when neither of the outcomes was devalued (
Supplementary Fig. 8;
P > 0.25). Finally, we asked whether rats were able to intentionally inhibit the reaching of one of the two targets in order to obtain the specific reward associated with that target. To examine this, we performed an omission test, in which the reward previously associated with reaching a particular target was now delivered only when rats successfully inhibited reaching that target throughout the trial. If the target was reached during the 30 s trial, no reward was delivered and a new trial was initiated. Importantly, reaching the other target continued to lead to reward as during training. Animals behaved in a goal-directed manner in the omission test for both targets, since they reduced the number of target reaches for the target they had to omit versus the no-omission target, while increasing the number of correctly omitted responses (;
P < 0.001 for both comparisons). Taken together, these data show that the neuroprosthetic actions in our task are sensitive to changes in the causal relation between performing the action and obtaining the reward (contingency degradation and omission test), and to changes in the expected value of the outcome (sensory-specific devaluation), indicating that they are intentional and goal-directed rather than habitual.
We next examined if learning to operantly control M1 activity irrespective of overt movement involves striatal plasticity, akin to what is observed for natural motor learning
2–4,7,26–28. We verified that the improvement in behavioral performance seen across learning was accompanied by a significant increase in firing rates in the dorsal striatum (DS) in late learning compared to early learning (;
P < 0.001). In addition to this general increase in firing rates, we noticed that firing rates of DS neurons exhibited the greatest modulation during target reaching compared to baseline control periods (), as observed during natural motor learning
26. This modulation was significantly greater in late learning than early learning (;
P < 0.05), indicating that DS neurons changed their activity during the volitional control of M1 activity, and that this change increased with learning.
We next investigated if learning, and the observed changes in DS target-related activity, were accompanied by corticostriatal plasticity, i.e. changes in the functional interactions between M1 and DS neurons. We noticed that cross-correlation histograms between the two regions in late learning exhibited pronounced oscillatory spike coupling (). To quantify this interaction, we calculated the coherence between spiking activity in the two regions in both early and late learning (
Supplementary Methods). The resulting coherograms exhibited a clear increase in coherence at low frequency bands in late learning relative to early learning (), and these frequencies corresponded to the oscillatory frequency seen in the cross-correlograms (
Supplementary Methods). Furthermore, mean coherence in the theta band (4–8 Hz) was significantly greater in late learning than early learning (;
P < 0.001). This increase in coherence appeared to be related to learning to perform the task rather than higher reward expectation or proportion of correct trials in late learning, because coherence values remained high surrounding target achievement during the contingency degradation manipulation, where reward delivery was not contingent upon target achievement (
Supplementary Fig. 9, not different from non-degraded trials,
P > 0.05). In addition, coherence levels remained high during task performance in incorrect trials (data not shown), further suggesting that the increase in coherence observed is due to learning to perform the skill rather than outcome anticipation. Thus, neuroprosthetic skill learning is accompanied by dynamic changes in functional interactions between M1 and the DS neurons, suggesting an important role for corticostriatal plasticity in this novel task.
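To make the coherence measure concrete, the sketch below estimates magnitude-squared coherence between two binned spike trains and averages it over the theta band (4–8 Hz). It uses a simple segment-averaged (Welch-style) estimate written with NumPy, which is a stand-in and not necessarily the estimator described in the Supplementary Methods. The bin width, segment length, simulated spike trains, and function names are all assumptions for illustration.

```python
import numpy as np


def spike_coherence(x, y, fs, nperseg):
    """Segment-averaged magnitude-squared coherence of two binned spike trains."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nseg = min(len(x), len(y)) // nperseg
    win = np.hanning(nperseg)
    nfreq = nperseg // 2 + 1
    sxx = np.zeros(nfreq)
    syy = np.zeros(nfreq)
    sxy = np.zeros(nfreq, dtype=complex)
    for k in range(nseg):
        xs = x[k * nperseg:(k + 1) * nperseg]
        ys = y[k * nperseg:(k + 1) * nperseg]
        fx = np.fft.rfft((xs - xs.mean()) * win)   # mean-removed, windowed segment
        fy = np.fft.rfft((ys - ys.mean()) * win)
        sxx += np.abs(fx) ** 2
        syy += np.abs(fy) ** 2
        sxy += fx * np.conj(fy)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coh = np.abs(sxy) ** 2 / np.maximum(sxx * syy, np.finfo(float).tiny)
    return freqs, coh


def band_mean(freqs, coh, lo=4.0, hi=8.0):
    """Mean coherence within a frequency band (theta by default)."""
    mask = (freqs >= lo) & (freqs <= hi)
    return float(coh[mask].mean())


# Simulated data (hypothetical): 10 ms bins, two trains sharing a 6 Hz rate rhythm.
rng = np.random.default_rng(0)
fs = 100.0
t = np.arange(0, 200, 1 / fs)
rate = 0.1 * (1 + 0.9 * np.sin(2 * np.pi * 6 * t))
m1_coupled = rng.random(t.size) < rate
ds_coupled = rng.random(t.size) < rate
m1_indep = rng.random(t.size) < 0.1
ds_indep = rng.random(t.size) < 0.1

f_c, coh_c = spike_coherence(m1_coupled, ds_coupled, fs, nperseg=200)
f_i, coh_i = spike_coherence(m1_indep, ds_indep, fs, nperseg=200)
theta_coupled = band_mean(f_c, coh_c)
theta_indep = band_mean(f_i, coh_i)
```

In this simulation the two trains that share a theta-band rhythm show higher theta coherence than the two independent trains, which is the kind of contrast the early-versus-late comparison above is meant to capture. With few segments the estimate is biased upward, so comparisons should use matched amounts of data.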
We therefore investigated if corticostriatal plasticity would be necessary for neuroprosthetic skill learning. N-Methyl-D-aspartic acid (NMDA) receptors in striatal medium spiny neurons are critical for corticostriatal long-term potentiation
29. We used a knockin line that expresses Cre recombinase in both striatonigral and striatopallidal medium spiny neurons (
RGS9L-cre), but not in all striatal neurons (e.g. absent from parvalbumin interneurons,
Supplementary Figure 10), and crossed it with mice carrying a floxed allele of the NMDAR1 gene
30. The resulting mice lack NMDA currents in most projection neurons
30 (but not in all striatal cells; hence we refer to them as
RGS9L-Cre/Nr1f/f, and not as striatal NR1 knockouts, see
Supplementary Methods), and have impaired corticostriatal long-term potentiation
30. As previously described, these animals do not display any major motor deficits (
Supplementary Video 1 and Supplementary Video 2) and can learn to perform rapid sequential movements (
Supplementary Fig. 11), although they are unable to learn precise motor sequences
27. We investigated neuroprosthetic skill learning in
RGS9L-Cre/Nr1f/f mice and littermate controls. While control mice showed performance improvement across learning irrespective of physical movement as observed for rats (
P < 0.001, ),
RGS9L-Cre/Nr1f/f mice exhibited marked learning deficits on the task, with no significant increase in the percentage of correct trials from early to late learning (,
P = 0.98). Furthermore, acute pharmacological blockade of NMDA receptors in trained control animals did not affect performance of the neuroprosthetic skill (even at relatively high doses that affect striatal burst firing,
Supplementary Figs. 12–13), suggesting that the deficits observed in
RGS9L-Cre/Nr1f/f mice are not due to inability to perform the skill but rather to the inability to learn the task.
Consistent with the findings above (), DS neurons in littermate controls exhibited a significant increase in firing rate across learning, while in mutants they did not (; main effect of genotype F(1,10) = 32.45, P < 0.001; early vs. late P < 0.05 for controls (CT) and P = 0.23 for mutants (KO)). Also, in control mice, the proportion of DS neurons with significant target-related firing rate modulation increased with learning (; P < 0.05), but this was not observed in RGS9L-Cre/Nr1f/f mice (; P = 0.28). Finally, the development of functional corticostriatal interactions during learning was also abolished in RGS9L-Cre/Nr1f/f mutants, with no significant increase in coherence between M1 and DS spikes with learning (, F(80,10) = 0.65, P = 0.44), although littermate controls showed a clear increase as seen in rats (, F(80,10) = 4.86, P < 0.05). Taken together, these results demonstrate that the striking corticostriatal plasticity observed in rats during learning also occurs in control mice, but that this plasticity is absent in mice with a decrease in functional NMDA receptors in striatum. These mutant mice do not show improvement with training, indicating that corticostriatal plasticity may be necessary for learning to intentionally modulate M1 states to obtain specific outcomes.
In summary, we used a novel task in rodents to demonstrate that corticostriatal networks exhibit profound plasticity during the learning of intentional neuroprosthetic skills, and further, that disrupting this plasticity impairs learning. This lends strong support to the claim that cortico-basal ganglia circuits play a role in abstract cognitive processes
18–20. We observed that DS neurons strongly modulated their activity in relation to M1 activity even when the latter was dissociated from physical movements, suggesting that the striatum is important for learning and selecting abstract actions that are controlled by cortical output. Hence, these data suggest that cortico-basal ganglia circuits may be involved in learning mental actions and skills that do not require physical movement, indicating that they may have a broader function involved in intention and decision-making than previously acknowledged.
Our results also have important implications for the field of BMIs
6. The abstract actions investigated here form the basis for skillful neuroprosthetic control
16 and, as we have shown here, these actions recruit elements of the natural motor system outside of M1. Thus, our results suggest that neuroprosthetic movements capitalize on the neural circuitry for motor learning and therefore have great potential to feel naturalistic, generalize well to novel movements and environments, and benefit from our nervous system's highly developed storage and retrieval mechanisms for skilled behavior.