We previously showed that in monkeys trained on a visual motion discrimination task, improvements in perceptual sensitivity corresponded to changes in motion-driven responses of neurons in area LIP, which represents the readout of motion information to form a direction decision, but not area MT, a likely source of that motion information 4, 12, 13, 18, 35, 44. Here we showed that a computational model that uses a reinforcement-learning rule to adjust pooling weights between MT-like sensory neurons and LIP-like decision neurons can account for both behavioural and neural changes observed during training.
Our model suggests that the changes we measured in LIP during training reflect an increasingly selective readout of the most informative MT neurons. In reality, the sensory evidence is likely provided by not just MT but also other motion-sensitive neurons in the brain, like those found in the middle superior temporal area (MST) 45. Likewise, LIP is just one of an interconnected network of brain regions, including the superior colliculus and frontal eye field, that represent and likely contribute to the formation of the direction decision 17. Therefore, our model is not informative about where in the brain the actual changes in connectivity occur. Rather, the model establishes principles governing how functional (i.e., direct or indirect) connectivity between areas like MT that provide the sensory evidence for the task and areas like LIP that form the decision is modified by experience.
Our simulations also provide a deeper understanding of the relationship between the noisy activity of MT-like neurons and behavioural choices. This relationship, called choice probability, appears to arise from both an appropriate readout scheme and a particular form of interneuronal correlations. Choice probability in our simulations matched real MT data and increased selectively for the most sensitive neurons only when pairwise correlations depended on the similarity of both the direction tuning and sensitivity of each pair of neurons 4. Several modelling studies have made similar assumptions about stronger correlations between neurons with similar response properties, possibly arising from common inputs to similarly tuned neurons 36, 46, 47. Consistent with this idea, response correlations in V1 have been shown to depend on the similarity of tuning between pairs of neurons 37. Pairwise correlations in MT have been shown to be stronger between neurons with similar direction tuning curves 24, but their relationship to neuronal sensitivity has not yet been examined systematically.
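For concreteness, choice probability is conventionally computed as the area under the ROC curve comparing a neuron's responses on trials that ended in one choice with its responses on trials that ended in the other. The sketch below illustrates that calculation only; the function name `choice_probability` and its arguments are hypothetical and are not taken from our simulation code.

```python
def choice_probability(pref_choice_responses, null_choice_responses):
    """Area under the ROC curve for two sets of single-trial responses.

    Equivalent to the probability that a response drawn from trials ending
    in the neuron's preferred choice exceeds one drawn from trials ending
    in the other choice, with ties counted as one half.
    """
    if not pref_choice_responses or not null_choice_responses:
        raise ValueError("both response lists must be non-empty")

    score = 0.0
    for r_pref in pref_choice_responses:
        for r_null in null_choice_responses:
            if r_pref > r_null:
                score += 1.0   # preferred-choice response is larger
            elif r_pref == r_null:
                score += 0.5   # ties contribute half
    return score / (len(pref_choice_responses) * len(null_choice_responses))
```

A value of 0.5 indicates no relationship between the neuron's trial-to-trial variability and the choice; values above 0.5 indicate that higher responses tend to accompany the neuron's preferred choice.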
Our reinforcement-learning model used a simple delta rule to adjust the pooling weights based on a reward prediction error from the current decision. This kind of reward prediction error has been used extensively to account for learning behaviours that involve the establishment of sensory-response associations and is reflected in the phasic activity of midbrain dopamine neurons 6. These signals are likely driven at least in part by reward-related activity encoded in numerous cortical and subcortical regions including the orbitofrontal and anterior cingulate cortex, striatum, and ventral tegmental area 7. However, in our model reward was essentially a surrogate for whether the response was correct or not, suggesting that other kinds of feedback signals related more closely to errors than rewards might also play a role in driving learning 48. Further work is needed to determine which of these feedback signals are present during perceptual learning.
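The general form of such a delta rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact rule used in our simulations: the logistic form of the predicted reward, the parameter `beta`, the learning rate, and all function names are hypothetical choices made for the example. The choice is coded as +1 or -1, and reward is 1 for a correct response and 0 otherwise.

```python
import math


def pooled_decision_variable(weights, responses):
    """Weighted sum of sensory responses (the pooled readout)."""
    return sum(w * x for w, x in zip(weights, responses))


def expected_reward(decision_variable, beta=1.0):
    """Predicted reward probability from the decision-variable magnitude.

    Assumed logistic form: a larger |decision variable| predicts a higher
    probability of being correct; zero evidence predicts chance (0.5).
    """
    return 1.0 / (1.0 + math.exp(-beta * abs(decision_variable)))


def delta_rule_update(weights, responses, choice, reward, predicted,
                      learning_rate=0.1):
    """Return new pooling weights after one trial.

    Each weight changes in proportion to the reward prediction error
    (reward - predicted), the sign of the choice (+1 or -1), and the
    response of the corresponding sensory neuron on that trial.
    """
    error = reward - predicted
    return [w + learning_rate * choice * error * x
            for w, x in zip(weights, responses)]
```

With this form, a rewarded choice strengthens the weights of neurons whose responses favoured that choice, in proportion to how unexpected the reward was, and an unrewarded choice weakens them.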
This simple model with a single learning rule can account for the time course, magnitude, and specificity of both associative and perceptual improvements. Early in training, the feedback reinforcement signal first establishes the functional connectivity in stimulus-response association, from neurons that represent the sensory stimulus to neurons that control the motor responses. This sensory-motor connectivity is further refined by the same learning mechanism to provide a more selective read-out of the most sensitive sensory signals associated with that response, a form of channel reweighting thought to underlie several forms of perceptual learning 4, 49. Other feedback signals, such as attention, have also been implicated in gating and/or guiding neural changes during perceptual learning 2, 43. We did not explicitly model effects of attention on learning, mainly because the attention state of the animal was not manipulated or measured in our experiments. It has been suggested that the co-occurrence of attention and reward feedback is an important factor deciding which stimulus features are learned during perceptual learning 11. Our model provides a framework for addressing this important issue.