Most of our motor skills are not innately programmed, but are learned by a combination of motor exploration and performance evaluation, suggesting that they proceed through a reinforcement learning (RL) mechanism. Songbirds have emerged as a model system to study how a complex behavioral sequence can be learned through an RL-like strategy. Interestingly, like motor sequence learning in mammals, song learning in birds requires a basal ganglia (BG)-thalamocortical loop, suggesting common neural mechanisms. Here we outline a specific working hypothesis for how BG-forebrain circuits could utilize an internally computed reinforcement signal to direct song learning. Our model includes a number of general concepts borrowed from the mammalian BG literature, including a dopaminergic reward prediction error and dopamine mediated plasticity at corticostriatal synapses. We also invoke a number of conceptual advances arising from recent observations in the songbird. Specifically, there is evidence for a specialized cortical circuit that adds trial-to-trial variability to stereotyped cortical motor programs, and a role for the BG in ‘biasing’ this variability to improve behavioral performance. This BG-dependent ‘premotor bias’ may in turn guide plasticity in downstream cortical synapses to consolidate recently-learned song changes. Given the similarity between mammalian and songbird BG-thalamocortical circuits, our model for the role of the BG in this process may have broader relevance to mammalian BG function.
Many complex behaviors, such as speech or playing a musical instrument, are not innately determined but are acquired through practice. The reinforcement learning (RL) framework proposes that during practice, an animal experiments with its motor output and uses sensory feedback to reinforce the action sequences that improve its performance (Sutton and Barto, 1981). Currently, much of our understanding of how neural circuits may implement RL comes from studies of animals engaged in tasks motivated by tangible reinforcers such as food or juice (Schultz et al., 1997, Hikosaka, 2007). However, less is known about the neural mechanisms of natural motor learning that may be shaped by an internal template or evaluation system. Recently, songbirds have emerged as a tractable model system to study how reinforcement learning could drive the development of a complex motor sequence.
Juvenile zebra finches, for example, appear to employ a trial-and-error strategy during song development as they learn to imitate their tutor's song (the song ‘template’, Figure 1A) (Marler and Waser, 1977, Doya and Sejnowski, 1995). Like a babbling baby (Kuhl, 2004), juvenile songbirds produce highly variable vocalizations, called subsong (Marler, 1970, Doupe and Kuhl, 1999). During several weeks of practice, performance gradually improves as song acquires more temporal structure and starts to resemble the template (Figure 1B). Song learning requires auditory feedback: deafening in juvenile birds impairs song development and results in songs with abnormal acoustic structure and variability, suggesting that the learning bird compares his own song to a stored auditory memory or ‘template’ to generate corrections to his motor program (Konishi, 1965b, a, Marler and Sherman, 1983). These observations are consistent with the concepts of reinforcement learning, which posits that variability in juvenile song represents motor exploration, and that the comparison of the bird's own song to the template results in a reinforcement (or error) signal that directs plasticity in the song motor pathway (Doya and Sejnowski, 1995, Kao et al., 2005, Olveczky et al., 2005).
Is the RL model of song learning biologically plausible? What neural structures could carry out the central components of RL: motor exploration, song evaluation, and the implementation of motor plasticity? Here we address these components in terms of their possible implementation by circuitry in the songbird brain. We will start by describing some recent progress on the mechanisms of motor production, the generation of motor exploration, and on the generation of biased variability, an instructive mechanism that may guide plasticity in the motor program (Andalman and Fee, 2009). We will then turn to a relatively speculative model of how specific instructive bias signals could be computed in the basal ganglia from a simple reinforcement signal.
Adult zebra finch song consists of a reliable sequence of 2-7 distinct syllables called a ‘motif’. The individual syllables last roughly 100ms and are reproduced in highly stereotyped fashion across song renditions (Figure 1A). The neural circuitry underlying adult song production is well identified and exists in all songbird species that have been studied (Wild, 1997). The forebrain nucleus RA (robust nucleus of the arcopallium), which has structural and functional homologies to layer 5 of the primary motor cortex (Karten, 1991, Jarvis, 2004), projects in a topographic manner to primary motor neurons in the brainstem (Wild, 1993). During adult singing, RA neurons exhibit a sequence of bursts, the pattern of which is precisely reproduced each time the bird sings its song motif (Yu and Margoliash, 1996, Leonardo and Fee, 2005). This sequence of bursts then converges to drive a sequential pattern of activity in downstream motor neurons and muscles (Vicario and Nottebohm, 1988, Fee et al., 2004).
A major input to RA comes from the premotor cortical nucleus HVC (used as a proper name) (Nottebohm et al., 1976, Vu et al., 1994), which in turn receives input from the thalamic nucleus Uvaeformis (Figure 1C) (Nottebohm et al., 1982). Each neuron in HVC that projects to RA generates a single highly stereotyped burst, of roughly 6ms duration, at one time in the adult song motif (Hahnloser et al., 2002). It has been proposed that bursts in HVC drive, at each moment, the population of RA neurons active at that time (Fee et al., 2004). In this model, HVC resembles a ‘clock’ which marches through time in the song motor sequence (Long and Fee, 2008); a stereotyped sequence of neural activity in HVC drives a stereotyped sequence of RA neurons, which in turn drive a similarly stereotyped sequence of vocal motor outputs. It has been suggested that the sparse sequential activity in HVC is generated by synaptically-connected chains of neurons through which activity propagates (Li and Greenside, 2006, Jin et al., 2007, Long et al., 2010).
This model of adult song production suggests that the correct song pattern is produced by wiring up HVC to activate an appropriate subset of RA neurons at each moment in time, i.e. that song learning occurs via plasticity in the HVC-RA synapses (Johnson et al., 1997, Kittelberger and Mooney, 2005). In the context of song learning, a central question is how HVC neurons select which RA neurons to wire up with. In other words, how is plasticity in this cortico-cortical motor pathway guided such that the resulting ‘motor program’ (or pattern of activity in RA) results in a faithful reproduction of the tutor song? The reinforcement learning paradigm posits that this occurs by trial-and-error search.
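The ‘clock’ picture of song production described above can be made concrete with a toy sketch. It is only an illustration under stated assumptions, not a model from the cited studies: each HVC neuron is taken to burst at exactly one time step, the weight table hvc_to_ra and the firing threshold are hypothetical, and RA activity is reduced to a binary pattern.

```python
def ra_activity(hvc_to_ra, threshold=0.5):
    """Return the RA firing pattern produced by a sparse HVC 'clock'.

    hvc_to_ra[k][i] is a hypothetical synaptic weight from HVC neuron k
    to RA neuron i. HVC neuron k is assumed to burst only at time step k.
    """
    n_steps = len(hvc_to_ra)
    n_ra = len(hvc_to_ra[0])
    pattern = []
    for t in range(n_steps):
        # Sparse HVC activity: exactly one neuron bursts at each time step.
        hvc = [1 if k == t else 0 for k in range(n_steps)]
        # Summed synaptic drive onto each RA neuron at this moment.
        drive = [sum(hvc[k] * hvc_to_ra[k][i] for k in range(n_steps))
                 for i in range(n_ra)]
        pattern.append([1 if d > threshold else 0 for d in drive])
    return pattern


# Four time steps (HVC neurons), three RA neurons; weights are made up.
weights = [[1, 0, 0],
           [0, 1, 1],
           [1, 0, 1],
           [0, 0, 1]]
first_rendition = ra_activity(weights)
second_rendition = ra_activity(weights)
```

In this sketch the RA pattern is fully determined by the weight table, so every rendition is identical. Changing the song therefore amounts to changing which entries of hvc_to_ra are strong, which is the wiring problem posed in the text.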
Adult song in the zebra finch, the most commonly studied songbird, is highly stereotyped and is driven by precisely-timed neural activity in the HVC→RA pathway, but juvenile song is highly variable (Figure 1B). Does vocal variability represent motor exploration required for RL? What are the origins and functions of this vocal variability?
Recently, it has become clear that variability in both RA firing patterns and in juvenile song is not simply a consequence of immature connectivity in the HVC→RA motor pathway. Rather, it is actively injected into the motor pathway by a second input to RA from the frontal cortical nucleus LMAN (lateral magnocellular nucleus of the anterior nidopallium) (Figure 1C) (Kao et al., 2005, Olveczky et al., 2005, Olveczky et al., 2011). LMAN forms the output of the anterior forebrain pathway (AFP), a circuit homologous to BG-thalamocortical loops in mammals that is necessary for vocal learning but not for song production in adults. While lesions of LMAN in adult birds have relatively little effect on song structure, in juvenile birds they produce profound deficits in song development (Bottjer et al., 1984, Scharff and Nottebohm, 1991).
Numerous observations now support the idea that LMAN drives vocal exploration. Most importantly, lesions (or transient inactivation) of LMAN cause a loss of song variability at all developmental stages. In the earliest ‘babbling’ stage of singing, LMAN lesions result in abnormal, highly stereotyped song (Bottjer et al., 1984), and pharmacological LMAN inactivations completely abolish subsong vocalizations (Aronov et al., 2008). Later in the plastic song stage, when song is still highly variable but the vocalizations have some repeatable components, LMAN lesions or inactivations largely abolish variability, resulting in a highly stereotyped yet simplified song (Figure 2A-B) (Scharff and Nottebohm, 1991, Olveczky et al., 2005) that is driven by HVC (Aronov et al., 2008). Finally, in adult song, which has only a small amount of variability, LMAN lesions (or inactivations) have a correspondingly small effect, but still produce a loss of variability (Bottjer et al., 1984, Kao and Brainard, 2006, Stepanek and Doupe, 2010).
Electrophysiological and gene expression results are also consistent with the view that LMAN drives variability in vocal output. First, RA-projecting neurons in LMAN exhibit highly variable spiking patterns in young birds (Olveczky et al., 2005) and show premotor bursts of activity prior to syllable onsets and offsets during vocal babbling (Aronov et al., 2008). Second, electrical stimulation of LMAN during singing causes immediate perturbation of ongoing song (Kao et al., 2005). Third, inactivation of LMAN largely eliminates variability in the song-related firing patterns of RA neurons in juvenile birds (Olveczky et al., 2011). Finally, in adult birds, immediate early gene expression and neural activity are elevated when the bird sings in an isolated social context (‘undirected song’) (Jarvis et al., 1998, Kao et al., 2008), in which the bird produces a more variable form of his song. In contrast, IEG expression and neural activity are lower when the bird sings to a female bird, producing the more stereotyped ‘directed’ form of his song.
In contrast to the effects of LMAN lesions, elimination of HVC by lesions or inactivation results in a complete loss of stereotyped structure from song. In young zebra finches, HVC lesions result in loss of the earliest appearance of stereotyped vocal structure, called protosyllables, and in the loss of respiratory-vocal coordination (Aronov et al., 2011, Veit et al., 2011). At later developmental stages, even in adults, HVC lesions result in vocalizations nearly indistinguishable from subsong (Aronov et al., 2008). Consistent with this, removal of the HVC input to RA in adult birds results in an immediate reversion of RA firing patterns from the normal highly-stereotyped bursting to the noisy bursting characteristic of subsong (Olveczky et al., 2011).
These results point to a view in which the earliest stage of singing—vocal babbling—is generated primarily by highly variable inputs to RA from LMAN (Figure 2C). Then in plastic song, HVC begins to inject stereotyped sequential structure into these noisy RA firing patterns, producing recognizable repeated syllables. As song learning progresses, HVC inputs gradually come to dominate over LMAN inputs in driving RA firing patterns, resulting in song of increasing stereotypy. Finally, in adult song, RA firing patterns consist of high-frequency bursts driven primarily by HVC, while LMAN inputs become much less effective at driving variability; the result is a highly stereotyped song (Figure 2C).
From the perspective of reinforcement learning, it is interesting that song variability gradually decreases over time during song development (Immelmann, 1969, Tchernichovski et al., 2001). This reduced variability is associated with a gradual increase in the stereotypy, sparseness and burst rate of RA neurons (Olveczky et al., 2011). An important question is how this gradual decrease in the motor pathway variability is achieved. One clue is that any reduction in drive from HVC to RA, whether from complete or partial lesions of HVC, results in an immediate increase in the variability of song (Thompson and Johnson, 2007, Thompson et al., 2007, Aronov et al., 2008) and of RA firing patterns (Olveczky et al., 2011). Furthermore, adult birds with transection of the HVC to RA pathway exhibit RA firing patterns very similar to those recorded in subsong birds (Olveczky et al., 2011). In addition, LMAN firing patterns recorded in adult birds during undirected song (Kao et al., 2008) appear to be very similar to those recorded in juvenile birds (Olveczky et al., 2005), implying that the reduction in song variability during development may not result from a decrease in LMAN activity, but rather from a decreased effectiveness of LMAN inputs at driving variability in RA neurons. Together, these results suggest that HVC inputs may play a direct and immediate role in producing the decreasing sensitivity of RA to LMAN inputs.
One possibility recently suggested (Olveczky et al., 2011) is that HVC drive to RA, in addition to driving large bursts in RA neurons, activates a strong tonic inhibition of RA neurons that produces the observed suppression of spiking in RA neurons between song-related bursts and at the end of song bouts (Spiro et al., 1999, Chi and Margoliash, 2001, Leonardo and Fee, 2005). This inhibition could be mediated by local interneurons in RA (Spiro et al., 1999) that hyperpolarize RA neurons and make them unresponsive to LMAN inputs, which are dominated by NMDA-type receptors (Mooney, 1992). In other words, at any point in the song, an RA neuron may be either so strongly activated by HVC that its firing rate is saturated and therefore unresponsive to LMAN input, or so strongly hyperpolarized by HVC-driven feedforward inhibition that the NMDA-receptor-mediated LMAN inputs undergo magnesium blockade. Thus, the gradual increase in the efficacy of HVC inputs observed during song learning (Kittelberger and Mooney, 1999) could automatically produce a decreasing responsiveness of RA neurons to LMAN input, and would produce a corresponding decrease in song variability that is matched to the progress in learning. A learning process that includes large exploratory variations early in learning and smaller variations late in learning may implement ‘simulated annealing’ (Kirkpatrick et al., 1983), a process that can ensure rapid and accurate convergence of gradient descent learning algorithms to the global minimum, even in the presence of local minima.
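The proposed link between HVC efficacy and LMAN-driven variability can be illustrated with a minimal numerical sketch. All specifics here are assumptions made for illustration: the RA neuron is reduced to a logistic (saturating) rate function, which stands in for both firing-rate saturation and the loss of NMDA-mediated responsiveness under hyperpolarization; LMAN input is Gaussian noise of fixed size; and w_hvc is a hypothetical scalar efficacy of the HVC drive.

```python
import math
import random
import statistics


def ra_rate(hvc_drive, lman_input):
    """Toy RA neuron: normalized firing rate saturates at 0 and at 1."""
    return 1.0 / (1.0 + math.exp(-(hvc_drive + lman_input)))


def output_variability(w_hvc, hvc_sign, n_trials=5000, lman_sd=1.0, seed=0):
    """Trial-to-trial s.d. of the RA rate at one moment in the song.

    hvc_sign is +1 when HVC excites the neuron at this moment and -1 when
    HVC-driven feedforward inhibition dominates. LMAN noise has the same
    statistics (and, via the seed, the same samples) for every w_hvc.
    """
    rng = random.Random(seed)
    drive = w_hvc * hvc_sign
    rates = [ra_rate(drive, rng.gauss(0.0, lman_sd)) for _ in range(n_trials)]
    return statistics.pstdev(rates)


# Increasing HVC efficacy, as might occur over the course of song learning.
weights = [0.0, 1.0, 2.0, 4.0, 8.0]
excited = [output_variability(w, +1) for w in weights]
inhibited = [output_variability(w, -1) for w in weights]
```

In this sketch the LMAN input never changes, yet the variability it produces in RA shrinks as w_hvc grows, whether the neuron is pushed toward saturation or toward silence. That is the sense in which exploration could be reduced automatically as the motor pathway strengthens.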
While LMAN is clearly implicated in the generation of variability in the song motor pathway, this cortical nucleus is part of a complex circuit that likely plays a broader role in learning. Specifically, LMAN receives an excitatory projection from the portion of the thalamic nucleus DLM (medial portion of the dorsolateral thalamus) that in turn receives an inhibitory pallidal-like input from a basal ganglia nucleus Area X (Figure 1C) (Bottjer et al., 1989, Vates and Nottebohm, 1995, Boettiger and Doupe, 1998, Luo and Perkel, 1999b). Importantly, Area X has both striatal and pallidal cell types (Farries and Perkel, 2002, Carrillo and Doupe, 2004, Reiner et al., 2004). In fact, the LMAN→Area X→DLM→LMAN circuit forms a cortico-striato-pallidal-thalamocortical loop that shares striking similarities to mammalian circuitry in its neurochemistry, synaptic connectivity and even in the firing patterns generated by specific cell classes in brain slice and during behavior (Luo et al., 2001, Farries and Perkel, 2002, Carrillo and Doupe, 2004, Doupe et al., 2005, Goldberg et al., 2010, Goldberg and Fee, 2010).
Similar to what has been observed in mammalian BG circuits (Alexander et al., 1986, Hoover and Strick, 1993), the LMAN→Area X→DLM→LMAN circuit forms a closed, topographically organized loop. Tracing studies demonstrate that a sub-region within LMAN projects to a sub-region of Area X, which in turn projects back to that same LMAN region through DLM (Johnson et al., 1995, Luo et al., 2001, Bottjer, 2004). This topographic organization is maintained in the projection from LMAN to RA (Iyengar et al., 1999). Given that RA and its projections to downstream brainstem motor neurons are myotopically organized, the songbird anatomy enables parallel circuits within the AFP to independently influence distinct channels of vocal motor output. This concept will be of central importance to our model of BG-dependent vocal learning.
It is interesting to note that there is another parallel pathway, with some similarities to the AFP, that innervates HVC rather than RA (Jarvis et al., 1998). This pathway involves a projection from a thalamic nucleus (DMP) near DLM (Foster et al., 1997) to the medial magnocellular nucleus of the anterior nidopallium (MMAN), a cortical region just medial to LMAN that projects to HVC (Nottebohm et al., 1982). Lesions of MMAN result in deficits of song learning (Foster and Bottjer, 2001), and in abnormal immediate early gene expression in HVC (Kubikova et al., 2007). Interestingly, other brain regions surrounding LMAN and Area X, which may have patterns of projections similar to the AFP, have been shown to be active during other motor behaviors such as hopping, suggesting that the song system evolved as a specialization of more general motor learning pathways in the avian brain (Feenders et al., 2008).
What roles do the BG and thalamic portions of the AFP play in the generation of LMAN-dependent motor exploration? There are conflicting findings on the role of Area X. In mammals, it has been proposed that behavioral variability could emerge within the striatum and influence behavior through downstream thalamocortical motor circuitry (Sridharan et al., 2006, Sheth et al., 2011). In support of this possibility in birds, infusion of a dopamine antagonist near Area X (though possibly also reaching LMAN) increased song variability when birds were singing to a female (Leblois et al., 2010). In such ‘directed’ singing, extracellular concentrations of dopamine measured in Area X are higher than otherwise (Sasaki et al., 2006). In addition, pallidal neurons within Area X, including those that project to DLM, exhibit highly variable firing patterns during singing, consistent with a possible role in driving variability in the downstream DLM→LMAN circuit (Hessler and Doupe, 1999a, Goldberg et al., 2010).
However, there is also evidence supporting the view that Area X is not necessary for the generation of song variability. In contrast to lesions of LMAN, elimination of Area X in juvenile birds leads to protracted song variability in adulthood (Sohrabji et al., 1990, Scharff and Nottebohm, 1991). Indeed, we have recently quantitatively examined the role of Area X in the generation of vocal variability in juvenile birds and found that songs exhibit normal vocal variability even after complete bilateral lesions of Area X. In contrast, bilateral DLM lesions abolish vocal babbling and largely eliminate song variability (Goldberg and Fee, 2011), similar to LMAN lesions (Bottjer et al., 1984, Scharff and Nottebohm, 1991, Olveczky et al., 2005). Thus, while the variability-generating function of LMAN requires inputs from DLM, it does not appear to require the BG component of the AFP.
More recently, it has been found that localized cooling of LMAN in very young birds slows down the timescales of subsong babbling, suggesting that neuronal or circuit dynamics within LMAN may play a direct role in the generation of vocal variability (Aronov et al., 2011). Of course, these findings do not rule out involvement of other components of the AFP or even RA. However, in the hypothesis that follows, we will emphasize the role of LMAN in generating song variability.
While Area X is not necessary for the expression of exploratory variability in juvenile birds, the BG play a crucial role in vocal learning. Birds that receive Area X lesions as juveniles develop songs with abnormal acoustic structure, protracted variability and little, if any, resemblance to the tutor song, as if motor exploration continues without proper reinforcement (Sohrabji et al., 1990, Scharff and Nottebohm, 1991, Goldberg and Fee, 2011). It has also been shown in adult birds that lesions of Area X block the singing-related up-regulation of immediate early gene expression in LMAN and RA (Hara et al., 2007, Kubikova et al., 2007), the downstream molecular targets of which may be genes related to memory consolidation and reconsolidation (Lee et al., 2004, Kubikova et al., 2007). What is the specific role that the BG play in song learning? We will return to this question after describing a powerful new technique for studying the mechanisms of vocal learning in songbirds.
A detailed investigation of the mechanisms of song learning faces several challenges. Natural song learning proceeds slowly and unpredictably (Tchernichovski et al., 2001), making it difficult to know how the singing bird classifies its vocalizations as sounding like the tutor or not like the tutor (Deregnaucourt et al., 2004). Recently, a song-operant conditioning task has been developed that brings the ‘value’ of specific vocalizations under experimental control (Figure 3)(Tumer and Brainard, 2007, Andalman and Fee, 2009). In this paradigm, a fast computer monitors natural variations in the pitch of specific song syllables in real time. When the pitch passes an experimentally-programmed threshold, a brief noise burst (~25 ms) is immediately triggered through a speaker to distort the auditory feedback perceived by the bird. Given the natural trial-to-trial variations of pitch across repeated renditions of the same syllable, some syllable renditions cross this threshold and are ‘hit’, while others ‘escape’ (Figure 3A). Remarkably, birds rapidly learn to change the pitch of targeted vocalizations; within hours of receiving this conditional auditory feedback (CAF) they largely produce only those syllable variants that avoid feedback (Figure 3B)(Tumer and Brainard, 2007, Andalman and Fee, 2009, Charlesworth et al., 2011). Importantly, LMAN-inactivated or lesioned birds exhibit very little trial-to-trial variability in pitch and exhibit no learning in this task (Andalman and Fee, 2009, Charlesworth et al., 2011). These experiments demonstrate that birds can use sensory feedback to reinforce successful syllable variations over others. They also suggest that LMAN-dependent fluctuations in song are evaluated to drive learning, consistent with a reinforcement learning model of song acquisition (Doya and Sejnowski, 1995).
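The logic of the conditional auditory feedback paradigm can be summarized in a few lines. This is a hedged sketch of the contingency only, not the software used in the cited experiments: the pitch values, the threshold, and the choice to hit renditions above (rather than below) the threshold are all hypothetical.

```python
import random


def classify_renditions(pitches_hz, threshold_hz, hit_above=True):
    """Label each syllable rendition as 'hit' (noise burst) or 'escape'.

    hit_above is an assumption: the text only says that feedback is
    triggered when pitch passes a programmed threshold, so the direction
    is left as a parameter.
    """
    labels = []
    for pitch in pitches_hz:
        crossed = pitch > threshold_hz if hit_above else pitch < threshold_hz
        labels.append("hit" if crossed else "escape")
    return labels


# Hypothetical natural trial-to-trial pitch variation of one syllable.
rng = random.Random(1)
pitches = [rng.gauss(600.0, 10.0) for _ in range(200)]
labels = classify_renditions(pitches, threshold_hz=600.0)

escapes = [p for p, lab in zip(pitches, labels) if lab == "escape"]
mean_all = sum(pitches) / len(pitches)
mean_escape = sum(escapes) / len(escapes)
```

Because only renditions on one side of the threshold escape, the mean pitch of the escaped renditions differs from the overall mean. A bird that comes to produce mostly escape-like variants would therefore shift its average pitch away from the threshold, which is the kind of change reported in the text.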
Is the production of vocal variability the only function served by LMAN, or does LMAN also play a more specific role in shaping learned changes in vocal output? It has been proposed that LMAN may be involved in evaluating vocal errors and transmits an instructive signal that guides plasticity in the HVC→RA connection (Bottjer et al., 1984, Troyer and Bottjer, 2001). This hypothesis is supported by the fact that LMAN lesions essentially ‘freeze’ the motor program encoded in the HVC→RA pathway, preventing changes in juvenile and adult song (Bottjer et al., 1984, Scharff and Nottebohm, 1991, Williams and Mehta, 1999, Brainard and Doupe, 2000b, Horita et al., 2008). Indeed, given the premotor influence of LMAN on song, an interesting hypothesis is that plasticity in the HVC→RA pathway is instructed by a premotor drive that biases the song away from vocal errors (Kao et al., 2005, Olveczky et al., 2005).
The vocal operant conditioning task described above provides an opportunity to test hypotheses, such as this, about learning mechanisms. To examine the role of the AFP in song learning, it has been possible to inactivate LMAN after a period of conditional auditory feedback-driven learning, the result of which was an immediate loss of recently acquired adaptive changes to song (Andalman and Fee, 2009, Charlesworth et al., 2011). Specifically, in juvenile zebra finches, birds that had learned throughout the course of a day to avoid feedback reverted to approximately the morning's performance following LMAN inactivation (Figure 3C)(Andalman and Fee, 2009). Because of this ‘unlearning’ of the acquired pitch change, LMAN inactivation resulted in an immediate increase in the amount of feedback that was incurred. A similar effect was obtained in Bengalese finches by infusion of AP5 into RA (Charlesworth et al., 2011), suggesting that LMAN contributes to learning through NMDA receptor-mediated glutamatergic transmission to RA (Mooney, 1992, Stark and Perkel, 1999, Olveczky et al., 2005). This finding suggests that LMAN is not simply injecting variability into ongoing song, but is also actively biasing this variability to improve behavioral performance. LMAN begins to make fluctuations more often in directions (in motor parameter space) that result in a better outcome, and less often in directions that lead to errors.
Because the average song structure is not affected by LMAN lesions in adult birds, it is clear that the plastic changes that underlie song learning must eventually be incorporated, or consolidated, into the motor pathway, and thus become independent of the AFP. To address the question of how quickly learned changes in song are incorporated into the motor pathway, repeated inactivations of LMAN have been carried out during sequential days of CAF-based learning. It was found that large changes in syllable pitch that accumulated from day-to-day were largely independent of LMAN (Andalman and Fee, 2009, Charlesworth et al., 2011) (Figure 3D), while LMAN-dependent bias was limited to the pitch change acquired on the same day. In young adult zebra finches, the consolidation process appears to take about one day (Andalman and Fee, 2009). In other words, changes in the pitch generated by the motor pathway alone (measured between subsequent LMAN inactivations) were strongly correlated with LMAN-dependent bias one day earlier (Figure 3E). In contrast, in adult Bengalese finches, motor-pathway plasticity appears to be much slower, taking roughly four days for the learned pitch changes to become LMAN independent (Charlesworth et al., 2011).
These findings hint at a possible direct role for LMAN-dependent bias in actively driving plastic changes in the HVC→RA pathway. In this view, one might think of AFP bias as representing the gradient of error in the space of RA activity. Thus, AFP bias could serve to provide an online correction to the motor performance, and could also guide plastic changes in the song motor pathway that have the effect of reducing vocal errors. Over the course of many days of learning, these plastic changes accumulate, gradually pushing the motor pathway to a configuration that minimizes vocal errors (Andalman and Fee, 2009, Charlesworth et al., 2011).
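One way to see how a bias could come to approximate an error gradient is a generic perturbation-based reinforcement rule, sketched here in one dimension (syllable pitch). This is a textbook-style illustration, not the circuit model proposed in this review: the graded reward, the running reward expectation, the learning rates, and all pitch values are hypothetical.

```python
import random


def learn_bias(target_hz, motor_hz, n_renditions=2000, noise_sd=10.0,
               learning_rate=0.05, seed=0):
    """Learn an additive pitch bias by correlating exploration with reward.

    motor_hz is the pitch produced by the motor pathway alone; the bias
    and the exploratory noise are both added on top of it.
    """
    rng = random.Random(seed)
    bias = 0.0
    expected_reward = 0.0
    for _ in range(n_renditions):
        noise = rng.gauss(0.0, noise_sd)           # exploratory fluctuation
        pitch = motor_hz + bias + noise
        reward = -abs(pitch - target_hz)           # hypothetical evaluation
        # Fluctuations followed by better-than-expected reward are made
        # more likely; those followed by worse-than-expected, less likely.
        bias += learning_rate * (reward - expected_reward) * noise / noise_sd
        # Slowly track the average reward to use as the expectation.
        expected_reward += 0.05 * (reward - expected_reward)
    return bias


bias_up = learn_bias(target_hz=620.0, motor_hz=600.0)
bias_down = learn_bias(target_hz=580.0, motor_hz=600.0)
```

In this sketch the learned bias points in the direction that reduces error and settles near the size of the needed correction (about +20 Hz and -20 Hz in the two runs), while motor_hz itself never changes. Setting the bias to zero, the analogue of inactivating LMAN, would return the output to the uncorrected pitch.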
How could LMAN bias drive accumulating changes in the motor pathway? One feature of bias is that it contains the information required to drive the correct synaptic changes at HVC→RA synapses by a simple local learning rule. If LMAN biases song output by reliably activating some subset of RA neurons at an appropriate moment in the song, then a simple spike-timing-dependent Hebbian mechanism could strengthen the HVC synapses whose activity reliably precedes LMAN-driven activity in the RA neuron. This would gradually allow HVC inputs to independently drive the same patterns of activity in RA. Note that this model does not explicitly require a reinforcement signal in RA, as has been previously proposed (Fiete et al., 2007), although it is possible such a signal to RA may be involved. The time course of the resulting consolidation may be linked to NMDA-receptor modulation of activity-dependent genes and downstream protein synthesis-dependent mechanisms in RA and perhaps LMAN (Lee et al., 2004, Kubikova et al., 2007, Redondo and Morris, 2011).
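The Hebbian transfer described above can be sketched as follows. The sketch rests on assumptions that go beyond the text: each HVC neuron bursts at a single time step (so presynaptic activity is 1 at that step), the LMAN-driven component of RA activity is given directly as a table, and the bias is assumed to shrink by exactly the amount absorbed into the HVC→RA weights. The function name, the rate, and the numerical values are hypothetical.

```python
def consolidate(hvc_to_ra, lman_bias, n_days=10, rate=0.5):
    """Transfer LMAN-driven activity into HVC-to-RA weights.

    hvc_to_ra[t][i]: weight from the HVC neuron active at time t to RA
    neuron i. lman_bias[t][i]: extra (signed) drive that LMAN adds to RA
    neuron i at time t.
    """
    weights = [row[:] for row in hvc_to_ra]
    bias = [row[:] for row in lman_bias]
    for _ in range(n_days):
        for t in range(len(weights)):
            hvc_pre = 1.0  # the HVC neuron for time t bursts just before
            for i in range(len(weights[t])):
                # Hebbian term: presynaptic HVC burst times the
                # LMAN-driven postsynaptic activity that follows it.
                dw = rate * hvc_pre * bias[t][i]
                weights[t][i] += dw
                # Assumption: the bias needed falls as HVC takes over.
                bias[t][i] -= dw
    return weights, bias


hvc_to_ra = [[1.0, 0.0], [0.0, 1.0]]
lman_bias = [[0.2, -0.1], [0.0, 0.4]]
new_weights, residual_bias = consolidate(hvc_to_ra, lman_bias)
```

Under these assumptions the total drive to each RA neuron (weight plus bias) is unchanged by consolidation, but after several iterations almost all of it is carried by the HVC→RA weights. The rate parameter sets how quickly the learned change becomes independent of LMAN, loosely corresponding to the different consolidation time courses reported for zebra finches and Bengalese finches.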
In summary, we hypothesize that AFP bias has two distinct roles: first, as an online correction that results in an improved performance, and second, as a learning signal that guides slower plastic changes in cortical motor programs (i.e. consolidation). We view this as a sequential process, with the learning of bias occurring rapidly, followed by a slower integration or accumulation of learned changes in the motor pathway. In fact, there may be parallels between consolidation observed in the songbird motor pathway and habit formation described in mammalian model systems. In both cases, there is a transition from reward-driven behaviors (bias) influenced by BG output, to habitual behavior that may be executed by cortical circuits in a BG-independent manner (Atallah et al., 2007, Graybiel, 2008).
If bias and subsequent changes in the motor pathway represent the final stages of motor learning, what are the mechanisms that underlie the earlier stages of acquiring and expressing bias? An important clue is the temporal specificity with which bias is expressed: CAF-induced pitch changes can be localized to within ~10 milliseconds of the time in the song on which the CAF is made conditional (Charlesworth et al., 2011). While it has not been directly demonstrated, these findings provide some evidence that LMAN-driven bias may be a time-dependent signal that produces a different premotor drive at each moment in the song. If bias is generated by spiking activity in LMAN neurons, then time-dependent bias could result from a learned tendency of LMAN neurons to discharge more often at particular times in the song and less at others. Such song-locked firing has been observed in LMAN neurons (Figure 4A)(Hessler and Doupe, 1999a, Leonardo, 2004, Olveczky et al., 2005, Aronov et al., 2008, Kao et al., 2008), and to the extent these firing patterns drive spiking in RA neurons, they could generate bias. However, it is unknown if these song-locked patterns in LMAN represent learning-related signals acquired during previous vocal experience. This important question awaits recordings of LMAN neurons during a CAF learning paradigm.
In addition to the temporal specificity of AFP bias, the spatial specificity of the AFP is also likely important. As described earlier, the AFP is organized into multiple closed loops associated with different myotopic subdivisions of nucleus RA in the motor pathway. Thus, just as the topographic organization of the LMAN-X-DLM-LMAN loop allows each of these subdivisions, or ‘channels’ of the motor pathway to have an independent ‘noise’ input from LMAN, it also allows each channel to have its own independent bias signal.
A key question, then, is how the AFP computes bias. Information about song timing could only arise in LMAN through its thalamic input DLM, which in turn could receive song timing information from either of its afferent structures, Area X or RA (Wild, 1993, Vates et al., 1997, Luo and Perkel, 1999b), both of which receive inputs from HVC, the originator of song temporal structure (Long and Fee, 2008, Long et al., 2010). Either the cortico-thalamic (RA→DLM) or the BG-thalamic (Area X →DLM) pathway could be involved in the generation of temporally structured bias by LMAN. However, given the central role of Area X in vocal learning, we hypothesize that lesions of this BG circuit or its thalamic target DLM would disrupt at least the acquisition of CAF-driven changes in song pitch. While the RA→DLM pathway could certainly be involved in the expression of bias, in the rest of this review we will focus on the possible role of the BG in the classical (Area X→DLM→LMAN) anterior forebrain pathway.
One idea for the role of Area X in song learning is based on the ‘AFP comparison’ hypothesis – namely that the song template is stored in the AFP and that auditory information about the ongoing song, possibly transmitted via HVC, is evaluated within Area X (Mooney, 2004, Prather et al., 2008, Sakata and Brainard, 2008). The results of this comparison could then be transmitted through the AFP to direct plasticity in the motor pathway (Doya and Sejnowski, 1995, Doupe, 1997, Brainard and Doupe, 2000a, Troyer and Doupe, 2000).
The AFP-comparison hypothesis and its variants are largely motivated by the observation of auditory responses in the AFP under anesthesia and in awake non-singing birds (Doupe, 1997, Person and Perkel, 2007, Prather et al., 2008), and even in sleeping birds (Dave and Margoliash, 2000). It should also be noted that auditory responses in song nuclei of non-singing birds are not a special property of the song learning circuit (AFP), but are also observed throughout the song motor pathway, even at the level of the syringeal motor neurons (Williams and Nottebohm, 1985). The function of such ubiquitous song-selective responses throughout these nuclei in non-singing birds is not known.
However, if comparison of ongoing song to the tutor song occurs within the AFP, one might expect AFP neurons to be sensitive to auditory/vocal errors during singing. Contrary to this prediction, responsiveness to distorted auditory feedback has not been observed during singing, either at the level of cortical inputs to Area X from HVC and LMAN (Leonardo, 2004, Kozhevnikov and Fee, 2007, Prather et al., 2008), or within the BG itself (JHG and MSF, unpublished findings). Also somewhat inconsistent with the AFP-comparison hypothesis are the observations that singing-related activity in the AFP is not altered by deafening (Hessler and Doupe, 1999b), and that presentation of song stimuli in awake animals produces primarily activation of auditory areas outside the AFP and motor nuclei (Mello et al., 1992, Mello and Clayton, 1994, Gentner and Margoliash, 2003).
The absence of auditory responses in AFP neurons during singing, together with the fact that deafening does not alter neural activity or immediate early gene expression in the AFP during singing (Jarvis and Nottebohm, 1997, Hessler and Doupe, 1999a), as well as increasing evidence that the AFP has a key premotor function in juvenile song (Kao et al., 2005, Olveczky et al., 2005, Aronov et al., 2008, Andalman and Fee, Charlesworth et al., 2011), have led us to consider an alternative model for BG-dependent song learning. We describe here a hypothesis in which Area X computes and generates, based on evaluations of song performance, a time-dependent and channel-dependent premotor signal that is transmitted to DLM to produce premotor bias in LMAN.
First, in our model, Area X is not involved in storage of the template, in the processing of auditory feedback during singing, or in evaluating the match to the song template. We hypothesize that these functions occur exclusively in auditory areas outside the traditional song system (London and Clayton, 2008, Keller and Hahnloser, 2009). Second, while our model requires that Area X receives an evaluation signal conveying the quality of ongoing song, we hypothesize that this signal is not transmitted via HVC (Troyer and Doupe, 2000, Mooney, 2004, Gale and Perkel, 2010), but rather through neuromodulatory inputs to Area X. This model is based fundamentally on the idea that Area X receives a global, time-dependent evaluation signal that indicates good or bad song performance at fast timescales (e.g. <100 ms).
While several different neuromodulators could play a role (Lewis et al., 1981, Ryan and Arnold, 1981, Castelino and Ball, 2005), the dopaminergic system is especially well suited to convey error-related signals in Area X. In mammals, dopamine (DA) neurons encode mismatch between anticipated and actual outcomes, and send reinforcement signals to the basal ganglia that are thought to regulate synaptic plasticity and direct behavioral learning in many different tasks (Schultz, 1997, Matsumoto et al., 1999, Tsai et al., 2009). In these studies, reinforcement was driven by external rewards such as food or juice. Could the DA system also play a role in internally computed rewards during motor/vocal learning?
In songbirds, DA neurons in the VTA send a massive projection to Area X (Gale et al., 2008), where they may also regulate synaptic plasticity (Ding and Perkel, 2004). Moreover, VTA receives input from descending cortical pathways (Gale et al., 2008) – including neurons in the arcopallium that may be analogous to layer 5 auditory cortical neurons (Mello et al., 1998). Auditory cortex could plausibly play a role in comparing the bird's ongoing song to the memorized song template. The results of such an evaluation could then be transmitted to VTA and then to Area X in the form of a reward prediction error (Hollerman and Schultz, 1998). It is known that VTA neurons exhibit singing-related neural activity (Yanagihara and Hessler, 2006, Hara et al., 2007), that DA levels in Area X are strongly modulated by singing (Sasaki et al., 2006), and that this dopaminergic input to Area X is necessary for normal patterns of immediate early gene expression during singing (Hara et al., 2007). We and others hypothesize that DA could have similar functions in songbirds as in mammals (Ding et al., 2003, Harding, 2004, Gale and Perkel, 2005, Kubikova and Kostal, 2010, Kubikova et al., 2010). For example, VTA neurons could provide Area X with a reward signal indicating how well the bird's own song matches the tutor memory (Gale and Perkel, 2010). Of particular interest is the possibility, which we propose here, that such a signal could take the form of a fast time-dependent reinforcement signal conveying the value of recent (e.g. <100 ms) song vocalizations.
How precisely could a reinforcement signal to the BG contribute to the acquisition of premotor bias that directs learning? Note that the concept of ‘premotor bias’ is closely related to the question: “What is the best action to select at a given moment in time (or in response to a specific cue)?” This question has been extensively studied in mammals, and it is widely hypothesized that striatal medium spiny neurons (MSNs) play a central role in promoting the selection of actions that maximize reward (Houk and Wise, 1995, Gurney et al., 2001, Bar-Gad et al., 2003). First, MSNs can respond selectively to sensory cues that require a specific action to be taken in order to obtain reward (Hikosaka et al., 1989). Second, MSNs can affect action: striatal microstimulation or optogenetic stimulation of MSNs triggers motor initiation (Alexander and DeLong, 1985, Kravitz et al., 2010). Third, MSNs may select specific actions over others. During behavior, MSNs may selectively discharge in advance of leftward, but not rightward eye movement (Kawagoe et al., 1998), or prior to specific steps in a stereotyped sequence (Barnes et al., 2005, Jin and Costa, 2010). Fourth, the activity of MSNs is strongly modulated by the history of reward associated with a given action (Kawagoe et al., 1998), and thus could promote the selection of the best one. For example, MSNs may discharge in advance of a leftward movement when it is expected to be rewarded, but not for the identical movement when it is not (Samejima et al., 2005). Finally, the behavior-locked firing of MSNs is adaptable, and can thus contribute to learning. When cue-reward contingencies are unexpectedly violated in these studies, MSNs can change their response rapidly—within a few trials—and this change in firing may precede the changed behavioral response that follows from the reversed association with the cue (Pasupathy and Miller, 2005, Watanabe and Hikosaka, 2005).
Thus MSNs appear able to detect a specific context, and to select among many actions the best one to take given the context. Importantly, their ability to alter their firing patterns during learning to bias motor output to obtain reward may depend critically on DA signaling. DA strongly regulates the synaptic plasticity at corticostriatal synapses (Calabresi et al., 2007, Kreitzer and Malenka, 2008), and it has been proposed that DA-dependent changes in synaptic weights at corticostriatal synapses give rise to changing MSN firing patterns during behavior (Bar-Gad et al., 2003, Yin et al., 2009), which, in turn, would alter firing in downstream thalamocortical circuits and lead to improved performance.
The framework, in which DA-modulated plasticity of corticostriatal synapses biases action selection (Surmeier et al., 2009), already prominent in the mammalian field, has not been extensively applied to song learning. Here we apply several concepts from this framework to the songbird model system, and suggest neural mechanisms by which MSNs in Area X could mediate the acquisition of premotor bias in the service of vocal learning. The crux of our hypothesis is that MSNs in Area X ‘monitor’ LMAN neurons to determine which of these produce variations that improve song performance. This could be done by correlating LMAN activity with a reinforcement signal. In monkeys, activity of dopaminergic midbrain neurons is correlated with rewarding stimuli (Hollerman and Schultz, 1998, Morris et al., 2004, Nakahara et al., 2004), and evidence in both birds (Ding and Perkel, 2004) and mammals (Reynolds et al., 2001, Calabresi et al., 2007, Kreitzer and Malenka, 2008) suggests that dopamine can modulate plasticity of corticostriatal synapses. Thus, a positive correlation between LMAN activity and reward could lead to a gradual strengthening of the appropriate cortical inputs onto MSNs. These strengthened inputs would then lead MSNs to spike at an appropriate time such that BG thalamocortical feedback to LMAN activates precisely those LMAN neurons whose activity led to a better vocal performance.
Consistent with LMAN's role in driving variability, LMAN neurons that project to RA discharge in a highly variable bursting pattern during singing. Importantly, the projection from LMAN to RA produces axon collaterals that project to Area X (Nixdorf-Bergweiler et al., 1995, Vates et al., 1997).1 Thus the activity of each of the ~10,000 LMAN neurons (Bottjer and Sengelaub, 1989) that drive variability in the vocal output can be directly observed by Area X. Because LMAN inputs to RA drive variability, we refer to efference copy of these inputs to Area X as ‘variability copy.’
The closed topographic loops in the LMAN→Area X→DLM→LMAN pathway are ideally suited to the computation and implementation of bias in the AFP. Because the LMAN neurons in one channel of the AFP project to a small subset of MSNs in Area X (Figure 3B), that subset of MSNs is responsible for evaluating the variations generated by their afferent LMAN neurons. If the variations produced by those LMAN neurons lead to improved song performance, the closed loop allows that subset of MSNs to feed back directly to bias the appropriate set of LMAN neurons on future song performances.
In addition to maintaining the specificity across different channels of the motor pathway, the AFP must also maintain the temporal specificity of bias. Inputs to Area X from HVC exhibit sparse and distributed firing patterns well suited to this purpose (Figure 3C). In the zebra finch, individual X-projecting HVC (HVC(X)) neurons generate one to three high-frequency bursts of spikes during each rendition of the song motif (Kozhevnikov and Fee, 2007, Prather et al., 2008). Each burst is brief (~10 ms duration), highly reliable, and precisely time-locked to one point in the song with submillisecond timing precision. The bursts of different neurons occur at different times in the song, and appear to be distributed throughout the song (Figure 3C)(Kozhevnikov and Fee, 2007). These findings demonstrate that Area X receives a precise and sparse representation of the current time in the song that could be used to compute a temporally specific bias signal and to drive LMAN at precise times in the song.
Of course, crucial to our hypothesis is that each channel within Area X must be able to compute an appropriate time-dependent bias, and thus must receive from HVC a complete representation of time in the song. Consistent with this requirement, small injections of tracer into HVC result in widespread label in Area X (Nottebohm et al., 1982) and small injections of retrograde tracer into Area X lead to neurons labeled throughout HVC (Luo et al., 2001). These observations suggest a lack of topography in the HVC→X projection. Given the additional apparent lack of topography in the temporal organization in HVC (Figure 3B) (Hahnloser et al., 2002), and the fact that these neurons can generate more than one burst at widely distributed times in the song (Kozhevnikov and Fee, 2007), it seems likely that individual subregions of Area X receive inputs from HVC that are broadly distributed in time. The fact that HVC inputs to Area X are topographically broad while the LMAN inputs are restricted highlights a striking asymmetry in the role of these two different cortical inputs to the BG (Figure 3B). We will return to discuss this later.
How are the two cortical inputs, from HVC and LMAN, integrated by MSNs? Single-unit recordings in Area X of juvenile zebra finches reveal that putative MSNs exhibit sparse spiking that is precisely time-locked to one point in the song (Figure 3D)(Goldberg and Fee, 2010). The fact that MSNs generate sparse sequential firing patterns during singing, so unlike their LMAN inputs, suggests that they may be driven largely by their HVC inputs (Figure 3D-F). The highly sparse firing patterns of MSNs are suggestive of their involvement in a temporally localized computation (Fiete et al., 2004).
Armed with the idea that Area X receives three signals, namely (1) a topographically organized ‘variability copy’ from LMAN, (2) a global representation of time from HVC, and (3) a reward prediction error from DA neurons, we can imagine a simple circuit capable of computing and driving a time-dependent bias signal within one ‘channel’ of the AFP circuit (Figure 4).
In this simple model, an LMAN neuron acts as a source of ‘noise’ by intrinsically generating a variable, bursty firing pattern (as in Figure 3A).2 This neuron projects topographically to a localized subregion of RA, where it produces deviations in some vocal parameter, for example increases in pitch. An axon collateral of this same neuron projects locally to MSNs in a subregion in Area X. We hypothesize that MSNs in this region compute whether activity in their LMAN inputs, at a particular time in the song, is correlated with a good or bad vocal outcome, as signaled by the reward input. If activity in LMAN produces a good outcome at a particular time, some MSNs begin to fire sparsely at that time, signaling the high ‘value’ of that LMAN neuron at that time. We will hypothesize that these MSNs begin to fire sparsely at a particular time because a correlation between LMAN activity and reward causes a strengthening of HVC inputs to MSNs.
The resulting activity of MSNs can then drive a temporal pattern of bias in LMAN that improves song performance. This is possible because the MSNs within this hypothesized ‘pitch channel’ converge to a small population of pallidal-like neurons that then project to a subregion of DLM, which projects back to the correct set of LMAN neurons in the pitch channel by virtue of the closed topography of the LMAN-Area X-DLM-LMAN loop (Luo et al., 2001). The striato-cortical loop drawn in Figure 5A is analogous to the classical ‘direct’ pathway of the mammalian BG, such that activation of the MSNs will result in activation of the LMAN neuron (through disinhibition of DLM). Of course, in the real RA and LMAN circuits, there are likely at least hundreds of neurons within each AFP channel, but here we consider a single model neuron to represent computation within one channel.
Let us imagine a simple song with 5 time points, and that at times 2 and 4 in the song sequence the pitch tends to be too low. To generate the correct bias, we could activate MSNs at time points 2 and 4, which will then, through DLM, bias the hypothetical ‘pitch-up’ LMAN neuron to be more active at those time points, with the effect of increasing pitch at those time points (Figure 5B). An obvious way to achieve this would be to functionally connect HVC neurons projecting to Area X (HVC(X) neurons) active at times 2 and 4 to MSNs in the ‘pitch-up channel’ (Figure 4B). In order to maintain the sparse firing of MSNs, we could strengthen the synapse from HVC(X) neuron 2 onto one MSN, and the synapse from HVC(X) neuron 4 onto another MSN.
Selective strengthening of HVC inputs to MSNs could occur using a local synaptic learning rule that operates as a function of the three hypothesized inputs to MSNs: input from one LMAN neuron generating a highly variable pattern of activity, input from one HVC(X) neuron active at a single time point T, and a global time-dependent reinforcement signal indicating the performance of the vocal output at every time-point (Figure 5C). Specifically, the correct bias could be learned by strengthening HVC-to-MSN synapses in response to a synchronous activation of the LMAN and HVC inputs that is followed by an increase in the global reinforcement signal. Examples of similar learning rules include the ‘empiric synapse’ (Fiete and Seung, 2006, Fiete et al., 2007), and reward-modulated spike-timing-dependent plasticity (STDP) (Farries and Fairhall, 2007, Izhikevich, 2007).
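The three-factor rule described above can be illustrated with a minimal sketch of the five-time-point toy song. It is only one possible reading of the rule, not an implementation from the cited models: the learning rate, the LMAN burst probability, the weight bounds, and the ±1 reinforcement values are all hypothetical choices, and the reinforcement is assumed to arrive without delay.

```python
import random

N_TIMES = 5               # time points in the toy song described above
PITCH_TOO_LOW = {2, 4}    # times at which raising pitch improves the song
LEARNING_RATE = 0.05      # hypothetical value
BURST_PROB = 0.5          # hypothetical chance of an LMAN burst at each time point


def reinforcement(t, lman_burst):
    """Toy global evaluation: +1 if a pitch-up burst helped, -1 if it hurt, 0 otherwise."""
    if not lman_burst:
        return 0.0
    return 1.0 if t in PITCH_TOO_LOW else -1.0


def train(n_songs=500, seed=0):
    rng = random.Random(seed)
    # weights[s]: synapse from HVC(X) neuron s onto an MSN in the pitch-up channel
    weights = {s: 0.0 for s in range(1, N_TIMES + 1)}
    for _ in range(n_songs):
        for t in range(1, N_TIMES + 1):
            lman_burst = rng.random() < BURST_PROB   # variability copy from LMAN
            r = reinforcement(t, lman_burst)
            for s in weights:
                hvc_active = 1.0 if s == t else 0.0  # HVC(X) neuron s bursts only at time s
                # three-factor update: HVC input x LMAN variability copy x reinforcement
                dw = LEARNING_RATE * hvc_active * float(lman_burst) * r
                weights[s] = min(1.0, max(0.0, weights[s] + dw))  # keep weights in [0, 1]
    return weights


weights = train()
biased_times = sorted(s for s, w in weights.items() if w > 0.5)
print(biased_times)
```

Under these assumptions only the HVC synapses active at times 2 and 4 are strengthened, so MSNs in the pitch-up channel would come to fire at exactly the times at which raising pitch is helpful.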
Of course there could be a substantial delay between the time an LMAN neuron produces a fluctuation in the song and the time an evaluation of that fluctuation will return to Area X as a reward signal. The combined latency of LMAN premotor activity plus auditory processing could easily be in the range of 50-100 ms (Troyer and Doupe, 2000). However, because of the very sparse activation of HVC synapses, each synapse could carry a memory, such as by synaptic tagging (Redondo and Morris, 2011), or by an ‘eligibility trace’ (Tesauro, 1992, Houk et al., 1994, Suri and Schultz, 1999) of earlier coincident activation of the HVC and LMAN inputs. Such mechanisms could solve the ‘temporal credit assignment problem’ (Sutton and Barto, 1998) emphasized in earlier models of song learning (Figure 5C) (Dave and Margoliash, 2000, Troyer and Doupe, 2000).
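How an eligibility trace could bridge the delay can be sketched as follows. The exponential form of the trace, its 100 ms time constant, the 75 ms evaluation delay (chosen inside the 50-100 ms range above), and the learning rate are illustrative assumptions rather than measured quantities.

```python
import math

TAU_MS = 100.0       # hypothetical decay time constant of the trace
DELAY_MS = 75.0      # assumed evaluation delay, inside the 50-100 ms range quoted above
LEARNING_RATE = 0.1  # hypothetical value


def eligibility(now_ms, tag_time_ms):
    """Decaying memory of a coincident HVC + LMAN activation at tag_time_ms."""
    if tag_time_ms is None or now_ms < tag_time_ms:
        return 0.0
    return math.exp(-(now_ms - tag_time_ms) / TAU_MS)


def weight_change(tag_time_ms, reward_time_ms, reward):
    """Credit a synapse in proportion to the trace it still holds when reward arrives."""
    return LEARNING_RATE * eligibility(reward_time_ms, tag_time_ms) * reward


lman_burst_ms = 200.0
reward_time_ms = lman_burst_ms + DELAY_MS  # evaluation returns to Area X after the delay

# HVC synapse whose burst coincided with the LMAN burst at 200 ms
dw_coincident = weight_change(200.0, reward_time_ms, reward=1.0)
# HVC synapse that never coincided with LMAN activity, so it carries no tag
dw_untagged = weight_change(None, reward_time_ms, reward=1.0)
# HVC synapse tagged at 300 ms, after this reward signal had already arrived
dw_too_late = weight_change(300.0, reward_time_ms, reward=1.0)
```

In this sketch only the synapse that was coincidentally active before the reward receives credit, which is the sense in which a trace addresses temporal credit assignment.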
In this simple model we have represented the basic learning rule underlying the generation of AFP bias as an increase in the strength of the HVC-to-MSN synapse rather than the LMAN-to-MSN synapse. What is the reason for this asymmetry? Selective strengthening of the HVC-to-MSN synapse has two main advantages. First, the HVC input is more reliable than the LMAN input (Olveczky et al., 2005, Kozhevnikov and Fee, 2007, Kao et al., 2008, Fujimoto et al., 2011), and strengthening the HVC synapse would cause a more consistent activation of the MSN at a specific point in time. Second, a learning rule that strengthens the HVC input would allow the possibility of incorporating a predictive element. For example, an asymmetric STDP learning rule (Farries and Fairhall, 2007) could strengthen HVC inputs that precede the activation of the LMAN neuron, thus causing the activation of the MSN before the LMAN neuron! In this way, the activation of the MSN would be early enough to propagate through DLM to LMAN to activate the LMAN neuron at the correct time.
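The predictive element can be made concrete with a toy asymmetric window. This is a sketch under stated assumptions, not the rule of Farries and Fairhall (2007): the exponential window, its 10 ms width, the burst times, and the unit reward are hypothetical.

```python
import math

TAU_MS = 10.0  # hypothetical width of the potentiation window


def stdp_gain(hvc_burst_ms, lman_burst_ms):
    """Asymmetric window: potentiate only HVC bursts that precede the LMAN burst."""
    lead_ms = lman_burst_ms - hvc_burst_ms
    if lead_ms <= 0:
        return 0.0
    return math.exp(-lead_ms / TAU_MS)


hvc_burst_times_ms = [0.0, 10.0, 20.0, 30.0, 40.0]  # one hypothetical HVC(X) neuron per time
lman_burst_ms = 30.0                                 # LMAN burst that was followed by reward
reward = 1.0

# reward-gated weight change for each HVC(X) synapse, keyed by its burst time
dw = {t: reward * stdp_gain(t, lman_burst_ms) for t in hvc_burst_times_ms}
most_strengthened = max(dw, key=dw.get)
```

With this window the most strongly potentiated input is the HVC(X) neuron that bursts shortly before the rewarded LMAN burst, so the MSN would be activated early, leaving time for the signal to propagate through DLM.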
In the hypothetical scenario above, activity of a ‘pitch up’ LMAN neuron produced a better song outcome, and thus was biased by the AFP to consistently increase song pitch. But what if activity of this LMAN neuron at a different time produces a worse outcome for the song? It might be useful for vocal learning to have a mechanism that allows Area X to suppress the firing of an LMAN neuron whose activity causes vocal errors. However, because MSNs fire so sparsely, the firing rate of MSNs in the direct pathway cannot be reduced to implement a reduction of the output of LMAN neurons. One solution to this problem could be to invoke a contribution from an indirect pathway. Indeed, there is anatomical evidence that Area X contains an indirect pathway similar to the MSN→GPe→GPi projection in primates (Farries et al., 2005), and neural recordings reveal two pallidal cell types in Area X: GPi-like neurons that project to the thalamus, and GPe-like neurons that do not (Goldberg et al., 2010). Furthermore, there is evidence for differential expression of D1 and D2 type dopamine receptors in MSNs of Area X, suggesting the possible existence of MSNs belonging to these two different pathways (Ding and Perkel, 2002, Kubikova et al., 2010) similar to the mammalian striatum (Deng et al., 2006, Kravitz et al., 2010).
In mammals, activation of MSNs in the indirect pathway is thought to cause inhibition of the GPe neurons leading to disinhibition of their target GPi neurons (Alexander and Crutcher, 1990). Thus, HVC-driven activation of indirect pathway MSNs (at a specific time) could increase GPi output at this time, resulting in suppression of activity in DLM and in the error-producing LMAN neuron. Of course, this signal could have the same channel- and temporal specificity described above for the direct pathway. Plasticity between HVC inputs and indirect pathway MSNs could be implemented with the exact same learning rule used previously, but with a negative sign, indicating that poor outcomes (‘negative reward’), rather than positive outcomes, should result in an increased synaptic weight. The specific expression of D1 and D2 receptors on direct and indirect pathway MSNs, respectively, could act to implement the pathway-specific learning rules we have proposed (Shen et al., 2008). For example, D1 receptors could promote LTP on direct pathway MSNs during phasic DA increases, and D2 receptors could promote LTP on indirect pathway MSNs during DA decreases (Calabresi et al., 2007, Hong and Hikosaka, 2011). However, it remains unknown if MSNs selectively expressing D1 or D2 receptors in Area X (Kubikova et al., 2010) are differentially connected to the hypothesized direct and indirect pathway pallidal neurons within Area X.
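The sign-flipped rule for the two pathways can be sketched as below. The function names, the learning rate, the ±1 reward prediction errors, and the simple subtraction used to combine the pathways are hypothetical simplifications introduced for illustration only.

```python
def update_pathways(w_direct, w_indirect, coincident, rpe, lr=0.1):
    """One learning step for the HVC synapses onto a direct- and an indirect-pathway MSN."""
    if coincident:  # HVC and LMAN inputs were active together
        w_direct += lr * max(rpe, 0.0)     # D1-like: potentiate when DA rises
        w_indirect += lr * max(-rpe, 0.0)  # D2-like: potentiate when DA falls
    return w_direct, w_indirect


def net_drive_to_lman(w_direct, w_indirect):
    """Direct pathway disinhibits DLM; indirect pathway suppresses it."""
    return w_direct - w_indirect


def learn(rpe, n_trials=10):
    w_d, w_i = 0.0, 0.0
    for _ in range(n_trials):
        w_d, w_i = update_pathways(w_d, w_i, coincident=True, rpe=rpe)
    return net_drive_to_lman(w_d, w_i)


bias_when_helpful = learn(rpe=+1.0)  # LMAN burst improved the song at this time
bias_when_harmful = learn(rpe=-1.0)  # LMAN burst degraded the song at this time
```

In this toy version the same coincidence rule yields a positive net drive onto the LMAN neuron at times when its activity was rewarded and a negative net drive at times when it produced errors.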
The model described so far represents just one simplified ‘channel’ of the AFP, hypothetically controlling song pitch. Of course, normal song learning requires that many aspects of motor output be learned in concert. A natural extension of the model is that a similar circuit operates in parallel in every distinct AFP channel — each controlling different muscles that affect different features of song, e.g. pitch, song amplitude or spectral entropy (Sober et al., 2008). In each hemisphere, there are approximately 3000 pallidal neurons contacting DLM neurons in a roughly 1-to-1 fashion (Luo and Perkel, 1999a, Farries et al., 2005), and these project topographically (though probably not 1-to-1) onto roughly 10,000 neurons in LMAN (Bottjer and Sengelaub, 1989, Burek et al., 1991). Interestingly, the 3000 possible independent output channels in Area X is similar to the number of primary motor neurons (Roberts et al., 2007) that topographically innervate only roughly 8 muscles on each side of the vocal organ (Vicario and Nottebohm, 1988).
One consequence of the temporal and channel specificity of MSNs we hypothesize is that there must be at least one MSN for every combination of LMAN neuron group (channel) and time in the song. Such a sparse representation in space and time would account for the large number of MSNs – roughly 400,000 in Area X of the zebra finch (Burek et al., 1991). This number of MSNs would provide sufficient coverage for roughly 100 independent temporal bins and 4,000 independent motor channels. Of course, all of these 400,000 MSNs in Area X need to converge back to control the bias of roughly 10,000 neurons in the zebra finch LMAN (Bottjer and Sengelaub, 1989). Because the number of pallidothalamic neurons in Area X is roughly similar to the number of DLM neurons and LMAN neurons, most of the convergence from MSNs back to LMAN occurs at the level of the MSN-to-pallidal projection.
It is also interesting to speculate on the possible extension of this model to the medial anterior forebrain pathway previously hypothesized (Vates et al., 1997, Jarvis et al., 1998, Kubikova et al., 2007). A small region of basal ganglia medial to Area X (mArea X) receives inputs from MMAN and HVC (Foster et al., 1997) in a manner parallel to the cortical projections into Area X (Jarvis et al., 1998), and it seems possible that medial Area X integrates these inputs in a way similar to what we have proposed for the HVC and LMAN inputs in Area X (Kubikova et al., 2007). Indeed, it is possible that the output (MMAN) of this parallel circuit functions to bias HVC during learning to produce the proper pattern of sparse bursts (Hahnloser et al., 2002, Fiete et al., 2010) or syllable sequence (Hosino and Okanoya, 2000, Jin et al., 2007, Lipkind et al., 2010, Andalman et al., 2011).
Our model shares several essential features with earlier models of mammalian BG function – for example the idea that the BG computes a correlation of cortical activity with reward and provides feedback to cortex to shape future behavior. However, a unique feature of our model is the distinct functionality we have assigned to the two different cortical inputs to Area X. Because of the functional segregation of LMAN as a variability generator and HVC as a timing generator, we suggest that the input from LMAN is a ‘variability copy’ signal allowing the BG to compute the correlation of song variations with a reward signal, and that the input from HVC is a sparse timing signal that allows the BG to compute this correlation locally at each moment in the song, and to drive a temporally specific corrective bias signal back to LMAN through the thalamus.
In our model, the HVC inputs to Area X serve a role similar to that envisioned for corticostriatal inputs by Houk and Wise (1995). In their language, these inputs represent the current ‘context’ in a motor behavior. Of course, context is extremely important in evaluating a behavior because a given behavior can yield very different outcomes in different contexts. For example, the transient activation of a particular muscle by LMAN may improve the song at one time point, but make the song worse at another.3 It is important to note that, in our model, the HVC signal transmitted to Area X may be better viewed as a ‘context’ signal, rather than an efference copy of a motor signal as has been suggested (Troyer and Doupe, 2000). It has been shown that the firing patterns of HVC(X) neurons carry little information about the particular sound produced during singing, but rather they code for a particular time point (context) in the song (Kozhevnikov and Fee, 2007, Prather et al., 2008, Fujimoto et al., 2011).
It is important to note here that Area X does not receive an efference copy of motor commands from RA, which would be analogous to the collaterals of pyramidal tract axons that project to the BG in mammals (Reiner et al., 2010).4 It only receives a copy of the signals from LMAN that drive variations in the song, and the signals from HVC that encode the timing, or context, in which those variations occurred. Thus, Area X appears not to evaluate aspects of the song motor program in RA that are driven from HVC, but rather only the variations driven from LMAN.5 The possible selective involvement of Area X in analyzing motor variations, rather than overall motor output, is a striking aspect of the organization of the song learning system.
The distinct projection patterns of HVC and LMAN (a tight topographic projection from LMAN and a topographically divergent projection from HVC) are consistent with the distinct roles of these inputs in our model. ‘Variability copy’ signals must be localized because the bias computed and generated by MSNs must be transmitted locally back to the very same variability generating circuits through a closed loop. In contrast, ‘context’ signals must be projected more divergently within the BG circuit because a great many behavioral variations might potentially be useful in any given context. Anatomical observations in mammals have revealed both divergent corticostriatal projections (Parthasarathy et al., 1992, Flaherty and Graybiel, 1995), as well as the tight closed loops formed by the projections from BG-recipient thalamus back to cortex (Alexander et al., 1986, Hoover and Strick, 1993). In addition, careful reconstruction of individual corticostriatal axons shows a large heterogeneity in projection patterns: some neurons project to very localized zones with a small number of synapses (<50), while others project to a 100-fold larger zone with >2500 synapses (Zheng and Wilson, 2002). These different projection patterns may be related to functional differences in these inputs in terms of whether they carry ‘context’ signals or ‘variability copy’ signals.
One of the remarkable features of basal ganglia organization is the massive convergence at every level from cortex to MSNs to pallidal neurons, and to thalamic neurons that project back to cortex (Bar-Gad et al., 2003). In rats, roughly three million MSNs converge onto only 30,000 pallidal output neurons and subsequently onto a similar number of thalamic neurons in the VA/VL complex (Oorschot, 1996). In humans, a similarly massive convergence from ~100 million MSNs to ~50,000 pallidal neurons is reported (Oorschot, 2010). In the context of our model, the reason for this convergence becomes apparent. If the role of Area X is to bias the variable activity of LMAN neurons, then the feedback from DLM to LMAN requires only as many channels as LMAN contains. In contrast, MSNs in Area X evaluate the performance of each LMAN channel separately at each moment in the song, which requires many more neurons. Broadly speaking, in our model, MSNs can evaluate the performance of every individual variability-generating neuron in cortex (LMAN), and do so independently in all different contexts. However, the result of this evaluation (in the form of bias) only needs to be sent back to the variability-generating circuits. Numerically, if there are N variability-generating circuits (channels) and M contexts, there will be M × N medium spiny neurons and only N thalamic channels that feed back to bias the N variability-generating circuits. In this view, convergence through the BG functions to link the very large number of possible context-motor configurations to the more limited number of motor or behavior effectors.
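The counting argument can be checked with a few lines of arithmetic. The helper `bg_counts` is a hypothetical name introduced here; the zebra finch inputs (roughly 4,000 channels and 100 temporal bins) and the rat and human cell counts are the approximate figures quoted in the text, and the one-MSN-per-pair assumption is the minimal case of the model.

```python
def bg_counts(n_channels, n_contexts):
    """Cell counts if each (channel, context) pair gets exactly one MSN."""
    n_msn = n_channels * n_contexts
    return {
        "msn": n_msn,                          # M x N medium spiny neurons
        "feedback_channels": n_channels,       # N thalamic channels back to cortex
        "convergence": n_msn // n_channels,    # MSNs per feedback channel, equal to M
    }


# Zebra finch: ~4,000 motor channels and ~100 temporal bins
finch = bg_counts(n_channels=4_000, n_contexts=100)

# MSN-to-pallidal convergence from the mammalian counts quoted above
rat_convergence = 3_000_000 / 30_000
human_convergence = 100_000_000 / 50_000
```

Here `finch["msn"]` comes to 400,000, in line with the estimated number of MSNs in Area X, and the convergence ratios are about 100 in the finch model and the rat and about 2,000 in humans.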
We have presented a highly speculative model of song learning that captures many of the observed features of the anatomy and physiology of the song system. We note that our model does not yet incorporate many observations on which much emphasis has been placed in other models of song learning (Troyer and Doupe, 2000, Doupe et al., 2004, Nottebohm and Liu, 2010). These include auditory responses in the motor and anterior forebrain pathways (Doupe, 1997, Prather et al., 2008, Sakata and Brainard, 2008), neurogenesis in HVC and Area X (Alvarez-Buylla et al., 1988, Scott and Lois, 2007), the effect of sleep on song learning and physiology (Dave and Margoliash, 2000, Deregnaucourt et al., 2005, Hahnloser et al., 2006), and the role of social context in the function of song circuitry (Jarvis et al., 1998, Hessler and Doupe, 1999b, Sasaki et al., 2006, Kao et al., 2008). Much work remains to integrate the diverse observations related to song learning. As a result, our model may be wrong in many details, or even in some of its central concepts, but it represents for us a working hypothesis that provides a basic framework for formulating future experiments. Of crucial importance is to determine if dopaminergic inputs to Area X carry online, fast reinforcement signals during singing that direct plasticity in medium spiny neurons.
Many aspects of our model, such as the role of reinforcement signals, are directly inspired by the mammalian BG literature. It is likely that concepts emerging from the study of songbirds will likewise influence studies and models of mammalian BG function. Indeed, our hypothesis for vocal learning in the songbird raises questions about parallels with mammalian motor learning. Are there specialized circuits in mammalian cortex that generate variability? Is there convergence at the level of MSNs between efference copies of variability-generating signals and inputs from sensory areas that represent contextual signals? The massive feedback from pallidothalamic circuits to frontal cortical areas, which may be involved in generating behavioral variability and flexibility, would certainly be consistent with this analogy. It is clear, of course, that ‘context’ must also be considered more broadly than sensory inputs (Houk and Wise, 1995). It must include, as is the case for HVC, sequential context within complex motor behaviors, or perhaps even higher-level context such as task rules and other executive components of behavior (Miller and Cohen, 2001).
Funding to MSF was provided by the NIH (grants #R01DC009183 and #R01MH067105), and to JHG by the NIH (grant #K99NS067062) and by Charles King Trust and Damon Runyon Research Foundation postdoctoral fellowships.
1. This pattern of projection has been likened to that of layer 3 neurons in prefrontal cortex that send an axon to motor cortex and a collateral to the striatum (Reiner et al., 2003, Jarvis, 2004).
2. In the simple one-neuron model of LMAN described here, we have treated the variability as intrinsic to the LMAN neuron. Of course, in the bird, this variability might instead be generated by circuitry intrinsic to LMAN, or receive significant contributions from other circuits, including parts of the motor pathway or AFP.
3. Here we use the term ‘context’ to refer to temporal position in a complex motor sequence, rather than to ‘social context’, which refers to whether or not the bird directs his song to a female (Immelman, 1969).
4. Interestingly, parts of the arcopallium adjacent to RA do project to striatal areas (Bottjer et al., 2000).
5. However, we note that the pathway RA→DLM→LMAN (Vates et al., 1997) is a possible route by which HVC-driven RA activity could reach and be evaluated by Area X.