To understand the neural basis of human speech control, extensive research has been done using a variety of methodologies in a range of experimental models. Nevertheless, several critical questions about learned vocal motor control still remain open. One of them is the mechanism(s) by which neurotransmitters, such as dopamine, modulate speech and song production. In this review, we bring together the two fields of investigations of dopamine action on voice control in humans and songbirds, who share similar behavioral and neural mechanisms for speech and song production. While human studies investigating the role of dopamine in speech control are limited to reports in neurological patients, research on dopaminergic modulation of bird song control has recently expanded our views on how this system might be organized. We discuss the parallels between bird song and human speech from the perspective of dopaminergic control as well as outline important differences between these species.
Production of learned vocal expressions is a rare ability that is found only in humans, bats, dolphins, elephants, and sea lions among mammals and in oscine songbirds (e.g., zebra finches, canaries, crows), parrots and hummingbirds among birds (Janik and Slater, 1997; Jurgens, 2002; Jarvis, 2004; Poole et al., 2005). Among these species, only humans have the most advanced abilities for speech and language production. Their closest relatives, non-human primates, have a very limited ability to learn new acoustic structures or sequences of vocalizations in order to produce new types of vocalizations (Jurgens, 2002; Egnor and Hauser, 2004) and, therefore, do not represent a good model for probing the brain mechanisms of learned vocal motor control (Simonyan and Horwitz, 2011). Similarly, the closest relatives of oscine songbirds, suboscine songbirds, lack the ability to learn new songs (Kroodsma and Konishi, 1991). Thus, the fundamental question of how exactly the brain controls learned voice production for speech and song remains to be addressed through the studies in vocal learning species only. While such studies in dolphins, elephants and sea lions are not always possible and feasible, research in the past years in humans, songbirds and bats has provided some insights into the similarities and differences of their vocal motor pathways, which may lay a foundation for the understanding of the specialized control of their learned voice production.
Here, we review one aspect of learned vocal motor control in humans and songbirds: its dopaminergic regulation. First, we briefly outline the definitions of learned and innate voice production. We then introduce the structural, functional and neurochemical organization of the basal ganglia, with emphasis on their involvement in human speech and bird song control. Next, we provide an overview of current findings on dopaminergic function in speech and song control in humans and songbirds, respectively, and suggest hypotheses on the role of dopamine in speaking and singing. We conclude with a discussion of the common and distinct features of learned vocal motor circuitry and its dopaminergic regulation in humans and songbirds.
The terms ‘voice’, ‘vocalization’, and ‘phonation’ are often used interchangeably, but there are fine differences between them. In a narrower usage, ‘voice’ refers only to sound production during vocal fold vibration. In this regard, ‘phonation’ is a technical term for the physical and physiological processes involved in the vocal fold vibration that leads to sound production. In a broader usage, ‘voice production’ is synonymous with human speaking and singing, which are learned (voluntary) behaviors. The term ‘speech’ usually refers to learned voice production with semantic content by humans, while human song can be produced both with and without semantic content. Among birds, song can be learned or innate depending on species, such as in oscine and suboscine songbirds, respectively. Song consists of syllables produced in a sequence separate from isolated calls. On the other hand, the term ‘voice production’ is frequently used not only in reference to speech and its elements (e.g., production of vowels, syllables) but also interchangeably with the term ‘vocalizations’ for the description of non-speech and pre-speech sounds, such as those produced by animals and human infants.
The behavioral mechanisms of vocal learning to produce human speech and bird song are thought to be similar because both have critical (or sensitive) periods for learning in early life before adulthood and both require auditory experience and feedback to develop and maintain learned vocal sequences (Doupe and Kuhl, 1999). There are, however, some fundamental differences between vocal learning birds and humans. While humans are able to learn new vocal expressions throughout the lifespan, closed-ended (age-limited) vocal learners, such as zebra finches, are not able to imitate new songs well, if at all, in adulthood, whereas the open-ended vocal learners, such as African Grey parrots, have only limited ability to imitate new sounds as adults (Catchpole and Slater, 2008). As a further distinction, the repertoire of human speech is much larger than that of most vocal learning birds. Importantly, birds use most of their songs for affective communication as opposed to the semantic communication typical of human speech. Despite these differences, learned song among vocal learning birds and speech in humans are considered to be semi-functionally analogous. A closer functional analogy, however, would be either between song in vocal learning birds and song with affective meaning in humans (i.e., singing to gain attention) or between learned sequences of calls with some higher levels of semantic complexity in some vocal learning birds (e.g., African Grey parrots) and speech in humans.
As at the behavioral level, there are similarities, but also important differences in the brain organization between humans and songbirds in general and with respect to vocal motor pathways in particular. A major common feature of the basal ganglia circuitry between humans and birds is the formation of the closed cortico-basal ganglia-thalamo-cortical loops, including parallel direct and indirect basal ganglia pathways. A major difference between humans and birds is that the latter's cortex-like structure, called pallium, does not have a six-layered cortical organization as typically present in mammals, but rather consists of clusters of nuclei. The pallial neurons have histological similarities with those found in the basal ganglia; however, pallial neuroanatomical connectivity and functional properties are similar to those of the mammalian cortex (Reiner et al., 2004; Doupe et al., 2005; Jarvis et al., 2005). Furthermore, in contrast to the human speech controlling system, the songbird pallidum, substantia nigra pars reticulata and subthalamic nucleus are not considered to be part of the bird song system.
As we stated earlier, speech production is unique to humans, while other mammals, including non-human primates, have a limited ability to produce learned vocalizations. However, much of our understanding about the anatomical organization of the basal ganglia circuitry in humans comes from extensive research in non-human primates and rodents with an assumption of fundamental anatomical similarities between these mammalian species. On the other hand, recently developed neuroimaging techniques, such as diffusion tensor tractography and transcranial magnetic stimulation, have started probing human brain anatomical connectivity, investigations of which were limited in the past due to invasiveness of most neuroanatomical methodologies. Below, we present the anatomy of the basal ganglia circuitry and its implications for speech control based largely on neuroanatomical tracing studies in non-human primates and rodents as well as on limited, to date, neuroimaging studies in humans.
The human basal ganglia are a collection of closely interconnected subcortical nuclei that include the striatum (divided into the putamen and caudate nucleus), globus pallidus (divided into the internal (GPi) and external (GPe) segments), substantia nigra (divided into the pars compacta (SNc) and pars reticulata (SNr)), and the subthalamic nucleus (STN). The limbic portion of the basal ganglia includes the nucleus accumbens, ventral pallidum (or substantia innominata), and ventral tegmental area (VTA) (Fig. 1A). From the developmental point of view, the basal ganglia are considered to consist of only telencephalic structures, the striatum and pallidum, which are modulated by the brainstem structures, the STN, SN and VTA (Swanson, 2000; Reiner et al., 2004; Jarvis et al., 2005). Functionally, the human striato-pallidal system may be divided into the dorsal and ventral portions, which are involved in regulation of motor actions in response to cognitive activities and in processing of emotional and motivational stimuli, respectively (Heimer et al., 1982; Turner and Desmurget, 2010).
Within the descending vocal motor pathway, the laryngeal motor cortex (LMC) sends the strongest of all basal ganglia projections to the putamen, as determined using neuroanatomical tract tracing in non-human primates and diffusion tensor tractography in humans (Jurgens, 1976; Simonyan and Jurgens, 2003; Simonyan et al., 2009) (Figs. 2A and 3B,E). Cytoarchitectonically, the LMC is located in the ventral primary motor cortex in humans (area 4 of Brodmann (Brodmann, 1909) or area 4p of Geyer et al. (Geyer et al., 1996)) and in the ventral premotor cortex in non-human primates (area 6 of Brodmann (Brodmann, 1909) or area 6bα of Vogt and Vogt (Vogt and Vogt, 1919), area FCBm of von Bonin and Bailey (von Bonin and Bailey, 1947), area F5 of Matelli et al. (Matelli et al., 1985), area 6VR(F5)/ProM of Paxinos et al. (Paxinos et al., 2000), area F5(6Va/Vb) of Saleem and Logothetis (Saleem and Logothetis, 2007)) (Fig. 3A,D). Mapping studies have not been able to identify the laryngeal muscle representation in the primary motor cortex (area 4) in non-human primates (Hast et al., 1974; Jurgens, 1976; Simonyan and Jurgens, 2002; Coude et al., 2011), whereas the location of the LMC in the premotor cortex (area 6) in humans is assumed but remains to be confirmed. The LMC regions in the primary motor cortex in humans and in the premotor cortex in non-human primates are considered to be homologous, with similar anatomical connectivity but different function (for review, see (Simonyan and Horwitz, 2011)). In both humans and non-human primates, the respective LMC regions have been shown to establish strong connectivity with other cortical regions involved in vocal and orofacial motor planning and initiation, such as the inferior frontal gyrus, premotor cortex, and supplementary motor area (Fig. 2A), as well as to receive a direct auditory input from the superior temporal cortex, a proprioceptive input from the primary somatosensory and inferior parietal cortices, and a vocal gating input from the cingulate cortex, among others (Jurgens, 2002; Simonyan and Jurgens, 2002, 2005a; Simonyan et al., 2009). In this review, we will focus on LMC connections contributing to the vocal basal ganglia circuitry.
The descending LMC-striatal projections, as other motor cortical-striatal projections, are thought to be excitatory (glutamatergic) and terminate in the ventromedial portion of the dorsal striatum over its entire rostro-caudal extent, as found in non-human primates (Jurgens, 1976; Simonyan and Jurgens, 2003) (Fig. 3B). Here, the striatal LMC projections form a ‘larynx’ terminal field, which is segregated in the anterior dorsal putamen (i.e., putamen rostral to the anterior commissure) and overlaps with projections from the orofacial motor cortex in the posterior putamen (i.e., putamen caudal to the anterior commissure) (Kunzle, 1975; Jurgens, 1976; Simonyan and Jurgens, 2003).
The ascending striatal-LMC projections originate directly from the substantia innominata with its embedded cholinergic basal nucleus of Meynert and amygdaloid extension, as well as indirectly from the striatum via the basal ganglia-thalamic loops (Simonyan and Jurgens, 2005b) (Fig. 3C). Hence, the LMC in non-human primates and humans establishes direct connections with different functional divisions of the basal ganglia, which are engaged in higher-order cognitive processes (i.e., anterior dorsal putamen), sensorimotor control of movement coordination and execution (i.e., posterior dorsal striatum), and attention and memory processing (i.e., basal nucleus of Meynert) (Mesulam et al., 1983; Alexander et al., 1986; Richardson and DeLong, 1988; Nakano et al., 2000).
Other basal ganglia structures, such as STN, GP and SN, also participate in the basal ganglia vocal motor circuitry (Fig. 2). As has been found in non-human primates and is considered to be the case in humans, the descending LMC projections passing through the striatum project out of the telencephalon and reach the nucleus ambiguus of the brainstem (i.e., the site of laryngeal motoneurons) through the dorsolateral SNr; the latter also projects to the parvicellular reticular formation, connected directly with the nucleus ambiguus (Fig. 2A) (Jurgens, 2002). Additionally, in contrast to non-human primates, human LMC sends direct projections to the nucleus ambiguus (Kuypers, 1958; Iwatsubo et al., 1990). This direct LMC-nucleus ambiguus connection appears to be a unique feature, possibly allowing for voluntary control of complex voice production in humans, such as during speaking and singing (Jurgens, 2002; Simonyan and Horwitz, 2011).
Functionally, the exact role (or roles) of the basal ganglia system in the control of human speech is not fully understood. The putamen and the head of the caudate nucleus are hypothesized to be involved in the control of learned vocal patterns (e.g., speech and song) but not innate emotional vocalizations (e.g., monkey calls, human laughing and crying) based on the fact that striatal lesions cause dysphonia, dysarthria and other verbal aphasias in humans but have no profound effects on monkey vocalizations (Damasio et al., 1982; Jurgens et al., 1982; Cummings, 1993; Lee et al., 1996; Nadeau and Crosson, 1997; Watkins et al., 2002; Simonyan and Jurgens, 2003). A series of neuroimaging studies in humans have further demonstrated a greater involvement of the striatum in speech initiation and production than in speech comprehension (for review, see (Price, 2010)). In addition, the putamen, together with cerebellum, has been implicated in the control of speech tempo (Hinton et al., 2004; Riecker et al., 2005; Riecker et al., 2006), while the head of the caudate nucleus was recently proposed to be involved in suppression of unintended responses (Price, 2010).
Besides the involvement of the striatum in the control of pure speech motor output, the functional importance of this region in speech control has been recently broadened by studies demonstrating its association with syntactic and semantic processing (Kotz et al., 2003; Davis and Gaskell, 2009; Price, 2010), explicit identification of linguistic aspects of emotional speech prosody (Bach et al., 2008), verbal semantic and episodic memory (Koylu et al., 2006; Ystad et al., 2010), and vocal imitation of novel speech sequences, including learning of a second language (Klein et al., 1994; Klein et al., 1995; Liu et al., 2010; Liegeois et al., 2011). Despite this recent progress, however, the question of how the putamen and caudate nucleus integrate and coordinate between all these motor and cognitive functions to contribute to the human speech controlling system remains to be elucidated.
Other basal ganglia components, such as STN and GP/SNr, have also been implicated in speech motor control. Recent studies have shown that STN neuronal activity is modulated at initiation, during onset, and following speech production (Watson and Montgomery, 2006; Hebb et al., 2011) and have suggested a role for the STN in suppression of inappropriate responses or switching within semantic and phonemic sub-categories during verbal fluency tasks (Anzak et al., 2011) as well as in processing of emotional prosody (Bruck et al., 2011). The functional role of the GP/SNr remains less clear, as only a few studies have been conducted, with some suggesting a role for GP/SNr in verbal fluency and semantic processing (Wallesch et al., 1990; Murdoch, 2001; Troster et al., 2002). Given the anatomical importance of these structures in the formation of parallel descending LMC-basal ganglia-brainstem pathways, future studies exploring their functional role(s) will be critical for a complete understanding of the brain mechanisms underlying human speech control.
Similar to humans, the songbird basal ganglia consist of the striatum, pallidum, SN and STN (Fig. 1B). However, in contrast to humans, the songbird striatum consists of lateral and medial segments, and the pallidum, SNr and STN are not considered integral components of the bird song system. Instead, songbirds have a specialized song region in their dorsal striatum, called Area X, in which the majority of medium spiny neurons (MSNs) are intermingled with rare, sparsely distributed GP-like neurons. An Area X analog is present only in song learning bird lineages (i.e., songbirds, parrots and hummingbirds), but not in any vocal non-learning birds studied to date (Kroodsma and Konishi, 1991; Feenders et al., 2008).
As is the case with the role of the human striatum in speech control, the functional importance of Area X for bird singing is not fully understood. Due to similarities in neuroanatomical and functional properties, as well as the impact of lesions on learned vocal behavior, Area X of song learning birds is considered to be analogous to the human striatum. However, because songbird Area X also has an intermingling of pallidal neurons, lesions to Area X in songbirds might be more akin to a combination of striatal and pallidal lesions in humans. The dominant view assigns Area X a critical role in song learning based on studies showing that lesions to Area X impair learning of normal acoustic and syllabic sequence organization of the bird's song during early song learning phases (Scharff and Nottebohm, 1991) but have little effect on singing in adult songbirds, which have already learned their song (Kobayashi et al., 2001). However, it is important to note that lesions to Area X may lead to transient but substantial deficits in song sequencing (or production), similar to stuttering in humans (Kobayashi et al., 2001; Kubikova et al., 2007b). Furthermore, it has been shown that Area X in adult songbirds is highly active during song production, with increased neuronal firing rates and plasticity-related gene regulation (e.g., egr1, c-fos) that depend on the social context and vocal explorations (Kimpo and Doupe, 1997; Jarvis et al., 1998; Hessler and Doupe, 1999). Specifically, when a male bird produces undirected song for vocal explorations or practice, firing rates are variable from one song bout to another, plasticity-induced gene regulation is high, and the song pitch varies considerably. When the same male bird sings directly to a female bird, neuronal firing rates in Area X are significantly lower and more stereotyped, subsequent induction of plasticity-related genes is much lower than during undirected song, and the male bird produces significantly less variable song (Jarvis et al., 1998; Hessler and Doupe, 1999; Kao et al., 2005).
Pallial control of learned song production is performed by the song-specialized HVC nucleus (letter-based name) of the nidopallium and the robust nucleus of the arcopallium (RA) (Fig. 1B). The HVC nucleus has been hypothesized to be analogous to layer III and the RA nucleus analogous to layer V of the human LMC (Jarvis, 2004). Similar to the human LMC, the songbird RA nucleus projects directly to vocal motor neurons (nXIIts) of the brainstem (Nottebohm et al., 1976), but, unlike mammalian cortical layer V neurons, has no connections with the striatum (Jarvis, 2004). Instead, the HVC nucleus has two projection cell types, one projecting to the RA nucleus and the other projecting to striatal Area X. The latter projection is the input into the anterior forebrain pathway (AFP), i.e., the pallial-basal ganglia-thalamo-pallial pathway. This loop consists of three interconnected major song nuclei: Area X projects via pallidal-like neurons to the anterior dorsolateral medial (aDLM) and dorsal intermediate posterior (DIP) nuclei of the thalamus, which project further to the pallial magnocellular nucleus of the anterior nidopallium (MAN), which closes the loop by projecting back to Area X. The lateral part of MAN (LMAN) sends output to the RA nucleus, and the medial part of MAN (MMAN) sends output to the HVC nucleus (Kubikova et al., 2007a). A balance between the variability generated by LMAN and possible consolidation generated by Area X has been proposed to enable vocal learning in songbirds (Scharff and Nottebohm, 1991; Jarvis, 2004). However, while behavioral studies suggest that LMAN induces variable patterns of song production in both juvenile and adult songbirds (Kao et al., 2005; Olveczky et al., 2005), recent studies in adult songbirds have shown that, unlike in juveniles, Area X is not needed to maintain a high level of stereotypy in a closed-ended learner (i.e., zebra finch) (Goldberg and Fee, 2011).
The basal ganglia in general, and dopaminergic transmission through the nigro-striatal pathway in particular, are critical for movement control (Graybiel et al., 1994; Swanson, 2000; Redgrave et al., 2010). As with other motor behaviors, dopaminergic modulation of human speech production is thought to be exerted through striatal dopamine release from the ventral portion of the SNc and the ventral tegmental area (VTA). Dopamine regulates the functional balance between the direct and indirect basal ganglia circuits. Moreover, due to direct projections from the SNc and VTA to the LMC as shown in non-human primates (Jurgens, 1982; Simonyan and Jurgens, 2005b) and to the GPi and STN as shown in non-human primates and rodents, respectively (Smith et al., 1989; Cragg et al., 2004) (Fig. 3C), it is likely that dopamine additionally modulates the activity of the LMC and other components of the vocal basal ganglia circuitry directly during speech production (Fig. 2A). This hypothesis, however, remains to be tested.
Within the direct basal ganglia circuitry, the mammalian striatum conveys a direct input to the GPi and SNr, while giving off branching collaterals to the GPe (Albin et al., 1989; Redgrave et al., 2010) (Fig. 2A). The majority of these striato-pallidal and striatonigral projections are GABAergic (inhibitory) with some fibers also containing enkephalin, substance P and dynorphin (Parent and Hazrati, 1995). Within the indirect basal ganglia circuitry, the striatal projections first reach the GPe, which further projects directly to the GPi and indirectly, via the STN, to the SNr (Carpenter et al., 1981a; Carpenter et al., 1981b; Smith et al., 1990; Shink et al., 1996; Smith et al., 1998). The striato-GPi fibers contain GABA and substance P; the striato-GPe fibers contain GABA and enkephalin, whereas the STN-GPi and STN-SNr fibers express glutamate (Carpenter et al., 1981a; Carpenter et al., 1981b; Smith et al., 1990; Shink et al., 1996; Smith et al., 1998). Striato-pallidal and striato-nigral pathways within both direct and indirect circuits send inhibitory (GABAergic) projections to the ventral lateral and centromedian (striatopallidal pathway) as well as ventral anterior and mediodorsal (striato-nigral pathway) thalamic nuclei (Carpenter et al., 1976; Ilinsky et al., 1985; Fenelon et al., 1990; Percheron, 2004), which further send an excitatory (glutamatergic) input to the cerebral cortex, including the LMC. The direct and indirect basal ganglia circuits interact with each other through the collaterals within the striatum (Yung et al., 1996) and the common connections with the GPi and SNr (Parent et al., 2000). The two circuits, however, have opposite effects on thalamic activity: while the direct pathway provides a positive feedback by increasing the activation of thalamo-cortical neurons, the indirect pathway exerts a negative feedback by reducing the thalamic activation of cortical neurons (Wilson, 2004).
In songbirds, similar direct and indirect pathways exist (Reiner et al., 2004; Doupe et al., 2005; Farries et al., 2005). Striatal neurons of Area X first project to GPi-type neurons sparsely located within Area X and then further to the vocal part of the thalamus (i.e., aDLM/DIP), thus forming a direct pathway. The GPe-type neurons within Area X also receive striatal input and further project to the same GPi-type neurons, which, in turn, project to the aDLM/DIP, thus forming an indirect pathway (Gale and Perkel, 2010; Goldberg et al., 2010). The aDLM sends an excitatory input to LMAN and RA, whereas DIP projects to MMAN.
In mammals and presumably in songbirds, the balance between the direct and indirect basal ganglia circuits is regulated through differential modulation of D1- and D2-family dopamine receptors by dopaminergic input from the SNc and VTA (Kubota et al., 1986; Gerfen, 2000; Ding and Perkel, 2002; Kubikova et al., 2010; Redgrave et al., 2010). In mammals, the D1-family receptors (including D1A and D1B (or D5)) are coupled to the G protein Gαs and are preferentially expressed on the MSNs, mainly within the direct pathway, whereas the D2-family receptors (including D2, D3 and D4) are coupled to the G protein Gαi and are expressed mainly within the indirect pathway (Gerfen, 1992; Surmeier et al., 1998). In songbirds, preferential expression of D1- (including D1A, D1B (or D5) and D1D) and D2-family receptors (including D2, D3 and D4) on the MSNs within the direct and indirect pathways, respectively, has also been proposed (Kubikova et al., 2010); however, more experiments are required to characterize their organization in birds. Because activation of the D1 receptors enhances and activation of the D2 receptors suppresses the activity of the MSNs, dopamine release leads to excitation of the direct circuit and to inhibition of the indirect circuit. Collectively, dopamine released within both the direct and indirect circuits reduces the inhibition of thalamo-cortical neurons, allowing facilitation of movements initiated in the motor cortex (Alexander and Crutcher, 1990; Wichmann and DeLong, 1996; Smith et al., 1998). In turn, direct striatal projections to the SNc contain GABA and, hence, inhibit the activity of the SNc in non-human primates and possibly in humans (Hedreen and DeLong, 1991; Lynd-Balta and Haber, 1994). Songbirds, however, do not appear to have a projection from the striatum to the SNc (Gale and Perkel, 2010), indicating that such a projection may not be critical for learned sequencing of their songs.
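The sign reasoning for the mammalian direct and indirect circuits described above can be sketched as a small calculation. This is only an illustrative bookkeeping of synaptic signs, not a physiological model: each connection is reduced to +1 (excitatory) or -1 (inhibitory), and the net effect of a pathway is taken as the product of its signs. The names `DIRECT`, `INDIRECT`, `net_sign` and `dopamine_effect_on_thalamus` are hypothetical, and the inhibitory sign of the GPe-to-STN link is a textbook assumption that the text above does not state explicitly.

```python
from math import prod

# Sign of each connection: +1 excitatory (glutamatergic), -1 inhibitory (GABAergic).
# Direct pathway: striatum -> GPi/SNr -> thalamus.
DIRECT = [
    ("striatum", "GPi/SNr", -1),
    ("GPi/SNr", "thalamus", -1),
]

# Indirect pathway: striatum -> GPe -> STN -> GPi/SNr -> thalamus.
INDIRECT = [
    ("striatum", "GPe", -1),
    ("GPe", "STN", -1),       # assumption: GABAergic, not stated in the text
    ("STN", "GPi/SNr", +1),   # glutamatergic
    ("GPi/SNr", "thalamus", -1),
]

# Effect of dopamine on the striatal neurons that start each pathway:
# D1 activation enhances direct-pathway MSNs, D2 activation suppresses
# indirect-pathway MSNs.
DOPAMINE_ON_MSN = {"direct": +1, "indirect": -1}


def net_sign(pathway):
    """Net effect of striatal activity on the thalamus (+1 or -1)."""
    return prod(sign for _, _, sign in pathway)


def dopamine_effect_on_thalamus():
    """Net effect of dopamine release on the thalamus through each pathway."""
    return {
        "direct": DOPAMINE_ON_MSN["direct"] * net_sign(DIRECT),
        "indirect": DOPAMINE_ON_MSN["indirect"] * net_sign(INDIRECT),
    }


if __name__ == "__main__":
    print("direct pathway, striatum -> thalamus:", net_sign(DIRECT))
    print("indirect pathway, striatum -> thalamus:", net_sign(INDIRECT))
    print("dopamine -> thalamus:", dopamine_effect_on_thalamus())
```

Under these assumptions, striatal activity facilitates the thalamus through the direct pathway and suppresses it through the indirect pathway, while dopamine acts with a positive sign through both, which is consistent with the statement that dopamine reduces the inhibition of thalamo-cortical neurons.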
In spite of several differences between anatomical and functional organization of human speech and bird song systems, our present knowledge about dopaminergic function during normal speech production is largely based on a series of studies in songbirds and a recent report in bats. To date, investigations on the functional role(s) of dopamine (and other neurotransmitters) in human speech control have not been reported in healthy subjects. On the other hand, a few case reports of patients treated with dopamine receptor antagonists have described acute drug-induced alterations of dopaminergic transmission, which temporarily disrupted normal vocal motor control, leading to the development of uncontrolled laryngeal spasms (Newton-John, 1988; Warren and Thompson, 1998). Studies in neurological and psychiatric patients with voice and speech problems, such as Parkinson's disease (PD), stuttering, Tourette's syndrome, and schizophrenia, have further hinted at the importance of proper dopaminergic modulation of speech control.
The neuropathological hallmark of PD is a significant loss of dopaminergic neurons in the SNc, resulting in a reduction of striatal dopamine release (Hornykiewicz, 2001). Along with the main PD symptoms, such as bradykinesia, muscular rigidity, tremor and disturbances of gait and posture, nearly 90% of PD patients experience severe changes of their voice quality, characterized by monotone and hypotonic speech with limited pitch range, reduced loudness, decreased accuracy of articulation, and a raspy voice (Aronson, 1980). With the advancement of disease, more than half of non-demented patients develop additional cognitive deficits affecting speech and language, such as difficulties with phonological processing, syntactic complexity, and language comprehension (Walsh and Smith, 2011). In addition, patients with advanced Parkinson's disease occasionally develop compulsive involuntary singing and humming while receiving high-dose dopamine replacement therapy (Friedman, 1993; Bonvin et al., 2007; Kataoka and Ueno, 2010). Mechanistic aspects of dopaminergic dysfunction affecting voice and speech production in PD patients remain, however, largely unclear, while pharmacological and surgical (e.g., deep brain stimulation) treatments have mild, if any, effect on speech improvement.
Stuttering is another neurological disorder that has been linked to the alterations in dopaminergic function. It has been suggested that lateralization effects of the nigrostriatal dopamine release might represent a contributing factor in the pathophysiology of this disorder, as persons who stutter exhibit three times higher uptake of a dopamine precursor in the left but not the right caudate nucleus (Wu et al., 1997). Even though these findings stem from a single report, they provide a starting point for further investigations of dopamine lateralization effects on symptom expression in neurological patients with speech disorders.
Altered dopaminergic function has further been associated with such psychiatric voice problems as vocal tics in Tourette's syndrome and auditory verbal hallucinations in schizophrenia. Dopamine levels have been reported to be abnormally increased in Tourette's syndrome (Heinz et al., 1998; Wong et al., 2008), whereas schizophrenia symptoms have been associated with both elevated subcortical and deficient cortical dopaminergic function (Thompson et al., 2009).
Hence, the evidence from neurological and psychiatric patients points to a central modulatory role for dopamine in the proper execution of motor commands and in higher-order cognitive processing associated with speech production. Clinically, characterization of the mechanistic aspects of normal and dysfunctional dopaminergic transmission could guide the identification of novel therapeutic opportunities in these patients, whose speech problems largely do not respond to currently available treatments.
In songbirds, singing has been demonstrated to induce dopamine release into Area X (Sasaki et al., 2006) (Fig. 4). Importantly, the increases in striatal dopamine levels are dependent on the social context, such that higher levels of dopamine are maintained in Area X when a male bird sings directly to a female bird compared to when a male bird produces undirected song. Such high maintenance of dopamine release during directed singing appears to occur through differential activity of the dopamine re-uptake transporter in the VTA axons within Area X. Pharmacological blockade of the transporter has been shown to lead to increased levels of dopamine release during undirected singing similar to the high dopamine levels during directed singing (Sasaki et al., 2006) (Fig. 4), while unilateral damage of the VTA input to Area X has been reported to reduce a bird's production of socially directed but not undirected songs (Hara et al., 2007).
While dopamine release or pharmacological manipulations of dopamine receptors have been shown to modulate both D1 and D2 receptors in Area X (Ding and Perkel, 2002), activation of D1 receptors has been found to have a stronger correlation with changes in song behavior (Kubikova and Kostal, 2010), triggering the difference in song variability between undirected and directed singing (Leblois et al., 2010). The activation of D1 receptors appears to suppress glutamatergic presynaptic input from HVC and LMAN onto Area X, similar to that found in mammalian nucleus accumbens (ventral striatum), but opposite to that found in mammalian dorsal striatum (Ding et al., 2003). Further, socially motivated singing behavior has been shown to correlate with D1 but not D2 receptor activation in the brainstem nuclei controlling motivation to sing (Heimovics et al., 2009). In contrast, a recent study in bats has demonstrated involvement of striatal D2 but not D1 receptors in modulation of their echolocation pulse acoustics (Tressler et al., 2011). Such differential involvement of D1 and D2 receptors in regulation of learned vocal behavior in songbirds and bats may represent vertebrate lineage differences between birds and mammals. However, the two receptor types are likely to act synergistically to some degree, as synergistic activation of D1 and D2 receptors in mammals has been found to enhance behavior more strongly than activation of either receptor type alone (Gerfen, 2000).
Based on a compilation of different proposals (Gerfen, 1992; Ding and Perkel, 2002, 2004; Kubikova et al., 2010; Leblois et al., 2010; Goldberg and Fee, 2011), we hypothesize that dopaminergic input into Area X modulates the behavioral output and subsequent gene induction in the pallial-basal-ganglia-thalamic song loop for long-term changes necessary in song behavior. A vocal learning bird develops an auditory memory of the song heard from the tutor, a so-called auditory template (Konishi, 2010). During undirected singing, the bird monitors his own song for errors to closely match it with the auditory template, while, during directed singing, the bird sings an already perfected song to attract a female (Jarvis et al., 1998; Woolley and Doupe, 2008). When, during undirected singing, the produced song matches the song heard, the low levels of dopamine release would preferentially activate D1 receptors in Area X or possibly synergistically activate D1 and D2 receptors. These would then enhance or suppress the glutamatergic input from LMAN and HVC. On the other hand, during directed singing, higher levels of dopamine release would preferentially activate D2 receptors and weaken the input from LMAN and HVC (Kubikova et al., 2010). These opposite changes in synaptic and behavioral plasticity may be due to the opposing effects of dopamine on D1 and D2 receptors, as shown in mammals (Gerfen, 2000). Validating this hypothesis requires testing whether dopamine levels during singing differ between a close match and a poor match of the produced song to the tutor song. It will also require manipulating the levels of dopamine and its receptors during song learning or modification and testing whether controlling dopamine can experimentally influence how well the song is copied.
Human speech and bird song appear to share some similar features, although several differences also exist. Most importantly, both species are capable of learning new utterances for speech and song production, respectively, in addition to production of genetically pre-programmed, innate vocalizations. Although the songbird pallium consists of groups of nuclei instead of mammalian cortical layers, some similarities in the organization of vocal neural pathways in general and basal ganglia-thalamo-cortical loops in particular exist between songbirds and humans. Within the parallel direct and indirect basal ganglia circuits, dopaminergic function appears to be critical for modulation of the neural activity of striato-thalamo-cortical circuits for complex goal-directed or context-dependent changes in speech and song output.
On the other hand, differences in the underlying mechanisms of dopaminergic influences on human speech and bird song should be acknowledged. While dopaminergic modulation of singing in songbirds has been shown to have larger effects through the striatal D1 receptors than through the D2 receptors, the more complex human speech appears to be regulated through the modulation of activity of both receptor types. More detailed investigations are required to determine the potential role of D2 receptors in bird song production as well as to identify the exact role(s) of both D1 and D2 receptors in normal and disordered human speech and language control.
Thus, due to similarities with humans in the ability to produce learned vocalizations, songbirds present a potentially useful model to study some shared mechanisms of dopaminergic transmission underlying speech and song production. However, as humans possess several unique features of speech control, human dopaminergic regulation may need to be studied in humans only. Some of these studies may be challenging in humans, however, because the experimental approaches used to reveal dopaminergic function in songbirds are invasive; such studies may therefore depend on the development of innovative methods.
This work was supported by the National Institute on Deafness and Other Communication Disorders, National Institutes of Health (DC009620 to K.S., Intramural Program to B.H.), the Howard Hughes Medical Institute and the National Institute of Mental Health (MH62083 to E.D.J.).