Vocal learning, the substrate for human language, is a rare trait found to date in only three distantly related groups of mammals (humans, bats, and cetaceans) and three distantly related groups of birds (parrots, hummingbirds, and songbirds). Brain pathways for vocal learning have been studied in the three bird groups and in humans. Here I present a hypothesis on the relationships and evolution of brain pathways for vocal learning among birds and humans. The three vocal learning bird groups each appear to have seven similar but not identical cerebral vocal nuclei distributed into two vocal pathways, one posterior and one anterior. Humans also appear to have a posterior vocal pathway, which includes projections from the face motor cortex to brainstem vocal lower motor neurons, and an anterior vocal pathway, which includes a strip of premotor cortex, the anterior basal ganglia, and the anterior thalamus. These vocal pathways are not found in vocal non-learning birds or mammals, but are similar to brain pathways used for other types of learning. Thus, I argue that if vocal learning evolved independently among birds and humans, then it did so under strong genetic constraints of a pre-existing basic neural network of the vertebrate brain.
Vocal learning is the ability to acquire vocalizations through imitation rather than instinct. It is distinct from auditory learning, which is the ability to make associations with sounds heard. An example of auditory learning is demonstrated when a dog learns to associate the words “sit” (English), “siéntese” (Spanish), or “osuwari” (Japanese) with the act of sitting. The dog understands the word, but cannot imitate the sound of the word. However, auditory learning is not solely defined in the context of one species learning to grasp the meaning of sounds of another species. For example, a young vervet monkey will make innate alarm calls to all large animals. Auditory learning is exhibited when the young animal apparently learns from adult conspecifics to refine the production of its alarm calls only to predators.1 Vocal learners, such as humans, parrots, and some songbirds, are able to imitate non-innate sounds such as sit, siéntese, or osuwari. Most vocal learners, however, imitate sounds of their own species that have been passed on through cultural transmission.2 Vocal learning depends upon auditory learning, but auditory learning does not depend on vocal learning. Vocal learners must hear the sounds they will later imitate. They must also use auditory feedback to correct their vocal output,3 apparently forming auditory memories of whether or not they matched the sounds that they are trying to imitate.
Given the above definitions, most, if not all, vertebrates are capable of auditory learning, but few are capable of vocal learning. The latter has been experimentally found to date only in three distantly related groups of mammals (humans, bats, and cetaceans) and three distantly related groups of birds (parrots, hummingbirds, and songbirds) (Fig. 1).4,5 Not all species in these groups have vocal learning abilities to the same degree. For example, humans, the most prolific vocal learners, can learn to produce a seemingly infinite number of combinations of learned vocalizations. While not as prolific, some parrots, corvid songbirds, starlings, and mockingbirds can produce hundreds if not thousands of calls and/or learned warble/song combinations. Finally, less prolific vocal learners, such as some very stereotyped songbirds and hummingbirds, produce only one distinct song type with little variation.6–8
Each of the vocal learning avian and mammalian groups has close vocal non-learning relatives (Fig. 1). Thus, it has been argued that vocal learning has evolved independently of a common ancestor in the three vocal learning bird groups4 and presumably in the three vocal learning mammalian groups. The question that thus arises is whether there is something special about the brains of these animals that can imitate sounds.
A combination of methods, which include tract tracing, Nissl staining of brain sections, lesioning of specific brain regions, electrophysiological recordings, and vocalizing-driven gene expression, has revealed that songbirds, parrots, and hummingbirds each have seven cerebral or telencephalic vocal brain nuclei that are active when they are producing learned vocalizations (Fig. 2A–C; abbreviations in Table 1).5,9–12 These brain nuclei have been given different names in each bird group because of the possibility that each group evolved them independently of a common ancestor with such nuclei.5,13 Three of the nuclei are in nearly identical brain locations in each bird group, forming a column across three brain subdivisions (Fig. 2A–C, dark grey): (1) a nucleus in the anterior striatum (parrot MMSt, hummingbird VAS, and songbird Area X); (2) one in the anterior nidopallium (parrot NAOc, hummingbird VAN, and songbird MAN); and (3) one in the anterior mesopallium (parrot MOc, hummingbird VAM, and songbird MOc-like). The other four nuclei are located more posteriorly, in different locations relative to each other for each bird group, but within comparable areas of the same cerebral subdivisions (Fig. 2A–C, white): (4) a prominent nucleus that bulges from the nidopallium into the overlying ventricle (parrot NLC, hummingbird VLN, and songbird HVC); (5) a robust appearing nucleus within the arcopallium (parrot AAC, hummingbird VA, and songbird RA); (6) a small nucleus in the nidopallium (parrot lAN, hummingbird VMN, and songbird NIf); and (7) a small nucleus near the latter in the mesopallium (parrot lAM, hummingbird VMM, and songbird Av). To date, none of these seven vocal nuclei have been found in the cerebrums of vocal non-learning birds.14–16
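The correspondence among the seven nuclei can be restated as a small lookup table. The sketch below only transcribes the names and brain subdivisions listed above; the table layout and the helper name `counterpart` are illustrative assumptions, and "counterpart" here means only "listed under the same number in the text", not a claim of homology.

```python
# Seven cerebral vocal nuclei, numbered as in the text above.
# Columns: (number, subdivision, songbird, parrot, hummingbird)
VOCAL_NUCLEI = [
    (1, "anterior striatum", "Area X", "MMSt", "VAS"),
    (2, "anterior nidopallium", "MAN", "NAOc", "VAN"),
    (3, "anterior mesopallium", "MOc-like", "MOc", "VAM"),
    (4, "nidopallium", "HVC", "NLC", "VLN"),
    (5, "arcopallium", "RA", "AAC", "VA"),
    (6, "nidopallium", "NIf", "lAN", "VMN"),
    (7, "mesopallium", "Av", "lAM", "VMM"),
]

GROUPS = ("songbird", "parrot", "hummingbird")


def counterpart(nucleus, target_group):
    """Return the nucleus listed under the same number in another bird group.

    Hypothetical helper: it matches names only, using the table above.
    """
    column = 2 + GROUPS.index(target_group)
    for row in VOCAL_NUCLEI:
        if nucleus in row[2:]:
            return row[column]
    raise KeyError(f"unknown vocal nucleus: {nucleus}")


def subdivision(nucleus):
    """Return the cerebral subdivision in which a named nucleus is listed."""
    for row in VOCAL_NUCLEI:
        if nucleus in row[2:]:
            return row[1]
    raise KeyError(f"unknown vocal nucleus: {nucleus}")
```

For example, `counterpart("HVC", "parrot")` gives `"NLC"`, and `subdivision("RA")` gives `"arcopallium"`.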
The neural connectivity between most of the seven cerebral vocal nuclei has been determined in both songbirds and parrots,13,17,18 and some connections have been determined in hummingbirds.16 In all three bird groups, the posterior nuclei (numbers 4 and 5 above) appear to be part of a posterior vocal pathway that is connected with vocal motor neurons of the brainstem. This pathway includes a projection from a nidopallial vocal nucleus (HVC, NLC, VLN) to the arcopallial vocal nucleus (RA, AAC dorsal part, VA), and from there to midbrain (DM) and medulla (nXIIts) vocal motor neurons (Fig. 2A–C, black arrows). DM and nXIIts are also present in vocal non-learning birds and are known to control production of innate vocalizations. However, vocal non-learning birds apparently do not contain projections from the arcopallium to DM and nXIIts.20 The connectivity of the other two cerebral posterior vocal nuclei (numbers 6 and 7 above) in hummingbirds is not known, and in parrots has not been well studied.12 In songbirds, one of these nuclei (NIf) projects to HVC and the other (Av) receives a projection from HVC (see Figure 2 of Reiner et al., this volume,34 pp. 90,91).21
In songbirds and parrots, the nuclei located anteriorly are part of what is called an anterior forebrain pathway loop, where the pallial vocal nucleus (MAN, NAO) projects to the striatal vocal nucleus (Area X, MMSt), the striatal vocal nucleus in turn projects to a nucleus in DLM of the dorsal thalamus (DLM, DMM), and the dorsal thalamus nucleus in turn projects back to the pallial vocal nucleus (MAN, NAO) (Fig. 2A,C, white arrows).17,18 In the songbird DLM, the anterior part has been proposed to be the vocal part of this nucleus.22 The parrot pallial MO vocal nucleus also projects to the striatal vocal nucleus (MMSt).17 Connectivity of the songbird MO analogue has not yet been determined.
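The loop structure described above can be written as a directed graph and checked for closure. This is a minimal sketch that encodes only the three projections named in the text for each group; the dictionary layout and the function name `trace_loop` are assumptions made for illustration, and each nucleus is given a single target for simplicity, which ignores the additional parrot MO projection.

```python
# Projections of the anterior forebrain pathway as named in the text:
# pallial nucleus -> striatal nucleus -> dorsal thalamic nucleus -> pallial nucleus.
ANTERIOR_PATHWAY = {
    "songbird": {"MAN": "Area X", "Area X": "DLM", "DLM": "MAN"},
    "parrot": {"NAO": "MMSt", "MMSt": "DMM", "DMM": "NAO"},
}


def trace_loop(projections, start):
    """Follow projections from `start`.

    Returns the path if it leads back to `start`, otherwise None.
    """
    path = [start]
    node = start
    # A closed loop cannot need more steps than there are projections.
    for _ in range(len(projections)):
        node = projections.get(node)
        if node is None:
            return None  # dead end: the pathway does not close
        path.append(node)
        if node == start:
            return path
    return None
```

Calling `trace_loop(ANTERIOR_PATHWAY["songbird"], "MAN")` returns `["MAN", "Area X", "DLM", "MAN"]`, the pallial-striatal-thalamic-pallial loop.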
The major differences among avian vocal learning groups, I argue, are in the connections between the posterior and anterior vocal pathways.12 Each group appears to have differences in the inputs and outputs between the two pathways. In songbirds, the anterior vocal pathway receives input into Area X of the striatum from HVC of the nidopallium; the posterior pathway’s HVC receives input from the medial portion of the anterior pathway via mMAN, and the posterior’s RA receives input from the lateral portion of the anterior pathway via lMAN (Fig. 2A, 3A).23 In contrast, in parrots, the anterior pathway receives input into its two pallial nuclei (NAO and MO) from the ventral part of arcopallial nucleus, called AACv; the posterior pathway’s NLC and AACv and AACd (the dorsal part) appear to receive input from the same region of the anterior pathway’s NAO and from DMM of the dorsal thalamus (Fig. 2B, 3B).17 One important distinction is that the parrot posterior pathway vocal nuclei do not appear to send projections to the striatal nucleus of the anterior pathway.
Vocal learning (Fig. 2A–C) and vocal non-learning birds, I argue, share similar auditory pathways. These similarities include projections from ear hair cells to cochlear ganglia neurons, to auditory pontine nuclei (CN, LL), to midbrain (MLd) and thalamic (Ov) nuclei, and to primary (L2) and secondary (L1, L3, NCM, and CM) pallial areas (see Figure 2 of Reiner et al., this volume,34 pp. 90,91). Also found in both vocal learning and non-learning birds are auditory regions in the arcopallium (vocal learners: songbird RA cup, parrot ACM, hummingbird Ai; vocal non-learner: pigeon AIVM) and in the striatum (CSt) (Fig. 2A–C).5,12,26,29 The auditory arcopallial nuclei receive input from the auditory nidopallium areas. Due to lack of detailed study on the connectivity of CSt (formerly known as PC or caudal PA) in different species, it is not yet possible to make reliable comparisons for the striatal auditory region.
The source of auditory input into the vocal pathways of vocal learning birds is also unclear. Various routes have been proposed. These include the HVC shelf into HVC, the RA cup into RA, Ov or CM into NIf, and from NIf dendrites in L2, in songbirds.27,29–31 In parrots, these include the NLC shell into NLC, and nucleus basorostralis, L1 and L3 into the comparable NIf-like nucleus lAN.13,17,32 More study is necessary to determine whether any of these regions/nuclei truly brings auditory input into the vocal pathways. The location of the vocal nuclei relative to the auditory regions suggests that there will be some differences among vocal learning groups. In songbirds, the posterior vocal nuclei are embedded in the auditory regions; in hummingbirds, they are situated more lateral, but still adjacent to the auditory regions; in parrots, they are situated far laterally and physically separate from the auditory regions (Fig. 2A–C). At a minimum, the axons connecting the vocal and auditory systems would have to take different routes in the different vocal learners.
Much like generating a consensus DNA sequence from comparing comparable genes of different species, this comparative analysis suggests that a consensus bird brain system for learned vocalizing consists of seven cerebral vocal nuclei, with at least one nucleus located in each major brain subdivision except for the hyperpallium and pallidum, with the seven nuclei distributed into two pathways: (1) a posterior pathway that projects onto lower motor neurons and (2) an anterior pathway that forms a pallial-basal ganglia-thalamic-pallial loop.
A recent forum on avian neuroanatomy led comparative neurobiologists to propose nomenclature changes in describing the avian brain and its homologies with the mammalian brain (see Reiner et al., this volume34).33,35 This new understanding of avian brain organization makes possible better-informed comparisons of vocal learning brain pathways in birds with the pathways in mammals. However, a caveat limiting such comparisons is that ethical and practical issues prevent tract-tracing experiments in humans and cetaceans. In bats, the vocal learning brain areas are not known. Thus, the connectivity of vocal learning pathways is not known for any mammal. Studies have been performed on the cerebrums of non-vocal learning mammals, such as cats, rats, and macaque monkeys. Therefore, I make connectivity comparisons between vocal learning pathways in vocal learning birds with non-vocal pathways in vocal non-learning mammals.
I argue that the songbird and parrot posterior vocal pathways are similar in connectivity to mammalian motor corticospinal pathways (Fig. 3). As has been proposed generally for the arcopallium by Karten and Shimizu,36 I more specifically suggest that projecting neurons of songbird RA and parrot AACd are similar to what has been called pyramidal tract (PT) neurons of lower layer 5 of mammalian motor cortex.37–40 The latter send long axonal projections out of the cerebrum through pyramidal tracts to synapse onto brainstem and spinal cord α-motor neurons that control muscle contraction and relaxation. As has been proposed generally for the nidopallium by Karten and Shimizu,36 I more specifically suggest that the projection neurons of parrot NLC and the RA-projecting neurons of songbird HVC are similar to layer 2 and 3 neurons of mammalian motor cortex, which send intrapallial projections to layer 5 (Fig. 3).41,42 I suggest that songbird Av of the mesopallium is also similar to layers 2 and 3 in that Av has an intrapallial connection from another motor pallial nucleus (HVC; see Figure 2 of Reiner et al., this volume,34 pp. 90,91). Mammalian parallels to songbird NIf are less clear. In mammals, primary sensory input from the thalamus projects to layer 4 of cortex; motor feedback from the thalamus projects to layer 5 (in primary motor cortex) or layer 3 (in frontal cortex).43,44 NIf is directly adjacent to L2 and like L2 it is similar to mammalian layer 4 neurons receiving thalamic input from UVa, but like the mammalian thalamic input to layer 3, UVa has motor-associated activity, firing during vocalizing.45
In humans, the only connectivity determined in a cerebral vocal brain area that I am aware of has been conducted by Kuypers.46 Using silver staining of degenerated axons in patients that had vascular strokes to brain areas that included but were not limited to face motor cortex, he found that this area of cortex projects to nucleus ambiguous (also spelled ambiguus) and the hypoglossal nucleus. He reproduced similar lesions in macaque monkeys and chimpanzees (vocal non-learners) and found that their face motor cortex projects minimally, if at all, to nucleus ambiguous, but it does project massively to the hypoglossal nucleus and to all other brainstem cranial motor nuclei as found in humans.47 Nucleus ambiguous in mammals controls muscles of the vocal organ (the larynx)48,49 much like nXIIts does in birds (the syrinx). The hypoglossal nucleus in mammals and the non-tracheosyringeal part of nXII in birds controls muscles of the tongue.19 In this manner, the pallial nuclei combined of the songbird and parrot posterior vocal pathways are more similar to the human face motor cortex than to any other part of the human pallium.
Others and I have argued that the songbird and parrot anterior vocal pathways are similar in connectivity to mammalian cortical-basal ganglia-thalamic-cortical loops (Fig. 3).17,50–52 Going further, I argue that projection neurons of songbird MAN and parrot NAO17,53,54 are similar to what have been called intratelencephalic (IT) neurons of layer 3 and upper layer 5 of mammalian premotor cortex, which send two collateral projections, one to medium spiny neurons of the striatum ventral to it and the other to other cortical regions, including motor cortex (Fig. 3).40,55 Unlike mammals, the spiny neurons in both songbird Area X, and presumably parrot MMSt, project to pallidal-like cells within the vocal nuclei Area X and MMSt instead of to a separate structure consisting only of pallidal cells.17,52,56 This striatal-pallidal cell intermingling may be a general trait of the anterior avian striatum. The projections of the pallidal-like cells of songbird Area X and parrot MMSt are similar to those of the pallidal projection neurons of the internal globus pallidus (GPi) of mammals, which project to the ventral lateral (VL) and ventral anterior (VA) nuclei of the dorsal thalamus (Fig. 3).57 Like the projections of songbird DLM and parrot DMM to lMAN and NAO, mammalian VL/VA projects back to layer 3 neurons of the premotor areas, closing parallel loops.43,57,58
Because connections between the posterior and anterior vocal pathways differ between songbirds and parrots, comparisons between them and mammals will also differ. I argue that input into mammalian anterior pathways is from collaterals of PT-layer 5 neurons of motor cortex, one projecting to the brainstem and spinal cord and the other projecting into the striatum (Fig. 3C).40,59 This is different from the songbird, where a specific cell type of HVC, called the X-projecting neuron, projects to the striatum separately from those (neurons of RA) that project to the medulla. Parrot vocal connectivity is even more different from the mammalian, in that AAC of the arcopallium has two anatomically separate neuron populations, AACd projecting to the medulla and AACv projecting to anterior pallial vocal nuclei NAO and MO.60 I argue that the outputs of mammalian anterior pathways are the collaterals of the above-mentioned IT-layer 3 and IT-upper layer 5 neurons that project to other cortical regions (Fig. 3C).40
A comparative analysis of the literature27,28,61 reveals that birds, reptiles, and mammals have relatively similar auditory pathways (Fig. 4). All three groups have ear hair cells that synapse onto sensory neurons, which project to cochlear and lemniscal nuclei of the brainstem, which in turn project to midbrain (avian MLd, reptile torus, mammalian inferior colliculus) and thalamic (avian Ov, reptile reuniens, mammalian medial geniculate) auditory nuclei. The thalamic nuclei in turn project to primary auditory cell populations in the pallium (avian L2, reptile caudal pallium, mammalian layer 4 of primary auditory cortex). For connectivity in the cerebrum, the only detailed information we have is for mammals and birds. It has been proposed that avian L1 and L3 neurons are similar to mammalian layers 2 and 3 of primary auditory cortex, the latter of which receive input, like L2, from layer 4.26,62 I suggest that avian NCM and CM are also similar to layers 2 and 3 in that they form reciprocal intrapallial connections with each other and receive some input from L2. Neurons of the songbird RA cup, parrot ACM, and pigeon AIVM of the arcopallium are similar to mammalian layers 5 and 6, which send auditory feedback projections to the shell regions of thalamic and midbrain auditory nuclei.28,29,63–65 I further suggest that avian CSt is similar to an auditory region in the mammalian striatum. Similar to birds (Fig. 2A–C), this mammalian brain region is located caudally in the striatum, is smaller in dimension than the anterior motor striatal regions, and may receive connections from the pallium/cortex.5,27,66,67 The connectivity of this region has not been verified in birds or mammals.
Taken together, the above analysis suggests that there are gross similarities between the connectivity of the consensus bird brain system for learned vocalizing and the non-vocal motor pathways (a posterior-like pathway) and cortical-basal-ganglia-thalamic-cortical loops (an anterior-like pathway) of mammals (Fig. 3A–C). It also suggests that the auditory pathways of birds, reptiles, and mammals may be homologous (Fig. 4). Differences between birds and mammals in connectivity of each of these pathways appear to be in the details, particularly at the level of the pallium. With this knowledge as a base, we can use vocal learning birds to gain some insight into the neurobiology of vocal learning pathways in humans. I first compare brain regions where lesions result in somewhat comparable behavioral deficits.
There are some gross similarities in behavioral deficits following lesions in specific brain areas of vocal learning birds (experimentally placed) and humans (due to stroke or trauma). For example, lesions to songbird HVC and RA,9,68 particularly those placed on the left side in canaries, cause deficits similar to those found after damage to human face motor cortex, this being muteness for learned vocalizations, i.e., for speech (Fig. 5A,B).69–71 Innate sounds, such as crying and screaming, can still be produced, however. When the lesions are unilateral, both birds and patients often recover some vocal behavior, because the opposite hemisphere appears to take over some function; likewise, recovery is better when the canary is a juvenile or the patient a child.72–74 Partial lesions to parrot NLC cause partial deficits in producing the correct acoustic structure of learned vocalizations, including for learned speech.75 The symptoms are similar to those of dysarthria in humans after recovery from damage to the face motor cortex. Lesions to the face motor cortex in chimpanzees and other non-human primates do not affect their ability to produce vocalizations.47,70,76 I am not aware of the effects of such forebrain lesions on vocal behavior being tested in vocal non-learning birds. Lesions to avian nXIIts and mammalian nucleus ambiguous, and large lesions to the midbrain that include bird DM and mammalian (including human) central grey, result in the inability to vocalize (muteness) in both vocal learners and non-learners.9,48,77–80 One difference between the avian posterior vocal pathways and the face motor cortex in humans is that lesions to songbird NIf or parrot lAN of the posterior pathway do not prevent production of learned vocalizations or cause dysarthric-like vocalizations, but lead to production of more varied syntax81 or impaired vocal imitation.82
Lesions to songbird MAN23,83,84 cause deficits that are most similar to those found after damage to the anterior part of the human premotor cortex, this being prevention of vocal learning and/or inducing sequencing problems (Fig. 5B). In birds and humans, such lesions do not prevent the ability to produce learned song or speech. In humans, these deficits are called verbal aphasias and verbal amusias.85 Damage to the left side often leads to verbal aphasias, whereas damage to the right often leads to verbal amusias.86 My analysis of the literature (only example references listed)69,85,87–95 indicates that these brain areas include a lateral-to-medial strip of premotor cortex from the anterior insula (aINS), Broca’s area, the anterior dorsal lateral prefrontal cortex (aDLPFC), the pre-supplementary motor area (preSMA), and the anterior cingulate (aCC) (Fig. 5B). The deficits in humans, however, are more complex. Specifically, lesions to songbird lMAN83,96 and lesions to the human insula and Broca’s85,91,94 lead to poor imitation, i.e., disrupted vocal learning, with spared maintenance of stereotyped song or speech. However, in addition, lesions to Broca’s and/or DLPFC85 lead to poor syntax production in construction of phonemes into words and words into sentences. Lesions to DLPFC also result in uncontrolled echolalia imitation, whereas lesions to pre-SMA and anterior cingulate69,88–90,92 result in spontaneous speech arrest, lack of spontaneous speech, and/or loss of emotional tone in speech, but with imitation preserved. Lesions to songbird mMAN lead to a decreased ability in vocal learning (i.e., vocal imitation) and some disruption of syntax,23 as do lesions to Broca’s. It is not clear whether lesions to mMAN lead to singing arrest or reduced emotional tone in singing similar to the effects produced by lesions of the medial anterior premotor cortex in humans.
Lesions to songbird Area X and to the human anterior striatum (the head of the caudate and putamen, Fig. 5C) do not prevent the ability to produce already learned speech, but do result in disruption of vocal learning and disruption of some syntax in birds83,97,98 or verbal aphasias and amusias in humans.91,93,99–104 More specifically, songbirds have trouble crystallizing onto correct acoustic structure of syllables and syntax heard, and as adults they stutter on syllables that were repeated in tandem before the lesions.83,97,98 In humans, the aphasias from subcortical lesions can be a combination of types found with premotor cortical lesions,91 perhaps because, as is the case in non-human mammals, large cortical areas send connections that converge onto relatively smaller striatal areas.105 Not many cases have been reported of lesions to the human globus pallidus leading to aphasias,106 but the fact that this can occur suggests some link with a striatal vocal area in humans.
Similar to a preliminary report on songbird DLM,107 damage to anterior portions of the human thalamus (VA, VL) leads to verbal aphasias (Fig. 5C).108 In humans, there is temporary muteness followed by aphasia deficits that are sometimes greater than those accompanying lesions to the anterior striatum or premotor cortical areas. This greater deficit may occur because there is further convergence of inputs from the striatum into the thalamus. The interpretation of thalamic lesions in humans, however, is not without significant controversy,85 perhaps because of small but important differences in lesion locations between patients among studies. The thalamus concentrates many functions into small adjacent nuclei, and thus, a relatively small variance in the location of a lesion may lead to a large difference in the brain function affected.
Not many studies have investigated the effects of lesions to auditory cerebral areas of birds,109–111 and therefore, only relatively rough comparisons can be made with humans and vocal non-learning mammals with such lesions. Lesions to Bengalese finch female NCM and zebra finch female CM appear to result in a significant decline in the ability to form long-term auditory discrimination memories of songs heard.111,112 This deficit has some similarity to that which results from lesions to secondary auditory cortex in mammals, where animals have more difficulty forming discriminative memories of complex sounds. In humans, bilateral combined damage to primary auditory cortex and Wernicke’s area (Fig. 5B) leads to full auditory agnosia, the inability to consciously recognize any sounds (speech, musical instruments, natural noises, etc.); it does not prevent unconscious perception of sounds.85 If the lesion is more restricted to primary auditory cortex, a patient may suffer an inability to process rapid sequencing of sounds. If restricted to Wernicke’s, the patient may suffer word-deafness, the inability to identify the sounds of speech (left side) with a preserved ability to identify nonverbal sounds, or tone-deafness (right side), called auditory aphasias or amusias, respectively. It is difficult to ascertain how non-human animals, including birds, perceive sensory stimuli, and therefore it is difficult to make comparisons with humans in regard to perceptual auditory deficits.
The lesions mentioned above that affect vocal and auditory behaviors in vocal learning birds and in humans can affect more than one modality. For example, lesions to lMAN in songbirds113,114 and to Broca’s and anterior striatum in humans85,115 lead to decreased abilities in song/speech perception and discrimination. Lesions to auditory cortices in humans lead to fluent aphasia: patients unconsciously produce long stretches of nonsensical words and sentences (and were once characterized as crazy people).85 It is not known if lesions to pallial auditory areas of vocal learning songbirds will result in fluent aphasic singing: long sequences of nonsensical song syllables and motifs. However, it is known that in parrots, lesions to pallial auditory areas do not affect the acoustic structure of already learned vocalizations.109 This is similar to the consequences of auditory cortex lesions in humans, where acoustic structure of phonemes and words are not affected.
Taken together, the evidence from behavioral speech/language deficits in humans resulting from damage to known brain areas is consistent with the presence of a posterior-like motor pathway and an anterior-like vocal learning pathway that are similar to the vocal pathways in vocal learning birds. There also appears to be a separate pathway for basic auditory processing and auditory learning in humans, as seen in vocal learning birds. The relative locations of the brain regions in humans appear to be comparable to the relative location of the three pathways in birds. The clearest difference between birds and humans appears to be the greater complexity of the deficits found after lesions in humans. I now ask whether comparisons of patterns of brain activation support the similarities between birds and humans derived from lesion results.
We can define brain activation as a relatively rapid neural change (milliseconds to minutes) that occurs during production of a behavior and/or processing of a sensory stimulus. Thus, activation includes changes in electrophysiological activity (recorded in both birds and humans, during surgery of patients for the latter), electrical stimulation (birds and humans), motor- and sensory-driven gene expression (birds and non-human mammals), and imaging of activated brain regions (in humans). Not all types of activation occur in the same brain areas and not all approaches are simple to perform. For example, expression of the most commonly studied activity-dependent gene in birds, ZENK, is associated with increased electrophysiological activity in neurons of all cerebral brain subdivisions except for those in the pallidum and in primary sensory receiving areas of the pallium (Fig. 6; e.g., L2).116,117 Accumulated ZENK expression levels occur in certain brain areas after a bird sings its learned songs for 10–30 minutes (Fig. 6). However, measurement of gene expression changes in humans after speaking is not possible, and thus, other methods to quantify activity changes in specific brain areas must be used. Two alternatives are functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) scanning. FMRI measures increased blood flow to activated brain areas, whereas PET measures increased glucose usage. It is difficult to image fMRI signals when speaking, as movement of the face causes image distortions. It is easier to do so with PET, as the signal is examined after the person speaks. A general problem with activation and vocal communication studies is that many do not separate neural changes due to vocalizing versus hearing. When an individual vocalizes, he or she also hears the vocalization, and thus activation in a given brain region, when compared to the silent condition, could be due to vocalizing, hearing, or both. Separation requires comparison of the vocalizing condition with additional control conditions where an individual hears the same sounds while not vocalizing, or vocalizes while deaf.118
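The logic of these control conditions can be sketched as a simple decision rule. The following is an illustration only: the function name, the fold-change criterion, and the numbers in the usage note are hypothetical and are not taken from any of the cited studies. It assumes a single activity measure per region (for example, a relative expression level) in a silent baseline, a hearing-only control, and a vocalizing-while-deaf control.

```python
def classify_activation(silent, hearing_only, vocalizing_deaf, fold=1.5):
    """Classify what drives activation in one brain region.

    silent          -- activity in the silent baseline (must be > 0)
    hearing_only    -- activity when hearing the sounds without vocalizing
    vocalizing_deaf -- activity when vocalizing without being able to hear
    fold            -- hypothetical fold-change over baseline counted as
                       activation; an assumption, not a published criterion
    """
    if silent <= 0:
        raise ValueError("silent baseline must be positive")
    hearing_driven = hearing_only >= fold * silent
    vocalizing_driven = vocalizing_deaf >= fold * silent
    if hearing_driven and vocalizing_driven:
        return "hearing- and vocalizing-driven"
    if vocalizing_driven:
        return "vocalizing-driven"
    if hearing_driven:
        return "hearing-driven"
    return "not activated"
```

With made-up values, a region measuring 1.0 when silent, 1.1 when only hearing, and 4.0 when vocalizing while deaf would be classified as vocalizing-driven. A comparison of vocalizing with the silent condition alone could not make that distinction.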
In vocal learning birds, activation studies have revealed that all seven comparable cerebral nuclei display ZENK expression that is vocalizing-driven and not hearing-driven;5,12,51,118 this is in part how some of the nuclei were initially identified (Fig. 6). In deafened songbirds, these nuclei still display vocalizing-driven ZENK expression when the birds sing.118 Likewise, premotor neural firing has been found in HVC, RA, NIf, lAreaX, and lMAN when songbirds sing.10,11,119,120 In deafened birds, similar singing-associated activity still occurs when a bird sings, at least for lMAN.11 The firing in HVC and RA correlates with specific features of produced songs (sequencing of syllables and syllable structure, respectively), whereas firing in lAreaX and lMAN is much more varied and no specific correlating feature has yet been found. However, firing and gene expression in lAreaX, lMAN, as well as RA differ depending upon the social context in which singing occurs (see Brainard, this volume).51,121,122 Singing directed to another bird results in lower activation in these nuclei relative to singing in an undirected manner. On the other hand, HVC appears to be active at a similar firing rate and pattern whenever the bird sings.51,123 No difference has been observed between right (the dominant) and left HVC activity during singing in zebra finches, but in song sparrows activity in the left and right HVC is associated with production of specific sequences of song syllables.124 Stimulation with electrical pulses to HVC during singing temporarily disrupts song output, i.e., song arrest.125
In comparison with the songbird HVC data, the human brain area that appears to be activated in every speech task, as measured by PET or fMRI signals, is the face motor cortex.126,127 Similar to other songbird vocal nuclei, other human brain areas appear to be activated or not depending upon the context in which speech is produced. Production of verbs and complex sentences can be accompanied by activation in all or a subregion of the strip of cortex anterior to the face motor cortex: the anterior insula, Broca’s, DLPFC, pre-SMA, and anterior cingulate.126–133 Activation in Broca’s, DLPFC, and pre-SMA appears to be higher when speech tasks are more complex, including learning to vocalize new words or sentences, sequencing words into complex syntax, producing non-stereotyped sentences, and thinking about speaking.129,134–136 The left brain vocal areas show more activity than their right counterparts during speaking.127–129,132 Only one electrophysiological recording study that I am aware of has been conducted on these regions, where premotor speech-related neural activity was found to occur in Broca’s area.137 Stimulation studies have shown that low threshold electrical stimulation to the face motor cortex, Broca’s, or the anterior supplementary areas will cause speech arrest or generation of phonemes or words.92,138–140 For other cortical areas anterior or posterior to these, higher stimulation is often required to elicit similar results.
In non-cortical areas, activation has been found in the anterior striatum141–143 and the thalamus141 during speaking. Low threshold electrical stimulation to specific ventral lateral and anterior thalamic nuclei, particularly in the left hemisphere, leads to a variety of speech responses, including word repetition, speech arrest, speech acceleration, spontaneous speech, anomia, and verbal aphasia (but also auditory aphasia).144 I have found only one study that has shown activation in the globus pallidus during speaking.130 In non-human mammals and birds, specific comparable nuclei of the midbrain (avian DM and part of mammalian periaqueductal grey (PAG)) and medulla (avian nXII, mammalian nucleus ambiguus, and other nuclei) display premotor vocalizing neural firing49,145–147 and/or vocalizing-driven gene expression.5,12,51
In vocal learning and vocal non-learning birds, hearing songs or other ethologically relevant sounds results in increased neural firing and/or hearing-driven gene expression in midbrain (MLd), thalamic (Ov), and caudal cerebral (L2, L3, L1, NCM, CM, CSt, and Ai) auditory areas, findings that have helped identify some of these regions as part of an auditory pathway (Fig. 6).5,12,51,123,148–152 Activation in the secondary auditory pallial areas NCM and CM in songbirds is higher when they hear species-specific sounds and during auditory learning.151,153–157 However, when a bird hears himself vocalize, the gene activation in these secondary pallial areas is less pronounced than when he hears other birds vocalize.51 Similarly, in both humans and non-human mammals, hearing vocalizations activates midbrain (inferior colliculus), thalamic (medial geniculate), and cerebral (primary auditory and Wernicke’s area in humans) regions that are part of the known mammalian auditory pathway.128,132,135,158–160 Similar to avian NCM and CM, activation in Wernicke’s area of humans and a Wernicke’s-like area of non-human primates is higher during hearing of species-specific vocalizations, and is less prominent when an animal or patient hears himself speak.161–163
There is some overlap with cerebral areas that show neural firing during vocalizing or hearing, and this depends upon the species. In awake male zebra finches, firing is minimal in vocal nuclei when a bird hears playbacks of song, but greater when he is anesthetized or asleep and presented with playbacks of his own song.122,124,164 In song sparrows, the reverse occurs: robust firing is observed in HVC when an awake bird hears playbacks of his own song, and this response is diminished when he is anesthetized.124 In both species, the level or number of neurons firing in vocal nuclei during hearing is lower than that during singing. In humans, the face motor cortex, Broca’s and/or the DLPFC often show increased activation when a person hears speech or is asked to perform a task that requires thinking in silent speech.127–134 The magnitude of activation is usually lower during hearing than that seen during actual speaking. The anterior insula, Broca’s, and DLPFC can also be activated by other factors, such as by engaging working memory,165,166 which is a short-term memory that is formed before committing it to long-term storage. A potential flaw of some of these studies is that they incorporate into their design a person hearing instructions in speech, or reading, which is often accompanied by subvocalizing or thinking in silent speech, or by actually speaking. Thus, the non-language task is accompanied by a language task, confounding variables. Further, other studies describe activation in language/speech areas during a non-language/speech task, but actually measure regions adjacent (more anterior or posterior) to those activated by speaking. Thus, presently it is not clear whether areas specifically activated by hearing and speaking can be activated in tasks that do not require language processing, language production, or thinking in silent speech.
Taken together, although there are differences between the results, tasks, and methods used to study brain activation during vocalizing in birds and humans, and although there are discrepancies among studies, the comparative data are consistent with the idea that songbird HVC and RA are more similar in their activation properties to face motor cortex than to any other human brain area and that songbird MAN, Area X, and the anterior part of DLM are more similar in their properties to parts of the human premotor cortex, anterior striatum, and ventral lateral/anterior thalamus, respectively.
On the basis of the comparative analyses presented in the previous pages, I will make some predictions about the organization of brain pathways for human language. I argue that, similar to the vocal learning birds, humans have at least three basic cerebral pathways used for learned vocal communication: (1) a posterior vocal pathway, (2) an anterior vocal pathway, and (3) an auditory pathway (Fig. 2D). I suggest that the human posterior vocal pathway consists of the six-layered face motor cortex, with layer 5 neurons sending projections down to the vocal part of the brainstem central grey and to nucleus ambiguus, controlling the production of speech and song. I suggest that the human anterior vocal pathway consists of a cortical-basal-ganglia-thalamic-cortical loop, connecting a strip of premotor cortex, which I tentatively call the language strip (to include part of the anterior insula, Broca’s, DLPFC, pre-SMA, and anterior cingulate), to the anterior striatum, to the ventral lateral and/or anterior nuclei of the dorsal thalamus, back to the same strip of cortex (Fig. 2D). If the human vocal pathways are similar to mammalian non-vocal pathways (Fig. 3C), then the human anterior vocal pathway would receive input from the posterior vocal pathway via axonal collaterals from PT-layer 5 neurons of the face motor cortex projecting into the anterior striatum; the human posterior vocal pathway would receive input to the face motor cortex from axonal collaterals of IT-layer 3 or upper layer 5 neurons of the language strip. The major difference between vocal learning and vocal non-learning mammals would be the absence in non-learners of these two types of pathways controlling nucleus ambiguus. The major similarity would be the presence of posterior and anterior motor pathways that control other learned sensorimotor behaviors.
I propose that the human auditory pathway follows the basic mammalian system (Fig. 4). Having a cerebral auditory area would explain why non-human mammals, including a dog, exhibit auditory learning, including learning to understand the meaning of human speech, although presumably with less facility than a human. I have no proposal yet as to how auditory information enters posterior and anterior vocal pathways of humans, as there is no consensus connectivity that can yet be deciphered from the vocal learning birds. The arcuate fibers that traverse a caudal-rostral direction in the human brain have been proposed to carry information from Wernicke’s to Broca’s,167 but this has not been demonstrated experimentally.
I suggest that major differences between vocal learning birds and humans are ones of general brain organization and degree. In terms of organization, the avian pallium is nuclear and the mammalian pallium is layered. In terms of degree, according to my hypothesis, the human brain has relatively much more pallial tissue dedicated to vocal behavior than do the brains of vocal learning birds. In addition, the human brain is orders of magnitude bigger than that of any known vocal learning bird. Humans apparently have more complex syntax, more learned rules in their speech production and perception, and much bigger bodies to control than does any vocal learning bird. Vocal learning bird species, however, are not uniform in their abilities. Some have more complex syntax and continued adult learning whereas others do not. These differences may be controlled by variance in the relative sizes of vocal nuclei and/or the amount of expression of specific genes.168,169 Nevertheless, the vocal brain similarities among the three vocal learning bird groups are striking and the similarities with humans are intriguing, leading one to wonder how such similarities in distantly related groups could have evolved.
Given that auditory pathways in avian, mammalian, and reptilian species are similar, whether or not a given species is a vocal learner, I suggest that the auditory pathway in vocal learning birds and in humans was inherited from their common stem-amniote ancestor, thought to have lived ~320 million years ago.170 For the evolution of vocal learning brain pathways among birds and in humans, I propose three alternative possibilities5: (1) the vocal system in the three vocal learning bird groups and the proposed comparable system in humans all evolved independently; (2) there was a vocal system in the common ancestor of vocal learning birds with seven cerebral nuclei, and a similar system in the common ancestor of vocal learning mammals, that were then lost multiple independent times in closely related bird and mammalian groups; or (3) most, if not all birds, mammals, and perhaps reptiles have vocal learning to various degrees, and songbirds, parrots, hummingbirds, and humans (and perhaps bats and cetaceans) independently amplified the associated brain structures for their more highly developed vocal learning behaviors.
If hypothesis 1 were true, then the evolution of brain pathways for vocal learning may be under strong epigenetic (outside the genome) constraints. The evolution of wings provides an analogy. Wings evolved independently at least four times, in birds, bats, pterosaurs (extinct flying reptiles), and insects. In each case, they evolved at the sides of the body, usually one on each side, and not one on the head, the other on the tail, or elsewhere. One hypothesis is that wings evolved in similar ways because of a strong epigenetic constraint, the environmental force of the center of gravity/mass on the body, dictating the most energetically efficient manner for flight.171 If hypothesis 2 were true, then maintenance of vocal learning may be under an environmental/social epigenetic constraint that selects against vocal learning. If hypothesis 3 were true, it would mean that many birds and mammals, and maybe reptiles, have at least primordial brain structures for vocal learning. I believe that a combination of all three hypotheses is true, where both genetic and epigenetic constraints have influenced the evolution of vocal learning.
Because the connections of the anterior and posterior vocal pathways in vocal learning birds bear some resemblance to those of non-vocal pathways in both birds and mammals, pre-existing connectivity was presumably a genetic constraint for the evolution of vocal learning.17,22,160,172 In this manner, I argue that a mutational event that caused descending projections of avian arcopallium neurons to synapse onto nXIIts, or mammalian layer 5 neurons of the face motor cortex to synapse onto nucleus ambiguus, may be the only major change that is needed to initiate a vocal learning pathway. Thereafter, other vocal brain regions could develop out of adjacent motor brain regions with pre-existing connectivity. Such a mutational event would be expected to occur in genes that regulate synaptic connectivity of upper pallial motor neurons to lower α-motor neurons. This hypothesis requires that avian non-vocal motor learning systems have up to seven nuclei distributed into two pathways in at least six brain subdivisions (the mesopallium, nidopallium, arcopallium, striatum, pallidal-like cells in the striatum, and dorsal thalamus). It would also require that mammalian non-vocal motor learning systems have brain regions distributed in two pathways involving at least four brain subdivisions (the six layers of the cortex, the striatum, the pallidum, and the dorsal thalamus). Not addressed in this view is the question of whether there is a genetic constraint for auditory information entering vocal learning pathways. Sensory processing neurons projecting into cerebral motor pathways do exist for non-vocal functions of the cerebrum, but I have not been able to extract a consensus connectivity at this time.
Is there an environmental or social factor that selects for or against vocal learning? If selection for vocal learning requires a relatively simple mutational event, then why is it not more common? My graduate student, Adriana Ferreira, and I have brainstormed on possible answers to these questions. We have come up with six possible epigenetic factors for selection for vocal learning, some of which have been previously proposed:173,174 (1) individual identification; (2) semantic communication; (3) territory defense; (4) mate attraction; (5) complex syntax; and (6) rapid adaptation to sound propagation in different environments. We believe that the first three factors cannot explain selection for vocal learning. Just as no two individuals look alike, no two individuals within a population appear to sound alike, allowing vocal non-learners to identify individual conspecifics by voice. Many vocal non-learners use calls to communicate semantic information, such as “an eagle above,” “a snake on the ground,” or “a food source,”175 whereas most vocal learners use their learned vocalizations in more affective, emotional contexts. Likewise, many vocal non-learners use their innate calls and crows to attract mates and defend territories. We believe that factors 5 and 6 in combination with factor 4 can explain selection for vocal learning. Vocal learners, but not vocal non-learners, have the ability to produce more varied syntax, either during vocal development and/or after reaching adulthood in some species. Females of some songbird species appear to prefer more varied syntax.6,174,176 Therefore, birds with the ability to produce more vocal variety are likely to be selected for this trait. For sound transmission, vocal non-learners produce their vocalizations best in specific habitats,177–180 which makes their vocal behaviors less adaptable to changes in environments. For example, a pigeon’s low frequency vocalizations travel best near the ground, while an eastern phoebe’s higher pitched vocalizations travel better higher in the air. In contrast, vocal learners have the ability to change voice characteristics, either during the lifetime of an individual or through several generations, presumably allowing better group communication in different environments.
We believe that predation is a strong selection factor against vocal learning and this may be why it is so rare. If more varied syntax is attractive to mates, it may also be more attractive to predators. As innate vocalizations tend to be more constant, they are habituated to more easily, potentially becoming part of the background noise. Therefore, in order for a predator not to habituate to the sounds of its prey, it would have to evolve a neural mechanism to overcome the natural habituation at times when it is hungry. Some findings support the prey side of this view: Okanoya has shown that Bengalese finches that have been bred in captivity without predators for the last 250 years, and without human selection for singing behavior, show more varied syntax than their white-backed munia conspecifics still living in the wild from which they derive.174 Zebra finches bred in captivity show more variation in the songs learned among adults of a colony than do their wild-type conspecifics.181 For both species, females prefer the more complex songs, including the wild munia finches preferring the more varied songs of the domesticated Bengalese finches.174,176 Given these findings, we would expect to find more syntax complexity selected for in the wild than currently exists. We argue, as does Okanoya,174 that it is predatory pressure selecting against it.
In this chapter, I have presented the sketches of a hypothesis about the evolution of vocal learning and human language. The analysis demonstrates the value of using a comparative approach to generate insight into the neurobiology of human language. Except for the proposed connectivity of the human vocal brain pathways, various parts of the hypothesis are testable. I suspect that with additional experimentation some of the details will need revision, but that the general principles will hold.
This work was funded by National Science Foundation Grant IBN0084357 and the Alan T. Waterman Award. I thank Tony Zimmerman at Duke University for assistance in editing and critical reading of the manuscript.