4.1. Phonation and articulation
Our previous fMRI study (Brown et al., 2008) established a representation of the larynx in the motor cortex, one which overlaps an area involved in voluntary control of expiration (Loucks et al., 2007; Simonyan et al., 2007). Using this motor cortex focus as a reference, we were able to demonstrate for the first time that connected speech gives its principal motor cortex activation in the larynx area, thereby supporting the notion that much of the speech signal is voiced, including all vowels and a majority of consonants. Previous neuroimaging studies of speech production have not made this point about phonation, and have instead talked about activity in the “mouth” or “face” area of the motor cortex (e.g., Fox et al., 2001), with the implication being that speech is mainly articulatory. Knowing the location of the larynx area, we were able to interpret residual activations in the motor strip as being related to articulation, mainly in the Rolandic operculum for tongue movement and the region lateral to the larynx area for lip movement. This is a first step toward a somatotopic dissection of phonation and articulation in the cortical motor system. The study of Terumitsu et al. (2006) seemed poised to make the same point, in that the authors compared phonated vs. mouthed versions of the same polysyllable string. However, their analyses did not involve a direct contrast between the voiced and unvoiced tasks, and what they called “phonation” in their ICA analysis included articulation as well as phonation, as evidenced by ICA clusters in the Rolandic operculum.
The results from the speech task closely match the findings of two voxel-based meta-analyses of overt reading. Turkeltaub et al. (2002) published an activation likelihood estimation (ALE) meta-analysis of 11 studies of oral reading, and found the regions of greatest concordance across these studies in the motor cortex to be at −48, −12, 36, and 44, −10, 34, very close to our ventromedial speech peaks at −40, −12, 30, and 44, −10, 30. Likewise, Brown et al. (2005) performed two parallel ALE meta-analyses of eight studies of oral reading in stutterers and fluent controls, respectively. The peak M1 activations for the control subjects were at −49, −9, 32, and 54, −10, 34, and those for the stutterers were at −45, −16, 31, and 48, −12, 32, again quite close to the ventromedial M1 peaks for the speech task in this study. Both of these meta-analyses identified the larynx area as a major location of activation during oral reading. They also found bilateral activations at the Rolandic operculum, very close to our fMRI tongue coordinates. Hence, the general pattern emerging from imaging studies of speech production is two major sites of activation in the motor cortex: the larynx area deep in the central sulcus, and the tongue area in the Rolandic operculum. While other processes are clearly critical for speech production – not least muscular activity in the lips, velum and pharynx – larynx and tongue activities may be the most readily identifiable ones because of their distance from one another in the motor cortex. For example, in our previous imaging study (Brown et al., 2008), lip and tongue movement showed a region of overlapping activity, although this was dorsal to the tongue-related region of the Rolandic operculum.
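To make “very close” concrete, the separations between the peaks cited above can be computed directly as Euclidean distances in stereotaxic space. A minimal sketch in Python (the left/right pairing of the Turkeltaub et al. peaks with our speech peaks is our illustrative assumption; coordinates in mm):

```python
import math

def peak_distance(a, b):
    """Euclidean distance (mm) between two stereotaxic coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Turkeltaub et al. (2002) oral-reading peaks vs. our ventromedial speech peaks
pairs = [
    ((-48, -12, 36), (-40, -12, 30)),  # left hemisphere
    ((44, -10, 34), (44, -10, 30)),    # right hemisphere
]
for meta_peak, speech_peak in pairs:
    print(f"{meta_peak} vs {speech_peak}: {peak_distance(meta_peak, speech_peak):.1f} mm")
```

On these pairings, the left-hemisphere peaks lie 10.0 mm apart and the right-hemisphere peaks 4.0 mm apart, i.e., within the spatial resolution typically attributed to group-averaged PET/fMRI data.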
Given the separability of activity in the larynx area and Rolandic operculum during simple phonation and tongue movement, respectively, and their combination during speech (in addition to presumptive activity in lip-related areas), there does seem to be a basic additivity of phonation and articulation that comes into play during speech production. Turning to the subtraction analysis, we obtained mixed results. While tongue and lip movement nicely subtracted out articulation-related activity in the motor cortex during speech, the monotone-phonation task was not very effective at subtracting out the larynx peak of speech. Interestingly, a similar result was found in the study of Murphy et al. (1997). Their contrast was better matched than ours in that they compared the vocalization of a phrase with mouth-closed vocalizing of the same phrase using the /a/ vowel. Hence, much about the melody and rhythm of the original phrase should have been contained in the unarticulated version. Their subtraction revealed bilateral peaks in the sensorimotor cortex quite close to the ventromedial larynx area. Why might the larynx activation during speech be difficult to subtract out with phonatory control tasks, especially given the efficiency of the subtraction of articulatory areas using articulatory control tasks? One speculation is that co-articulation during speech production may activate the larynx area much more strongly than tasks that involve a single articulatory posture, such as the monovowel tasks used in this study and in that of Murphy et al. (1997). Likewise, speech tasks show an oscillatory cycling between voiced and unvoiced sounds that is not seen in the control tasks. Given the overlap in the larynx coordinates between the reading and syllable-singing meta-analyses, the effect that we and Murphy et al. are seeing is most likely quantitative rather than qualitative. Further work is needed to clarify this point, not least an analysis of potential neural sub-domains within the larynx motor cortex for vocal-fold tension vs. relaxation, and abduction vs. adduction.
4.2. A neural model of vocalization
We would like to consolidate the results of the fMRI experiment and meta-analyses into a model of vocalization (Fig. 3), one that focuses on the generation of sound at the vocal source, and hence on phonation. A very similar model of vocal production is presented by Bohland and Guenther (2006), as discussed below. Both the fMRI monotone-phonation task and the 5-note phonation task used in Brown et al. (2008) were designed to be as pure a model of phonation as possible, minimizing the contribution of articulation to the brain activations. We consider the activation pattern of these tasks as a minimal model of “primary” areas for phonation, and then contrast it with data from the fMRI speech task and the two meta-analyses in order to characterize “secondary” areas that may tap more into articulation or general orofacial functioning than phonation.
Fig. 3. Key brain areas of the vocal circuit. Shaded boxes represent “primary” areas that are principal regions for the control of phonation in speaking and singing. White boxes represent “secondary” areas that are less reliably …
The primary vocal circuit consists principally of three motor areas: (1) the larynx motor cortex and associated premotor cortex; (2) lobule VI of the cerebellum; and (3) the SMA. The primary auditory areas are Heschl’s gyrus and the auditory association cortex of the posterior STG, including area Spt. Secondary vocal areas include: (1) the Rolandic operculum (the ventral part of the motor cortex, hence included with the M1/premotor box in the figure); (2) the putamen and ventral thalamus; (3) the cingulate motor area; and (4) the frontal operculum/anterior insula. In Fig. 3, primary areas are shown with shaded boxes, and secondary areas with white boxes. The connectivity model in the diagram is largely based on the connections of the larynx motor cortex in the Rhesus monkey (Simonyan & Jürgens, 2002), in which most of the areas listed are reciprocally connected with the motor cortex, the exceptions being the cerebellum and putamen, which feed back to the cortex indirectly via the ventral thalamus. Regarding auditory areas, it is not known whether they project directly to the motor cortex or whether they must pass through a relay like Broca’s area, as posited in the standard Geschwind model of speech (Catani, Jones, & ffytche, 2005). In the monkey, there is a minor projection from the posterior STG to the larynx motor cortex (Simonyan & Jürgens, 2002); hence, such a pathway could exist in humans as well. Preliminary diffusion tensor imaging work from our lab suggests that there is indeed direct connectivity between temporoparietal auditory areas and the orofacial precentral gyrus via the arcuate fasciculus (unpublished observations).
Lobule VI of the posterior cerebellum showed the highest ALE score of any brain region in the meta-analysis. Somatotopic analysis has demonstrated that this is indeed an orofacial part of the cerebellum (Grodd, Hülsmann, Lotze, Wildgruber, & Erb, 2001). We showed that this region is activated by lip movement and tongue movement as well as by vocalization. Hence, while this region seems to be an obligatory component of the vocal circuit, there is probably little about it that is voice-specific, although there may be somatotopic sub-domains for each effector within this general area. This stands in contrast to lobule VIIIA of the ventral cerebellum, which was activated by both speech and monotone-phonation but which did not show activity for lip and tongue movement (although see Watanabe et al. (2004) for activity in this region during tongue movement) or show ALE foci in either meta-analysis. Given that half of the studies in the syllable-singing meta-analysis were PET studies, and given that many older PET machines had an axial span of only 10 cm, it is likely that the ventral part of the cerebellum was cut off in many of the studies used in the meta-analysis (e.g., Brown, Martinez, Hodges, Fox, & Parsons, 2004; Brown, Martinez, & Parsons, 2006; see Petacchi, Laird, Fox, & Bower, 2005, for a discussion of this topic). Hence, lobule VIIIA may be a brain area that has been under-represented in studies of overt vocalization thus far, and it may therefore be expected to appear with greater frequency in future publications on speech and song (e.g., Bohland & Guenther, 2006; Riecker et al., 2005).
The SMA is one of a handful of brain areas which, when lesioned, can give rise to mutism, and stimulation of this area can elicit vocalization in humans but not in monkeys (Jürgens, 2002). The SMA is organized somatotopically (Fontaine, Capelle, & Duffau, 2002) but, as with lobule VI of the cerebellum, there is no information as to whether there are effector-specific zones within its somatotopic orofacial area. While the SMA is routinely activated in studies of both speech and song production, its exact role is unclear. The SMA is classically associated with activities like bimanual coordination (Carson, 2005), and stimulation of the SMA can lead to simultaneous activation of linked effectors, such as the whole arm. It is thus reasonable to presume that the SMA plays some role in the sequential coordination of effectors during vocal production, although this area is clearly activated even when single effectors such as the lips or tongue are used. In their qualitative meta-analysis of 82 studies of single-word processing, Indefrey and Levelt (2004) argued that the SMA is involved in articulatory planning. In addition, they found that the SMA was active in both covert and overt word-reading tasks. In support of this, the SMA has also been found to be active in many studies of covert singing (Callan et al., 2006; Halpern & Zatorre, 1999; Riecker, Ackermann, Wildgruber, Dogil, & Grodd, 2000a). Hence, the SMA plays some role in motor planning, motor sequencing, and/or sensorimotor integration, but its exact role in vocalization is not well understood.
The putamen gave one of the most complex profiles of any area in these analyses. No activity was seen for the fMRI monotone-phonation task, whereas there was a strong left-hemisphere focus for speech. At the same time, the putamen showed very strong ALE foci bilaterally in the syllable-singing meta-analysis and reasonably good concordance across its contributing studies, creating an inconsistency between the fMRI study and the meta-analysis. One potential resolution to this inconsistency is to posit that the putamen is more important for articulation than for phonation. In the fMRI study, we found more putamen activity for lip movement and tongue movement than for simple phonation. Likewise, many studies have shown activity in the putamen during lip movement, tongue movement, and voluntary swallowing (Corfield et al., 1999; Gerardin et al., 2003; Martin et al., 2004; Rotte, Kanowski, & Heinze, 2002; Watanabe et al., 2004). One problem with this interpretation is that damage to the basal ganglia circuit gives rise to severe dysphonia in addition to articulatory problems (Merati et al., 2005). This would seem to suggest that the basal ganglia play a direct role in phonation. It is interesting in this regard that the only major voice therapy that seems successful at ameliorating Parkinsonian dysphonia, namely Lee Silverman Voice Therapy (Ramig, Countryman, Thompson, & Horii, 1995; Ramig et al., 2001), is a phonation-based therapy that improves articulation indirectly as a by-product (Dromey, Ramig, & Johnson, 1995; Sapir, Spielman, Ramig, Story, & Fox, 2007). The role of the putamen in phonation and articulation is in need of further exploration. For the time being, we place it in the category of “secondary” areas. We do the same for the ventral thalamus. Its co-occurrence with the putamen (i.e., both were absent in the fMRI monotone task, both were present in the syllable-singing meta-analysis, and the thalamus was present only in articles that also reported putamen activation in that meta-analysis) probably reflects the connectivity of the basal ganglia, which send their output from the internal segment of the globus pallidus to anterior parts of the ventral thalamus. The cerebellum’s projection to the cerebral cortex also passes through a part of the ventral thalamus (posterior to the basal ganglia projection), and so it is unclear why there should be an absence of ventral thalamus activation in the presence of strong cerebellar activity. The thalamus showed relatively low concordance across studies in the meta-analysis.
One interesting point of reference with regard to the basal ganglia comes from the studies of Riecker et al. (2005) and Riecker, Kassubek, Gröschel, Grodd, and Ackermann (2006), which were included in the meta-analysis. These studies examined the tempo of vocalization, looking at monosyllabic /pa/ repetitions over the range of 2–6 Hz. They found that activity in lobules VI and VIIIA of the cerebellum showed positive correlations with syllable rate, whereas activity in the putamen and caudate nucleus showed negative correlations. Putamen activity decreased monotonically for speaking rates ranging from 2 to 6 Hz (Riecker, Kassubek, Gröschel, Grodd, & Ackermann, 2006); the two cerebellar regions showed the reverse pattern. If these patterns extended down to 1 Hz – the suggested production rate for our fMRI monotone task – we would expect low cerebellar and high putamen activity, yet we observed the opposite profile of high cerebellar and low putamen activity. Again, the absence of articulatory changes in our singing task may be a more important factor than tempo per se in explaining the absence of putamen activity.
The cingulate motor area (CMA) gave low concordance in the meta-analysis, and was not found to be active in the speech or monotone-phonation fMRI tasks. Unlike the larynx motor cortex, the CMA is the only cortical part of the monkey brain which, when lesioned, disrupts vocalization (Sutton, Larson, & Lindeman, 1974, but see Kirzinger & Jürgens, 1982). The projection from the cingulate cortex to the periaqueductal gray is thought to represent an ancestral vocalization pathway in primates, one that is perhaps more important for involuntary vocalizations than for voluntary ones like speech. This area may indeed be more involved in emotive vocalizations than in learned vocalizations such as speech and song in humans. It is interesting to note that almost all of the studies in the meta-analysis that showed CMA activation employed monotone tasks rather than melodic singing tasks. Hence, the CMA may have some preference for simple vocal tasks, as shown by its activation in monotone (Brown et al., 2004; Perry et al., 1999), monovowel (Sörös et al., 2006), and monosyllable (Bohland & Guenther, 2006; Riecker et al., 2006) tasks. This hypothesis is consistent with the reading study of Barrett et al. (2004), in which subjects read semantically neutral passages under conditions of either happy or sad mood induction. Regressions with affect-induced pitch range showed that the more monotonous the speech became during sad speech, the greater the activity in the CMA. The major Talairach coordinate for this regression was at −8, 18, 34, which corresponds quite well with one of the CMA coordinates from the syllable-singing meta-analysis at −2, 16, 32. CMA activity may thus be sensitive to melodic complexity, showing a preference for low-complexity vocal tasks having minimal pitch variation, which may reflect its evolution from a system involved in simple, stereotyped vocalizations. Might the CMA be the brain’s “chant” center? Further work is needed to clarify the role of the cingulate cortex in vocalization.
The frontal operculum and the medially adjacent anterior insula represent yet another difficult case for our model. As with the putamen, activity in this region was much stronger in the meta-analysis than in the fMRI monotone task. We again speculate that this area encodes generalized orofacial functions and thus might be equally involved in articulation and phonation. The fMRI study showed activity in the frontal operculum for lip movement and tongue movement comparable to that for vocalization. This casts doubt on a phonation-specific role for this region. In addition, the most typical symptom associated with damage to the anterior insula is apraxia of speech, not dysphonia alone (Jordan & Hillis, 2006; Ogar, Slama, Dronkers, Amici, & Gorno-Tempini, 2005). Hence, damage to this region is much more likely to result in articulatory deficits than phonatory ones, although the two seem to co-occur. As Ogar et al. (2005) point out: “Prosodic deficits, however, are thought to be a secondary effect of poor articulation” (p. 428). It is for these reasons that we place the frontal operculum and adjacent anterior insula in the category of “secondary” areas for vocalization. Several models of vocal production have ascribed an important role to the anterior insula in phonological processing (Ackermann & Riecker, 2004; Bohland & Guenther, 2006; Indefrey & Levelt, 2004; Riecker et al., 2005). In their meta-analysis, Indefrey and Levelt (2004) associated the anterior insula most strongly with “phonological code retrieval”, the process of searching for phonological words that match a lexically selected item. They found less evidence for a role of the anterior insula in actual speech production, a result counter to the perspective of Ackermann and Riecker (2004). Riecker et al. (2006) found that activity in the insula increased monotonically with syllable rate, hence showing a profile similar to that of the cerebellum (as well as the larynx motor cortex and SMA). So the frontal operculum/anterior insula is almost certainly a vocal-motor area, but its exact role is in need of further analysis.
The model in Fig. 3 shows striking similarities with the “basic speech production network” proposed by Bohland and Guenther (2006), which includes all of the areas mentioned here. In fact, there is no region of disagreement between our model and theirs. Perhaps the only motivational difference relates to our goal of defining a network of vocal production based on phonation, leading to our distinction between primary and secondary areas for vocalization. Their model was based on a series of syllable tasks, ranging from simple to complex trisyllables. Hence, articulation was an important component of all of their tasks. It is possible that a task based on vowels alone would yield different results. For example, the vowel production task of Perry et al. (1999) failed to show activity in some of the areas that we have speculated to be associated with articulation (e.g., the putamen) but did show activity in others (the Rolandic operculum and frontal operculum), whereas the vowel production task of Sörös et al. (2006) failed to show activity in the Rolandic operculum but did show activity in the putamen and frontal operculum. Further work is clearly needed to verify the phonation network postulated in our primary areas.