Two primary areas of damage have been implicated in apraxia of speech (AOS) based on the time post-stroke: (1) the left inferior frontal gyrus (IFG) in acute patients, and (2) the left anterior insula (aIns) in chronic patients. While AOS is widely characterized as a disorder in motor speech planning, little is known about the specific contributions of each of these regions in speech. The purpose of this study was to investigate cortical activation during speech production with a specific focus on the aIns and the IFG in normal adults. While undergoing sparse fMRI, 30 normal adults completed a 30-minute speech-repetition task consisting of three-syllable nonwords that contained either (a) English (native) syllables or (b) non-English (novel) syllables. When the novel syllable productions were compared to the native syllable productions, greater neural activation was observed in the aIns and IFG, particularly during the first 10 minutes of the task when novelty was the greatest. Although activation in the aIns remained high throughout the task for novel productions, greater activation was clearly demonstrated when the initial 10 minutes were compared to the final 10 minutes of the task. These results suggest increased activity within an extensive neural network, including the aIns and IFG, when the motor speech system is taxed, such as during the production of novel speech. We speculate that the amount of left aIns recruitment during speech production may be related to the internal construction of the motor speech unit such that the degree of novelty/automaticity would result in greater or lesser demands, respectively. The role of the IFG as a storehouse and integrative processor for previously acquired routines is also discussed.
Although speech production is supported by an extensive neural network, the specific contributions of the cortical areas within this network are not fully understood. For example, lesion studies can provide important information about how the brain supports speech, but certain limitations are inherent in patient data. Acquired apraxia of speech (AOS) is a motor speech disorder that most commonly results from stroke (Ogar et al., 2005). It has been conceptualized as a deficit in speech motor programming indicative of impairment in transforming the linguistic code into a coordinated motor code with specific goals for articulation of the phonological representation (Ballard, Granier, & Robin, 2000; Ballard & Robin, 2007; Deger & Ziegler, 2002; McNeil et al., 1997; Spencer & Rogers, 2005). The primary clinical characteristics of AOS include: 1) slow rate of speech, 2) sound distortions, 3) distorted, perceived sound substitutions, 4) consistent errors in terms of type and location, and 5) prosodic abnormalities (Wambaugh et al., 2006). Collectively, lesion studies including patients with AOS have revealed conflicting evidence regarding the brain regions crucial for motor speech planning. A number of regions have been associated with AOS, but the two primary areas of damage that have been implicated are: (1) the left inferior frontal gyrus (IFG) (Broca, 1865; Hillis et al., 2004; Mohr, 1978), and (2) the left anterior insula (aIns) (Dronkers, 1996; Nagao et al., 1999; Ogar et al., 2006). We propose a simple method to reconcile these apparently contradictory findings. Specifically, we note that AOS has been associated with damage to the aIns and IFG in chronic patients, but primarily to the IFG in acute patients. Based on this observation, we suggest that the IFG is primarily involved with well-rehearsed speech, so injury leads to immediate deficits. 
In contrast, we hypothesize that the aIns becomes more involved with generating novel speech, and therefore offers a redundancy for rehabilitation. If so, patients with damage restricted to the IFG may have a better long-term prognosis for recovery of function through utilization of the aIns, while patients with damage to both areas may be less likely to recover.
Damage in the left aIns has been linked to AOS in chronic stroke (Dronkers, 1996; Nagao et al., 1999; Ogar et al., 2006), primary progressive AOS (Nester et al., 2003), and reduced fluency in aphasia (Bates et al., 2003; Borovsky et al., 2007). However, several functional magnetic resonance imaging (fMRI) studies have revealed increased insular activity associated with overt speech (compared to covert speech) that was not modulated by length or complexity (Ackermann et al., 2004; Shuster & Lemieux, 2004). Such findings might suggest that the insula is not central to motor planning, but may be involved in motor coordination or sensory integration/feedback (Dogil et al., 2001). However, Bohland and Guenther (2006) found that the complexity of speech production was related to increased activity in the aIns, in addition to the IFG.
In contrast to lesion studies in chronic patients, Hillis et al. (2004) found that AOS was related to a compromised (either frank lesion or related hypoperfusion) left IFG in acute stroke regardless of Ins involvement. No relationship was found between involvement of the insular cortex and AOS. They postulated that the central location of the insular cortex within the middle cerebral artery (MCA) distribution makes it susceptible to damage in large MCA infarcts (Finley et al., 2003) when there is also structural damage and/or functional compromise within the inferior frontal lobe. Small lesions tend to result in transient AOS (Mohr, 1978; Marien et al., 2001); however, individuals with larger lesions are more likely to have persistent, chronic AOS. Therefore, large MCA lesions, and thus possible Ins damage, may be overrepresented in chronic AOS. Unfortunately, findings from neurodegenerative cases of AOS are comparably inconclusive, as they provide support for either the left aIns (Nester et al., 2003), the IFG (Broussolle et al., 1996), or both (Gorno-Tempini et al., 2004). However, the crucial importance of the IFG in speech production has been supported by a number of studies using repetitive transcranial magnetic stimulation (rTMS) in normal participants, which allows transient inhibition of a selected region. Several studies have reported speech disruptions and speech arrest when rTMS is applied not only to the facial motor cortex but also to the IFG (Pascual-Leone, Gates, & Dhuna, 1991; Epstein, 1998; Epstein et al., 1999; Stewart, Walsh, Frith, & Rothwell, 2001; Aziz-Zadeh, Cattaneo, Rochat, & Rizzolatti, 2005).
On the other hand, the role of the aIns in normal speech production remains somewhat muddled. While a number of neuroimaging studies have shown aIns activation during speech (Bohland & Guenther, 2006; Kuriki et al., 1999; Tourville et al., 2008; Wise et al., 1999), others have revealed a lack of aIns activation (Bonilha et al., 2006; Frenck-Mestre et al., 2005; Murphy et al., 1997; Riecker et al., 2000; Soros et al., 2006). For example, Bonilha et al. (2006) did not find neural activity in the aIns associated with the production of bisyllabic nonwords. Instead, significant Ins activation was found during the production of non-speech oral movements, while significant IFG activation was found during sublexical speech. The fact that the IFG was uniquely activated during the speech condition suggests that the IFG may be responsible for motor sequencing in speech, as has been suggested in previous studies (Borovsky et al., 2007; Nishitani et al., 2005). However, the potential role of the Ins in the non-speech productions is unclear.
The IFG and aIns possibly play complementary roles in speech production, which may explain the seemingly contradictory evidence in the patient literature based on time post-onset. One possible explanation for the Bonilha et al. (2006) findings may be related to the intrinsic nature of the two tasks. While the production of native speech syllables is highly over-learned, the execution of contrasting non-speech oral motor movements is comparatively unpracticed. Therefore, Ins recruitment during the non-speech condition may have been related to the novelty (or lack of automaticity) for the task. Consistent with this hypothesis, Soros et al. (2006) suggested that involvement of the aIns may be reflective of the repetitiveness of the speech task. That is, the aIns may show greater involvement when speech is varying as opposed to repetitive. This is interesting in light of the conflicting evidence regarding lesion location in chronic and acute cases of AOS. According to the Directions Into Velocities of Articulators (DIVA) model, the inferior frontal cortex stores phonetic codes for specific syllable productions (Guenther, 2006; Guenther, Ghosh, Tourville, 2006). Such phonetic codes could be viewed as simple motor plans that can be linked together to produce more complex plans for the articulation of words and sentences. If so, then damage to this area would hinder access to these motor plans and integration of these plans into larger plans, thus compromising the translation of the phonological representation/auditory maps into an articulatory plan. Although the specific role of the aIns remains elusive, we speculate that both the pars opercularis of the IFG (OpIFG) and aIns contribute to translation of auditory maps into motor commands for speech execution, albeit in different ways. 
In accord with the dual-route model (Levelt & Wheeldon, 1994; Varley, Whiteside, & Luff, 1999; Varley & Whiteside, 2001), it is plausible that the IFG may be implicated in routine speech planning; however, when this area is damaged, other cortical regions, such as the insular cortex, then assume an important role in compensating for this damage. That is, the aIns may be crucial in learning/relearning motor productions and sequences when the IFG is compromised. If so, the integrity of the insular cortex may be an important diagnostic indicator in the recovery of AOS.
A study utilizing the production of native and non-native speech syllables in normal participants provides a unique and straightforward means for investigating the role of novelty/automaticity related to left aIns recruitment in speech. Accordingly, the purpose of this study was to investigate cortical activation during speech production with a specific focus on the aIns and the IFG in normal adults. Based on our theoretical framework, neural activity in the left IFG was expected during speech production regardless of the novelty of the utterance. However, when normal individuals attempt to produce novel speech sounds that are not a part of their native repertoire, additional brain regions may be recruited to support the task beyond those areas that would be active when producing native speech. Thus, it was hypothesized that the production of novel speech syllables would initially recruit areas beyond what was observed for the native syllables, including increased activity in the aIns. If the aIns is important for acquisition of new/unfamiliar motor productions and sequences, greater insular activation should be revealed during novel syllable productions compared to native productions. In addition, after repeated exposure during the task, the neural activity associated with the novel syllables should theoretically converge with the activation seen during the production of native syllables. Thus, we predicted a decrease in aIns activation for novel productions over time.
Participants included 30 (13 male/17 female) right-handed, native-English speakers between 18–35 years of age (mean=22.8 years, S.D.=4.7). Individuals were excluded from participation if a screening revealed a history of neurological disease/injury, psychiatric disorder, or contraindications for MRI scanning (e.g., metal implants, pacemakers, pregnancy, and claustrophobia). Additional exclusion criteria included a history of speech-language difficulties (e.g., disorders and/or delays in articulation, fluency, resonance, or language), gross structural abnormalities of the articulators (e.g., cleft palate or severe malocclusions), or hearing impairment. All participants demonstrated visual acuity sufficient for viewing the visual stimuli.
During a 30-minute speech-repetition task, participants observed audiovisual movies of 3-syllable non-words generated by a single speaker (showing only the lower portion of the face below the nose) and were asked to repeat each production immediately following its audio-visual presentation. The non-words were equally divided between two conditions based on syllable type: (1) English (native) or (2) Non-English (novel). Non-words (sublexical syllable strings) were created for this task instead of using real words in an attempt to isolate motor speech from lexical processing and to make the conditions equal, as the inclusion of novel sounds would inherently render a non-word. A total of twelve CV syllables (6 English phonemes/6 non-English phonemes combined with the vowel //) were used to create the stimuli (see Table 1). In order to balance across the two conditions, the targeted native and novel syllables were selected from the International Phonetic Alphabet based on shared phonetic features so that each of the 6 pairs only differed by one distinctive feature (i.e., manner, place, or voice). For the novel condition, only syllables that were distinct from any common dialectal variant of English were used. The 3-syllable non-words were constructed for each condition by either repeating one of the target syllables three times or positioning it between two other syllables to provide two articulatory environments for each target syllable. For the purpose of fMRI data analysis, a baseline condition was also included, which consisted of a still frame presentation of a neutral mouth paired with pink noise. Participants were instructed to do nothing during these trials.
Within the entire fMRI run, participants received a total of 180 trials (72 English, 72 novel, and 36 baseline trials). The order of presentation was pseudo-randomized and counterbalanced into 3 segments to provide an event-related paradigm in which each stimulus was presented the same number of times within the first, middle, and last third of the 30-minute task. Therefore, participants heard 24 English, 24 novel, and 12 baseline trials during each third of the task. Each stimulus presentation lasted 2.5 s, followed by a jittered inter-stimulus interval (ISI) with a mean of 7 s (± 2 s). Immediately before each stimulus presentation, a fixation point appeared for 0.5 s to alert the participant of the upcoming trial. The paradigm was created using E-Prime 2.0 BETA software (2006) and was presented via DLP projection to a mirror mounted on the MRI scanner head coil, along with SereneSound headphones (www.mrivideo.com). Immediately preceding the initiation of the experimental task, each participant completed a 1.6-minute practice run with untargeted exemplars from each condition to allow familiarization to the task without exposure to the experimental trials.
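The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_schedule`, the seed, and the plain shuffle within each segment are hypothetical, since the constraints of the actual pseudo-randomization in E-Prime are not described here. Only the per-segment counts (24 English, 24 novel, 12 baseline) come from the text.

```python
import random
from collections import Counter

# Per-segment trial counts taken from the text (three 10-minute segments).
TRIALS_PER_SEGMENT = {"english": 24, "novel": 24, "baseline": 12}
N_SEGMENTS = 3


def build_schedule(seed=0):
    """Return a list of (segment, condition) pairs, shuffled within each segment.

    The simple within-segment shuffle is an assumption made for illustration;
    it is not the authors' actual pseudo-randomization procedure.
    """
    rng = random.Random(seed)
    schedule = []
    for segment in range(1, N_SEGMENTS + 1):
        trials = [cond for cond, n in TRIALS_PER_SEGMENT.items() for _ in range(n)]
        rng.shuffle(trials)
        schedule.extend((segment, cond) for cond in trials)
    return schedule


schedule = build_schedule()
totals = Counter(cond for _, cond in schedule)
print(len(schedule), dict(totals))
```

Balancing the counts within each third, rather than over the whole run, is what allows the first and last 10 minutes to be compared on equal numbers of trials per condition.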
To ensure task compliance and allow for the systematic analysis of behavioral data, responses were recorded via a non-ferrous microphone and audio editing software with a 16 kHz sampling rate and 16-bit resolution. The audio-recordings from all 30 participants were combined to create a rating paradigm for each of the six targeted novel syllables using freeware experiment-control software (Alvin, v. 1.19, http://homepages.wmich.edu/~hillenbr/). Repetitions of novel syllables from all participants were randomly presented to four independent raters who judged the accuracy of each production compared to the model on a 5-point scale. The audio stimuli from the fMRI task were used as the model productions for each target. Prior to beginning the rating process, each rater listened to all productions of a given target. The raters were instructed to make judgments based on how closely each production matched the model exemplar. During the rating process, they had continued access to the models and were allowed to repeat each exemplar as many times as needed before proceeding to the next item. To determine whether accuracy improved with repeated exposure over the 30-minute task, the scores from each participant were compared between the first and last time segment using a paired t-test.
All MRI scanning was conducted on a Siemens 3.0 Tesla Trio system using a 12-element head coil, and event-related fMRI data were collected using a sparse echo planar imaging (EPI) sequence with a repetition time (TR) of 10 s and an acquisition time (TA) of 2 s. Sparse-sampled fMRI acquisition is ideal for imaging speech tasks (Birn et al., 1999; Bonilha et al., 2006; Fridriksson et al., 2006; Ghosh et al., 2008; Gracco et al., 2005; Soros et al., 2006) because the TR is 8 s longer than the TA. This allows stimulus presentation and overt responses to occur within this window of ‘silence’ in which no volumes are collected. Because normal speech production is associated with some degree of head movement, sparse imaging can greatly reduce the motion-related artifacts typically associated with continuous EPI data collection during overt speech (Barch et al., 1999; Birn et al., 1999; Yetkin et al., 1996), as well as allow for clearer stimulus presentation and monitoring of responses. Furthermore, because the slowly rising hemodynamic response typically peaks 3–6 s after the associated event, the blood-oxygen-level dependent (BOLD) signal changes can be measured during the brief intervals following each stimulus-response pair. Other fMRI parameters included: 30 ms echo time, 90° flip angle, 64×64 matrix, a 3.3 × 3.3 × 3.2 mm voxel size, number of slices=33, no slice gap or parallel imaging acceleration, number of volumes=180. A high-resolution T1-weighted sequence (matrix size 256×256×160, voxel size 1×1×1 mm) was also acquired for each participant to aid normalization. Total MRI scanning lasted approximately 45 minutes.
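The timing of the sparse acquisition follows directly from the stated parameters, and the arithmetic can be checked with a short sketch. The variable names below are illustrative; only the TR, TA, and number of volumes are taken from the text.

```python
# Acquisition parameters as stated in the text.
TR_S = 10.0       # repetition time, seconds
TA_S = 2.0        # acquisition time per volume, seconds
N_VOLUMES = 180   # volumes collected in the functional run

# Scanner-silent window available for stimulus presentation and overt response.
silent_gap_s = TR_S - TA_S

# Duration of the whole run and of each third used in the time analyses.
run_minutes = N_VOLUMES * TR_S / 60
volumes_per_third = N_VOLUMES // 3
third_minutes = volumes_per_third * TR_S / 60

print(f"silent window per TR: {silent_gap_s} s")
print(f"run duration: {run_minutes} min")
print(f"each third: {volumes_per_third} volumes, {third_minutes} min")
```

With one volume per trial, the 180 volumes match the 180 trials of the task, and each 10-minute segment corresponds to 60 volumes.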
Individual and group analyses of the fMRI data were carried out using FSL (FMRIB’s Software Library, www.fmrib.ox.ac.uk/fsl; Smith et al., 2004). Pre-statistics processing included: motion correction through linear image registration (MCFLIRT: Motion Correction using FMRIB’s Linear Image Registration Tool; Jenkinson, Bannister, Brady, & Smith, 2002); non-brain removal (BET: Brain Extraction Tool; Smith, 2002); spatial smoothing using a Gaussian kernel of 6 mm full-width half maximum; and high pass temporal filtering (Gaussian-weighted least-squares straight line fitting, with sigma=100 s). Time-series statistical analysis was performed using a general linear model with local autocorrelation correction (FILM: FMRIB’s Improved Linear Model; Woolrich, Ripley, Brady, & Smith, 2001). The fMRI data for each participant were analyzed in native space and co-registered to standard MNI (Montreal Neurological Institute) space, an average template created from 152 brains (Evans et al., 1993), using linear registration (FLIRT: FMRIB’s Linear Image Registration Tool; Jenkinson & Smith, 2001; Jenkinson, Bannister, Brady, & Smith, 2002).
Higher-level (group) analyses were carried out using a Bayesian mixed effects analysis (FLAME 1 and 2: FMRIB’s Local Analysis of Mixed Effects; Beckmann, Jenkinson, & Smith, 2003; Woolrich, Behrens, Beckmann, Jenkinson, & Smith, 2004). This included a pair of group analyses to determine regions of activation that were associated with the production of: (1) English-syllable trials and (2) novel-syllable trials compared to baseline. In addition, data from the beginning and final third were analyzed separately to reveal activation associated with syllable trials during the first and last 10 minutes of the task. Neural activation was indirectly measured as localized BOLD signal intensity during each speech condition compared to the baseline neutral-mouth condition. Z (Gaussianised T) statistic images were thresholded using clusters determined by Z>2.3 and a (corrected) cluster significance threshold of P=0.01 (Worsley, Evans, Marrett, & Neelin, 1992). In order to answer our proposed research questions, additional comparisons were performed to reveal which areas showed significantly more activation in each condition above what was seen in the other conditions. These contrasts included: (1) English vs. novel productions for (a) the entire run, (b) the beginning third, and (c) the final third, as well as (2) beginning vs. final third for (a) English productions and (b) novel productions. The Talairach Daemon Client was accessed to provide standard coordinates of local maxima for cortical areas where greater cortical activity was revealed in each contrast (http://ric.uthscsa.edu/projects/talairachdaemon.html). For illustrative purposes, the mean statistical maps were overlaid onto a standard brain template using MRIcron (Rorden et al., 2007) with a threshold of Z=2.3 for between-condition comparisons. 
However, because baseline comparisons yielded high levels of activity and extensive activation maps, a higher threshold of Z=3.6 was selected for baseline comparisons for clarity of display.
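To give a sense of how strict the two Z thresholds are, the following sketch converts a Z value into a one-tailed probability under a standard normal distribution. This is a generic conversion offered for orientation only: the helper name `one_tailed_p` is hypothetical, the result is an uncorrected voxel-wise probability, and it should not be confused with the corrected cluster significance threshold reported above.

```python
import math


def one_tailed_p(z):
    """Upper-tail probability of a standard normal variate exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Thresholds mentioned in the text: Z=2.3 (between-condition maps and
# cluster formation) and Z=3.6 (display threshold for baseline comparisons).
for z in (2.3, 3.6):
    print(f"Z = {z}: uncorrected one-tailed p = {one_tailed_p(z):.6f}")
```

The display threshold of Z=3.6 is therefore roughly two orders of magnitude more conservative at the voxel level than Z=2.3, which is consistent with its use to thin out the extensive baseline maps.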
In addition to whole brain comparisons, a region of interest (ROI) comparison was employed for the left aIns and the left OpIFG. These ROIs were selected a priori and were defined according to the AAL (Anatomical Automatic Labeling) freeware database (http://www.cyceron.fr/freeware/; Tzourio-Mazoyer et al., 2002). While the OpIFG was predefined in the AAL database, the anterior Ins was not. Therefore, a redefined ROI was created by limiting the insular volume of interest from the AAL to include only the anterior portion by preserving those voxels that were also included in the “anterior inferior frontal gyrus insula” volume of interest from the Jerne database (http://hendrix.ei.dtu.dk/services/jerne/ninf/voi.html). That is, we limited the AAL insular volume by applying an intensity filter to this region so as to encompass only the anterior portion of the insula while excluding the IFG and posterior regions. The size of each ROI was as follows: aIns, 7.31 cc; OpIFG, 8.47 cc. The mean intensity of task-related signal change was extracted from the left aIns and the left OpIFG during the first and last third of the entire session for both task conditions (native vs. novel speech). Differences in the mean percent signal change within each ROI were analyzed using a 2×2×2 repeated measures ANOVA (within-subject factors for novelty, location, and time) with an overall significance level set at p<0.05. Pair-wise tests were also calculated for the eight relevant comparisons with a Bonferroni-corrected significance at the p<0.05 level. The mean and standard error for each variable were used to generate a graphical display of signal intensity in both ROIs for each condition relative to time.
To determine if the accuracy of novel productions improved over the length of the entire task, performance ratings from the beginning and final third of the task for each participant were compared. The group mean for the initial segment was 2.49 out of 5 (S.D.=0.36) and the final segment was 2.56 (S.D.=0.56). The results of the paired t-test did not reveal a significant difference between the ratings for the two time segments (t=−1.43, p=0.16). Instead, there was notable variability in performance accuracy across participants, and ratings remained rather constant across time for individual participants.
The results from the fMRI analyses revealed widespread bilateral cortical activation during speech production compared to baseline, regardless of the specific syllable composition that was produced within each condition. As expected, both speech conditions were associated with bilateral neural recruitment of the visual, auditory, motor, and somatosensory cortices, as well as subcortical activity in the basal ganglia and thalamus (anatomical abbreviations are defined in Table 2). More specifically, common areas of activity during both speech conditions included bilateral recruitment of the following areas: (1) the occipital lobe, including the IOG, MOG, SOG, Cu, FuG, and LiG; (2) the temporal lobe, including the ITG, MTG, STG, TP, HG, Wernicke’s area, and Hp; (3) the frontal lobe and cingulate cortex, including the M1, PMC, SMA, OpIFG, TriIFG, and ACgG; (4) the parietal lobe, including the S1; (5) the insula; (6) the cerebellum; (7) the BG, including the Pu and GB; and (8) the thalamus. Figure 1 shows the statistical maps of neural activity associated with the production of native (English) syllables and novel (non-English) syllables compared to baseline during the first (top panel) and final (bottom panel) third of the task. Brain regions associated with the production of English syllables are represented in green, while brain regions associated with the production of novel syllables are represented in red. Brain areas that showed significant activation during both tasks are revealed in the overlap between the two maps, which is depicted in yellow.
When activation associated with each condition was statistically compared against each other, no areas were significantly more active during the production of English syllables than during the production of novel syllables. In contrast, greater activity during the production of novel syllables compared to English syllables was revealed in several areas, including bilateral activity in the Amg, BG, TP (BA 38), OrIFG (BA 47), STG (BA 22), Th, M1 (BA 4), PMC/SMA (BA 6), S1 (BA 3/2), and PTG/FuG (BA 37), as well as greater activation in the MTG (BA 21), SMG (BA 40), IPL (BA 7), IFG (BA 44/45), and Ins bilaterally with greater involvement in the left hemisphere based on visual inspection of the statistical activation maps. Although the areas of activity associated with each speech condition were similar across the length of the task, some differences emerged when the beginning and ending thirds were independently examined. Table 3 provides the location and level of peak activation differences between the novel and native conditions. Brain regions that were significantly more active during the production of novel syllables compared to English syllables for both time periods are displayed in Figure 2. During the first 10 minutes, there was greater activity bilaterally in the Amg, BG, Th, TP, MTG, STG, MFG, OrIFG, OpIFG, TriIFG, Ins, ACgG, M1, PMC, SMA, S1, as well as the left PTG/FuG, SMG, and IPL. During the last 10 minutes, greater activity was found in the left Ins, BG, TriIFG, M1, PMC, S1, SMG, posterior STG, and IPL for novel compared to native productions. In both the beginning and ending time segment, the greatest difference in signal intensity for novel compared to native productions was observed in the insula, IFG, and IPL/S1. 
While the difference in insular activity remained extensive during the final third of the task, visual inspection suggests that the IFG activity showed less difference between the two conditions than was observed during the initial third (Figure 2).
An ROI analysis was employed for the left aIns and the left OpIFG using a 2×2×2 repeated measures ANOVA to compare the mean percent signal change associated with the speech task for the three within-subject factors of cortical location, familiarity, and time. The results revealed significant main effects of familiarity (p=0.0001) and time (p=0.04); however, there was also a significant interaction between these two factors (p=0.007). Thus, the effect associated with familiarity was modulated by time. No other main effects or interactions were significant. The results of the pair-wise comparisons are displayed in Table 4.
Figure 3 displays the group mean and standard error for the amount of change in signal intensity according to location, familiarity, and time. The signal intensity was relatively stable for English productions (Figure 3, dashed lines) across time in both ROIs. The results of the pairwise t-tests revealed no difference in the percent signal change for English-syllable production between the first and final time period in either ROI (p=0.13 and 0.98). For novel productions (Figure 3, solid lines), significantly greater activity was revealed in the aIns during the first time period compared to the final time period (p=0.0003), whereas no significant difference was found over time in the OpIFG (p=0.04; although this approached significance, it did not reach the Bonferroni-corrected alpha level for multiple comparisons). However, when novel productions were compared with English productions during the first time segment, significantly greater activity was revealed in both the aIns (p=0.0001) and the OpIFG (p=0.0001). Similarly, during the final time segment, greater activation was also found in both the aIns and OpIFG for novel productions compared to English productions (p=0.0001 and 0.002, respectively). Figure 3 shows greater signal change associated with novel productions regardless of time or location, as well as the interaction between familiarity and time.
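The Bonferroni decision for the pairwise tests can be reproduced from the p-values reported in this paragraph. The sketch below assumes that the eight p-values listed here are the eight comparisons to which the correction was applied; the dictionary labels are illustrative, and because the text does not state which ROI produced p=0.13 and which produced p=0.98, those two entries are labeled generically.

```python
ALPHA = 0.05

# p-values as reported in the text; labels are illustrative only.
reported_p = {
    "English, first vs. final third (one ROI)": 0.13,
    "English, first vs. final third (other ROI)": 0.98,
    "Novel, first vs. final third (aIns)": 0.0003,
    "Novel, first vs. final third (OpIFG)": 0.04,
    "Novel vs. English, first third (aIns)": 0.0001,
    "Novel vs. English, first third (OpIFG)": 0.0001,
    "Novel vs. English, final third (aIns)": 0.0001,
    "Novel vs. English, final third (OpIFG)": 0.002,
}

# Bonferroni: divide the family-wise alpha by the number of comparisons.
corrected_alpha = ALPHA / len(reported_p)

significant = [name for name, p in reported_p.items() if p < corrected_alpha]

print(f"corrected alpha = {corrected_alpha}")
for name, p in reported_p.items():
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{name}: p = {p} -> {verdict}")
```

Under this assumption the corrected alpha is 0.05/8 = 0.00625, which explains why the OpIFG change over time for novel productions (p=0.04) is not treated as significant while the final-third OpIFG difference between conditions (p=0.002) is.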
In addition, Spearman’s correlation coefficients were calculated between the amount of signal change in each ROI during each phase of the task and the improvement in the participants’ accuracy scores for novel productions from the initial to the final phase of the task. Improvement in the accuracy scores for novel productions showed a significant positive correlation with the amount of signal change in the aIns during the final phase of the task (r=.39, p=.04).
The purpose of this study was to investigate the neural substrates of motor speech production with a specific focus on the left aIns and IFG. Both of these regions have been frequently reported in neuroimaging studies of speech production and in neuroanatomical studies of AOS; however, the specific contribution of each area to speech production remains unclear. Based on the sometimes contradictory findings regarding the possible role of the aIns in motor speech production, it was hypothesized that the aIns may play an important and complementary role to the IFG. That is, we postulated that both regions are involved in motor speech planning, but that the recruitment within each of these regions is modulated by the content to be produced. For example, the two-stage model of motor programming proposes two types of programming as follows: 1) the organization/integration of movement elements into chunks (i.e., the internal spatio-temporal structure of units) and 2) the sequencing of successive units into the correct serial order (Klapp, 2003). Although speech is normally produced with considerable ease, disruptions and/or increased demands to either of these processes could perturb the seemingly automatic nature of speech production. We suggest that the production of novel (non-English) syllables provides a unique opportunity to investigate speech production by exceeding the typical demands of motor speech planning in normal adult speakers. Therefore, this study was designed to test our a priori hypothesis regarding the neural demands for producing novel (non-English) syllable strings compared to native (English) syllable strings.
While an extensive neural network was activated during the production of both native and novel syllables, certain differences emerged. As expected, the production of novel syllables resulted in greater extent and intensity in the BOLD signal compared to English syllables. While no brain areas showed greater activity during English productions compared to novel productions, a number of brain regions with greater activity emerged bilaterally when novel productions were contrasted with native productions, including the left aIns and IFG (Figure 2). All of these regions have been previously implicated in speech production in a number of studies (Dogil et al., 2001; Ackermann et al., 2004; Shuster & Lemieux, 2004; Bohland & Guenther, 2006). Thus, the overall findings suggest that the areas responsible for routine speech production are also recruited when attempting to produce new speech sounds that are not a part of the native repertoire. In particular, these data suggest that the production of unfamiliar speech sounds results in increased bilateral engagement of the entire motor speech system, including areas that are typically associated with language processing and motor execution.
It is possible that the increased neural activity associated with the novel-syllable productions merely reflected increased difficulty, as increased task demands have been associated with increased activation and right hemisphere recruitment in normal and impaired individuals (Just, Carpenter, Keller, Eddy, & Thulborn, 1996; Szameitat, Schubert, Müller, & Von Cramon, 2002; Fridriksson & Morrow, 2005). Alternatively, the overall increase in activity in the motor speech network may directly reflect the lack of familiarity with the motor commands necessary to produce the target. Although there is evidence for localization of function, many neurocognitive functions are supported by interactions among multiple components of a distributed network. This concept may be particularly relevant when new constructs have to be established. According to the DIVA model, multiple brain regions are highly interactive during the learning phase of syllable productions (Guenther, 2006; Guenther, Ghosh, & Tourville, 2006). In the DIVA model, the auditory, motor, and somatosensory areas interact to provide both feedforward and feedback control in acquiring new motor maps for speech.
If the increased activation associated with novel productions was related to such motor learning, then more extensive activation would be expected for novel productions during the initial segment compared to the final segment. Indeed, the contrasts for the beginning and ending third of the task revealed more extensive activation for novel productions during the initial segment, followed by a decrease with prolonged practice. During the final segment, the statistical maps of activation became much more similar between the native- and novel-syllable productions (Figure 2). That is, the amount of additional recruitment associated with novelty was modulated by repeated exposure/practice. Although a decrease in the task-related BOLD signal over time could reflect a decrease in overall activity over the length of the 30-minute scanning session (e.g. participants become more comfortable with the task and/or begin to habituate to the task), this seems highly unlikely. If this were the case, a similar decrease in activation would be expected across conditions. On the contrary, the decrease in activation was unique to the novel condition. This suggests that the increased activity seen initially may represent areas that are important in supporting the acquisition of new motor plans for speech, and that the need for recruitment in these areas may subside as these motor representations become integrated into the speech motor system.
As the task progressed, the increased activity associated with novel compared to native productions became more isolated and left-hemisphere lateralized (Figure 2). Of particular interest, the left aIns showed significantly greater activation during the production of novel syllables than during the production of native syllables regardless of time, both in the whole-brain analysis (Figure 2) and the ROI analysis (Figure 3). While there was no difference in aIns recruitment between the beginning and ending segment for native productions, greater aIns activation was found during the initial segment of the task for novel productions compared to the final segment. That is, novel productions during the first segment compared to the last segment of the task clearly demonstrated greater activity initially in the aIns bilaterally, followed by a decrease in activity during the final segment. However, even in the final segment, recruitment of the left aIns remained greater for novel compared to native productions when most of the other areas of activation in the speech motor network no longer showed a difference in activity between the two conditions.
These results may suggest that the aIns plays an important role in establishing new motor constructs. However, the implications of these findings are somewhat unclear. Although the possible role of the aIns in learning new motor speech plans provided theoretical motivation for the implementation of this study, its increased response to novel productions does not appear to be unique when other cortical areas are considered. Instead, an extensive network was recruited during novel productions, including the right aIns and the IFG bilaterally, particularly during the initial phase, when the greatest novelty would be present. The two regions of interest in this study were the left aIns and the left IFG. While the left aIns showed increased activation associated with novel productions as predicted, this was also true of the left IFG. In fact, apart from the left IPL, these two regions demonstrated the highest level of peak activity (local maxima) during the novel versus native comparisons (Table 3). Based on these findings, as well as the results from previous fMRI and neuroanatomical studies in AOS, it is likely that these two regions are highly interactive, particularly when the motor speech system is being taxed. Continued research is required in order to tease apart these two heavily interconnected regions, as activation in one area may demand an obligatory response from the other. Brain disruption techniques might provide a complementary approach for understanding whether these two brain areas play different functional roles; however, disruption of the aIns is logistically problematic because it lies medial to the lateral surface of the brain.
In general, the results of this study support the hypotheses regarding the role of the aIns in sublexical speech production containing novel syllables. While native syllables should have internal motor representations, novel syllables would not, and would therefore require the assembly of new motor commands. If motor plans for familiar productions are stored in the IFG (Guenther, 2006; Guenther, Ghosh, & Tourville, 2006), then they can be retrieved as units to facilitate speech. Online self-monitoring for content and accuracy of speech output can be provided by auditory and somatosensory feedback. During the production of native speech sounds, monitoring and integration would always be present to help guide and modify productions as needed. However, the importance of coordination and sensory feedback/integration increases greatly when the target to be produced is unfamiliar and therefore does not have a motor plan stored internally. Previously, the role of the aIns has been attributed to several speech-related functions, including sensory integration (Dogil et al., 2001) and motor coordination (Ackermann et al., 2004). However, the current findings do not provide sufficient evidence to distinguish between these two accounts. Nevertheless, increased neural activation in the aIns associated with novel productions, as well as the subsequent decrease in activation associated with practice, supports the claim that the aIns may be important when attempting to produce new/unestablished speech plans.
On the other hand, sustained levels of significant activation in the aIns across the length of the task could be used to argue against this hypothesis. However, this position seems inconsistent with the overall nature of the data. It is more likely that the non-native syllables remained somewhat novel throughout the 30-minute paradigm. This is supported by the fact that accuracy ratings for novel productions did not improve over the length of the task for the group as a whole. While subjective evaluation of the behavioral data clearly confirmed that all participants were attempting the novel productions, the amount of exposure that was provided did not yield increased speech accuracy (at least not as perceived by our English-speaking raters). Instead, those with lower ratings at the beginning also tended to have lower ratings at the end, and the same was true for those with higher ratings. However, the individual data show variability between participants regarding their change in performance. Although the ROI analysis revealed a significant decrease in the amount of neural activation in the aIns during novel productions over time, the correlation analysis for behavioral improvements suggests that this difference may not be completely straightforward. While the mean percent signal change associated with novel production decreased during the final 10 minutes compared to the first 10 minutes, increased behavioral performance across the length of the task was associated with greater activity in the aIns during the final phase. That is, those individuals with the greatest activation in the aIns during the final third of the task demonstrated greater improvement in their novel productions. No relationship was found between behavioral improvement and activation in the OpIFG. In addition, the amount of activation in either ROI during the initial phase was not related to the amount of improvement across the task.
Taken together, these results suggest that the aIns showed the greatest amount of recruitment initially for the group, but as the task progressed, the degree of individual learning became associated with the amount of activity that remained in the aIns. We speculate that individuals with greater improvements remained engaged in a learning mode longer. If the motor plans for the novel productions were not yet solidified and these individuals were continuing to modify their productions, this should be reflected in their neural recruitment. That is, as the task progressed, better learners continued to recruit the aIns, while poorer learners recruited it less.
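For readers who wish to see the form of the brain-behavior relationship described above, the following is a minimal sketch of a Pearson correlation between the change in accuracy ratings and the percent signal change in the aIns during the final segment. The function names and all numerical values are hypothetical and chosen only for illustration; they are not the study data, and the sketch is not the analysis pipeline used here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-participant values (NOT the study data):
# change in accuracy rating (final minus initial segment) and
# aIns percent signal change during the final segment.
accuracy_change = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.9]
psc_final = [0.10, 0.22, 0.15, 0.30, 0.18, 0.35, 0.28, 0.40]

r = pearson_r(accuracy_change, psc_final)
t = t_statistic(r, len(accuracy_change))
print(f"r = {r:.2f}, t({len(accuracy_change) - 2}) = {t:.2f}")
```

A positive r in this sketch corresponds to the pattern reported above, in which participants who improved more showed greater aIns activity in the final segment; the t statistic would then be compared against a t distribution with n - 2 degrees of freedom to obtain a p value.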
It is unclear whether the range in behavioral performance was related to differences in perception abilities or whether it independently reflected production abilities. However, in previous studies investigating novel speech-sound learning, participants who were better at producing a novel sound were not necessarily those who were better at auditory identification (Golestani, Paus, & Zatorre, 2002; Golestani, Molko, Dehaene, Le Bihan, & Pallier, 2007; Golestani & Pallier, 2007). In fact, Golestani and colleagues found structural differences within two distinct anatomical regions that were associated with performance in each of these tasks. Greater white matter density in the left HG and IPL was associated with auditory aptitude, while greater white matter density in the insula/prefrontal cortex (between BA 45, BA 47, and BA 13) was associated with production aptitude (Golestani & Pallier, 2007). The importance of connections between HG and the IPL in the auditory analysis of novel sounds may explain why the IPL showed increased activity during the novel productions in the current study. Furthermore, the importance of connections between the IFG and aIns in novel speech abilities is particularly interesting considering the anatomical differences that have been found in lesion studies of AOS, as well as the similarities in activation between the aIns and IFG in the current study during novel productions. Perhaps communication between these two regions is key. This may explain why both regions have been implicated in AOS. Although the underlying cause of struggle is different for a person with AOS who is trying to retrieve or execute a degraded/previously learned motor plan compared to a normal person trying to produce speech sounds that are not within their native repertoire, there are some parallels. 
If the aIns is important for facilitating new speech motor representations in normal people (either directly or via connections with the IFG), it may also play a crucial role in reestablishing representations that are impaired or inaccessible in AOS.
Thus, we suggest increased recruitment of the aIns during productions that are novel/lack automaticity; however, we do not assume that this region would not be recruited for familiar productions. Instead, we propose that this area supports online processing during speech production with the extent of its involvement being modulated by particular aspects of the targeted production. For example, Maas et al. (2008) suggest differences in processing demands for speech production related to the internal complexity of units vs. the number of units within a sequence, although these two concepts are not mutually exclusive. Internal complexity not only involves the complexity of the spatio-temporal structure of the individual units (e.g. repeating vs. alternating syllables, such as dada vs. daba), but can also pertain to the number of syllables in a sequence that has been stored as a chunk. That is, the motor plans for frequently used strings of syllables (e.g. high frequency words) may be stored as a single unit where the number of syllables now determines the internal complexity of the unit. If this idea is applied to the current study, we posit that novel productions place greater demands on internal complexity than native productions because they are unfamiliar. Because these syllables are novel, the spatio-temporal structures are not prepackaged, and therefore, the motor elements that comprise these novel syllables must be newly assembled. If the aIns and OpIFG are sensitive to internal complexity and sequence length respectively, then we would expect both of these areas to contribute to the production of multisyllabic nonwords since nonwords should not be stored as whole units in either case. Soros et al. (2006) suggested that aIns activity may be particularly involved during varying compared to repetitive speech production.
If this hypothesis is correct, then repeated practice of both native and novel syllable combinations should decrease demands in the aIns compared to initial exposure. Regarding OpIFG involvement, the demands of sequence length should remain relatively unchanged for native combinations, but may be altered for novel combinations. For example, during the initial phase, we would suggest greater segmentation of motor plans, which could pose greater sequence length demands. The idea of the IFG as a serial processor is not new, as this region has been implicated in syntactic processing in auditory comprehension and speech production (Davis et al., 2008).
In conclusion, the current findings suggest an extensive neural network supporting speech production and an overall increase in activity within the system during the production of unfamiliar speech, including the aIns and IFG. While the observed increase in neural activation in the aIns during novel productions, particularly in the initial phase, provides support for its possible role in motor speech, other areas within the speech/language network also showed increased activation. Furthermore, the involvement of these regions in speech production does not exclude their involvement in other cognitive processes. Nevertheless, these findings are consistent with the hypothesized importance of the aIns in facilitating the production of new motor plans for speech, perhaps related to assembly of smaller motor units, and the IFG in orchestrating the serial organization of motor chunks/established plans.
Figure S-1. The mean accuracy scores of each participant for novel productions during the initial and final segment of the task using a 5-point ascending scale for subjective ratings based on similarity to target exemplar.
Figure S-2. Neural activation associated with the production of multisyllabic non-words during the first 10 minutes (red) and last 10 minutes (green) of the 30-minute task compared to baseline (the overlap of common areas of activity during the two time segments is represented in yellow). The top panel displays the statistical activation map associated with the native/English syllable condition, and the bottom panel displays the statistical activation map associated with the novel/non-English syllable condition.
Figure S-3. Scatterplot of the relationship between the change in behavioral accuracy scores for novel productions and the percent signal change in the aIns during the final phase of the task (r=.39, p=.04).
This work was supported by the National Institute on Deafness and Other Communication Disorders, DC008355.