Developmental stuttering is a speech disorder that disrupts the ability to produce speech fluently. While stuttering is typically diagnosed based on one's behavior during speech production, some models suggest that it involves more central representations of language, and thus may affect language perception as well. Here we tested the hypothesis that developmental stuttering implicates neural systems involved in language perception, in a task that manipulates comprehensibility without an overt speech production component. We used functional magnetic resonance imaging to measure blood oxygenation level dependent (BOLD) signals in adults who do and do not stutter, while they were engaged in an incidental speech perception task. We found that speech perception evokes stronger activation in adults who stutter (AWS) compared to controls, specifically in the right inferior frontal gyrus (RIFG) and in left Heschl's gyrus (LHG). Significant differences were additionally found in the lateralization of response in the inferior frontal cortex: AWS showed bilateral inferior frontal activity, while controls showed a left lateralized pattern of activation. These findings suggest that developmental stuttering is associated with an imbalanced neural network for speech processing, which is not limited to speech production, but also affects cortical responses during speech perception.
As humans, we spend much of our day communicating with our peers through spoken language. The ability to produce speech fluently is an important component of effective oral communication, but we are often unaware of our own speech fluency, which is typically achieved automatically and effortlessly. Developmental stuttering is a speech impairment that severely affects individuals' ability to produce speech fluently. Stuttering affects many children between 3 and 6 years of age, with a prevalence of about 2–5% during these ages and an incidence of up to 8.5%–11.2% (Dworzynski et al., 2007, Reilly et al., 2009, Reilly et al., 2013, Yairi and Ambrose, 2013). About 80% of these children regain fluency by the end of this period. The rest, nearly 1% of the adult population, will be affected by persistent developmental stuttering, manifested as involuntary speech blocks, sound/syllable repetitions, sound prolongations and fragmented words. Stuttering moments are also associated with secondary physical features such as facial grimaces, eye blinking, and jaw and neck jerking (Bloodstein and Bernstein-Ratner, 2008, Riva-Posse et al., 2008). Adults who stutter (AWS) are typically painfully aware of their disfluencies, and often consider stuttering one of their main defining features.
Although stuttering is mainly perceived as a fluency disorder, there is ample evidence that auditory deficits may be involved in this disorder as well. Behaviorally, auditory sensitivity has been shown to develop differently in young people who stutter, compared to their fluent peers (Howell and Williams, 2004). Fluency may be temporarily gained in AWS by manipulating the auditory feedback during speech production, as in delayed auditory feedback or listening to masking noise while speaking (Cherry and Sayers, 1956, Ingham et al., 2009, Kalinowski et al., 1993, Lincoln et al., 2010). The beneficial effect of such auditory manipulations suggests the existence of a sensory component in developmental stuttering. Neurally, differences were found in AWS both in motor regions and in primary and secondary auditory cortices (Beal et al., 2010, Beal et al., 2011, Brown et al., 2005, Chang et al., 2009, De Nil et al., 2008, Foundas et al., 2001, Kell et al., 2009, Kikuchi et al., 2011, Belyk et al., 2015, Budde et al., 2014) and in the functional interactions between motor and auditory cortices (Chang et al., 2011, Lu et al., 2010). This line of evidence supports the notion that abnormal interactions between the motor and auditory cortices are part of the etiology of developmental stuttering (Brown et al., 2005, Chang et al., 2009, Giraud et al., 2008, Ludlow and Loucks, 2004, Max et al., 2004, Neilson and Neilson, 1987). The current study examines the perceptual aspects of persistent developmental stuttering, by using an auditory language perception task to map brain activation and regional hemispheric lateralization in AWS compared to non-stuttering controls.
Several neuroimaging studies have reported anatomical and functional differences between people who stutter and non-stuttering controls. Anatomically, studies conducted with children and adults who stutter have reported abnormal gray matter volume and density in bilateral inferior frontal gyrus (IFG), precentral gyrus, supplementary motor area (SMA) and bilateral superior temporal gyrus (STG) (Beal et al., 2007, Chang et al., 2008, Kell et al., 2009, Kikuchi et al., 2011). AWS were also found to have reduced leftward asymmetry in the planum temporale (Foundas et al., 2001) and to lack the standard “torque asymmetry”, typically manifested as a larger volume of the right prefrontal and left occipital lobes compared to their respective homologs (Foundas et al., 2003). Differences in diffusion properties of the white matter underlying the left Rolandic operculum were reported in both children and adults who stutter (Chang et al., 2008, Sommer et al., 2002, Watkins et al., 2008). We recently reported tract-specific differences in the microstructural properties of the frontal aslant tract in AWS compared to non-stuttering controls, and a correlation between diffusivity and fluency within this tract (Kronfeld-Duenias et al., 2016). AWS were further reported to have symmetrical white matter volume underlying the auditory cortex (in contrast to the leftward asymmetry found in controls; Jäncke et al., 2004) and a larger rostral half of the corpus callosum compared to controls (Choo et al., 2011). This distributed pattern of structural brain differences found in developmental stuttering is typical of many developmental impairments, in which deficient and compensatory processes mutually evolve over time.
In summary, volumetric differences in gray and white matter as well as microstructural differences in white matter have been associated with developmental stuttering throughout the neural network of speech processing, with deficits affecting regions and pathways associated primarily with speech, motor and auditory functions.
Functionally, developmental stuttering has been associated with enhanced responses in the motor system (e.g. primary motor cortex, SMA, and cerebellum) as well as decreased activity in the auditory cortex when AWS are asked to produce speech during functional Magnetic Resonance Imaging (fMRI) (Brown et al., 2005, De Nil et al., 2008, Belyk et al., 2015, Budde et al., 2014). Abnormal activation has also been documented in the basal ganglia of AWS (Braun et al., 1997, Chang et al., 2009, Ingham et al., 2004, Loucks et al., 2011, Toyomura et al., 2011). Again, most of these differences were recorded during speech production conditions (but see Chang et al., 2009, Loucks et al., 2011, De Nil et al., 2008). In addition to the recurring findings mentioned above, many inconsistencies still exist in the literature, even among meta-analyses of partly overlapping pools of studies. For example, Belyk et al. (2015) found that stuttering trait was associated with deactivation of the larynx motor cortex, but this finding was not observed in a similar meta-analysis by Budde et al. (2014). The picture is further complicated by the fact that some studies report differences within the same networks but in the opposite direction: for example, several studies report increased activation in the auditory cortices and reduced activation in speech- and motor-related brain regions of AWS (e.g. left IFG, left SMA and premotor cortex bilaterally) (Chang et al., 2009, Kell et al., 2009, Loucks et al., 2011, Toyomura et al., 2011, Watkins et al., 2008). Moreover, a recent analysis of cortical responses in individual participants during stuttering moments has shown little overlap between individuals (Wymbs et al., 2013), calling into question the relevance of group analyses and meta-analyses in individuals who stutter.
Differences have also been found in the functional lateralization patterns observed in AWS and controls during language production. In fluent speakers, speech production evokes largely left lateralized responses in inferior frontal, primary motor (M1) and premotor cortex (PMC). In contrast, AWS exhibit an activation pattern which is more bilateral and symmetrical across the two hemispheres or with right dominance, particularly in the frontal cortex (Brown et al., 2005, Kell et al., 2009, Neumann et al., 2005). These results, recorded using language production tasks in fMRI, converge nicely with the anatomical findings pointing to altered cortical asymmetry in developmental stuttering (Foundas et al., 2001, Foundas et al., 2003).
In contrast with the abundant research available on brain activity in AWS during speech production, few studies have addressed brain responses in AWS during the perception of speech and speech-like stimuli. Most of the evidence on brain responses during speech perception in AWS comes from human electrophysiology, namely electro- or magneto-encephalography (EEG/MEG) (Biermann-Ruben et al., 2005, Corbera et al., 2005, Hampton and Weber-Fox, 2008, Kikuchi et al., 2011, Morgan et al., 1997, Weber-Fox and Hampton, 2008). Most of these studies have found that AWS were not different from fluent speakers in early, “perceptual” components (e.g. N100) evoked by simple, non-linguistic stimuli (Biermann-Ruben et al., 2005, Corbera et al., 2005, Hampton and Weber-Fox, 2008, but see Beal et al., 2010). The difference between AWS and non-stuttering controls emerged in later, “cognitive” components, such as the P300 (Hampton and Weber-Fox, 2008, Morgan et al., 1997), the N400 in a rhyming task (Weber-Fox et al., 2004), and when presented with more complex auditory stimuli such as speech sounds and sentences (Biermann-Ruben et al., 2005).
fMRI studies using speech perception tasks in AWS are few, and provide a mixed pattern of results. In general, such studies have identified differences in the same cortical regions that were found to be atypically activated during speech production. Specifically, differences were detected in right IFG (BA 44), auditory cortex, and motor regions (M1, PMC and SMA) (Chang et al., 2009, De Nil et al., 2008, Loucks et al., 2011). However, the direction of these effects is inconsistent. For example, in one study AWS showed reduced activation in auditory and motor regions (Chang et al., 2009), while in another study AWS showed over-activation in left MTG/STG including primary auditory cortex, and in right insula and primary motor cortex extending into the SMA (De Nil et al., 2008). These inconsistencies may stem from the different stimuli and experimental designs used in these studies. For example, De Nil et al. (2008) examined brain responses associated with auditory perception of words, while Chang et al. (2009) examined brain responses associated with perception of syllables prior to the planning and execution of speech and non-speech responses that occurred in every trial. The mixed results observed in the few published fMRI studies using perception tasks in AWS preclude reaching conclusive statements about the functionality of the speech perception network in developmental stuttering.
A further gap in the literature on brain responses during language perception in AWS concerns functional lateralization. Based on the anatomical literature, one would expect differences in lateralization of responses in both frontal and temporal language regions (Foundas et al., 2001, Foundas et al., 2003). Related findings from diffusion MRI provide independent but indirect support for the hypothesis that frontal lateralization may be different in AWS, by showing differences in the volume and diffusivity of anterior callosal connections (Cai et al., 2014, Choo et al., 2011, Civier et al., 2015, Cykowski et al., 2010, Kell et al., 2009). However, previous fMRI studies that compared cortical responses during language perception tasks in AWS against non-stuttering controls did not quantify directly the lateralization of response by measuring the relative signals in the left and right cortical homologs of specific cortical regions. This is important because functional lateralization in fMRI is not dichotomous: even when both homologs show supra-threshold activation, strong lateralization may be revealed when a quantitative lateralization index is calculated (Wilke and Lidzba, 2007). The only study to have done such an analysis in the context of an auditory processing task in AWS used near-infrared spectroscopy (NIRS) and was limited in coverage to the temporal cortex (Sato et al., 2011). Our study aimed to fill this gap by analyzing lateralization patterns quantitatively in frontal and temporal regions in response to a language perception task.
In the present fMRI study we examined the neural responses during speech perception in adults who stutter and matched non-stuttering controls. Cortical responses were collected while participants listened to speech segments and auditory baseline epochs equated in their frequency spectrum (signal correlated noise, see below). We used a sound detection task to monitor participants' attention throughout the scan. Participants were not required to prepare for or to produce any oral response. This procedure allowed us to draw conclusions with greater confidence about the neural activity and the functional lateralization pattern of AWS during speech perception.
Thirty-six participants took part in the experiment: 11 self-reported AWS of developmental origin (8 males, 3 females; age range: 19–52 y, mean age = 31.64 y, standard deviation (SD) = 10.78) and 25 adults who do not stutter (22 males, 3 females; age range: 18–53 y, mean age = 30.24 y, SD = 8.45). Demographic and clinical data of the participants are reported in Table 1. In addition, to ensure that group size and gender differences were not driving our results, we repeated the analysis on a sub-sample of pair-matched AWS and control subjects (N = 9 in each group, 6 males and 3 females). These smaller samples were pair-matched by age and gender (see Supplemental Table S3). Two additional matched pairs were excluded from this analysis because they did not show activation in at least one of the analyzed regions of interest (ROIs).
AWS were recruited through an advertisement on the online forum of the Israeli Stuttering Association (‘AMBI’). Two independent speech-language pathologists (SLPs) (O.A. and R.E-V.) confirmed the diagnosis of stuttering based on audiovisual recordings of each individual's behavioral assessment (see the Procedure section) and evaluated the stuttering severity of each individual in the AWS group using the Stuttering Severity Instrument, Third Edition (SSI-III; Riley, 1994).
The percent of stuttered syllables produced by AWS was evaluated based on the transcription of output on two speaking tasks: an unstructured interview and a reading task (Riley, 1994). These assessments were performed in a separate session outside the scanner (see the Procedure section).
Control participants who do not stutter were recruited through word-of-mouth, billboards and internet forums. Both AWS and controls were native Hebrew speakers with varying degrees of right-hand dominance as measured by the Hebrew version of the Edinburgh Inventory for Assessment of Handedness (Oldfield, 1971). Participants were physically healthy and reported no history of neurological disease or psychiatric disorder. Before participating, each subject signed a written informed consent, according to protocols approved by the Ethics committee of the Tel-Aviv Sourasky Medical Center and by the Ethics committee of the Humanities Faculty in Bar Ilan University. At the end of the experiment, subjects received $25 for their participation.
During functional MRI scans, participants were presented with two types of stimuli: speech segments (Speech) and signal correlated noise (SCN) (see Stoppelman et al., 2013 for an empirical motivation for choosing this baseline condition). Stimuli were presented in a block design, interleaved with rest epochs. The active conditions are described as follows:
Four passages (15 s each) were extracted from short poems in Hebrew (Atlas, 1977, Gefen, 1974). Stimuli were recorded by a female native Hebrew speaker in a silent chamber. Each short passage was digitized at a sampling rate of 44 kHz and scaled to an average intensity of 75 dB.
Four baseline epochs (15 s each) were presented. The purpose of this baseline was to separate auditory responses from language responses, particularly in superior temporal cortex. To this end, we sought an unintelligible version of the speech segments that would retain their physical properties as much as possible. We adopted a procedure from Davis and Johnsrude (2003), in which the amplitude envelope of each speech segment was extracted and applied to a pink noise segment, band-pass filtered to maintain the original frequency spectrum of speech. This procedure resulted in an unintelligible stimulus that preserved the amplitude variations and the spectral profile of the original speech. In a previous study, we showed that this baseline adequately removes sensory responses while maintaining language responses in the inferior frontal and posterior temporal cortex (Stoppelman et al., 2013). SCN stimuli were generated using Praat software (Boersma and Weenink, 2013).
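The core of the SCN procedure is an envelope transfer: extract the slow amplitude envelope of the speech and impose it on a noise carrier. The sketch below illustrates only that step, in Python rather than Praat, and is not a reproduction of the study's Praat procedure. Assumptions: the envelope is obtained by full-wave rectification followed by a moving-average low-pass (the window length `win` is a hypothetical choice), the noise carrier is supplied by the caller (white noise here, standing in for the band-pass filtered pink noise used in the study), and the output is rescaled to the RMS level of the original.

```python
import math
import random


def extract_envelope(signal, win=441):
    """Full-wave rectify, then smooth with a centred moving average (a crude low-pass)."""
    rect = [abs(x) for x in signal]
    prefix = [0.0]  # prefix sums make each window average O(1)
    for v in rect:
        prefix.append(prefix[-1] + v)
    half, n = win // 2, len(rect)
    env = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        env.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return env


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def signal_correlated_noise(speech, carrier, win=441):
    """Impose the speech amplitude envelope on a noise carrier of the same length."""
    env = extract_envelope(speech, win)
    shaped = [e * c for e, c in zip(env, carrier)]
    r = rms(shaped)
    gain = rms(speech) / r if r > 0 else 0.0  # match the overall RMS of the original
    return [gain * v for v in shaped]


# Toy example: 0.5 s of a 200 Hz tone followed by 0.5 s of silence, sampled at 8 kHz.
random.seed(0)
fs = 8000
speech = [math.sin(2 * math.pi * 200 * t / fs) if t < fs // 2 else 0.0 for t in range(fs)]
carrier = [random.gauss(0.0, 1.0) for _ in range(fs)]  # white-noise stand-in for pink noise
scn = signal_correlated_noise(speech, carrier)
```

Because the carrier is multiplied sample by sample with the envelope, silent stretches of the speech stay silent in the SCN, while the fine temporal and spectral structure that carries intelligibility is replaced by noise.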
Each participant was seated in a quiet room together with one of the experimenters (V.K-D.). In the unstructured interview, participants were asked to talk for 10 min about a neutral topic, such as a recent travel experience, a movie or a book. The experimenter was instructed to refrain from interrupting the speaker, and to ask questions only when the participant was having difficulties finding a topic to talk about. In the reading task, each participant was asked to read aloud one of three paragraphs from the standardized and phonetically balanced ‘Thousand Islands’ reading passage (Amir and Levine-Yundof, 2013). The three paragraphs were of similar size (~ 200 syllables each) and the different paragraphs were assigned to the participants in a random order. Both tasks were recorded using a digital video camera (Sony DCR-DVD 106E, Sony Corporation of America, New York, NY) with a noise canceling microphone (Sennheiser PC21, Sennheiser Electronic Corporation, Berlin, Germany). Audio signals from the microphone were digitally recorded using audio processing software (Goldwave, Inc., St. John's, Canada), on a mono channel, with a sampling rate of 48 kHz. The percent of stuttered syllables was calculated based on the syllables that included any type of stuttering-like-disfluencies (SLD; Yairi and Ambrose, 2005). SLDs include part-word repetitions, monosyllabic-word repetitions and disrhythmic phonations. All other disfluency types, which are typically not regarded as stuttering per se (i.e., interjections, revisions or phrase repetitions), were not included. Stuttering duration scores and physical concomitants were evaluated by the two SLPs based on the video recording of the speaking tasks. Taken together, the percent of stuttered syllables, stuttering duration scores and physical concomitants were used to obtain a total score of stuttering severity (SSI-III score).
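The percent of stuttered syllables (%SS) is a simple ratio of syllables containing any stuttering-like disfluency to all syllables produced. The sketch below assumes a hypothetical per-syllable annotation format (the label names are illustrative, not a standard coding scheme). It does not compute the SSI-III total score, which relies on the instrument's own scoring tables for frequency, duration and physical concomitants.

```python
# Hypothetical annotation scheme: each syllable is None (fluent) or a set of disfluency labels.
SLD_TYPES = {"part_word_repetition", "monosyllabic_word_repetition", "disrhythmic_phonation"}


def percent_stuttered_syllables(syllables):
    """Percentage of syllables carrying at least one stuttering-like disfluency (SLD)."""
    if not syllables:
        raise ValueError("no syllables to score")
    # Other disfluencies (interjections, revisions, phrase repetitions) are ignored.
    stuttered = sum(1 for labels in syllables if labels and labels & SLD_TYPES)
    return 100.0 * stuttered / len(syllables)


# Toy transcript of 200 syllables: 6 carry SLDs, 4 carry only interjections.
labels = ([None] * 190
          + [{"part_word_repetition"}] * 4
          + [{"disrhythmic_phonation"}] * 2
          + [{"interjection"}] * 4)
print(percent_stuttered_syllables(labels))  # 3.0
```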
Subjects lay supine inside the MRI scanner with foam padding placed on both sides of their head to minimize head movements. Subjects were instructed to listen to each of the two experimental conditions (Speech and SCN) while maintaining their gaze on a central fixation cross presented on a screen through a mirror mounted on the head coil. During the scanning session, auditory stimuli were delivered binaurally via MRI-compatible headphones (Optoacoustics Ltd., Mazor, Israel). E-prime 2.0 (Psychology Software Tools Inc., Pittsburgh, USA) was used for stimulus presentation and response collection.
The experimental conditions were presented during continuous scanning, in a block design, interleaved with rest periods. Each block lasted 15 s, while rest periods lasted 12.5 s (see Fig. 1). The scanning session consisted of two consecutive runs, separated by a short break. Each run lasted 2 min 15 s. Each block contained either the Speech or the SCN condition, and each condition appeared twice in each run, yielding a total of 8 blocks across the two runs (4 of each condition). This design was selected based on our prior published results (Stoppelman et al., 2013), which demonstrated the efficacy of SCN as a baseline for localizing cortical responses during speech perception in individual participants, using the exact same stimulation parameters (block length and number of repetitions).
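The timing parameters can be checked against each other. One layout consistent with them, assumed here because the text does not state how each run begins and ends, is: five discarded volumes, then rest and stimulation blocks alternating, starting and ending with rest. With TR = 2.5 s, a rest period spans 5 volumes and a block 6 volumes, so 5 rests and 4 blocks give 49 retained volumes, and adding the 5 discarded volumes gives 54 × 2.5 s = 135 s, i.e. 2 min 15 s. The condition order in the sketch is hypothetical.

```python
TR = 2.5          # s, from the acquisition parameters
N_DUMMY = 5       # volumes discarded at the start of each run
REST_S = 12.5     # rest period length (s)
BLOCK_S = 15.0    # stimulation block length (s)
# Hypothetical condition order; the text only states that each condition appears twice per run.
ORDER = ["Speech", "SCN", "Speech", "SCN"]


def build_run(order=ORDER):
    """One label per retained volume, assuming each run starts and ends with a rest period."""
    rest = ["Rest"] * round(REST_S / TR)  # 5 volumes
    labels = []
    for cond in order:
        labels += rest + [cond] * round(BLOCK_S / TR)  # 6 volumes per block
    return labels + rest


def boxcar(labels, cond):
    """0/1 predictor marking the volumes in which `cond` was presented."""
    return [1.0 if lab == cond else 0.0 for lab in labels]


run_labels = build_run()
run_seconds = (N_DUMMY + len(run_labels)) * TR
print(len(run_labels), run_seconds)  # 49 135.0
```

The boxcars produced this way are the raw predictors that, after convolution with a hemodynamic response function, enter the GLM described below.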
To ensure subjects' attention throughout the scanning session, they were asked to engage in an incidental auditory detection task. Subjects were asked to press a response button with their left index finger each time they detected a target sound (a non-speech sound resembling a water drop). Target sounds occurred 2 or 3 times during each block. The target sound was scaled to the same intensity as the auditory stimuli and its timing was unpredictable. Before the start of each run, subjects were reminded of the auditory detection task. These instructions were delivered visually and were not time-limited.
Prior to scanning, participants took part in a practice session outside the MRI scanner in order to become familiar with the auditory detection task and the different stimulus types. The practice session was similar to the actual experiment but shorter, and contained different stimuli.
We assessed the task performance of all participants during scanning. Task performance of each subject was estimated from the accurate detection of the auditory target sounds in each of the blocks.
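Detection accuracy can be scored as the proportion of target sounds followed by a button press within a response window. The sketch below is illustrative only: the response window (0.1–2.0 s after target onset) and the input format (onset and press times in seconds) are hypothetical, since the text does not specify how presses were matched to targets.

```python
def detection_accuracy(target_times, press_times, window=(0.1, 2.0)):
    """Fraction of targets followed by a button press within `window` seconds.

    The response window is a hypothetical choice; a single press can be credited
    to more than one target if two targets fall within one window of it.
    """
    if not target_times:
        raise ValueError("no targets presented")
    lo, hi = window
    hits = sum(1 for t in target_times
               if any(lo <= p - t <= hi for p in press_times))
    return hits / len(target_times)


targets = [3.0, 9.5, 21.0, 40.2]  # s from run start (illustrative)
presses = [3.6, 10.1, 41.0]
print(detection_accuracy(targets, presses))  # 0.75
```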
fMRI data were collected using a standard echo-planar imaging (EPI) sequence (TR = 2500 ms, TE = 35 ms, flip angle = 90°, matrix size: 128 × 128, FOV = 220 mm, voxel size: 1.7188 × 1.7188 × 3 mm). Scanning was performed on a 3 Tesla General Electric scanner (Signa HDxt, GE Medical Systems), located at the Wohl Institute for Advanced Imaging at Tel Aviv Sourasky Medical Center. Thirty-two functional (T2* weighted) and anatomical (T1 weighted) oblique slices (3-mm thick, no inter-slice gap) were acquired along the AC-PC plane. Inplane anatomical slices were acquired using the same prescription and slice thickness as the functional slices, but with a finer inplane resolution (voxel size 0.8594 × 0.8594 × 3 mm) and using a T1 contrast, for the purpose of inplane-to-volume registration (see MRI data analysis section). Slices extended from the top of the head towards the ventral surface, achieving full coverage of the IFG and STG and covering most of the temporal and occipital lobes. In addition to fMRI and inplane anatomical images, we acquired high resolution axial T1 anatomical images (voxel size: 1 × 1 × 1 mm) for each subject, covering the entire brain using a 3D fast spoiled gradient echo (FSPGR) sequence. This high resolution volume anatomy was essential for defining individual regions of interest in volume (3-plane) space, as well as for visualizing the responses on a surface mesh representation of the cortex (see Fig. 2). Some participants took part in additional fMRI measurements and diffusion MRI measurements reported elsewhere (Kronfeld-Duenias et al., 2016, Civier et al., 2015). The entire scan session lasted 45–60 min.
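The reported voxel dimensions follow from the acquisition geometry: dividing a field of view of 220 mm (22 cm) by the matrix size gives the in-plane resolution, and the number of slices times the slice thickness gives the height of the imaged slab. The 256 × 256 matrix of the inplane anatomical images is inferred here from the reported 0.8594 mm voxel size, not stated in the text.

$$\frac{220\ \text{mm}}{128} = 1.71875\ \text{mm} \approx 1.7188\ \text{mm}, \qquad \frac{220\ \text{mm}}{256} = 0.859375\ \text{mm} \approx 0.8594\ \text{mm}, \qquad 32 \times 3\ \text{mm} = 96\ \text{mm}$$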
All analyses were performed in each subject's native space. This choice was guided by the known variability in the location and size of language-responsive regions (Amunts et al., 1999, Ben-Shachar et al., 2003, Ben-Shachar et al., 2004, Glezer and Riesenhuber, 2013), as well as by recent findings from Wymbs et al. (2013) showing that individual responses during stuttering moments do not converge with expected patterns of activation from meta-analyses of group data. We did not apply any averaging or spatial smoothing to the data (beyond that introduced by interpolation due to rigid body registration), in order to maintain the high spatial resolution provided by MRI. Data preprocessing and analysis of fMRI data were performed using mrVista (http://web.stanford.edu/group/vista/cgi-bin/wiki/index.php/MrVista), running on Matlab 7.11 (The Mathworks, Natick, MA). BrainVoyager QX (Brain Innovation, Maastricht, The Netherlands) was used for visualization of the statistical parametric maps over inflated brains.
The high resolution anatomical images were first aligned to the AC–PC plane using a rigid body transform, resliced and resampled into 1 × 1 × 1 mm isotropic resolution. Subsequently, a second rigid body transform was calculated between the inplane anatomical images (which were acquired using the same prescription as the functional slices) and the resampled high resolution volume anatomy. This inplane-to-volume rigid-body transformation was later applied to the functional slices (see fMRI preprocessing section), allowing analysis of the functional data in 3D space. We further calculated an alignment between each individual's volume anatomy and the Montreal Neurological Institute (MNI) T1 template. Rather than applying this transformation, we computed the inverse transformation and applied it to assign each voxel in the participant's native space its matching coordinate in the MNI coordinate system. This allowed reporting the location of individual activation clusters using MNI coordinates.
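Assigning an MNI coordinate to a native-space voxel amounts to pushing its index through a chain of coordinate transforms. The sketch below assumes every step can be written as a 4 × 4 homogeneous affine matrix (the text does not state whether the MNI alignment was affine or nonlinear), and the matrices are illustrative values, not the study's transforms.

```python
def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def apply_affine(affine, ijk):
    """Map a voxel index (i, j, k) through a 4x4 homogeneous affine to (x, y, z)."""
    vec = [float(v) for v in ijk] + [1.0]
    out = [sum(affine[r][c] * vec[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])


# Illustrative matrices only (not the study's transforms):
# voxel index -> native mm for a 1 mm isotropic volume, then a small native -> MNI shift.
vox_to_native = [[1, 0, 0, -90], [0, 1, 0, -126], [0, 0, 1, -72], [0, 0, 0, 1]]
native_to_mni = [[1, 0, 0, 2], [0, 1, 0, -3], [0, 0, 1, 5], [0, 0, 0, 1]]
vox_to_mni = matmul4(native_to_mni, vox_to_native)
print(apply_affine(vox_to_mni, (90, 126, 72)))  # (2.0, -3.0, 5.0)
```

Composing the matrices once and then applying the product to every voxel leaves the functional data untouched in native space; only a coordinate label is attached to each voxel.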
The first five volumes of each run were excluded in order to allow signal stabilization. This resulted in a total of 49 volumes per run. Images were visually inspected for motion by examining the raw images from a single slice sequentially along the scan. For each subject, we calculated the maximum translation in each scan, as estimated by a rigid image registration algorithm that registered each functional volume to the first of the same run, minimizing the root mean square error between the volumes. Across all participants, this parameter never exceeded 1 voxel. We therefore preferred not to apply motion correction, to avoid the added smoothing that is introduced by interpolating the images. Baseline drifts were removed from the time series by high pass temporal filtering (cutoff frequency: 0.02 Hz).
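One common way to implement a high-pass filter for fMRI time series is to regress out a set of low-frequency discrete cosine (DCT-II) basis functions, as done for example in SPM. Whether mrVista uses this exact filter is not stated, so the sketch below is an assumption-labeled illustration of the technique. Basis function k has frequency k / (2·N·TR) Hz; for N = 49 volumes and TR = 2.5 s, a 0.02 Hz cutoff removes k = 1 to 4. The mean (k = 0) is kept.

```python
import math


def dct_highpass(ts, tr=2.5, cutoff_hz=0.02):
    """Remove slow drifts by projecting out DCT-II basis functions below cutoff_hz (mean is kept)."""
    n = len(ts)
    out = list(ts)
    k = 1
    while k < n and k / (2.0 * n * tr) < cutoff_hz:  # basis k oscillates at k / (2 n tr) Hz
        basis = [math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
        # DCT-II basis vectors are mutually orthogonal and orthogonal to the mean,
        # so each drift component can be removed by a simple projection.
        coef = sum(o * b for o, b in zip(out, basis)) / sum(b * b for b in basis)
        out = [o - coef * b for o, b in zip(out, basis)]
        k += 1
    return out


# A constant level of 10 plus a slow drift (DCT component k = 2) over 49 volumes.
n = 49
drift = [10.0 + 3.0 * math.cos(math.pi * 2 * (2 * i + 1) / (2 * n)) for i in range(n)]
cleaned = dct_highpass(drift)
print(round(max(abs(v - 10.0) for v in cleaned), 9))  # 0.0
```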
For localization and visualization purposes, functional images were coregistered to the volume anatomy, using the same rigid body transform calculated between the inplane and volume anatomy. However, all calculations of statistical parametric maps were conducted in the inplane slices in native space. Similarly, voxel time courses were extracted from the inplane data, not from the volume registered (therefore interpolated) images.
We next calculated voxelwise general linear models (GLMs) in order to estimate the relative contribution of each condition to every voxel's time course. GLM predictors for each condition (Speech, SCN) were constructed as a boxcar function, indicating the times in which this condition was presented, convolved with a canonical hemodynamic response function. Statistical parametric maps (SPMs) were computed for each contrast of interest, based on voxel-wise t-tests between the weights of the relevant predictors.
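The voxelwise GLM can be illustrated for a single time course with ordinary least squares and a contrast t-statistic, t = c'β / sqrt(σ² c'(X'X)⁻¹c). This dependency-free sketch makes several assumptions that are not stated in the text: a double-gamma HRF with SPM-like parameters (peak gamma of shape 6, undershoot of shape 16 scaled by 1/6), a constant term added to the design, no modelling of temporal autocorrelation, and a hypothetical block order. mrVista's own implementation may differ in these details.

```python
import math


def hrf(t):
    """Double-gamma HRF (SPM-like shape: gamma(6) peak minus gamma(16) undershoot / 6)."""
    if t <= 0:
        return 0.0

    def gamma_pdf(shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

    return gamma_pdf(6) - gamma_pdf(16) / 6.0


def convolve_hrf(box, tr=2.5, length_s=32.0):
    kernel = [hrf(i * tr) for i in range(int(length_s / tr) + 1)]
    return [sum(kernel[j] * box[i - j] for j in range(len(kernel)) if i - j >= 0)
            for i in range(len(box))]


def invert(m):
    """Gauss-Jordan inverse of a small square matrix (no external libraries)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def glm_contrast_t(y, regressors, contrast):
    """OLS fit of y on the regressors plus a constant; return (betas, t for the contrast)."""
    cols = [list(r) for r in regressors] + [[1.0] * len(y)]
    p, n = len(cols), len(y)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(beta[k] * cols[k][t] for k in range(p)) for t in range(n)]
    sigma2 = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted)) / (n - p)
    c = list(contrast) + [0.0] * (p - len(contrast))  # zero weight on the constant
    effect = sum(ci * b for ci, b in zip(c, beta))
    var = sigma2 * sum(c[i] * inv[i][j] * c[j] for i in range(p) for j in range(p))
    return beta, effect / math.sqrt(var)


# Hypothetical 49-volume run; synthetic voxel with Speech weight 2 and SCN weight 1.
design = (["Rest"] * 5 + ["Speech"] * 6 + ["Rest"] * 5 + ["SCN"] * 6) * 2 + ["Rest"] * 5
speech_reg = convolve_hrf([1.0 if d == "Speech" else 0.0 for d in design])
scn_reg = convolve_hrf([1.0 if d == "SCN" else 0.0 for d in design])
y = [100.0 + 2.0 * s + 1.0 * c + 0.01 * math.sin(1.3 * i * i)  # small deterministic wiggle
     for i, (s, c) in enumerate(zip(speech_reg, scn_reg))]
beta, t_stat = glm_contrast_t(y, [speech_reg, scn_reg], [1.0, -1.0])  # < Speech vs. SCN >
print([round(b, 2) for b in beta], round(t_stat, 1))
```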
The definition of ROIs relied on both functional and anatomical criteria: for each participant, activated clusters of voxels were selected within predefined anatomical borders (for a similar approach see, e.g., Ben-Shachar et al., 2003, Ben-Shachar et al., 2004, Ben-Shachar et al., 2007, Kanwisher et al., 1997, Stoppelman et al., 2013). The current study focused on the following ROIs: inferior frontal gyrus (IFG), posterior superior temporal sulcus (pSTS) and Heschl's gyrus (HG). ROIs were defined bilaterally in each subject. The anatomical borders of each ROI were as follows: (a) IFG: pars opercularis and pars triangularis (BA 44 and 45, respectively) (Amunts et al., 1999), (b) pSTS: the posterior third of the superior temporal sulcus, including BA 39 bordering BA 37 and BA 22 (Ben-Shachar et al., 2004), (c) HG: in the depth of the Sylvian fissure, on the superior temporal plane, posterior to the planum polare and anterior to the planum temporale. It was previously reported that the medial two-thirds of HG is more responsive to basic auditory stimuli, in contrast to the lateral third which was reported to be more cytoarchitectonically and functionally similar to the neighboring non-primary auditory cortex. Therefore, we defined HG as the medial two-thirds of the transverse gyrus (Morosan et al., 2001, Rademacher et al., 2001).
Functional regions of interest were selected within these anatomical borders by marking contiguous clusters of voxels that passed the significance threshold (p < 10⁻³, uncorrected; see Supplemental Fig. S1 for examples of ROIs in individual participants). IFG and pSTS were defined by the < Speech vs. SCN > contrast, while HG (primary auditory cortex) was defined by the < SCN vs. Rest > contrast. The mean time course of the activated voxels in each ROI was collected in the inplane slices, to avoid smoothing and interpolation. Time courses were transformed into discrete signal amplitudes by calculating the difference $\mathrm{Signal}_{\mathrm{Peak}} - \mathrm{Signal}_{\mathrm{Baseline}}$, where Peak is a 5 s (3 TRs) window starting 5 s after block onset and ending 10 s after block onset, and Baseline is a 5 s (3 TRs) window starting 5 s before block onset and ending at block onset. The mean amplitude of each condition was calculated as the average across the amplitudes of all the blocks of the same condition. Difference scores between the mean amplitudes [Speech − SCN] were entered into the statistical analysis. This was done for each participant for each ROI. Finally, to control for potential differences in cluster size, we conducted a follow-up analysis in which ROIs were redefined as fixed-size spheres of 3 mm radius within the same anatomical borders.
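With TR = 2.5 s, a window running from 5 s to 10 s after onset contains the samples at 5, 7.5 and 10 s, which is why each 5 s window spans 3 TRs (both endpoints included). The sketch below computes the block amplitudes and the [Speech − SCN] difference score from an ROI time course sampled once per TR. The block onsets and the toy time course are hypothetical.

```python
def block_amplitude(tc, onset):
    """Peak minus baseline for one block; tc is sampled at TR = 2.5 s, onset is a volume index."""
    peak = [tc[onset + k] for k in (2, 3, 4)]    # 5, 7.5 and 10 s after onset
    base = [tc[onset + k] for k in (-2, -1, 0)]  # 5 and 2.5 s before onset, and the onset volume
    return sum(peak) / len(peak) - sum(base) / len(base)


def speech_minus_scn(tc, onsets):
    """onsets maps 'Speech' and 'SCN' to lists of block-onset volume indices."""
    mean_amp = {cond: sum(block_amplitude(tc, o) for o in idx) / len(idx)
                for cond, idx in onsets.items()}
    return mean_amp["Speech"] - mean_amp["SCN"]


# Toy ROI time course (49 volumes): responses delayed by 2 volumes, 2.0 for Speech, 1.0 for SCN.
onsets = {"Speech": [5, 27], "SCN": [16, 38]}  # hypothetical block order
tc = [0.0] * 49
for cond, level in (("Speech", 2.0), ("SCN", 1.0)):
    for o in onsets[cond]:
        for i in range(o + 2, o + 8):
            tc[i] = level
print(speech_minus_scn(tc, onsets))  # 1.0
```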
We used BrainVoyager QX (Brain Innovation, Maastricht, The Netherlands) to generate inflated brain models from the high resolution anatomy of one control participant and one AWS. We generated contrast maps for the contrasts < Speech vs. SCN > and < SCN vs. rest > in the same participants and projected them onto the inflated brains, to allow better localization of the activations on the 3D reconstructed brain model.
Comparisons of BOLD signals in AWS and controls during the speech perception experiment were performed using STATISTICA (Statsoft Inc., Tulsa, OK, USA). Subjects whose activation deviated by more than 3 standard deviations from the group mean were excluded from the analysis (this procedure excluded data from one subject in RpSTS and one subject in RHG). We conducted a three-way mixed ANOVA with Group (AWS, control) as a between-subjects variable and ROI (IFG, pSTS, HG) and Hemisphere (left, right) as within-subject variables. The dependent variable was the difference between the mean amplitudes [Speech − SCN]. We followed up on significant interactions with t-tests to identify their source. Having found a three-way Group × ROI × Hemisphere interaction, we conducted a four-way mixed ANOVA with Group, ROI, Hemisphere and Condition (Speech, SCN) on the mean amplitudes of the conditions (rather than on their difference). This was done to determine whether the group difference stemmed from a difference in the response to speech or to the auditory baseline (SCN). We controlled the false discovery rate (FDR; Benjamini and Hochberg, 1995) at q < 0.05 to account for multiple comparisons. Two additional follow-up analyses were performed. First, ROI sizes were compared between groups using 6 t-tests (one for each ROI in each hemisphere), in order to assess the potential contribution of ROI size differences to any observed group differences in amplitudes. Second, we followed up on significant group differences in amplitude with t-tests using the amplitudes extracted from the fixed-size sphere ROIs (see ROI definition section above), in order to verify once more that the effect did not result from differences in ROI sizes.
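The outlier rule and the FDR correction are both short procedures. The sketch below is an illustration only (the study used STATISTICA): it uses the sample standard deviation for the 3 SD rule, which is an assumption since the text does not say which estimator was used, and it implements the Benjamini–Hochberg step-up procedure, which rejects every hypothesis up to the largest rank k whose sorted p-value satisfies p(k) ≤ k·q/m.

```python
import statistics


def exclude_outliers(values, n_sd=3.0):
    """Keep values within n_sd sample standard deviations of the group mean."""
    m, sd = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * sd]


def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns one reject flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank k with p_(k) <= k * q / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max  # every hypothesis ranked at or below k_max is rejected
    return reject


print(len(exclude_outliers([1.0] * 20 + [10.0])))  # 20
print(fdr_bh([0.001, 0.04, 0.041, 0.045]))  # [True, True, True, True]
```

The second example shows the step-up property: 0.04 fails its own rank-2 threshold (0.025), but it is still rejected because a higher-ranked p-value (0.045 ≤ 0.05) passes.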
To compare functional hemispheric lateralization between AWS and controls, we calculated a lateralization index (LI) for each participant in each ROI, using the standard formula:

$$\mathrm{LI} = \frac{\mathrm{Left} - \mathrm{Right}}{\mathrm{Left} + \mathrm{Right}}$$
where Left and Right refer to the mean amplitudes (for Speech − SCN) of the BOLD signal measured in the relevant ROI within the left and right hemispheres, respectively. LIs range from −1 to 1; positive values indicate left-hemisphere dominance and negative values indicate right-hemisphere dominance. LIs between −0.2 and 0.2 are considered bilateral (Springer et al., 1999). LIs are known to vary with the statistical threshold (p-value) applied to the statistical parametric map from which the signal is extracted (Wilke and Lidzba, 2007). To avoid drawing conclusions from an arbitrary fixed threshold, we computed LIs across a broad range of thresholds (from p = 10⁻² to p = 10⁻⁸, in steps of 0.1 on the −log(p) scale). This allowed us to compare the general pattern of lateralization in each group across multiple thresholds. A two-way mixed ANOVA was conducted on these LI measures, with Group (AWS, control) as a between-subjects independent variable and Threshold (10⁻² to 10⁻⁸) as a within-subject independent variable.
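The amplitude-based LI and the threshold sweep described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' code: the function names are hypothetical, the inputs are assumed to be per-voxel [Speech − SCN] amplitudes and p-values for the left and right homologs of one ROI, and at each threshold the mean amplitude of the suprathreshold voxels in each hemisphere enters the formula. How the original analysis handled a hemisphere with no surviving voxels is not stated; here the LI is simply left undefined (NaN).

```python
import numpy as np


def lateralization_index(left, right):
    """LI = (Left - Right) / (Left + Right); positive values = left dominance.

    Assumes non-negative inputs, so that the LI lies in [-1, 1]."""
    total = left + right
    if total == 0:
        return float("nan")  # undefined when neither hemisphere responds
    return (left - right) / total


def classify_li(li, cutoff=0.2):
    """Label an LI using the +/-0.2 bilateral band (Springer et al., 1999)."""
    if li > cutoff:
        return "left"
    if li < -cutoff:
        return "right"
    return "bilateral"


def li_across_thresholds(amp_left, p_left, amp_right, p_right):
    """Amplitude-based LI at thresholds p = 1e-2 to 1e-8 (0.1 steps in -log10 p).

    Inputs are hypothetical per-voxel amplitudes and p-values for the left
    and right homologs of one ROI.
    """
    amp_left, p_left = np.asarray(amp_left, float), np.asarray(p_left, float)
    amp_right, p_right = np.asarray(amp_right, float), np.asarray(p_right, float)
    neg_log_p = np.round(np.arange(2.0, 8.0 + 1e-9, 0.1), 1)  # 61 thresholds
    lis = []
    for t in neg_log_p:
        keep_l, keep_r = p_left < 10.0 ** -t, p_right < 10.0 ** -t
        if not (keep_l.any() and keep_r.any()):
            lis.append(np.nan)  # assumption: LI undefined if a side is empty
            continue
        lis.append(lateralization_index(amp_left[keep_l].mean(),
                                        amp_right[keep_r].mean()))
    return neg_log_p, np.array(lis)
```

For a single ROI at one threshold, `classify_li(lateralization_index(L, R))` returns the left/bilateral/right label used in the text. If either hemisphere's mean amplitude is negative, the ratio is no longer bounded by ±1; the voxel-count variant avoids this.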
Finally, because lateralization indices are often calculated from the number of activated voxels in each ROI rather than from mean amplitudes (Wilke and Lidzba, 2007), we repeated the same analysis with the number of activated voxels in each ROI as the dependent variable.
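The voxel-count variant replaces mean amplitudes with counts of suprathreshold voxels in each homolog. A minimal sketch under the same assumptions (hypothetical function name; per-voxel p-values for each hemisphere's ROI as input; the example ROI is invented for illustration):

```python
import numpy as np


def voxel_count_li(p_left, p_right, neg_log_p):
    """Voxel-count LI at one threshold: (nL - nR) / (nL + nR), where nL and nR
    count voxels with p < 10**-neg_log_p in the left and right ROI."""
    threshold = 10.0 ** -neg_log_p
    n_left = int(np.count_nonzero(np.asarray(p_left) < threshold))
    n_right = int(np.count_nonzero(np.asarray(p_right) < threshold))
    total = n_left + n_right
    if total == 0:
        return float("nan")  # no suprathreshold voxels in either hemisphere
    return (n_left - n_right) / total


# Hypothetical ROI: 30 left and 10 right voxels, all with p = 1e-6
p_l, p_r = np.full(30, 1e-6), np.full(10, 1e-6)
sweep = {t: voxel_count_li(p_l, p_r, t) for t in (2.0, 4.0, 7.0)}
# 0.5 at -log10(p) = 2 and 4; nan at 7, where no voxel survives
```

Because voxel counts are never negative, this LI always lies in [−1, 1], which makes it a useful check on the amplitude-based version.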
All participants in the AWS group scored 10 or higher on the SSI-III, providing clinical validation of their group classification beyond self-report. Stuttering severity varied widely within the AWS group, as evident in the large ranges of % stuttered syllables and SSI scores (Table 1). AWS and controls did not differ significantly in age, handedness or education (Table 1). Online monitoring of participants' responses during the first 3–4 blocks ensured that they performed the task accurately. Behavioral log files from 29 participants confirmed that they performed the auditory detection task easily and with high accuracy (> 90%), indicating that attention was maintained throughout the experiment. Behavioral log files of 7 of the 36 participants (4 AWS and 3 controls) were not available due to technical problems.
We examined individual contrast maps for two functional contrasts: speech perception compared to SCN, and SCN compared to rest. The typical distribution of brain responses in each contrast is depicted in Fig. 2 for one representative AWS participant (Fig. 2A) and one representative control participant (Fig. 2B). In general, activation for speech perception (Speech vs. SCN; hot colors) was seen along the temporal cortices bilaterally, encompassing the STG, STS and middle temporal gyrus. Activation was also present in the IFG, superior frontal sulcus and precentral sulcus. Auditory responses (SCN vs. rest; cool colors) were consistently found in Heschl's complex, bilaterally. At the level of individual participants, Fig. 2 also demonstrates a difference in the lateralization of frontal responses: frontal activations tend to be left-lateralized in control participants, whereas AWS show a more bilateral pattern. Table 2 reports, for each ROI and each group, the mean and standard deviation of the MNI coordinates and of the cluster size, and the number of participants who showed above-threshold activation. No significant group differences were found in cluster size (p > 0.2, uncorrected). See Supplemental Table S1 for individual amplitudes and Supplemental Table S2 for individual cluster sizes in each ROI across all participants.
To test the statistical significance of potential group differences during speech perception, we conducted a mixed ANOVA on the differential activation [Speech − SCN], with Group (AWS, control), ROI (IFG, pSTS, HG) and Hemisphere (left, right) as factors. We found a significant Group × ROI × Hemisphere interaction (F(2, 44) = 3.329, p < 0.05). To explain this interaction, we calculated the simple effect of Group within each ROI. AWS showed significantly greater activation than controls in two ROIs, each in a single hemisphere: RIFG and LHG (RIFG: t(25) = 2.957; LHG: t(34) = 3.548; p < 0.05, FDR-corrected for 6 comparisons). These group differences are shown in Figs. 3 and 4. Both group differences remained significant in a follow-up analysis in which ROIs were redefined as fixed-size spheres.
Similar results were found in the matched sample (see Supplemental Fig. S2). In addition, LpSTS showed marginally stronger activation in AWS than in controls, but this difference did not survive correction for 6 comparisons (see Supplemental Fig. S3).
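The simple-effect comparisons above are between-group t-tests. The text does not state which variant was used; assuming Student's pooled-variance test, the degrees of freedom equal n_AWS + n_control − 2 for the participants contributing a value in that ROI (so t(34) for LHG would correspond to all 36 participants). The sketch below uses a hypothetical function name and NumPy only, and returns the t statistic with its df.

```python
import numpy as np


def pooled_t_test(group_a, group_b):
    """Student's independent-samples t statistic with pooled variance.

    Returns (t, df); the two-sided p-value would come from a t distribution
    with df degrees of freedom."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.size, b.size
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its own df.
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / df
    se = np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (a.mean() - b.mean()) / se, df
```

If SciPy is available, `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives this same pooled test, also returns the p-value, which could then be corrected for the six ROI-wise comparisons with an FDR procedure.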
The only other significant effect in this ANOVA was a main effect of ROI (F(2, 44) = 54.889, p < 0.001): across groups and hemispheres, speech-related activation differed among the three ROIs examined (see Table 3). Post-hoc comparisons revealed significantly stronger activation in IFG and pSTS than in HG (p < 0.05, FDR-corrected for the 3 pairwise comparisons between ROIs). This effect is expected, given that the dependent measure in this analysis is the difference in BOLD signal [Speech − SCN].
Regions involved in processing language content (such as IFG and pSTS) are therefore expected to retain much of the speech-related signal after subtraction of SCN responses. In contrast, in regions that process mostly sensory aspects of speech, such as HG, activation should be minimal for the same subtraction.
We next considered the source of the group differences found in RIFG and LHG. A significant group difference in cortical responses to [Speech − SCN] could result from enhanced activation in the Speech condition, reduced activation in the SCN condition, or both. Supplemental Figs. S4 and S5 show the group-averaged responses in each ROI for the Speech and SCN conditions separately. As is evident in these figures, the group differences appear to stem from the responses during the Speech condition, not the SCN condition. To assess the statistical significance of this effect, we conducted a 4-way mixed ANOVA with Group (AWS, control) as a between-subjects factor and ROI (IFG, pSTS, HG), Hemisphere (left, right) and Condition (Speech, SCN) as within-subject factors. The dependent variable was the signal amplitude for each condition, calculated relative to the Rest baseline. This analysis revealed a significant Group × ROI × Hemisphere × Condition interaction (F(2, 44) = 3.328, p < 0.05), which was explained by significant group differences in the Speech condition only, in two of the ROIs (RIFG: t(25) = 2.583; LHG: t(33) = 3.963; p < 0.05, FDR-corrected). Thus, in both RIFG and LHG, AWS showed over-activation relative to controls during the Speech condition. This result, too, was replicated in the matched sample.
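The reasoning behind this follow-up analysis can be made explicit with a short decomposition. Writing S and N (our notation, introduced here for illustration) for a group's mean Speech and SCN amplitudes relative to rest in one ROI, with subscripts AWS and C for the two groups, the group difference in the [Speech − SCN] contrast splits into a Speech term and an SCN term:

$$\Delta = \left(S_{\mathrm{AWS}} - N_{\mathrm{AWS}}\right) - \left(S_{\mathrm{C}} - N_{\mathrm{C}}\right) = \left(S_{\mathrm{AWS}} - S_{\mathrm{C}}\right) - \left(N_{\mathrm{AWS}} - N_{\mathrm{C}}\right)$$

A positive Δ can therefore arise from S_AWS > S_C, from N_AWS < N_C, or from both; testing each condition against rest separately, as in the 4-way ANOVA, is what distinguishes these cases.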
Our next goal was to examine potential group differences in lateralization patterns between AWS and controls. We conducted lateralization analyses separately in each ROI (IFG, pSTS, HG), with LIs calculated over the mean amplitude values (for Speech − SCN) in each homolog (see the Methods section). As shown in Fig. 5, AWS and controls showed significantly different functional lateralization in IFG over a large range of threshold values (see Supplemental Table S1 for individual LI values at a single threshold of −log(p) = 3). This observation was supported by a 2-way mixed ANOVA with Group and Threshold as independent variables, which showed a significant main effect of Group across threshold values (F(1, 14) = 5.631, p < 0.05). Across thresholds, from less restrictive (i.e., 10⁻²) to more restrictive (i.e., 10⁻⁸), the control group exhibits a more left-lateralized pattern of activation, while AWS show a bilateral pattern, with a tendency towards the right hemisphere at several thresholds. These findings support the notion that AWS exhibit a more bilateral activation pattern in the IFG than controls, who demonstrate the expected leftward lateralization. We did not find any significant group differences in LIs within pSTS or HG (Supplemental Fig. S6). See Supplemental Fig. S7 and the accompanying Supplemental text for similar results from a lateralization analysis based on the number of activated voxels.
Our findings indicate that, during speech perception, adults who stutter exhibit significant over-activation in RIFG and LHG, compared to controls. The difference between the groups results from their differential responses while listening to speech (not from under-activation during the baseline condition). AWS showed a bilateral pattern of activation in the inferior frontal cortex, while controls displayed the expected left lateralized pattern of activation. These results demonstrate that neural differences between AWS and adults who do not stutter are not restricted to speech production but are also evident during speech perception.
Increased activation in right frontal regions of AWS has been repeatedly documented in speech production tasks (Blomgren et al., 2004, Brown et al., 2005, Chang et al., 2009, Giraud et al., 2008, Kell et al., 2009, Preibisch et al., 2003, Belyk et al., 2015, Budde et al., 2014). This aberrant activation pattern is often explained as a compensatory attempt to restore fluent speech. However, our results showed abnormal activation in RIFG even though the task did not require speech production. Although covert speech production cannot be fully ruled out, it would likely interfere with performance on the sound detection task, so our task does not encourage covert articulation. Therefore, the over-activation we document in the RIFG of AWS during comprehension is difficult to reconcile with an “articulatory compensation” interpretation.
We offer two potential interpretations for the differences found in RIFG responses during speech perception; admittedly, both are rather speculative at this point. The first interpretation suggests that the over-activation seen in the RIFG of AWS during speech perception may result from structural changes in the connections between the RIFG and its left homolog. There is some evidence that the pattern of structural connectivity between the frontal lobes may differ in AWS (Choo et al., 2011, Civier et al., 2015). If so, such differences may lead to reduced inter-hemispheric inhibition, similar to that observed in aphasic patients (Karbe et al., 1998, Netz et al., 1995, Rosen et al., 2000, Thiel et al., 2006). In aphasic patients, left perisylvian damage leads to disinhibition of the contralesional homotopic areas (Winhuisen et al., 2005). This release from inhibition allows the potential language capacities of the right hemisphere to be expressed. However, these alternative right-hemispheric pathways have not been trained to perform complex language tasks (Heiss et al., 2003, Karbe et al., 1998), so their involvement may turn out to be detrimental to the aphasic patient's recovery (Hamilton et al., 2011). Based on these observations in a very different patient population, we propose that over-activation in RIFG during speech production could in principle hinder fluency in AWS, because these right frontal regions are not adept at the efficient information transfer necessary for fluent speech production. This proposal converges with the finding that over-activation in the RIFG is associated with the stuttering trait (see Belyk et al., 2015, Fig. 2, z = 10 and Budde et al., 2014, Fig. 2, z = 7), suggesting that this effect reflects a stable factor separating AWS from controls.
However, these ideas have yet to be examined in a direct joint analysis of functional lateralization and structural connectivity in a sufficiently large cohort of AWS and controls.
Another potential, but currently speculative, interpretation of our findings draws on the motor theory of speech perception (for review see Galantucci et al., 2006, Liberman et al., 1967, Liberman and Mattingly, 1985). This controversial theory (see, e.g., Lane, 1965 for a classical critique and Scott et al., 2009 for a more recent alternative interpretation) asserts that speech perception and production are closely intertwined, because listeners use covert production in order to perceive speech. On this view, speech perception is actually perception of the listener's own phonetic gestures (which reproduce the speaker's utterances), rather than perception of acoustic patterns. There is behavioral and neural evidence for a link between perception and production in both humans and animals (Galantucci et al., 2006). This idea is further supported by imaging data suggesting an overlap between the brain regions that participate in speech perception and production (Pulvermüller et al., 2006, Wilson, 2004). In line with this general view, one might argue that changes in the production pathways that lead to speech production impairments would also affect speech perception. It is therefore possible that the RIFG, which is consistently over-activated in AWS during production tasks, shows abnormal activation during speech perception because of its potential involvement in simulating speech production in this population. Obviously, this interpretation was not directly tested in this study, and therefore can be neither confirmed nor refuted by our data.
Our findings further document over-activation in the left HG of AWS during speech perception. Previous neuroimaging studies have shown that AWS exhibit aberrant auditory cortex activation during various speech processing tasks (e.g., Braun et al., 1997, Brown et al., 2005, Chang et al., 2009, Corbera et al., 2005, Kikuchi et al., 2011, Watkins et al., 2008). These findings have led to the suggestion that deficits in auditory perception in AWS underlie the deficits seen in their speech production. Support for this notion comes from neuroimaging studies that manipulate auditory feedback during speech production in AWS (Cai et al., 2014). Such manipulations have been found to enhance fluency while reducing abnormal activation in the auditory system (Braun et al., 1997, Fox et al., 1996, Fox et al., 2000, Stager et al., 2003, Toyomura et al., 2011, Watkins et al., 2008). This supports a causal role for aberrant auditory processing in inducing stuttering.
Functional MRI studies of speech perception in AWS have reported mixed results regarding auditory cortex activity. These studies showed either decreased auditory cortex activity (Chang et al., 2009), increased activity in a different, more ventral location (De Nil et al., 2008), or no group differences in the auditory cortex during auditory processing (Loucks et al., 2011). Our data, showing over-activation in the left HG of AWS during language perception, add to this mixed pattern of results. Still, this finding fits well with prior proposals (e.g., Giraud et al., 2008) suggesting that, when AWS engage in speech production, the excessively activated motor cortex strongly inhibits the auditory cortex. Speech perception removes the motor component from this equation, so no factor actively attenuates auditory cortex responses. We suggest that this situation may lead to the over-responsivity we observe in HG during speech perception.
Another possible explanation for our finding involves differences in the neural timing of auditory processing during speech perception between AWS and non-stuttering controls. Studies in healthy participants and brain-damaged patients have shown that the left and right homologs of the human auditory cortex are functionally segregated (Belin et al., 1998, Liégeois-Chauvel et al., 1999, Robin et al., 1990, Tallal et al., 1993, Zatorre and Belin, 2001, Zatorre et al., 2002). The left auditory cortex has been shown to be more sensitive to rapid (temporal) acoustic changes, while the right auditory cortex is more sensitive to spectral features. Speech depends heavily on rapidly changing broadband sounds. The greater temporal sensitivity of the left auditory cortex is therefore optimal for speech discrimination, while the greater spectral sensitivity on the right is well suited to frequency processing (Zatorre et al., 2002).
In their MEG studies from 2010 and 2011, Beal and colleagues raised the possibility that timing deficits exist in the auditory cortex of both children and adults who stutter during speech perception. They found timing differences, but not amplitude differences, in the auditory M50 and M100 components of AWS engaged in a passive listening task (Beal et al., 2010, Beal et al., 2011). The authors attributed these timing differences to a deficiency in accessing the internal neural representation of speech sounds in AWS (Beal et al., 2010). This interpretation is consistent with the idea that the left auditory cortex of AWS has reduced temporal sensitivity and difficulty accessing the neural representation of speech sounds, leading to suboptimal speech discrimination (Jansson-Verkasalo et al., 2014). From this perspective, the over-activation we observe in the left HG of AWS could reflect an attempt to overcome this difficulty while handling the rapid acoustic changes that occur during speech perception.
This study is limited in sample size, particularly in the AWS group (N = 11). With this sample size, we are not powered to examine the interacting effects of many interesting factors, such as age, gender and stuttering severity. It is encouraging that we could replicate known differences between AWS and controls (e.g., over-activation of RIFG) even with such a small sample, using an analysis approach based on individual participant data¹. Even so, larger studies will be essential to establish the extent to which these effects generalize across individuals and samples, and the degree to which they are driven by other factors such as those mentioned above. A separate limitation concerns the short stimulation protocol used in the current experiment. While we have established in a prior study that 4 blocks of each condition are sufficient to evoke robust responses in sensory and language-responsive cortex, such a short protocol is insufficient for split-half assessment of the reliability of the signals obtained within participants (see, e.g., Wymbs et al., 2013, Gorgolewski et al., 2013). In future studies, longer stimulation protocols will be necessary to assess within-subject reliability.
Naturally, because we sampled adults who have stuttered most of their lives, we cannot attribute to the differences we detect a causal role in triggering or maintaining stuttering. Some of these differences in brain responses may well have been caused by a lifelong struggle with stuttering, as suggested above. Future studies incorporating longitudinal designs will be necessary to establish which differences exist before stuttering arises and which emerge later in life. Even with such data at hand, it will be impossible to determine the direct causal role of early differences in stuttering behavior, because both may be driven by a third, unmeasured factor.
Finally, we used standard continuous scanning, which means that our speech stimuli were in fact embedded in structured noise generated by the scanner. Such sensory noise may interact with the perception of the relevant stimulus, and this interaction may differ between AWS and controls. Impairments in extracting signal from noise have been demonstrated in other developmental disorders such as dyslexia (Jaffe-Dax et al., 2015, Sperling et al., 2005) and autism (Zaidel et al., 2015). It would therefore be advisable to examine the same paradigm under sparse temporal sampling conditions (Gaab et al., 2003). Notably, such scanning paradigms typically require an event-related fMRI design, which would limit statistical power and may not allow the analysis of fMRI signals at the level of individual brains, as implemented here. We therefore regard future measurements using sparse sampling as complementary to those acquired here.
The current study investigated differences in brain responses between adults who stutter and non-stuttering controls during speech perception. Our findings indicate functional differences in frontal and temporal brain regions related to speech perception and point to an imbalanced neural network for speech processing in AWS. These findings lend support to the view of developmental stuttering as a multifaceted disorder associated not only with changes in speech production but also with changes in speech perception. The causal role of these differences could be better characterized through training studies that selectively target the perception component and examine its effect on fluency and on the accompanying brain responses.
This study was supported by the Israel Science Foundation (#513/11 awarded to M.B.-S. and O.A.), the European Commission (International reintegration grant #231029 to M.B.-S.), the National Institute for Psychobiology in Israel, and the I-CORE program of the Planning and Budgeting Committee and The Israel Science Foundation (grant #51/11). We thank the Israeli Stuttering Association (AMBI) for their help in reaching potential participants, and all the participants who took part in this study. This study was performed as part of Tali Halag-Milo's MA thesis submitted to Ben-Gurion University of the Negev.
¹ Recent data from Wymbs et al. (2013) show little agreement in the locations of cortical responses of 4 individuals who stutter during stuttering moments. While this lack of agreement between participants is alarming, better agreement between individual participants may be attainable when the focus is on the stuttering trait rather than the stuttering state, as is the case here.
Appendix A. Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.nicl.2016.02.017.
Supplementary tables and figures.