summarizes the average vowel durations and levels produced by the PFS controls and PWS under the noPert, Down and Up conditions. A two-way mixed ANOVA with vowel duration as the dependent measure yielded no significant main effect of participant group (F1,37 = 0.028; p>0.86), nor any significant main effect of perturbation condition (F2,74 = 1.848; p>0.16). Similarly, there was no significant main effect of group (F1,37 = 0.220; p>0.65) or perturbation condition (F2,74 = 0.328; p>0.72) on vowel level.
Summary of the vowel durations and levels produced under the three perturbation conditions (noPert, Down and Up) by the PFS (control) and PWS participants.
The Down and Up F1 perturbations to the AF used in the experiments were based on fixed ratios of 20%. Under the Down perturbation, the average absolute perturbation magnitudes were 115.6±3.4 and 119.6±2.9 Hz (mean±1 SEM) in the PWS and PFS, respectively, which did not differ significantly (p>0.43, two-tailed t-test). Similarly, under the Up perturbation, there was no significant difference in the absolute magnitude of the perturbations between the PWS (113.6±3.5 Hz) and the PFS (116.7±3.1 Hz) (p>0.5).
Both groups of participants showed statistically significant compensatory responses to the perturbations of the AF of F1 during the production of the monophthong [ε] embedded in the CVC words “head” and “pet”. In , each red curve shows the difference between the average F1 trajectories produced under the Down and noPert conditions by a PFS control subject; similarly, each blue curve shows the difference between the average F1 trajectories produced under the Up and baseline conditions. As can be seen in this panel, there was considerable between-subject variability in their responses to the AF perturbations. However, the group-average responses () showed a systematic pattern of change of F1 in the productions in directions opposite to the perturbations, i.e., a gradual decrease under the Up perturbation and a gradual increase under the Down perturbation. Frame-by-frame t-tests were used to delineate the intervals in which these deviations from baseline were statistically significant at the group level. The light red parts of the horizontal bar in indicate time intervals in which the difference between the F1 trajectories produced under the noPert and Up conditions reached statistical significance. Similarly, the light blue parts of the horizontal bar in the same panel indicate intervals in which the difference between the F1 trajectories produced under the noPert and Down conditions reached statistical significance (p<0.05 uncorrected, two-tailed t-test). In both horizontal bars, the darker-colored parts indicate the time intervals in which statistical significance was reached on a corrected level (FDR = 0.05). As can be seen from these bars, significant deviations from baseline commenced approximately 150–200 ms following the onset of the vowel (the onset of the perturbation). The magnitude of the compensation increased with time, and was approximately 3% (i.e., ~15% of the perturbation) in the PFS group and 1.5% (i.e., ~7.5% of the perturbation) in the PWS group at 300 ms following perturbation onset.
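The distinction between uncorrected and FDR-corrected significance across time frames can be illustrated with a minimal sketch. The text does not state which FDR procedure was used; the Benjamini-Hochberg step-up procedure is assumed here, and the per-frame p-values are hypothetical numbers chosen only for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag the p-values that survive Benjamini-Hochberg FDR control at level q.

    Assumption: the FDR procedure is Benjamini-Hochberg; the source text only
    states that a corrected level of FDR = 0.05 was used.
    """
    m = len(p_values)
    # Frame indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical per-frame p-values from frame-by-frame t-tests (early to late frames).
frame_p = [0.80, 0.30, 0.04, 0.012, 0.001, 0.0005]
uncorrected = [p < 0.05 for p in frame_p]        # analogous to the light-colored bar segments
corrected = benjamini_hochberg(frame_p, q=0.05)  # analogous to the darker-colored segments
```

In this toy example the third frame is significant only at the uncorrected level, which mirrors how a light-colored segment can extend earlier in time than the darker one.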
Compensatory adjustments of produced F1 trajectories under the Down and Up perturbations in PWS and PFS participants.
A seemingly puzzling aspect of the result from the PFS group is the significant deviations from the noPert baseline in the participants’ F1 productions in the same directions as the perturbations. These deviations can be seen in the first 100 ms following the onset of the perturbation (see the left part of ). These deviations reflected cross-trial adaptation similar to that shown in previous AF perturbation experiments that used sustained auditory perturbations (e.g., 
), which demonstrated offline updating (i.e., adaptation) of the motor programs for the production of vowels. Due to the block-by-block randomized organization of the baseline, a perturbation trial always followed another perturbation trial of the opposite type, if it followed any perturbation trial in the same block (see and the first sub-section of the Materials and Methods section). As a result, if a perturbation trial is preceded closely by another perturbation trial in the same block, the early part of the subject’s production in this trial may contain an adaptation response to the perturbation in the previous perturbation trial, which may be misrecognized as an apparent “early following” response to the perturbation in the same trial. Since such adaptive updating after-effects tend to decay during unperturbed productions (e.g., 
), a perturbation trial separated from the preceding perturbation trial by a larger number of baseline trials should show a weaker apparent early-following response of this type.
Consistent with this reasoning, when we included only the perturbation trials that were either preceded by no perturbation trial in the same block (e.g., the Down trials in Blocks 1 and 3 and the Up trial in Block 2 of the example in ) or separated from the preceding perturbation trial in the same block by at least three trials (e.g., the Down trial in Block 2 of the example in ), the apparent early following response disappeared (). We will refer to this subset of data as the limited data set. It needs to be pointed out that the cross-trial adaptation effects were present not only in the Down and Up trials, but also in the noPert trials preceded closely by perturbation trials. However, since the noPert trials were preceded by Down and Up trials with equal probability, owing to the randomization of trial order, and because of the symmetry of the adaptation between the Down and Up directions, the cross-trial effects tended strongly to cancel out when all noPert trials were included to form the baseline condition.
Interestingly, this cross-trial adaptation effect was not as pronounced in the PWS group as in the PFS group. This can be seen clearly by comparing the left part of with the right part, in which the F1 changes in the first 100 ms were small and not significantly different from zero. To investigate the statistical significance of this between-group difference in cross-trial adaptation, we computed the average F1 changes from the no-perturbation baseline in the first 50 ms following the onset of the perturbation in the perturbation trials that were separated from the same-block preceding perturbation trials by two or fewer trials. The cross-trial adaptive response in the PFS group can be clearly seen in the black curve of ; these changes were in the same directions as the perturbations and, as mentioned above, may be mistaken for “early following responses”. However, as can be seen from the purple curve of the same figure, these changes were smaller in absolute value and not significantly different from zero in the PWS group. We performed a two-way mixed ANOVA with the between-subject factor GROUP, with the two levels [PWS, PFS], and the within-subject factor SHIFT, with the two levels [Down, Up]. The result of the ANOVA indicated a significant GROUP×SHIFT interaction (F1,37; p = 0.022), as well as a significant main effect of SHIFT (F1,37; p = 0.020). These results provide statistical confirmation of the observation that the cross-trial adaptation was weaker in the PWS than in the PFS.
Different cross-trial adaptation responses in the PWS and PFS groups.
As Panels B and C of show, the PWS showed compensatory responses that were qualitatively similar to those of the PFS: on average, the F1 trajectories in the subjects’ productions deviated from the baseline values in directions opposite to the Up and Down perturbations. These compensatory F1 changes became significant at approximately 150 ms following perturbation onset. The same conclusion can be reached regardless of whether the full () or limited () data set is examined. However, owing to the small size of the compensatory F1 corrections under the Up perturbation, significant F1 changes at the corrected level were reached only under the Down perturbation for the PWS. Comparing the data from the PWS and PFS in , it can be seen that the magnitude of the compensatory responses was appreciably smaller in the PWS group than in the PFS group. The same conclusion can be drawn if the limited data set is considered (comparing ).
To examine the statistical significance of the difference in magnitude of the compensatory responses between PWS and PFS, we computed the composite response curve for each participant by subtracting the Up response (e.g., red curves in ) from the Down response (e.g., blue curves in ). This approach to reducing the dimensionality of the data was justified by the fact that the compensatory F1 corrections were largely symmetrical with respect to the perturbation directions in both subject groups. shows the average composite response curves in the PWS and PFS with the purple and black curves, respectively, computed on the full data set. shows the same average composite curves computed on the limited data set. It can be seen that regardless of whether the full or the limited data set was used, the magnitude of the composite response curves was smaller by approximately 47% in the PWS than in the PFS at 300 ms following vowel onset.
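As a minimal sketch of the composite (Down minus Up) contrast and of how a relative between-group reduction can be expressed, consider the following. The function names and all numerical values are hypothetical and serve only to illustrate the arithmetic; they are not the study's data.

```python
def composite_response(down_response, up_response):
    """Down-minus-Up contrast at each time frame (both inputs are changes from noPert)."""
    return [d - u for d, u in zip(down_response, up_response)]


def relative_reduction(pws_value, pfs_value):
    """Fraction by which the PWS value is smaller than the PFS value."""
    return 1.0 - pws_value / pfs_value


# Hypothetical fractional F1 changes from baseline for one participant.
down = [0.000, 0.005, 0.015, 0.030]    # F1 rises under the Down perturbation
up = [0.000, -0.004, -0.014, -0.028]   # F1 falls under the Up perturbation
composite = composite_response(down, up)

# Hypothetical group-average composite values at 300 ms.
reduction = relative_reduction(pws_value=0.032, pfs_value=0.060)
```

With these illustrative group averages the reduction is about 0.47, i.e., a composite response roughly 47% smaller in one group than in the other.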
Average composite response curves from the PWS (purple) and PFS (black) groups.
To systematically analyze the statistical significance of the compensatory F1 changes and the between-group difference in the compensation magnitude, we performed a mixed analysis of variance (ANOVA). The dependent variable of the ANOVA was the Down-Up contrasts in the produced F1s of the participants, i.e., values in the composite response curves. These F1 contrasts were computed on 11 equally spaced time points between 0 and 300 ms following vowel onset. Note that the separation between adjacent time points was 30 ms, greater than the size of the smoothing Hamming window (28 ms), hence they did not cause correlations in the error terms. The 300-ms time limit was chosen because it was the lower bound of the vowel-duration range the participants were instructed to achieve. The two independent variables that entered the ANOVA were 1) GROUP, a between-subject factor, with two levels (PWS, PFS), and 2) time point (TPT), a within-subject factor, with the 11 levels that correspond to the above-mentioned eleven time points. show the interval-averaged F1 compensation curves, under the full and limited data sets, respectively.
In this ANOVA, we were primarily interested in the main effect of TPT and the interaction between GROUP and TPT. The TPT main effect evaluates the significance of the compensatory F1 production changes when the data are collapsed across the PWS and PFS, whereas the GROUP×TPT interaction constitutes a test of the between-group difference in the trends of F1 change with time, i.e., magnitude of the compensatory responses.
The TPT main effect was highly significant regardless of the data set used (full data set: F10,370; limited data set: F10,370), clearly indicating the significance of the online compensatory adjustments of F1 in response to the AF perturbations when the data were collapsed across the two groups of subjects. In addition, the GROUP×TPT interaction reached significance for both data sets (full data set: F10,370, p = 0.006; limited data set: F10,370, p = 0.049, both with Huynh-Feldt correction). In the post hoc comparison following the ANOVA with Tukey’s least significant difference (LSD) approach, the between-group difference in the latest two average intervals (270 and 300 ms following vowel onset) reached statistical significance under the limited data set, which confirms our earlier informal observation of weaker-than-normal F1 compensation in the PWS (). The post hoc comparisons for the full data set reached significance at the latest time point (300 ms), as well as at several earlier ones (before 150 ms from vowel onset), the latter of which again confirmed the significance of the weaker cross-trial adaptation in PWS than in PFS.
Consistent with previous findings (e.g., 
), compensatory responses to the auditory feedback could not be observed in all perturbation trials, despite the statistically significant compensation in the group-average data (). To characterize the between-trial variability in the responses and how it differed between PWS and PFS, we categorized the perturbation (Down and Up) trials into three categories: a. compensating, b. unresponsive and c. following. The average F1 in the last 50 ms of the [0, 300]-ms time interval following vowel onset was computed in each trial, and referred to as the F1end. The mean and standard deviation (SD) of the F1end values under the noPert condition were computed for each subject. For each perturbation trial, if its F1end deviated by more than one SD from the mean in the direction opposite to the perturbation, it was categorized as compensating; if its F1end deviated by more than one SD from the mean in the same direction as the perturbation, it was categorized as following; otherwise the unresponsive category applied. Only the limited data set was used in this analysis.
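The categorization rule above can be sketched as follows. The function name, the direction coding (+1 for Up, -1 for Down) and the baseline values are hypothetical, and the use of the sample SD is an assumption, since the text does not specify which SD estimator was used. The last line computes the proportion of trials expected to fall more than one SD to one side of the mean if F1end were normally distributed with no effect of perturbation, which comes out near 15.9%.

```python
from statistics import NormalDist, mean, stdev


def categorize_trial(f1_end, baseline_mean, baseline_sd, direction):
    """Classify one perturbation trial by its F1end.

    direction: +1 for an Up perturbation, -1 for a Down perturbation
    (a hypothetical coding chosen for this sketch).
    """
    z = (f1_end - baseline_mean) / baseline_sd
    signed = z * direction  # positive means a change in the same direction as the perturbation
    if signed < -1.0:
        return "compensating"
    if signed > 1.0:
        return "following"
    return "unresponsive"


# Hypothetical noPert F1end values (Hz) for one subject.
baseline = [600.0, 610.0, 590.0, 605.0, 595.0]
mu, sd = mean(baseline), stdev(baseline)  # sample SD is an assumption

label_up = categorize_trial(585.0, mu, sd, direction=+1)
label_down = categorize_trial(585.0, mu, sd, direction=-1)
label_small = categorize_trial(603.0, mu, sd, direction=+1)

# Chance proportion beyond one SD on one side under a standard normal distribution.
chance_rate = NormalDist().cdf(-1.0)
```

Here the same produced F1end (585 Hz) counts as compensating under an Up perturbation and as following under a Down perturbation, because the classification depends on the sign of the deviation relative to the perturbation direction.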
As summarizes, under the above criterion, the proportions of compensating responses were small (<30%) in both the PWS and PFS groups. These proportions were smaller than those in previous findings based on pitch perturbation (e.g., ), which may be due to differences in pitch and articulatory control and/or to the relatively short analysis window (300 ms) used in the current study. However, these proportions were significantly greater than what would be expected if there were no differences in the distribution of F1end between the noPert and perturbation conditions (15.9%; PFS: p = 0.00016; PWS: p = 0.0037; one-sample two-tailed t-test). The average proportion of compensating responses was slightly lower in the PWS than in the PFS, but this difference was not significant (p = 0.152, two-tailed t-test). On average, the PWS group showed a greater proportion of trials in the unresponsive category compared to the PFS, but this difference only approached significance (p
Proportions of compensating, unresponsive and following responses under the Down and Up perturbations in the two groups of subjects.
To examine whether there was any systematic relationship between compensation magnitude and stuttering severity in the PWS group, we performed parametric and non-parametric correlational analyses between the Down-Up F1 fraction difference at 300 ms following vowel onset and the SSI-4 composite score across the PWS. No significant correlation was found, either under a linear Pearson product-moment correlation (full data set: p = 0.70; limited data set: p = 0.90) or under a Spearman’s rho correlation (full data set: p = 0.73; limited data set: p = 0.79). When the sub-scores of SSI-4, including the frequency, duration, and concomitants scores, were correlated with the compensation magnitude, no significant correlations were found either (full data set: R2 = 0.018, 0.015, 0.039 and p = 0.56, 0.59, 0.39; limited data set: R2 = 0.0040, 0.016, 0.057 and p = 0.78, 0.58, 0.30 for frequency, duration and concomitants scores, respectively).
To address the question of whether the compensatory responses to the AF perturbation are more variable on a trial-to-trial basis in PWS than in PFS, we computed the across-trial standard deviation (SD) of the F1 value produced at 300 ms following vowel onset by each subject in each perturbation condition. shows the mean SDs (±1 SEM) in each group as a function of perturbation condition. As can be seen in this figure, the PWS and PFS showed similar F1 SDs, which were not significantly different. This observation was confirmed by a group-level repeated-measures ANOVA with a between-subject factor GROUP (PWS, PFS) and a within-subject factor SHIFT (noPert, Down, Up). The main effect of GROUP did not reach significance (limited data set: F1,37 = 0.20, p>0.65; full data set: F1,37, p>0.99); nor did the main effect of SHIFT (limited data set: F2,74 = 1.83, p>0.16; full data set: F2,74 = 1.43, p>0.24). The GROUP×SHIFT interaction was also non-significant (limited data set: F2,74 = 1.12, p>0.33; full data set: F2,74 = 0.052, p>0.95). Therefore there was no evidence that the compensatory response to AF perturbation was more variable in PWS than in PFS.
Variability of F1 production in PWS and PFS under the three perturbation conditions.
In rationalizing the weaker-than-normal response in PWS observed above, two possibilities need to be discerned: 1) the response latencies to the online perturbations of AF were longer in PWS than in PFS, and the belated onset of response could have caused the smaller magnitudes of compensation in PWS when comparisons are made on a temporal basis; 2) PWS and PFS had similar response latencies, and the smaller-than-normal compensation magnitudes were due to slower increase of the compensatory changes with time after the response onset. To distinguish these two possibilities, it was necessary to compute the latencies of the participants’ compensatory responses.
There is currently no widely accepted method for computing response latencies to auditory perturbation. In the current study, the latencies of the individual participants’ compensatory responses were computed based on a least-squares two-segment piecewise linear spline fit. The Cohen’s d scores for the Down-Up contrasts were computed as a function of time, which yielded the Down-Up Cohen’s d curve. Briefly, Cohen’s d is a measure of the statistical separation between two sets of random variables. It is defined as the ratio between the difference in the mean values and the composite standard deviation of the two sets of measurements. This approach is based on the assumption that the latency of response is approximately equal under the Down and Up perturbations. We are aware of no theoretical argument or empirical evidence that argues against this assumption.
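A minimal sketch of a Down-Up Cohen’s d curve is given below. The “composite standard deviation” is interpreted here as the pooled standard deviation, which is an assumption; the function names and the toy trajectories are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled SD (assumed definition)."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


def cohens_d_curve(down_trials, up_trials):
    """Compute Cohen's d between Down and Up trials separately at each time frame.

    down_trials / up_trials: lists of per-trial trajectories of equal length.
    """
    n_frames = len(down_trials[0])
    return [
        cohens_d([trial[f] for trial in down_trials], [trial[f] for trial in up_trials])
        for f in range(n_frames)
    ]


# Toy trajectories with two frames per trial (arbitrary units).
down_trials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
up_trials = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
d_curve = cohens_d_curve(down_trials, up_trials)
```

In the toy data the two conditions are identical in the first frame and separated by one unit in the second, so the curve rises from 0 to 0.5.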
Obviously, it was meaningful to define response latencies only for subjects who showed significant compensatory responses to the AF perturbation. Here we applied the following criterion for significant compensatory response: the Down-Up Cohen’s d at 300 ms following perturbation onset was greater than 0.3. Under this criterion, 11 of the 21 PWS and 14 of the 18 PFS were judged as compensating significantly when the full data set was analyzed. The ratio of compensating subjects was lower in the PWS (52.4%) than in the PFS (77.8%). However, this between-group difference in percentage was non-significant (p = 0.18, two-tailed Fisher’s exact test). Similar results were found for the limited data set: 15 of the 21 PWS (71%) and 16 of the 18 PFS (89%) were determined as significantly compensating, and the between-group difference in the percentage of non-compensating subjects also did not reach statistical significance (p
Note that this approach of evaluating the existence of compensatory responses in individual participants based on Cohen’s d scores is superior to an alternative, simpler approach based on the absolute magnitudes of the difference between the F1 data from the Down and Up conditions, in that it focuses on the statistical separation between the productions under these two conditions and hence is more robust against spurious fluctuations in the F1 trajectories.
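The between-group comparison of compensating-subject counts for the full data set (11 of 21 PWS versus 14 of 18 PFS) can be checked with a small standard-library sketch of a two-tailed Fisher’s exact test. The function name is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention; statistical packages may differ in how they define the two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)

    def weight(k):
        # Unnormalized hypergeometric probability (exact integer arithmetic).
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = weight(a)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Full data set: rows are PWS and PFS; columns are compensating and non-compensating.
p_full = fisher_exact_two_sided(11, 10, 14, 4)
```

Under this convention the result is approximately 0.18 for the full-data-set counts.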
A two-piece linear spline with three adjustable parameters was fitted to the individual participants’ Cohen’s d curves. The three free parameters were 1) the baseline value before the onset of the perturbation; 2) the latency of the response; and 3) the slope of the linear increase of F1 change with time. The fmincon function of the MATLAB Optimization Toolbox was used for the least-squares fitting. A conservative lower limit of 50 ms was imposed on the latency during the optimization, based on the shortest latencies that have been reported in prior studies of pitch and formant perturbation. An example Cohen’s d curve is shown in , along with the fitted linear spline. The response latency was determined as the value of the latency parameter in the resulting fit. Only the limited data set was used in computing the response latencies, because the presence of the cross-trial adaptation effect in the full data set may lead to underestimation of the latencies.
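The two-segment fit can be sketched in Python under stated assumptions. The model is taken to be a flat baseline followed by a linear ramp that starts at the latency; this form is inferred from the three parameters described above. Instead of MATLAB’s fmincon, this sketch swaps in a simple grid search over candidate latencies, with the baseline and slope obtained in closed form by ordinary least squares at each candidate. The function name, the 1-ms grid step and the synthetic data are all hypothetical.

```python
def fit_two_piece_spline(times_ms, d_values, min_latency_ms=50.0, step_ms=1.0):
    """Fit d(t) = baseline for t < latency, baseline + slope * (t - latency) otherwise.

    Grid search over latency (not fmincon); for each candidate latency the
    baseline and slope follow from ordinary least squares on the ramp regressor.
    Returns (baseline, latency, slope).
    """
    n = len(times_ms)
    mean_y = sum(d_values) / n
    best = None
    latency = min_latency_ms
    while latency <= times_ms[-1] - step_ms:
        # Ramp regressor: zero before the candidate latency, linear afterwards.
        x = [max(t - latency, 0.0) for t in times_ms]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx > 0.0:
            sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, d_values))
            slope = sxy / sxx
            baseline = mean_y - slope * mean_x
            sse = sum((baseline + slope * xi - yi) ** 2 for xi, yi in zip(x, d_values))
            if best is None or sse < best[0]:
                best = (sse, baseline, latency, slope)
        latency += step_ms
    _, baseline, latency, slope = best
    return baseline, latency, slope


# Synthetic, noise-free Cohen's d curve: baseline 0.1, onset at 150 ms, slope 0.004 per ms.
times = [float(t) for t in range(0, 301, 10)]
curve = [0.1 + 0.004 * max(t - 150.0, 0.0) for t in times]
baseline_hat, latency_hat, slope_hat = fit_two_piece_spline(times, curve)
```

The 50-ms lower bound on the latency is enforced simply by starting the grid at that value. A gradient-based constrained optimizer would serve the same purpose; the grid search is used here only because it keeps the sketch dependency-free.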
Calculation of the latencies of the compensatory response to the Down and Up perturbations in individual PWS and PFS participants.
We used this more-involved method of fitting a two-segment spline, rather than the simpler approach based on an absolute threshold of Cohen’s d score, because it served to prevent the response magnitude from biasing the calculated latency. If a fixed threshold were used and the time at which the Cohen’s d curve first overcomes this threshold were calculated as the response latency, then the calculated response latencies of the participants with smaller response magnitudes would be longer than those with greater response magnitudes, even if the true onset times of the responses are equal. This is an especially important issue in the current study, because we have observed significant and substantial between-group differences in the magnitudes of the compensatory responses.
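The bias of a fixed-threshold latency estimate can be made explicit with a short derivation. The symbols $b$ (baseline), $t_0$ (true latency), $k$ (slope) and $\theta$ (threshold) are notation assumed here for illustration, and the flat-then-linear form of the curve is an idealization.

```latex
% Assumed idealized Cohen's d curve: flat baseline b, then a linear ramp of
% slope k > 0 beginning at the true latency t_0.
d(t) =
\begin{cases}
  b, & t < t_0 \\
  b + k\,(t - t_0), & t \ge t_0
\end{cases}

% A fixed threshold \theta > b is first crossed at
t_\theta = t_0 + \frac{\theta - b}{k}
```

The estimated latency $t_\theta$ therefore exceeds the true latency $t_0$ by $(\theta - b)/k$, a delay that grows as the slope $k$ becomes smaller. Two participants with identical $t_0$ but different response magnitudes would thus receive different threshold-based latencies.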
Panel B of shows a comparison of the mean response latencies of the compensating subsets of both groups. The average response latencies were approximately 150–160 ms, and showed no significant between-group difference (p = 0.27, two-tailed two-sample t-test). Therefore the weaker-than-normal compensatory response to the auditory perturbation observed before () was not attributable to slower onset of the online compensation, but instead was more likely due to a weaker gradual increase in the F1 deviation from the baseline values in the PWS compared to the PFS.
There is evidence that the acuity of the sensory systems can affect the degree to which the motor systems utilize the corresponding sensory feedback for motor control and learning (e.g., 
). In speech motor control, speakers who have better auditory acuity to vowel formant differences show greater adaptation to the perturbation of AF during the production of the monophthong [ε] 
. Therefore, the under-compensation we observed in the stuttering participants may be attributable to worse-than-normal auditory acuity for vowel formant (F1) differences. This explanation seemed possible in the light of previous reports of abnormal auditory processing of speech sounds in PWS (e.g., 
As described above, we tested this possibility by measuring the participants’ JNDs of F1 of the vowel [ε]. An adaptive staircase procedure (see Methods for details) was used. As shows, the F1 JND was on average 10.3% higher in the PWS group than in the PFS group, indicating that on average, the PWS participants were slightly worse at detecting F1 differences of the vowel [ε] as compared to PFS. However, this difference was not statistically significant (p = 0.56, two-tailed t-test). Moreover, there was no evidence for systematic cross-participant correlations between their auditory acuity and the magnitude of their compensatory F1 production changes. This held true for the pooled group of PWS and PFS, and for each of the two groups separately (). These results indicate that PWS’s weaker-than-normal compensation for online perturbations of AF was not the result of an auditory perceptual deficit (i.e., inability to detect the shifts in AF), but instead reflects functional defects in the AF-based online control of speech movements.
Auditory acuity to differences in F1 of the vowel [ε] and its relation to the magnitude of the compensation to perturbation.