Experiment 1. Responses to spatial perturbation
In Experiment 1, thirty subjects produced the multisyllabic utterance “I owe you a yo-yo” under three different auditory perturbation conditions: 1) the baseline (noPert) condition, which involved no perturbation of auditory feedback, 2) the Down perturbation, which exaggerated the downward sweep of F2 during the word “owe”, and 3) the Up perturbation, which diminished the same downward sweep of F2 (see Methods for details). The detailed timing and magnitudes of the local extrema of F2 under the noPert baseline can be found in . In , the dashed curves show the average F2 trajectories in the perturbed auditory feedback conditions; the solid curves show the average F2 trajectories produced by the subjects under the three feedback conditions. It can be seen that these three solid curves mostly overlap from the onset to the middle of the focus interval, which is not surprising given the latency (~120–200 ms) involved in online feedback-based articulatory adjustments (e.g., Purcell and Munhall, 2006; Tourville et al., 2008; Donath et al., 2002; Xu et al., 2004). However, shortly after the F2 turning point in the syllable “owe”, at about 160 ms after the onset of the perturbation, the three curves begin to show a systematic pattern of divergence. Compared to the noPert (baseline) production, the average F2 trajectory produced under the Down perturbation showed increased values of F2 and the average F2 trajectory under the Up perturbation showed decreased F2 within the same time frame. The compensatory changes in the produced values of F2 can be seen more clearly in the solid curves in , which show the average change of F2 from the noPert baseline under the two types of perturbations. These compensatory changes in the magnitude of F2 were in the directions opposite to the directions of the auditory perturbation and lasted into the syllable [ju] (“you”), after the end of the perturbation.
Articulatory compensations under the spatial (Down and Up) perturbations
The compensatory responses reached statistical significance when the data were averaged along the un-normalized time axis and responses under the Down perturbation were compared to the noPert baseline (, indicated by the blue horizontal bar), or when the Down and Up responses were compared with each other (magenta horizontal bar). False Discovery Rate (FDR) (Benjamini and Hochberg, 1995) was used to correct for multiple comparisons. Due to a slightly smaller average compensatory magnitude and greater inter-subject dispersion, the compensatory response under the Up perturbation did not reach statistical significance under the corrected statistical threshold.
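The FDR correction cited above can be illustrated with the standard Benjamini-Hochberg step-up procedure. The sketch below is not the analysis code used in this study; the function name `benjamini_hochberg` and the example p-values are hypothetical, and it assumes one p-value per time point and an FDR level of q = 0.05.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where a test survives FDR control at level q.

    Standard Benjamini-Hochberg step-up rule: sort the m p-values, find the
    largest rank k with p_(k) <= (k / m) * q, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank whose p-value falls under its threshold.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical p-values, one per time point along the F2 trajectory.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_mask = benjamini_hochberg(example_p, q=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold can still be declared significant when a larger p-value meets a later threshold.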
visualizes the F2 compensations in un-normalized (real) time. The un-normalized time axis is suitable for a first-pass examination of the data and for estimating the latency of compensation, but it suffers from two shortcomings: 1) it does not correct for the misalignment in time of the F2 extrema across trials and subjects, which may lead to unwanted smoothing of the pattern of compensation; and 2) it intermingles the F2 changes due to timing and magnitude (spatial) adjustments. In order to isolate spatial adjustments from timing adjustments, the time axis was normalized in a piecewise linear fashion (). The F2 trajectories from individual trials were anchored at the set of F2-extremum landmarks in ([i], [u]1, [j]1, [u]2, [j]2, [u]3 and [j]3); the F2 trajectories between adjacent landmarks were computed through linear interpolation of time. 250 uniformly spaced interpolation points were used between each pair of adjacent landmarks. This piecewise normalization isolates compensatory corrections in the magnitude of F2 from the adjustment of the timing of the F2-extremum landmarks.
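The piecewise linear normalization described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names and the toy trajectories are hypothetical, and it assumes that each inter-landmark segment is sampled at `n_points` uniformly spaced times starting at the segment's first landmark, with the final landmark appended once at the end.

```python
from bisect import bisect_right


def interp_linear(t, times, values):
    """Linearly interpolate `values` (sampled at ascending `times`) at time t."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
    t0, t1 = times[j - 1], times[j]
    w = (t - t0) / (t1 - t0)
    return values[j - 1] + w * (values[j] - values[j - 1])


def piecewise_normalize(times, f2, landmarks, n_points=250):
    """Resample an F2 trajectory onto a landmark-anchored, normalized time axis.

    `landmarks` holds the ascending times of the F2 extrema in this trial.
    Every segment between adjacent landmarks contributes n_points samples,
    so trials with different landmark timing end up sample-aligned.
    """
    out = []
    for start, end in zip(landmarks[:-1], landmarks[1:]):
        step = (end - start) / n_points
        for k in range(n_points):
            out.append(interp_linear(start + k * step, times, f2))
    out.append(interp_linear(landmarks[-1], times, f2))
    return out


# Two toy trials (hypothetical values) with the same F2 extrema but different timing.
trial_a = piecewise_normalize([0.0, 0.1, 0.2, 0.3, 0.4],
                              [2000, 1500, 1000, 1500, 2000],
                              landmarks=[0.0, 0.2, 0.4], n_points=4)
trial_b = piecewise_normalize([0.0, 0.15, 0.3, 0.35, 0.4],
                              [2000, 1500, 1000, 1500, 2000],
                              landmarks=[0.0, 0.3, 0.4], n_points=4)
```

After normalization the two toy trials coincide sample by sample, so any remaining difference between conditions on this axis reflects F2 magnitude and not landmark timing.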
As illustrated in , the difference between the Down and Up conditions was statistically significant within a time interval between [u]1 (FDR = 0.05, magenta horizontal bar). In addition, the comparisons of the individual perturbation conditions (Down and Up) with the noPert baseline both reach corrected levels of significance (see blue and red horizontal bars in , respectively). Including the gradual buildup to the significant differences and the subsequent decay, the magnitude compensation spanned a longer time interval, from [u]1. The largest F2 magnitude adjustments are seen near the temporal midpoints between [u]1 and between [j]1. Interestingly, the compensation magnitude shows a “dip” near the [j]1, an F2 maximum (see arrow F in ). The reason for this decreased F2 compensation magnitude around the semivowel is unclear, but may be related to a nonlinear saturation relation between articulatory position and formant frequency for this phoneme (Stevens, 1989).
When the F2 changes were analyzed at individual landmark points, significant compensatory changes were again observed. These landmarks included the F2 minimum at [u]1, the temporal mid-point between [u]1, the F2 maximum at [j]1, and the temporal mid-point [j]1 (). At each of these landmarks, RM-ANOVA indicated a significant main effect of perturbation condition (noPert, Down and Up; F2,58=4.09, 11.4, 12.7, and 16.3; p<0.025, 1×10−4 for the four above-mentioned landmarks, respectively). Pair-wise Tukey’s HSD comparisons between the Down and Up conditions reached significance for all four landmarks as well (p<0.05 corrected for all landmarks). The ratio between the peak magnitudes of the compensatory response (thick solid curves in ) and the peak magnitudes of the auditory perturbation (dashed curves in ) was 18.9% for the Down perturbation and 9.7% for the Up perturbation. The magnitudes of the compensatory F2 adjustments are slightly larger under the Down perturbation than under the Up perturbation. This asymmetric pattern of compensation may be due to a greater need to avoid a predicted undershooting of the F2 target at the semivowel [j]1 than to prevent a predicted overshooting, since the semivowel [j]1 is associated with a local maximum of F2 that is reached from below. Despite the significance of these compensatory responses on the group level, there was considerable variability across trials and subjects. For example, for the landmark [j]1, 20 of the 30 subjects showed trends consistent with the group average under the Down perturbation and 22 of the 30 under the Up perturbation. This relatively high level of variability is consistent with previous findings based on real-time manipulation of formant feedback (Purcell and Munhall, 2006; Tourville et al., 2008).
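For readers less familiar with the RM-ANOVA statistics reported above, the following sketch computes a one-way repeated-measures F ratio and its degrees of freedom from a subjects-by-conditions table. It is an illustration under stated assumptions and not the analysis code of this study: the function name and the toy data are hypothetical, and no sphericity correction is applied. With three conditions, 30 subjects give degrees of freedom (2, 58) and 22 subjects give (2, 42), which matches the subscripts of the F values reported for the two experiments.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data[s][c] is the measurement for subject s under condition c.
    Returns (F, df_condition, df_error), with df_error = (k - 1) * (n - 1).
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]
    # Partition the total sum of squares into condition, subject and error terms.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


# Toy table (hypothetical values): 3 subjects x 3 conditions (noPert, Down, Up).
toy = [[1, 2, 4],
       [2, 3, 4],
       [3, 5, 6]]
toy_f, toy_df1, toy_df2 = rm_anova_oneway(toy)
```

Removing the between-subject sum of squares from the error term is what distinguishes this design from a between-groups ANOVA; it is the reason a consistent within-subject shift can reach significance despite large differences between speakers.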
In addition to these changes in the magnitude of F2, which reflected feedback-based control of the spatial parameters of articulation, we also observed significant changes in the timing measures of the F2 trajectory under the auditory perturbations. The [i]-[u]1 interval, namely the interval between the F2 maximum at [i] and the F2 minimum at [u]1, was affected significantly by the perturbation condition (F2,58=6.6, p<0.005) and was significantly different between the Down and Up conditions (p<0.05 corrected, post hoc Tukey’s HSD). On average, this interval shortened under the Down perturbation and lengthened under the Up perturbation (). If the F2 minimum at [u]1 is defined as the end time of the syllable “owe”, this observation indicates that the Down and Up perturbations led to an earlier- and later-than-baseline termination of this syllable, respectively. In other words, these perturbations altered the articulatory timing within this syllable. In comparison, the [i]-[j]1 interval, namely the interval between [i] and [j]1, exhibited a similar but non-significant trend of change (F2,58=1.33, p>0.25, ). Therefore, if the F2 maximum at [j]1 is regarded as the onset of the syllable [ju], it can be seen that the Down and Up perturbations did not significantly alter the onset timing of the following syllable (i.e., between-syllable timing).
After the experiment, the subjects were asked whether they had been aware of any distortions of the auditory feedback during the experiment. Apart from the higher-than-normal loudness and the differences between hearing one’s own voice through natural auditory feedback and through playback or recordings, none of the subjects reported being aware of any deviations of the auditory feedback from the natural pattern.
Experiment 2. Articulatory timing adjustments under the temporal perturbations
Experiment 1 provided evidence for the involvement of auditory feedback in the online feedback-based guidance of the spatial aspect of multisyllabic articulation. As for the role of auditory feedback in controlling syllable timing, such a role was observed only in the control of within-syllable timing (), and not in the control of between-syllable timing (). There are two alternative explanations for this pattern: 1) the syllable onset times may be completely pre-programmed, so that changes in auditory or other sensory feedback cannot affect the syllable-onset times; and 2) auditory feedback is utilized by the speech motor system in the online control of syllable timing, but the Down and Up perturbations used in Experiment 1 are not suitable types of perturbation to demonstrate such a role of auditory feedback.
In order to distinguish between these two possibilities, we used two novel types of perturbations of F2 trajectories, namely Accelerating (Accel) and Decelerating (Decel) temporal perturbations. Unlike the spatial perturbations used in Experiment 1, these temporal perturbations alter the timing of the F2 minimum associated with [u]1. We hypothesized that with these new perturbations, significant changes in the subjects’ articulatory timing would be observed, which would support a role of auditory feedback in the online control of both within-syllable and between-syllable timing.
The baseline values of the time intervals can be found in the rightmost column in . Unlike in Experiment 1, no significant change in the magnitude of F2 was observed in response to the Accel and Decel perturbations (not shown). However, in the temporal domain, the subjects’ articulation showed an asymmetric pattern of temporal changes under the Accel and Decel perturbations. Significant articulatory timing changes were observed only under the Decel perturbation, which resulted in increases in both measured intervals. This can be seen from the slightly delayed F2 minimum at [u]1 and F2 maximum at [j]1 in the average Decel curve compared to those in the average noPert curve in . As Panels B and C of show, the changes in the [i]-[u]1 interval and the [i]-[j]1 interval were quite small under the Accel perturbation, but were much greater and statistically significant under the Decel perturbation. The main effect of perturbation condition was significant for both intervals ([i]-[u]1 interval: F2,42=9.08, p<0.001; [i]-[j]1 interval: F2,42=13.7, p<0.0001); the changes of both intervals under the Decel perturbation from the noPert baseline were statistically significant (). On the individual-subject level, 16 of the 22 subjects showed timing-correction trends consistent with the group average.
Articulatory adjustments under the temporal (Accel and Decel) perturbations
These temporal adjustments were qualitatively different from the spatial compensation observed in Experiment 1. The timing adjustments in this experiment were in the same direction as the temporal perturbations in the auditory feedback, whereas the spatial corrections in Experiment 1 opposed the feedback perturbation. Across the 22 subjects in Experiment 2, the ratio between the change in the [i]-[u]1 interval in the subjects’ production under the Decel perturbation and the perturbation of that interval in the auditory feedback was 12.6±4.8% (mean ± 1 standard error of the mean). The change in the [i]-[j]1 produced interval amounted to 26.1±6.6% of the perturbation of the [i]-[u]1 interval in the auditory feedback. These ratios of temporal adjustments were somewhat greater than the ratios of compensation under spatial perturbation observed in Experiment 1 and in previous studies that concentrated on quasi-static articulatory gestures (Purcell and Munhall, 2006; Tourville et al., 2008).
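The adjustment ratios above are reported as mean ± 1 standard error of the mean across subjects. The sketch below shows one way such a summary could be computed; it assumes that the ratio is formed per subject (produced interval change divided by the interval perturbation in the feedback) and then averaged, which the text does not state explicitly. The function names and all numerical values are hypothetical.

```python
import math
import statistics


def adjustment_ratios(produced_change_ms, feedback_perturbation_ms):
    """Per-subject ratio (in percent) of produced timing change to feedback perturbation."""
    return [100.0 * produced / perturbed
            for produced, perturbed in zip(produced_change_ms, feedback_perturbation_ms)]


def mean_and_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical data for three subjects: change in the produced interval under the
# perturbation relative to baseline, and the perturbation of that interval in the feedback.
produced = [2.0, 5.0, 9.0]        # ms
perturbed = [20.0, 25.0, 30.0]    # ms
ratios = adjustment_ratios(produced, perturbed)   # 10%, 20%, 30%
ratio_mean, ratio_sem = mean_and_sem(ratios)
```

A positive ratio under this convention means the produced interval changed in the same direction as the perturbation in the feedback, which is the pattern described for the Decel condition.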
Changes in articulatory timing beyond the vicinity of the focus interval
In addition to the effects on the [i]-[u]1 and [i]-[j]1 intervals, which were relatively close in time to the perturbation, the Decel perturbation also caused timing alterations in later parts of the utterance. As shows, the timing of the six major F2 landmarks (including the minima of [u]1, [u]2 and [u]3, and the maxima of [j]1, [j]2 and [j]3) all showed significant lengthening under the Decel perturbation. These results indicate that although the manipulation of auditory feedback was applied locally to an early part of the sentence, the Decel perturbation had global effects on syllable timing within this utterance. This timing change beyond the perturbed section of the utterance was a consequence of the delay in the earlier syllables and a lack of subsequent efforts by the speech motor system to “catch up”, the implication of which will be discussed below. By contrast, the Accel perturbation caused no significant change in any of the six time intervals.
After the completion of the session, subjects were asked whether they were aware of any distortion of the auditory feedback. Six of the 22 subjects (27%, compared to 0% in Experiment 1) reported becoming aware of the temporal distortions during the experiment. The words they used to describe their subjective perceptions of the perturbations included “echo”, “out of sync” and “garbled”. However, there was no evidence that these six subjects showed timing adjustment responses that were different from those of the other subjects.