It is known that the speech of people who stutter improves when the speaker’s own vocalization is changed while the participant is speaking. One explanation of these effects is the disruptive rhythm hypothesis (DRH). DRH maintains that the manipulated sound only needs to disturb timing to affect speech control. The experiment investigated whether speech that was gated on and off (interrupted) affected the speech control of speakers who stutter. Eight children who stutter read a passage when they heard their voice normally and when the speech was gated. Fluency was enhanced (fewer errors were made and time to read a set passage was reduced) when speech was interrupted in this way. The results support the DRH.
There are many ways of temporarily inducing speakers who stutter to speak fluently. Methods include regulating breathing , speaking in time with a metronome [2, 3, 4], presentation of visual [5, 6, 7] or tactile  stimuli, and alterations to the physical characteristics of a speaker’s own vocal output before the sound enters the ear (commonly known as altered auditory feedback, AAF ). The effects these alterations have in general are described next. There have been many accounts of why AAF improves speech control in speakers who stutter and fluent speakers and some of these are then described. A prediction drawn from one of the accounts is identified, tested and confirmed. The discussion considers how this account could apply to other ways of inducing speakers who stutter to speak fluently.
Auditory feedback has been altered in fluent speakers in three principal ways: a) timing, b) spectral properties, and c) loudness level. Lee , for instance, examined delayed auditory feedback (DAF) where a brief period of time is introduced between when a speaker says something and when it is heard. He reported that DAF caused fluent speakers to take a long time to produce a given message and make several types of speech error . Spectral content can be changed in different ways. Currently, the most popular method is to shift the speech spectrum up or down before it is replayed to the speaker . This technique, known as frequency shifted feedback (FSF), is one form of alteration to spectral content. Fluent speakers appear to make some compensation for spectral alterations. For instance, Burnett, Senner and Larson  reported that fluent speakers compensated partially by changing their voice pitch when auditory feedback was shifted in pitch. Level can be changed by amplifying or attenuating the sound of the speaker’s voice. Fluent speakers reduce voice level when the voice is amplified , which is sometimes called the Fletcher effect . Fluent speakers increase voice level when the voice is attenuated, which is referred to as the Lombard sign . As with FSF, there appears to be partial compensation for the changes the experimenter made .
In contrast to the effects seen with fluent speakers, each of the ways of manipulating speech output considered above has been reported to improve the fluency of speakers who stutter. DAF was the first alteration reported to improve the speech control of speakers who stutter and has been used as a way of inducing fluency in these speakers in treatment programs . Early work investigated delays of 50 ms and upwards because the equipment available could not produce shorter delays . However, delays above 50 ms have the undesirable side effect that they make speech output sound drawled. Recent studies that have used electronic devices have reported significant improvements using short DAF-delays without any readily apparent slowing of speech [15, 16].
The first study that reported that FSF improved the speech control of speakers who stutter was Howell et al. . Subsequent work led to a commercially-available device for treating stuttering that delivers this form of alteration (as well as short delay DAF) .
Several types of masking noise have been employed with speakers who stutter, ranging from continuous aperiodic noise  through to the Edinburgh masker, which produces a buzz that occurs just when the speaker vocalizes . All forms of masking have been reported to improve the fluency of speakers who stutter. The Edinburgh masker can be considered as a form of FSF (the buzz has a different spectral content to the voice). One undesirable side effect of masking is that it produces a Lombard sign .
The majority of theories of speech control assume that speech information is retrieved from the AAF signals and that changes to speech control, in fluent speakers and speakers who stutter alike, result when speakers use this information to control speech. Early explanations supposed that the brain employs feedback to control the voice. The essential feature of feedback theories for speech is that the current speech output is sent back to a sensing device that controls future output . The information that arises at this sensing device is used to correct an activity when it exceeds predetermined limits. In the case of DAF procedures, the sound of a speaker’s voice is delayed before it reaches the sensing device, so the segment of speech that is heard at a particular time is different from the segment that the speaker intended to produce at that time. A feedback monitoring explanation maintains that this discrepancy is detected via the ears and that the corrections a fluent speaker then makes introduce, rather than remove, errors. The improved speech control in speakers who stutter might arise because aberrant feedback is corrected.
If feedback interpretations are correct for fluent speech, then the delays at which errors are observed indicate which segments are involved in speech control. The notion behind this is that a delay equal to the length of the unit used for output results in the speaker getting feedback about the preceding segment while he or she is producing the next segment. Using this idea, Black  argued that, since a delay of 200 ms is most disruptive to speech control and this corresponds roughly to the length of a syllable, the unit used by speakers to monitor feedback is the syllable. Feedback theories were shown to be unsatisfactory, basically because voice output takes time to process, which would slow speech control (the speaker has to receive information before it can be established whether it was produced correctly, and speakers do not have such processing pauses after each item has been produced) .
In the light of such problems, other accounts have been proposed. 1) Some authors have argued for an auditory feedback processing mechanism that operates at the prosodic level [24, 25, 26]. Prosodic processes extend over long time periods. Thus, the problem of obtaining auditory feedback early enough is less severe if prosodic units are used for feedback control than it is if syllables are taken to be the unit. 2) Borden  argued that auditory feedback is not used all the time, but only in circumscribed situations. These include when language is being acquired (either developmentally or as a second language in adulthood), and when the speaker’s voice is altered. In all these cases speech rate is slow, which could be because feedback is being monitored. 3) Some authors adopted feedforward, instead of feedback, models . These models maintain that movement errors are continuously computed and used (when they arise) as correction signals. They get round the problem of feedback being slow by doing the work in advance of the movement. Such a model has been applied by Guenther  to one of the situations Borden  regarded as reliant on auditory feedback (developmental speech acquisition).
All these views share the property that all or part of speech is used at some point during the control process (the mirror neuron view considered in the discussion also subscribes to this position). The disruptive rhythm hypothesis (DRH)  does not make the assumption that speakers need to fully analyze the contents of their speech output to control the constituent speech sounds in on-going productions. That is, speakers do not have to determine that they have produced the phones /b/, then /ae/ and then /t/ correctly, to ascertain that they have produced the word ‘bat’ properly, before they can go on to the next sound they wish to produce. There are several reasons for maintaining that speech content is not required, including: 1) Fluent speakers do not require auditory feedback to control their voice, as speakers with complete hearing loss can produce fluent speech ; 2) The information about vocalization in auditory output is degraded by the bone-conducted sound that occurs concurrently and which contains little information about speech segments, as shown by Howell and Powell . (See also [31, 32] for further discussion of these and other problems of feedback control.) DRH maintains that altering voice output affects speech control because it creates a secondary rhythmic signal. This signal disrupts timing control. This contrasts with the theories considered above, which maintain that speakers continue to recover speech content for use in control from transformed signals. Some of the support for this account as it applies to fluent speakers is now given.
From a rhythmic perspective, DAF involves speaking one utterance while hearing another that is out of synchrony with it (in contrast with speaking in normal listening conditions, where the sound that is heard has a rhythm in synchrony with speech). Howell et al.  considered two situations involving voice control to suggest that synchronous activities are easy to perform and asynchronous ones are difficult. The first was canon singing, which is easy (as shown by the fact that it is one of the first forms of song that children are taught). The second was a form of medieval song called hoquetus. This involves each singer producing a note at the offset of another singer’s note, and is difficult to master. The observation about canon singing points to the fact that it is easy to produce synchronous activities whether or not those activities derive from the speaker’s own speech. The case of hoquetus shows that other on-going rhythms that are out of synchrony with one’s own singing make control difficult. In terms of rhythmic relationships, DAF is analogous to hoquetus, so disruption to rhythm could also apply to the effects of DAF on speech control. The hoquetus example could also account for why the disturbance under DAF varies with delay. The highly disruptive hoquetus form of singing occurs when one singer’s note finishes as the next singer’s note commences. This would correspond to the DAF situation in which speech is delayed by the length of a syllable (to which the note is equivalent). A delay equal to the length of a syllable is maximally disruptive in DAF . Howell et al.  suggested that this delay is most disruptive because of the rhythmic relationship between what is heard and what is spoken, rather than because feedback about the wrong syllable is sent when this delay is used (as in standard feedback accounts ).
There have been several experimental studies that show that the rhythmic properties of the second sound determine the amount of disruption under alteration conditions. For instance, Howell and Archer  substituted a non-speech noise at the point where the delayed sound occurred under DAF. This non-speech sound cannot be used to determine that the correct phones have been produced, as required in a feedback account. Nevertheless, they reported that this non-speech sound produced equivalent disruption to DAF of a speech sound and argued that this is because the two conditions cause equivalent rhythmic disruption.
In a second example that showed that DAF may have its effects because of the disruption it causes to rhythm, Howell et al.  argued that interrupting speech (by gating it on and off) produced feedback of sound that is similar in some respects to what they considered to occur under DAF (disruption to rhythm, without any part of speech being delayed). They argued that the consequences of delaying a sound are to displace speech so that there are occasions when a person will not receive feedback when speaking; at other times a person will receive feedback when not speaking. Switching speech on and off would produce the first of these consequences without having the speech itself delayed. Interruption of speech has been extensively studied in speech perception [35, 36]. There is some indication in this literature that interrupting the speech of a speaker causes speech control to suffer in fluent speakers in a similar way to delaying it: “An incidental ... observation concerning the effect of interrupting sidetone on a talker’s normal rate of speaking. If slowly interrupted speech is fed back at a high intensity, there is a strong tendency to slow down. At 1 interruption per second, the talker tries to drawl out his words until each sound is heard at least once. At somewhat higher rates of interruption, he tends to synchronize his vowels with multiples of the frequency of interruption.” [35, p.169]. Miller and Licklider’s description would predict the most slowing when speech is interrupted once per second (i.e. 500 ms on and 500 ms off). Howell et al.  showed that interrupting the speech of fluent speakers produced similar timing disruption to DAF.
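The contrast between delaying and gating described above can be made concrete with a toy timeline. The sketch below is illustrative only: the 200 ms syllable and pause lengths, the 200 ms delay, the 2 Hz gate with equal on and off phases, and all function names are assumptions chosen for the example, not values or procedures taken from the studies cited. It counts, in 1 ms steps, how long a speaker vocalizes without hearing anything and how long sound arrives while the speaker is silent.

```python
def delayed(speaking, delay_ms):
    """Feedback under a delay: the speaker's own on/off pattern shifted later."""
    if delay_ms == 0:
        return list(speaking)
    return [False] * delay_ms + speaking[:-delay_ms]


def gated(speaking, period_ms, duty=0.5):
    """Feedback under gating: heard only while speaking AND while the gate is open."""
    return [s and (t % period_ms) < duty * period_ms
            for t, s in enumerate(speaking)]


def mismatch(speaking, heard):
    """Return (ms speaking with no feedback, ms of feedback while not speaking)."""
    unheard = sum(1 for s, h in zip(speaking, heard) if s and not h)
    extra = sum(1 for s, h in zip(speaking, heard) if h and not s)
    return unheard, extra


# Hypothetical timeline in 1 ms steps: 200 ms of voicing, 200 ms of pause, three times.
speaking = ([True] * 200 + [False] * 200) * 3

daf = mismatch(speaking, delayed(speaking, 200))   # delay of one "syllable"
gate = mismatch(speaking, gated(speaking, 500))    # 2 Hz gate, 250 ms on / 250 ms off

print(daf)   # (600, 600)
print(gate)  # (300, 0)
```

In this toy case the delay produces both consequences (speech without feedback, and feedback without speech), whereas gating produces only the first, which is the property the argument relies on.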
As argued at the outset, manipulations of auditory feedback which adversely affect fluent speakers improve the fluency of speakers who stutter. The gating manipulation remains to be investigated in people who stutter. In addition, the work of Howell et al.  suggests that, as it has some similarities with DAF, it would affect the fluency of these speakers in the same way as DAF (i.e. improve their fluency). This prediction is tested in the experiment. If the prediction is upheld, it would document another situation where fluency is enhanced in speakers who stutter when feedback is altered, and, insofar as the signal involves timing disruption alone, would provide evidence in favor of the DRH.
Eight children who stutter took part, 5 male and 3 female, with an age range of 9 years 5 months to 15 years 6 months, and a mean age of 13 years 3 months. All the children attended mainstream schools and had no special educational needs, and none needed to wear glasses. Standard audiograms showed that hearing was within normal limits. All 3 females and 4 of the 5 males had attended a one-week intensive therapy course that delivered Lidcombe therapy (the other male had attended a two-week intensive course that delivered the same treatment). These episodes of treatment were given a minimum of two years before participation in the experiment. Stuttering rate, obtained as part of routine screening of these speakers when speaking in normal listening conditions and assessed using SSI-3 , was between 3% and 19% of syllables.
Speech output was transduced by an AOI condenser microphone type ECM 1005. The output was led to one input of a precision linear multiplier (Burr-Brown, 4213 PM). The other input was a square-wave gating signal which switched speech on and off at 2 Hz (Howell et al.  reported that this interruption frequency produced most effect in fluent speakers). The altered sound was replayed over Sennheiser HD250 linear 2 headphones. Level at the headphones was adjusted so that the level of the uninterrupted speech was the same as that at the microphone (zero gain). To do this, the speaker phonated the vowel /ae/ continuously with and without headphones on and adjusted the level in the headphones until the voice sounded as loud with the headphones on as it did in normal listening.
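The gating described here amounts to multiplying the speech signal by a square wave. A minimal digital sketch of that operation follows; it is not the analogue circuit used in the experiment. The 50% duty cycle, the 8 kHz sample rate, the 200 Hz tone standing in for a voice, and the function names are all assumptions made for illustration.

```python
import math


def square_gate(n_samples, sample_rate, gate_hz=2.0, duty=0.5):
    """Return a 0/1 gating signal: 1.0 while the gate is open, 0.0 while closed."""
    period = sample_rate / gate_hz  # samples per on/off cycle
    return [1.0 if (i % period) < duty * period else 0.0
            for i in range(n_samples)]


def gate_speech(samples, sample_rate, gate_hz=2.0, duty=0.5):
    """Multiply the signal by the gate, as a linear multiplier would."""
    gate = square_gate(len(samples), sample_rate, gate_hz, duty)
    return [s * g for s, g in zip(samples, gate)]


# One second of a 200 Hz tone as a stand-in for voiced speech (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(sample_rate)]
gated = gate_speech(tone, sample_rate)
```

With these assumed settings, each 500 ms cycle passes the signal unchanged for 250 ms and silences it for the next 250 ms.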
The 129-word, 167-syllable, phonetically-balanced ‘North Wind and the Sun’  passage was used. Speakers first read this from a printed sheet they held at approximately 20 cm distance, under normal listening conditions, then in the interruption condition and finally again under normal listening conditions. The two readings under normal listening were included to check for any adaptation over repetitions of the text. Time to read the complete passage and speech dysfluencies were obtained. The types of dysfluency that occurred were phrase, word and part-word repetition, prolongations and word breaks. These are described in Yairi and Ambrose . No phrase revisions or abandonments occurred, probably because the material was read. Four samples were selected at random and reanalyzed by an independent judge, and agreement across the judges was assessed. Agreement was calculated as the number of dysfluencies on which the first and second judges agreed, divided by the total number of dysfluencies indicated by the first judge, expressed as a percentage. Agreement on overall dysfluency rate was 92%, which represents an excellent level of agreement . The judges' duration measurements were always within 0.5 s of each other.
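The agreement measure can be written out as a short calculation. The sketch below is a hedged illustration: the passage does not say how a match between judges was decided, so treating a dysfluency as agreed when both judges mark the same syllable position with the same type is an assumption, as are the function name and the made-up marks.

```python
def percent_agreement(judge1, judge2):
    """Agreed dysfluencies as a percentage of those marked by the first judge.

    Each judge's marks are a set of (syllable_position, dysfluency_type) pairs.
    """
    if not judge1:
        raise ValueError("first judge marked no dysfluencies")
    agreed = len(judge1 & judge2)  # marks present in both sets
    return 100.0 * agreed / len(judge1)


# Hypothetical marks, not data from the study.
judge1 = {(12, "part-word repetition"), (40, "prolongation"),
          (77, "word repetition"), (101, "word break")}
judge2 = {(12, "part-word repetition"), (40, "prolongation"),
          (77, "word repetition"), (130, "prolongation")}

print(percent_agreement(judge1, judge2))  # 75.0
```

Because the denominator is the first judge's count, the measure is not symmetric: extra marks made only by the second judge do not lower it.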
The top part of Table 1 gives the results for the eight individual participants in the initial reading of the North Wind passage under normal listening (NW1), then in the reading of the passage under conditions where speech was interrupted (NW int), and finally in the second reading under normal listening (NW2). For each of the readings, the overall time to read the text is given (in s), with articulation rate in syllables per second in parentheses (using the number of syllables in the target text), together with the number of dysfluencies produced (Dysfl.). Results for the three females are given first (rows labeled F1-F3) and then for the five males (M1-M5).
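The articulation rates in parentheses follow from the syllable count of the target text and the reading time. As a small worked sketch, with a made-up reading time rather than a value from Table 1 and an assumed function name:

```python
PASSAGE_SYLLABLES = 167  # syllable count of the target text, as stated in the method


def articulation_rate(reading_time_s, syllables=PASSAGE_SYLLABLES):
    """Syllables per second, based on the syllables in the target text."""
    if reading_time_s <= 0:
        raise ValueError("reading time must be positive")
    return syllables / reading_time_s


print(articulation_rate(83.5))  # hypothetical 83.5 s reading -> 2.0
```

Since the count comes from the target text, repeated syllables produced during dysfluencies lengthen the reading time without adding to the numerator, so a more dysfluent reading yields a lower rate.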
It has been reported that speakers who stutter experience fewer problems as they read a passage repeatedly (the adaptation effect). To check for such an effect, overall reading time and dysfluency rate on the North Wind and the Sun passage  between the first and last readings (both spoken under normal listening conditions, NW1 and NW2 respectively in Table 1) were obtained and examined by related t tests. There were no significant differences for time or errors between NW1 and NW2 (no adaptation effects).
The interruption condition was compared by related t tests with each of the recordings made under normal listening conditions, for both time and dysfluency rate. For the comparison of the reading under the first normal listening condition with the interruption condition (NW1 and NW int), there were significant differences in both time and dysfluency rate (time: t(7) = 4.005, p = .005; dysfluencies: t(7) = 3.789, p = .007). The same was true for the comparison of the reading under the second normal listening condition with the interruption condition (NW2 and NW int; time: t(7) = 2.781, p = .027; dysfluencies: t(7) = 3.705, p = .008).
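For readers unfamiliar with the related (paired) t test, the statistic is the mean of the within-participant differences divided by its standard error, with degrees of freedom one less than the number of participants (hence t(7) for eight children). The sketch below computes only the statistic and the degrees of freedom; the dysfluency counts are invented for illustration and are not the study's data, and the function name is an assumption.

```python
import math
import statistics


def related_t(first, second):
    """Paired t statistic and degrees of freedom for two matched lists of scores.

    Raises ZeroDivisionError if every difference is identical (zero variance).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))  # sample SD / sqrt(n)
    return mean_diff / std_error, len(diffs) - 1


# Hypothetical dysfluency counts for eight speakers (not the study's data).
normal = [12, 13, 11, 12, 12, 13, 11, 12]
interrupted = [10, 10, 10, 10, 10, 10, 10, 10]

t_value, df = related_t(normal, interrupted)
print(round(t_value, 3), df)  # 7.483 7
```

The p value would then be read from the t distribution with the returned degrees of freedom, using a statistics package or table.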
The experiment showed that interrupted speech improved fluency in speakers who stutter in terms of both time to read and dysfluency rate. This effect was predicted on the basis of similarities between interruption and DAF and because DAF is known to enhance the fluency of speakers who stutter. Interrupting and delaying speech both have the effect that there are occasions when a person will not receive feedback whilst speaking, and this appears sufficient to enhance fluency in speakers who stutter.
Several of the manipulations outlined in the introduction which temporarily induce speakers who stutter to speak fluently either involve a temporal signal with no speech content (metronome click, [2, 3, 4]; and visual and tactile rhythmic stimuli [6, 7]) or affect speech timing directly without using auditory feedback (such as regulated breathing ). Timing influences from different sensory inputs and pure motor effects could both operate via a mechanism that deals with sensory-motor integration across modalities. Howell  argued that one possible structure responsible for this is the cerebellum. One of the roles of the cerebellum is to regulate timing and, as seen, controlling timing affects fluency in speakers who stutter. Thus, it is possible that a multimodal sensory-motor integration mechanism in the cerebellum is responsible for voluntary timing changes and for the influences of a variety of rhythmic events on the fluency of speakers who stutter.
Other authors (e.g. Kalinowski & Saltuklaroglu ) argue that mirror neurons located in Broca’s area which work with visual and auditory speech inputs might mediate the fluency-enhancing effects for this range of manipulations. This account explains the effects of AAF as a ‘second signal’ that activates the mirror neurons. A limitation is that voluntary timing changes would not result directly from the operation of the neuronal systems. Kalinowski and Dayalu  deal with this by arguing that voluntary timing changes are achieved by different mechanisms from those that involve manipulations that produce fluent speech (like FSF).
To discriminate between the DRH and mirror neuron views, future work will need to determine what timing mechanisms are involved under various fluency-enhancing manipulations. Alternative perspectives, such as the possible role of feedforward mechanisms, also need to be evaluated.
This research was supported by grant 072639 from the Wellcome Trust.