This study investigated the acoustic basis of within-speaker, across-utterance variation in sentence intelligibility for 12 speakers with dysarthria secondary to Parkinson’s disease (PD). Acoustic measures were also obtained for 12 healthy controls for comparison to speakers with PD. Speakers read sentences using their typical speech style. Acoustic measures of speech rate, articulatory rate, fundamental frequency, sound pressure level and F2 interquartile range (F2 IQR) were obtained. A group of listeners judged sentence intelligibility using a computerized visual-analog scale. Relationships between judgments of intelligibility and acoustic measures were determined for individual speakers with PD. Relationships among acoustic measures were also quantified. Although considerable variability was noted, articulatory rate, fundamental frequency and F2 IQR were most frequently associated with within-speaker variation in sentence intelligibility. Results suggest that diversity among speakers with PD should be considered when interpreting results from group analyses.
Intelligibility has been defined as the degree to which an individual acoustic signal is understood by a listener (Weismer, 2008). Various metrics are used to assess sentence-level intelligibility in dysarthria including the Assessment of Intelligibility of Dysarthric Speech (AIDS; Yorkston & Beukelman, 1981) or the computerized version of this published test, the Speech Intelligibility Test (SIT; Yorkston, Beukelman, & Tice, 1996). These tests as well as other intelligibility metrics yield a numerical score. However, while numerical scores provide an estimate of the overall speech mechanism involvement in dysarthria (Tjaden & Wilding, 2011a; Weismer, Jeng, Laures, Kent, & Kent, 2001; Weismer, Yunusova, & Bunton, 2012), numerical scores alone do not provide insight concerning production variables contributing to intelligibility. As discussed in the following section, many studies therefore have sought to determine the relationship of acoustic measures to intelligibility.
Numerous acoustic measures have been proposed to contribute to variations in intelligibility in dysarthria including supra-segmental and segmental acoustic measures. The following review focuses on dysarthria studies investigating acoustic measures of interest to this study. Relevant studies of normal speech are also considered. A more comprehensive review of possible acoustic correlates of intelligibility, including consonant spectral measures and rhythm metrics that were not included in this study, may be found in several sources (Kim, Hasegawa-Johnson, & Perlman, 2011a; Liss, Utianski, & Lansford, 2013; Weismer et al., 2012; Weismer, 2008; Yunusova, Weismer, Kent, & Rusche, 2005).
Speech rate has been defined as the number of output units (i.e. syllables) per unit time including pauses, while articulation rate is the number of syllables per unit time excluding pauses (Tsao, Weismer, & Iqbal, 2006). The relationship between measures of global speech timing and intelligibility varies across studies. Some studies report improved speech intelligibility when speech rate is slowed (Hammen, Yorkston, & Minifie, 1994; Yorkston, Beukelman, Strand, & Bell, 1999; Yorkston, Hammen, Beukelman, & Traynor, 1990), while other studies report decreased or no improvement in intelligibility when speech or articulation rate is slowed (McRae, Tjaden, & Schoonings, 2002; Van Nuffelen, De Bodt, Vanderwegen, Van de Heyning, & Wuyts, 2010; Van Nuffelen, De Bodt, Wuyts, & Van de Heyning, 2009). Thus, the relationship between global speech timing and intelligibility appears to be complex.
Prosodic variables, such as sentence-level measures of F0, have also been related to intelligibility. Bunton, Kent, Kent and Duffy (2001) examined sentence-level F0 for speakers with dysarthria. Results suggested that when F0 range was reduced using resynthesis techniques, decrements in intelligibility were observed. Studies investigating healthy talkers further suggest a relationship between increased sentence-level F0 range and intelligibility. For example, healthy speakers judged to be more intelligible have been shown to have a greater sentence-level F0 range (Bradlow, Torretta, & Pisoni, 1996; but see Miller, Schlauch, & Watson, 2010). Laures and Weismer (1999) as well as Watson and Schlauch (2008) reported that when natural F0 variations (i.e. F0 contour) were flattened using speech resynthesis, sentence intelligibility was reduced. Studies investigating clear speech and speech produced with increased vocal intensity also suggest that an increased F0 range is associated with higher intelligibility (Picheny, Durlach, & Braida, 1986; Sapir, Ramig, & Fox, 2011; Tjaden, Kain, & Lam, 2014a).
In addition to overall sentence-level measures of F0, F0 measures of adjacent syllables reflecting reduced F0 variation have also been linked to reductions in intelligibility. Liss, Spitzer, Caviness, Adler, and Edwards (1998, 2000) investigated the degree to which acoustic manifestations (i.e. phrase duration, F0, amplitude variation and formant frequencies) of reduced syllabic contrast correlated with lexical boundary errors in dysarthria. Results were interpreted to suggest that reduced syllabic contrasts contribute to reduced intelligibility in PD by hindering a listener’s ability to locate lexical boundaries within an utterance.
Studies have also evaluated the link between SPL and speech intelligibility (Neel, 2009; Ramig, Countryman, Thompson, & Horii, 1995; Tjaden & Wilding, 2004). Ramig, Bonitati, Lemke, and Horii (1994) and Cannito et al. (2012) reported the effects of Lee Silverman Voice Treatment (i.e. LSVT™) on intelligibility. Results from Ramig et al. (1994) and Cannito et al. (2012) suggested that following treatment, mean SPL increased and intelligibility also improved. The basis for the improved intelligibility is unclear, however. Acoustic and kinematic studies suggest that intelligibility improvements associated with increased vocal intensity could also be related to changes in source characteristics such as an increase in F0 or F0 range (Dromey & Ramig, 1998; Picheny et al., 1986; Tjaden et al., 2014a) or improved segmental articulation (Dromey & Ramig, 1998; Dromey, Ramig, & Johnson, 1995) in the form of an expanded vowel space area (Tjaden & Wilding, 2004) and increased spectral distinctiveness of stop consonants (Tjaden & Turner, 1997). Furthermore, the positive influence of a greater-than-normal vocal intensity on intelligibility may be related to an improved signal-to-noise ratio or audibility, although improved audibility alone does not fully account for improvements in intelligibility associated with an increased intensity (Neel, 2009; Kim & Kuo, 2012). Thus, while an increased SPL may be associated with improved intelligibility, the precise explanation for this relationship is a topic of ongoing study.
SPL modulation is another variable that has been shown to be associated with intelligibility. Bunton, Kent, Kent, and Rosenbek (2000) investigated intensity range of individuals with Amyotrophic Lateral Sclerosis (ALS), Cerebellar Disease (CD) and healthy controls. Significant differences in intensity range were not reported between the control and clinical groups. However, a restricted intensity range tended to be associated with reduced speech intelligibility for the ALS group with moderate intelligibility. Studies of clear speech for normal talkers as well as dysarthria also suggest the importance of energy modulation to intelligibility (Krause & Braida, 2009; Tjaden et al., 2014a). In contrast, other dysarthria studies have found no relationship between intensity range and intelligibility (Kim, Kent, & Weismer, 2011b).
Various measures of vowel segmental production have been shown to be related to speech intelligibility including vowel space area and F2 slope. Vowel space area is generated by plotting the frequencies of the first two formants (F1, F2) (Turner, Tjaden, & Weismer, 1995). In this manner, vowel space area provides an overall estimate of a speaker’s vowel articulatory-acoustic working space, with greater vowel space areas suggesting a greater difference in articulatory position for vowels (Weismer et al., 2012). Similarly, reduced spectral contrasts among vowels as partially indexed by vowel space area have been reported to be associated with reduced intelligibility (Kim et al., 2011a; McRae et al., 2002; Weismer et al., 2001; but see also Weismer, Laures, Jeng, Kent, & Kent, 2000).
Studies have also shown a relationship between F2 slope and intelligibility. F2 slope corresponds to the rate of vocal tract shape change for vowels, with shallower slopes associated with slower tongue movement speeds (Yunusova et al., 2012). Recently, Kim et al. (2011b) investigated the relationship between various acoustic measures, including F2 slope and speech intelligibility in speakers varying in dysarthria severity. Results indicated that shallower F2 slopes were associated with poorer speech intelligibility (see also Kim, Weismer, Kent, & Duffy, 2009; Tjaden, Richards, Kuo, Wilding, & Sussman, 2013; Yunusova et al., 2012).
Many of the dysarthria studies reviewed in the preceding section employed a group design, wherein data were pooled across speakers prior to quantifying the strength of the association between a given acoustic measure and intelligibility. Because speakers in these studies typically spanned a range of dysarthria severity, the association between acoustic measures and intelligibility could reflect the influence of a third, mediating variable (see discussion in Kim et al., 2011b; Weismer et al., 2001; Yunusova et al., 2005). Yunusova et al. (2005), however, investigated the acoustic basis of within-speaker variation in intelligibility for speakers with dysarthria secondary to PD and ALS. A novel acoustic measure, F2 interquartile range (F2 IQR), was generated by extracting second formant time histories for all voiced segments within a breath group and calculating the difference between the first and third quartiles. Results suggest that F2 IQR partially explained within-speaker variation in intelligibility for some speakers with PD. The results illustrate the feasibility of the within-speaker approach to identify acoustic variables that explain variation in intelligibility in dysarthria.
Intelligibility is an important construct in dysarthria. Although studies using a group design have made progress toward identifying acoustic variables potentially contributing to intelligibility in dysarthria, our understanding of the acoustic basis of intelligibility in dysarthria is incomplete. Yunusova et al.’s (2005) study demonstrates how a within-speaker, across-utterance design avoids the third-variable effect and thus may help to determine acoustic variables integral to intelligibility. However, additional studies are needed to determine the utility of the within-speaker approach. Thus, this study used a single subject design to investigate potential acoustic variables associated with within-speaker variations in intelligibility in dysarthria secondary to PD. Healthy controls were included to provide a baseline against which to compare acoustic measures for speakers with PD and to aid in interpreting the results. As in Yunusova et al.’s (2005) study, the relationship between intelligibility and acoustic variables was not of interest for healthy speakers. Previous research with healthy speakers suggests a limited range of within-speaker, across-utterance intelligibility variation (Monsen, 1983). Further, the relationship between acoustic measures and intelligibility was also not of interest for healthy speakers due to the prospect of ceiling effects.
Although many potential correlates of intelligibility could be investigated, articulatory rate and measures of F0, SPL and F2 IQR were selected for study because previous research suggests that these measures at least moderately correlate with intelligibility. Care was taken to include a diverse array of segmental and suprasegmental acoustic measures. Further, F2 IQR was used to index vowel segmental production because measures such as vowel space area and F2 slope may vary as a function of overall dysarthria severity rather than variation in intelligibility (Weismer et al., 2001). In addition, F2 IQR also does not require utterances to contain specific phonetic content. For each speaker with PD, the main research questions were whether greater modulations in F0 and SPL as well as relatively greater F2 IQR were associated with greater scaled sentence intelligibility.
Twelve speakers (7 men, 5 women) with idiopathic PD were studied as well as 12 age- and sex-matched healthy speakers (7 men, 5 women). Healthy speakers ranged in age from 50 to 77 years (Mean = 63 years; SD = 10 years). Selected perceptual and acoustic characteristics have been previously reported for these speakers, who are part of a larger research project (Sussman & Tjaden, 2012; Tjaden, Lam, & Wilding, 2013; Tjaden, Sussman, & Wilding, 2014b). Characteristics for the PD and control groups are summarized in Table 1. Although five speakers with PD had a history of dysarthria therapy, control speakers reported no history of speech-language therapy. All speakers were native speakers of standard American English and had achieved at least a high school diploma. In addition, participants did not use a hearing aid, had thresholds of at least 40 dB HL in one ear at 1000 and 2000 Hz (Ventry & Weinstein, 1983), and scored 26 or greater on the Standardized Mini-Mental State Examination (Molloy, 1999). Participants with PD reported no history of other neurological disease or neurosurgical treatment.
Procedures for obtaining SIT scores (Yorkston et al., 1996) as well as scaled speech severity for the Grandfather Passage, an operationally defined perceptual construct that aims to tap into speech naturalness and prosody, were reported in Sussman and Tjaden (2012) as part of a larger project. Mean SIT scores and scaled speech severity for the Grandfather Passage are included in Table 1 to describe the speakers in this study. Mean scaled speech severity for the Grandfather Passage could range from 0 (“not impaired”) to 1.0 (“severely impaired”). Mann–Whitney U tests confirmed a significant difference in SIT scores (p<0.001) and scaled speech severity (p<0.05) for the Control and PD groups. The mildly reduced intelligibility for speakers with PD (i.e. 89%) compared to controls (i.e. 93%) coupled with moderate scale values for speech severity is consistent with mild dysarthria (Yorkston, Beukelman, Strand, & Hakel, 2010). Mean SIT scores and scaled speech severity for the reading passage were also somewhat reduced for the control group. Because the average age of the control group was 63 years, one possible explanation is that these scores represent listeners’ perception of speech characteristics consistent with normal aging adults. In addition, although listeners were blinded to the neurological status of the speakers, they were told that some speakers had a neurological diagnosis, which may have biased listeners to judge speech samples as being more impaired. As noted in Sussman and Tjaden (2012), PD speakers were also anecdotally noted to have reduced segmental precision and a breathy, monotonous voice.
Experimental stimuli from which the acoustic and perceptual measures were obtained were 10 sentences randomly selected per speaker from a pool of 25 syntactically and semantically normal IEEE Harvard Psychoacoustic Sentences (IEEE, 1969). Sentences included both declaratives and imperatives and ranged in length from 7 to 9 words, with five key or content words.
Speakers were audio-recorded in a sound-treated room reading Harvard sentences (IEEE, 1969) in habitual, clear, fast, loud and slow conditions. For the habitual condition, which was of interest to the current study and was always recorded first, speakers were simply instructed to read sentences in their habitual or typical speaking style. The acoustic signal was transduced using an AKG C410 head mounted microphone positioned at a 45–50° angle and 10 cm from the center of the lips. The signal was pre-amplified, low-pass filtered at 9.8 kHz and digitized at 22 kHz using TF32 (Milenkovic, 2002). A 1000 Hz calibration tone was also recorded for use in calculating SPL (Lam, Tjaden, & Wilding, 2012). All speakers were paid $10 per hour. Speakers with PD were recorded approximately 1 h prior to taking anti-Parkinsonian medications.
Methodological details concerning the perceptual task are reported in other studies (Tjaden et al., 2013, 2014b). Readers are referred to these previous studies for a more detailed treatment of this material.
Fifty inexperienced listeners recruited from the University at Buffalo participated. Listeners were native speakers of standard American English, had at least a high school diploma or equivalent, were between 18 and 30 years of age, and reported no history of speech, language or hearing problems. All listeners passed a hearing screening administered at 20 dB HL bilaterally for octave frequencies ranging from 250 to 8000 Hz (ANSI, 1989). Listeners were paid $10 per hour.
As described in Tjaden et al. (2014a, b), listeners judged intelligibility for Harvard sentences using a continuous vertical 150 mm computerized Visual Analog Scale (VAS) (Cannito et al., 1997; Sussman & Tjaden, 2012), with scale endpoints labeled as “Understand everything” (0 = intelligible) and “Cannot understand anything” (1.0 = unintelligible).
As speakers with PD had mild dysarthria (Table 1), the experimental stimuli (i.e. 10 Harvard sentences per speaker) were mixed with 20-talker babble (Frank & Craig, 1984) to increase task difficulty and prevent ceiling effects. Various studies of dysarthria (Bunton, 2006; Cannito et al., 2012; McAuliffe, Schafer, O’Beirne, & LaPointe, 2009) and normal speech (Smiljanić & Bradlow, 2009; Uchanski, 2005) have also employed background noise when studying intelligibility. Sentences first were equated for peak amplitude using Goldwave, version 5.1 (2007) and then were mixed with babble at a signal-to-noise ratio (SNR) of −3 dB. This level was determined via pilot testing to minimize both ceiling and floor effects and has been used in previous studies of normal speech as well as dysarthria (e.g. Ferguson & Kewley-Port, 2002; Maniwa, Jongman, & Wade, 2008; McAuliffe et al., 2009). Stimuli were presented to individual listeners via Sony MDRV300 headphones in a sound-treated booth at 75 dB HL for peak vowel amplitude of key words in sentences. Listeners judged each sentence without knowledge of speaker identity or whether a speaker had a neurological diagnosis.
Sentences were pooled and divided into 10 sets. Five listeners were assigned to judge each set. Sentence sets contained one sentence produced by each talker in each condition. Thus, a given listener judged a subset of sentences for each talker. Following a brief practice task, listeners judged the intelligibility of each sentence. Using a customized computer program, listeners were instructed to place the indicator of the computer mouse on the continuous 150 mm visual analog scale shown on a monitor. Scale endpoints were labeled “understand everything” and “cannot understand anything”. Following completion of the experiment, the computer program transformed the position of the mouse indicator to scores ranging from 0 to 1.0, with lower values indicating better intelligibility. Judgments were averaged across listeners to provide an average intelligibility judgment for each sentence.
Each listener judged a random selection of 10% of sentences twice for determining intrajudge reliability. Pearson product-moment correlation coefficients for the first and the second presentation of sentences ranged from 0.60 to 0.88 across the 50 listeners, with a mean of 0.71 (SD = 0.07). All correlations were also significant (p<0.001). As for the larger project (Tjaden et al., 2014a,b), a conservative approach was taken wherein listeners with intrajudge coefficients less than r = 0.70 were excluded from further consideration. Remaining analyses therefore reflect intelligibility judgments for 29 listeners (Mean intrajudge Pearson r = 0.76; SD = 0.05; Range 0.70–0.88). Interjudge reliability was assessed using the intraclass correlation coefficient (ICC) following Neel (2009). Average ICCs ranged from 0.63 to 0.91 (Mean = 0.83; SD = 0.09) and were statistically significant (p<0.001). Reliability metrics compare favorably with those reported in previous dysarthria studies employing scaling tasks to measure intelligibility (e.g. Bunton et al., 2001; Kim et al., 2011b; Kim & Kuo, 2012; Neel, 2009; Van Nuffelen et al., 2009; Weismer et al., 2001; Yunusova et al., 2005).
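The babble-mixing step can be illustrated with a minimal sketch. It assumes that the signal-to-noise ratio is defined on RMS amplitude over the whole sentence, which the text does not state; the function names `rms` and `mix_at_snr` are hypothetical and do not correspond to the software used in the study.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def mix_at_snr(speech, babble, snr_db):
    """Scale babble so speech-to-babble RMS ratio equals snr_db, then add.

    Assumption: SNR is RMS-based over the full sentence (not stated in the text).
    Returns (mixed_signal, scaled_babble).
    """
    if len(babble) < len(speech):
        raise ValueError("babble must be at least as long as the speech")
    babble = babble[:len(speech)]  # trim babble to the sentence length
    gain = rms(speech) / (rms(babble) * 10 ** (snr_db / 20.0))
    scaled = [gain * b for b in babble]
    mixed = [s + b for s, b in zip(speech, scaled)]
    return mixed, scaled

# Toy example with made-up sample values
speech = [0.5, -0.5, 0.5, -0.5]
babble = [0.1, 0.2, -0.1, -0.2]
mixed, scaled = mix_at_snr(speech, babble, snr_db=-3.0)
achieved = 20.0 * math.log10(rms(speech) / rms(scaled))
```

At −3 dB the scaled babble has a higher RMS level than the speech, which is consistent with the stated aim of increasing task difficulty.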
Acoustic measures were performed by the first author or a trained research assistant using TF32 (Milenkovic, 2002). Each measure is considered in detail in the following sections.
Sentences were segmented into speech runs and pauses. A speech run was defined as a stretch of speech bound by a silent pause of more than 200 ms (Tjaden & Wilding, 2004; Turner & Weismer, 1993). Standard acoustic criteria were used to identify run onsets and offsets (Tjaden & Wilding, 2004). For each speaker and sentence, articulatory rates were calculated. Speech run durations for each sentence were summed to yield a total articulatory time. The number of syllables for each speech run was also obtained. Syllable counts were summed for each sentence to calculate the articulatory rate in syllables per second. Over 90% of sentences did not contain interword pauses. Thus, measures of speech rate were not obtained.
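The articulatory rate calculation described above can be sketched as follows. The function name and the representation of a sentence as a list of (duration, syllable count) pairs are illustrative assumptions, not the format used by TF32.

```python
def articulatory_rate(speech_runs):
    """Articulatory rate in syllables per second for one sentence.

    speech_runs: list of (run_duration_s, syllable_count) tuples, one per
    speech run; pauses of more than 200 ms are assumed to have been excluded
    already when the runs were segmented.
    """
    total_time = sum(duration for duration, _ in speech_runs)
    total_syllables = sum(count for _, count in speech_runs)
    if total_time <= 0:
        raise ValueError("total articulatory time must be positive")
    return total_syllables / total_time

# Hypothetical sentence with two speech runs: 10 syllables over 2.0 s
rate = articulatory_rate([(1.2, 6), (0.8, 4)])
```

Because pause time is excluded from the denominator, a sentence with no interword pauses yields the same value for speech rate and articulatory rate.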
F0 time histories were generated for each sentence. Computer-generated F0 traces were visually checked for errors and errors were corrected manually on a pitch period by pitch period basis using the combined waveform and wideband spectrographic (300–400 Hz) displays. The first full glottal pulse was used to determine the onset and offset of voiced segments. The F0 trace continued until the last discernible glottal pulse indicative of voicing seen in both waveform and wideband spectrographic displays. For sentences that contained areas of possible diplophonia or glottal fry, the lowest frequency was marked for inclusion in the pitch traces and was included in all F0 calculations. These segments were observed in about half (i.e. about 60 sentences) of the total sentences produced by all speakers with PD. Voiced speech segments, which could not be identified by periodic voicing, were omitted from all F0 calculations. Time histories were extracted into Excel for conversion from Hertz to Semitone (de Pijper, 2007). Mean F0, F0 SD (i.e. the standard deviation of data points from mean F0) and F0 Range (i.e. 90th percentile minus 10th percentile; Tjaden & Wilding, 2010) for each sentence were calculated. Two measures, F0 Range and F0 SD, were included to index F0 variation following previous studies (Bunton et al., 2000; Laures & Weismer, 1999). In addition, a slightly curtailed F0 Range (i.e. 90th percentile minus 10th percentile) may be more representative of a person’s ability to consistently modulate F0 because an absolute F0 Range metric is highly dependent on the two extreme values in a data set.
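A minimal sketch of the semitone conversion and the three F0 summary measures follows. The reference frequency (`ref_hz`), the use of the sample standard deviation and the inclusive percentile method are assumptions, as the text does not specify them; function names are hypothetical.

```python
import math
import statistics

def hz_to_semitones(f0_hz, ref_hz=1.0):
    """Convert F0 values in Hz to semitones re: ref_hz (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]

def f0_summary(f0_hz, ref_hz=1.0):
    """Mean F0, F0 SD and F0 Range (90th minus 10th percentile) in semitones."""
    st = hz_to_semitones(f0_hz, ref_hz)
    # Nine decile cut points; index 0 is the 10th and index 8 the 90th percentile.
    deciles = statistics.quantiles(st, n=10, method="inclusive")
    return {
        "mean_st": statistics.fmean(st),
        "sd_st": statistics.stdev(st),  # sample SD (assumption)
        "range_st": deciles[8] - deciles[0],
    }

# Made-up F0 track for one sentence, in Hz
summary = f0_summary([110.0, 120.0, 135.0, 150.0, 180.0, 200.0])
```

Changing the reference frequency shifts every semitone value by the same constant, so F0 SD and F0 Range are unaffected by that choice; only Mean F0 depends on it.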
Mean SPL and SPL SD were used to index vocal intensity and intensity modulations. A Root-Mean-Squared (RMS) voltage trace was generated for all speech runs in a given sentence. RMS trace voltages were exported to an Excel file and converted to dB SPL using the calibration tone for each speaker to calculate Mean SPL. SPL SD was exported to Excel directly from the RMS trace voltage for each sentence.
Segmental integrity for vocalic segments was indexed using F2 IQR (Yunusova et al., 2005). This measure is suggested for use in within-speaker, across-utterance analyses because, unlike the more commonly used measure of vowel space area, F2 IQR does not rely heavily on the phonetic context of an utterance (Yunusova et al., 2005). Linear predictive coding (LPC) formant trajectories for F2 were generated for all vowels, liquids, and glides and manually corrected for errors. Voiced segments were identified using voicing energy from the wideband spectrographic display (Tjaden & Wilding, 2010). Nasalized vowels were included, but nasal consonants (e.g. /m/) were excluded if segmentation could be achieved from the surrounding acoustic signal based on observed changes in spectrographic energy such as from the nasal resonance. F2 time histories were imported into Excel and F2 IQR was calculated by subtracting the 1st quartile from the 3rd quartile over each sentence.
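One common way to convert an RMS voltage to dB SPL with a calibration tone is sketched below. The exact procedure is given in Lam et al. (2012) and may differ; the known level of the calibration tone (`cal_level_db_spl`) and the function name are assumptions introduced for illustration.

```python
import math

def rms_to_db_spl(rms_volts, cal_rms_volts, cal_level_db_spl):
    """Convert an RMS voltage to dB SPL relative to a calibration tone.

    cal_rms_volts: RMS voltage of the recorded 1000 Hz calibration tone.
    cal_level_db_spl: level of that tone in dB SPL (assumed to be known).
    """
    if rms_volts <= 0 or cal_rms_volts <= 0:
        raise ValueError("RMS voltages must be positive")
    return cal_level_db_spl + 20.0 * math.log10(rms_volts / cal_rms_volts)

# Hypothetical values: a frame with ten times the calibration voltage
level = rms_to_db_spl(1.0, 0.1, 80.0)
```

A tenfold increase in RMS voltage corresponds to a 20 dB increase in level, so a frame with the same RMS voltage as the calibration tone is assigned the calibration level itself.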
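The F2 IQR calculation can be sketched as follows, assuming the F2 time history for a sentence is available as a list of values in Hz. The inclusive quartile method is an assumption, as the text does not state which quartile definition was used; the function name is hypothetical.

```python
import statistics

def f2_iqr(f2_hz):
    """F2 interquartile range (3rd quartile minus 1st quartile) for one sentence."""
    # Three quartile cut points: index 0 is Q1, index 2 is Q3.
    quartiles = statistics.quantiles(f2_hz, n=4, method="inclusive")
    return quartiles[2] - quartiles[0]

# Made-up F2 time history (Hz) pooled over all vocalic segments in a sentence
iqr = f2_iqr([1000.0, 1200.0, 1400.0, 1600.0, 1800.0])
```

Because the measure pools F2 values over all vocalic segments in the sentence, it does not require particular vowels to be present, unlike a vowel space area computed from corner vowels.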
Intra-judge and inter-judge reliability of acoustic measures were calculated for approximately 10% of the stimuli (10 sentences × 24 speakers = 240 total sentences). Twenty-four sentences were randomly selected (i.e. one sentence from each speaker) to be re-measured by the first author or a trained research assistant. Average absolute measurement error and standard deviation were used to measure reliability. Pearson product moment correlation coefficients were also obtained (Tjaden, 2003; Tjaden & Wilding, 2004).
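The measurement error statistics can be sketched as below, assuming paired original and repeated measurements of one acoustic variable. The use of the sample standard deviation is an assumption, and the function name is hypothetical.

```python
import statistics

def measurement_error(first, second):
    """Mean and SD of the absolute difference between two sets of measurements."""
    if len(first) != len(second):
        raise ValueError("measurement lists must have the same length")
    abs_errors = [abs(a - b) for a, b in zip(first, second)]
    return statistics.fmean(abs_errors), statistics.stdev(abs_errors)

# Made-up articulatory rates (syll/s) measured twice for three sentences
mean_err, sd_err = measurement_error([5.0, 4.0, 6.0], [5.5, 4.0, 5.5])
```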
For intra-judge reliability, the absolute average measurement error and standard deviations were 0.03 syll/s (0.05 syll/s) for articulatory rate. The absolute average measurement error and standard deviations in semitones were 0.25 (0.26), 0.26 (0.24) and 0.62 (1.15) for mean F0, F0 SD and F0 Range, respectively. The absolute average measurement error and standard deviation for F2 IQR were 0.02 kHz (0.03 kHz). Finally, the absolute average measurement error and standard deviation for mean SPL and SPL SD were 0.11 dB (0.23 dB) and 0.10 dB (0.20 dB), respectively. For all acoustic measures, the Pearson correlations were greater than 0.96.
For inter-judge reliability, the absolute average measurement error and standard deviations were 0.05 syll/s (0.10 syll/s) for articulatory rate. The absolute average measurement error and standard deviations in semitones were 0.72 (1.24), 0.47 (0.42) and 0.87 (1.51) for mean F0, F0 SD and F0 Range, respectively. The absolute average measurement error and standard deviation for F2 IQR were 0.03 kHz (0.04 kHz). Finally, the absolute average measurement error and standard deviation for mean SPL and SPL SD were 0.11 dB (0.18 dB) and 0.16 dB (0.36 dB), respectively. For all acoustic measures, the Pearson correlations were all greater than 0.91.
Levene’s test for homogeneity of variance was used to assess the equality of variance among samples for each acoustic measure. For many of the acoustic variables, this assumption was not met. Therefore, non-parametric Mann–Whitney U tests were used to compare acoustic measures for the two groups. Spearman rank order correlations were used to evaluate the strength of association between the various acoustic variables and scaled sentence intelligibility within each PD speaker. A non-parametric model was used due to the small number of observations per speaker (n = 10). As suggested by Cohen (1988), relationships between variables exhibiting a correlation of ±0.30 or stronger have a moderately strong association. Thus, moderate correlations among acoustic measures and intelligibility are considered meaningful in this study, regardless of whether correlations met standard criteria for statistical significance (p<0.05). Similar magnitudes of correlations between acoustic variables and intelligibility have been interpreted to be meaningful in past dysarthria studies (Kim et al., 2011b; Tjaden & Wilding, 2011b; Weismer et al., 2001; Yunusova et al., 2005, 2012). In addition, only one other published dysarthria study has investigated the acoustic basis of within-speaker variation in intelligibility. Thus, a conservative approach to data interpretation was deemed appropriate so as not to miss potential trends of interest. Correlation analysis was also used to investigate the relationship among all acoustic measures for the PD group.
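The within-speaker analysis can be illustrated with a minimal sketch of the Spearman rank order correlation, computed as a Pearson correlation on average ranks, together with the ±0.30 criterion. This is a generic implementation, not the statistical software used in the study, and all function names and example values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if den == 0:
        raise ValueError("correlation is undefined for constant input")
    return num / den

def spearman_rho(x, y):
    """Spearman rank order correlation (Pearson correlation of the ranks)."""
    return pearson(average_ranks(x), average_ranks(y))

def is_at_least_moderate(rho, threshold=0.30):
    """Apply the ±0.30 criterion used to flag correlations as meaningful."""
    return abs(rho) >= threshold

# Made-up data for one speaker: F2 IQR (Hz) and scaled intelligibility per sentence
f2_iqr_values = [410, 380, 520, 450, 600, 390, 480, 550, 430, 500]
intelligibility = [0.55, 0.60, 0.40, 0.52, 0.30, 0.58, 0.45, 0.35, 0.50, 0.42]
rho = spearman_rho(f2_iqr_values, intelligibility)
```

Because lower scale values indicate better intelligibility, a negative ρ means that larger values of the acoustic measure were associated with better intelligibility.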
Descriptive characteristics for scaled judgments of intelligibility of Harvard sentences for individuals with PD are summarized in Table 2. Average scaled sentence intelligibility for the PD group was 0.47 (SD = 0.14). For comparison, the average scaled sentence intelligibility for the 12 control speakers was 0.28 (SD = 0.10). A Mann–Whitney U test indicated that this difference was significant (p<0.001). In addition, Table 2 indicates that all speakers with PD demonstrated a range of sentence intelligibility for Harvard sentences such that some Harvard sentences were judged more intelligible than others within each speaker with PD.
Figure 1 shows group means and standard deviations for acoustic measures. On average, the PD group had slightly faster articulatory rates (i.e. upper left panel) and reduced F2 IQR (i.e. upper right panel) as well as greater across-speaker F2 IQR variability (SD = 105.02 Hz) compared to the control group (SD = 45.62 Hz). As indicated by the height of the standard deviation bars (i.e. lower left panel), the PD group also tended to have a greater F0 Range (8.32 semitones), on average, compared to the control group (6.81 semitones). On average, F0 SD was slightly greater for the PD group (3.61 semitones) relative to the control group (3.36 semitones). Finally, the PD group demonstrated a modest reduction in Mean SPL (M = 71.87 dB, SD = 2.36 dB) compared to the control group (M = 72.97 dB, SD = 3.67 dB). SPL SD was also reduced for the PD group (8.29 dB) relative to controls (8.72 dB). Despite these descriptive differences, Mann–Whitney U tests indicated that acoustic measures for the PD and control groups were not statistically different.
Table 3 reports Spearman rank order correlations reflecting the strength of association between acoustic variables and scaled sentence intelligibility for each speaker with PD. Speakers are listed from least (PDM07 = 0.68) to most (PDF08 = 0.21) intelligible, as indexed by average scaled intelligibility for Harvard sentences. Bold text indicates correlations of ±0.30 or stronger; for completeness, asterisks indicate statistically significant correlations (p<0.05). Except for PDF08, at least one acoustic variable was at least moderately correlated with scaled sentence intelligibility for all speakers.
As reported in Table 3, articulatory rate was at least moderately correlated with scaled sentence intelligibility for five of the 12 speakers. The absolute magnitude of correlations ranged from 0.333 to 0.794. The direction of the effect varied across speakers.
Mean F0 was moderately correlated with scaled sentence intelligibility for six of the 12 speakers. The absolute magnitude of correlations ranged from 0.333 to 0.806. The direction of the relationship varied across speakers. As indicated in the upper left panel of Figure 2, greater Mean F0 was correlated with better scaled intelligibility for PDM01 (ρ = −0.503) while the upper right panel indicates lower Mean F0 was correlated with better scaled sentence intelligibility for PDM03 (ρ = 0.636).
F0 SD was at least moderately correlated with scaled sentence intelligibility for five of 12 speakers. The absolute magnitude of correlations ranged from 0.321 to 0.515. With the exception of one speaker (PDM01) the direction of the relationship was such that lower values of F0 SD were associated with better intelligibility. Similarly, correlation analysis indicated at least moderate associations between F0 Range and scaled sentence intelligibility for five of the 12 speakers. The absolute magnitude of correlations ranged from 0.321 to 0.527. The direction of the relationship was the same for all five speakers such that reduced or restricted modulations in F0 were associated with better intelligibility.
Mean SPL was at least moderately correlated with scaled sentence intelligibility for one speaker. A lower mean sound pressure level was associated with better scaled sentence intelligibility. SPL SD was also at least moderately correlated with scaled sentence intelligibility for three of 12 speakers, including PDF01 (ρ = 0.370), PDM04 (ρ = −0.571), and PDM02 (ρ = 0.366). The direction of the relationship differed across the three speakers. For example, as indicated in the lower left panel of Figure 2, greater sentence-level variation in SPL was associated with better intelligibility for PDM04 while the lower right panel indicates greater sentence-level variation in SPL was associated with reduced intelligibility for speaker PDM02.
F2 IQR was at least moderately correlated with sentence intelligibility for five of the 12 speakers. The absolute magnitude of correlations ranged from 0.321 to 0.709. The direction of the effect was such that reduced F2 IQR was associated with better intelligibility for four speakers. For one speaker, a greater F2 IQR was associated with higher intelligibility.
The strength of association as well as the direction of the relationship between most acoustic measures and intelligibility varied. Measures of articulatory rate, fundamental frequency and F2 IQR emerged as possible acoustic variables contributing to within-speaker variations in sentence intelligibility. With the exception of F0 Range, the direction of the relationship varied. For about half of the speakers, articulatory rate, measures of fundamental frequency and F2 IQR were at least moderately correlated with sentence intelligibility. Although the direction of the relationship varied, SPL SD was at least moderately correlated with sentence intelligibility for three speakers with PD. In sum, half of the speakers had three moderately strong correlations while three additional speakers had one or two moderately strong correlations.
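The per-speaker tallies summarized above can be reproduced along the following lines. This is a minimal sketch, not the authors' analysis script: it assumes the reported ρ is Spearman's rank correlation, it treats |ρ| ≥ 0.32 as "at least moderate" (an assumed cut-off, chosen only because the smallest magnitude reported in this section is 0.321), and the speaker labels and values are invented for illustration.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j get one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical data: speaker -> (acoustic measure per sentence,
#                                scaled intelligibility per sentence)
data = {
    "S1": ([3.1, 3.4, 3.0, 3.8, 3.6], [0.20, 0.35, 0.15, 0.50, 0.40]),
    "S2": ([4.9, 4.2, 5.1, 4.0, 4.6], [0.30, 0.60, 0.20, 0.70, 0.45]),
    "S3": ([3.0, 3.5, 4.0, 4.5, 5.0], [0.30, 0.50, 0.60, 0.20, 0.40]),
}

MODERATE = 0.32  # assumed cut-off, not a criterion stated in this section

# One correlation per speaker, never pooled across speakers
rhos = {spk: spearman_rho(x, y) for spk, (x, y) in data.items()}
moderate = [spk for spk, rho in rhos.items() if abs(rho) >= MODERATE]

for spk, rho in rhos.items():
    print(f"{spk}: rho = {rho:+.3f}")
print("at least moderate:", moderate)
```

With these invented values, S1 and S2 both pass the cut-off but with opposite signs, which is the kind of speaker-specific pattern that a pooled analysis could obscure.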
Table 4 reports correlations among acoustic variables in the PD group. This table suggests significant associations among most of the acoustic variables when data were pooled across speakers. Overall, Mean SPL tended to be more strongly correlated with other acoustic variables while articulatory rate, SPL modulation and F0 modulation, the latter indexed by both standard deviation and range, tended to be less strongly correlated with other acoustic variables.
Due to the modest strength of association between acoustic measures and within-speaker variation in intelligibility, a qualitative post-hoc analysis was undertaken to explore whether word frequency might be contributing to within-speaker variation in intelligibility. Recall that 10 sentences were randomly selected for each speaker from the larger pool of 25 Harvard sentences for use in performing acoustic measures and scaled judgments of intelligibility, as previously described. Thus, each of the 25 Harvard sentences was represented in the post-hoc analysis, albeit not for each speaker. Figure 3 shows mean judgments of intelligibility for the 29 listeners for each of the 25 Harvard sentences. In Figure 3, Harvard sentences are indicated by sentence number in descending order on the x-axis. With the exception of sentences 15, 20 and 21, all sentences were included in the subset of 10 sentences for at least three different speakers with PD. Figure 3 shows that sentences 2, 11 and 23 were judged to be relatively more intelligible while sentences 14 and 18 were judged to be relatively less intelligible compared to the other sentences. Word frequency data (Brysbaert & New, 2009) further suggest that the keywords in sentences 14 and 18 occur less frequently (e.g. “dune” occurs 51 times) than the words that comprise sentences 2, 11 and 23 (e.g. “box” occurs 4577 times) out of 51 million words based on American subtitles.
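To make the size of this word frequency gap concrete, the raw counts can be converted to occurrences per million words. The sketch below uses only the two counts and the corpus size quoted in the text; the function and variable names are illustrative.

```python
CORPUS_SIZE_WORDS = 51_000_000  # "51 million words", as quoted in the text

def per_million(count, corpus_size=CORPUS_SIZE_WORDS):
    """Convert a raw corpus count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

# Raw counts quoted in the text (Brysbaert & New, 2009)
counts = {"dune": 51, "box": 4577}

rates = {word: per_million(count) for word, count in counts.items()}
ratio = counts["box"] / counts["dune"]  # how many times more often "box" occurs

for word, rate in rates.items():
    print(f"{word}: {rate:.2f} per million words")
print(f"'box' is about {ratio:.0f} times as frequent as 'dune'")
```

On these figures, "dune" occurs about once per million words while "box" occurs roughly 90 times per million, a difference of almost two orders of magnitude.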
To further explore whether word frequency contributed to the results of the correlation analyses in Table 3, the within-speaker correlation analyses were repeated excluding sentences 2, 11, 23, 14 and 18. The two analyses yielded similar results. The absolute strength of association among acoustic measures and intelligibility for the initial and repeated analyses was 0.484 and 0.509, respectively. Mean SPL and SPL SD emerged as additional acoustic variables contributing to within-speaker variations in sentence intelligibility for about half of the speakers with PD compared to the initial analysis summarized in Table 3. In contrast, fewer than three speakers in the original analysis exhibited at least a moderate correlation between sentence intelligibility and measures of SPL. The direction of the relationship also varied. However, in sum, measures of SPL were the most affected.
The purpose of this study was to investigate potential acoustic variables associated with across-utterance, within-speaker variation in intelligibility for speakers with mild dysarthria associated with PD. The strength of association as well as the direction of the relationship between most acoustic measures and intelligibility varied. Before further discussing these results and their implications, acoustic characteristics of speakers with PD relative to controls are considered.
While differences in acoustic measures for the PD and control groups were not statistically significant, the PD group demonstrated trends consistent with previously published acoustic studies (Dromey, 2003; Weismer et al., 2001; Yunusova et al., 2005). That is, on average, the PD group had a faster articulatory rate, reduced mean fundamental frequency, lower mean intensity level and reduced F2 IQR compared to the control group (Figure 1). SPL modulation (SPL SD) also tended to be reduced for the PD group, while F0 variation (F0 SD, F0 Range) tended to be greater for the PD group. The lack of statistically significant differences may be explained by the modest sample size and the fact that the PD group had mild dysarthria (Table 1). As discussed below, a trend toward greater F0 Range and F0 modulation for speakers with PD as a whole may be explained by voicing instabilities.
The direction of the relationship between sentence-level F0 modulation and scaled sentence intelligibility varied across the speakers with PD (Table 3). However, for most speakers, reduced F0 SD (i.e. 4 of 5 speakers) and F0 Range (i.e. 5 of 5 speakers) were associated with higher intelligibility. This finding is at odds with at least some previous studies (Bunton et al., 2001; but see Miller et al., 2010). One explanation for the trend toward reduced F0 modulation to be associated with better sentence intelligibility is the occurrence of low frequency vocal instability. For example, phonatory instability in the form of glottal fry and diplophonia suggesting an extreme lower F0 Range was observed in four of 12 female speakers with PD. For instance, sentence one produced by PDF05 had the greatest F0 Range (14.94 semitones) and the lowest F0 10th percentile value (75.49 semitones). Importantly, for this speaker, average F0 10th percentile values were lower compared to controls while F0 90th percentile values were similar across sentences and consistent with age and sex-matched controls. This trend held for both female speakers with PD for whom results yielded at least a moderately strong relationship between metrics used to index F0 modulation and intelligibility.
In contrast, sentences with a reduced F0 Range for men that were judged to have better intelligibility appeared to be explained by restricting higher vocal frequencies (i.e. F0 90th percentile). This trend held for all male speakers with PD (i.e. 3 speakers) for whom results yielded at least a moderately strong relationship between F0 Range and intelligibility. For example, for PDM01, sentence 23 was judged to be more intelligible, had the most restricted F0 Range (4.17 semitones) and one of the lowest F0 90th percentile values (85.70 semitones). For this speaker, F0 10th percentile values were consistent with age- and sex-matched controls. Similar to previous studies, these findings suggest that F0 modulation aberrancies may manifest differently for men and women in PD (Hertrich & Ackermann, 1995; Holmes, Oates, Phyland, & Hughes, 2000; MacPherson, Huber, & Snow, 2011; Skodda, Rinsche, & Schlegel, 2009; Skodda, Visser, & Schlegel, 2011).
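The percentile-based account in the two preceding paragraphs can be illustrated with a short calculation. The sketch below assumes that F0 Range is the distance in semitones between the 10th and 90th percentiles of F0, which the discussion implies but this section does not define. The reference frequency behind the reported semitone values is not given here, so the Hz values and reference frequencies below are hypothetical.

```python
from math import log2

def hz_to_semitones(f_hz, ref_hz):
    """Express a frequency in semitones relative to a reference frequency."""
    return 12 * log2(f_hz / ref_hz)

def f0_range_semitones(f10_hz, f90_hz):
    """Semitone distance between the 10th and 90th percentile of F0."""
    return 12 * log2(f90_hz / f10_hz)

# Hypothetical percentile values in Hz (not taken from the study)
f10, f90 = 100.0, 200.0
baseline = f0_range_semitones(f10, f90)  # one octave, i.e. 12 semitones

# The range is a difference of two semitone values, so the choice of
# reference frequency cancels out.
for ref in (1.0, 16.35, 50.0):
    diff = hz_to_semitones(f90, ref) - hz_to_semitones(f10, ref)
    assert abs(diff - baseline) < 1e-9

# Lowering only the 10th percentile (e.g. stretches of glottal fry)
# widens the range although the upper end is unchanged.
low_instability = f0_range_semitones(70.0, f90)

# Restricting only the 90th percentile narrows the range although the
# lower end is unchanged.
restricted_top = f0_range_semitones(f10, 150.0)

print(f"baseline: {baseline:.2f} st")
print(f"lower 10th percentile: {low_instability:.2f} st")
print(f"lower 90th percentile: {restricted_top:.2f} st")
```

The same F0 Range value can therefore arise from changes at either end of the distribution, which is why the percentile values need to be inspected alongside the range itself.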
Most speakers with PD were judged to have the best intelligibility for sentences with reduced F0 modulation. Although the underlying reason for the varied results for F0 modulation is still unknown, synthetically enhanced F0 contours have been reported to increase vowel intelligibility for speakers with PD (Bunton, 2006). Results from the current study together with Bunton’s (2006) study suggest that additional perceptual studies investigating the effect of increased F0 Range attributable to low frequency instabilities are needed given that voice disruptions or instabilities are a common feature of dysarthria (Bunton & Weismer, 2002; Darley, Aronson, & Brown, 1969; Kent, Kent, Duffy, & Weismer, 1998).
Finally, Mean F0 was moderately correlated with scaled sentence intelligibility for six speakers (Table 3). For two of the six speakers, a lower Mean F0 was associated with better scaled sentence intelligibility. However, in Dromey and Ramig’s (1998) study, higher Mean F0 was associated with better scaled sentence intelligibility. It is also important to note that higher Mean F0 was associated with better intelligibility for two female speakers, while for male speakers the relationship varied. Aside from the large number of acoustic variables correlated with Mean F0 (Table 4), no obvious explanation emerged for why higher Mean F0 was associated with better intelligibility for some speakers, while for other speakers higher Mean F0 was associated with lower intelligibility. Thus, sentence-level mean F0 is likely linked to other acoustic variables and independently may not be a reliable indicator of within-speaker variations in scaled intelligibility for speakers with mild dysarthria in PD.
SPL modulation and intelligibility were moderately associated for three of 12 speakers, but the direction of the relationship varied (Figure 2). On average, two of the three speakers had greater mean scaled intelligibility (Table 1) and similar SPL SD (8.56 and 8.11 dB). Results suggest that the different relationships between SPL SD and sentence intelligibility cannot be attributed to overall differences in SPL modulation. Thus, the relationship between SPL modulation and within-speaker variation in scaled sentence intelligibility appears to be complex. Since speakers in the current study had mild dysarthria, studies investigating more severely impaired speakers are also needed as the results may differ for more severe dysarthria (Kim et al., 2011b; Weismer et al., 2012).
For one speaker (Table 3), sentences characterized by a reduced Mean SPL were associated with better scaled sentence intelligibility. Using a slightly different approach, Kim and Kuo (2012) reported that increased intensity levels for sentences produced by healthy controls were associated with a significant decrease in scaled intelligibility. Thus, the current findings may reflect the fairly preserved sentence-level intelligibility for speakers with mild dysarthria in PD. Although further research is needed to investigate the relationship between Mean SPL and sentence intelligibility, it appears that at least for some speakers with dysarthria, an increased SPL may not be associated with improved intelligibility (Cannito et al., 2012; Ramig et al., 1994).
Five speakers with PD had at least a moderate relationship between F2 IQR and scaled sentence intelligibility, but the direction of the relationship varied. PDM06 had a relatively restricted F2 IQR (395.05 Hz) compared to PDM08 (536.70 Hz). However, for both speakers a reduced F2 IQR was associated with better scaled sentence intelligibility. Thus, an explanation was not obvious for why reduced F2 IQR tended to be associated with better scaled sentence intelligibility for these speakers. A reduced F2 IQR is presumed to reflect diminished vocalic segmental integrity. Although vocalic integrity may have been reduced for some speakers with PD, listeners may have perceived the intended message as intelligible because of a relatively slower speaking rate for these speakers. For example, on average, PDM06 produced about three syllables per second relative to other speakers with PD that averaged four to five syllables per second. Thus, speakers for whom a reduced F2 IQR was associated with better intelligibility tended to use a slower-than-normal rate compared to other speakers with PD.
Aside from a limited number of data points per speaker, the equivocal findings for F2 IQR may also be explained by the limited variation in F2 IQR. That is, despite the fact that a reduced F2 IQR tended to be associated with better intelligibility, speakers with PD had relatively restricted F2 IQR compared to healthy controls. With the exception of one speaker with PD, F2 IQR tended to vary between 400 and 550 Hz. F2 IQR for healthy controls varied between 500 and 650 Hz. Thus, it is possible that the variation in F2 IQR was not of sufficient magnitude for a relationship to intelligibility to emerge. Future studies are needed to investigate the relationship between segmental integrity and variations in scaled sentence intelligibility for individuals who naturally produce more sentence-to-sentence variation in F2 IQR.
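For readers less familiar with the measure, the sketch below shows one way an interquartile range could be computed from a sentence-level F2 track. It assumes F2 has been sampled at regular intervals across the voiced portions of a sentence and uses the "inclusive" quartile convention from Python's standard library; the convention used in the study is not stated in this section, and the F2 values are invented.

```python
from statistics import quantiles

def f2_iqr(f2_hz):
    """Interquartile range (Q3 - Q1) of a list of F2 samples in Hz."""
    q1, _median, q3 = quantiles(f2_hz, n=4, method="inclusive")
    return q3 - q1

# Hypothetical F2 samples (Hz) from two sentences by the same speaker
wide_sentence = [1000, 1200, 1400, 1600, 1800]    # larger F2 excursions
narrow_sentence = [1300, 1350, 1400, 1450, 1500]  # more centralized F2

print(f2_iqr(wide_sentence))
print(f2_iqr(narrow_sentence))
```

Because the measure depends only on the middle half of the distribution, it is insensitive to occasional extreme formant tracking values, but different quartile conventions can give slightly different results on short tracks.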
Articulatory rate was at least moderately correlated with scaled sentence intelligibility for five speakers. However, the direction of the relationship varied. These findings as well as those from past studies suggest that the relationship between global speech timing and intelligibility is complex (e.g. McRae et al., 2002; Van Nuffelen et al., 2010; Weismer et al., 2000; Yorkston et al., 1999). Speakers for whom increased articulatory rate was associated with better intelligibility tended to have slower mean articulatory rates (i.e. approximately three syllables per second). Speakers for whom decreased articulatory rate was associated with better scaled sentence intelligibility (e.g. PDM03) tended to have faster mean articulatory rates (i.e. approximately 3.52–5.06 syllables per second). Thus, the direction of the relationship may have varied because of differences in habitual speaking rates.
Experimental sentences were somewhat heterogeneous with respect to phonetic content, syntax, semantics and word frequency. A qualitative post-hoc analysis showed that some sentences were consistently judged to be more intelligible, while others were consistently judged to be less intelligible (Figure 3). Sentences judged to be less intelligible were composed of words that occurred less frequently in American English (Brysbaert & New, 2009). Thus, listeners may have been less able to use semantic knowledge to recover the intended message for these sentences. Linguistic variables and, by inference, the semantic predictability of sentences can contribute to intelligibility (Hustad, 2007; Hustad & Beukelman, 2001; Tjaden & Wilding, 2010; Yunusova et al., 2005). Thus, word frequency trends may also help to explain some of the within-speaker variation in intelligibility for speakers with mild dysarthria. Indeed, when analyses were repeated excluding sentences that were consistently judged to be more or less intelligible, results suggested that word frequency was not a factor contributing to the overall absolute strength of the associations, but may mask some acoustic variables such as measures of SPL that may contribute to within-speaker variations in sentence intelligibility.
Several factors should be kept in mind when interpreting results. The limited number of statistically significant correlations may be related to the acoustic variables studied. Therefore, in addition to larger numbers of data points per speaker, future studies should also investigate the relationship of other acoustic measures to within-speaker variation in intelligibility. For example, spectral measures of consonants as well as rhythm metrics could be studied as these types of measures have been shown to be associated with intelligibility (Chen & Stevens, 2001; Kewley-Port, Pisoni, & Studdert-Kennedy, 1983; Liss et al., 2009; Kim et al., 2011a). Most speakers in the current study also had mild dysarthria. Results may differ for individuals with more severe dysarthria. However, studies of mild dysarthria are still relevant because even mild dysarthria can have major implications for maintaining employment, social relationships and quality of life (Yorkston et al., 2010). Speakers with mild dysarthria can also benefit from treatment aimed at maximizing speech intelligibility and efficiency (Yorkston et al., 2010). In addition, the inherent nature of the speech task may have constrained the range of variation in both acoustic and perceptual measures, thus contributing to the modest association between acoustic measures and intelligibility. The nature of the scaling task also may have resulted in speech or voice qualities other than intelligibility (e.g. naturalness, clarity) influencing perceptual judgments. Finally, judgments of scaled intelligibility for Harvard sentences were obtained in background noise (i.e. 20-talker babble noise). Preliminary evidence suggests that the acoustics of intelligibility in background noise for speakers with dysarthria may differ from quiet (McAuliffe, Good, O’Beirne, & LaPointe, 2008; McAuliffe et al., 2009). Thus, findings may not generalize to other perceptual environments.
In conclusion, results suggest that a within-speaker analysis shows promise for identifying acoustic variables likely contributing to variations in sentence intelligibility for speakers with PD. Certain acoustic variables, especially articulatory rate, fundamental frequency and F2 IQR, show promise for capturing within-speaker variation in intelligibility, at least for speakers with mild dysarthria secondary to PD in the current study. One possible clinical implication is that the current speakers with PD may benefit from therapy focused on the underlying acoustic variable associated with intelligibility. Finally, the fact that findings appear to be at odds with at least some previously published studies emphasizes the heterogeneity among speakers with PD. This diversity suggests that pooling data across speakers may mask the contribution of some acoustic measures to variations in scaled speech intelligibility. Although additional studies are needed, results suggest care is warranted when interpreting results from group analyses until there is a better understanding of the acoustic basis of intelligibility.
We thank Jessica Sam and Helena Rosenstrauch for assistance with various aspects of this study. We also thank Dr. Christina Kuo for her contribution on portions of this study, and Dr. Stathopoulos and Dr. Higginbotham for comments on previous versions of this document.
Declaration of interest
The authors report no conflict of interest. This research was supported by R01DC004689.