Little research has been conducted on the development of suprasegmental characteristics of vocalizations in typically developing infants (TDI) and the role of audition in the development of these characteristics. The purpose of the present study was to examine the longitudinal development of fundamental frequency (F0) in eight TDI and eight infants with severe-to-profound hearing loss matched for level of vocal development. Results revealed no significant changes in F0 with advances in pre-language vocal development for TDI. Infants with hearing loss, however, showed a statistically reliable higher variability of F0 than TDI, when age was accounted for as a covariate. The results suggest development of F0 may be strongly influenced by audition.
Research on pre-linguistic vocalizations of typically developing infants (TDI) has focused mostly on segmental and syllabic aspects of these vocalizations. Suprasegmental aspects of infant vocalizations have rarely been systematically studied despite their obvious significance to mature speech in conveying meaning, emotional intent, and vocal mechanism maturity.
This lack of research on suprasegmentals has extended to the study of vocalizations of infants with severe to profound hearing loss, hereafter termed infants with hearing loss (IHL). Audition might be predicted to play an important role in suprasegmental aspects of vocalizations because it has been shown to be critical in the development of segmental and syllabic aspects of infant vocalizations (for a review, see Oller, 2000). Our laboratory observations and some published reports suggest that IHL produce aberrant phonatory and prosodic characteristics in their vocalizations when compared to TDI (e.g. Rvachew, Slawinski, Williams, and Green, 1999; Clement, 2004). Furthermore, older deaf children have been reported to produce aberrant fundamental frequency (F0) and/or pitch in their speech (Ryalls, Larouche, and Giroux, 2003). In fact, when older deaf speakers produce utterances characterized by atypical prosody and rhythm, they are often noted to be unintelligible despite the segmental integrity of these productions (Hudgins and Numbers, 1942). These findings highlight the importance of identifying the developmental origins of prosody and of examining the role of audition in the development of prosodic characteristics.
Kahane and Kahn (1984) noted that TDI larynges were not only smaller and weighed less than adult larynges, but also that the cricothyroid muscle in TDI, which is traditionally associated with pitch control, was proportionally larger than in adults when compared to the other laryngeal muscles. Such anatomical differences support the expectation that there should be differences in F0 values for TDI when compared to adult values. And, consistent with this expectation, it is common knowledge that the mean F0, as well as variability of F0 of infants and children, is higher than in adult speakers.
A summary of the available research, to our knowledge, on F0 in TDI is provided in Table I. The principal, although tentative, conclusion from this limited body of research is that TDI have high and variable F0 in the first year of life in comparison with adults. Studies by Delack and Fowlow (1978), Kent and Murray (1982), and Whalen, Levitt, Hsiao, and Smorodinsky (1995), which were fairly comparable to the present investigation in TDI characteristics and had reasonably large sample sizes, are discussed in detail below.
A longitudinal investigation by Delack and Fowlow (1978) followed 10 TDI throughout the first year of life and another nine TDI in the first 6 months of life. The average F0 was found to be 355 Hz. No changes in mean F0 with age were noted, but F0 range increased by 20% up to 6 months of age. Kent and Murray (1982) conducted a cross-sectional investigation of 21 TDI at 3, 6, and 9 months of life and obtained F0 values in the range of 350–500 Hz. Average F0 values were 445 Hz at 3 months, 450 Hz at 6 months, and 415 Hz at 9 months of life. Whalen et al. (1995) followed 12 TDI, six from French-learning backgrounds and six from English-learning backgrounds, from 6 to 12 months of age. F0 values extrapolated from the reported data indicate that these infants had an average F0 of 360 Hz at 6 months of age, 375 Hz at 9 months of age, and 375 Hz at 12 months of age. No significant age or language effects on F0 were noted. It should be noted that Whalen et al. eliminated vocalizations from their analysis sample if their F0 was greater than 700 Hz, which may have reduced the mean F0 values obtained, and may have particularly reduced the mean among the 6-month-olds, who have been reported to produce considerable numbers of vocalizations in a very high pitch range, often termed ‘squeals’ (Oller, 1980; Scheiner, Hammerschmidt, Jürgens, and Zwirner, 2002). Also summarized in Table I are investigations by Sheppard and Lane (1968), Laufer and Horii (1977), and Robb and colleagues (Robb and Saxman, 1985; Robb, Saxman, and Grant, 1989). These studies reached similar conclusions about F0 values in TDI, but either had very small sample sizes or included participants older than 1 year of age (i.e. beyond the age range of the present study) and are therefore not discussed in detail here.
The results of the various studies on F0 in TDI provide much-needed perspective on speech development, but they are notable for reporting values that differ substantially. Consider, for example, that all four infants in Laufer and Horii (1977) were reported to have mean F0 values that fell entirely outside the range of values reported by Kent and Murray (1982). Furthermore, Kent and Murray reported mean F0 values that were much higher than those of any of the other studies (e.g. Delack and Fowlow, 1978), and more than 100 Hz higher than those of Laufer and Horii. Such differences may have been partly caused by differences in the ages of infants studied, but many of them may also have been due to variations in data-treatment procedures from study to study. These procedures concern, for example, whether cry-like or squeal-like vocalizations were included in the analyses, whether only the nuclei (as opposed to nuclei and margins) of syllabic-like vocalizations were included, and how the measurements were taken (by visual inspection of waveforms, by automatic pitch extraction, etc.). Reconciling these differences is difficult, because while the procedures were often clearly different, they were in many cases not sufficiently specified to make detailed comparisons useful.
Despite these procedural limitations, the Robb et al. (1989) investigation is noteworthy because it examined the impact of utterance length and vocabulary size in addition to age on the development of F0. No significant effects of these factors were found on average F0 or F0 variability. F0 values ranged from 281–652 Hz and the average F0 value was approximately 400 Hz across different utterance lengths and different vocabulary sizes. Although a previous cross-sectional investigation by Robb and Saxman (1985) yielded significant decreases in F0 variability as a function of age, this finding was not replicated in the Robb et al. (1989) investigation, presumably due to the longitudinal nature of the latter investigation. Thus, language development measures did not appear to be strongly associated with F0 changes.
This finding was not, however, supported in a more recent investigation by Amano, Nakatani, and Kondo (2006). Amano et al. followed three Japanese-speaking typically developing children longitudinally from 0–5 years of age. They noted a decrease in mean F0 of 16 Hz per 12 months of age. In addition, they noted that between-utterance variability reduced with the onset of two-word utterances, whereas within-utterance variability increased with the onset of two-word utterances. These results imply that language development could impact F0 development, but the conclusion is tentative given the small sample size. (This study is not summarized in Table I because the requisite mean F0 values were not supplied in the published article.)
The present study was conducted in an attempt to add to the limited normative database, especially longitudinal, on the development of F0 in the first year of life, while providing substantial specification of procedures of measurement. Further, we planned to match research participants on their status with regard to the robust developmental milestone of canonical babbling, a capability reflecting substantial command over production of speech-like syllables (Oller, 1980; 2000; Stark, 1980). The present study thus evaluated F0 in infants at three points referenced to the onset of canonical babbling. Canonical babbling usually begins between 5 and 10 months of age for TDI. When TDI start canonical babbling, their vocalizations appear to become especially speech-like, and parents respond to these vocalizations as if they were meaningful (Papoušek, 1994). This approach to matching represents a key alternative to matching by age, the approach that has been used in studies cited below. To our knowledge, matching on a pre-linguistic vocal developmental variable has not been previously attempted in a study comparing F0 in IHL and TDI. Developmental matching has been used in one published report using this dataset to study durational characteristics of infant vocalizations (Nathani, Oller, and Cobo-Lewis, 2003).
Studies of F0 in IHL have been limited and somewhat contradictory in terms of results. Some studies have suggested trends towards a higher and more variable F0 than TDI, but others have noted similar F0 in TDI and IHL. Kent, Osberger, Netsell, and Goldschmidt Hustedde (1987) observed that one IHL exhibited higher peak F0 and variable intonation contours when compared to his twin at 8 months of age, but overall, from 8 to 15 months of age, the twin with hearing loss did not have an atypically high F0, although it was variable. Clement (2004) found a higher median F0 value for six IHL, longitudinally studied from 2.5 to 12 months of age, than for six TDI matched by age, and there was also considerably greater variability within IHL when compared to TDI. Differences in F0 values in terms of mean, median, minimum, maximum, range, and variability, between the groups were statistically significant only at 9.5 months and beyond. van den Dikkenberg-Pot, Koopmans-van Beinum, and Clement (1998) combined data from the Clement (2004) investigation with follow-up data on the same infants until 18 months of age. Although an overall higher median F0 value for the IHL than TDI was observed from 6.5 to 13.5 months of age, the difference was not statistically significant at any age from 2.5 to 18 months. Thus, empirical reports to date have suggested that when infants are matched for age, pitch differences between TDI and IHL may be hard to detect.
Studies on F0 in older children with deafness have more consistently noted atypical F0 values than in studies on infants with deafness, but there have been exceptions. For example, in studies by Higgins and colleagues, abnormally high F0 values were recorded in some—but not all—IHL and adults fitted with hearing aids (Higgins, Carney, and Schulte, 1994; Higgins, McCleary, Ide-Helvie, and Carney, 2005). Ryalls et al. (2003) measured F0 values for 10 French-speaking children with an average age of 9 years, 4 months and profound hearing loss. When compared to both typically developing children and children with moderate-to-severe hearing loss, children with profound hearing loss had significantly higher overall average F0 values. Children with moderate-to-severe hearing loss did not differ significantly from their typically developing peers in F0 values (Ryalls and Larouche, 1992).
Thus, although the research appears to suggest a somewhat higher and more variable F0 in the vocalizations of IHL when compared to TDI, such comparisons have been difficult to interpret with certainty in part, again, because of methodological inconsistencies across studies or because of lack of sufficient detail in methodological specification in published articles. As an example of a particular problem of interpretation, the decision to eliminate values over 700 Hz (as in Whalen et al., 1995) would appear to eliminate most utterances of the ‘squeal’ variety, and certainly would eliminate most produced in falsetto register. Such a method would, of course, tend to reduce the mean F0, but it also could tend to diminish differences across groups of infants if either group were particularly inclined to produce squeals or to use falsetto register. Anecdotal reports suggest that individual deaf speakers are more inclined to use falsetto than most hearing speakers. Thus, we conclude that in this case, as in many others, it is important to be specific about measurement procedure in F0 studies. Further, it is particularly important in comparing groups to be sure that the same procedure is applied to both groups.
Because the same archived database was used here as in the Nathani et al. (2003) investigation, only relevant methodological details about the sample and procedures are provided here. Eight full-term TDI and eight full-term infants with severe-to-profound and profound hearing loss (IHL) were matched for level of pre-linguistic vocal development, socioeconomic status, and linguistic background and were studied longitudinally. Although gender was approximately balanced in the IHL sample, it was unbalanced in the TDI group (seven males and one female). This gender imbalance could have influenced results: Delack and Fowlow (1978) noted some gender differences in the development of pitch between 6 and 12 months of age in TDI, such that males had lower pitch than females. However, as noted above, results on F0 from prior literature are open to widely divergent interpretation, and, furthermore, this gender effect was not replicated by Ryalls et al. (2003) with older TDI and IHL.
Six IHL had profound hearing losses (i.e. hearing thresholds at 91 dB or higher), one IHL had severe-to-profound hearing loss, and one IHL had a severe hearing loss (i.e. hearing thresholds in the range of 71–90 dB HL). The average age of identification of hearing loss for the IHL group was approximately 13 months (M = 12.88, SD = 7.4) and the average age of amplification was approximately 1.5 months after initial identification (M = 14.25, SD = 8.4). All infants were fitted with hearing and/or tactile aids appropriate to their hearing loss and were enrolled in intensive intervention programmes (oral or total communication). Individual IHL’s specific demographic characteristics, age of identification of hearing loss, degree of hearing loss, age at amplification, use of aids (hearing and/or tactile), and communication programmes can be found in Nathani et al. (2003). None of the IHL had been reported to show progressive hearing losses across the duration of this study.
Vocalizations of TDI were recorded as they played either alone, with a parent, or with the recording assistant. Recordings were made in a sound-treated room with high-fidelity recording equipment (Marantz PMD-221 audiocassette recorder and Bose PM-10 external microphone). Vocalizations from two IHL were recorded under the same conditions and procedures as TDI. Vocalizations from the remaining six IHL were recorded as they interacted with a speech-language pathologist during therapy sessions. Recordings from these six IHL were made using a Sony CFS-720 audiocassette recorder in a quiet room that was designated for speech training.
Three sessions were selected for each infant: pre-canonical, canonical, and post-canonical. The canonical session was the session in which the infant’s canonical babbling ratio (total number of canonical syllables per total number of syllables) was first equal to or greater than .15 (Oller, Eilers, Steffens, Lynch, and Urbano, 1994). The pre-canonical session was recorded approximately 3 months before the canonical session and the post-canonical session was recorded approximately 3 months after the canonical session. For IHL, the teacher-designated or parent-reported age of onset of canonical syllable production or the .15 emergence criterion, whichever occurred earlier, was used to designate the canonical session because IHL have been reported to show especially inconsistent productions across sessions after the onset of canonical babbling (Oller and Eilers, 1988; Eilers and Oller, 1994). Pre-canonical and post-canonical sessions were selected in a similar fashion as those for TDI. Ages corresponding to the selected pre-canonical, canonical, and post-canonical sessions for TDI were approximately 4.34 (SD = 1.19), 7.07 (SD = 0.93), and 9.88 (SD = 1.17) months and for IHL were approximately 24.71 (SD = 11.07), 27.16 (SD = 11.23), and 29.57 (SD = 11.07) months, respectively. Data on individual infants can be found in Nathani et al. (2003).
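The session-selection rule for TDI described above can be illustrated with a minimal sketch. The function names and the syllable counts are hypothetical and are not taken from the study; only the ratio definition and the .15 criterion come from the text.

```python
def canonical_babbling_ratio(canonical_syllables, total_syllables):
    """Canonical syllables divided by all syllables in one session."""
    if total_syllables <= 0:
        raise ValueError("total_syllables must be positive")
    return canonical_syllables / total_syllables


def first_canonical_session(sessions, criterion=0.15):
    """Return the label of the earliest session whose ratio meets the criterion.

    `sessions` is a chronologically ordered list of
    (label, canonical_syllables, total_syllables) tuples.
    Returns None if no session reaches the criterion.
    """
    for label, canonical, total in sessions:
        if canonical_babbling_ratio(canonical, total) >= criterion:
            return label
    return None


# Hypothetical counts for one infant (illustration only).
sessions = [("5 months", 2, 80), ("6 months", 9, 90), ("7 months", 18, 100)]
canonical_session = first_canonical_session(sessions)
```

For IHL the text describes an additional rule (teacher-designated or parent-reported onset, whichever came earlier), which this sketch does not model.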
Vocalizations from the three sessions for each infant were low-pass filtered at 10 kHz and digitized at a sampling rate of 22.1 kHz using the Sound Blaster 16 card and CSpeech speech analysis software package (Milenkovic and Read, 1992) running on an IBM-compatible microcomputer. Amplitude displays (amplitude across time), wide-band spectrograms (300 Hz or 600 Hz analysis filter), and, if necessary, narrow-band spectrograms (60 Hz analysis filter) were generated using CSpeech and TF32 (Milenkovic, 2001).
Vocalizations were excluded from the analysis if they constituted, or were substantially influenced by, crying, laughter, coughing, sneezing, or any other form of vegetative vocalization. On the other hand, vocal differences that could produce substantial variations in F0 (for example squealing or growling) were included in the analysis, as long as they met criteria of measurability (to be specified below). Syllables (both canonical and pre-canonical) within these vocalizations were then identified following procedures outlined by Lynch, Oller, Steffens, and Buder (1995) and Nathani and Oller (2001). Perceptually, the construct of countable beats was used to identify syllables. Thus, the first author intuitively counted the number of beats, or syllables, that was heard within each vocalization. Visual spectrographic and amplitude displays were used to support syllable counts when auditory impressions were not clear. In such cases, occurrences of regions of relatively high acoustic energy produced by phonation, generally less than 500 ms in duration, were taken to constitute syllables. All identified syllables typically contained either a vowel-like nucleus in isolation or a consonant- and vowel-like element in sequence, with either element occurring first in the syllable. A total of 4342 syllables was identified.
Syllables were examined by trained judges (see footnote 1) and excluded from F0 measurements if they fell into one of the following categories:
For all remaining 2168 syllables or 50% of the data, termed measurable syllables, F0 values were initially determined for the vocalic portion every 25 ms using the automatic autocorrelation algorithm with centre clipping routine available in the CSpeech and TF32 software programs (see footnote 2). However, for the great majority of syllables (due to the immature phonation of infants), the automatic F0 algorithm was unable to compute F0 values or generated unusually high or low values when compared to adjacent F0 values. Thus, after the automatic analysis, the missing or unusual (presumably artifactual) F0 values were visually determined by an acoustic analyst who counted periods in the waveforms for each of the 25 ms intervals. If 80% or more of F0 values for a syllable could be determined by this combined visual and automatic method, the syllable was considered measurable for F0.
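As a rough illustration of the two ideas in this paragraph, the sketch below estimates F0 for a single 25 ms frame by autocorrelation after centre clipping, and applies the 80% measurability criterion. It is not the CSpeech or TF32 implementation: the clipping fraction (30% of the frame peak), the F0 search range (150 to 1000 Hz), the 22,050 Hz sampling rate of the synthetic test tone, and all function names are assumptions made for the example.

```python
import math


def centre_clip(frame, fraction=0.3):
    """Zero samples below a threshold and shift the rest toward zero.

    The threshold is `fraction` of the frame's peak magnitude (assumed value).
    """
    limit = fraction * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > limit:
            clipped.append(x - limit)
        elif x < -limit:
            clipped.append(x + limit)
        else:
            clipped.append(0.0)
    return clipped


def estimate_f0(frame, sample_rate, f0_min=150.0, f0_max=1000.0):
    """Return F0 in Hz from the strongest autocorrelation peak, or None."""
    clipped = centre_clip(frame)
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(clipped) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(clipped[i] * clipped[i + lag]
                    for i in range(len(clipped) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


def is_measurable(frame_f0_values, criterion=0.8):
    """True if at least 80% of a syllable's frames have an F0 value."""
    determined = sum(1 for f0 in frame_f0_values if f0 is not None)
    return determined / len(frame_f0_values) >= criterion


# Synthetic 400 Hz tone, one 25 ms frame (illustration only).
sample_rate = 22050
frame = [math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(551)]
f0 = estimate_f0(frame, sample_rate)
```

In the study, frames for which the automatic estimate was missing or implausible were measured by hand; in this sketch such frames would simply be represented as `None` before `is_measurable` is applied.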
This criterion of measurability was chosen in part to ensure consistency in the data to be reported. It is important, however, to take note of how the criterion restricts the kinds of utterances that were analysed and thus imposes limits on the interpretation of our data. For example, utterances with dysphonation or harshness, or even utterances that were merely breathy, could fail to meet the criterion. Consequently, the comparisons to be provided here between vocalizations of IHL and TDI must be recognized as pertaining to a sub-set of all vocal acts, namely the sub-set that includes relatively consistent modal phonation and transparent pitch characteristics. The present study can offer no comment on the extent to which IHL and TDI might differ in perceived pitch of vocalizations with considerable aperiodicity—it is limited instead to providing a portrayal of IHL and TDI vocalizations that show substantial phonatory periodicity.
The number of measurable syllables for each infant across the three sessions is provided in Table II. Mean F0 values were computed for each individual infant for each session. Inter-syllable and intra-syllable standard deviations in F0 values were also computed. Inter-syllable variability refers to variation in mean F0 values between syllables within a session for each infant. Intra-syllable variability refers to variation of F0 values within syllables.
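The three session-level measures defined above can be sketched as follows. The text does not state whether sample or population standard deviations were used, how syllables were weighted in the session mean, or how intra-syllable SDs were pooled; the sketch assumes sample SDs, an unweighted mean of syllable means, and a simple average of per-syllable SDs. The F0 values are invented for illustration.

```python
from statistics import mean, stdev


def session_f0_summary(syllables):
    """Summarize one session.

    `syllables` is a list of lists of F0 values in Hz, one inner list per
    syllable. At least two syllables, each with at least two values, are
    required for the sample standard deviations.
    """
    syllable_means = [mean(values) for values in syllables]
    syllable_sds = [stdev(values) for values in syllables]
    return {
        "mean_f0": mean(syllable_means),
        # Variation of syllable means across syllables within the session.
        "inter_syllable_sd": stdev(syllable_means),
        # Average variation of F0 values within individual syllables.
        "intra_syllable_sd": mean(syllable_sds),
    }


# Hypothetical F0 tracks for two syllables (illustration only).
summary = session_f0_summary([[300, 310, 320], [400, 410, 420]])
```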
A second judge coded a randomly selected sample (10% from each infant group) to verify the syllable counts of the primary judge. Cohen’s kappa for interjudge agreement for syllable counts of all infants was .7 (Cohen, 1960; 1968). Kappa values of .60–.75 have been characterized as good agreement by Fleiss (1981). It has also been noted by Nathani and Oller (2001) that, whereas canonical syllables are relatively easy to identify, pre-canonical syllables are more difficult to identify, restricting agreement across observers.
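For readers unfamiliar with the statistic, Cohen's kappa corrects observed agreement for the agreement expected by chance from each judge's marginal frequencies. A minimal sketch follows; the syllable counts are hypothetical and each distinct count is treated as a category, which is an assumption about how the counts were compared.

```python
from collections import Counter


def cohens_kappa(judge_a, judge_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    # Chance agreement: product of the judges' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical syllable counts per vocalization from two judges.
primary = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
second = [1, 1, 1, 2, 2, 2, 2, 1, 3, 3]
kappa = cohens_kappa(primary, second)  # observed .80, chance .36
```

With these invented labels the value falls in the .60–.75 band that the text, following Fleiss (1981), characterizes as good agreement.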
A second judge then examined 10% of the data from each infant group to verify the syllable classifications for F0 measurements (e.g. measurable vs aperiodic) by the primary judge. This second judge also measured the F0 values of measurable syllables identified within this 10% sub-set. Cohen’s kappa for interjudge reliability of syllable classifications (i.e. measurable or aperiodic) was .61. The intraclass correlation coefficient for interjudge reliability of the F0 values obtained for measurable syllables was .89. Syllable F0 measurements of the second judge were, on average, within 3% of the original judge’s measurements, indicating that the procedure was highly reliable. The intraclass correlation coefficient was used because it is specifically designed to assess agreement on continuous variables, such as F0 (Shrout and Fleiss, 1979).
Intrajudge reliability was assessed using the same procedure as for interjudge reliability. Cohen’s kappa for intrajudge reliability of syllable classifications was .7. The intraclass correlation coefficient for intrajudge reliability for F0 of measurable syllables was .97. Syllable F0 measurements from the second round were, on average, within 3% of the first round of measurements, indicating that the procedure was highly reliable within judges as well.
Data were initially analysed using six separate two-way analyses of variance (ANOVA). The between-subjects variable was group (TDI, IHL), and the within-subjects variable was session (pre-canonical, canonical, and post-canonical). The dependent variables for four of the ANOVAs were the proportion of measurable syllables, mean F0 values for each infant, the standard deviation (SD) of the F0 values within each session (inter-syllable variability), and the SD of the F0 values within each syllable (intra-syllable variability). Individual values for mean F0 and SDs for inter-syllable variability and intra-syllable variability are provided in Table III. The fifth and sixth ANOVAs considered inter- and intra-syllable variability using the coefficient of variation (SD/Mean), rather than the simple SD values, thus separating the variability of F0 from the mean level of F0. Values for dependent variables were weighted according to the number of syllables available for analysis. No main effect of group or session or interaction effect was significant in any of the six ANOVA analyses.
Because the IHL were considerably older than the TDI, there may have been important effects of age or vocal tract maturity on the dependent variables. While research on F0 and F0 variability in infancy and childhood leaves many questions open, as indicated by the review above, it is not in doubt that increasing age produces reduction in mean F0 and its variability. These changes begin early in life, with effects that are expected to be discernible within the range of ages of the participants in the present study. Thus, comparison of IHL and TDI matched on a developmental variable but differing substantially in age, as in the present work, should be statistically analysed in a way that accounts for the role of age in the outcome. Data were, therefore, analysed using analysis of covariance (ANCOVA) with age as the covariate. The ANCOVA method provides an adjustment of values for both IHL and TDI by taking into account the effect of age on each dependent variable (measurable syllables, F0, intersyllable variability, intrasyllable variability, coefficients of variation for intersyllable and intrasyllable variability). The computed linear effect of age for each group based on the available data is used through ANCOVA to adjust dependent variable estimates for each group, yielding predictions for dependent variable values that would have been likely to be obtained had both groups been the same age.
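The age adjustment described above can be illustrated with a simplified, single-factor sketch: a pooled within-group slope of the dependent variable on age is estimated, and each group mean is shifted to the value predicted at the grand mean age. This omits the repeated session factor and the syllable-count weighting used in the study, and the data are invented so that the two groups have identical unadjusted means but differ once age is taken into account.

```python
def age_adjusted_means(groups):
    """Return {group: mean adjusted to the grand mean age}.

    `groups` maps a group name to a list of (age, value) pairs. A common
    (pooled within-group) slope is assumed, i.e. homogeneity of slopes.
    """
    all_ages = [age for pairs in groups.values() for age, _ in pairs]
    grand_mean_age = sum(all_ages) / len(all_ages)

    sxy = sxx = 0.0
    group_means = {}
    for name, pairs in groups.items():
        n = len(pairs)
        mean_age = sum(age for age, _ in pairs) / n
        mean_value = sum(value for _, value in pairs) / n
        sxy += sum((age - mean_age) * (value - mean_value) for age, value in pairs)
        sxx += sum((age - mean_age) ** 2 for age, _ in pairs)
        group_means[name] = (mean_age, mean_value)

    slope = sxy / sxx  # pooled within-group regression of value on age
    return {name: mean_value - slope * (mean_age - grand_mean_age)
            for name, (mean_age, mean_value) in group_means.items()}


# Hypothetical (age in months, mean F0 in Hz) pairs; not the study's data.
data = {
    "TDI": [(4, 480), (7, 465), (10, 450)],
    "IHL": [(24, 480), (27, 465), (30, 450)],
}
adjusted = age_adjusted_means(data)
```

In this invented example both groups average 465 Hz before adjustment, yet the older group is 100 Hz higher after adjustment, which is the kind of pattern that can make an unadjusted comparison non-significant while the adjusted comparison shows a group difference.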
It is important to recognize that while age and level of vocal development (the Session variable in the design) are related, they are not identical, and can have differing effects on the dependent variables. Thus, whereas infants may show, for example, important and reliable changes in F0 with age, they may not show reliable changes in F0 with changes in their status with regard to canonical syllable production (the criterion used in the present study to define level of vocal development) and vice versa. Infants in the present study were matched on the basis of vocal development. The ANCOVA approach specifically isolates level of vocal development (the session variable) from age, and thus provides a basis for a comparison on the dependent variables that is corrected specifically for age effects.
Six separate two-way ANCOVAs were conducted, similar to the ANOVAs. Independent and dependent variables were the same as for ANOVAs. Preliminary analyses for each of the six ANCOVAs evaluating the homogeneity-of-slopes assumption indicated that the relationship between the covariate and the respective dependent variables did not differ significantly as a function of the independent variable of group. Results of individual ANCOVAs were, therefore, valid and are reported below.
Figure 1 displays the unadjusted proportions of syllables that were classified as measurable or aperiodic for the two groups collapsed across the three sessions. Visual inspection of the data in Figure 1 indicates that TDI showed a slightly greater proportion of measurable syllables in comparison to IHL. The main and interaction effects from the age-adjusted ANCOVA for measurable syllables were, however, not significant, with all p-values > .05. Thus, hearing loss (the group variable) did not significantly influence the proportions of syllables classified as measurable and, conversely, as aperiodic; approximately 50% of syllables were measurable in both groups. Furthermore, level of vocal development (the session variable) did not show a statistically significant effect on measurability.
The mean (and standard error) unadjusted and age-adjusted F0 values for the two groups are displayed in Figure 2. The ANCOVA for mean F0 values narrowly missed the traditional significance level of .05 for group, F(1, 48) = 3.89, p = .0545. The main effect of session and the interaction effect were not significant. The means of the F0 values adjusted for initial differences in age were ordered as expected between the two participant groups, yielding an F0 value approximately 46% higher for the IHL than the TDI (IHL adjusted mean F0 = 438 Hz, TDI = 299 Hz).
The ANCOVA for inter-syllable variability in F0 values, i.e. the standard deviation of each infant’s F0 values across syllables, was significant for Group, F(1, 48) = 10.2, p < .01. IHL appeared to be more variable for F0 values across syllables than TDI. The strength of the relationship between Group and inter-syllable variability, as assessed by ω², indicated that Group accounted for nearly 37% of the variance of the dependent variable when age was adjusted. Neither session nor interaction effects were significant. Mean (and standard error) unadjusted and age-adjusted values for inter-syllable variability between the two groups are provided in Figure 3, and indicate a nearly three-fold higher adjusted variability in the IHL than in the TDI.
The ANCOVA for intra-syllable variability in F0 values, i.e. the standard deviation of F0 values within syllables, also revealed a significant effect of Group, F(1, 48) = 17.5, p < .01. IHL appeared to be more variable for F0 values within syllables than TDI. The effect size, ω², indicated that Group accounted for nearly 51% of the variance of the dependent variable when age was statistically held constant. Neither session nor interaction effects were significant. Mean (and standard error) unadjusted and age-adjusted values for syllable-level variability in F0 for the two groups are provided in Figure 4, and indicate that variability in the IHL was more than three times that in the TDI.
The fifth and sixth ANCOVAs, which considered inter- and intra-syllable variability using the coefficient of variation (SD/Mean), yielded results similar to those of the ANCOVAs conducted using simple SD values. The main effect of Group was significant for both inter- and intra-syllable variability using the coefficient of variation, F(1, 48) = 6.59, p < .05 and F(1, 48) = 5.17, p < .05, respectively. Neither session nor interaction effects were significant. Thus, as with the results obtained with simple SD values, hearing loss (the Group variable) significantly influenced the results in this analysis.
When F0 values were statistically adjusted for age differences, F0 values were significantly more variable for the IHL group when compared to the TDI group. Inter-syllable variability in F0 values and intra-syllable variability in F0 values were significantly different for the IHL than for TDI after age adjustment. Thus, the results provide evidence that even at prelinguistic stages of vocal development, auditory impairment may significantly impact the production of F0. The results for mean F0 indicated higher values for the IHL, but they barely missed the traditional criterion level for significance of the difference.
High mean F0 tends to correspond to high variability in absolute values of F0—this fact is consonant with variability for most measures. The coefficient of variation for F0 (SD/Mean) provides a correction for this inter-relatedness, separating variability from absolute values of F0. Even after this correction was applied and ages were adjusted, the IHL were still projected to have significantly higher values than TDI for inter-syllable coefficient of variation (mean SD=.15 for TDI and .28 for IHL) and for intra-syllable coefficient of variation (mean SD=.04 for TDI and .12 for IHL). Thus, variability in F0 occurred for both relative and absolute measures of variability.
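The point about separating variability from absolute F0 level can be seen in a small sketch. The two sets of values below are invented: the second is simply the first doubled, so its standard deviation doubles while its coefficient of variation is unchanged. Sample standard deviations are an assumption here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)


# Hypothetical syllable-mean F0 values in Hz (illustration only).
lower_pitched = [280, 300, 320]
higher_pitched = [560, 600, 640]  # same relative spread at twice the pitch

sd_low, sd_high = stdev(lower_pitched), stdev(higher_pitched)
cv_low = coefficient_of_variation(lower_pitched)
cv_high = coefficient_of_variation(higher_pitched)
```

A group difference that survives in the coefficient of variation, as reported above, therefore cannot be attributed solely to one group having a higher mean F0.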
These findings are similar to those of van den Dikkenberg-Pot et al. (1998) and Clement (2004), who found that although mean F0 values were generally higher for IHL than TDI, no statistically significant differences were obtained on mean values in any of the studies. The accumulation of the trend toward higher F0 across studies is, however, suggestive of the possibility that IHL do indeed show particularly high F0. The studies were fundamentally different in how they approached the question of early effects of auditory impairment on F0. The six IHL included in the van den Dikkenberg-Pot et al. and Clement studies were considerably younger (2.5 to 18 months of age) than the eight IHL in the present sample. Furthermore, infants in the van den Dikkenberg-Pot et al. and Clement investigations were matched for age (but not vocal development), whereas infants in the present study were matched for vocal development (and age differences were statistically adjusted). van den Dikkenberg-Pot et al. and Clement followed the infants at much closer intervals and over a longer period of time than in the present study. Despite all of these methodological differences, similar results were obtained. Our overall conclusion is that, although F0 may be somewhat higher in IHL than in TDI, these differences may not be robust enough to show up at the group level in small sample studies. Larger sample sizes with infants of different ages are needed to determine the patterns more conclusively.
It is of interest that one IHL in the present study had unusually high F0 values of 529, 804, and 796 Hz across the three sessions, respectively. The existence of a particularly high-pitched infant vocalizer in the sample is consonant with prior observations from our own experience in deaf education (as well as from reports of other researchers, e.g. Clement, 2004), suggesting that occasional deaf speakers have especially high F0. When F0 averages for the IHL group were calculated without this infant, they were 347, 373, and 332 Hz, respectively and, without age adjustment, were much more comparable to TDI values (see Figure 2).
The presence of this outlier infant suggested the possibility that the negative correlation between age and F0 upon which the ANCOVA adjustment depended might have been due entirely to this one infant. If he were the youngest of the IHL group, a misleading correlation would have been a distinct possibility. However, in fact, he was not the youngest, and the negative correlation of age and F0 in the IHL group when he was left out was actually higher (−.64) than when he was included (−.28). When the F0 values for the groups with age adjustment were computed by the ANCOVA method with the unusual infant left out, the mean F0 differences were attenuated. However, even with exclusion of the outlier, statistically significant effects were obtained for Group for F0 variability. It should be cautioned that the group effect is not formally testable for statistical reliability because ANCOVA is not robust with small, unequal sample sizes. Tentatively, we are inclined to support the conclusion that although hearing loss early in life may, in general, produce higher variability in F0 (and perhaps higher mean F0), the effect may be much stronger in some individuals than in the group as a whole. This is a speculation that can be confirmed only through research with larger sample sizes.
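The logic of this check, recomputing the age-F0 correlation with one infant left out, can be sketched as follows. The ages and F0 values are hypothetical and were chosen only so that one infant, who is not the youngest, has an unusually high F0; they are not the study's data, and the helper names are assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_without(xs, ys, index):
    """Correlation recomputed with the infant at `index` left out."""
    kept_x = [x for i, x in enumerate(xs) if i != index]
    kept_y = [y for i, y in enumerate(ys) if i != index]
    return pearson_r(kept_x, kept_y)

# Hypothetical ages (months) and mean F0 (Hz); the infant at index 2
# has an unusually high F0 but is not the youngest in the group.
ages = [10, 15, 20, 25, 30, 35, 40, 45]
f0 = [420, 400, 800, 370, 360, 340, 330, 310]

r_all = pearson_r(ages, f0)
r_loo = r_without(ages, f0, index=2)
print(f"r with all infants: {r_all:.2f}; r without the outlier: {r_loo:.2f}")
```

With these invented values the correlation is negative in both cases and becomes stronger once the high-F0 infant is removed, which is the same qualitative pattern as described above.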
Of course, with only eight infants in the sample for each group, individual patterns can play a large role in outcomes, and thus generalization to the groups of all IHL or TDI is very speculative. It may be relevant that the two youngest infants in the IHL group were identified with a severe-to-profound hearing loss and aided within the first 3 months of life, whereas the other six IHL were identified with a profound hearing loss between 13–21 months of age and were aided between 14–23 months of age. Given the considerable differences in ages of identification and amplification, it is possible that these two infants differed in F0 performance from their older peers. Indeed, when the ANCOVA was conducted without these two infants (see footnote 3), a statistically significant difference in mean F0 was obtained between the two groups, a difference that was only marginally significant if all eight IHL were included in the analysis. Results for inter-syllable and intra-syllable standard deviations mimicked those for the entire dataset, i.e. IHL were more variable than TDI on these measures, but significant results for coefficient of variation variability measures were not obtained for the main effect of Group. These Group effects were not formally testable for statistical reliability after the two infants were dropped, due to small and unequal sample sizes. The point of the illustration is that, while the results of the present analysis indicate suggestive trends, they are clearly limited in generalizability due to small sample size.
There are some additional provisos to the present results. Both spontaneous and imitative utterances were included in the present analysis. Inclusion of imitative utterances may have influenced the measurement of F0 because some investigators have noted that prosodic features of the adult target are one of the first features to be imitated by infants (Lewis, 1936; Kessen, Levine, and Wendrich, 1979; Papoušek and Papoušek, 1989). The extent of direct imitation of prosody by infants is, however, unknown because of confounding variables, e.g. lack of measurement of overall pitch in non-imitative circumstances (Papoušek and Papoušek, 1989; Siegel, Cooper, Morgan, and Brenneise-Sarshad, 1990; Masataka, 1992). Furthermore, as was noted in the present study, vocalizations that might be construed as imitations are often separated in time or interrupted by other vocalizations or noise from the prior adult vocalization. In addition, there appears to be reciprocal imitation across the adult-infant dyad, making it difficult to determine, in any particular case, who was imitating whom. The possibility of imitative utterances influencing F0, however, cannot be ruled out, especially for IHL, because they were older than the TDI and therefore may have been more capable of imitation. Furthermore, six IHL were recorded in the context of treatment sessions with a speech-language pathologist and were encouraged to imitate her vocalizations.
No systematic changes with level of vocal development (the session variable) for mean F0 or F0 variability were observed for either group of infants. This finding is in accord with other available research on this topic (e.g. Robb et al., 1989). As Amano et al. (2006) noted, two or more years of longitudinal observation may be necessary before significant changes in F0 can be detected. When changes in F0 at smaller intervals are observed in studies with small sample sizes (e.g. Kent and Murray, 1982), they may merely reflect day-to-day fluctuations in F0 that have been noted in TDI at younger ages rather than true age differences (Oller, Iyer, Buder, Kwon, Chorna, and Conway, 2007). At the same time, it is clear that F0 does decrease with age (the negative correlations of age with F0 for IHL in the present study appear to be one indicator of this fact), and it may merely require large sample sizes (large enough to swamp the effects of day-to-day variation) to observe such effects with statistical reliability early in life or during relatively narrow age intervals.
It is also possible that our results were influenced by our choice of F0 analysis measures. In the present study, mean F0 and standard deviation of F0 values were used to capture average F0 and variability respectively. Other investigations have used different measures. For example, Clement (2004) used median F0 and range to investigate the fundamental frequency of TDI and infants with hearing loss. It is possible that measures other than mean F0 and standard deviation might yield different results given that mean F0 is influenced by extreme values. However, Clement noted that results when using mean F0 values were almost identical to those obtained when using median F0. Thus, it is unlikely that our choice of measures used for analysis of F0 significantly impacted the obtained findings.
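The sensitivity of the mean to extreme values, as opposed to the median, can be seen in a small sketch. The syllable F0 values are hypothetical and include one very high value; they do not come from the present data set.

```python
from statistics import mean, median

# Hypothetical syllable F0 values (Hz) with a single extreme value.
f0_values = [340, 350, 360, 370, 800]

mean_f0 = mean(f0_values)      # pulled upward by the 800 Hz syllable
median_f0 = median(f0_values)  # unaffected by how extreme that syllable is

print(f"mean = {mean_f0} Hz, median = {median_f0} Hz")
```

When extreme syllables are rare the two measures tend to agree closely, which is consistent with the near-identical results Clement reported for mean and median F0.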
Given the individual variability and small sample sizes in the present study, future investigation with larger sample sizes is clearly indicated. Individual infants in the IHL group differed considerably in ages at each of the three recording sessions. The range of ages for IHL in the pre-canonical sessions was 8–43 months. Thus, age adjustments for the ANCOVA analyses were based on values from very small numbers of infants at each age. Future analyses with larger sample sizes to obtain more precise age adjustments are clearly desirable.
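To make the idea of an age adjustment concrete, the following is a simplified sketch of the classical covariate adjustment of group means, using a pooled within-group slope of F0 on age. It is not the ANCOVA model fitted in the present study, and all ages, F0 values, and function names are assumptions made for illustration.

```python
from statistics import mean

def pooled_within_slope(groups):
    """Pooled within-group regression slope of F0 on age.

    `groups` maps a group label to a list of (age, f0) pairs.
    """
    sxy = 0.0
    sxx = 0.0
    for pairs in groups.values():
        mean_age = mean(age for age, _ in pairs)
        mean_f0 = mean(f0 for _, f0 in pairs)
        sxy += sum((age - mean_age) * (f0 - mean_f0) for age, f0 in pairs)
        sxx += sum((age - mean_age) ** 2 for age, _ in pairs)
    return sxy / sxx

def adjusted_means(groups):
    """Group mean F0 projected to the grand mean age of all infants."""
    slope = pooled_within_slope(groups)
    grand_age = mean(age for pairs in groups.values() for age, _ in pairs)
    adjusted = {}
    for label, pairs in groups.items():
        group_age = mean(age for age, _ in pairs)
        group_f0 = mean(f0 for _, f0 in pairs)
        adjusted[label] = group_f0 - slope * (group_age - grand_age)
    return adjusted

# Hypothetical (age in months, mean F0 in Hz) pairs; the IHL are older.
data = {
    "TDI": [(2.0, 400.0), (4.0, 390.0), (6.0, 380.0), (8.0, 370.0)],
    "IHL": [(12.0, 380.0), (14.0, 370.0), (16.0, 360.0), (18.0, 350.0)],
}

raw = {label: mean(f0 for _, f0 in pairs) for label, pairs in data.items()}
adj = adjusted_means(data)
print("raw means:", raw)
print("age-adjusted means:", adj)
```

In this invented example the older group has the lower raw mean F0 but the higher age-adjusted mean, because F0 falls with age within both groups. The adjustment depends entirely on the estimated slope, so with only a few infants at each age the projected means can shift substantially.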
Nearly half of the data in the present study had to be discarded because of aperiodicity in the signal, either because syllables were entirely aperiodic (e.g. voiceless sounds) or because they failed to satisfy our criterion of having 80% or greater measurable F0 intervals. Because infants are known to use non-modal phonation on a frequent basis, resulting in the presence of a variety of non-adultlike phonation types (termed ‘vibratory regimes’ by Buder, Chorna, Oller, and Robinson, 2008), this 80% criterion may be viewed as especially restrictive in isolating the syllable set used for F0 analysis. Further studies need to evaluate whether expanding the criteria to accommodate syllables with non-modal phonation for analysis leads to differences in results.
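The inclusion criterion can be expressed as a simple proportion check. The sketch below assumes, hypothetically, that each syllable is represented as a list of per-interval F0 estimates with `None` marking intervals where F0 could not be measured; this representation and the function name are illustrative and do not describe the software used in the study.

```python
def meets_f0_criterion(f0_track, threshold=0.80):
    """Return True if at least `threshold` of the intervals have measurable F0.

    `f0_track` is a list of F0 estimates in Hz, with None for intervals
    in which F0 could not be measured (e.g. aperiodic portions).
    """
    if not f0_track:
        return False  # an empty or entirely unanalyzed syllable is excluded
    measurable = sum(1 for value in f0_track if value is not None)
    return measurable / len(f0_track) >= threshold

# Hypothetical syllables: 4 of 5 intervals measurable, then 3 of 5.
mostly_periodic = [350.0, 360.0, None, 355.0, 352.0]
partly_aperiodic = [350.0, None, None, 355.0, 352.0]

print(meets_f0_criterion(mostly_periodic))   # 80% measurable
print(meets_f0_criterion(partly_aperiodic))  # 60% measurable
```

Lowering `threshold` in such a check is one way a future analysis could admit more syllables with non-modal phonation and test whether the results change.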
Finally, in the present study, infants were matched only on the basis of pre-linguistic vocal development; age was statistically adjusted. With the widespread availability of newborn hearing screening, longitudinal data on IHL during the first year of life should become easier to acquire, and it may then be possible to match infants on the basis of both age and level of pre-linguistic vocal development.
A key question inspiring this work concerns how and why F0 is developed and managed in a circumstance of very limited audition. This round of results is insufficient to answer the key question, but it seems fitting to offer tentative thoughts that may help clarify the kinds of issues that might be illuminated in the future. Our prosaic speculation is that IHL show F0 differences from TDI simply because IHL have trouble with auditory monitoring of vocal activity: they seek to replicate speech as they perceive it, but what they perceive (both from others’ voices and from their own) is dramatically attenuated and is usually filtered and/or distorted in substantial ways. It may be that IHL have only a relatively vague awareness of the appropriate mean F0 for speech-like vocalization. Consequently, their F0 values may vary more widely around the appropriate level than in the case of TDI.
The possibility that mean F0 is indeed higher in IHL needs to be considered given the accumulation of trends across studies, even though differences have only been marginally reliable statistically. To explain the apparently higher F0, a different line of thinking appears to be necessary. Two possibilities (among many) come to mind. First, it could simply be that the vocal mechanism is naturally biased in the direction of higher F0 because the high range is less restricted physically, a bias that hearing individuals correct during early development on the basis of auditory feedback. Second, it could be that the tactile and kinesthetic feedback provided by a high F0 is somehow more evident to IHL than that provided by a low F0.
These ideas may help explain group differences, but would be unable to account for the outlier phenomenon (the idea that individual IHL may tend to show especially high F0, out of the range of TDI) that has been observed here and reported in prior experiences in clinical settings and in other research on individuals with deafness. To explain the outlier phenomenon, assuming it is real, it may be necessary to invoke elements of motor development theory (see e.g. Thelen, 1995) that postulate self-organization of stable states of action based on motoric exploration in early life. According to this sort of model, it might be argued that individual infants (through somewhat random exploration) might fall into a stable state of very high F0 vocalization early in life, and then fail to modify that state because auditory feedback proves insufficient to induce the change that would naturally be induced in hearing infants. Of course, this sort of self-organizational model (in the absence of additional constraints) should also predict that some individual IHL might fall into especially low F0 stable states in their explorations. This is an empirical question but, if it is true that F0 is generally higher in IHL than TDI, then some sort of initial biasing toward higher F0 in production would seem necessary in a workable model of the self-organizational style.
The research here is basic, and we see no foundation for new clinical advice as yet. However, consistent with common clinical wisdom, we suspect that the keys to management of F0 control in deafness are similar to the presumed keys in many other domains of potential disordered communication: early diagnosis and intervention. The self-organizational perspective suggesting strong effects of early motoric exploration on the development of stable states of action (in this case in vocalization) suggests indeed that early learning is the learning that has the greatest effects. Development of F0 control may provide a particular case in point, and modification of its trajectory of development may require early treatment.
When matched for level of vocal development, results showed no significant difference in F0 mean or F0 variability between the infants with hearing loss and typically developing infants. When values were adjusted to account for different ages of the two groups, the IHL were projected to have significantly more variable F0 than TDI, and marginally higher mean F0. The data suggest that significant loss of audition appears to influence the development of F0, even at pre-linguistic levels of vocal development, and may be especially influential in individual IHL.
This work has been supported by grants from the National Institute on Deafness and Other Communication Disorders (R01DC006099 to D. K. Oller, PI, and Eugene Buder, Co-PI, with a sub-contract to the University of Georgia, Suneeti Nathani Iyer, PI) and by prior grants R01DC00484 and R01DC01932 to D. K. Oller, PI. Portions of this work were presented at the 2002 and 2006 American Speech-Language-Hearing Association Conventions in Atlanta, GA, and Miami, FL, respectively. We acknowledge the invaluable assistance of Alberto Centeno-Pulido, Amy Fleming, Jennifer Giesmann, Brooke Gincavage, Erin Reinstein, Mary Staten, Jessica Weed, and Melanie Yahner in the development of pilot procedures and data analysis, and Jun Ye and J. J. Bau in statistical analysis.
1. Suitability of syllables for F0 analysis and F0 measurements were ascertained by eight trained listeners, who were all graduate or undergraduate students in speech-language pathology or linguistics. During training, the first author reviewed procedures for F0 measurements with the listeners. Each listener then independently conducted F0 measurements on 50 randomly selected syllables of the available 4342 syllables. If 80% agreement between a listener and the first author was reached on F0 measurements for these 50 syllables, then the listener was considered trained. All listeners achieved 80% agreement on their first round of training. The first author also verified syllable designations and F0 measurements on a randomly selected sub-set (20%) of the actual syllables in each session measured by a listener. If 80% agreement was reached on this portion of the data, F0 values were included. If 80% agreement was not reached for a session, then measurements for all the syllables in that session were repeated by the listener until 80% agreement was achieved.
2. Two software programs, Computerized Speech Lab (CSL, 2001) and CSpeech, were initially considered for F0 extraction. A preliminary analysis of 25 syllables selected from the data to meet the requirements of duration (50 ms or greater) and absence of extraneous noise revealed that, whereas 74% of F0 values for these 25 syllables were accurately reported by CSpeech (as verified by visual inspection), only 60% of F0 values were accurately reported by CSL. The main discrepancy between CSL and CSpeech values occurred at frequencies over 450 Hz, at which point CSL reported F0 at approximately half of the correct value. Given these results, CSpeech (and TF32, which is the Windows version of CSpeech) was selected for analysis of F0 in the present study.
3. We thank an anonymous reviewer for this suggestion for additional analysis.
Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.