We sought to determine the feasibility of using human speech samples collected and recorded over the telephone for voice acoustical research. The results should be considered preliminary, owing to the limitations of a two-subject descriptive design, the small sample of recordings, and the use of only a single healthy male and a single healthy female. Still, these initial findings demonstrate that reliable and accurate measurements can be made whether the acoustical signal is collected over the telephone line or under more ideal laboratory conditions. This was true for measures of speaking rate, pitch variability, pitch range, and VOT, all of which are considered useful metrics in the study of affective and dysarthric speech and voice profiles.
The current study is based on an earlier comparative study conducted by our laboratory, in which international telephone calls were remotely and locally recorded as the subject performed vocal and speech exercises similar to those used in this study (Cannizzaro & Snyder, 2003). Recordings were made from three locations in Western Europe and yielded promising results. While this initial foray served as the inspiration for the current study, we sought to remove certain design flaws, such as inconsistencies in telephones and the modes of transmission (e.g., satellite, differing phone companies), by employing a single telephone with known transmission characteristics in a more controlled setting.
The measurement accuracy of the two time-dependent measures of interest, speaking rate and voice onset time, is reliant on accurate representation and visualization of the speech signal as a complex waveform and as a spectrogram. As long as visible landmarks, such as a plosive burst or the initiation of periodic voicing, are clearly apparent during the analysis, the measurement should be as accurate as the skill level of the person performing the analysis. In all of the comparisons made in this experiment, there were no difficulties in performing these analyses, as the signals of interest were, without exception, clearly discernible from the surrounding signal. This is readily apparent from the close agreement of the measurements we were able to make across a variety of tasks in both the RR and LR conditions. These findings are further supported by the high levels of inter-rater and intra-rater reliability found for both conditions across the tasks of interest.
Accurate quantification of the two frequency-related measures of interest, pitch variation and pitch range in Hertz, is somewhat more complicated given the known characteristics of the telephone. Telephone frequency response is generally reported to lie within the range of 300 Hz to 3000 Hz, so that the telephone acts essentially as a band-pass filter that rejects higher and lower frequencies (Kent & Read, 2002, p. 78). In an initial test of the IVR system, we found an accurate frequency portrayal of the telephone transmission from 250 Hz to approximately 3200 Hz, with a steep roll-off in frequency response outside of these limits (Cannizzaro & Snyder, 2003). Given these qualities, the measurement of frequencies beyond these boundaries, which for this study means frequencies below 300 Hz, is dependent on the algorithms used in the analysis. For this reason, the periodicity-to-pitch autocorrelation function in Praat was used, as it does not depend on the actual presence of the fundamental frequency in the signal to make this measurement. Autocorrelation uses the repeat length of the waveform to determine the fundamental period (P. Boersma, personal communication, November 12, 2003). That is, a complex periodic wave with equally spaced harmonics repeats at the rate of the lowest common frequency of those harmonics. Since the voice fundamental frequency is generated by the periodic vibration of the vocal folds, and each harmonic is an integer multiple of that fundamental, the lowest common repeat pattern occurs at the same rate as the fundamental frequency of the voice, even when the fundamental itself has been filtered out. As evidenced by our findings, even a wide range of frequency excursion can be accurately assessed in telephone recordings.
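The repeat-length argument can be illustrated with a minimal sketch. It is not Praat's implementation, which adds refinements such as window correction, interpolation, and path finding; it is only a bare autocorrelation peak search. The signal is synthetic, and the function names, the 8000 Hz sampling rate, the 125 Hz fundamental, and the idealized 300 Hz to 3000 Hz passband are all assumptions chosen for illustration.

```python
import numpy as np


def synth_band_limited_voice(f0, sample_rate, n_samples, low_cut=300.0, high_cut=3000.0):
    """Sum equal-amplitude harmonics of f0, keeping only those inside the passband.

    Hypothetical stand-in for a telephone-filtered voice: harmonics below
    low_cut (including the fundamental itself) are simply left out.
    """
    t = np.arange(n_samples) / sample_rate
    harmonics = [k * f0 for k in range(1, int(high_cut // f0) + 1) if k * f0 >= low_cut]
    return sum(np.cos(2 * np.pi * h * t) for h in harmonics)


def estimate_f0_autocorr(signal, sample_rate, f0_min=75.0, f0_max=600.0):
    """Estimate F0 from the lag of the strongest autocorrelation peak.

    Only lags corresponding to f0_min..f0_max are searched (assumed limits).
    """
    x = signal - np.mean(signal)
    # Full autocorrelation; keep the non-negative lags only.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag


sample_rate = 8000
n_samples = 1024
voice = synth_band_limited_voice(125.0, sample_rate, n_samples)

# Confirm that the synthetic signal carries essentially no energy at 125 Hz.
spectrum = np.abs(np.fft.rfft(voice))
bin_125 = round(125.0 * n_samples / sample_rate)
print("relative magnitude at 125 Hz:", spectrum[bin_125] / spectrum.max())

# The waveform still repeats every 1/125 s, so the autocorrelation finds it.
print("estimated F0 (Hz):", estimate_f0_autocorr(voice, sample_rate))
```

In this sketch the lowest component present is the third harmonic at 375 Hz, yet the waveform still repeats once per fundamental period, and that repeat is what the lag search picks up.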
While our initial findings are quite promising regarding the use of the telephone to collect acoustic data, some cautions should be noted. We chose our measures carefully based on our initial testing of the IVR system with speech and noise signals. No measures of intensity were performed, owing to the difficulty of accurately calibrating and equating the decibel levels of the mouth-to-telephone and mouth-to-recorder paths. At present, this leaves out a number of measures that may hold potential value for a future clinical trial using voice acoustic data. In addition, the clarity of our voice recordings was consistent because only one telephone was used in this experiment. Lower-quality telephones and other modes of transmission (e.g., cellular or satellite) would need to be explored before they could be deemed suitable for use in large-scale data collection.
Overall, we were encouraged by these early findings. Future investigations of larger groups of subjects with suspected and diagnosed CNS disorders, as well as healthy controls, will ultimately help to establish the feasibility of telephone recording of acoustic data. Similar designs that utilize simultaneous LR and RR recordings in future studies can only serve to further our understanding of how to employ these techniques in large-scale randomized clinical trials.