This study builds upon a growing body of literature demonstrating structured IVR clinical interviews produce reliable and valid measures of depression severity (e.g., Kobak et al., 2000; Rush et al., 2005). The results also contribute to the emerging research relating vocal acoustics metrics and depression (e.g., Alpert et al., 2002), and are consistent with previous results regarding the effect of depression on speaking rate, speech pauses, and pitch variability (Cannizzaro et al., 2004). The data obtained and analyzed in this study also clearly establish the feasibility of automating procedures for speech collection using telephone transmission lines to obtain reliable and valid voice acoustic measures related to depression severity and clinical response to treatment. Methodological constraints on task implementation were also identified.
Not surprisingly, the variety of telephone devices that study subjects used to transmit speech data from outside the office influenced the quality and reliability of the voice acoustics data, as well as the variability and sensitivity of the acoustical measures obtained. There was no evidence that the depression measures obtained from touch-tone responses to clinical questions were influenced by the calling location. Future research on voice acoustic measures obtained over the telephone must therefore carefully choose measures and metrics that can be reliably captured by the telephone system used.
The study results obtained are consistent with previous research (Cannizzaro et al., 2004). Measures of pitch variability and speech production were significantly influenced by depression severity and clinical response to treatment. The results also suggest additional avenues of inquiry that should be addressed by future research.
This study found the COV about F2 to correlate significantly with overall depression severity, but the measure had little reliability across speech samples recorded within the same call. The COV about F0 showed better internal consistency across speech samples obtained within a given telephone call, but did not correlate with overall depression severity. Within-subject change in the F0 COV over time, however, was correlated with clinical change in response to treatment. More research is needed to disentangle pitch variability in speech, automated data extraction methods, telephone instrumentation and transmission bandwidths, and their relationship to depression.
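As an illustration of the pitch variability measures discussed above, the following minimal sketch assumes that COV denotes the coefficient of variation (standard deviation divided by the mean) of frame-level frequency estimates. The function names, the use of `None` for frames without a pitch estimate, and the choice of the sample standard deviation are assumptions made for this example, not the extraction procedure used in the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(frequencies_hz):
    """Return SD / mean for a sequence of frequency estimates (e.g., F0 in Hz).

    Assumption: frames without an estimate are passed as None and skipped.
    """
    values = [f for f in frequencies_hz if f is not None]
    if len(values) < 2:
        raise ValueError("need at least two frequency estimates")
    return stdev(values) / mean(values)


def cov_change(baseline_hz, followup_hz):
    """Within-subject change in COV between two assessments (follow-up minus baseline)."""
    return coefficient_of_variation(followup_hz) - coefficient_of_variation(baseline_hz)
```

Under these assumptions, `cov_change` is the kind of within-subject quantity that could be related to clinical change over time, while `coefficient_of_variation` computed separately for each speech sample in a call could be used to examine consistency within that call.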
The relationships between depression and time-based measures of speech production, such as total speaking time and the interplay between vocalization and pausing, are more definitive and readily interpreted. Data from this study indicate that more depressed patients take longer to express themselves. They don’t vocalize more; rather they speak with greater hesitancy, producing more cumulative and variable pauses. Consequently, voice acoustic measures such as the percent pause time, vocalization/pause ratio, and speaking rates reflect depression severity. Not surprisingly, these measures also dominate the acoustic measures most sensitive to clinical change. The significantly reduced recording duration of treatment responders was due to fewer pauses, less total pause time, and faster speaking rates than observed in nonresponders. The change in the total amount of vocalizations produced before and after treatment did not differ between treatment responders and nonresponders. The responders were more efficient with their verbal communication, completing the speech tasks required with less hesitancy.
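The time-based measures named above can be made concrete with a short sketch. It assumes that a recording has already been segmented into voiced intervals, that a pause is the silent gap between consecutive voiced intervals, and that pause variability is the standard deviation of pause durations. The function name, the input format, and these definitions are hypothetical choices for illustration and may differ from the definitions used in the study.

```python
from statistics import pstdev


def pause_measures(voiced_segments):
    """Compute timing measures from sorted, non-overlapping (start, end) voiced
    intervals in seconds. Silence before the first and after the last interval
    is ignored (an assumption of this sketch).
    """
    vocal_time = sum(end - start for start, end in voiced_segments)
    # Each pause is the gap between the end of one interval and the start of the next.
    pauses = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(voiced_segments, voiced_segments[1:])
    ]
    pause_time = sum(pauses)
    total_time = vocal_time + pause_time
    return {
        "total_time": total_time,
        "vocalization_time": vocal_time,
        "pause_count": len(pauses),
        "pause_time": pause_time,
        "percent_pause_time": 100.0 * pause_time / total_time if total_time else 0.0,
        "vocalization_pause_ratio": vocal_time / pause_time if pause_time else float("inf"),
        "pause_variability": pstdev(pauses) if pauses else 0.0,
    }
```

With these definitions, two recordings with the same vocalization time can differ in total time only through the number and length of pauses, which is the pattern described above for treatment responders and nonresponders.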
Interestingly, these data suggest that total pause time observed during completion of “automatic” speech tasks such as counting, reciting the alphabet, or reading may correlate better with depression severity than total pause time observed during production of extemporaneous “free” speech. Conversely, measures of pause variability and the vocalization/pause ratio observed during free speech may correlate better with depression severity than these measures obtained during automatic speech production. Such patterns, if replicated in future research, would be interesting and important both theoretically and clinically. Summative measures of the total number and cumulative duration of pauses while speaking may reflect a general level of psychomotor activation/retardation. During performance of highly automated speech tasks with minimal cognitive demand, the effects of depression on psychomotor retardation may be most evident in these measures and tasks. Generation of extemporaneous speech places greater cognitive demand on the speaker, requiring selection and preparation of the verbal response as well as articulatory execution. Greater pause variability and the relationship between vocalization and pausing behaviors may reflect the effects of depression on cognitive slowing, showing better association in these measures when completing free speech tasks. Such speculation requires further research and replication, but is consistent with the data obtained in this study and patterns previously observed in persons with schizophrenia (Cannizzaro, Cohen, Reppard, & Snyder, 2005).
This study has several limitations that caution against overinterpretation. First, the naturalistic observational study design limits generalization of these results to those that might be expected in a placebo-controlled clinical trial. Second, while larger than previous studies investigating telephone methods for obtaining vocal acoustic measures, and the first to include longitudinal assessments, the sample was limited to 35 patients and only six weeks of follow-up. Third, this study involved a homogeneous sample of primarily Caucasian patients recruited from a regional HMO. Whether these results would replicate in a longer study with a larger and more diverse patient population remains to be investigated.
Such limitations notwithstanding, this pilot study clearly establishes the feasibility of voice acoustic measures as potential depression severity biomarkers and indicators of treatment response. It also establishes the feasibility of using computer-automated speech solicitation techniques to obtain meaningful and analyzable data over standardized telephone interfaces. Voice acoustic measures associated with pitch variability and motoric articulation were significantly related to cross-sectional differences in depression severity between patients, as well as within-patient clinical change between treatment responders and nonresponders. These results replicate and extend prior research and identify needs for further research.