During normal conversation, emotion is conveyed largely by modulation of tone and rhythm of voice. Decoding of such information in real time, therefore, is critical for normal social interaction. Rules for decoding of emotion are either learned implicitly28 or may reflect an innate ability17 because individuals are never taught exact rules for emotion identification.28
As shown in this and previous studies,16,29 identification of intended emotions by voice alone is an inexact science, with most individuals correctly identifying the intended emotion only 50%–60% of the time. Given the complexity of the task and its dependence upon basic perceptual abilities, it is axiomatic that deficits in basic sensory abilities in schizophrenia, such as the ability to detect tonal patterns over time, would produce deficits in acoustic emotion identification ability. Nevertheless, the sensitivity of patients with schizophrenia to the specific features of speech that are used to convey emotional intent has not previously been evaluated.
In the present study, performance of patients was compared with that of controls as a function of a range of cues, including those involving pitch (F0M), intensity (voiceintSD), and voice quality (HF500). Analyses took advantage of the fact that some stimuli within the battery were relatively good exemplars of the intended emotion, as reflected in higher levels of correct identification, whereas others were less easily identified. Thus, for each emotion, it was possible to analyze responses across a range of acoustic values. As expected,16 pitch cues were critical for detecting happiness, sadness, fear, and disgust in controls, with different pitch measures contributing differentially. In contrast, intensity and voice quality cues were critical for detection of anger. These results are consistent with prior findings12,16 showing similar cue × emotion interactions.
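The core logic of this analysis, binning stimuli by the level of an acoustic cue and comparing identification accuracy across bins, can be sketched in Python. All values and the helper name `accuracy_by_cue_level` below are hypothetical illustrations, not the study's data or code:

```python
import numpy as np

# Hypothetical data: per-stimulus pitch variability (F0SD, in Hz) and whether
# listeners correctly identified the intended emotion (1 = correct, 0 = not).
# Values are illustrative only, not taken from the study.
f0sd = np.array([12.0, 15.0, 22.0, 30.0, 41.0, 55.0, 60.0, 72.0])
correct = np.array([0, 0, 0, 1, 1, 1, 1, 1])

def accuracy_by_cue_level(cue, correct, n_bins=2):
    """Split stimuli into equal-count bins of the cue value and return the
    mean identification accuracy within each bin (low -> high)."""
    order = np.argsort(cue)
    bins = np.array_split(correct[order], n_bins)
    return [float(np.mean(b)) for b in bins]

acc = accuracy_by_cue_level(f0sd, correct)
# Ratio of high-bin to low-bin accuracy, i.e., the "fold" difference in
# identifiability as a function of the cue.
fold_change = acc[1] / acc[0] if acc[0] > 0 else np.inf
```

With these invented numbers, accuracy rises from 0.25 in the low-F0SD bin to 1.0 in the high bin, a 4-fold difference: the kind of cue-driven variation the group comparisons examine.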
This study demonstrates for the first time that, when compared with controls, patients’ performance did not vary to the same degree as a function of the relative presence or absence of pitch cues but that such variability was roughly equivalent for intensity and spectral cues. This finding suggests that patients were less able than their healthy counterparts to utilize pitch-based cues to identify emotion, whereas their ability to utilize intensity cues was relatively intact. For controls, stimuli with the highest levels of pitch variability were 4-fold more identifiable than those with more moderate levels. For patients, the difference was only 2-fold, leading to a large effect-size (d = 1.34) deficit in identifying happiness only for those stimuli with the highest (ie, most happy) F0SD.
Similarly, low levels of F0SD serve as a primary cue for fear. For controls, stimuli with low F0SD levels are 5-fold more identifiable than those with more moderate levels. Here, too, patients did not show this variation in the accuracy of their performance in identifying intended fear as a function of F0SD, suggesting an inability to take advantage of low, as well as high, levels of this cue when relevant. Patients also did not show variability in their responses to sadness or fear based upon pitch slope (F0contour) or disgust based upon mean pitch (F0M), suggesting an inability to utilize these pitch-based cues as well. In the MDS analysis, distances between emotions were significantly influenced by pitch-based measures for controls but not for patients, suggesting that the differential pattern of misidentification seen in patients vs controls also relates to a relative inability to utilize pitch cues in discriminating emotions.
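The MDS step described above, placing emotions in a low-dimensional space according to their pairwise confusability and correlating the resulting distances with acoustic cue differences, can be sketched roughly as follows, assuming scikit-learn and SciPy. The dissimilarity matrix and F0M values are invented for illustration and do not reproduce the study's analysis:

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.stats import pearsonr

emotions = ["happy", "sad", "fear", "anger", "disgust"]

# Hypothetical symmetric dissimilarity matrix derived from confusion rates
# (larger values = emotions less often confused). Illustrative values only.
D = np.array([
    [0.0, 0.9, 0.8, 0.7, 0.6],
    [0.9, 0.0, 0.3, 0.8, 0.7],
    [0.8, 0.3, 0.0, 0.9, 0.8],
    [0.7, 0.8, 0.9, 0.0, 0.4],
    [0.6, 0.7, 0.8, 0.4, 0.0],
])

# Metric MDS on the precomputed dissimilarities.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)

# Correlate modeled inter-emotion distances with pairwise differences in a
# cue (hypothetical mean F0 per emotion, in Hz).
f0m = np.array([220.0, 160.0, 240.0, 180.0, 170.0])
iu = np.triu_indices(len(emotions), k=1)
model_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)[iu]
cue_diff = np.abs(f0m[:, None] - f0m[None, :])[iu]
r, p = pearsonr(model_dist, cue_diff)
```

A significant `r` for one group but not the other would parallel the group difference reported in the text.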
In contrast to their inability to utilize pitch cues, patients did appear to utilize intensity cues, such as voiceintSD and voiceintM, to detect anger equivalently to controls. In this case, patients showed increased, rather than decreased, variation in accuracy of detection of anger as compared with controls, leading patients to incorrectly reject anger as the intended emotion for those exemplars with the lowest levels of voiceintM. Further, for patients, MDS distances correlated significantly with the intensity measures voiceintSD and ATTACK, whereas such correlations were absent in controls, suggesting that patients overutilize voice intensity and other secondary cues in discriminating emotion, possibly as compensation for a fundamental deficit in the ability to utilize pitch-based cues.
Finally, patients showed some ability to modulate response based upon voice quality cues such as HF500, although the exact pattern of use differed somewhat between groups. In the MDS analysis, both controls and schizophrenia patients appeared to use F1BW as a principal cue in differentiating emotions. F1BW, although not strongly predictive of any single emotion, nevertheless is thought to convey mood/attitude information that is superordinate to emotion, such as the degree to which an individual is relaxed vs stressed.30 For controls, the relaxed/stressed distinction is the voice feature that is most readily perceived.30 Although patients were able to utilize F1BW to differentiate emotions associated with high power (eg, anger, disgust) relative to those associated with low power (eg, sadness), they nevertheless showed reduced spacing of emotions along this dimension relative to controls.
As in prior studies of emotional identification, deficits in performance correlated significantly both with more basic deficits in pitch perception, such as tone matching and DTT score, and with deficits in global outcome, as reflected by the ILS-PB. Correlations with tone matching and DTT underscore the importance of treatment strategies aimed at reversing social communicatory disturbance, as well as the importance of correcting underlying deficits in pitch processing. In contrast, correlations with ILS-PB, as noted previously,2,31 underscore the relationship between poor acoustic emotion identification ability and poor functional outcome. Personality researchers have suggested that individuals with higher degrees of empathic ability and greater degrees of “social connectedness” perceive vocal emotional cues better than individuals who have lower degrees of these traits.32 The present findings suggest that the inverse may also be true and that one's basic perceptual abilities may determine in large part one's social experience.
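As a minimal sketch of this kind of correlational analysis, assuming SciPy and entirely hypothetical per-patient scores (the names `tone_match`, `emotion_id`, and `outcome`, a stand-in for the ILS-PB, are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-patient scores (illustrative values, not study data):
# tone-matching accuracy, emotion identification accuracy, and a
# functional-outcome score standing in for the ILS-PB.
tone_match = np.array([0.55, 0.60, 0.70, 0.75, 0.80, 0.90])
emotion_id = np.array([0.30, 0.35, 0.45, 0.50, 0.55, 0.65])
outcome    = np.array([20.0, 22.0, 27.0, 30.0, 33.0, 38.0])

# Rank correlations: basic pitch perception vs emotion identification,
# and emotion identification vs functional outcome.
rho_basic, p_basic = spearmanr(tone_match, emotion_id)
rho_outcome, p_outcome = spearmanr(emotion_id, outcome)
```

With these perfectly monotone invented values both rank correlations equal 1.0; real data would of course be noisier, and the choice of rank vs Pearson correlation would depend on score distributions.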
At present, relatively little is known about the development of pitch perceptual abilities over the course of schizophrenia. Pitch detection deficits have been demonstrated consistently in chronic patients with schizophrenia, along with impaired generation of early auditory event–related potentials such as mismatch negativity (MMN), which reflects preattentive detection of stimulus deviance. Pitch processing was also studied in one sample of 15 first-episode (FE) patients, of whom 9 were felt to be stabilized, while 6 were persistently symptomatic. As a group, FE patients showed a moderate effect-size (d = 0.58) deficit in tone matching. Further, approximately one-third of FE patients showed tone-matching thresholds outside the control range (>10% difference in pitch), leading to a significant difference vs age-matched controls.
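The two statistics in this paragraph, a pooled-SD Cohen's d and the fraction of patients beyond a fixed threshold, can be computed as in the sketch below. The `cohens_d` helper and all threshold values are hypothetical, not the study's data:

```python
import numpy as np

# Hypothetical tone-matching thresholds (% pitch difference needed to
# discriminate two tones); illustrative values only, not the study's data.
controls = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 4.0, 3.0, 5.0])
patients = np.array([4.0, 6.0, 8.0, 12.0, 15.0, 5.0, 11.0, 7.0])

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                      (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(b) - np.mean(a)) / pooled

d = cohens_d(controls, patients)

# Fraction of patients whose threshold falls beyond the cutoff mentioned in
# the text (>10% pitch difference).
impaired_fraction = float(np.mean(patients > 10.0))
```

Here the invented patient thresholds yield 3 of 8 patients (0.375) beyond the 10% cutoff and a positive d (patients need larger pitch differences than controls).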
Deficits in MMN generation have also been reported early in the course of schizophrenia, with onset of deficit within 1.5 years of first hospitalization.34–36
Deficits in MMN generation have not been observed in patients during first hospitalization overall,35–37
although the number of subjects remains small. Further, even among patients studied within the first hospitalization, a significant correlation has been observed between MMN amplitude and the education item of the Premorbid Adjustment Scale, such that those patients who failed to complete high school showed significantly reduced MMN at first onset relative to both age-matched controls and FE patients with college education.35
Prosodic detection has, at present, been evaluated in only one study of FE patients of which we are aware. In that study, patients were evaluated within 6 months of discharge following initial hospitalization. Notably, mean years of education for this group was 10.8, with less than 30% of patients having any degree of college education. FE psychotic patients as a group showed a moderate (d = 0.69) decrement in affective prosodic detection relative to age-matched controls.
As with the present study, there was no significant difference across emotions. Patients nevertheless showed apparently greater deficits in detection of fearful and sad vs angry stimuli, similar to the pattern observed in the present study. Thus, while comparison across studies is difficult due to differing cohort composition and differing definitions of the term “FE,” both tone-matching and prosodic detection deficits appear to be present early in the course of the illness and may be enriched among subjects with relatively low educational attainment. Effect sizes for both early auditory dysfunction and affective prosodic deficits appear to increase as a function of illness chronicity, although whether this reflects true deterioration within individual subjects, as opposed to distillation of poor-outcome subjects within chronic patient cohorts, remains to be determined. Overall, more study of both basic auditory processing ability and affective prosodic ability over the course of schizophrenia is required.
In the present study, patients also showed significantly reduced sensitivity to differential strength of emotional portrayals, in particular tending to overestimate intensity of intended weak emotions. As with categorization deficits, the failure to discriminate emotional intensity reflected an inability to utilize tonal, rather than intensity, information and correlated with reduced perceptual sensitivity. This is the first study to evaluate the ability of patients to discriminate auditory emotional intensity, along with emotion identity. As with deficits in emotion identification, the tendency to overestimate strength of weak emotional portrayals may also lead to significant misinterpretations in social communicatory situations.
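The intensity-discrimination pattern described here (overestimating weak portrayals, which compresses the gap between ratings of weak and strong exemplars) can be illustrated with a minimal sketch; every rating below is invented for illustration:

```python
import numpy as np

# Hypothetical 5-point intensity ratings of portrayals posed as "weak" vs
# "strong" (illustrative values only). Overestimation of weak portrayals
# shows up as a compressed weak-strong rating gap in the patient group.
weak_controls   = np.array([1.8, 2.0, 2.2, 1.9])
strong_controls = np.array([4.2, 4.4, 4.1, 4.3])
weak_patients   = np.array([3.0, 3.2, 2.9, 3.1])
strong_patients = np.array([4.1, 4.3, 4.0, 4.2])

# Mean rating gap between strong and weak portrayals, per group.
gap_controls = strong_controls.mean() - weak_controls.mean()
gap_patients = strong_patients.mean() - weak_patients.mean()

# How much higher patients rate weak portrayals than controls do.
overestimation = weak_patients.mean() - weak_controls.mean()
```

In this toy example, patients' gap (1.1 points) is less than half of controls' (about 2.3 points), driven almost entirely by inflated ratings of the weak portrayals.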
The present study represents an initial attempt to go beyond simple correlational analyses of emotion processing accuracy vs pitch measures (eg, tone matching) and to develop instead a taxonomy of affective dysprosodia in schizophrenia. Development of such a taxonomy is crucial not only for achieving greater understanding of the neurophysiological bases of acoustic prosodic dysfunction in schizophrenia but also for the development of appropriate remediation or compensation techniques. This ecological approach is akin to attempts in autism38,39 to link abnormal gaze toward facial features to facial affect deficits. Nevertheless, because this is the first study of this type, certain limitations must be acknowledged. First and foremost, because multivariate featural batteries of this type have not previously been used for between-group analyses, the statistical approach for between-group analyses was developed post hoc. Because of this, results must be considered exploratory and confirmed with additional stimulus batteries, additional patient samples, or both. Nevertheless, we feel that the observation that controls show strong variation in ability to identify intended emotion based upon pitch-based measures whereas patients do not strongly validates the analysis approach used.
Second, the study took advantage of an existing, naturalistic stimulus set rather than using synthetically constructed utterances with predetermined characteristics. As a result, numerous stimulus features were strongly intercorrelated, limiting the extent to which any single feature could be isolated. This battery has the advantage that its physical features have previously been published. Nevertheless, future synthetic stimulus development is needed in which specific parameters are modulated independently across a continuum of levels. This stimulus set further consisted of posed portrayals of emotional speech rather than prosody evoked from natural discourse. Though it can reasonably be argued that posed expressions must be relatively similar to naturally occurring expressions in order for communication to be successful, it may nevertheless be the case that posed expressions are exaggerated and more intense than authentic expressions and that the acoustic properties of posed and authentic stimuli may differ.12
Similarly, the ability to generalize performance from tasks involving posed portrayals to normal discourse may have its limitations. Nevertheless, because speakers may have complex emotions during normal discourse, obtaining objective validation of intended emotion may be impossible under naturalistic circumstances. All patients were also receiving antipsychotic medication at the time of testing, raising the possibility of a medication effect. However, no correlation was observed between performance and medication dosage.
In conclusion, disturbed emotion identification ability represents a key determinant of social cognition and functional outcome in schizophrenia. This is the first study to evaluate the contribution of specific underlying acoustic cues to emotion identification dysfunction, as well as to assess the perception of emotion intensity. The primary findings are that patients show intact ability to utilize intensity-based cues but reduced ability to identify emotions based upon critical pitch modulations. Such deficits contribute to disturbances not only in identification of intended emotions but also in discrimination between intended weak and strong emotional portrayals. Further, as in prior studies, deficits correlated highly with more basic deficits in auditory processing, as well as with global outcome measures. These findings indicate the need for cue-based, as well as emotion-based, assessment of prosodic dysfunction in schizophrenia and for ecological approaches to conceptualization and remediation of social communicatory impairments. Furthermore, a cue-based approach may provide a method to discriminate schizophrenia dysprosodia from the dysprosodias found in other illnesses such as Parkinsonism and autism.