In two experiments, we demonstrated that a human voice becomes affectively potent and acquires the capacity to influence behavior simply based on the content of the words spoken by that voice on a previous occasion. These studies demonstrate, for the first time, that a perceiver's (or listener's) responses to a particular talker can be influenced by the literal sound of that talker's voice based on the affective connotations of the words that talker produced in the past.
One possibility is that learning was enhanced by subtle affective cues present in the talkers' pronunciation of words during learning. In this view, when talkers spoke positive words, their vocal acoustics were positive in affective tone simply because they were speaking words with positive connotations; these tonal attributes served to enhance the degree of positivity of the spoken words. We cannot rule out this possibility because we did not collect affective ratings of the talkers' vocal acoustics prior to learning, so as not to bias perception of the voices during learning or test. Nevertheless, even if learning was enhanced by subtle affective cues present in the voices during learning, the words spoken during test were neutral in content and tone. As a result, we can conclude that the priming effects were based on affective value acquired by the identity-linked vocal acoustics of each talker.
Interestingly, the pattern of reaction times was reversed from that traditionally observed in evaluative priming. There are two possible explanations for this pattern of results: that reverse learning occurred (i.e., hearing a voice speak positive words led to that voice acquiring negative value) or that reverse priming occurred (i.e., voices that had acquired positive value facilitated the judgment of negative targets). A secondary index of the voices' acquired affective value would be required to rule out the possibility of reverse learning. In Experiment 1, we assessed participants' explicit evaluations of all of the voices, but all voices were judged to be neutral. Although this finding does not allow us to rule out reverse learning, it does suggest that the evaluative priming effects were not the result of strategic responding based on explicitly recalled valued information about the voices. In future studies, additional measures of voice valence should be included in order to rule out the possibility of reverse learning.
Reverse priming effects, also called contrast effects, have been documented in a diverse set of evaluative priming experiments (for reviews, see Klauer, Teige-Mocigemba, & Spruyt, 2009; Glaser, 2003). A new theoretical approach proposed by Klauer and colleagues (2009) suggests that reverse priming effects occur when the prime falls outside of the time window during which the target is evaluated but is still in close temporal proximity to the target. In other words, there is a critical, early evaluation window during which traditional priming effects (called assimilation effects) emerge, wherein the prime occurs in the window during which the target is evaluated. Immediately following the assimilation priming evaluation window is a temporally later window during which reversed priming is observed. In this window, the prime has already activated an evaluative stance but is too temporally distant to exert direct influence on the evaluation of the target. Finally, after the reverse priming evaluation window, the effect of the prime dissipates altogether and no priming effects are observed. The specific timing of the different evaluation windows depends on the properties of the prime and target and on their relative timing (the stimulus onset asynchrony, or SOA: the time between the onset of the prime and the onset of the target).
One way to vary the position of evaluation windows is by varying the SOA between the prime and the target. Typical SOAs in evaluative priming experiments that result in traditional assimilation priming effects are around 300 ms (e.g., Bargh et al., 1992). A good deal of research has demonstrated that priming effects are quite sensitive to the SOA, although the pattern of results is somewhat unclear. In some experiments priming effects appeared at 300 ms but not 1000 ms (e.g., Hermans, De Houwer, & Eelen, 1994), while in other cases priming effects emerged at 0 and 100 ms SOAs and reverse priming was documented at a 1200 ms SOA (e.g., Klauer, Roßnagel, & Musch, 1997). In the present experiment the SOA was necessarily long because our prime stimuli were full spoken words (around 1 second in length), which required a longer time to play than is typical for visually presented prime stimuli; additionally, a fixation cross was presented for 100 ms between the prime and target to cue participants that the target was going to be presented.² Given this long SOA and the fact that the target evaluation window was likely cued by the intertrial interval fixation cross, it is possible, even probable, that the prime stimulus fell outside of the target's evaluative window, resulting in reverse priming effects. Regardless of the mechanism underlying the effects, the fact that the voices speaking neutral words served to modulate the speed with which target words were evaluated clearly indicates that the acoustics of those voices acquired affective value.
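To make the timing argument concrete, the approximate SOA implied by the description above can be worked out and set against the SOAs cited from earlier studies. The sketch below is illustrative only. It assumes that the fixation cross began at prime offset and that the target appeared as soon as the fixation cross ended; neither detail is stated in the text, and the 1000 ms prime duration is the approximate figure given above rather than a measured value. The function and variable names are hypothetical.

```python
# Illustrative timing sketch; all values are approximations taken from the text.
PRIME_DURATION_MS = 1000  # spoken prime word, "around 1 second in length"
FIXATION_MS = 100         # fixation cross shown between prime and target


def stimulus_onset_asynchrony(prime_duration_ms: int, gap_ms: int) -> int:
    """Return the time from prime onset to target onset.

    Assumption (not stated in the text): the gap begins at prime offset
    and the target appears immediately when the gap ends.
    """
    return prime_duration_ms + gap_ms


# SOAs and outcomes as reported in the studies cited in the text.
CITED_SOAS = [
    (0, "priming", "Klauer, Rossnagel, & Musch, 1997"),
    (100, "priming", "Klauer, Rossnagel, & Musch, 1997"),
    (300, "priming", "Bargh et al., 1992; Hermans et al., 1994"),
    (1000, "no priming", "Hermans et al., 1994"),
    (1200, "reverse priming", "Klauer, Rossnagel, & Musch, 1997"),
]

soa = stimulus_onset_asynchrony(PRIME_DURATION_MS, FIXATION_MS)

# Is the approximate SOA longer than every cited SOA that produced
# ordinary (assimilation) priming?
longer_than_all_priming = all(
    soa > cited for cited, effect, _ in CITED_SOAS if effect == "priming"
)

print(f"Approximate SOA in the present design: {soa} ms")
for cited, effect, source in CITED_SOAS:
    print(f"  {cited:>4} ms: {effect} ({source})")
print(f"Longer than all cited assimilation SOAs: {longer_than_all_priming}")
```

Under these assumptions the SOA is roughly 1100 ms, well beyond the SOAs at which assimilation priming was reported and close to the 1200 ms SOA at which reverse priming was documented. This is consistent with, but does not by itself establish, the reverse priming account.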
The present findings have important implications for understanding the mechanisms of person perception. Little empirical research has addressed how vocal cues influence complex social judgments made about others or how vocal cues influence social interactions. One recent study suggests that cues from voices may have a large impact on social perception: women whose voices were judged to be more feminine were associated to a greater degree with highly stereotypic female descriptions (Ko, Judd, & Blair, 2006). We know that much information can be extracted from the human voice. People can extract not only information about a speaker's sex and age (e.g., Bachorowski & Owren, 1995; Ladefoged & Broadbent, 1957; Owren & Bachorowski, 2003; Owren & Rendall, 1997), but also information about a speaker's personality (Scherer, 1979), emotional state (see Russell, Bachorowski, & Fernandez-Dols, 2003, for a review), attractiveness (Zuckerman & Miyake, 1993; Berry, 1992), maturity (Berry, 1992), and even probable occupation (Yamada, Hakoda, Yuda, & Kusuhara, 2000). Despite evidence that such information can be extracted from human voices, how vocal acoustics come to have such meaning has not been addressed. While it is possible that some types of human vocal communication are innately pleasant (e.g., the sound of laughter; Owren & Bachorowski, 2003), it is likely that the pairing between particular vocal acoustics and complex social constructs such as occupation is learned through experience. The present findings suggest that such learning is not only possible, but occurs through relatively mild and limited experience (hearing a disembodied voice speak 10 or 20 valued words) and is highly specific (individual speakers' voices could not be explicitly distinguished, yet still acquired specific affective value). How such learning influences subsequent interpersonal interactions is a fruitful avenue for future research.
The extent to which the affect-inducing properties of a voice become context-independent, change over time, and explicitly influence person perception also remains to be investigated. For example, if a given talker routinely induces negative affect in others by predominantly discussing topics that are negative or unpleasant, listeners may become more likely to attribute negative intentions to that individual even on occasions when the linguistic content of the speech is neutral or even positive. Conversely, if a talker's voice quality elicits generally positive affective responses through associations with positive content, then listeners may be more likely to make positive attributions. These pragmatic considerations are important in everyday linguistic interactions, where a listener's understanding of the significance of a communicative interaction is highly dependent on perceptions of a talker's communicative intentions. Such pragmatic considerations may become even more important in long-term relationships that are marked by repeated, socially significant interactions between two parties.