Citing Darwin (1872/1965) as inspiration, many scientists believe in the structural hypothesis of emotion perception: the idea that certain emotion categories (named by the English words anger, fear, sadness, disgust, and so on) are universal biological states that are triggered by dedicated, evolutionarily preserved neural circuits (instincts or affect programs), expressed as clear and unambiguous signals involving configuration of facial muscle activity (facial expressions), and recognized by mental machinery that is innately hardwired, reflexive, and universal (e.g., Allport, 1924; McDougall, 1908/1921; Tomkins, 1962, 1963). Several influential models of emotion perception that involve the structural hypothesis now dominate the psychological literature (e.g., Ekman, 1972; Izard, 1971) and are supported by empirical evidence (for a recent review, see Matsumoto, Keltner, Shiota, O'Sullivan, & Frank, 2008). A recent article succinctly summarized the structural hypothesis: “The face, as a transmitter, evolved to send expression signals that have low correlations with one another and … the brain, as a decoder, further decorrelates … these signals” (Smith, Cottrell, Gosselin, & Schyns, 2005, p. 188).
Even though humans, in the blink of an eye, can easily and effortlessly perceive emotion in other creatures (including each other), there is growing evidence that context acts, often in stealth, to influence emotion perception. Descriptions of the social situation (Carroll & Russell, 1996; Fernández-Dols, Carrera, Barchard, & Gacitua, 2008; Trope, 1986), body postures (Aviezer et al., 2008; Meeren, van Heijnsbergen, & de Gelder, 2005), voices (de Gelder, Böcker, Tuomainen, Hensen, & Vroomen, 1999; de Gelder & Vroomen, 2000), scenes (Righart & de Gelder, 2008), words (Lindquist, Barrett, Bliss-Moreau, & Russell, 2006), and other emotional faces (Masuda et al., 2008; Russell & Fehr, 1987) all influence which emotion is seen in the structural configuration of another person's facial muscles.
Although researchers attempt to remove the influence of context in most experimental studies of emotion perception, one important source of context typically remains: words. A variety of findings support the hypothesis that words provide a top-down constraint in emotion perception, contributing information over and above the structural information provided by a face alone (for a review, see Barrett, Lindquist, & Gendron, 2007; e.g., see Fugate, Gouzoules, & Barrett, in press; Roberson, Damjanovic, & Pilling, 2007). Furthermore, when the influence of words is minimized, both children (Russell & Widen, 2002) and adults (Lindquist et al., 2006) have difficulty with the seemingly trivial task of using structural similarities in facial expressions alone to judge whether or not the expressions match in emotional content (even though the face sets used have statistical regularities built in).
The conceptual-act model of emotion (Barrett, 2006, 2009a, 2009b; Barrett et al., 2007) hypothesizes that facial muscle movements alone carry simple affective information (e.g., whether the face should be approached or avoided), and words for emotions increase the accessibility of conceptual knowledge for emotion, which acts as a top-down influence allowing a specific emotional percept to take shape. Within this model, conceptual knowledge is tailored to the specific situation, which leads to the hypothesis that emotion words direct attention to the situation. As a consequence, context is more likely to be encoded (if not consciously recognized) when a person's task is to perceive emotion in the face of another person rather than to judge the face's affective value.
In the present experiment, we tested this hypothesis using a memory paradigm that is sensitive to the way in which processing resources are allocated during encoding. Prior research has shown that context is not readily encoded when people process affectively potent objects (e.g., snakes; Kensinger, Garoff-Eaton, & Schacter, 2007). Yet when it is advantageous for perceivers to attend to the context (e.g., when they must describe the context to the experimenter or remember the context), contexts are better remembered (Kensinger et al., 2007). Perceivers' ability to remember the context can therefore be used as a proxy for how many resources were devoted to processing that context. We hypothesized that when asked to perceive emotion (i.e., fear or disgust) in a face, participants would devote more processing resources to encoding and remembering the context than they would when asked to perceive the face's affective value (i.e., whether to approach or avoid it).
Participants viewed objects or facial expressions (fearful, disgusting, or neutral) in a neutral context and judged either their willingness to approach or avoid the objects or faces (an affective categorization) or whether the faces expressed fear or disgust, using words presented to them during the task (an emotion categorization). We predicted that when participants were asked to label the facial expression with an emotion word, they would show better memory for the context in which the face was presented (even though the context itself was neutral) than they would when asked to make an affective judgment of the face. We reasoned that this would be true because the structural features of the face alone, even in a caricatured face (such as those typically used in studies of emotion recognition), are not typically sufficient to allow an emotion perception. As a consequence, perceivers attempt to use whatever context information is available, no matter how impoverished. In contrast, when perceivers are asked to make an affective judgment of a face, the information contained in the structural aspects of the face is more likely to be sufficient. Furthermore, we expected that labeling the emotion elicited by an object would not alter memory for the context; there would be less disambiguation required with an object than with a face, and thus little need to devote processing resources to the context.