The present study tested whether empathic accuracy and physiological linkage during an emotion recognition task are facilitated by a cultural match between rater and target (cultural advantage model) or unaffected (cultural equivalence model). Participants were 161 college students of African American, Chinese American, European American or Mexican American ethnicity. To assess empathic accuracy—knowing what another person is feeling—participants (“raters”) used a rating dial to provide continuous, real-time ratings of the valence and intensity of emotions being experienced by four strangers (“targets”). Targets were African American, Chinese American, European American or Mexican American women who had been videotaped having a conversation about their relationship with their dating partner in a previous study. Empathic accuracy was defined as the similarity between ratings of the videotaped interactions obtained from: (a) raters in the present study, who rated how the targets were feeling during the interaction; and (b) targets in the previous study, who had rated their own feelings during the interaction. To assess emotional empathy—feeling what another person is feeling—we drew on literatures that underscore the role that mimicry and contagion play in empathy and examined physiological linkage (similarity between raters’ physiology when viewing the videotapes and targets’ physiology when in the actual interaction). Our findings for empathic accuracy supported the cultural equivalence model, with no evidence of greater accuracy when raters viewed targets of their own ethnicity. Our findings for physiological linkage provided some support for the cultural advantage model, with greater physiological linkage when Chinese Americans viewed and rated Chinese American targets.
Emotions play an essential role in human communication. An important part of our interpersonal lives involves producing, perceiving, interpreting, and responding to emotional signals. Being able to perceive these signals accurately carries clear advantages for predicting behavior, as well as for forming and maintaining social bonds. These proximal functions affect more distal goals such as survival and reproduction in humans and in other species (Preston & de Waal, 2002). Thus, compelling advantages accrue to those individuals who can accurately read the emotions of others. These advantages may be even greater when the individuals are members of the same in-group (Anderson & Keltner, 2002), suggesting a possible cultural advantage in emotion detection.
The question of whether individuals are more accurate at recognizing the emotional displays of members of their own culture versus those of other cultures has been an important issue in emotion research for several decades. Findings of same-culture facilitation and same-culture equivalence have both been reported (e.g., Ekman et al., 1987; Elfenbein & Ambady, 2002a; Matsumoto, 2002). One characteristic of this work has been its reliance on static emotional stimuli (e.g., still photographs and vignettes) and one-time emotional judgments (e.g., a photograph of a face is identified as “angry”). Such stimuli are common in the laboratory; however, real-world emotion judgments are typically made continuously as emotions unfold dynamically over time. In the present study, we revisit the question of a cultural advantage using a methodology that allows us to assess dynamic emotion judgments that arguably have greater ecological validity.
The cultural equivalence model of emotion recognition asserts that the mechanisms underlying the expressive and receptive aspects of emotional communication are deeply rooted in evolution (Darwin, 1872). It has now been over a century since Darwin noted striking similarities across species (including humans) in how critical emotional information such as anger/aggression, happiness, and fear is communicated in postural and expressive signals. These similarities in conveying emotional information across species are consistent with an evolutionary argument that within many social animals there appears to be a natural ability to relay emotional signals efficiently and effectively with one another—a position that has been subsequently endorsed by others (Chevalier-Skolnikoff, 1974; Linnankoski, Laasko, & Leinonen, 1994). By extension, our ability to recognize the emotional signals of all conspecifics (not just a subset of these) carries great value for the ultimate survival of the species (Preston & de Waal, 2002). Thus, a cultural equivalence model predicts that individuals should be equally accurate in understanding the emotions of in-group and out-group members.
Empirical support for the cultural equivalence model can be found in a number of studies. For example, Ekman and colleagues have demonstrated that people from different cultures are able to identify correctly the emotions portrayed in photographs of emotional facial expressions (Ekman, 1972; Ekman & Friesen, 1971; Ekman et al., 1987; Matsumoto, 1993). This evidence extends to members of pre-literate, visually isolated tribes in New Guinea and Borneo who were able to match emotion labels to pictures of facial expressions posed by Caucasians (Ekman, Sorenson, & Friesen, 1969). Although such data are widely interpreted as evidence for the universality of emotional facial expressions, this conclusion is not without controversy (see Ekman, 1994; Izard, 1994; Russell, 1994; Russell, 1995 for a lively debate regarding this conclusion). Nevertheless, the above studies and many others demonstrate that cultural differences between raters (i.e., observers or judges who are making inferences about another’s emotions) and targets (those whose emotions are being rated or judged) do not hinder accurate recognition of emotions (Boucher & Carlson, 1980; Gitter, Kozel, & Mostofsky, 1972; Kilbride & Yarczower, 1983).
Anderson and Keltner (2002) point out that evolutionary forces may also favor a cultural advantage in recognizing emotions. In support of this notion, O’Toole, Peterson and Deffenbacher (1996) found that individuals process the visual characteristics of same-race faces more accurately and efficiently than other-race faces. Moreover, the mere perception that one belongs to the same group as a target can lead to increased in-group accuracy in emotion recognition (Thibault, Bourgeois, & Hess, 2005). Belonging to the same cultural group may also increase accuracy by virtue of familiarity with slight differences in the ways emotions are expressed by members of that group, an idea captured by the notion of nonverbal accents proposed by Marsh, Elfenbein and Ambady (2003) and the dialect theory of communicating emotion (see Elfenbein, Beaupré, Lévesque, & Hess, 2007).
Several early studies documented increased accuracy in recognizing the emotions of others from the same culture (Freeman, 1984; Vinacke & Fong, 1955; Wolfgang & Cohen, 1988); however, these findings were often overshadowed by findings of “universality” within the same studies (e.g., Vinacke & Fong, 1955). Subsequent studies have increasingly emphasized the findings of a cultural/ethnic advantage in recognizing the emotions of others. For instance, Albas, McCluskey, and Albas (1976) compared the ability of White Canadians and Indian Canadians to judge the emotional tone of content-filtered speech from either White or Indian speakers. Although both groups were able to judge the emotions of the other group with above-chance accuracy, each group was better at this task when rating speech from their own cultural group. Freeman (1984) found that preschool boys scored higher on a measure of cognitive empathy when judging targets of the same ethnicity than those of a different ethnicity. Wolfgang and Cohen (1988) found that Anglo Canadians were better able to recognize the facial expressions of Anglo Canadians than those of West Indian Canadians and vice versa. A similar pattern of results was found for Chinese and Australian participants by Markham and Wang (1996).
This in-group advantage has been found for ethnic minorities as well as majority group members (Weathers, Frank & Spell, 2002). In a meta-analysis of 97 studies, Elfenbein and Ambady (2002b) found a significant in-group advantage such that emotion recognition accuracy was 9.3% better when participants judged expressions posed by members of the same cultural group. Although this conclusion has been contested (see Elfenbein & Ambady, 2002a; Matsumoto, 1993), there is certainly sufficient evidence of a cultural advantage for this to constitute a rival hypothesis to the cultural equivalence model.
Empathic accuracy is the ability of one person to recognize the thoughts and feelings of another person at a specific moment in time (Ickes, 1993). It is one of the important subareas subsumed under the larger rubric of “empathy research” (which also includes such topics as prosocial behavior, sympathy, and compassion). Empathic accuracy is a measure of what people can do in the realm of emotion recognition; it does not tell us how they do it. In terms of how we come to know what others are feeling, a critical distinction has been made between cognitive empathy and emotional empathy (Davis, 1994).
Cognitive empathy involves “knowing” what the target individual is feeling. At the core of cognitive empathy is some form of perspective taking or other conscious cognitive processes (Hatfield, Cacioppo & Rapson, 1994) that allow an observer to deduce logically what a target is feeling. Importantly, observers do not have to become emotionally aroused for cognitive empathy to occur; they can simply process available cues and information and come to a conclusion as to what the other person is feeling. In the present study, our assessment of empathic accuracy is viewed as a measure of cognitive empathy. It determines how well individuals from various ethnic groups can tell how strangers from their own and other ethnic groups are feeling, but does not tell us how this occurs.
Emotional empathy involves “feeling” what the target individual is feeling. Applied to empathic accuracy, it suggests that one way we come to know what another person is feeling is by experiencing some version of the emotion ourselves. One basis for emotional empathy is emotional contagion, a process whereby an observer comes to share the emotions of a target as a result of automatic mimicry of and synchronization with the target’s facial expressions, vocalizations, postures, and movements (see Hatfield, Cacioppo, & Rapson, 1994; Dimberg, 1982; Eisenberg et al., 1989; Preston & de Waal, 2002). “Mirror neurons” provide another basis for emotional empathy. These neurons respond both to the generation of certain behaviors and to the observation of those behaviors being performed by others (Rizzolatti & Craighero, 2004; Rizzolatti, Fadiga, Fogassi, & Gallese, 1999). Although mirror neurons were originally observed in non-human primates, there is now indirect evidence from functional imaging studies for their existence in humans (Decety & Jackson, 2004; Decety & Lamm, 2006; Decety & Meyers, 2008; but see counter evidence in Lingnau, Gesierich, & Caramazza, 2009).
In our prior research, we have explored “physiological linkage,” a state in which the patterns of autonomic and somatic physiological response of an observer mimic those of the person being observed. We initially demonstrated that this occurs both between spouses (Levenson & Gottman, 1983) and within spouses (Gottman & Levenson, 1985) when married couples view videotapes of their interactions and rate each other’s emotions. We later showed that physiological linkage also occurs when individuals observe and rate the emotions of strangers (Levenson & Ruef, 1992), especially when the observers show emotional facial expressions (Soto, Pole, McCarter, & Levenson, 1998). We and others (Preston & de Waal, 2002) view physiological linkage as a form of emotional contagion: observers have emotional reactions that are similar to those of the target person, and these similar emotions produce similar patterns of autonomic and peripheral nervous system activity (e.g., Ekman, Levenson, & Friesen, 1983; Levenson, 2003).
Support for the notion that emotional empathy is related to empathic accuracy comes from several sources. In our work, greater physiological linkage was associated with greater accuracy for detecting targets’ negative emotions (Levenson & Ruef, 1992). More recently, Zaki, Weber, Bolger, and Ochsner (in press) demonstrated that accuracy in rating emotions was correlated with activity in brain regions associated with shared sensorimotor representations (inferior parietal lobule and bilateral dorsal premotor cortex) as might be expected during emotional empathy and with activity in regions indicative of mental state attributions (medial prefrontal cortex and superior temporal sulcus) as might be expected during cognitive empathy.
The present study addresses the issue of whether there is a cultural advantage in two aspects of empathy. First, there is a test of cognitive empathy using an empathic accuracy paradigm in which raters rate targets’ emotions continuously in real time during a naturalistic social interaction. Second, there is a test of emotional empathy that assesses the extent of physiological linkage (similarity between raters’ and targets’ physiological responses) during this rating task. This design enables us to test rival predictions between the cultural equivalence and cultural advantage models. Specifically, the cultural equivalence model predicts no differences in empathic accuracy or physiological linkage as a function of the match or mismatch between raters’ and targets’ ethnicity. In contrast, the cultural advantage model predicts greater empathic accuracy and physiological linkage when raters’ and targets’ ethnicity is the same as compared to when they are different.
We recruited 161 full-time college students from greater San Francisco Bay Area colleges and universities to serve as emotion raters. These participants were of either African American (n = 39), Chinese American (n = 43), European American (n = 38), or Mexican American (n = 41) ethnicity. Within each of the four ethnic groups, approximately 60% of the participants were women and 40% were men. The mean age of the sample was 20.23 years (SD = 2.44) and did not differ significantly across the four ethnic groups. Sixteen individuals did not report their age.
Participants were recruited by making announcements in classes, posting flyers, and inviting students in common areas to participate. All interested individuals were screened for eligibility via telephone, internet, or in person. Specific eligibility criteria for each ethnic group, presented in Table 1, were established in consultation with researchers who had special expertise relevant to these ethnic groups. The criteria aimed to ensure an equal amount of cultural exposure to American culture and to the participant’s culture of origin (see Soto, Levenson, & Ebling, 2005 for a complete discussion of these criteria). Therefore, for the purposes of this study we use the terms ethnicity and culture interchangeably. Those who qualified for the study were scheduled for a laboratory session. Participants also completed a questionnaire package, but these data were not used in the present study. Some students participated in the study in exchange for psychology course credit, whereas others were paid $35 for participating in the research.
A remotely controlled, high-resolution video camera, partially concealed behind darkened glass, was used to obtain frontal views of each participant’s face and upper torso as they completed their ratings. All instructions were presented visually on a 13″ television monitor placed directly in front of the participant at a distance of four feet. Participants were informed that they would be videotaped.
Seven physiological functions were measured using a system consisting of a Grass Model 7 12-channel polygraph and a microcomputer. (1) Cardiac interbeat interval was measured using Beckman miniature electrodes with Redux paste placed on opposite sides of the chest; the interval between successive R-waves was measured in milliseconds. (2) Pulse transmission time to the finger was measured using a UFI photoplethysmograph attached to the top phalange of the second finger of the non-dominant hand. Transmission time was measured as the time interval between the R-wave of the electrocardiogram and the upstroke of the peripheral pulse at the finger. (3) Finger pulse amplitude, the trough-to-peak amplitude of the finger pulse, was also measured from the UFI photoplethysmograph. (4) Ear pulse transmission time was measured using a UFI photoplethysmograph attached to the right earlobe. Transmission time was measured as the interval between the R-wave of the electrocardiogram and the upstroke of the peripheral pulse at the ear. (5) Skin conductance level, the level of sweat-gland activity on the surface of the hand, was measured by passing a constant voltage between Beckman regular-size electrodes attached to the palmar surface of the lower phalanges of the first and second fingers of the non-dominant hand. The electrodes were filled with an electrolyte paste consisting of sodium chloride in Unibase. (6) Finger temperature was measured using a Yellow Springs Instruments thermistor attached to the palmar surface of the first phalange of the fourth finger of the non-dominant hand. (7) General somatic activity was measured using an electromechanical transducer attached to a platform under the participant’s chair. The transducer generated an electrical signal, measured in arbitrarily designated units, proportional to the amount of movement in any direction.
The seven measures that were examined are sensitive to cardiac, vascular, and electrodermal responses thought to be highly relevant to emotion.
The emotion rating dial consists of a mechanical dial that traverses a 180-degree arc over a 9-point scale with the labels “extremely negative” at one end, “neutral” in the middle, and “extremely positive” at the other end. The dial is attached to a potentiometer in a voltage-dividing circuit that provides a signal to the computer from which the precise dial position can be determined (Levenson & Ruef, 1997). This rating dial was used to obtain ratings from the participants in the current study and was used previously by target individuals to rate their own emotions as part of their original study protocol (see below).
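As an illustration of how a voltage-divider reading can be turned into a rating, the sketch below assumes that the potentiometer output varies linearly from 0 V at the “extremely negative” end of the dial to a reference voltage at the “extremely positive” end. The function names, the linear calibration, and the reference voltage are hypothetical; the actual calibration used by Levenson and Ruef (1997) may differ.

```python
def dial_rating(signal_volts: float, reference_volts: float) -> float:
    """Map a voltage-divider reading onto the 1-9 rating scale.

    Hypothetical linear calibration: 0 V is the "extremely negative" end (1)
    and reference_volts is the "extremely positive" end (9).
    """
    if reference_volts <= 0:
        raise ValueError("reference voltage must be positive")
    # Clamp to the physical range of the dial before scaling
    fraction = min(max(signal_volts / reference_volts, 0.0), 1.0)
    # Dial position 0..1 maps linearly onto ratings 1..9
    return 1.0 + 8.0 * fraction


def dial_angle(signal_volts: float, reference_volts: float) -> float:
    """Angular position of the dial in degrees along its 180-degree arc."""
    return (dial_rating(signal_volts, reference_volts) - 1.0) / 8.0 * 180.0
```

Under these assumptions, a 2.5 V reading with a 5 V reference falls at the neutral midpoint of the scale (a rating of 5, or 90 degrees along the arc).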
Videotapes of eight targets interacting with their dating partners were selected as stimulus materials for the current study. Two targets (and their partners) represented each of the four ethnic groups in our participant sample (African American, Chinese American, European American, Mexican American) and met the same recruitment criteria presented in Table 1. These videotapes had been collected previously as part of a study of cultural influences on couples’ interactions. In that study, couples came to the laboratory and engaged in three 15-minute conversations about: (a) the events of the day, (b) a conflict area in their relationship, and (c) a pleasant area in their relationship. During the conversations, their physiological responses were measured (the same responses used in the present study and described above). Following each conversation, partners watched the videotape of their conversation and used the rating dial described above to provide continuous ratings of their own feelings during the conversation (these procedures were originally developed by Levenson & Gottman, 1983, for use with married couples). This previous study provided approximately 180 videotaped conversations from which we selected our final targets.
Several steps were taken to select the final eight target videotapes. First, we decided to use only female partners as targets given research indicating that women are more expressive than men (Hall, Carter, & Horgan, 2000; LaFrance & Banaji, 1992) and engender greater empathic accuracy than men (Klein & Hodges, 2001; Levenson & Ruef, 1992). Second, to ensure the videos had sufficient emotion, we made use of the self-ratings provided by the female partners in the original conversation (see data reduction below) to select only those in which the female partner rated her own feelings as negative or positive for more than half the time. Third, to ensure a balance between positive and negative affect, we selected only those conversations in which the female partner’s ratings showed a ratio of negative to positive affect between 0.5 and 2.0. For conversations meeting these three criteria, we then had the female partner’s emotions rated by a panel of six judges (using the same procedures described below) and excluded conversations that were deemed too easy or too difficult to rate in terms of the judges’ average empathic accuracy scores.
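The two quantitative screening rules (emotional for more than half of the periods, and a negative-to-positive ratio between 0.5 and 2.0) can be expressed as a simple filter. This sketch assumes that each target’s self-ratings have already been coded into positive, negative, and neutral periods (as described under data reduction below); the function name and the string codes are hypothetical, and treating both ratio bounds as inclusive is an assumption.

```python
from collections import Counter


def meets_emotion_criteria(codes: list[str]) -> bool:
    """Screen one conversation using the target's coded self-ratings.

    codes: one hypothetical label ("pos", "neg", or "neu") per 10-second period.
    """
    counts = Counter(codes)
    n_pos, n_neg = counts["pos"], counts["neg"]
    # Rule 1: rated as negative or positive for more than half of the periods
    if (n_pos + n_neg) <= len(codes) / 2:
        return False
    # Rule 2: balanced affect, negative:positive ratio within [0.5, 2.0]
    if n_pos == 0:
        return False
    return 0.5 <= n_neg / n_pos <= 2.0
```

A conversation with two positive, one negative, and one neutral period passes both rules (three of four periods are emotional, and the ratio is 0.5), whereas one with three positive periods and a single negative period fails the balance rule.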
Finally, from among the remaining videotaped conversations, we selected two targets from each ethnic group: one in which the events of the day were the topic of conversation and one in which a conflict area or a pleasant area of the relationship was the topic. When more than one conversation was available, we made the final selection at random. The final eight target videotapes were randomly divided into two sets of four targets (distributed equally across ethnic group and conversation type), and participants were randomly assigned to view one of these sets, with the order of the four targets within each set counterbalanced across participants.
Participants came to the laboratory for a two-and-a-half-hour experimental session. To minimize discomfort, participants had contact only with a research assistant of their own gender while the physiological sensors were being attached. Participants were seated in a chair facing a monitor, and written consent was obtained. Afterward, a research assistant attached the physiological sensors described earlier and explained their functions.
Participants were told that they would be watching four videotaped conversations between dating couples and that they would be asked to use the rating dial to provide a continuous report of how they thought the woman in the video was feeling. To make sure they understood how to operate the dial, participants were asked to move it to indicate positive, negative, and neutral emotions. The video recordings showed a split-screen image with each partner occupying half of the screen. Before each conversation, the appropriate side of the screen was covered so that only the female partner’s image was visible (participants could still hear the male partner). Following each conversation, participants were asked how likeable and attractive they found the female target, and how engaged they were by the conversation1. These latter questions were answered using a 0 (not at all) to 8 (extremely) scale.
Procedures for deriving empathic accuracy and physiological linkage scores are based on those employed by Levenson and Ruef (1992). Rating dial data were averaged into successive 10-second periods. These 10-second averages were standardized, and each period was classified as positive, negative, or neutral using both raw and standardized scores. To be coded as a positive period, the raw rating dial score had to be greater than or equal to 6.0 (referenced to the original 1–9 emotion rating dial scale) and the z-score had to be greater than or equal to 0.5. To be coded as a negative period, the raw rating dial score had to be less than or equal to 4.0 and the z-score had to be less than or equal to −0.5. All remaining periods were coded as neutral.
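The coding rule combines an absolute criterion (the raw dial score) with a relative one (the z-score), so a period is coded as emotional only when it is both on the emotional side of the scale and unusual for that rater. A minimal sketch follows, assuming that z-scores are computed within each person’s own series of 10-second averages using the population standard deviation; the text does not specify either detail, and the function name and labels are hypothetical.

```python
from statistics import mean, pstdev


def classify_periods(period_means: list[float]) -> list[str]:
    """Code each 10-second rating average as "pos", "neg", or "neu".

    Positive: raw >= 6.0 AND z >= 0.5. Negative: raw <= 4.0 AND z <= -0.5.
    Everything else is neutral.
    """
    m = mean(period_means)
    sd = pstdev(period_means)
    codes = []
    for raw in period_means:
        # Assumption: standardize within this person's own series
        z = (raw - m) / sd if sd > 0 else 0.0
        if raw >= 6.0 and z >= 0.5:
            codes.append("pos")
        elif raw <= 4.0 and z <= -0.5:
            codes.append("neg")
        else:
            codes.append("neu")
    return codes
```

For a series such as `[7, 7, 7, 7, 7, 7, 7, 9]`, every raw score is at least 6.0, but only the final period is coded positive because the others fall below the person’s mean and so fail the z-score criterion.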
Empathic accuracy scores were determined separately for positive and negative emotion using a sequential analysis (Levenson & Gottman, 1983) that compares the participant’s ratings with that target’s ratings for two lags: agreement between participant and target in the same 10-second period (lag zero) and agreement between target’s rating in one 10-second period and participant’s rating in the following 10-second period (lag one). This analysis produces a z-score index of how likely it is that the participant’s ratings agree with the target’s ratings. Thus, each participant had four empathic accuracy scores for each target they rated: positive accuracy scores for lag zero and lag one and negative accuracy scores for lag zero and lag one.
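The exact statistic used by Levenson and Gottman (1983) is not reproduced here. As a stand-in, the sketch below uses a common binomial form of the lag-sequential z-score: given that the target coded a period with a particular valence, it compares how often the participant used the same code at the specified lag with how often this would be expected from the participant’s overall base rate for that code. The function name, labels, and choice of formula are assumptions for illustration.

```python
import math


def lag_agreement_z(target: list[str], rater: list[str], code: str, lag: int) -> float:
    """Binomial z for rater[t + lag] == code, given target[t] == code.

    Assumes target and rater are equal-length sequences of period codes.
    Returns NaN when the statistic is undefined (e.g., code never occurs).
    """
    n = len(target) - lag
    # Pair target period t with rater period t + lag
    pairs = [(target[t], rater[t + lag]) for t in range(n)]
    n_given = sum(1 for tgt, _ in pairs if tgt == code)
    hits = sum(1 for tgt, rtr in pairs if tgt == code and rtr == code)
    base_rate = sum(1 for _, rtr in pairs if rtr == code) / n if n > 0 else 0.0
    expected = n_given * base_rate
    variance = n_given * base_rate * (1.0 - base_rate)
    if variance == 0:
        return math.nan
    return (hits - expected) / math.sqrt(variance)
```

Calling the function with `code="pos"` and `code="neg"` at `lag=0` and `lag=1` yields the four scores per target described above. Positive values indicate more agreement than the participant’s base rate alone would predict.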
Using a computer program written by one of the authors (RWL), each of the physiological measures was quantified and averaged into successive 10-second periods. Physiological linkage was determined by applying a bivariate time-series analysis (Gottman, 1981) to evaluate the extent of relatedness between the physiological data collected from the participant while rating the interaction and the physiological data collected from the target during the original interaction. Linkage was computed in both directions, in each case controlling for autocorrelation (i.e., cyclicity) within each time series (if uncontrolled, such autocorrelation can unduly inflate estimates of relationships across time series). The analysis yields log-likelihood statistics with approximate chi-square distributions for each physiological variable. Following procedures we have used previously with these kinds of data (Levenson & Gottman, 1983; Levenson & Ruef, 1992), this process was carried out for each of the seven physiological variables common to participants and targets in both directions of prediction (i.e., participant to target, target to participant). The proportion of these 14 statistics that were statistically significant was used as an index of overall physiological linkage between rater and target.
Before addressing the principal hypotheses, we computed a simple estimate of the overall level of rating accuracy across all participants. This index, the percentage of 10-second periods rated the same by the participant and target, was computed separately for positive and negative periods at lag zero and lag one and for each target ethnicity. The resulting 16 accuracy scores are presented in Table 2. The mean level of overall empathic accuracy, collapsing across participant ethnicity, ranged from 13% to 45%, with only the ratings of African American targets exceeding chance (set at 33%, because a period could be positive, negative, or neutral). Accuracy levels for our sample were consistent with those reported by Levenson and Ruef (1992).
Participants’ empathic accuracy scores were analyzed using a 4 × 4 (Rater Ethnicity × Target Ethnicity)2 MANOVA with Target Ethnicity as a repeated measure. The dependent variables were participants’ empathic accuracy scores for positive and negative emotion periods at lag zero and at lag one. If ethnic match between rater and target is associated with higher empathic accuracy, a significant interaction between Rater Ethnicity and Target Ethnicity in the overall MANOVA would be expected. Results revealed no significant Rater Ethnicity × Target Ethnicity interaction, F(36,1744.3) = 0.84, ns, ηp² = .02. Thus, we found no support for an in-group advantage in empathic accuracy.
To explore these findings further, follow-up univariate ANOVAs were conducted separately for each of the four dependent variables: positive accuracy scores at lags of zero and one, and negative accuracy scores at lags of zero and one. For each of the dependent variables, the Rater Ethnicity × Target Ethnicity interaction was not significant, further supporting the absence of an in-group advantage. These findings, however, indicate only the absence of an absolute cultural advantage. Matsumoto (2007) distinguishes between an absolute cultural advantage and a relative cultural advantage. The former is determined by a test of the omnibus interaction in a balanced ANOVA design (every rater culture rates every target culture) such as that used in this study; the latter is tested with a focused contrast on the interaction, as described below.
Following recommendations outlined by Matsumoto (2007), we tested for a relative cultural advantage by first running a Linear Mixed Models procedure that allows for the specification of a main-effects-only model given a repeated-measures design. This analysis was repeated for each of the four empathic accuracy variables, and the resulting residuals were saved. For each dependent variable, we then ran a single-degree-of-freedom planned contrast that tested the hypothesis that, within the matrix of 16 residuals representing the Rater Ethnicity × Target Ethnicity interaction, participants rating targets of the same ethnicity (matrix diagonal; 4 contrast weights of 1) would show higher empathic accuracy scores than participants rating targets of a different ethnicity (off-diagonal; 12 contrast weights of −1/3). Across all four dependent variables, the planned contrast was not significant, indicating the absence of a relative cultural advantage as well: positive accuracy at lag zero, t(497.2) = 0.62, ns; positive accuracy at lag one, t(499.9) = 0.63, ns; negative accuracy at lag zero, t(499.9) = −0.53, ns; and negative accuracy at lag one, t(501.3) = 0.07, ns.3
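The contrast weights can be illustrated with a short sketch. It computes only the point estimate of the contrast on a 4 × 4 matrix of residual cell means (rows are rater ethnicity, columns are target ethnicity). The standard errors and fractional degrees of freedom reported above come from the mixed-model procedure and are not reproduced here. Both function names are hypothetical.

```python
def advantage_contrast_weights(k: int = 4) -> list[list[float]]:
    """Weight 1 on the diagonal (same ethnicity), -1/(k-1) off the diagonal."""
    return [[1.0 if i == j else -1.0 / (k - 1) for j in range(k)] for i in range(k)]


def contrast_estimate(cell_means: list[list[float]], weights: list[list[float]]) -> float:
    """Weighted sum of residual cell means (rows = rater, columns = target)."""
    return sum(
        w * m
        for w_row, m_row in zip(weights, cell_means)
        for w, m in zip(w_row, m_row)
    )
```

Because the 4 diagonal weights of 1 and the 12 off-diagonal weights of −1/3 sum to zero, a residual matrix with no same-ethnicity advantage (all cells equal) yields an estimate of zero. An estimate is positive only when the diagonal cells exceed the off-diagonal cells on average.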
We were also interested in any main effects of rater ethnicity and target ethnicity. With all four dependent variables (positive and negative accuracy at lags zero and one) in the model, main effects emerged for both Rater Ethnicity, F(12,405.1)4 = 1.88, p < .05, ηp² = .05, and Target Ethnicity, F(12,1230.6) = 24.2, p < .001, ηp² = .17. Follow-up univariate analyses failed to demonstrate a significant Rater Ethnicity effect for any of the four accuracy variables. The effect of Target Ethnicity, however, was significant for each of the four accuracy scores: positive accuracy at lag zero, F(2.71,423.4) = 31.04, p < .001, ηp² = .17; positive accuracy at lag one, F(2.70,421.5) = 31.19, p < .001, ηp² = .17; negative accuracy at lag zero, F(2.66,415.46) = 37.36, p < .001, ηp² = .19; and negative accuracy at lag one, F(2.59,403.2) = 30.57, p < .001, ηp² = .16. Table 2 lists the participants’ mean accuracy scores, averaged across rater ethnicity, for each of the four target ethnicities. The means for all four empathic accuracy scores show a clear and consistent pattern: on average, participants were most accurate in rating the emotions of the African American targets, followed by the Chinese American, European American, and Mexican American targets.
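For univariate F tests, partial eta squared can be recovered from the reported F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the follow-up univariate results as a consistency check. Fractional degrees of freedom can be used as reported, because a sphericity correction multiplies both degrees of freedom by the same factor and therefore leaves the ratio unchanged. The identity does not carry over directly to the multivariate statistics, whose effect sizes depend on the multivariate test used. The function name is hypothetical.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from a univariate F ratio and its dfs."""
    return (f * df_effect) / (f * df_effect + df_error)
```

For example, `partial_eta_squared(31.04, 2.71, 423.4)` is approximately 0.166, which rounds to the reported .17 for positive accuracy at lag zero.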
Mean physiological linkage scores (presented in Table 4) were analyzed using a 4 × 4 (Rater Ethnicity × Target Ethnicity) repeated measures ANOVA, with Target Ethnicity serving as a repeated measure and physiological linkage (percentage of the 14 linkage statistics that were significant) as the dependent variable. If ethnic match between rater and target leads to greater physiological linkage, a significant interaction between Rater Ethnicity and Target Ethnicity in the overall ANOVA would be expected. Results showed no main effect for Rater Ethnicity, F(3,156) = 1.44, ns, ηp² = .03, or for Target Ethnicity, F(3,154) = 0.78, ns, ηp² = .02, but there was a significant interaction between Rater Ethnicity and Target Ethnicity, F(9,374.9) = 2.08, p < .05, ηp² = .04.
To clarify the nature of the significant interaction, we conducted four separate repeated measures ANOVAs (one for each rater ethnic group) with physiological linkage as the dependent variable and target ethnicity as the repeated measure. Only the analysis for Chinese American raters yielded significant results, F(3,40) = 2.76, p = .05, ηp² = .17. For Chinese Americans, the degree of physiological linkage during the rating task depended on the ethnicity of the target being observed: Chinese American raters showed significantly greater physiological linkage when rating Chinese American targets (M = 33.1%) than when rating European American targets (M = 25.6%), t(42) = 2.94, p < .01. A similar pattern was present for European American raters (greater linkage when rating European American targets), but the effect did not reach statistical significance, F(3,35) = 1.57, ns, ηp² = .12.
To explore our finding that African American targets were rated most accurately, we conducted post hoc analyses of the ratings participants provided immediately after rating each target. These took the form of repeated measures ANOVAs with Target Ethnicity as the repeated measure and, as dependent measures, ratings of how engaged participants were by the conversation, the difficulty of the rating task, the perceived accuracy of their ratings, and the likeability and attractiveness of the targets.⁵
The analyses revealed significant differences across the four target ethnicities in how engaged participants were by the conversation, F(3,342) = 3.29, p < .05, how attractive they found the targets, F(3,450) = 40.1, p < .001, and how likeable they found the targets, F(2.54,380.5) = 14.90, p < .001. Follow-up pairwise comparisons (Bonferroni adjusted) revealed that raters felt significantly less engaged by the African American target conversations than by the Chinese American target conversations (see Table 5 for complete results). Analysis of the attractiveness and likeability ratings revealed that African American targets received significantly higher scores than European American and Mexican American targets but did not differ from Chinese American targets. In sum, although participants were not more engaged when viewing the African American targets, they did rate the African American targets as more attractive and likeable than their European American and Mexican American counterparts.
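With four target ethnicities, the follow-up pairwise comparisons involve six pairs, so the standard Bonferroni adjustment multiplies each raw p-value by 6 (capping at 1), which is equivalent to testing each pair at .05 / 6 ≈ .0083. The sketch below shows that arithmetic; the raw p-values are made up for illustration and are not the study's results.

```python
from itertools import combinations

TARGET_GROUPS = ("African American", "Chinese American",
                 "European American", "Mexican American")

# Every pairwise comparison among the four target ethnicities.
PAIRS = list(combinations(TARGET_GROUPS, 2))


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(len(PAIRS))                     # 6
print(round(0.05 / len(PAIRS), 4))    # per-comparison alpha: 0.0083

# Hypothetical raw p-values, one per pair, in the order of PAIRS.
raw = [0.004, 0.02, 0.2, 0.5, 0.01, 0.9]
for (a, b), p_adj in zip(PAIRS, bonferroni(raw)):
    print(f"{a} vs. {b}: adjusted p = {p_adj:.3f}")
```

In this made-up example, a raw p of .004 survives the adjustment (.024 < .05), whereas a raw p of .02 does not (.12).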
The goal of this study was to examine the effects of ethnic match on empathic accuracy and physiological linkage. We considered two models that make clearly different predictions about the effects of ethnic similarity: (1) a cultural advantage model, which predicts greater empathic accuracy and greater physiological linkage when individuals rate the emotions of members of their own ethnic/cultural group than when they rate members of a different group, and (2) a cultural equivalence model, which predicts no differences in empathic accuracy or physiological linkage regardless of whether raters rate the emotions of members of their own or a different ethnic/cultural group. Because previous studies have addressed this question using static emotional stimuli (e.g., photographs) and one-time emotion judgments, the present study’s use of dynamic, interpersonal emotional stimuli (couples’ conversations) and continuous, real-time emotion judgments offers an opportunity to revisit the question with stimuli and ratings that arguably more closely approximate the conditions under which such judgments are made in the real world.
Our results for empathic accuracy revealed no support for the cultural advantage model. This held both when we tested for an absolute cultural advantage and when we tested for a relative cultural advantage. These results are consistent with those of many previous studies in the emotion recognition literature that are more supportive of a cultural equivalence model of empathic accuracy (Boucher & Carlson, 1980; Gitter, Black, & Mostofsky, 1972a; Gitter, Black, & Mostofsky, 1972b; Gitter, Kozel, & Mostofsky, 1972; Kilbride & Yarczower, 1983; McAndrew, 1986; Sogon & Masutani, 1989). Our findings are also consistent with more recent studies that have attempted more stringent tests of the cultural advantage. For example, Beaupré and Hess (2005) failed to find a cultural advantage when a careful stimulus equivalence procedure was used to select target pictures. Similarly, Matsumoto, Olide, and Willingham (in press) failed to find a cultural advantage when using spontaneous expressions of emotion as targets, arguably a more ecologically valid stimulus than posed expressions.
On the other hand, our findings are not consistent with work that demonstrates an in-group advantage in emotion recognition (e.g., Albas, McCluskey, & Albas, 1976; Freeman, 1984; Weathers, Frank, & Spell, 2002), nor with Elfenbein and Ambady’s (2002b) meta-analytic review, which found support for an in-group advantage in the recognition of emotion across cultures. There are a number of possible explanations for why such an in-group advantage did not emerge in our study. As noted previously, we used a method for assessing empathic accuracy that attempted to maximize ecological validity. Most of the studies included in the Elfenbein and Ambady (2002b) review used a different kind of task, in which participants observe a static emotional stimulus (usually a photo of a person portraying an emotional facial expression) and then apply the correct emotion term (a single emotion judgment). In our study, empathic accuracy was assessed in terms of the ability to track the changing valence and intensity of a target’s emotion during a 15-minute conversation with a relationship partner. In this kind of real-life, dynamic, interpersonal context, the need to perceive another person (whether friend or foe) accurately may simply outweigh or overshadow any cultural advantage.
Another possibility is that our choice of participants inadvertently stacked the deck against finding a cultural advantage in empathic accuracy. We used selective eligibility criteria to ensure that participants had equal amounts of exposure to their cultures of origin and to mainstream American culture. By employing these criteria, we felt we could more confidently attribute any differences we found to actual cultural differences. Nevertheless, all participants were recruited from the same geographic region. Research has demonstrated that the in-group advantage is partly diminished when in-group members have greater exposure to members of the out-group (Elfenbein & Ambady, 2003). A similar effect may have operated in our study, whereby frequent contact with members of the other ethnic groups tempered a possible cultural advantage.
A third possibility is that our design was not powerful enough to detect an in-group advantage that did exist. Because the cultural equivalence model essentially proposes the null hypothesis of no difference, it is important to consider whether a null result reflects the true absence of an effect or a lack of statistical power. In particular, we are concerned with the effect size and power estimates associated with the tests of the Rater Ethnicity (between-subjects variable) × Target Ethnicity (within-subjects variable) interaction for empathic accuracy and physiological linkage. Based on estimates computed by our analysis software (SPSS), the effect size for the empathic accuracy interaction was ηp² = .02 (equivalent to r = 0.15), with an observed power of .83. This effect size falls at the lower bound of the 95% confidence interval for the overall effect size of the in-group advantage (0.15–0.34) reported by Elfenbein and Ambady (2002b) in their meta-analysis. Thus, according to these estimates, we appear to have had sufficient power to detect even the smallest expected effect.
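The conversion from ηp² to a correlation-metric effect size used above appears to be r = √ηp²; the text does not state the formula, so this is an assumption. Because ηp² is reported to two decimals, the reported r = .15 is consistent with an unrounded value near .0225 (√.0225 = .15), whereas .02 taken literally gives r ≈ .141, just below the meta-analytic lower bound. A minimal sketch:

```python
import math


def r_from_eta_squared(eta_sq: float) -> float:
    """Correlation-metric effect size implied by an eta-squared value (r = sqrt)."""
    if not 0.0 <= eta_sq <= 1.0:
        raise ValueError("eta squared must lie in [0, 1]")
    return math.sqrt(eta_sq)


# Values that all round to ηp² = .02, compared with the
# meta-analytic 95% CI of .15 to .34 cited in the text.
for eta_sq in (0.020, 0.0225, 0.0249):
    print(f"eta_p^2 = {eta_sq:.4f} -> r = {r_from_eta_squared(eta_sq):.3f}")
```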
As noted earlier, we view physiological linkage as reflecting processes of emotional contagion/emotional empathy (i.e., feeling what another person is feeling). Prior research has shown that physiological linkage occurs both between (Levenson & Gottman, 1983) and within (Gottman & Levenson, 1985) spouses when they watch videotapes of their interactions and rate each other’s emotions and that greater physiological linkage is associated with greater empathic accuracy when viewing and rating the negative emotions of strangers (Levenson & Ruef, 1992). Analysis of linkage data provided our only support for the cultural advantage model in that Chinese Americans showed greater physiological linkage when rating Chinese American targets. This finding is reminiscent of our prior finding that observing members of the in-group in an emotional situation produces similar emotions in the observer (Roberts & Levenson, 2006). However, we are not able to answer two important questions raised by these findings. First, we do not know why this finding was limited to Chinese American participants viewing Chinese American targets, although it is tempting to consider the well-documented collectivism in Chinese American culture (Hofstede, 2001) as a possible explanation. Second, we did not find greater physiological linkage in Chinese Americans to be associated with an in-group advantage in empathic accuracy as we would have predicted. Thus, replication of this finding using experimental designs that would enable us to address these questions is clearly called for.
Although ethnic match did not emerge as an important determinant of how well we read emotions from others, we did find some evidence of cultural variation in empathic accuracy. Participants were consistently most accurate when rating the emotions of African Americans, followed by Chinese Americans, European Americans and Mexican Americans. This finding is noteworthy because the conversations were selected in a manner that maximized their equivalence in terms of topic, amount of negative and positive affect, and ratio of negative affect to positive affect. However, differences in emotional expressiveness may still have affected the effectiveness with which emotion was conveyed.
Evidence from Johnson (1989) might begin to explain these findings. Johnson found that African Americans reported experiencing anxiety more frequently and anger more intensely than European Americans. Assuming that experiencing an emotion more frequently and more intensely is ultimately tied to more visible displays of the emotion, it is reasonable to expect that African Americans may be more demonstrative with at least some negative emotions. Given the explicitly negative focus of at least half of the stimulus materials, African American targets may have displayed more, and more detectable, negative emotion, thereby giving participants an edge in rating their emotions overall.
Also striking among our results is the level of empathic accuracy among participants when rating Chinese American and Mexican American targets. Previous research and ethnographic accounts suggest that Chinese culture embraces emotional moderation and that Mexican culture embraces emotional expression (Murillo, 1976; Potter, 1988; Soto, Levenson, & Ebling, 2005). On this basis, we might have expected Mexican American targets to be rated most accurately and Chinese American targets least accurately. However, our findings were almost exactly the opposite: Mexican American targets were rated least accurately and Chinese American targets second-most accurately. These results were surprising but may be understood in light of our previous work examining patterns of emotional expression and experience among Chinese American and Mexican American college students exposed to an acoustic startle (Soto, Levenson, & Ebling, 2005). We found that although Chinese Americans consistently reported feeling less emotion than their Mexican American counterparts, the two groups did not differ in how much emotional behavior they displayed. Thus, the notion that Chinese Americans are relatively unexpressive and Mexican Americans relatively expressive may hold only for self-reports of emotion, not for other channels of emotion such as observable behavior or physiology.
As noted earlier, this study was designed to maximize the ecological validity of the empathic accuracy assessment by using a task that more closely resembles the way emotions are communicated in real life (real-time judgments of naturalistic behavior). We also used a recruitment strategy that ensured that participants had adequate exposure to the cultural traditions and other members of their in-group. These strengths notwithstanding, the study had several limitations: (a) the use of college students who may have had extensive exposure and acculturation to the out-group cultures, thus lessening a possible in-group advantage; (b) an empathic accuracy assessment based on emotional valence and intensity rather than discrete emotions (as in previous photo identification studies); (c) no assessment of similarity between raters and targets in expressive behavioral responses (only physiological linkage was assessed); and (d) variability in targets and conversations across ethnic groups (the downside of using naturalistic stimuli).
The last of these points is arguably the most significant limitation of the present study: the potential lack of equivalence among the stimulus conversations. Ideally, the stimulus conversations would have differed only in the ethnicity of the targets (see Matsumoto, 2002; Beaupré & Hess, 2005). Although each conversation selected for the study met exacting selection criteria, there was no way to ensure that the conversations representing each ethnic group were equivalent in all important ways. Given the focus on culture in the present study, two representative stimuli from each cultural group were used. However, because of time constraints and concerns about participant fatigue, each participant rated only one conversation from each ethnic group. A better design would have had each participant rate more than one representative of each cultural group, perhaps by using fewer ethnic groups and/or shorter conversations. Devising larger sets of well-matched stimulus conversations will be important for future studies of culture and empathic accuracy.
The question of how culture influences the communication of emotions has a long and rich history in the empirical literature. Evidence has been found in support of both the cultural equivalence model of emotion recognition (Boucher & Carlson, 1980; Gitter, Kozel, & Mostofsky, 1972; Kilbride & Yarczower, 1983) and the cultural advantage model (Elfenbein & Ambady, 2002a; Elfenbein & Ambady, 2002b). The present study revisited this issue by examining empathic accuracy and physiological linkage with a methodology that more closely approximates the ways emotions are typically detected in everyday life. Our findings lent no support to the cultural advantage model for empathic accuracy but did provide some support for this model in physiological linkage for Chinese American participants and targets. Extending this research to other samples and other materials would help map out the contexts in which cultural equivalence and cultural advantage are found. The present results suggest that when making dynamic, real-time judgments of others’ emotions in the kinds of interpersonal contexts where most such judgments occur, we are equally adept at recognizing the emotions of in-group and out-group members. This finding can be construed as hopeful news for increasing cooperation and communication in our increasingly multi-ethnic environments.
This research was supported by National Institute on Aging Grant R37-AG17766 and National Institute on Mental Health Grant MH39895 to Robert W. Levenson.
¹ Of the 161 total participants, only 117 (73%) were asked about their level of engagement with the conversation, because this question was introduced after the first 43 subjects had been run. The percentages of each participant ethnic group that did not answer this question are as follows: African Americans (10%), Chinese Americans (32%), European Americans (50%), and Mexican Americans (17%).
² Identical analyses with participant gender included as a between-subjects variable were conducted for each dependent variable, and the pattern of findings was unchanged. Therefore, for ease of presentation, gender is not included in any of the analyses reported here.
³ We would like to thank a reviewer for suggesting these additional analyses regarding absolute vs. relative cultural advantages.
⁴ Fractional degrees of freedom reflect adjustments for violations of the assumptions underlying a particular procedure.
⁵ Rater Ethnicity was not included as a between-subjects variable because the follow-up univariate analyses of empathic accuracy revealed no significant differences in accuracy across the four rater ethnic groups.
Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/emo.