People are remarkably adept at extracting social information from visual scenes. A rapid glance around the room at a dinner party is often all it takes to judge the overall mood and decide whether or not it is time to open up another bottle of wine. Research conducted in the last 15 years has shown that the visual system is highly specialized for the perception of social material, such as bodies, faces, and facial expressions. From these basic social features, people infer important information about the thoughts and intentions of others. However, it seems unlikely that the perception of these elementary social cues alone is sufficient for understanding the subtleties of complex social interactions. Making sense of social interactions requires going beyond the observable data and inferring intentions, beliefs, and desires—in short, attributing mental states (i.e., mentalizing; see
Frith et al. 1991).
Recently, researchers have turned to the question of whether social information is preferentially processed when people view natural visual scenes. Studies of visual attention have demonstrated that observers automatically orient toward social information, such as faces and bodies, when viewing complex scenes and that this bias is detectable as early as the first saccade (
Fletcher-Watson et al. 2008). Moreover, when viewing natural social scenes, observers spend a disproportionate amount of time looking at eyes and faces, almost to the exclusion of other content (
Birmingham et al. 2008a,
2008b). This suggests that when viewing complex scenes involving people, observers spontaneously attempt to make sense of what is happening between characters in the scene by using features such as gaze direction and facial expression. That such inferences might occur spontaneously was hinted at over 60 years ago in the work of
Heider and Simmel (1944). In their study, participants were asked to describe the motion of geometric shapes based on simple animations involving 2 triangles and a circle. One of these animations made it appear as though the triangles and the circle were interacting with each other. When participants described what was happening in this condition, they spontaneously developed complex narratives that often described 2 triangles caught in a rivalry over the affections of a small but not unattractive circle. Just as interesting is the description given by one participant, who went to great lengths to describe the animation in purely geometric terms. This participant's attempt ultimately proved futile as, seemingly despite herself, she began referring to one of the triangles in animate terms by describing how “he” is searching around for an opening to escape from inside a rectangle (
Heider and Simmel 1944). This classic study demonstrates how difficult it can be to successfully avoid attributing thoughts and intentions to anything that appears to have them—even 2D triangles and circles.
Over 15 years of neuroimaging studies on social cognition and mental state attribution have shown that inferring mental states is associated with activity in the dorsal medial prefrontal cortex (DMPFC), the anterior temporal poles, the temporal parietal junction (TPJ), and the precuneus. The involvement of these regions in social cognition has been replicated using a multitude of tasks, stimulus types, and imaging modalities (e.g.,
Fletcher et al. 1995;
Castelli et al. 2002;
Saxe and Kanwisher 2003;
Jackson et al. 2006;
Gobbini et al. 2007;
Spreng et al. 2009). In particular,
Mitchell et al. (2002,
2004,
2005) have demonstrated over a range of studies the myriad ways in which the DMPFC is involved in forming impressions and attending to social information about persons.
Despite a wealth of research on the neural systems involved in social cognition, there have been surprisingly few studies examining how these regions are recruited when viewing natural social scenes. This is important because, for instance, people tend not to encounter objects like faces in isolation but rather in social contexts that guide their interpretation of the mental and emotional states of the target (e.g.,
Kim et al. 2004;
Aviezer et al. 2008). Prior work using short video clips of 1 or 2 people has shown that regions involved in mental state attribution exhibit greater activity when participants passively view social interactions versus single person movie clips (
Iacoboni et al. 2004). In a similar vein, these same regions are engaged when passively viewing Heider and Simmel-like social animations (
Gobbini et al. 2007;
Wheatley et al. 2007). Although the majority of the research on the neural basis of social cognition has examined how people infer intentions in tasks where participants are explicitly instructed to engage in mental state inferences, these and other studies suggest that regions involved in social cognition may be obligatorily recruited by complex social material (e.g.,
Iacoboni et al. 2004;
Gobbini et al. 2007;
Wheatley et al. 2007) or when engaged in tasks that strongly invite mental state reasoning (e.g.,
Spiers and Maguire 2006; Young and Saxe 2009). Less well understood is whether people spontaneously recruit these same brain regions when viewing social scenes even when performing tasks that do not require social cognition and, if so, whether there are individual differences that mediate the degree to which people spontaneously activate these areas.
It seems likely that some individuals may process social information more readily than others. For instance, high-functioning individuals with autism exhibit deficits in both basic social perception and mental state attribution. Indeed, individuals with autism are less likely to look at faces not only when presented in isolation (
Dalton et al. 2005) but also when part of a social scene (
Klin et al. 2002). Moreover, neuroimaging evidence suggests that, compared with controls, those with autism underrecruit the DMPFC when viewing social animations that are similar to those used by Heider and Simmel (
Castelli et al. 2002). Research by Baron-Cohen and colleagues has examined individual differences in traits that are related to autism. The empathizing quotient is a self-report measure of an individual's propensity for engaging in both emotional empathy (i.e., feeling the pain of others) and mentalizing (i.e., inferring the thoughts and intentions of others) and has been shown to reliably discriminate between healthy participants and high-functioning individuals with autism (
Baron-Cohen et al. 2003;
Baron-Cohen and Wheelwright 2004). Similarly, behavioral research using the Autism Quotient, a measure related to the empathizing quotient and sharing many of the same questions, found that individual differences in autistic traits in the normal population are negatively correlated with the ability to infer dynamic changes in emotional states in others (
Bartz et al. 2010).
In the present study, we sought to investigate 2 understudied aspects of the neural basis of social cognition. First, we examined whether regions involved in making explicit mental state attributions are spontaneously recruited when people view socially complex scenes. Second, we examined whether brain activity in these areas is related to individual differences in trait empathizing. To do so, we recruited 48 male participants who were prescreened with a measure of the empathizing quotient (
Baron-Cohen and Wheelwright 2004) and selected for participation in order to maximally represent the range of scale scores. The empathizing quotient was selected over other commonly used measures of empathy (e.g., the Interpersonal Reactivity Index) because of its demonstrated ability to discriminate between individuals with high-functioning autism and healthy controls (
Lombardo et al. 2007). Moreover, the empathizing quotient is one of the few measures designed specifically to measure not only emotional empathy but also individual differences in mentalizing. We note that because a primary aim of this study was to investigate individual differences, we recruited a comparatively large sample of participants. This was motivated by recent reports suggesting that common sample sizes in neuroimaging are significantly underpowered when it comes to detecting even strong effects in correlational designs (e.g.,
Yarkoni 2009).
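To make the sample size argument concrete, the following minimal sketch approximates the statistical power of a correlational test using the Fisher z transformation. It is an illustration only, not the analysis used in this study: the function name `correlation_power`, the uncorrected two-sided threshold of 0.05, and the example effect size of r = 0.5 are all assumptions chosen for demonstration, and whole-brain analyses typically apply far stricter thresholds, which lowers power further.

```python
import math
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a population correlation r
    with n participants (hypothetical helper, Fisher z approximation)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    std_normal = NormalDist()
    # Critical value for a two-sided test at level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # atanh(sample r) is approximately Normal(atanh(r), 1 / sqrt(n - 3)),
    # so the standardized test statistic is shifted by this amount.
    shift = math.atanh(r) * math.sqrt(n - 3)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Assumed values: r = 0.5 is illustrative; 20 stands in for a smaller sample.
    for n in (20, 48):
        print(f"n = {n}: power = {correlation_power(0.5, n):.2f}")
```

Under these assumptions, the approximation shows how power for a fixed effect size grows with the number of participants, which is the rationale for recruiting a comparatively large sample when the goal is to detect brain-behavior correlations.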
During functional neuroimaging, participants completed a simple categorization task in which they classified 4 types of visual scenes (animal, vegetable, mineral, and human social scenes) as belonging to the animal, vegetable, or mineral category. This categorization task served 2 purposes: first, it ensured that participants were alert and attending to the stimuli, and second, it provided a plausible cover story (i.e., examining how people play the 20 questions game, also sometimes called the “animal, vegetable, or mineral?” game) that made it unlikely that participants would infer the social nature of the task. We predicted that viewing social scenes would recruit regions involved in mental state attribution (e.g., DMPFC, temporal poles) and that individual differences in empathizing would correlate with activity in these regions when viewing social scenes.