Humans seek social interactions from birth onwards. Within two months of birth, typically developing infants prefer the subtle patterns of contingency in face-to-face interactions, including turn-taking and correlated affect (Gergely and Watson, 1999; Murray and Trevarthen, 1985). By 9 months, infants are able to follow another person’s gaze to a location outside of their visual field: a key first step in establishing joint attention (for review, Moore 2007). Joint attention, also called “triadic” attention, provides a platform by which two or more people coordinate and communicate their intentions, desires, emotions, beliefs, and/or knowledge about a third entity (e.g. an object or a common goal) (Tomasello et al., 2005). Joint attention is distinct from shared attention or mutual gaze, in which two people share attention by looking at each other, rather than coordinating their attention on a third entity. Despite the centrality of contingent responding and joint attention in human social interactions, the neural mechanisms of these key features remain understudied.
Previous research has investigated the neural bases of various aspects of social interactions in adults via several approaches: (1) the participant observes a recorded social interaction between two other people in a story, cartoon, or movie (Iacoboni et al., 2004; Pierno et al., 2008; Saxe and Kanwisher, 2003; Walter et al., 2004); (2) the participant plays an online game with an alleged, but invisible, human partner (Fukui et al., 2006; Gallagher et al., 2002; Kircher et al., 2009; Rilling et al., 2002; Rilling et al., 2004); or (3) the participant views a virtual character who shifts gaze towards or away from the participant (Pelphrey et al., 2004b; Schilbach et al., 2006). These approaches provide important indications of possible neural mechanisms for social interaction, but they lack two key components of everyday social interactions: contingent responding and joint attention.
These two features of social interaction are difficult to examine with functional MRI because of several methodological challenges. The first challenge, common to examining both contingent interaction and joint attention, is to create live, face-to-face contact with minimal temporal delay while at least one of the people is lying inside the bore of a scanner. To address this challenge, we used a dual-video presentation that allowed two people to interact face-to-face with minimal temporal delay. A second challenge, specific to identifying the neural correlates of a live interaction, is to design a control condition that captures the visual complexity of a live interaction, thus isolating the social, contingent aspect of the interaction.
To address these challenges, inspired by a paradigm from infant research (Murray and Trevarthen, 1985), we compared a live social interaction to a recorded video of the same interaction in Experiment 1. By comparing a live interaction to a recording of the same interaction, we controlled for the perception of a person speaking and moving. The key difference between the two conditions is thus contingency and/or self-relevance: only during the live condition are the participant’s and the experimenter’s actions contingent on one another. Thus, the live condition should differentially recruit brain regions that are sensitive to interpreting another person’s actions and speech in a self-relevant context, such as an online face-to-face social interaction. We hypothesized that these regions would include those involved in reasoning about another person’s actions or intentions and representing another’s mental state, including dorsal medial prefrontal cortex (dMPFC), right posterior superior temporal sulcus (rpSTS), and right temporo-parietal junction (rTPJ) (for review, Saxe 2006). We additionally predicted that contingent interaction would recruit regions involved in attention or goal-directed tasks, including dorsal anterior cingulate cortex (dACC) (Dosenbach et al., 2007).
A full understanding of the contributions of multiple brain regions to social interaction will thus require breaking social interaction into its component parts. In Experiment 2, we used the same live set-up to isolate the neural bases of one component of a social interaction, namely joint attention. In the “joint attention game”, two players coordinated their visual attention in order to jointly discover a target (joint attention); in the control condition, the two players deployed their attention independently (solo attention). Thus, both conditions involved ‘face-to-face’ interactions, but only one (joint attention) required coordinating attention with another person. Given previous findings of right posterior STS (Materna et al., 2008) and dorsal medial prefrontal cortex (Williams et al., 2005) involvement in joint attention, we predicted that these regions would be selectively recruited during joint attention trials, as compared to solo attention trials. Further, as joint attention is a key social component of a live interaction, we predicted that some of the social brain areas identified in Experiment 1 (e.g. rpSTS, rTPJ, and dMPFC), but not the attention-related areas, would be selectively recruited during joint as compared to solo attention.