We know a great deal about the neurophysiological mechanisms supporting instrumental actions, i.e., actions designed to alter the physical state of the environment. In contrast, little is known about our ability to select communicative actions, i.e., actions directly designed to modify the mental state of another agent. We have recently provided novel empirical evidence for a mechanism in which a communicator selects his actions on the basis of a prediction of the communicative intentions that an addressee is most likely to attribute to those actions. The main novelty of those findings was that this prediction of intention recognition is cerebrally implemented within the intention recognition system of the communicator, and that it is modulated by the ambiguity in meaning of the communicative acts rather than by their sensorimotor complexity. The characteristics of this predictive mechanism support the notion that human communicative abilities are distinct from both sensorimotor and linguistic processes.
In very general terms, animal communication exploits the correlations between signals and the contexts that induce them (Dawkins and Krebs, 1978; Owings and Morton, 1998). Several organisms can extract information from the behavior of other agents, i.e., behavior that inadvertently provides information to onlookers (Danchin et al., 2004; Oates et al., 2010). One popular example, among many others, is given by the calls of vervet monkeys and baboons (Rendall et al., 2000; Seyfarth et al., 1980). It has been shown that they function as contact and alarm calls because eavesdroppers learn to infer location and alarm state of the caller (Cheney and Seyfarth, 2007). The calls are not intentional in the sense that the caller does not seem to predict the context-dependent consequences of the interpretation process occurring in listeners (Rendall et al., 2000). This type of inadvertent communicative behavior appears to rely on fixed associations between sensorimotor events and their communicative implications, i.e., one-to-one matches between physical and semantic properties of an action (Maynard Smith and Harper, 1995). This communicative phenomenon is widespread in the animal world (Bargh et al., 1996; Dyer, 2002; Hauser, 1996; Mather, 2004), and it could be supported by “mirror” neurons that couple perceived and executed behaviors (Prather et al., 2008). This paper is concerned with a quite different type of communicative behavior, i.e., behaviors whose sole purpose is to have their intentions recognized by another agent (Grice, 1957). We label this type of behavior intentional communication, to distinguish it from the incidental communication described above. 
Intentional communicative actions, arguably the last major evolutionary transition in the way information is propagated across biological systems (Szathmary and Smith, 1995), appear to build on the ability to use cognitive variables (e.g., mental states like goals and desires) to disambiguate and predict the behavior of other agents (Byrne and Bates, 2006; Frith, 2007), a useful tool when faced with the challenge of coping with the minds of other cognitive agents (Humphrey, 1976). There is a long research tradition on how we could recognize intentions in behaviors (Frith, 2006; Heider, 1958; Kelley, 1973; Kruglanski, 1975; Malle et al., 2001; McArthur, 1972; Nichols and Stich, 2003), and it has become clear that this ability is already computationally complex even under extremely simplified experimental scenarios (Baker et al., 2009; van Rooij et al., 2008; Yoshida et al., 2008). Intentional communication goes well beyond those scenarios, requiring the ability to signal one's own intentions, i.e., to select which of an indefinite number of possible behaviors is most likely to be interpreted by a recipient as conveying a particular communicative intention, given the current mutually shared knowledge [common ground (Clark, 1996)]. Yet, some scholars have ignored this complexity, trying to reduce human communication to the coding-decoding of a conventional message (Rizzolatti and Craighero, 2007; Shannon, 1948; Tognoli et al., 2007). Even if coding and decoding exhausted the nature of human communication, this account would not explain how we could build such conventions in the first place. Yet children do it, without prior knowledge of those conventions, up to the point of inventing new languages when deprived of a pre-existing one (Goldin-Meadow, 2003). In fact, all human communication rests on an inferential base, on top of which the coded message rides; otherwise ironies, sarcasms, hints, and indirections would pass us by. 
Nor are we troubled by the vagueness or multiple ambiguities and semantic generalities in every utterance (Levinson, 2000; Sperber and Wilson, 2001). The same system that resolves the coded messages probably lies behind our ability to communicate without any pre-existing conventions at all, as in the gestures one might use behind the boss’ back, or to signal to others out of earshot. A number of converging lines of evidence suggest that this faculty is distinct from our language abilities, and is ontogenetically and phylogenetically prior to language (Levinson, 2006), yet at the same time constitutes the foundation for effective language use. Here we illustrate some of the complexities inherent in studying the cognitive and cerebral bases of this faculty, focusing on one of the first neurophysiological studies dealing with the production of human communicative actions (Noordzij et al., 2009), and delineating research lines opened up by those results [see also (de Ruiter et al., 2010; Newman-Norlund et al., 2009; Willems et al., 2010) and http://www.frontiersin.org/human_neuroscience/specialtopics/understanding_human_intentiona/64].
In order to understand our capacity to generate and interpret communicative actions, it might be useful to consider how it could emerge from the combination of simpler and experimentally tractable cognitive processes, using the layout suggested by communicative skills we share with different taxa along our line of descent (Byrne, 1995; Herrmann et al., 2007). For instance, human communicators share with other vertebrates the inferential mechanisms involved in identifying other biological agents from dynamic sensory stimuli (Blake, 1993; Oram and Perrett, 1996; Vallortigara et al., 2005). Well-studied examples are the ability to infer visual patterns of limb articulation from point-light displays (Puce and Perrett, 2003) or acoustic patterns of vocal identities from non-linguistic sounds (Gervais et al., 2004; Ghazanfar et al., 2001). Because these perceptual mechanisms are so basic and evolutionarily preserved, it is tempting to speculate that developmental alterations in them, as found in patients with Autism Spectrum Disorders (Dakin and Frith, 2005; Zilbovicius et al., 2006), might have serious consequences for human social behavior. Both human psychophysics and computational models have suggested that perception of biological motion relies on extracting sequences of postures on the basis of dynamic form templates (Beintema and Lappe, 2002; Giese and Poggio, 2003; Vaina and Gross, 2004). This process can then be used for action segmentation (Baird and Baldwin, 2001; Baldwin and Baird, 2001). Accordingly, it appears relevant to test whether the basic inferential mechanisms involved in identifying other biological agents could also provide a viable computational substrate for parsing a string of movements into communicative versus instrumental segments, and whether these mechanisms are used for both perception and selection of communicative actions.
Even if these basic perceptual processes could support the parsing problem faced by a communicator (i.e., “How can I mark the communicative elements of an action within a continuous stream of instrumental actions so they can be parsed by a recipient?”), the brain still needs to resolve the mutual dependencies between that parsing problem and the meaning-mapping problem (i.e., “Given an intended meaning, which action would have the best chance of being correctly interpreted as an attempt to communicate that particular intended meaning?”). In this context, it might be relevant to move beyond basic perceptual mechanisms, and consider a different level of analysis, namely the ability to deal with mental states (Frith and Frith, 2006). To illustrate some of the complexities inherent in this level of analysis, imagine a customer in a bar grasping an empty glass from a table and raising it in the air, while looking at the bartender. Raising the glass for ordering a drink is a motorically simple action that belies the complex cognitive structures that underlie it, not to mention the largely unknown neurophysiological mechanisms supporting them. First, the customer needs to build a conceptual model of the bartender [including a Theory of Mind (Call and Tomasello, 2008; Premack and Woodruff, 1978)], assuming for instance that the person behind the bar-counter is willing to serve drinks on demand, at that particular time of the day, remembering previous orders, etc. Second, the customer needs to keep this conceptual model of the bartender online, without confusing it with his own factual knowledge. For instance, when looking at the bartender, the customer should not confuse the momentary lack of vision of his own raised hand with the clear line of sight between the bartender's gaze and the empty glass. 
Third, the customer needs to keep those pieces of knowledge separate from another crucial element, his knowledge of what he and the bartender presumably and mutually know and believe (Clark, 1996). For instance, the customer needs to take into account that both customer and bartender are informed about the glass being empty (so that the glass-raising action cannot be confused with toasting). This feat of representational capacity, an example of a third-order intentional system (Dennett, 1987), is arguably the simplest system that could support genuine intentional communication (Grice, 1957), and it involves a three-way relation between sender, receiver, and behavior (Baron-Cohen, 1995; Tomasello and Carpenter, 2005). The crucial point here is that, for intentional communication to occur, the glass-raising movement needs to be processed together with the mental structures it is designed to evoke in the communicators. Put differently, the meaning conveyed by the raised glass is not an intrinsic property of that action, as implied by some recent accounts of human communication (Iacoboni et al., 2005). Without denying the relevance of simpler forms of communication, like imperative pointing (Leavens et al., 2010; Tomasello, 2006), third-order intentional systems raise the issue of how our brain manages to integrate these time-varying and hierarchical relations between observable, planned, and unobservable events. How can the human brain solve the context-dependency of communicative behaviors without collapsing under the astronomical computational demands entailed by this type of problem (van Rooij et al., 2010, under revision)? How are these processes, and in particular the considerable cognitive control they entail, influenced by motivational drives toward prosocial behavior (Hrdy, 2009; Roelofs et al., 2009)? 
Is the manipulation of these mental structures supported by dedicated cerebral circuits as hypothesized for other social constructs (Adolphs, 2009), or is it an instance of our ability to guide first-person behavior on the basis of mental models and future goals (Behrens et al., 2009)?
We have started to address some of the issues introduced above in a recent report (Noordzij et al., 2009), using an experimental situation in which people have to communicate in the absence of an a priori common code (de Ruiter et al., 2010). We used this apparently artificial scenario in order to give emphasis to those mechanisms that create new communicative behaviors, rather than the utilization of existing conventions. The rationale was that in the absence of an a priori common code, the selection of an effective communicative behavior needs to rely on some heuristics that constrain a potentially infinite search-space (Levinson, 1995). The study is based on the insight that a communicator could solve this problem by predicting how a particular addressee will interpret a given behavior. We have tested the nature of this prediction, assessing whether the sender of a signal uses his own intention recognition system to predict the intention recognition performed by the addressee (or receiver). A neurophysiological test of this hypothesis involves the ability to directly compare cerebral responses supporting both the production and the comprehension of communicative actions. Accordingly, we have measured behavioral and cerebral responses (with functional magnetic resonance imaging, fMRI) in subject pairs engaged in communicative exchange. In contrast to recent studies that have addressed human communicative abilities using pre-existing communicative conventions (Emmorey et al., 2010; Green et al., 2009; Schippers et al., 2009; Straube et al., 2010; Walter et al., 2004), here we prevented the participants from using those conventions, forcing them to generate and interpret new communicative visuomotor behaviors (see also (Galantucci, 2005; Scott-Phillips et al., 2009; Selten and Warglien, 2007)).
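The selection principle described above, in which a sender chooses the behavior that an addressee is most likely to interpret as intended, can be illustrated with a minimal sketch. It borrows the glass-raising scenario from the earlier example and treats the receiver as a simple Bayesian interpreter. This is a toy illustration under stated assumptions, not the model or analysis used in the study: the function names, the meanings, the actions, and every probability below are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy model: every name and number below is an illustrative
# assumption, not data or code from the study.

# Assumed prior over what a sender might mean (part of the common ground).
PRIOR: Dict[str, float] = {"order_drink": 0.5, "toast": 0.5}

# Assumed P(action | meaning): how the receiver expects senders to behave.
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "order_drink": {"raise_empty_glass": 0.8, "raise_full_glass": 0.2},
    "toast": {"raise_empty_glass": 0.1, "raise_full_glass": 0.9},
}

ACTIONS: List[str] = ["raise_empty_glass", "raise_full_glass"]


def receiver_posterior(
    action: str,
    likelihood: Dict[str, Dict[str, float]],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """Receiver side: P(meaning | action) via Bayes' rule."""
    joint = {m: prior[m] * likelihood[m].get(action, 0.0) for m in prior}
    total = sum(joint.values())
    if total == 0.0:
        # The action is uninterpretable under this model.
        return {m: 0.0 for m in prior}
    return {m: p / total for m, p in joint.items()}


def select_action(
    intended: str,
    actions: List[str],
    likelihood: Dict[str, Dict[str, float]],
    prior: Dict[str, float],
) -> str:
    """Sender side: run the receiver's inference for each candidate action
    and pick the one that best conveys the intended meaning."""
    return max(
        actions,
        key=lambda a: receiver_posterior(a, likelihood, prior)[intended],
    )


if __name__ == "__main__":
    print(select_action("order_drink", ACTIONS, LIKELIHOOD, PRIOR))
```

In this sketch the sender reuses the same `receiver_posterior` function that the receiver would use, which mirrors the hypothesis that the sender's own intention recognition system generates the prediction. A real search space would be far larger, and it would need the constraining heuristics discussed in the text.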
Subject pairs played the Tacit Communication Game [TCG (de Ruiter et al., 2010)]. In the TCG two players (called the sender and the receiver) are told they have to re-create a spatial configuration of two simple geometrical objects (one for the sender and one for the receiver) on a game board. The crucial manipulations are that the sender initially sees the spatial configuration, while the receiver doesn't, and that the sender consequently has to communicate the position and orientation of the object of the receiver by means of moving his own object. Similar to Galantucci's study (Galantucci, 2005), we force people to communicate with an unconventional communicative tool (i.e., moving a simple geometrical shape), thereby creating a situation without pre-existing communicative conventions (Figure 1). Using an event-related fMRI design, we can isolate cerebral activity evoked when planning a communicative action (Figure 1: phase 2, sender column), and when interpreting the meaning of that action (Figure 1: phase 4, receiver column).
The results indicate that planning communicative actions (by a sender) and recognizing the communicative intention of the same actions (by a receiver) rely on spatially overlapping portions of their brains. These cerebral responses, in both sender and receiver, were localized in the right posterior superior temporal sulcus (pSTS, Figures 1A,D), a region previously associated with attribution of intention (Castelli et al., 2000; Saxe et al., 2004), and they were independent of sensory inputs and motor outputs (Figures 1F,H). The response profile of the pSTS points to a contribution confined to planning and comprehension of communicative actions, being absent during the speeded execution of those communicative actions. It remains to be seen whether the pSTS response is driven by the incorporation of a model of the communicative partner into the action parsing procedures.
These findings support the hypothesis that, in humans, planning communicative acts relies on a conceptual prediction of the intention recognition in a receiver, rather than on simulation mechanisms based on the sender's sensorimotor system (Toni et al., 2008). By studying human intentional communication within an experimental context that respects its complexities, it has become clear that the supporting cerebral infrastructure does not overlap with the “mirror neurons” thought to provide an account of intersubjectivity (Gallese and Goldman, 1998; Rizzolatti and Arbib, 1998). The predictive mechanism discussed here appears to have general relevance, also playing a role in ordinary language use, by disambiguating and elaborating inferred intent [as in Grice's theory of non-natural meaning, meaningNN (Grice, 1957)] through interactions with a culturally modulated “mentalizing” system involved in the generation of social constructs (Frith and Frith, 2006; Markus and Kitayama, 1991; Miller, 1984; Mitchell et al., 2006). Future research could explore whether and how this Gricean mechanism might have supported the communicative infrastructure that has been crucial to language development in both ontogeny and phylogeny (Levinson, 2006).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The present study was supported by the EU-Project “Joint Action Science and Technology” (IST-FP6-003747) and a VICI grant (#453-08-002) from NWO to Ivan Toni.
Ivan Toni investigates the integration of rules, percepts, and social constructs into the motor system, applying neurophysiological techniques to healthy and pathological human brains. He studied Biology in Bologna, before joining the Institute of Human Physiology (Parma, Italy) and INSERM Unité 94 (Lyon, France) for his PhD in Neuroscience. After working with Dick Passingham at the Functional Imaging Laboratory (London, UK), and with Karl Zilles (Jülich, Germany), he joined the Donders Institute in Nijmegen (Netherlands) as principal investigator of the Intention and Action Research Group.