During social interactions, people’s eyes convey a wealth of information about their direction of attention and their emotional and mental states. This review aims to provide a comprehensive overview of past and current research into the perception of gaze behavior and its effect on the observer. This encompasses the perception of gaze direction and its influence on perception of the other person, as well as gaze-following behavior such as joint attention, in infant, adult, and clinical populations. Particular focus is given to the gaze-cueing paradigm that has been used to investigate the mechanisms of joint attention. The contribution of this paradigm has been significant and will likely continue to advance knowledge across diverse fields within psychology and neuroscience.
The “language of the eyes” is a rich and varied vocabulary. The eyes and their highly expressive surrounding region can communicate complex mental states such as emotions, beliefs, and desires. This review is inspired by one aspect of gaze perception: the use of perceived gaze direction to shift visual attention, that is, the seemingly automatic propensity to orient to the same object that other people are looking at. This joint attention has been studied in infants for decades (e.g., Farroni, Massaccesi, Pividori, & Johnson, 2004; Scaife & Bruner, 1975). Recently, interest has been engaged in describing the mechanisms of attention underlying this feature of social interaction in adults as well as infants. In doing so, visual attention, a classic area of cognitive research, has been reinvigorated with the use of ecologically valid stimuli (i.e., eyes). Equally, research into joint attention has benefited from the use of cognitive spatial cueing paradigms. The fusion of these two domains has allowed perhaps one of the most interesting fields of research in social cognition to emerge. This review will focus on the contribution of gaze-cueing research to our knowledge about social cognition and attention as well as the complex processes that allow these mechanisms to interact so dynamically.
Although the primary focus of this review is the effect of perceived gaze direction on attention, we also aim to give an overview of the gaze perception abilities of humans and to look at the effects that averted gaze and gaze contact can have on people’s perception of the individuals they interact with. We note that, with regard to these issues, models such as Perrett’s description of how gaze, head, and body position cues are integrated in the neural responses of superior temporal sulcus (STS; Perrett, Hietanen, Oram, & Benson, 1992) and Baron-Cohen’s idea that some aspects of gaze perception are performed by innate modules (an Eye Direction Detector; Baron-Cohen, 1995) have been reviewed more extensively elsewhere (e.g., Emery, 2000; Langton, Watt, & Bruce, 2000), as has the development of joint attention and its neural basis (e.g., Allison, Puce, & McCarthy, 2000; Moore & Dunham, 1995). Our discussion of these issues is intended to provide a context in which we can then place the new evidence from gaze-cueing studies. We aim to address the following issues in recent and ongoing research into gaze cueing: What can gaze cueing tell us about the mechanisms of visual attention? What can gaze cueing of attention tell us about social interactions? What can the attention literature on gaze cueing say about the processing of gaze in development and in clinical populations? What can gaze cueing reveal about the origins and neural bases of symbolic attention cueing? Tackling these questions and offering ideas for how this field might further knowledge in these areas are the major objectives of this review.
The eyes fascinate us from the day we are born. The human neonatal visual system, although underdeveloped, is efficient at distinguishing these stimuli from others. The infant will come to find that those two oval shapes can be used to gain otherwise inaccessible information, to learn the names for objects in the world, and to ultimately unlock the secrets of other minds. Over the years into adulthood, the pervasive influence of the eyes and social gaze will continue, leading to ever more elaborate skills in engaging in collaboration, deception, and inference of intentions and mental states. Even though the linguistically adept adult possesses many other skills to aid navigation of the social world, reliance on eye-gaze perception to guide and interpret social behavior remains a central facet of social interactions throughout life.
In this section on how gaze perception is achieved, we look at the development of gaze perception in the infant as well as the effect that eye contact can have on the observer and briefly discuss the gaze perception abilities of nonhuman primates. We also examine the mechanisms of joint attention in infants and clinical populations. First, though, we provide an overview of the neural basis of gaze perception, as the neural mechanisms underpinning such behaviors as joint attention will be a recurring theme throughout this review.
The perception of direct and averted gaze has been investigated extensively with brain imaging techniques and electrophysiology in humans as well as nonhuman primates. The following brief overview of the neural substrates of gaze perception is by no means exhaustive. It should serve as a reference frame in which to place some neuropsychological and neuroimaging findings that will be reported in subsequent sections of this article. More extensive reviews of the neural architecture of gaze perception can be found elsewhere (e.g., Allison et al., 2000; Emery, 2000; Grosbras, Laird, & Paus, 2005; Hooker et al., 2003; Jellema, Baker, Wicker, & Perrett, 2000; Perrett et al., 1992).
A central component of the neural system for social perception is the cortical region within and near the STS. The STS is responsive to movements of the hands and body, as well as the eyes and the mouth, and therefore is supposed to code biological motion (Bonda, Petrides, Ostry, & Evans, 1996; Oram & Perrett, 1994; Pelphrey, Morris, Michelich, Allison, & McCarthy, 2005; Puce, Allison, Bentin, Gore, & McCarthy, 1998). However, this region is also activated by static images of different postures of the face and body. Cells in the macaque STS are sensitive to different orientations of another’s head and eyes. Although many cells are most responsive to the combined direction of head and gaze (i.e., frontal view of the face with eye contact or profile view with averted gaze), others are tuned independently to body, head, and gaze information (Perrett et al., 1990; Perrett, Smith, Potter, et al., 1985; Wachsmuth, Oram, & Perrett, 1994). Perrett et al. (1992) suggested that such view selectivity could be used to infer the direction of attention of another individual under a variety of viewing conditions. Jellema et al. (2000) supported this idea with their finding that the response magnitude of a subset of cells in STS that are sensitive to reaching movements of an arm can be influenced by the apparent direction of attention (as indicated by gaze and/or head orientation) of the agent performing the action. They proposed that this brain area, which is specialized in processing the orientation of faces in general and eye gaze in particular, is part of a distributed network that allows the observer to determine another person’s intentions.
There is evidence from behavioral paradigms that humans also have neurons that code for specific gaze directions (i.e., left vs. right) rather than simply distinguish between direct and averted eye gaze (R. Jenkins, Beaver, & Calder, 2006; Seyama, 2006; Seyama & Nagayama, 2006). In an adaptation paradigm, observers were exposed to several presentations of a particular gaze direction (e.g., right) and then presented with a test stimulus. There was a strong tendency to erroneously judge test stimuli with eyes that were deviated in the adapted direction as looking straight ahead. This is presumably because cells coding for rightward gaze become habituated, with reduced responding relative to cells encoding leftward gaze direction. These effects were not due to low-level stimulus properties as they survived across changes in size and orientation (R. Jenkins et al., 2006; Seyama & Nagayama, 2006). This observation is also reflected in dedicated neural activity responsive to left- and rightward gaze (Calder et al., 2007).
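The habituation account described above can be illustrated with a toy model. The following Python sketch is an illustration under stated assumptions and is not the model used in the cited studies: it assumes three hypothetical gaze channels (left, direct, right) with Gaussian tuning, treats adaptation as a reduced gain on the adapted channel, and reads out perceived direction as a response-weighted mean. The channel names, preferred directions, tuning width, and gain values are all invented for illustration; positive angles denote rightward gaze.

```python
import math

# Hypothetical channels and parameters, chosen only for illustration.
PREFERRED_DEG = {"left": -25.0, "direct": 0.0, "right": 25.0}
TUNING_WIDTH_DEG = 20.0


def channel_response(gaze_deg, preferred_deg, gain):
    """Gaussian tuning curve scaled by the channel's current gain."""
    distance = gaze_deg - preferred_deg
    return gain * math.exp(-(distance ** 2) / (2 * TUNING_WIDTH_DEG ** 2))


def perceived_direction(gaze_deg, gains):
    """Response-weighted mean of the channels' preferred directions (degrees)."""
    responses = {
        name: channel_response(gaze_deg, preferred, gains[name])
        for name, preferred in PREFERRED_DEG.items()
    }
    total = sum(responses.values())
    return sum(PREFERRED_DEG[name] * r for name, r in responses.items()) / total


BASELINE_GAINS = {"left": 1.0, "direct": 1.0, "right": 1.0}
# Adaptation to rightward gaze is modeled as a reduced gain on the "right" channel.
ADAPTED_GAINS = {"left": 1.0, "direct": 1.0, "right": 0.5}

if __name__ == "__main__":
    test_gaze = 5.0  # eyes deviated slightly to the right
    print("before adaptation:", perceived_direction(test_gaze, BASELINE_GAINS))
    print("after adaptation: ", perceived_direction(test_gaze, ADAPTED_GAINS))
```

In this sketch, a test stimulus deviated 5° to the right is read out as rightward before adaptation and as much closer to straight ahead afterward, which mirrors the direction of the reported aftereffect. The numerical values carry no empirical meaning.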
As with macaques, the human brain region that is responsive to perceived gaze direction is the STS area, with both dynamic (Hooker et al., 2003; Puce et al., 1998) and static face displays (Hoffman & Haxby, 2000). This activation does not appear to depend on the presence of a face per se because averted eyes viewed in isolation are sufficient to modulate brain activity (Puce, Smith, & Allison, 2000). In addition, the STS is more responsive to eye movements that provide meaningful directional information compared with other gaze shifts (e.g., cross-eyed; Hooker et al., 2003). It is interesting to note that neural activity in response to faces with deviated gaze is modulated depending on whether the gaze is directed toward an object or toward empty space (Pelphrey, Singerman, Allison, & McCarthy, 2003). This implies that gaze processing is influenced by the perceived goal of the action and therefore is context sensitive. Viewing faces with direct and, in particular, averted gaze compared with those with eyes closed activates some of the same brain areas that are involved in tasks that require the attribution of other people’s intentions and beliefs (Calder et al., 2002; Castelli, Frith, Happé, & Frith, 2002). These findings are in line with Baron-Cohen’s (1995) proposal that encoding of another’s eye-gaze direction is an integral part of a theory of mind.
The STS is part of a wider network for social perception that embodies other aspects of face perception, including the processing of face identity. Haxby and colleagues (Haxby, Hoffman, & Gobbini, 2002; Haxby et al., 1999; Hoffman & Haxby, 2000) proposed that these different functions (i.e., encoding of face identity and of face properties that are important for social communication such as gaze perception) are distinct cognitive aspects of face perception that are also anatomically dissociable, taking place in lateral fusiform gyrus and STS, respectively (see also Hasselmo, Rolls, & Baylis, 1989; Kanwisher, McDermott, & Chun, 1997; McCarthy, Puce, Gore, & Allison, 1997). For example, when participants are instructed to attend to the identity of face stimuli, a stronger response is evoked in the fusiform gyrus than in STS. When the task requires attention to the direction of gaze of a face, the STS region is activated more strongly than is the fusiform gyrus (Hoffman & Haxby, 2000). This anatomical distinction lends support to the idea that aspects of face perception that are changeable, are communicative, and therefore require continual on-line monitoring (e.g., emotional expression, eye gaze) are processed in functionally separate systems to those that involve analysis of invariant features (e.g., identity and gender; Andrews & Ewbank, 2004; Bruce & Young, 1986; Haxby et al., 2002; Hoffman & Haxby, 2000; but see Calder & Young, 2005, for arguments that suggest that the issue is not by any means settled).
Further input-output connections from the STS project to the amygdala, a structure of the limbic system that is heavily implicated in the processing of the emotional content of stimuli, including facial expressions, and in linking this information to emotional responses in the observer (Aggleton, 1993; Aggleton, Burton, & Passingham, 1980; Thomas et al., 2001). Lesions of the amygdala result in deficits in judgments of both gaze direction and facial expression (Aggleton, 1993; Young et al., 1995), suggesting that it plays a critical role in both tasks. The role of the amygdala in gaze monitoring has been highlighted by several recent functional neuroimaging studies, which showed that amygdala activity occurs in response to passive viewing of direct and averted gaze (Wicker, Michel, Henaff, & Decety, 1998) as well as when active detection of eye contact versus deviated gaze is required (Kawashima et al., 1999), even when the face holds a neutral emotional expression. A further study by Hooker and colleagues (2003) suggested that amygdala response to observed gaze reflects the observer’s monitoring for emotional gaze events (e.g., eye contact; see also Whalen, 1998).
The STS is also heavily connected with the parietal cortex, which is implicated in orienting of attention (Harries & Perrett, 1991; Rafal, 1996). Specifically, there are reciprocal connections between STS and the intraparietal sulcus (IPS), an area that is associated with spatial processing and covert shifts of attention (Corbetta, Miezin, Shulman, & Petersen, 1993; Nobre et al., 1997). Via these connections, information about eye-gaze direction could project to spatial attention systems to initiate orienting of attention in the corresponding direction, as in joint attention. Indeed, passive viewing of a face with averted gaze elicits a stronger response in the IPS than viewing a face with direct gaze (Hoffman & Haxby, 2000). In addition, activity in the STS and fusiform area is correlated with activity in the IPS when a face with deviated gaze is seen (George, Driver, & Dolan, 2001; Pelphrey et al., 2003; see also Wicker et al., 1998). The relation between gaze perception and spatial attention is apparent in recent behavioral studies that demonstrated that perceived gaze direction can trigger reflexive attention shifts in the corresponding direction in the observer.
A wide range of species, from black iguanas and hog-nosed snakes to nonhuman primates and humans, have a very accurate ability to determine whether they are being looked at (e.g., Burger, Gochfeld, & Murray, 1992; Burghardt & Greene, 1990; Perrett & Mistlin, 1991). The unique morphology of the human eye means that “useful information can be recovered from it with robust simple processing mechanisms” (Langton et al., 2000, p. 52). Compared with other primates, humans have a relatively small dark region (the pupil and iris) and large regions of white sclera to either side of the iris (Kobayashi & Kohshima, 1997). This makes the discrimination of gaze direction much easier in humans than in other animals. J. J. Gibson and Pick (1963) showed that the participant’s threshold for accepting truly deviated gaze as direct gaze is just 2.8°. However, the accuracy of discerning deviated gaze declines as the angle of gaze becomes smaller (e.g., 98% correct at 10° compared with 71% correct at 5°; R. Jenkins et al., 2006). Nonhuman primates such as adult rhesus monkeys can discriminate between photographs depicting direct gaze and gaze averted by 5°, the same ability that has been reported in human infants (Campbell, Heywood, Cowey, Regard, & Landis, 1990; Symons, Hains, & Muir, 1998).
When participants observe eyes with inverted polarity (i.e., dark sclera, light iris), gaze perception is severely disrupted (Ricciardelli, Baylis, & Driver, 2000; Sinha, 2000). That is, participants fail to report that the face is looking in the direction of the (now white) pupil, which is the same shape and size as before, but often report the direction of the (now dark) sclera (see Figure 1, compare Panels A and B). Ando (2002) showed that a similar effect is found when participants are presented with a face with direct gaze but either the left or right section of sclera is presented as grey (Figure 1, Panel C). However, as Ando (2002) noted, the geometry of the eye must also be important because an eye represented only by the outlines of an oval and a circle, with no luminance contrast at all, is sufficient to determine gaze direction (see Figure 1, Panel D). Ando suggested that both luminance and geometry may be important but that their processing demands may mean that luminance is computed quickly for a raw representation of gaze direction, whereas a geometric analysis of gaze direction is more resource consuming and more vulnerable to noise.
Higher level factors also influence where we think someone is looking. For example, people usually look at objects rather than empty space. Lobmaier, Fischer, and Schwaninger (2006) demonstrated this flexibility in the system by showing that the presence of an object near the line of observed sight causes the perceived gaze direction to gravitate toward the object. Hence, assumptions about where people tend to look override pure perceptual geometry (see also Todorovic, 2006, for further evidence that geometry cannot fully explain gaze perception). The perception of gaze direction can further be influenced by its face context. For instance, although people are generally highly accurate in assessing whether someone else’s gaze is directed at them, these judgments tend to err when the observed face is not oriented toward the observer but at an angle to the left or right (Anstis, Mayhew, & Morley, 1969; Cline, 1967; J. J. Gibson & Pick, 1963). Perception of gaze direction is also impaired when the face is inverted, a manipulation thought to disrupt holistic or configural processing (e.g., Vecera & Johnson, 1995; Yin, 1969). However, inverting the eye region alone (independently of the face context) impairs judgments of gaze direction, too, suggesting that configural processing of the eye region itself rather than, or perhaps in addition to, configural processing of the face as a whole contributes to perception of gaze direction (J. Jenkins & Langton, 2003).
Eye contact has profound effects on the receiver (e.g., Kleinke, 1986). Indeed, the ability to discriminate between direct and averted gaze that is found across different species may have evolved because direct gaze can signal that a predator is attending, making its detection an important tool for survival (Emery, 2000). Many animal species respond to the presence of staring eyes with displays of fear and submission, indicating that such stimuli act as warning cues (Gallup, Cummings, & Nash, 1972; Hennig, 1977; Ristau, 1991; Schwab & Huber, 2006; see also Beausoleil, Stafford, & Mellor, 2006). In humans, prolonged eye contact can also be perceived as an aggressive approach signal, as it leads to increases in galvanic skin response, as compared with observation of averted eye gaze in adults (Nichols & Champness, 1971).
Establishing eye contact also acts as a signal of attraction between people. For example, recent research has shown that when a person is seen to move their eyes to engage in eye contact, they are perceived as more likable and attractive than if they are seen to disengage eye contact (Mason, Tatkow, & Macrae, 2005). Mason et al. (2005) further found that the (female) faces shifting their gaze toward the observer were rated as more attractive by male participants but not by female participants (see also Vuilleumier, George, Lister, Armony, & Driver, 2005). A study by Jones, DeBruine, Little, Conway, and Feinberg (2006) showed that this kind of effect is modulated by emotional expression. That is, faces looking at the observer are perceived as more attractive when smiling than when holding a neutral expression, whereas faces looking away from the observer are perceived as less attractive when smiling than when holding a neutral expression. This shows not only that eye contact influences perception of another’s attractiveness but also that this effect is modulated by the social context of the judgment, that is, the perceived relationship between the observer and the observed party.
It is not surprising that people are highly sensitive to being attended to (i.e., gazed at) by others. The subjective feeling of being “looked at” is a common experience, suggesting that people may have a predisposition to the detection of the gaze of others. Such a predisposition may be supported by a dedicated module, an “eye direction detector,” for example (Baron-Cohen, 1995). During visual search, eyes that are looking at the observer are found more efficiently than are eyes that are looking elsewhere, suggesting that attention is guided or captured by direct gaze (Conty, Tijus, Hugueville, Coelho, & George, 2006; Senju, Hasegawa, & Tojo, 2005; von Grünau & Anston, 1995). Senju and Hasegawa (2005) showed that detection of peripherally presented targets was delayed when a central face stimulus was gazing ahead compared with when its gaze was averted or the eyes were closed. When a temporal gap was introduced between the offset of the face stimulus and the appearance of the target, this delay was no longer evident. Thus, it appears that direct gaze both captures attention and delays disengagement of attention from the face stimulus.
Seeing a face with direct gaze engages the observer’s attention, perhaps because of the social significance conveyed by eye contact (see Baron-Cohen, 1995). Indeed, activity in the fusiform area is enhanced when the observed face is looking at the observer compared with when its gaze is directed away from the observer, indicating that it receives preferential processing in the former condition (George et al., 2001). It is known that attended faces elicit stronger fusiform activity than do faces that are presented outside the focus of attention (Wojciulik, Kanwisher, & Driver, 1998). Thus, the differential fusiform activation observed by George et al. (2001) likely reflects attentional modulation. The notion that direct gaze facilitates processing of the observed face is supported by behavioral evidence showing improved performance on gender categorization and face recognition tasks when the face stimuli display direct rather than averted gaze (Hood, Macrae, Cole-Davies, & Dias, 2003; Macrae, Hood, Milne, Rowe, & Mason, 2002; Mason, Hood, & Macrae, 2004; see also Smith, Hood, & Hector, 2006, for a similar finding in children). Furthermore, when an approaching person is seen to suddenly shift gaze toward the participant, greater STS activity is elicited as compared with that when the person disengages mutual gaze (Pelphrey, Viola, & McCarthy, 2004). Similar effects have been found in the superior temporal gyrus, where greater activity is shown while making judgments of emotional expression when the face is making eye contact (Wicker, Perrett, Baron-Cohen, & Decety, 2003).
As well as modulating affective and neurophysiological responses to “lookers,” being looked at can influence seemingly unrelated behavior, even when the eyes are simple “eyespots” on a computer screen. Haley and Fessler (2005) showed that in an experimental economic game in which participants could choose whether to share money with their fellow participants, more money was shared by participants whose irrelevant computer backdrop contained schematic eye stimuli compared with a backdrop with no eyes present. In this situation, increased economic fairness and collaboration presumably resulted from the feeling of being watched.
As already noted, a strikingly strong sensitivity to eye gaze is observed from birth (Farroni, Csibra, Simion, & Johnson, 2002), and this sensitivity is key to the development of social cognition in early life (see Striano & Reid, 2006, for a recent review). The behavior of joint attention may emerge because of the activation of an innate module attuned to the visual appearance of eyes (Baron-Cohen, 1995) or because of a less dedicated mechanism that develops from associating high contrast black stimuli (pupils) with interesting objects and reward (Moore & Corkum, 1994). Young infants smile more at faces with visible eyes (Spitz & Wolf, 1946, as cited in Argyle & Cook, 1976) and even neonates prefer to gaze at a face with the eyes visible (Batki, Baron-Cohen, Wheelwright, Connellan, & Ahluwalia, 2000; Farroni et al., 2002). Evidence for enhanced processing of direct gaze, as compared with averted gaze, also comes from event-related potentials recorded from 4-month-olds (Farroni et al., 2002; Farroni, Johnson, & Csibra, 2004). Around this age, this sensitivity to direct gaze results in deeper face processing (Farroni, Massaccesi, Menon, & Johnson, 2007), just as it does in adults (Hood et al., 2003). At 5 months, infants can already discriminate between very small horizontal deviations (5°) of eye gaze (Symons et al., 1998). However, the ability to explicitly determine whether an adult is making eye contact, or where an adult is looking, may not develop until the age of 3 years (Doherty & Anderson, 1999).
Gaze following is the behavior of central relevance to this review. Orienting one’s own attention (overtly, through eye movements or head turns, or covertly, through a shift of spatial attention) to the direction of another’s gaze has been the subject of intense research in infant development. Scaife and Bruner (1975) found that infants reliably follow caregivers’ head turns within the 1st year of life, whereas Hood, Willen, and Driver (1998) showed that observing shifts of eye gaze by a face presented on a computer screen resulted in facilitated saccades to the direction of gaze in infants as young as 3 months old. One study has suggested that even neonates can follow gaze, as long as the pupils are seen to move (Farroni, Massaccesi, et al., 2004). Clearly, the capacity to use another person’s eye gaze as a cue to attention develops very early in life.
Despite its early emergence, the use of gaze cues by young infants seems to be based on rather low-level factors. As noted by Farroni, Massaccesi, et al. (2004), neonates must see the eyes move to be able to follow the gaze of another person. That is, if the eyes are seen statically looking left or right, joint attention is not established. Therefore, the depth to which the infant understands another person’s gaze behavior is unclear (see Butterworth, 1991; Moore & Corkum, 1994, for reviews). Further evidence for “naïve” gaze following that really relies on following motion comes from Moore, Angelopoulos, and Bennett (1997), who found that 9-month-old infants who had already developed gaze following could follow gaze on the basis of the observation of static stimuli. However, infants who had not yet developed spontaneous gaze following needed to see the motion of the head turn to learn to follow gaze—learning from static models was not found. Furthermore, if a gaze cue is produced by a lateral translation of the stimulus face independently of the pupils, such that the pupils are stationary but the facial movement results in averted gaze, 4- to 5-month-old infants orient to the direction of motion rather than to the opposite side of space cued by gaze (Farroni, Johnson, Brockbank, & Simion, 2000), whereas adults always orient to the direction of gaze and not head movement (Bayliss, di Pellegrino, & Tipper, 2005). The importance of motion to gaze following in infants was confirmed in another study that showed that eye contact prior to the motion of the pupils was necessary for early gaze following (Farroni, Mansfield, Lai, & Johnson, 2003). This suggests that gaze contact—engaging attention on the face—followed by pupil motion, is vital to give rise to the more complex behaviors involved in joint attention in infancy.
Although gaze cueing may be based on basic perceptual processes initially, infants soon learn to use gaze cues flexibly. Between the ages of 12 and 18 months, infants begin to show signs that they are not simply following the motion of head turns because when the actors’ eyes are closed during the head turn, gaze following occurs less often (Brooks & Meltzoff, 2002, 2005). Furthermore, if another attention-engaging facial expression (such as the expression of happiness or sadness) is made by the looking face, joint attention is less likely to be engaged (Flom & Pick, 2005). Brooks and Meltzoff (2002, 2005) and Flom and Pick (2005) found that 7-month-olds follow the gaze of a face with a neutral emotional expression more strongly than they do when the face looks happy or sad. Hence, although infantile joint attention may be based on simple mechanisms, it quickly becomes sensitive to the context in which the observed gaze behavior occurs. It is interesting that around the same age that this new flexibility emerges, other skills involved in face processing, such as recognition of facial identity and emotional expression, become more robust and sophisticated (see, e.g., Nelson, 2001, for a review). The flexible use of others’ gaze direction to orient attention may arise from higher level social cognition interacting with and building on a perhaps innate basic mechanism for analyzing gaze direction (cf. Baron-Cohen, 1995).
Despite emerging from arguably simple origins, gaze following has a remarkable influence in the development of higher level representations of other people’s perceptions. For example, orienting to the object of a caregiver’s attention might allow the speedy acquisition of nouns, through the pairing of an observed object and its vocalized name (Baldwin, 1995). Indeed, gaze following at 6 months has been shown to correlate with vocabulary size at 18 months (Morales et al., 2000; Morales, Mundy, & Rojas, 1998). The development of joint attention at 20 months can predict theory of mind abilities at 44 months (Charman et al., 2001), demonstrating the importance of gaze following in the development of social cognition. The development of joint attention behavior is also associated with an increase in frontal lobe activity, crucial for higher order representations (Mundy, Card, & Fox, 2000).
One fundamental concept that the developing child has to grasp to achieve successful joint attention is the understanding that people attend to their own actions. As such, observed gaze direction can be used as an indicator of another person’s future actions. In fact, some cells in monkey temporal cortex that respond to the observation of actions make this very distinction: They fire only when the action is attended to by the actor; if the actor is gazing away from their own hand, the cells do not respond (Jellema et al., 2000). These cells may therefore form part of an intentionality detection system. There is behavioral evidence supporting the idea that during observation of actions, the observed person’s motor intentions can be inferred by monitoring their eye gaze (that is, whether they are focusing on the target of their action; Castiello, 2003; see also Pierno, Becchio, et al., 2006). Discerning another’s intentions is a vital building block in social cognition. In line with this notion, Amano, Kezuka, and Yamamoto (2004) found that looking from an adult’s head to their hand was one of the first joint attention abilities that emerge in development (after around 3 to 4 months). This is certainly an important skill because using joint attention to understand and predict others’ actions is an integral part of theory of mind and social perception.
Another central part of joint attention is orienting to the object that is the current focus of another’s attention rather than showing a simple spatial orienting response in the general direction of observed gaze. It would be disadvantageous if an infant consistently failed to orient to the exact object that an adult is looking at. For example, language acquisition (specifically, naming objects) may be slowed if the referent of a novel word, as indexed by the speaker’s gaze, is not accurately identified. There is no clear consensus on the age at which infants stop simply orienting to the nearest object in the general vicinity of the adult and instead turn their attention more precisely toward the looked-at object (see Butterworth & Jarrett, 1991; Corkum & Moore, 1998; Moore & Corkum, 1994, for evidence that this does not happen until after at least 12 months). One study, however, has suggested that by 9 months, infants can correctly orient to an object even in the presence of closer-by distractor objects as long as the target object is in their field of view (Flom, Deák, Phill, & Pick, 2004). Further evidence for a relatively early appreciation of adults’ gaze behavior comes from Moll and Tomasello (2004), who showed that 12- and 18-month-olds will crawl a short distance to peek around a barrier (obscuring the infant’s view) to inspect what an adult is looking at.
Hence, the infant rapidly learns that gaze can provide very reliable information about another person’s likely object of reference or imminent action. Once these building blocks of joint attention are in place, children can begin to use other people’s orienting behavior for a more sophisticated “mind-reading” purpose (Baron-Cohen, 1994). Baron-Cohen, Campbell, Karmiloff-Smith, Grant, and Walker (1995) found that children aged 3 and 4 years can not only deduce the direction of gaze of a schematic face but also ascribe mental states such as desires on the basis of that direction. That is, if Charlie is looking toward the chocolate bar, Charlie wants the chocolate bar (see also Lee, Eskritt, Symons, & Muir, 1998). Thus, understanding that direction of gaze can indicate which objects a person knows exist, is currently attending to, and holds a mental state about can help a child infer much about the current visual world. Nevertheless, children of the same age have difficulty coping with conflicting information (Freire, Eskritt, & Lee, 2004; Pellicano & Rhodes, 2003). Pellicano and Rhodes (2003) found that when an arrow pointed in a different direction to the eyes, children were unable to correctly choose the object that was looked at by Charlie as the one he wanted. This suggests that children’s use of eye-gaze direction is vulnerable to interference at least into the 5th year of life.
By about the age of 5, children learn that the eyes can give information that people want to hide from them. That is, the eyes can help children detect deception, another vital step in the development of theory of mind (Baron-Cohen, 1992; Leslie, 1987). In the study by Freire et al. (2004), children ages 3-5 years were lied to by an adult who claimed that she did not know the true location of a toy, which was hidden under one of three cups. Meanwhile, the adult looked toward the cup hiding the toy. The 3-year-olds performed poorly, unable to use the eyes to infer the location of the toy, as if the verbal information dominated their decision. The 4- and 5-year-olds, however, performed well (Experiment 1). A further experiment found that children of these ages also act differently when verbal and gaze cues are in direct opposition: If the toy is in the looked-at cup but the adult says it is in a different location, 3-year-olds inspect the cup indicated verbally, whereas the 5-year-olds correctly inspect the looked-at cup (Experiment 3). It is interesting to note at this stage that children’s increasingly subtle and flexible use of gaze is reflected in recruitment of neural structures. That is, by at least 7 years of age, children activate similar brain regions to adults when analyzing gaze direction (Mosconi, Mack, McCarthy, & Pelphrey, 2005).
The development of gaze perception in infants may be due to the use of simple systems (gaze perception) by higher level systems dedicated to the establishment of a sophisticated picture of other people’s overt behavior, future intentions, and mental states. Gaze perception is crucial for joint attention and social cognition. Hence, should the normal development of these simple gaze perception and gaze-following mechanisms be impeded in any way, profound implications for social cognition could result and persist throughout development. This may indeed be the case in children with autism, a developmental disorder affecting social cognition.
The developmental disorder autism is characterized by a triad of symptoms that relate to poor social, communicative, and imagination skills in affected individuals (Baron-Cohen, 2000). Children with autism perform poorly on first-order tests of theory of mind (e.g., understanding that “Mary thinks the marble is in the basket”) and often fail on second-order tests (e.g., understanding that “Mary thinks that John thinks the marble is in the basket”) compared with normal children and children with Down syndrome (Baron-Cohen, 1989). Social interactions are also different from those of normally developing children, as there are fewer attention-sharing behaviors with other children and caregivers in children with autism (Sigman, Mundy, Sherman, & Ungerer, 1986). Shifts of attention are more often made between two (nonsocial) objects rather than between people (Dawson, Meltzoff, Osterling, & Brown, 1998; Swettenham et al., 1998). Imitation, another index of learning through experience sharing, is also impaired in children with autism (Charman et al., 2001, 1997; Stone, Ousley, & Littleford, 1997). Furthermore, whereas normally developing children use the speaker’s direction of gaze to infer the referent of a novel word, children with autism tend not to use gaze cues in this manner, which could play a role in the sometimes profound language deficits prevalent in this population (Baron-Cohen, Baldwin, & Crowson, 1997).
Thus, along with general learning, language, and IQ deficits, children with autism present a highly impaired cognitive profile. A variety of underlying problems have been postulated for the deficits found in autism. For example, theory of mind may not have developed fully (Baron-Cohen, 1989), emotional systems may be disrupted (e.g., the amygdala, Baron-Cohen et al., 2000), or abnormal function of the frontal lobes may lead to executive dysfunction (Hill, 2004). Other theories postulate that the presence of “islets of ability” or even superior performance in certain tests of cognitive ability demonstrates that a more comprehensive framework, for example based on weak central coherence, is necessary to explain autism (U. Frith & Happé, 1994). It has further been proposed that the autistic cognitive profile trades off empathizing skills for systemizing skills (Baron-Cohen, 2002; Baron-Cohen, Richler, Bisarya, Gurunathan, & Wheelwright, 2003; Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001). One major question of current research into autism is whether these two sets of indicators of autism (weak central coherence and weak theory of mind) are related or independent (e.g., Jarrold, Butler, Cottington, & Jimenez, 2000; Morgan, Maybery, & Durkin, 2003).
With regard to gaze processing, the evidence is clear that it is treated very differently by people with autism than by people without autism. When observing a face, normally developed adults tend to scan the eye and mouth region in a highly consistent manner (Mertens, Siegmund, & Grüsser, 1993). In contrast, people with autism often dislike and avoid eye contact (see Baron-Cohen, 1988; Dalton et al., 2005; Pelphrey et al., 2002). Normal children detect gaze contact quicker than averted gaze, whereas children with autism are equally quick at detecting either gaze type (Senju, Yaguchi, Tojo, & Hasegawa, 2003; Senju, Hasegawa, & Tojo, 2005). When the task demands exploring the eye region of a face, people with autism show greater galvanic skin response and greater neural activity in the fusiform gyrus and amygdala compared with that of control participants, suggesting eye-region avoidance is an arousal modulation strategy on the part of people with autism (Dalton et al., 2005). Such behavioral traits are mirrored in people with social phobia, in whom scanning of faces rarely includes the eye region (Horley, Williams, Gonsalvez, & Gordon, 2002; Larsen & Shackelford, 1996).
Normally developing children make eye contact more readily if a person is performing an ambiguous action than if the action is unambiguous, whereas children with autism make little eye contact whatever the action’s semantic context (Phillips, Baron-Cohen, & Rutter, 1992). Indeed, when normally developed adults see a person unexpectedly look away from an object, higher STS activity is elicited compared with when the person looks toward the object. This modulation of neural activity in relation to violation or confirmation of expected behavior is absent in autism (Pelphrey, Morris, & McCarthy, 2005; see Zilbovicius et al., 2006, for a review of the STS and autism). Ignoring contextual information can be seen as a symptom of weak central coherence (see Brosnan, Scott, Fox, & Pye, 2004). Furthermore, adults and children with autism are poor at attributing emotions to people on the basis of the eye region, something that normal attributers are proficient at (Baron-Cohen et al., 1995; Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001; Baron-Cohen, Wheelwright, & Jolliffe, 1997).
Electrophysiological evidence for deviant gaze processing was provided by Senju, Tojo, Yaguchi, and Hasegawa (2005; see also Grice et al., 2005), who found different brain responses for direct versus averted gaze in normally developing children, but such differences were absent in children with autism. Because these differences arise from processes thought to take place in occipitotemporal areas, it is interesting to note that children with autism appear to have lower gray matter densities in the STS (Boddaert et al., 2004), which is exactly the area considered to be responsible for gaze processing (Perrett et al., 1992; Wicker et al., 1998). This, along with structural differences in the cerebellum (Boddaert et al., 2004) and the frontal lobe (Hill, 2004) and functional differences in the amygdala (Baron-Cohen et al., 2000; Dalton et al., 2005), demonstrates the substantial anatomical divergence between people with and without autism.
Given the well-established patterns of gaze aversion, it is no surprise to find that joint attention is also impaired in people with autism (e.g., Charman et al., 1997; Dawson et al., 2004; Leekam, López, & Moore, 2000; Roeyers, van Oost, & Bothuyne, 1998). However, as with normal participants, better joint attention skills are associated with larger vocabularies, and fewer social and communicative difficulties, in people with autism, illustrating the vital importance of joint attention in the social development of children with autism as well as normally developing children (Charman, 2003). Furthermore, orienting to the direction of another’s gaze can occur at normal levels in children with high-functioning autism (for whom IQ is within the normal range), perhaps based on the same low-level motion cues from which joint attention develops in normally developing children (Chawarska, Klin, & Volkmar, 2003; Leekam, Hunnisett, & Moore, 1998; Swettenham, Condie, Campbell, Milne, & Coleman, 2003; but see Kylliäinen & Hietanen, 2004, who demonstrated gaze cueing in children with autism by using statically averted gaze). Thus, it is possible that although the low-level perceptual aspects of gaze cueing (e.g., motion, luminance contrast, and geometry of sclera and pupil) are intact in children with autism, it is the higher level social cognition skills (e.g., attribution of emotional states) or their interactions with those basic processes that are impaired; in other words, the flexibility with which normally developing children use gaze cues is lacking in autism.
Differences between children with and without autism are found not only in how they orient attention to social stimuli (e.g., Dawson et al., 1998; Swettenham et al., 1998) but also in other attentional abilities. Children with autism have been found to display normal or superior attentional processing in visual search and selective attention (Brian, Tipper, Weaver, & Bryson, 2003; O’Riordan, Plaisted, Driver, & Baron-Cohen, 2001), but slowed orienting of covert attention has also been noted in autism (Townsend, Courchesne, & Egaas, 1996). We will return to these issues when we have introduced the gaze-cueing paradigm more comprehensively, so that we can evaluate how this and other attention paradigms can contribute to knowledge about autism.
Theory of mind impairments often accompany the cognitive and affective deficits encountered in schizophrenia. For example, failing to correctly attribute the agent of an action is a feature of schizophrenia (C. D. Frith, Blakemore, & Wolpert, 2000). People with schizophrenia tend to misattribute actions of others to themselves, whereas normal participants are proficient at telling the difference between their gloved hand performing an action on a TV screen and the experimenter’s gloved hand performing the same action (Daprati et al., 1997). Self–other confusion is characteristic of mentalizing problems associated with schizophrenia (Langdon et al., 1997), and when processing facial affect, people with schizophrenia recruit premotor areas as opposed to the amygdala. This suggests that a “mirror system,” which links the internal representations of observed and executed actions, is hyperactive in people with schizophrenia as they process other people’s mental states (Quintana, Davidson, Kovalik, Marder, & Mazziotta, 2001). Such overactivation of a facial mirror system may contribute to the blurring of boundaries between potential mentalistic agents in the environment. Despite all this, the accuracy of eye direction determination is good in people with schizophrenia, for whom performance does not differ significantly from that of normal participants (Franck et al., 1998, 2002). This demonstrates that the lower level aspects of gaze perception are normal in people with schizophrenia. However, Langdon, Corner, McLaren, Coltheart, and Ward (2006) present evidence that gaze-following systems are somewhat overactive in people with schizophrenia.
Other syndromes that affect social cognition, such as Turner syndrome (Elgar, Campbell, & Skuse, 2002; Lawrence et al., 2003) and Williams syndrome, have also been investigated in gaze perception studies (Mobbs et al., 2004). Nevertheless, further work is needed with these populations to more comprehensively determine the effect these syndromes have on gaze perception.
The establishment of a dyadic joint attention relationship may be a behavior that originally develops from stimulus-response relationships and reinforcement, yet it is a higher level interpersonal skill that requires at least some level of theory of mind. To take Emery’s (2000) definition, “Joint attention requires that two individuals . . . are attending to the same object . . . based on one individual using the attention cues of the second individual” (p. 588). This definition demands that attention is directed to the appropriate feature of the environment, whereas gaze following is perhaps simple orienting to the appropriate region or hemifield. Shared attention is a higher state of the dyadic relationship whereby both individuals are attending to the same object, as with joint attention, but both are aware of each other’s attentional state (Emery, 2000). The subtle differences between gaze following, joint attention, and shared attention are highlighted not only by work with human infants but also in work with nonhuman primates.
For example, Myowa-Yamakoshi, Tomonaga, Tanaka, and Matsuzawa (2003) showed that chimpanzee infants (ages 10-32 weeks) have a preference for attending to direct human gaze. This result mirrors that of Batki et al. (2000) in 36-hour-old human infants. Furthermore, adult chimpanzees, like human infants, follow gaze direction to appropriate objects in the environment (Tomasello, Hare, & Agnetta, 1999). These animals have been shown to display behaviors that suggest they possess the ability to comprehend psychological states, such as understanding that, relative to themselves, a conspecific might have a different visual perspective, and hence might have access to different knowledge (Tomasello, Call, & Hare, 2003).
On the other hand, macaque monkeys (Deaner & Platt, 2003; Emery, Lorincz, Perrett, Oram, & Baker, 1997; Ferrari, Kohler, Fogassi, & Gallese, 2000; S. V. Shepherd, Deaner, & Platt, 2006) show gaze following and some aspects of joint attention but cannot use such cues to solve simple object-choice problems (Anderson, Montant, & Schmitt, 1996; see also Vick, Toxopeus, & Anderson, 2006). Tamarin monkeys (Santos & Hauser, 1999), squirrel monkeys, and capuchin monkeys (Anderson, Kuroshima, Kuwahata, & Fujita, 2004) seem to understand that the object of a human’s action is likely to be the object they are looking at as they perform the action (see also Jellema et al., 2000). However, baboons fail to use gaze cues unless these are perfect predictors of stimulus location (Fagot & Deruelle, 2002) or unless they have competitive motivation to do so (Vick & Anderson, 2003). These data suggest that joint attention abilities vary between species, with some primates (especially chimpanzees) using social gaze to higher levels than do others. Research is not restricted to primates. For example, domestic dogs show stronger patterns of gaze perception than do wolves (Miklósi et al., 2003; see also Gácsi, Miklósi, Varga, Topál, & Csányi, 2004; and Emery, 2000, for reviews on primates and other species). In some ways, the differences in primate use of gaze resemble early stages in human infant development, before the child learns to use those cues in a more flexible manner, and also stages at which some people develop difficulties with social cognition because of developmental disorders such as autism, adult-onset disorders such as schizophrenia, or difficulties in face perception caused by brain damage.
In summary, research on gaze processing has demonstrated that humans are highly adept at detecting and encoding other people’s eyes and direction of gaze in particular. This sensitivity develops very early in life and allows for the development of a plethora of other social and cognitive skills. Thus, the basic and perhaps innate skills that enable the analysis of another person’s gaze direction serve as building blocks for higher level social cognition abilities. In some clinical populations, this development is impaired, leading to profound social and cognitive difficulties. In the following sections, the mechanisms that may underlie this joint attention behavior are discussed.
We receive an abundance of visual information whenever our eyes are open, but not all of this input may be relevant for our current behavioral goals. Therefore, it is highly beneficial for our cognitive system to be able to select pertinent input for further processing by attending selectively to relevant aspects of the environment. Orienting of attention refers to the alignment of some internal mechanism with an external sensory input source that results in the preferential processing of that input. This article concerns visual attention specifically; therefore, the term orienting will henceforth refer to orienting to visual input. Orienting of attention may be elicited and controlled in different ways, and, consequently, a great deal of research has centered on attempts to distinguish between different types of orienting or orienting processes.
For example, Posner (1980) differentiated between overt and covert orienting. Overt orienting involves the directly observable orientation of sensory receptors and/or body parts toward a spatial location or object to enable better processing of the target stimulus. Thus, you may move your eyes and head toward an object of interest, which allows the visual input to be foveated and to receive optimal processing. Covert orienting refers to alignment of an internal mechanism with some sensory input in the absence of overt responses. Such “invisible” shifts of attention can be detected by using response accuracy or reaction times (RTs) as a measure of processing efficiency of a visual target. In the next section, we briefly review prior research investigating attentional orienting to set the scene for comparing and contrasting such orienting effects with those evoked by gaze cues.
Usually, some variation of the spatial cueing paradigm (Posner, 1980; Posner & Cohen, 1984; Posner, Nissen, & Ogden, 1978) is used to investigate covert orienting. In a typical example of this paradigm, participants are instructed to fixate on a marker at the center of the screen and to respond to the onset of a target stimulus that can appear to the left or right of the fixation marker by making a speeded keypress response. The onset of the target is preceded by some cue that elicits a shift of attention to either the left or right (see Figure 2). Faster RTs and/or more accurate performance with targets appearing in the previously cued location (compared with those in the uncued location) indicate attention shifts to the cued location.
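As an illustration of how such a cueing effect is typically quantified, the following minimal Python sketch computes the mean RT on invalid (uncued-location) trials minus the mean RT on valid (cued-location) trials. The function name, the trial format, and the RT values are hypothetical and chosen only for illustration; they are not taken from any of the studies cited here.

```python
from statistics import mean


def cueing_effect(trials):
    """Return mean RT on invalid trials minus mean RT on valid trials (ms).

    `trials` is a list of (validity, rt_ms) pairs, where validity is
    "valid" (target at the cued location) or "invalid" (target at the
    uncued location). A positive value means faster responses at the
    cued location.
    """
    valid = [rt for validity, rt in trials if validity == "valid"]
    invalid = [rt for validity, rt in trials if validity == "invalid"]
    if not valid or not invalid:
        raise ValueError("need at least one valid and one invalid trial")
    return mean(invalid) - mean(valid)


# Hypothetical RTs in milliseconds, invented purely for illustration.
example_trials = [
    ("valid", 310), ("valid", 330),
    ("invalid", 350), ("invalid", 370),
]
effect = cueing_effect(example_trials)
```

A positive difference would be read as evidence of an attention shift to the cued location; in practice the difference would be tested statistically across participants rather than read off a single set of trials.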
One way of distinguishing between different forms of orienting is to examine the effects of different types of attention cues, in other words, how attention is controlled. Such control is commonly assumed to manifest itself in two major types: (a) bottom-up (exogenous, reflexive, or stimulus driven), and (b) top-down (endogenous, voluntary, or goal driven).
Traditionally, bottom-up control is achieved by the capture and guidance of attention by events in the visual field, often in the periphery (Eriksen & Hoffman, 1974). Any dynamic perceptual change such as a sudden change in luminance, texture, motion, or depth automatically attracts attention (e.g., Oonk & Abrams, 1998; Yantis & Hillstrom, 1994). In the basic peripheral cueing paradigm (e.g., Posner & Cohen, 1984), two empty placeholder boxes are arranged to the left and right of the central fixation marker. The outline of one of the peripheral boxes is briefly brightened before a target appears randomly in either box after variable cue-target stimulus-onset asynchronies (SOAs). As soon as the target is detected, the participant responds by pressing a key. The abrupt increase in luminance of the peripheral box is assumed to trigger a reflexive attention shift to the cued location that should facilitate stimulus processing at that point in space (see Figure 2 Panel A). Indeed, RTs are faster when the target occurs in the box that had been brightened (i.e., cued) compared with targets in the opposite (uncued) box. This type of orienting occurs rapidly, even though the cue is not predictive of the actual target location. Furthermore, instructions to ignore the cue fail to disrupt the cueing effect, which is observed even if the target is more likely to appear in the uncued location (Jonides, 1981; Remington, Johnston, & Yantis, 1992). Thus, this kind of orienting is considered automatic and reflexive because it cannot be suppressed.
The initial beneficial effect of peripheral cues on target detection is short lived. Facilitation declines between 150 ms and 300 ms after cue onset (Müller & Findlay, 1988). Even more striking, this initial facilitation triggered by peripheral exogenous cues is overcome by inhibitory effects at longer cue-target intervals. That is, RTs to targets on valid trials (i.e., when the target appears in the cued location) are now slower than are responses on invalid trials (inhibition of return [IOR]; Maylor, 1985; Maylor & Hockey, 1985; Posner & Cohen, 1984). Posner and Cohen (1984) reasoned that this result reflects the operation of two distinct components of orienting. A sudden event in the environment triggers both facilitatory and inhibitory processes whose joint effect influences responses to targets in the environment (see also Maylor, 1985). If a target occurs in close temporal proximity to a peripheral event, facilitation dominates at the cued location, resulting in speeded detection of the target. Once attention is drawn to new locations, inhibition becomes evident at the previously cued location, expressed in elevated RTs. Posner and Cohen argued that such a two-fold orienting mechanism would aid the detection of new events in the environment by preventing attention from repeatedly returning to a location that has already been examined. Hence, this phenomenon has been termed “inhibition of return” (IOR) in reference to the presumed purpose of the inhibitory orienting mechanism (Posner, Rafal, Choate, & Vaughan, 1985).
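The reversal from facilitation to IOR amounts to a change in the sign of the cueing effect (invalid RT minus valid RT) as the SOA lengthens. The Python sketch below illustrates that logic under stated assumptions: the mean RTs, the SOA values, and the 5-ms tolerance are all hypothetical, and a real analysis would rely on inferential statistics rather than a fixed threshold.

```python
def classify_cueing(valid_rt_ms, invalid_rt_ms, tolerance_ms=5.0):
    """Label the direction of a cueing effect.

    The effect is invalid RT minus valid RT. The tolerance is an
    arbitrary illustrative threshold, not a statistical criterion.
    """
    effect = invalid_rt_ms - valid_rt_ms
    if effect > tolerance_ms:
        return "facilitation"          # faster at the cued location
    if effect < -tolerance_ms:
        return "inhibition of return"  # slower at the cued location
    return "no clear difference"


# Hypothetical mean RTs (valid, invalid) per SOA in ms, for illustration only.
hypothetical_mean_rts = {
    100: (320.0, 350.0),
    300: (340.0, 342.0),
    800: (365.0, 345.0),
}

time_course = {
    soa: classify_cueing(valid, invalid)
    for soa, (valid, invalid) in sorted(hypothetical_mean_rts.items())
}
```

With these invented numbers, the short SOA is labeled as facilitation and the long SOA as inhibition of return, mirroring the qualitative pattern described for peripheral cues.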
In contrast to this automatic control of attention, orienting in response to centrally presented symbolic cues appears to be, at least partly, under voluntary control (i.e., top-down). Such cues may be an arrow pointing in one direction (see Figure 2 Panel B) or other semantic cues such as a word indicating the likely target location (e.g., LEFT). What these central cues have in common is that, unlike peripheral cues, they do not directly indicate a spatial location but rather require interpretation. Jonides (1981) presented a central arrow that was, like peripheral cues, not predictive of the target location. He found no evidence for the rapid attention shifts associated with peripheral cues. In many subsequent studies, the central cue correctly predicted the target location on most trials to provide an incentive for the participant to orient in the direction of the cue, bringing orienting of attention under voluntary control. Under these circumstances, attention shifts were observed in response to central cues (e.g., Posner, Snyder, & Davidson, 1980).
However, more recent studies obtained cueing effects even with spatially nonpredictive arrow cues (e.g., Eimer, 1997; Hommel, Pratt, Colzato, & Godijn, 2001; Pratt & Hommel, 2003; Ristic, Friesen, & Kingstone, 2002; M. Shepherd, Findlay, & Hockey, 1986; Tipples, 2002). This suggests that orienting in response to symbolic cues is not entirely under strategic control. It appears that any reflexive component of symbolic cues may be evoked only when the cue is asymmetric (like an arrow), allowing spatial correspondence between the central cue and the target location to be automatically paired (Lambert, Roser, Wells, & Heffer, 2006; see also Lambert & Duddy, 2002). Nevertheless, there are differences in the effects produced by arrow and peripheral cues, respectively. As opposed to peripheral cueing, orienting evoked by the directional information of central cues can be suppressed if that information conflicts with task demands, indicating that orienting to central cues is less automatic than orienting to peripheral cues (Jonides, 1981; see also Friesen, Ristic, & Kingstone, 2004). Also, unlike peripheral cueing, attention shifts incited by central cues are susceptible to interference arising from processing demands of concurrent secondary tasks or orienting reflexes triggered by task-irrelevant peripheral events (Jonides, 1981; Müller & Rabbitt, 1989).
Different neural systems appear to be specialized in exogenous and endogenous control of attention. Exogenous orienting is assumed to be subserved largely by a posterior attention system involving subcortical structures such as the pulvinar and the superior colliculus (SC; Posner, Cohen, & Rafal, 1982; Rafal, Calabresi, Brennan, & Sciolto, 1989). Endogenous orienting is presumably supported more strongly by cortical areas in anterior (e.g., the cingulate gyrus and the supplementary motor area, which are involved in executive functions such as developing and maintaining expectancies; Carr, 1992; see also Corbetta et al., 1993) and posterior regions of the brain (e.g., intraparietal sulcus; Corbetta, Kincade, Ollinger, McAvoy, & Shulman, 2000). Both systems, which are specialized in bottom-up and top-down control, respectively, are assumed to interact such that salient sensory events can attract attention in a bottom-up fashion regardless of the ongoing task, thereby interrupting top-down control (Corbetta & Shulman, 2002).
The time courses of the attentional effects produced by peripheral and central cues, respectively, appear to be characteristic and different. Orienting in response to symbolic cues may arise more slowly than does orienting to peripheral cues (Müller & Rabbitt, 1989); whereas peripheral cues produce their maximum facilitatory effects at cue-target intervals of approximately 100 ms, the effects of central cues build up more gradually and achieve their largest effects at SOAs of circa 300 ms (Cheal & Lyon, 1991). Another distinction between the two forms of orienting is apparent in the maintenance of cueing effects across time. Facilitation effects triggered by central cues are sustained at optimum level beyond their peak activation at 300 ms, whereas it is around this time that the inhibitory effects of peripheral cues begin to reveal IOR.
Although spatial visual attention has been studied extensively over the past decades, the cues in these experiments were typically artificial and arbitrary (e.g., the brightening of the outlines of geometric shapes). In recent years, cognitive research has begun to use more naturalistic, social cues of attention that had been used by studies in developmental psychology for decades: the perceived direction of another person’s eye gaze. As noted, another’s eye-gaze direction communicates information about important events in the environment. People are typically looking toward the objects to which they are attending so that the relevant input receives optimal perceptual processing. Therefore, the encoding and interpretation of another person’s gaze direction enables the observer to detect that person’s focus of attention and to align their own attention accordingly.
In studying the precise cognitive mechanisms underlying attention shifts in response to observed eye-gaze direction, modifications of Posner’s cueing paradigm have been used (see Figure 3). Participants view a face stimulus at the center of the display. The gaze direction of that face substitutes for the peripheral onset or symbolic arrow cues used in previous studies of attention orienting. In one of the first investigations of eye-gaze cueing, Friesen and Kingstone (1998) explored whether observed gaze shifts, like traditional attention cues such as peripheral luminance increases or central arrows, produce orienting responses in adults. Participants were asked to respond to target letters that appeared to either the left or the right of a schematic face with varying SOAs after the pupils of the face appeared, constituting a directional gaze cue. The response required was either the mere detection of the target’s appearance or the indication of its location or its identity by pressing appropriate response keys. The eyes of the face looked either left, right, or straight ahead. On valid trials, the target appeared in the gazed-at location, whereas on invalid trials, it occurred in the opposite location. On neutral trials, the face gazed ahead, and the target appeared randomly on either side. Participants were informed that the direction in which the eyes looked was not predictive of the location or the identity of the target or of when it would appear. Thus, the eye gaze of the face was used as a centrally presented but spatially uninformative cue. The results of the experiment showed that RT was facilitated on valid-cue trials relative to neutral and invalid-cue trials, independent of response type. This cueing effect emerged relatively rapidly at short cue-target SOAs (105 ms in two response conditions and 300 ms in all response conditions) and disappeared with longer SOAs (1,005 ms).
Thus, another’s gaze shift results in a corresponding shift of attention in the observer, which has been labeled reflexive and therefore likened to orienting in response to peripheral cues.
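To make the notion of a spatially uninformative gaze cue concrete, the following Python sketch builds a trial list in which gaze direction (left, right, straight) is fully crossed with target side and SOA, so that valid and invalid trials are equally frequent. This is only a sketch under stated assumptions: the function name, the dictionary keys, the equal number of neutral trials, and the number of repetitions are hypothetical, and only the three SOA values mentioned above are used.

```python
import itertools
import random
from collections import Counter

GAZE_DIRECTIONS = ("left", "right", "straight")
TARGET_SIDES = ("left", "right")


def build_trials(soas_ms, repetitions=1, seed=0):
    """Fully cross gaze direction, target side, and SOA.

    Because every gaze direction is paired equally often with each
    target side, gaze does not predict where the target will appear.
    """
    trials = []
    for gaze, target, soa in itertools.product(
        GAZE_DIRECTIONS, TARGET_SIDES, soas_ms
    ):
        if gaze == "straight":
            validity = "neutral"   # face gazes ahead
        elif gaze == target:
            validity = "valid"     # target at the gazed-at location
        else:
            validity = "invalid"   # target at the opposite location
        for _ in range(repetitions):
            trials.append(
                {"gaze": gaze, "target": target,
                 "soa_ms": soa, "validity": validity}
            )
    # Shuffle with a fixed seed so the order is reproducible.
    random.Random(seed).shuffle(trials)
    return trials


# SOA values taken from the text; the repetition count is an assumption.
trials = build_trials(soas_ms=(105, 300, 1005), repetitions=2)
counts = Counter(trial["validity"] for trial in trials)
```

In this design the probability that the target appears at the gazed-at side, given averted gaze, is 0.5, which is what makes any RT advantage on valid trials informative about reflexive orienting.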
A separate study by Driver et al. (1999) used photographs of a face whose eyes were looking to the right or to the left as a central, spatially uninformative cue. Participants were required to discriminate a target letter that could appear on either side of the face after 100, 300, or 700 ms. The pattern of results they obtained was comparable with the findings of Friesen and Kingstone (1998). RTs were significantly faster on valid compared with invalid trials at 300- and 700-ms SOAs, even though the direction of gaze was entirely nonpredictive of target location or identity. Indeed, in one of their experiments (Experiment 3), participants were informed that the target was four times as likely to appear at the uncued side so that they would endogenously orient away from the gazed-at location. Under these circumstances, facilitation was still obtained for the cued location, but only at the 300-ms SOA. At the later interval, a trend toward facilitation at the expected target side emerged, suggesting that participants were able to eventually voluntarily shift their attention in that direction (see also Downing, Dodds, & Bray, 2004; Friesen et al., 2004). Together, these studies demonstrated automatic shifts of attention in response to central, spatially unpredictive cues that apparently cannot be suppressed at short SOAs.
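The counterpredictive manipulation can be restated as probabilities: if the target is four times as likely to appear at the uncued side as at the cued side, the implied probabilities follow directly from the odds. The short Python sketch below performs that conversion; the function name is hypothetical, and the calculation assumes that the target always appears at one of the two sides.

```python
from fractions import Fraction


def location_probabilities(odds_uncued_to_cued):
    """Convert odds (uncued : cued) into (p_cued, p_uncued).

    Assumes the target always appears at exactly one of two locations.
    """
    odds = Fraction(odds_uncued_to_cued)
    p_cued = 1 / (1 + odds)
    return p_cued, 1 - p_cued


# "Four times as likely to appear at the uncued side" as stated in the text.
p_cued, p_uncued = location_probabilities(4)
```

Under this assumption the cue is valid on one fifth of trials, so a participant using the cue strategically should orient away from the gazed-at side.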
Gaze cues also trigger automatic overt orienting responses (Mansfield, Farroni, & Johnson, 2003). Mansfield et al. (2003) recorded eye movement latencies to a target presented to the left or right of a face with averted gaze. Reliable facilitation effects were obtained at a fixed SOA of 300 ms. It is interesting to note that observing averted gaze could also elicit spontaneous saccades in the direction of the cue prior to target onset, even though participants were instructed to fixate on the center during this period (however, see Itier, Villate, & Ryan, 2007, who showed that when overt attention is voluntarily directed away from the eyes by task instruction, such spontaneous gaze following is less frequent, presumably because the gaze direction signals are weaker when presented further from the fovea). This suggests that observing another’s gaze shifts may evoke a similar motoric program in the observer. Such simulated or “mirrored” activations of motor programs by the mere observation of actions have previously been reported with hand reaching and grasping actions (e.g., di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992; Grafton, Arbib, Fadiga, & Rizzolatti, 1996). The notion that a similar mirror system may also exist for the oculomotor domain is supported by the finding that similar cortical regions are recruited during execution and observation of eye movements (Grosbras et al., 2005). The involuntary cue-driven saccades recorded by Mansfield et al. did not, however, account for their observed cueing effects because the results for target-driven saccades were the same when cue-saccade trials were excluded.
In a somewhat different task, Ricciardelli, Bricolo, Aglioti, and Chelazzi (2002) investigated whether seen gaze can interfere with goal-driven saccades. In their experiments, potential saccade targets were presented to the left or right of fixation. An instruction cue signaled that a saccade was to be made to one of those targets. A distractor face with averted gaze was then displayed at the center. Saccadic performance to the target was less accurate when the gaze cue was incongruent with the saccade instruction (i.e., when the face gazed at the nontarget). This effect was less pronounced if an arrow was used as the distracting direction indicator. Taken together, these two studies demonstrated that observation of averted gaze can trigger both covert and overt automatic orienting responses (see also Friesen & Kingstone, 2003a). Furthermore, observing gaze, but not arrow cues, evokes a tendency to execute saccades in the corresponding direction, imitating the observed behavior.
Early demonstrations of gaze cueing consistently lacked evidence of IOR. Friesen and Kingstone (2003b) used an elegant procedure showing that facilitation effects of gaze cueing and inhibition effects of peripheral cueing co-occur at the same SOA and at different locations in response to the same stimulus. In their study, four empty circles were presented. The features of a cartoon face, with its gaze straight ahead or averted, then appeared abruptly in one of the circles. Thus, the same stimulus could serve as a directional gaze cue as well as a sudden-onset cue. RTs were fast when the target appeared in the gazed-at location (facilitation effect) but were slow when it occurred in the location that was cued by the sudden onset of features (IOR). Critically, the magnitude of the IOR effect was unaffected by whether a gaze cue was concurrently presented. Because the magnitude of IOR is known to decrease when distributed over several locations (e.g., Tipper, Weaver, & Watson, 1996), Friesen and Kingstone (2003b) argued that IOR and gaze-triggered facilitation effects were separate and independent phenomena and that gaze cues do not elicit inhibition processes. However, facilitation and inhibition processes have been shown to co-occur even when triggered by the same type of cue (Danziger & Kingstone, 1999). Therefore, the coexistence of gaze cueing and peripheral cueing effects may not be a reliable indicator of a lack of IOR with gaze cues.
Friesen and Kingstone (2003a) also provided evidence that gaze cueing and IOR are subserved by different neural systems. They found that unlike IOR, gaze cueing does not interact with an attention phenomenon known as the gap effect.2 This term refers to speeded responses to a peripheral target when the onset of the target is preceded by the offset of the fixation stimulus, compared with those of conditions in which the fixation point remains visible during presentation of the target (Fischer & Ramsperger, 1984). Like IOR, the gap effect is thought to be mediated by the SC (Dorris & Munoz, 1995; Munoz & Wurtz, 1995a, 1995b; Schiller, Sandell, & Maunsell, 1987; Sparks & Mays, 1983). The lack of an interaction between gaze cueing and the gap effect suggests that in contrast to IOR, the SC is not directly implicated in gaze cueing. Rather, the SC may be activated only as a consequence of the engagement of attentional networks (Nummenmaa & Hietanen, 2006) as revealed by saccade curvature analysis (e.g., Sheliga, Riggio, Craighero, & Rizzolatti, 1995; Tipper, Howard, & Paul, 2001).
Nevertheless, IOR is a multifaceted phenomenon that implicates various cortical areas as well as established subcortical structures such as the SC (e.g., Dorris, Klein, Everling, & Munoz, 2002; Ro, Farné, & Chang, 2003; Tipper, Weaver, Jerreat, & Burak, 1994; Vivas, Humphreys, & Fuentes, 2003) and is not necessarily restricted to one type of cueing. Indeed, although initial studies failed to demonstrate IOR with centrally presented symbolic cues such as arrows (e.g., Posner & Cohen, 1984), inhibition is obtained with these cues if the observer’s oculomotor system is activated (Abrams & Dobkin, 1994; Posner et al., 1985). For example, Rafal et al. (1989) compared cueing effects in response to peripheral and central cues while manipulating eye movement responses to the cue: Participants were instructed to either keep fixated at the center, to execute a saccade in the direction of the cue, or to prepare a saccade to the cued location that was to be executed only if a target appeared subsequently (a brightening of the central fixation box otherwise signaled that the saccade should be cancelled). Inhibition at the cued location was observed in all peripheral cue conditions but also emerged in the central cue conditions that required direct activation of the oculomotor system (i.e., saccade execution and preparation). Akin to these experiments, gaze studies present the cue at the center of the display, and saccades are evoked in response to the cue but typically suppressed (Ricciardelli et al., 2002). It is therefore puzzling that IOR should not be observed in response to these cues.
In a series of experiments, Frischen and Tipper (2004) demonstrated that inhibition effects can be obtained in response to eye-gaze cues. They noted that previous demonstrations of gaze-cueing effects possessed experimental features associated with failures to obtain IOR effects: Usually, the gaze cue was presented until target appearance. However, even with peripheral sudden-onset exogenous cues, IOR is not observed if there is a temporal overlap between cue and target onset (Collie, Maruff, Yucel, Danckert, & Currie, 2000; Maruff, Yucel, Danckert, Stuart, & Currie, 1999). Furthermore, Posner and Cohen (1984) proposed that attention needs to be withdrawn from the cued location for inhibition to be observable (see also Danziger & Kingstone, 1999). This is usually achieved by presenting a second cue at fixation, a feature that was also lacking in gaze-cueing paradigms. Frischen and Tipper (2004) attempted to draw attention away from the gazed-at location by shifting the gaze of the face back to the center and by offsetting the face stimulus between cue and target appearance. In these conditions, they did indeed observe reliable inhibition effects, but only at a considerably extended SOA of 2,400 ms, whereas at shorter intervals of 1,200 ms, no cueing effects were observed (cf. Friesen & Kingstone, 1998; Langton & Bruce, 1999). This gaze-evoked IOR contrasts dramatically with that evoked by peripheral cues, with which IOR is usually observed from around 200 ms following cue onset and reliably within 1,000 ms (see Samuel & Kat, 2003, for a review and meta-analysis). Frischen and Tipper (2004) speculated that another’s gaze direction is a very powerful attentional cue potentially signaling important events in the environment so that the observer is reluctant to withdraw attention from the gazed-at location. 
At 1,200 ms, concurrent facilitation and inhibition processes may cancel each other out so that the observed net effect shows no difference between valid and invalid trials (cf. Danziger & Kingstone, 1999; Posner & Cohen, 1984). Thus, only after a fairly long interval does inhibition become dominant and reveal itself behaviorally.
The observation of IOR with gaze cues seems to depend on an active trigger to draw attention away from the gazed-at location. Frischen, Smilek, Eastwood, and Tipper (in press) observed inhibition at a 2,400-ms SOA only when the face stimulus disappeared between cue and target onset, but no significant effect emerged at the same interval when the face remained in view throughout the trial. When a sudden luminance change around a visible face stimulus was used as a perceptual cue between gaze cue and target onset, inhibition was again evident at a 2,400-ms interval. Again, however, no cueing effect was observed at a shorter SOA (1,200 ms). Together, these findings show that gaze cueing results in both prolonged facilitation and a delayed onset of inhibition processes at the gazed-at location.
In summary, gaze cueing is a very robust phenomenon that in some ways resembles the effects traditionally obtained with peripheral sudden-onset cues. Perceiving another’s gaze direction reliably triggers both covert and overt shifts of attention in the corresponding direction. Given that eye gaze is typically seen in the context of a face, we will now examine the extent to which gaze cueing is affected by face perception, and vice versa.
The perception of another person involves many different facets of face processing, from lower level configural aspects such as head orientation, to semantic representations such as face identity, and to affective and social inferences such as liking and personality judgments. In this section, we review the literature on how these different aspects of face perception influence gaze following as well as to what extent the perception of another person is affected by their direction of gaze.
A number of studies have investigated the relationship between gaze and head orientation in cueing attention. Neurophysiological evidence showed that cells in the macaque STS code for head and gaze direction in the perception of social interactions and that cells coding for gaze direction are dominant in determining the neural response (Perrett et al., 1992). In other words, when presented with both gaze and head direction as cues for the other’s focus of attention, gaze takes precedence over head direction. Accordingly, Perrett and colleagues (1992) proposed that directional information from gaze, head, and body cues is combined hierarchically in a mechanism dedicated to detecting another’s direction of attention (Perrett, Smith, Potter, et al., 1985). Nevertheless, recall that behavioral studies showed that the perception of gaze direction is affected by the orientation of the gazing head (Anstis et al., 1969; Cline, 1967; J. J. Gibson & Pick, 1963). Similarly, judgments of both head and gaze direction are equally impaired when they are incongruent, implying that head orientation plays a larger role in discerning social attention than was suggested by Perrett’s model (Langton, 2000; see also Langton & Bruce, 2000). Indeed, there is evidence that gaze and head orientation interact dynamically rather than in a strictly hierarchical fashion in triggering orienting responses in the observer.
Langton and Bruce (1999) demonstrated that, like gaze direction, head orientation yields robust cueing effects. They presented head stimuli that were averted to the left or right (full profile view), up or down. RTs to targets that appeared at congruent locations were faster than those to targets at incongruent locations. Whereas head and gaze direction were always compatible in their study (that is, the face was looking in the same direction that the head was turned toward), Hietanen (1999) manipulated both cues independently. In apparent conflict with Langton and Bruce (1999), he found that congruency of head and eye orientation reduced the cueing effect when the faces were averted by 30°. Similar results were also obtained when head and body orientation were used as cues; cueing effects were observed with incongruent but not congruent signals (Hietanen, 2002). This may appear counterintuitive, as one would expect combined cues to result in stronger orienting responses. However, Hietanen argued that in the congruent condition, the eye gaze is not “averted” in terms of the observed person’s frame of reference; that is, although the head is averted, the face is in fact gazing straight ahead. When the gaze of the laterally averted face is shifted toward the opposite direction, the observer’s attention is oriented accordingly (see Figure 4). This suggests that the other person’s direction of attention is computed before being related to the observer’s own frame of reference. Furthermore, when another person is facing away from the observer (as in the congruent head and gaze condition), no interaction between both parties is established. As a consequence, the other’s behavior is perceived to be unrelated to the observer so that their direction of attention has less signal value.
Orienting in response to another’s direction of attention therefore appears to be sensitive to social context in terms of their relation with the observer, which in turn is conveyed by a dynamic interaction between perceived gaze, head, and body orientation.
Whereas averting the head laterally clearly changes the perceived social interaction between the observed person and the observer, head orientation can also be manipulated in a nonsocial manner by turning the stimulus upside down. Inverting the face has been used extensively in face perception research, as it is known to disrupt holistic processing, which is assumed to underlie face recognition (Bartlett & Searcy, 1993; Yin, 1969). As noted before, face inversion can also affect gaze processing (J. Jenkins & Langton, 2003). This suggests that gaze and face processing interact at some level. If this interaction takes place at a level prior to the engagement of attentional orienting mechanisms, then gaze cueing could be disrupted. In paradigms using inverted faces, Kingstone, Friesen, and Gazzaniga (2000) and Langton and Bruce (1999) did produce some evidence supporting this idea in which gaze cueing was abolished when faces were inverted (however, see Tipples, 2005, who demonstrated that such an interaction between face orientation and gaze cueing is not mandatory).
Rotating the face isomorphically (i.e., clockwise or counter-clockwise from the upright) serves a different purpose than does using a laterally averted face: Rather than acting as an additional attentional cue (cf. Langton & Bruce, 1999), the face provides a context for a head-centered frame of reference. Previous research suggested that face stimuli are encoded in terms of intrinsic head-centered representations (Hommel & Lippa, 1995; Proctor & Pick, 1999). For example, the viewed left side of a face is encoded as the left side regardless of its orientation in space. Thus, the left cheek of an upside-down face would be encoded as the left part of the face even though it is in fact presented on the right side of the stimulus in pure spatial terms. This might explain why gaze-cueing effects are sometimes disrupted when the face stimulus is presented upside down (Kingstone et al., 2000; Langton & Bruce, 1999) as attention would be simultaneously biased toward the actual gazed-at side (spatial frame of reference) as well as the opposite side that would be gazed at if the face was upright (head-centered frame of reference, see Figure 5, Panel C).
To investigate how such head-centered frames of reference would interact with spatial frames of reference, Bayliss, di Pellegrino, and Tipper (2004) used a face that was rotated 90° clockwise or anticlockwise. The face would in fact be gazing toward the top or the bottom, but the detection targets would appear to the left or right of the face. Hence, the targets were never directly looked at. Without an active head-centered frame of reference, gaze-cueing effects would be impossible. However, they did find significant cueing effects as if the face had been presented upright. For example, if the face was rotated clockwise 90° from the upright position and gazed toward the top of the display, responses were faster for left targets compared with right targets (see Figure 5, Panel B). Bayliss and Tipper (2006a) further showed that viewing a rotated face simultaneously induces shifts of attention in the direction where the face would have been looking if it had been presented upright (i.e., left or right) and toward the actual spatial direction of gaze (i.e., up or down). This showed that head orientation influences gaze cueing such that attention is biased in accordance with the canonical view of the stimulus as well as in purely spatial frames of reference. It should also be noted that observed gaze direction induces spatial “Simon” effects (Simon, 1969; Zorzi, Mapelli, Rusconi, & Umilta, 2003). That is, although irrelevant to the task, the direction of observed gaze facilitates responses made by the corresponding hand. For example, a face looking left will result in quicker left-handed responses (as compared with right-handed responses) made to another feature (e.g., the color of the iris). It is interesting to note that head-centered gaze effects as reported by Bayliss et al. (2004; Bayliss & Tipper, 2006a) generalize to this procedure (Ansorge, 2003). 
Finally, in further support for the representation of gaze in a head-centered frame of reference, Seyama (2006) showed that adapting to a gaze direction presented in misoriented faces (e.g., via repeated exposure to the face oriented 90° clockwise shown in Figure 5b, in which the inferred direction of gaze is to the left in head-centered coordinates) influences judgments of gaze direction in upright faces (e.g., direct gazing faces are perceived as looking to the right).
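The relation between the spatial and the head-centered frames of reference can be made concrete with a small sketch. It assumes that gaze is restricted to the four cardinal screen directions and that the face is rotated in the picture plane by multiples of 90°; the function and constant names are hypothetical and serve only to illustrate the geometry, not any model proposed in the studies above.

```python
# Cardinal screen directions listed in clockwise order.
DIRECTIONS = ["up", "right", "down", "left"]


def head_centered_gaze(screen_gaze, face_rotation_cw_deg):
    """Map a gaze direction on the screen to the direction the
    face would be gazing if it were shown upright.

    screen_gaze: one of DIRECTIONS (spatial frame of reference).
    face_rotation_cw_deg: clockwise rotation of the face from
        upright; negative values mean anticlockwise rotation.
    """
    if face_rotation_cw_deg % 90 != 0:
        raise ValueError("only multiples of 90 degrees are handled")
    steps = (face_rotation_cw_deg // 90) % 4
    # Undo the rotation by stepping anticlockwise through DIRECTIONS.
    return DIRECTIONS[(DIRECTIONS.index(screen_gaze) - steps) % 4]


# Face rotated 90 degrees clockwise, gazing at the top of the display:
# head-centered gaze is to the left (cf. Figure 5, Panel B).
print(head_centered_gaze("up", 90))      # left
# Upside-down face gazing to the right of the display:
# head-centered gaze is to the opposite side.
print(head_centered_gaze("right", 180))  # left
```

On this description, a rotated face provides two candidate directions at once, the screen direction and the head-centered direction, which is the situation in which simultaneous attentional biases were reported.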
Despite surviving significant perceptual modifications of the face stimulus, such as changes in orientation, gaze cueing can be sensitive to top-down modulation. Ristic and Kingstone (2005) presented an ambiguous stimulus that could either be perceived as a cartoon face with a large hat (in which case the critical features would serve as gazing eyes) or a cartoon car (in which case the same features would serve as wheels). Robust cueing effects were obtained only when the stimulus was referred to as a face, but not when it was perceived as a nonface object. With the same stimuli and instructions, Kingstone, Tipper, Ristic, and Ngan (2004) showed that activity in STS was higher in the face than in the car condition.
Although gaze perception and gaze cueing may be influenced by both bottom-up and top-down changes to the face context, they do not necessarily depend on face perception. That is, gaze cueing can be triggered even when eyes are presented alone (Kingstone et al., 2000). Local processing of the eye stimuli themselves may be more critical in determining the magnitude of the gaze-cueing effect than is the face context. For example, larger areas of white sclera may enhance gaze cueing (a factor that may be important when observing a fearful face; Tipples, 2005, 2006). Furthermore, equivalent gaze-cueing effects are found with a wide range of face stimuli, such as impoverished schematic drawings (e.g., Friesen & Kingstone, 1998; Ristic et al., 2002), computerized faces (e.g., Bayliss et al., 2004, 2005), or rich photographs of faces (e.g., Bayliss & Tipper, 2005; Frischen & Tipper, 2004, 2006; Langton & Bruce, 1999).
Frischen and Tipper (2004) further showed that the gaze-cueing effect is not modulated by the identity of the face. In their study, equivalent cueing emerged whether the same face was presented throughout hundreds of trials or whether each trial showed a different face. Hence, gaze cueing is neither abolished nor potentiated by the novelty of the gazing face. Frischen and Tipper (2004) also examined whether encoding of attentional states associated with a particular object was underlying the gaze-evoked inhibition effect they had obtained. Recall that in their experiments, the face stimulus gazed to one side and then disappeared to reappear a short while later gazing straight ahead. It was possible that the attentional processes activated by the gaze shift were retrieved when the same face was presented again and that this influenced responses to the target. However, equivalent cueing effects were observed when different faces were displayed during cue and target presentation and even when a completely unrelated nonface object followed the face producing the gaze cue. Thus, even the “longer term” IOR effect was not coupled with a particular face identity.
Hence, the processing of face identity does not seem to influence gaze cueing. However, further work has shown that gaze cueing can indeed be linked to processing of specific face identities under certain circumstances. Frischen and Tipper (2006) used a gaze-cueing paradigm with distinct face stimuli, half of which depicted famous people, as a means to produce distinctive episodes that could be retrieved from memory. Participants viewed a face shifting its gaze to one side and then processed a further 40 different faces shifting their gaze before the original face was presented again after approximately 3 minutes. In this second presentation, the face was gazing straight ahead, and a target was presented on the previously gazed-at or ignored side of the face. Under these circumstances, reliable gaze-cueing effects emerged for famous faces and for the left visual field. This is in spite of the fact that the target appeared 3 minutes after the gaze cue. These long-term gaze-cueing effects require gaze and face identity to be jointly encoded and later retrieved from memory. Kessler and Tipper (2004) proposed that long-term cueing effects are most likely to occur under optimal processing conditions that facilitate encoding and retrieval of the episode. Encoding of gaze direction (e.g., Kingstone et al., 2000, 2004), faces, and famous faces in particular (Tranel, Damasio, & Damasio, 1997) is predominantly achieved in the right hemisphere. Accordingly, the most robust long-term cueing effects are observed when famous faces orient attention via a gaze shift toward the left visual field, projecting to the right cerebral hemisphere. The specificity of the long-term gaze-cueing effect to the left visual field lends further support to the notion that attentional states can be encoded along with a particular object.
In particular, attention shifts that are evoked via observed gaze can be associated with the specific face producing the gaze cue, but only in specific experimental contexts.
In support of a role for identity processing under some circumstances in gaze cueing, Deaner, Shepherd, and Platt (2007) showed stronger cueing for personally familiar faces even in a short-term cueing task. Of interest, this was only found with female participants, whereas the cueing effects of the male participants to whom the faces were familiar did not differ from those of the male participants to whom the faces were novel. This highlights an interesting issue of individual differences in gaze cueing, an issue we will return to in a later section.
What about the relationship between perceived gaze direction and changeable aspects of the face, such as the emotional expression? Intuitively, one would expect that gaze following should be influenced by the nature of the observed facial expression. For example, someone looking in a certain direction with a fearful expression likely indicates the presence of something threatening and potentially dangerous at that location. It would be adaptive for the observer to focus their attention on that location more rapidly or thoroughly than if the observed person had a benign or neutral expression.
Surprisingly, behavioral studies have often failed to demonstrate a clear influence of emotional expression on gaze cueing, at least when RTs are used as a measure. Hietanen and Leppänen (2003) varied the expression of the face in a gaze-cueing paradigm independently of gaze direction. Although they used a variety of emotional expressions (happy, angry, fearful, and neutral), both with schematic drawings and naturalistic photographs of faces, as well as a wide range of cue-target SOAs (14 ms to 600 ms), they found no evidence for an interaction between gaze direction and facial expression. Mathews, Fox, Yiend, and Calder (2003) also presented faces with either neutral or fearful expressions. Furthermore, they distinguished between participants with high and low trait anxiety because they reasoned that anxious people would be more sensitive toward implied threat (such as signaled by a fearful face; see, e.g., Fox, Russo, & Dutton, 2002). They found that observation of fearful gaze resulted in larger cueing effects than did observation of neutral gaze. Importantly, however, this was true only for highly anxious participants; the low anxious group showed no difference between fearful and neutral gaze-cue conditions. More recently, Tipples (2006) noted that in both Hietanen and Leppänen’s (2003) and Mathews et al.’s (2003) studies, the emotional expression was presented before the onset of the averted gaze cue. Tipples (2006) showed that by simultaneously presenting the gaze cue and a change in facial expression from neutral to emotional, orienting to the direction of a fearful face was reliably potentiated. This finding is also in line with neural imaging studies showing enhanced brain activity in response to dynamic compared with static displays of facial expression (Sato, Kochiyama, Yoshikawa, Naito, & Matsumura, 2004).
Just as in Mathews et al.’s (2003) study, this effect of emotional expression was stronger in highly anxious participants (see also Holmes, Richards, & Green, 2006).
The effect of observed gaze direction on perception of emotional expressions appears more reliable. Adams, Gordon, Baird, Ambady, and Kleck (2003) reported that differential sensitivity of the amygdala to faces displaying anger or fear varied as a function of gaze direction. Amygdala activity was less pronounced in situations that clearly signal a threat in the environment (e.g., a fearful face with averted gaze) or clearly signal threat to an observer (e.g., an angry face with direct gaze), than it was in situations in which the source of threat requires additional interpretation by the observer (e.g., an angry face with averted gaze or a fearful face with direct gaze). Likewise, fearful faces coupled with averted gaze and angry faces coupled with direct gaze are recognized more quickly than are either fearful faces with direct gaze or angry faces with averted gaze (Adams & Kleck, 2003).
Accordingly, Adams and his colleagues (Adams et al., 2003) argued that the amygdala may play a special role in processing threat-related ambiguity and that gaze is highly relevant in resolving such ambiguity. These findings are in line with the notion that there are distinct patterns of neural activity involved in basic approach and avoidance categories of emotion, motivation, and affective response (e.g., Cacioppo & Gardner, 1999; Davidson, 1995; Davidson & Irwin, 1999). According to this view, an approach system operates through emotions that convey and motivate social interactions (e.g., happiness), whereas an avoidance system operates to facilitate withdrawal from aversive situations (e.g., fear, disgust). Adams and Kleck (2005) extended their initial findings by demonstrating the influence of gaze direction on the perception of other approach (joy and anger) and avoidance (sadness and fear) emotional expressions. Gaze direction influenced the perception of faces with neutral, ambiguous, and prototypical emotional expressions such that the dispositions of faces with averted gaze were rated as more avoidance oriented than were those with direct gaze. It is interesting to note that these results show that, contrary to the notion that direct gaze enhances face perception relative to averted gaze (cf. Hood et al., 2003; Mason et al., 2004), the way in which observed gaze affects face perception critically depends on the specific emotion expressed. That is, for avoidance-type expressions, perception is, in fact, enhanced by averted rather than direct gaze.
Thus, as well as various aspects of face perception influencing gaze cueing, the opposite interaction can be observed in which gaze shifts can influence the way that the other person is perceived. Of interest, neural activation related to processing of faces is also sensitive to the context in which gaze shifts away from the observer occur. For example, as noted previously, fusiform activity in response to the same face stimulus changes depending on whether the viewed face is shifting its gaze correctly to focus on a visual target or whether it fails to do so (Pelphrey et al., 2003).
Bayliss and Tipper (2006b) investigated the context in which observing another’s gaze shift away from the observer might differentially influence perception of that individual’s personality. Gaze shifts do not always indicate a point of interest in the environment but can be used to deceive the observer instead (see Emery, 2000, for a review). Even some nonhuman primates appear to be able to derive when another individual cannot see a piece of food within their own view and can use this knowledge to apparently ignore the food by orienting elsewhere, thereby effectively deceiving their opponent in food competition tasks (Fujita, Kuroshima, & Masuda, 2002; Hare, Call, Agnetta, & Tomasello, 2000). In humans, observation of deceptive behavior influences the perception of the associated individual. Faces perceived as being deceptive are recognized better (Tanida, Shimoma, Mashima, Ma, & Yamagishi, 2003) and are judged as less trustworthy (Singer, Kiebel, Winston, Dolan, & Frith, 2004) than are those of cooperative individuals. Bayliss and Tipper (2006b) used a gaze-cueing paradigm in which some faces would always gaze toward the subsequent target location (cooperative gaze), some faces never gazed toward the target location (deceptive gaze), and others were equally likely to gaze toward or away from the target (unpredictive gaze). Speeded responses to the target were not influenced by gaze contingency. That is, participants followed the gaze of all types of face, despite the negative consequences of following the deceptive faces. Although the gaze-cueing effect appeared to be blind to these contingencies, there were consequences for what the participants felt about the faces that had cued them. Specifically, faces whose gaze never indicated the target location were judged as being less trustworthy than were faces exhibiting cooperative gaze behavior. 
It is of interest to note that participants felt that they had viewed the deceptive faces more often, presumably because these deceptive encounters are more important to commit to memory than are the more natural cooperative episodes. This provides further evidence that whereas gaze following is not usually influenced by semantic properties of the face, observed gaze behavior does affect perception of the other person.
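The gaze-contingency manipulation just described can be sketched as a trial generator. This is an illustration only: the number of trials, the exact 50/50 split for unpredictive faces, and all names in the code are assumptions made for the example and are not details reported by Bayliss and Tipper (2006b).

```python
import random

# Proportion of trials on which each face type gazes at the target.
# The labels follow the text; the dictionary itself is illustrative.
VALID_PROPORTION = {
    "cooperative": 1.0,   # always gazes toward the target
    "deceptive": 0.0,     # never gazes toward the target
    "unpredictive": 0.5,  # equally likely to gaze toward or away
}


def make_trials(face_type, n_trials, rng=None):
    """Build a shuffled list of trials for one face type."""
    rng = rng or random.Random()
    n_valid = round(VALID_PROPORTION[face_type] * n_trials)
    trials = []
    for i in range(n_trials):
        target = rng.choice(["left", "right"])
        valid = i < n_valid
        opposite = "right" if target == "left" else "left"
        trials.append({
            "face_type": face_type,
            "target": target,
            "gaze": target if valid else opposite,
            "valid": valid,
        })
    rng.shuffle(trials)  # avoid presenting all valid trials first
    return trials
```

Because validity is fixed per face identity rather than per trial, an observer could in principle learn which faces to follow; the finding reported above is that speeded responses did not reflect such learning, whereas trustworthiness judgments did.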
Bayliss, Paul, Cannon, and Tipper (2006) demonstrated a somewhat different relationship between observed gaze direction and affective response in the observer. They examined whether the affective appraisal of objects depends on whether they are being looked at by another person. Indeed, objects that had been looked at were rated more favorably than were objects that had been looked away from. This effect did not appear to be mediated by the attention shift that was elicited by the gaze cue as there was no relationship between the size of the gaze-cueing effect and the “liking” effect. Furthermore, whereas arrow cues evoked similar cueing effects, they produced no affective boost for cued objects. Thus, observed gaze elicits shifts of attention and influences emotional responses to objects, but these two effects appear to be independent, with the affective response being specific to social cues.
Bayliss, Frischen, Fenske, and Tipper (in press) investigated whether this effect of observed gaze on object appraisal is further modified by the emotional expression of the observed face. Studies with infants have suggested that their preference for objects that are the focus of another person’s interest differs depending on the emotional signal expressed by that person. Objects that are interacted with when a person expresses happiness are subsequently explored more than they are when the experimenter expresses disgust (Mumme & Fernald, 2003; Repacholi, 1998). In Bayliss et al.’s (in press) study, objects that were looked at with a happy expression were liked more than were objects looked at with an expression of disgust, although attention was equally cued in the direction of observed gaze by happy and disgusted faces. These results demonstrated that facial expression does modulate the way that gaze cues are used by observers, such that objects that are looked at by another person are evaluated with respect to the valence of the observed facial expression.
Another person’s direction of gaze can also influence affective judgments of other people. This effect is modified by gender differences. Jones, DeBruine, Little, Burriss, and Feinberg (2007) found that women rate male faces that are being looked at by a female face as more attractive when the female face is smiling than when it has a neutral expression. Conversely, men prefer male faces that are being looked at by a female face with a neutral expression compared with those that are being looked at by a smiling female face. This shows that the effects of observed gaze and emotional expression are sensitive to the perceived social relation between the observer and the gazing face as well as to the social relation between the observer and the face that is being looked at by the other person.
This review of the literature on gaze and face perception suggests that although information regarding gaze direction and face identity or regarding expression appears to be processed in different regions of the brain, these types of information can influence each other. Judgments of gaze direction can be influenced by the context of the face containing the eye region, and vice versa, perceived gaze direction affects judgments of semantic aspects of the face such as likeability or emotional expressions. These influences are further modified by the social context in which they occur, for example, whether gaze or facial expression signal approach or avoidance. Likewise, the effect of face perception on gaze cueing or the reverse relationship needs to be viewed in light of the specific circumstances in which they occur. Although changing perceptual or semantic properties of the face stimulus does not appear to affect the short-term gaze-cueing effect in the general population, such influences can be glimpsed when the experimental context encourages encoding of individual face encounters or when taking into account individual differences in the observers. Similarly, manipulating the contingencies between the gaze cue and the target that is being gazed at does not change the cueing effect but nevertheless influences evaluation of the face itself. These findings highlight the need to take into account contextual factors when considering the relationship between gaze and face processing. Thus, rather than attempting to answer the question whether these processes are modular or dependent on one another, it seems more fruitful to try to define the boundary conditions and examine the circumstances under which interactions between gaze and face processing might occur and how they manifest themselves.
The literature on sex differences in adults is extensive, both in the domain of cognitive skills (e.g., spatial cognition, at which male participants often perform better than do female participants; Ecuyer-Dab & Robert, 2004; Geary, 1998) and in decoding of nonverbal behavior (Hall, 1978). There is also recent evidence that the mirror neuron system, discussed above, is weaker in male participants (Cheng, Tzeng, Decety, Imada, & Hsieh, 2006). Individual differences in the sensitivity to other people’s eye gaze are detectable very early in development, by the age of around 12 months. Male infants make less eye contact (Lutchmaya, Baron-Cohen, & Raggatt, 2002) and orient toward faces less (Connellan, Baron-Cohen, Wheelwright, Batki, & Ahluwalia, 2000; Lutchmaya & Baron-Cohen, 2002) than do female infants. A strong biological component to the development of eye contact behavior is suggested because of its early emergence and because of the finding of a significant quadratic relationship between prenatal testosterone levels and the amount of eye contact made (Lutchmaya et al., 2002). Furthermore, a recent study has shown stronger joint attention in female infants than male infants at 12 months (Olafsen et al., 2006). Although the developmental and social psychology literature has documented sex differences in response to social stimuli such as faces and gaze, the attention literature has rarely concerned itself with how males’ and females’ attention systems may respond in simple cueing tasks.
Bayliss et al. (2005) investigated whether a reflexive phenomenon such as gaze cueing would reveal sex differences in a normal adult population. The findings were striking: Male participants did indeed display significantly weaker gaze-cueing effects than did female participants (see also Deaner et al., 2007). Although the gender difference in gaze cueing may have come as little surprise to developmental and social psychology researchers, the idea that such a simple orienting response is markedly different in one half of the population may be an unexpected result in the research of attention. To investigate whether this gender difference was specific to social cues, Bayliss et al. (2005) also compared male and female performances with centrally presented arrow cues and peripheral sudden-onset cues. Peripheral cueing revealed no difference between male and female performances. However, with arrow cues, male participants showed no performance differences at cued versus uncued locations, whereas female participants showed a standard cueing effect (see also Merritt et al., 2005).
As well as the sex difference in Bayliss et al.’s (2005) data, a negative correlation between gaze-cueing magnitude and score on the Autism-Spectrum Quotient (AQ), developed by Baron-Cohen, Wheelwright, Skinner, et al. (2001), was found. This questionnaire was designed to be appropriate for testing nonclinical populations to provide an estimate of the number of autistic-like traits that an individual participant displays. In Bayliss et al.’s data, the higher a participant scored on the AQ, reflecting more autistic-like traits, the weaker their gaze-cueing effect was. This adds some support to Bayliss et al.’s assertion that the gender difference in gaze cueing is somehow related to the fact that male individuals, on average, share more cognitive traits with people who have a diagnosis of an autism-spectrum disorder than do normal female individuals. However, it is difficult to determine what the primary factor in an individual’s response to a gaze cue is: Is it their gender or their position on the autism spectrum?
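To illustrate what a correlation between gaze-cueing magnitude and AQ score involves computationally, the following minimal sketch may be helpful. It assumes the conventional definition of the cueing effect as mean reaction time (RT) at uncued locations minus mean RT at cued locations; the function names are hypothetical, and the sketch is not a reconstruction of the analysis pipeline used by Bayliss et al. (2005).

```python
from statistics import mean


def cueing_effect(rts_cued, rts_uncued):
    """Cueing magnitude in ms: mean uncued RT minus mean cued RT.

    Positive values indicate faster responses at the gazed-at location.
    """
    return mean(rts_uncued) - mean(rts_cued)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented toy values for three imaginary participants, NOT real data.
    aq_scores = [10, 20, 30]
    effects = [
        cueing_effect([300, 310], [330, 340]),
        cueing_effect([300, 310], [320, 330]),
        cueing_effect([300, 310], [310, 320]),
    ]
    print(effects, pearson_r(aq_scores, effects))
```

A negative coefficient from such an analysis would correspond to the pattern described above, in which participants with higher AQ scores show smaller cueing effects.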
In another study, Bayliss and Tipper (2005) tested mainly female participants’ performance at detecting targets appearing on either whole or scrambled faces and, in a separate experiment, targets on whole or scrambled tools. A gaze or an arrow cue served as the nonpredictive cue to attention. Although significant cueing was found in all conditions, participants with low scores on the AQ (i.e., fewer than average autistic-like traits) tended to display stronger cueing toward whole objects than toward the scrambled displays, irrespective of cue type. The high AQ participants (again, mostly female) showed only slightly weaker cueing effects in general, but their orienting style was different. In this group, cueing effects were larger when targets appeared on scrambled displays than when they appeared on whole objects.
This finding has a number of implications. First, because high and low AQ participants showed strong cueing effects, it suggests that sex, rather than AQ score, may play a larger role in determining the overall strength of attention shifts evoked by central nonpredictive directional stimuli. However, performance on the AQ is correlated with the degree to which holistic stimuli attract attention when they are cued. This interaction may relate to the well-documented global processing bias in normal participants (Navon, 1977), with people with autism showing a bias toward processing local information (U. Frith & Happé, 1994; Happé, 1996, 1999; Mottron, Belleville, & Menard, 1999; Mottron, Burack, Iarocci, Belleville, & Enns, 2003). Bayliss and Tipper’s (2005) data suggest that such orienting biases may vary within the normal population as a function of position on the autism spectrum. This intriguing possibility may suggest that some sort of weak central coherence may exist in the normal population and that the global processing bias is not as universal as once assumed.
How do individual differences in gaze and arrow cueing emerge? There are a number of possibilities that have implications for research into social cognition and attention. First, it is possible that the cognitive systems underlying gaze and arrow cueing are separate and are both impaired in male individuals. Alternatively, it may be that gaze and arrow cueing are built on the same cognitive system, which is weaker in male individuals (the evolutionary primacy and importance of gaze processing would mean that arrow cueing was a secondary adaptation of the system). The second of these explanations seems to fit better logically because it would be inefficient to allow the evolution of an entirely separate system that enables humans to reflexively follow directional signals other than gaze. However, Hietanen, Nummenmaa, Nyman, Parkkola, and Hamalainen (2006) provided evidence that there are indeed dual systems for orienting to gaze and arrows. Hence, a third, yet not mutually exclusive, possibility is entertained: Male individuals have better control over how symbolic stimuli can direct their attention. In most gaze-cueing studies, the participants are informed that the direction of the cue is uninformative of target location. Rather than having a weaker gaze/arrow-cueing system, male individuals could have a more effective system of control over their attention systems, being able to inhibit the irrelevant gaze cues.
In some support of this notion, S. V. Shepherd et al. (2006) reported individual differences in the use of gaze in nonhuman primates. Low-status male monkeys demonstrated gaze cueing in response to the averted gaze of peers and higher status monkeys, with saccadic RT patterns that were indicative of reflexive orienting. Higher status monkeys oriented more strongly to the direction of other high-status males but were not influenced by the gaze cues of lower status monkeys. This suggests a degree of voluntary control over social attention. That is, whether gaze cueing appears to be the result of reflexive or voluntary processes may be modulated by individual differences. Further, the high-status monkeys had signs of having higher testosterone levels than those of the low-status monkeys, which may form part of an explanation for differences in human social attention (Lutchmaya et al., 2002) and empathy (Hermans, Putman, & van Honk, 2006) and in turn help explain the sex differences reported by Bayliss et al. (2005).
This notion that individual differences in gaze and arrow cueing may be explained by differences in top-down control is worthy of further investigation. It is likely that the two main hypotheses, that processing of these cues is weaker in male participants or that male participants are able to exert more control over the influence of central cues, are both correct to some extent. That is, perhaps slightly weaker processing of cues enables voluntary processes to be engaged more readily such as to control the influence of gaze and arrow cues.
The finding of significant individual differences in orienting of attention in the general population warrants a note of caution for the interpretation of data from neuropsychological studies. Such studies often use patients with focal brain lesions to draw conclusions about the neural architecture underlying cognitive phenomena. However, given that these processes appear to vary widely among normal individuals, it is often unclear how an individual with brain damage would have performed prior to his or her injury.
For example, Vecera and Rizzo (2004, 2005) have presented data that suggest that without a fully functioning frontal lobe, gaze cueing does not occur. The patient, EVR, who has damage to the frontal lobe, fails to orient to the direction of predictive or nonpredictive gaze cues. The authors use this as evidence against the idea that gaze cues result in the automatic allocation of attention to where someone else looks. That is, because the frontal lobes are responsible for executive, voluntary control processes, damage to the frontal lobe should not result in impairment of automatic processes. Because gaze cueing does not appear in this patient, it is concluded that gaze cueing in the general population cannot be the result of a reflexive process. Although this conclusion may eventually prove to be correct, the heterogeneity of gaze-cueing effects in the normal population (Bayliss & Tipper, 2005; Bayliss et al., 2005) leaves the very real possibility that the individual patient EVR may have never shown gaze cueing before his lesion occurred. Indeed, EVR is a male patient, so it is unsurprising that he produced no gaze-cueing effects because male patients tend not to do so (Bayliss et al., 2005).
Vuilleumier (2002) provided evidence for a special neural system underlying eye-gaze cueing. Patients with lesions to temporoparietal regions of the right hemisphere often present with unilateral neglect, an attentional deficit for processing stimuli presented on the left (contralesional) side of space (Rafal, 1998). In its severe form, neglect can lead to a complete unawareness of the left side of space in spite of an intact visual field. Extinction is a more common residual deficit, whereby contralesional stimuli are not reported only when a competing ipsilesional stimulus is present. This may be because of greater competitive weight being applied to the ipsilesional stimulus, to the additional detriment of processing of the contralesional one (e.g., di Pellegrino, Basso, & Frassinetti, 1997). In the Vuilleumier study, patients with neglect were found to show improved detection of contralesional stimuli if the competing ipsilesional stimulus was a face gazing to the contralesional side. It is a surprising result because any concurrent ipsilesional stimulation might be expected to impair performance in extinction patients. Importantly, the reduction in extinction was not repeated for arrow stimuli, suggesting that only social cues of biological origin can ameliorate neglect. However, later experiments showed that the same patients exhibited gaze cueing but not arrow cueing. It is therefore unclear as to whether a contralesional arrow would still have no effect in patients who display both arrow- and gaze-cueing effects.
Patients with focal lesions to the superior temporal areas are rare in humans, but one such patient, with a right superior temporal gyrus lesion, has been studied recently by Akiyama and colleagues (Akiyama, Kato, Muramatsu, Saito, Nakachi, & Kashima, 2006; Akiyama, Kato, Muramatsu, Saito, Umeda, & Kashima, 2006). This patient, MJ, finds it hard to make eye contact, and she is impaired at determining the direction of observed gaze. We find it very interesting to note that MJ shows no gaze-cueing effect, but nonpredictive arrows do cue her attention. This is exactly what one would predict if the human superior temporal gyrus is responsible for interpreting gaze direction and for providing information to the attention system regarding gaze direction, whereas a separate system encodes arrows (Hietanen et al., 2006). This is in line with the results of Vuilleumier (2002), whose patients with more dorsal lesions showed the opposite dissociation, gaze cueing in the absence of arrow cueing. Unfortunately, this single-case study, like the study of Vecera and Rizzo (2004, 2005), cannot delineate the neural basis of gaze cueing simply because many individuals do not show gaze cueing. If the two types of cue are really underpinned by separate neural systems, then one might expect that gaze- and arrow-cueing effects do not correlate strongly within an individual. At present, it is unclear whether people who show strong arrow-cueing effects also exhibit robust gaze cueing. This stresses the need for research to examine individual differences in performance in their own right rather than to try to avoid their influences.
The intention here is not to criticize the work of Vecera and Rizzo (2004, 2005) and Akiyama and colleagues (Akiyama, Kato, Muramatsu, Saito, Nakachi, & Kashima, 2006; Akiyama, Kato, Muramatsu, Saito, Umeda, & Kashima, 2006); after all, in the literature, it has long been assumed that in nonpsychiatric participants, gaze cueing is universal. It is now clear that this ubiquity is not the case. Individuals can show a great range of cueing magnitudes, in contrast to the effect of peripheral cues, which are consistent across participants (Bayliss et al., 2005; it should be noted that other laboratories have also noted that not all individuals show gaze cueing; see Hietanen et al., 2006, p. 412). Instead, rather than a universal effect, it is simply a robust effect given an appropriately sized random sample. Therefore, when investigating the neural basis of gaze cueing by analyzing behavior following deactivation of a focal brain region, it should be noted that single-case lesion studies may not be wholly adequate. In light of variability in gaze cueing in the normal population and of the fact that prelesional data on gaze cueing are unlikely to be available, the use of transcranial magnetic stimulation on healthy participants is a more appropriate technique to be used if we are to achieve the goal of delineating the neural basis of gaze cueing. Because participants in this case can be screened for whether they show reliable gaze cueing in the first place, areas such as the STS, inferior parietal lobe, frontal eye fields, and other frontal areas could be investigated.
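One way in which participants might be screened for reliable gaze cueing before a stimulation study is sketched below. This is purely an illustrative assumption: the literature reviewed here does not specify a screening criterion, and the bootstrap confidence interval, the threshold of zero, and all function names are hypothetical choices made for the example.

```python
import random
from statistics import mean


def bootstrap_ci(rts_cued, rts_uncued, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for one participant's cueing effect.

    The effect is mean uncued RT minus mean cued RT (in ms); trials are
    resampled with replacement separately within each condition.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_boot):
        cued = rng.choices(rts_cued, k=len(rts_cued))
        uncued = rng.choices(rts_uncued, k=len(rts_uncued))
        effects.append(mean(uncued) - mean(cued))
    effects.sort()
    lower = effects[int((alpha / 2) * n_boot)]
    upper = effects[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def shows_reliable_cueing(rts_cued, rts_uncued):
    """Hypothetical criterion: the whole interval lies above zero."""
    lower, _ = bootstrap_ci(rts_cued, rts_uncued)
    return lower > 0


if __name__ == "__main__":
    # Invented toy reaction times in ms, NOT real data.
    cued = [300, 310, 305, 295]
    uncued = [340, 350, 345, 335]
    print(shows_reliable_cueing(cued, uncued))
```

Only participants passing such a check would then contribute to the comparison of cueing with and without stimulation, so that an absent effect under stimulation could not be attributed to a participant who never showed gaze cueing in the first place.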
The individual differences in gaze and arrow cueing identified by Bayliss et al. (2005) and Bayliss and Tipper (2005) have been limited to normally developed young adult men and women. So, how do adults and children with autism-spectrum disorders perform in simple gaze-cueing paradigms? Given the difficulties with joint attention in autistic populations (Charman, 2003), one would expect that such groups should show even weaker gaze and arrow cueing than do normally developing male participants. However, the heterogeneity of the autism-spectrum disorder population, along with procedural differences, has led to a rather more equivocal picture. For example, Leekam et al. (1998) looked at responses to targets presented alongside a head-turn cue in children with and without autism. They found that it was only the children with autism and a low mental age who failed to use gaze direction to shift their own attention. Relatively high-functioning children with autism show gaze-following behavior similar to that of children without autism. In computer-based tasks, comparable results have been found. For example, children with autism at age 2 years (Chawarska et al., 2003) and with high-functioning autism at age 10 years (Swettenham et al., 2003) appear to show cueing effects with moving eyes that are similar to those of normally developing children.
It is possible that the fact that the eyes moved in these three studies, enabling more low-level analysis of the gaze cue (i.e., in terms of motion, not gaze perception per se), led all but the most impaired participants (the low mental age children in the autism group of Leekam et al.’s 1998 study) to show gaze cueing. However, subsequent studies have confirmed cueing effects in response to static gaze direction in children with high-functioning autism (at around age 10 years; Kylliäinen & Hietanen, 2004; Senju, Tojo, Dairoku, & Hasegawa, 2004). Such variations also occur in older participants with autism. For example, Ristic et al. (2005) found that nonpredictive gaze cues did not result in an automatic shift of attention in their group of high-functioning young adults with autism. On the other hand, another group of high-functioning young adults with autism were able to use a predictive gaze cue to shift attention voluntarily to a cued location.
So what can be concluded from these conflicting results? There are several problems for interpretation. First, procedural differences between laboratories may account for some of the observed differences. Gaze cueing is a relatively new procedure, and there is no particular standard method that all researchers should use, but factors like stimulus duration, cue-target SOA, task demands, fatigue, and practice effects have not been investigated rigorously, and these may differ in their influence in the autistic population. Second, the heterogeneity of the autism spectrum makes it unlikely that any two groups will perform in the same way, especially when the procedures used are different. So, cueing effects and time course differences between the groups may be distorted by main effects of performance fluency. Nevertheless, what we can conclude is that under most circumstances (head turns, eye movements, sudden onset of averted gaze), children with high-functioning autism do reflexively shift their attention in the direction of another person’s attention.
Regardless of the role of procedural factors in determining such variations within the autistic population, it appears that gaze direction cues attention differentially in people with autism compared with normally developing individuals. For example, in contrast to participants with autism, the use of gaze cues is much more nuanced in typically developing samples. Participants without autism show asymmetric cueing effects (Frischen & Tipper, 2006; Okada, Sato, & Toichi, 2006; Vlamings, Stauder, van Son, & Mottron, 2005), whereas people with autism orient to left and right gaze cues equally (Vlamings et al., 2005). Further, differences between people with autism and control participants are modulated by target location expectancy (Senju et al., 2004). This is congruent with additional findings from Senju et al. (2004), whereby typically developing children showed differences in cueing between counterpredictive gaze and arrow cues, whereas children with autism exhibited the same effects for both types of cues.
A final example of how gaze cues are used differently by children with autism and by control participants comes from a motor task used by Pierno, Mari, Glover, Georgiou, and Castiello (2006). These authors found that control participants who observed a model gazing at an object were quicker to initiate a reach to that object than they were when the model gazed away. A similar effect was found when the model had reached for the object previously. However, neither action observation nor observation of gaze direction influenced the performance of children with autism. The authors suggest that the intentional aspect of gaze perception is not internalized into the autistic children’s motor system in the same way as in normally developing children (see also Jellema et al., 2000). Taken together, these findings suggest that the system that allows a gaze cue to shift attention in normal samples is at least more flexible, has access to higher level processing, or is perhaps even a different system to that which results in gaze cueing in people with autism.
The above literature review also reveals a very interesting developmental time course issue: Stronger gaze-cueing effects are shown in the studies with younger children with autism. It is the adult samples (Ristic et al., 2005; Vlamings et al., 2005) who report atypical orienting. This observation is surprising, as one would think that the use of compensation strategies would have led to the improvement of the use of gaze in older people with autism and, because of the delayed social development, that younger children would show weaker effects. The converse appears to be true: In spite of a developmental delay, children with autism at age 2 years and above show gaze cueing (Chawarska et al., 2003), whereas adults show no gaze cueing (Ristic et al., 2005). Previously in this article, we have discussed the possibility that individual differences in social attention may be due to the degree of control an individual can exert over the gaze-following reflex (Bayliss et al., 2005; S. V. Shepherd et al., 2006). This raises the possibility that perhaps this age difference in gaze cueing in people with autism is due to a development of voluntary control over joint attention in autism, whereas normally developed adults retain a more reflexive cueing response. Recall that Dalton et al. (2005) reported greater amygdala activity in autistic individuals when they attended to the eyes. This might account for the fact that increased arousal is often reported when people with autism make eye contact. One adaptive strategy to reduce the greater emotional response triggered by eye gaze would be for individuals with autism to learn to avoid looking at the eye region of other people. Indeed, eye contact is often shunned by children with autism. This avoidance could be driven by an inhibitory mechanism preventing the person orienting to the eye region in the first place.
Such an avoidance strategy could potentially explain much of the individual variation in gaze-cueing magnitudes in the normal population (e.g., Bayliss et al., 2005) and could have critical implications in autism. That is, a failure to orient covertly or overtly to the eye region in the first place could account for individual differences in gaze cueing. As noted before, Itier et al. (2007) showed that averted gaze has a weaker effect on attention if the eye region is not fixated. One would presume that if the eye region is not fixated, then the quality of the gaze direction signal would be weaker and, consequently, gaze cueing would be weaker. Simply, how can you orient in the direction of gaze if the gaze-cueing system has little information with which to compute the appropriate direction? Future studies using careful analysis of gaze fixation patterns, or manipulation of task instruction to lead participants to attend to different parts of the face, may shed light on individual differences in gaze cueing. A key question is whether variations in the reflex to orient to the eye region can account for differences in cueing or if the individual differences emerge from variations in the gaze-cueing system itself.
Observing another person’s gaze direction leads to shifts of attention in the corresponding direction. What does this finding tell us about attentional orienting? For decades, attention research has been based on the assumption that two distinct types of orienting are triggered by different types of cues: Peripheral sudden-onset (exogenous) cues were assumed to trigger reflexive involuntary orienting of attention, whereas central symbolic (endogenous) cues were thought to induce nonreflexive voluntary shifts of attention. With the introduction of gaze cues, this previously clear distinction became suddenly blurred. Now there were centrally presented cues that did not directly indicate a likely target location, and yet automatic shifts of attention were consistently observed. As such, gaze cueing resembles peripheral cueing in the following ways: First, gaze-evoked cueing effects emerge rapidly even at short SOAs (e.g., Friesen & Kingstone, 1998; Langton & Bruce, 1999). Second, those cueing effects arise at short cue-target intervals even if the cue is counterpredictive (e.g., Driver et al., 1999; Friesen et al., 2004). Third, both types of cue elicit inhibition that is evident at longer SOAs (Frischen et al., in press; Frischen & Tipper, 2004). The fact that such automatic shifts of attention could be triggered by centrally presented, nonpredictive cues has led some researchers to suggest that eye gaze is a special attentional cue because of its biological significance (e.g., Friesen & Kingstone, 1998, 2003b; Langton & Bruce, 1999). However, further investigations using arrow cues showed that they, too, can induce rapid shifts of attention, even if they are uninformative with regard to the likely target location (e.g., Hommel et al., 2001; M. Shepherd et al., 1986; see also Soto-Faraco, Sinnett, Alsius, & Kingstone, 2005, for similar findings with gaze and arrow cues in tactile attention). 
This shows that biologically relevant (gaze) and biologically irrelevant (arrow) central cues trigger very similar attention shifts, which in turn resemble those elicited by peripheral cues. Thus, eye gaze may not be as different from other types of cues as previously suggested, at least in terms of their basic behavioral effects. In particular, it seems that the old distinction between exogenous and endogenous orienting does not apply with regard to different types of cues.
Unfortunately, research continues to focus on relabeling these “new” cueing effects in terms of the already existing categories. The questions that are often posed are the following: Is gaze cueing exogenous or endogenous? And where does cueing by arrows fit in? Validation for this kind of categorization is sought by looking at the similarities between these types of cueing and how they correspond to the old classification (see Lambert et al., 2006, for a discussion on the difficulty of such categorization). Specifically, given the aforementioned cueing patterns of gaze and arrow cues, they are now commonly regarded as reflexive or exogenous. However, it is imperative that the differences between the effects evoked by various cues are taken into account, too. Indeed, despite their broad resemblance, there are more subtle differences between gaze, arrow, and peripheral cueing. For example, the time course of gaze-evoked orienting effects differs from that of peripheral cueing. Facilitation effects build up more gradually in response to gaze cues than peripheral cues, and the onset of IOR is relatively delayed (Frischen & Tipper, 2004). Paradoxically, precisely the time course of gaze cueing has been cited as evidence for exogenous-type orienting (Friesen & Kingstone, 1998), when in fact, one might argue that its time course more closely resembles that of endogenous orienting (cf. Cheal & Lyon, 1991). On the other hand, gaze cueing can be regarded as reflexive in the sense that it cannot be suppressed. When the target is more likely to appear in the uncued location (i.e., the cue is counterpredictive), gaze cues nevertheless trigger attention shifts in the gazed-at, but unpredicted, direction at short cue-target intervals (Driver et al., 1999). The directional incentive of arrow cues can easily be overridden so that orienting occurs to the predicted location only (Friesen et al., 2004). 
It has therefore been suggested that both gaze and arrow cueing are reflexive, but gaze cueing is more strongly so.
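The terms used in this comparison can be made concrete with a short sketch. It assumes the conventional definition of a cueing effect as the mean reaction time (RT) at uncued locations minus the mean RT at cued locations, computed separately for each cue-target interval (SOA); positive values indicate facilitation, and negative values indicate inhibition of return (IOR). The function names, the trial format, and all numbers below are hypothetical and serve illustration only; they are not data from any of the studies cited.

```python
from statistics import mean


def cueing_effect(cued_rts, uncued_rts):
    """Return mean uncued RT minus mean cued RT (in ms)."""
    return mean(uncued_rts) - mean(cued_rts)


def classify(effect_ms, tolerance_ms=0.0):
    """Label an effect as facilitation, inhibition (IOR), or none."""
    if effect_ms > tolerance_ms:
        return "facilitation"
    if effect_ms < -tolerance_ms:
        return "inhibition"
    return "none"


def time_course(trials):
    """Compute the cueing effect at each SOA.

    trials: iterable of (soa_ms, cued, rt_ms) tuples, where `cued` is True
    when the target appeared at the cued (e.g., gazed-at) location.
    Returns a dict mapping each SOA to its cueing effect, in ascending
    SOA order.
    """
    by_soa = {}
    for soa, cued, rt in trials:
        cell = by_soa.setdefault(soa, {True: [], False: []})
        cell[cued].append(rt)
    return {
        soa: cueing_effect(cell[True], cell[False])
        for soa, cell in sorted(by_soa.items())
    }


if __name__ == "__main__":
    # Invented RTs (ms), chosen only to show both signs of the effect.
    made_up_trials = [
        (100, True, 300), (100, False, 320),
        (2400, True, 350), (2400, False, 330),
    ]
    for soa, effect in time_course(made_up_trials).items():
        print(soa, effect, classify(effect))
```

Comparing such per-SOA profiles across cue types (gaze, arrow, peripheral) is one way to express the time-course differences discussed above, although real analyses would of course require inferential statistics rather than a simple sign check.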
At this point we should perhaps question the usefulness of categorizing cueing effects in terms of existing labels. One must keep in mind that those categories were defined on the basis of specific patterns of responses that were, at the time, observed with different types of cues. Does it therefore make sense to take new effects and try to sort them into those groups? Is it not more fruitful to further explore the boundary conditions and more closely examine the specific circumstances under which certain effects are observed?
For example, it is becoming more and more apparent that basic attention orienting is influenced by individual differences, such as gender, autistic tendencies (Bayliss et al., 2005; Bayliss & Tipper, 2005), or anxiety (Mathews et al., 2003; Tipples, 2006). The fact that normal male participants show different orienting styles in simple cueing studies has implications for attention research. It means that the early confusion over the differences between peripheral and symbolic control of attention may have been due in part to sampling biases, with more male participants in experiments with null effects and more female participants in experiments with significant effects. Unfortunately, standard demographic information is often omitted from method sections in basic attention studies, so it is unknown whether this factor truly contributed to the conflicting results. Although it is likely that methodological differences can explain much of the confusion (B. S. Gibson & Bryant, 2005), the field still has some catching up to do if these sex differences do indeed indicate basic differences in the way that cues to attention are used by men and women. Clearly, replication and extension of these sex differences with different cueing paradigms are necessary.
With regard to inhibition effects, it appears that like facilitation cueing, IOR in response to peripheral cues is not affected by gender (Bayliss et al., 2005). However, it is unknown whether sex differences exist with IOR in response to gaze or arrow cues. Given that gender differences in gaze cueing seem to emerge primarily at later SOAs (e.g., 700 ms; Bayliss et al., 2005), with women showing larger facilitation effects than those of men at those intervals, it is possible that this factor contributes to the delayed emergence of IOR with gaze cues. Indeed, the samples in the Frischen and Tipper (2004) study that demonstrated gaze-evoked IOR contained mainly female participants. Therefore, it may be that this time course of cueing is specific to women, but not to men, who might show an earlier onset of inhibition effects. Taking into account individual differences could help to resolve the confusion regarding similarities and differences between different types of cueing.
It is also important to investigate the role of the context in which orienting of attention occurs. As demonstrated by Bayliss and Tipper (2005), gaze-cueing effects differ depending on the perceptual coherence of the target object. This is another demonstration that the nature of the target processing demands can have profound impacts on the way that attention is directed in a scene (Jordan & Tipper, 1998). This issue is particularly pertinent for attention researchers wishing to investigate the attentional control of social orienting. The object of joint attention—that is, the stimulus that both parties look at—is a vital element of a social attention episode. Orienting to the correct object in the scene, rather than toward any object or the general hemispace, is a defining feature of the advancement from mere gaze following to joint attention in infants (Emery, 2000). Furthermore, using gaze cues and object interactions will enable researchers to investigate the intentional aspect of gaze (Pelphrey et al., 2003; Pierno, Mari, et al., 2006). That is, where people look indicates their likely next action because people tend to attend to their own actions, and we are surprised when they act on an object other than the one at which they are looking. Hence, although it has been all but ignored thus far (but see Bayliss & Tipper, 2005; Friesen, Moore & Kingstone, 2005; Lobmaier et al., 2006), manipulations involving the target object may well prove very promising in future work, if attention researchers are truly interested in the nature and mechanisms of social attention. Perhaps pursuing such lines of inquiry may prove the gaze-cueing paradigm’s true worth to the study of the development of flexibility in joint attention and hence unlock the problems with joint attention associated with clinical populations such as people with autism.
Gaze cueing is informative with regard to the neural architecture of attention orienting and its origins. As already noted, stimulus-driven and voluntary shifts of attention are subserved by different though interacting neural circuits (e.g., Corbetta & Shulman, 2002; Rosen et al., 1999). Some neuroimaging studies suggest that the neural map activated by observing gaze direction more closely resembles exogenous than endogenous orienting. Further, covert shifts of attention show similar activation to that elicited by saccadic eye movements (see Grosbras et al., 2005, for a meta-analysis). This similarity between neural activation of eye movements, attention orienting, and gaze perception accords with the finding that observing another person’s action activates a corresponding motor program (see, e.g., Gallese, Keysers, & Rizzolatti, 2004; Iacoboni & Dapretto, 2006; Rizzolatti & Craighero, 2004, for reviews). That is, observation of a particular action elicits the same patterns of activity in the premotor areas that are also recruited when performing the action. This has been found for a whole range of actions, from foot and arm movements to movements of the mouth (Buccino et al., 2001). Besides premotor cortex, a distributed network of brain areas interacts in the human mirror system, including temporal and parietal cortices; the basal ganglia and cerebellum may also have roles (Kessler et al., 2006). It is reasonable to assume that a similar action/observation-matching mechanism would also be engaged by observing eye movements. Indeed, observing gaze triggers involuntary saccades in the corresponding direction, lending support to the notion that the observed action is mirrored (Mansfield et al., 2003; see also Ricciardelli et al., 2002).
Could such mirror activation account for gaze cueing? Saccade preparation and shifts of covert attention are closely linked. Cells in the monkey SC show enhanced firing rates if a target that is to be saccaded to is within their receptive fields, even prior to the actual execution of an eye movement (Goldberg & Wurtz, 1972). The SC contains both visual and oculomotor cells, suggesting that stimulus-driven orienting and eye movements are coupled (Mohler & Wurtz, 1976; Wurtz & Mohler, 1976). Another brain area that contains both visual and oculomotor neurons is the frontal eye field (FEF), which is also active during covert shifts of attention and saccade preparation (e.g., Connolly, Goodale, Menon, & Munoz, 2002; Gitelman et al., 1999; Schall, 1997). Finally, involvement of the STS has been reported during covert shifts of attention, gaze perception, and eye movements, and neurons in the STS project directly to the FEF (e.g., Grosbras et al., 2005; Hooker et al., 2003; Komatsu & Wurtz, 1989; Nobre, Gitelman, Dias, & Mesulam, 2000; Schall, Morel, King, & Bullier, 2005). Such findings are in congruence with the premotor theory of attention, which holds that the preparation of goal-directed actions and attention shifts are tightly linked (Rizzolatti, Riggio, & Sheliga, 1994; Rizzolatti, Riggio, Dascola, & Umiltà, 1987). That is, spatial selective attention mechanisms are proposed to originate in neural circuits that code visually guided movements in space in terms of motor requirements. Assuming that observing another person’s saccade activates a corresponding motor program, this mirror system-evoked saccade preparation would thus result in an attention shift in the corresponding direction.
In line with the notion of a link between mirror system activity and joint attention, it has been suggested that a dysfunctional mirror system may be a root cause in the development of autism (Williams, Whiten, Suddendorf, & Perrett, 2001). Mirror neuron activity is thought to play a pivotal role not only in action imitation but also in decoding the goal of the action and, ultimately, understanding another person’s intentions—all key elements in theory of mind, and precisely the skills that are impaired in autism. Indeed, mirror system activation is markedly decreased in individuals with autism during observation and imitation of movements of the lips and hands as well as of emotional facial expressions (Dapretto et al., 2006; Nishitani, Avikainen, & Hari, 2004; Oberman et al., 2005; Theoret et al., 2005; see Iacoboni & Dapretto, 2006, for a review). Furthermore, mirror neuron activity is related to the level of social functioning—the stronger the mirror activity, the better the social skills (Dapretto et al., 2006). Whether mirror system activity is also impaired in autistic individuals during observation and imitation of gaze shifts remains to be established. Some evidence points toward that possibility, as both gaze following (Langdon et al., 2006) and mirror system activity (Quintana et al., 2001) appear to be overactive in people with schizophrenia, which supports the notion of a functional link between the two.
Koval, Thomas, and Everling (2005) reported evidence against the mirror hypothesis of gaze orienting. Using a saccade-target paradigm similar to that of Ricciardelli et al. (2002), they showed that when antisaccades were required (i.e., when participants had to make a saccade toward the side opposite a target stimulus), observed gaze direction could be effectively ignored. Saccadic RTs were actually shorter when the executed saccade was in the opposite direction to the observed gaze than when performed gaze and observed gaze were congruent. However, whereas both Mansfield et al. (2003) and Ricciardelli et al. presented the gazing face concurrently with the target, the face was no longer present when the saccade target appeared in Koval et al.’s study. It is possible that the evoked mirror activation is rather short lived so that performance is more susceptible to task-dependent top-down modulation when presentation of the observed action has ceased. Indeed, as the authors themselves point out, it may be the case that saccades are initially prepared in the direction of perceived gaze but are then modified on the basis of current task demands. Furthermore, during action observation, in addition to premotor activation, corticospinal projections are modulated in an inverse way so that the mirrored action does not interfere with executed action (Baldissera, Cavallari, Craighero, & Fadiga, 2001). Such an action inhibition mechanism may be differentially engaged depending on the task at hand. Koval et al.’s (2005) finding once again demonstrates the importance of taking into account contextual factors that impact gaze-cueing behavior.
The influence of inhibitory mechanisms in the perception of other people’s actions may be a critical variable (Jonas et al., 2007). As discussed above, inhibition of gaze signals emerges slowly (Frischen & Tipper, 2004), perhaps more slowly than does inhibition of action-imitative responses. This may suggest a subtle and flexible role for inhibition of gaze cueing. Performing the same action as another may often be socially inappropriate. (Imagine you see someone reach for his glass: While you simulate what you see, it would often violate social norms if you were to execute the same action.) However, orienting to the direction of another’s gaze is less costly and is usually beneficial. Therefore, although the rapid inhibition of imitative actions may be a natural state of the mirror system for observation of limb actions, a similar inhibitory system in gaze observation may perhaps be controlled by a higher level system and hence may be computationally more complex. Finally, one more difference between imitation of action and imitation of gaze is that although one can observe one’s own actions and their consequences, one can only see the consequences of one’s own eye movements.
If a mirror neuron-type system is involved in gaze following, then the FEF may be a candidate region. The FEF can be considered part of the premotor cortex, responsible for generating and programming saccades. Hence, the gaze-matching system could be influenced by processes in this area and by interactions with STS. In conclusion, gaze following is the result of an action observation/action execution-matching system par excellence, but how similar it is to the mirror neuron system is not clear.
It is possible that both gaze cueing and orienting of attention via arrow cues have evolved from similar mirror system mechanisms. Directional arrows may have emerged as schematic representations of gaze direction or pointing gestures, both of which are coded in the STS. Of course, at present, such considerations remain speculative because too little is known about the precise role of the STS and other brain areas in orienting attention to gaze and other symbolic cues, and even less is known about the neural architecture of a mirror system for gaze. The STS is more strongly activated during gaze cueing than arrow cueing, which is in line with the behavioral observation that orienting of attention is “more reflexive” in response to gaze cues than to arrows (Friesen et al., 2004; Hooker et al., 2003). Could this be because gaze observation directly activates the hypothetical mirror system, whereas arrow cueing recruits only the attention orienting system that has evolved from it? After all, gaze cues are biological stimuli that carry social meaning, a dimension that arrow cues are lacking. Further studies into the neural basis of gaze- and arrow-evoked attention shifts are needed to clarify the role of the STS in each. Also, the contribution of the FEF to gaze and arrow cueing remains unclear. Most neuroimaging studies of gaze cueing have either focused on regions of interest such as the STS, the fusiform face area, or the IPS, or have used arrow cueing as a baseline against which to compare gaze cueing. Perhaps both processes recruit similar neural circuits to a greater extent than commonly assumed, which would make it difficult to identify the underlying neural basis for each in a direct comparison.
As a final note, the use of observed gaze direction to induce attention shifts provides intriguing opportunities for attention research. It is the sociobiological aspect of gaze cues that will stimulate a myriad of research possibilities because interactions between attention and social cognition can be directly investigated. In this way, gaze cues introduce a new level of ecological validity that has often been lacking in traditional cognitive research (Kingstone, Smilek, Ristic, Friesen, & Eastwood, 2003). With the use of biologically relevant stimuli, findings from laboratory research may be generalized back to the real-life encounters that are being mimicked. Indeed, much of our everyday behavior is intrinsically linked to social interactions with other people. Using social cues such as gaze allows researchers to investigate attention processes under much more natural, yet controlled, conditions than those used in traditional approaches. Observing another person’s behavior allows the observer to decode and make inferences about a whole range of mental states such as intentions, beliefs, and emotions. Aside from ecological validity, the study of attention via the use of gaze cues is a perfect opportunity to develop a multifaceted approach to research questions surrounding the function of visual attention. That is, the study of how attention serves other processes such as social cognition (e.g., face perception, emotional processing), and vice versa, can now be addressed in a more profitable way than if restricted to using traditional arrow cues. Hence, the study of gaze cues has facilitated the merging of previously separate fields of psychology investigating cognitive processes and social variables, respectively. Bridging the gaps between different disciplines is perhaps one of the great challenges faced by science, and only in overcoming this hurdle can knowledge be truly advanced.
The study of how we process other people’s eyes is a fascinating topic that encompasses several realms of psychological, sociological, anthropological, clinical, and neuroscientific investigation. Taking its place alongside decades of research about the psychophysics of gaze perception and the influence of another’s gaze on social perception and interactions, the gaze-cueing paradigm promised much. Now, with a strong and expanding knowledge base, the previously disparate fields of attentional cueing, social psychology, and social neuroscience can progress in a mutually beneficial direction. Though there are some basic questions that do require urgent attention (e.g., what is the specific role of the STS in gaze following?), we are confident that the field will quickly find elegant ways of addressing and overcoming such problems. If gaze-cueing and related paradigms can contribute to our knowledge about attention and social functioning as strongly as is hoped, then substantial progress in the understanding of social functioning in normal human interaction, in infancy, and in developmental disorders (especially autism) will surely follow.
Work on this article was supported by Economic and Social Research Council Postdoctoral Fellowship ESRC PTA-026-27-0980 and Leverhulme Early Career Fellowship ECF/2006/0205 awarded to Andrew P. Bayliss. Some of the research reported in this article was supported by a Wellcome Trust Programme and Grant ESRC RES-000-23-0429 awarded to Steven P. Tipper.
1. When drawing comparisons between monkeys and humans, one should keep in mind that although there are striking similarities in the neural architecture of visual processing, the inferred homology of specialized brain areas is not necessarily straightforward (Orban, van Essen, & Vanduffel, 2004). For example, whereas selectivity for processing faces is predominantly found in the fusiform face area in humans, in macaques it is more strongly confined to the STS (Tsao, Freiwald, Knutsen, Mandeville, & Tootell, 2003). Nevertheless, although there are certainly differences in the anatomical organization of brain areas involved in face processing, the functionality of the human STS region with regard to gaze cueing is apparently very similar to its monkey counterpart; both the human and macaque STS regions respond preferentially to eye gaze, compared with other face-responsive brain areas (see Allison et al., 2000, for a review).
2. Senju and Hasegawa (2005) have demonstrated that the gap effect does interact with the attentional impact of the observation of direct gaze. We thank an anonymous reviewer for pointing out that this implies that the SC is involved in holding attention on a direct gazing face but not in generating the shift of attention induced by the observation of an averted gaze cue, as shown by Friesen and Kingstone (2003a). Note also that in gaze-cueing studies with infants, the central face stimulus is removed prior to target onset, akin to the gap effect paradigm (e.g., Farroni et al., 2000; Hood et al., 1998). It is, however, unclear whether the resulting facilitated disengagement of attention from the face stimulus is necessary to achieve gaze cueing in babies.