Annika Paukner, Laboratory of Comparative Ethology, NIH Animal Center, PO Box 529, Poolesville, MD 20837, USA.
Kazuo Fujita, Department of Psychology, Graduate School of Letters, Kyoto University, Sakyo-ku, Kyoto 606, Japan
Many studies have used mirror-image stimulation in attempts to find self-recognition in monkeys. However, very few studies have presented monkeys with video images of themselves; the present study is the first to do so with capuchin monkeys. Six tufted capuchin monkeys were individually exposed to live face-on and side-on video images of themselves (experimental Phase 1). Both video screens initially elicited considerable interest. Two adult males looked preferentially at their face-on image, whereas two adult females looked preferentially at their side-on image; the latter elicited lateral movements and head-cocking. Only males showed communicative facial expressions, which were directed towards the face-on screen. In Phase 2 monkeys discriminated between real-time, face-on images and identical images delayed by 1 second, with the adult females especially preferring real-time images. In this phase both screens elicited facial expressions, shown by all monkeys. In Phase 3 there was no evidence of discrimination between previously recorded video images of self and similar images of a familiar conspecific. Although they showed no signs of explicit self-recognition, the monkeys’ behaviour strongly suggests recognition of the correspondence between kinaesthetic information and external visual effects. In species such as humans and great apes, this type of self-awareness feeds into a system that gives rise to explicit self-recognition.
Chimpanzees can use televised images of their hands to find hidden targets that are visible only on the television screen (Menzel et al. 1985), and use televised images to explore parts of their bodies that cannot be seen directly (Savage-Rumbaugh and Rubert 1986; see also Hirata 2007). These behaviours confirm the visual self-recognition shown in studies using mirror-image stimulation (for reviews: Anderson and Gallup 1999; Gallup et al. 2002). Television and video studies of self-recognition abilities have also been conducted with gorillas (Law and Lock 1995) and dolphins (Marten and Psarakos 1995).
Unlike chimpanzees and other great apes, monkeys have not been shown to recognize themselves either with mirrors or video (Anderson and Gallup, 1999; Gallup et al. 2002). Video and televised images of conspecifics have been presented to monkeys in studies of communication (Miller 1967), the reward value of social video content (Andrews and Rosenblum 2002; Swartz and Rosenblum 1980), personality influences on perception of social interactions (Capitanio 2002), contagious yawning (Paukner and Anderson 2006), and environmental enrichment (Platt and Novak 1997). A few reports hint at the potential value of video as a tool to study self-recognition in monkeys (see Anderson 2000). Rhesus monkeys reportedly preferred a video of themselves to one of a roommate or an unfamiliar conspecific as a reward for performing a joystick task (Washburn et al. 1997). In contrast, when a group of cotton-top tamarins were shown live televised images of themselves, a previously filmed video of themselves, and a videotape of unfamiliar conspecifics, the latter elicited the most visual interest, although a mirror was more effective than any video stimulus (Neiworth et al. 2001). Both the on-line video and the mirror elicited “contingency testing”, defined by the authors as consisting of multiple swooping head movements towards the mirror or monitor, or multiple movements of a limb observed while gazing at the mirror or monitor.
Iriki et al. (2001) reported that Japanese monkeys learned to use televised images of their hands to pick up food. Furthermore, when a laser pointer spot was directed onto one hand, the monkeys reportedly tried to touch the spot with the other hand, which the authors interpreted as self-recognition akin to that shown by chimpanzees in Gallup’s mark-test of facial self-recognition (Gallup, 1970; Gallup et al. 2002). However, Iriki et al. (2001) presented neither details of procedures nor quantitative data.
The present report is the first to describe the reactions of capuchin monkeys (Cebus apella) to video images of themselves. In addition to assessing potential social stimulus properties of video images, we were interested in whether video self-confrontation might yield insights about capuchins’ awareness of self, given the continuing interest in the cognitive correlates of self-recognition (Gallup et al., 2002). In Phase 1 we presented two real-time images of the monkey, one face-on, the other side-on. As face-on mirror reflections elicit more facial expressions and visual interest than profile reflections in capuchins (Anderson and Roeder 1989), we predicted more interest in and social responses to the face-on video screen. Phase 2 presented the monkey’s live face-on image along with the same image delayed by 1 sec. In the absence of prior work with non-human primates, studies with human infants (see Miyazaki and Hiraki 2006) led us to predict a preference for the delayed image. Finally, in Phase 3 one video screen presented the monkey with its own image, recorded the previous day, while the other screen presented the previously recorded image of a familiar group-member. As Washburn et al. (1997) reported that rhesus monkeys preferred video of themselves compared to video of a familiar conspecific, we predicted a similar preference in capuchins.
Six tufted capuchin monkeys (Cebus apella) were tested: 2 adult males (Heiji, 8 years old; Pigmon, 5), 3 adult females (Zilla, 8; Kiki, 7; Theta, 7), and one juvenile male (Zinnia, 2). All were born and raised in captivity in a social group, and they lived together in an indoor multi-cage complex of over 3 m³. They were on long-term loan to the laboratory from the Primate Research Institute of Kyoto University through its Cooperative Research Program. The monkeys received a diet of commercial monkey pellets and varied fresh fruits and vegetables every day, a portion of which they received as rewards in a variety of experimental tasks. Water was available ad libitum. Room temperature and humidity were maintained at around 30 °C and 85%, respectively. None of the monkeys had previous experience of seeing monkeys on video.
The monkeys were individually tested in a familiar, transparent-walled test box (46 × 46 × 52 cm) in a familiar room. For added interest, in half of the test sessions two familiar ceramic tiles (4 × 4 cm) were placed in the test box before the monkey entered, but this variation is ignored here. Sessions were filmed with Sony Mini-DV digital camcorders, and the video images were presented on two 15-inch Sharp LCD monitors (Model LL-T1530A). One monitor was centered against one wall of the box, and the other was similarly positioned against an adjacent wall, giving the monkey an unobstructed view of both screens. The cameras were positioned just above the screens so as to capture as much of the monkey as possible. The monkey’s head was always visible on the screens except when the monkey approached very close and crouched to look below the screen, stood bipedally to look over the top of it, or peered beyond the edge of the screen at close range. Figure 1 shows examples of monkeys responding to their face-on image.
The study consisted of three phases. In Phase 1 one camera transmitted a real-time, face-on image on the screen directly below it, when the monkey faced in that direction. When the monkey turned to face the second screen, the same camera now transmitted the side view of the monkey onto that screen; the camera above this second, ‘side-on’ screen simply recorded the monkey’s behaviour, including looking at this second screen. In Phase 2 one screen showed a real-time, face-on image, while the second screen showed an identical image but delayed by 1 second, an effect produced by an I–O Data hard disk video recorder (Model Rec-On VR-HDA80). In Phase 3 one screen showed a previous face-on recording of the monkey, while the other screen showed a face-on recording of another group-member.
The two cameras were switched on before the monkey was brought into the test room. The monkey entered the test box from the transport box, then the experimenter left the room, at which point the 15-min session started. In Phase 1, on one screen the monkey saw a real-time face-on image of itself, and on the other screen a real-time side-on view of itself; importantly, only with the face-on image was a close approximation to eye contact possible. The respective positions of the two screens were maintained for all 10 sessions, which were conducted over a 2-week period, with no more than one session per day.
Phase 2 was identical to Phase 1 except that the screen previously showing the side-on view now showed a face-on view, again in real time. The other screen replicated this face-on image but with a 1-second delay. Thus, movements visible on both video screens were dependent upon the monkey’s movements, but in one case it was a live image and in the other case delayed feedback. Ten sessions were run over a 2-week period, with no more than one session per day.
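Conceptually, a fixed 1-second delay amounts to holding each frame back for 30 frames before display. The Python sketch below illustrates that idea with a first-in, first-out buffer, assuming a 30 frames/s stream. It is a hypothetical illustration only: it does not describe the internals of the hardware video recorder used in the study, and the names `DelayLine`, `FPS` and `DELAY_S` are invented here.

```python
from collections import deque

FPS = 30        # assumed video frame rate (the rate used for coding)
DELAY_S = 1.0   # delay used for the second screen in Phase 2


class DelayLine:
    """First-in, first-out buffer that releases each frame DELAY_S seconds after it arrives.

    Illustrative only: not a description of the hardware recorder used in the study.
    """

    def __init__(self, fps: int = FPS, delay_s: float = DELAY_S):
        self.n = round(fps * delay_s)  # frames held back (30 for 1 s at 30 frames/s)
        self.buf = deque()

    def push(self, frame):
        """Store the newest live frame and return the frame from n frames ago.

        Returns None until the buffer has filled, i.e. during the first second.
        """
        self.buf.append(frame)
        if len(self.buf) > self.n:
            return self.buf.popleft()
        return None


# The live screen would show frame i at step i; the delayed screen shows frame i - 30.
line = DelayLine()
shown = [line.push(i) for i in range(40)]
```

Because both screens are driven by the same movements, the only difference between them in this model is the 30-frame offset, which is the contrast Phase 2 was designed to test.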
In Phase 3 one screen presented a recording of the monkey itself, recorded on the real-time, face-on camera in the final session of Phase 2, while the second screen showed a similar recording of one of the monkey’s group-mates. On the second day of Phase 3 the videos used were those that were taken during the previous day’s recording, and so on, until 4 sessions were completed. The stimulus monkey was matched as closely as possible to the subject for size, and was used for all 4 sessions.
The videotapes of the monkeys’ responses to the images were analyzed on a time-coded Sharp 17-inch video monitor. Playback was slowed from normal speed (30 frames per second) for frame-by-frame analysis when appropriate. Frequency and duration of all instances of looking towards each screen were recorded, using the video record from the corresponding camera. Looking was defined as head and eyes oriented towards the screen, regardless of bodily orientation. Also coded from videotape were all occurrences of facial expressions, yawns, lateral head and body movements/head-cocking, and self-scratching while looking at the screen. Videotapes from all sessions except two in Phase 1 were coded by JRA; the remaining two were coded by AP and subsequently coded blind by JRA. For every monkey, agreement between the two coders for frequency of looking was 80% or above. Furthermore, the inter-observer correlations for durations of all agreed looking events were between +.75 and +.94, all at P < 0.001, except for one (N = 12) at P < 0.01. Analyses consisted of repeated-measures analyses of variance (ANOVAs) with screens (N = 2) and days (N = 10 for Phases 1 and 2, N = 4 for Phase 3) as factors. Individual monkeys’ responses to the two video conditions in each phase were analyzed using paired t-tests with scores from each session treated as data points. For all analyses alpha was set at 0.05.
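The reliability figures above rest on two simple quantities. The sketch below shows one plausible way to compute them, assuming that frequency agreement was expressed as the smaller coder's count divided by the larger coder's count (the text does not state which agreement index was used) and that duration reliability is a Pearson correlation over the looks both coders scored. The function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean


def percent_agreement(count_a: int, count_b: int) -> float:
    """Assumed index: smaller count / larger count, as a percentage."""
    larger = max(count_a, count_b)
    if larger == 0:
        return 100.0  # both coders scored no looks
    return 100.0 * min(count_a, count_b) / larger


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two coders' durations for the same looks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical example: looks per session counted by each coder,
# and durations (in frames) of looks that both coders scored.
print(percent_agreement(45, 50))
print(round(pearson_r([12, 30, 18, 44, 25], [14, 28, 20, 41, 27]), 2))
```

Under this index, a session in which one coder counted 45 looks and the other 50 would give 90% agreement, above the 80% threshold reported.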
The overall mean frequency of looking at the two screens was similar: 51.0 and 56.7 times per session for face-on and side-on screens, respectively. Concerning total duration of looking, the main effect of days was significant, F(9, 36) = 15.19, P < 0.001, as was the screens × days interaction, F(9, 36) = 2.76, P = 0.015 (see Figure 2). After looking for a long time at both screens on Day 1, the monkeys habituated to the side-on view rapidly, whereas habituation to the face-on view was more erratic.
Regarding the mean duration of individual looks, the main effect of days was significant, F(9, 36) = 7.88, P < 0.001: the longest looks occurred on Day 1 (mean: 38.1 frames), the shortest on Day 7 (14.8), with a slight recovery on Day 10 (23.1). The screens × days interaction was also significant, F(9, 36) = 3.89, P = 0.002. Both screens received their longest mean looks on Day 1 (face-on: 39.8 frames, side-on: 36.3). Looks towards both screens then became shorter, with face-on looks lasting slightly longer than side-on looks until Day 8, when the trend was reversed. The dominant adult male, Heiji, looked more frequently at the face-on than the side-on image (means: 33.7 and 20.9, respectively), t = 2.91, df = 9, P = 0.017, d = 10.8. Heiji also tended to look longer at the face-on screen (means: 926.2 vs. 571.6 frames, respectively), although this difference fell just short of significance, t = 2.21, df = 9, P = 0.054, d = 0.59. For the younger adult male, Pigmon, individual looks toward the face-on screen were longer (means: 21.7 vs. 15.1 frames), t = 2.52, df = 9, P = 0.033, d = 0.70. The adult female Kiki looked longer overall at the side-on screen than at the face-on screen (means: 973.6 and 691 frames, respectively), t = −2.91, df = 9, P = 0.018, d = 0.64; the corresponding difference for frequency of looking (means: 39.9 vs. 30.2) was not significant, t = −2.06, df = 9, P = 0.069, d = 0.73. Zilla looked at the side-on screen on average 38.7 times per session and at the face-on screen 29.7 times, but this difference failed to reach significance, t = −2.17, df = 9, P = 0.058, d = 0.61.
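Durations in these results are reported in video frames. Because coding used 30 frames per second, dividing by 30 converts them to seconds. The short sketch below does this for Heiji's Phase 1 totals and expresses the result as a share of the 15-min session; the names are illustrative, and the calculation assumes every session ran the full 15 min.

```python
FPS = 30              # playback rate used for coding (30 frames per second)
SESSION_S = 15 * 60   # each session lasted 15 min


def frames_to_seconds(frames: float, fps: int = FPS) -> float:
    """Convert a duration measured in video frames to seconds."""
    return frames / fps


# Heiji's mean total looking per session in Phase 1 (values from the text)
face_on_s = frames_to_seconds(926.2)
side_on_s = frames_to_seconds(571.6)
# Share of the 15-min session spent looking at the face-on screen, in percent
face_on_pct = 100 * face_on_s / SESSION_S
print(f"{face_on_s:.1f} s, {side_on_s:.1f} s, {face_on_pct:.1f}% of session")
```

On this conversion, Heiji's face-on looking amounts to roughly half a minute per session, or about 3.4% of the session, and the Day 1 mean look of 38.1 frames lasted about 1.3 s.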
Only the males displayed facial expressions, directed either preferentially or exclusively towards the face-on image (Table 1). Four monkeys showed lateral head movements/head-cocking, and with one exception all of these occurred while looking at the side-on image. Two other behaviours are noteworthy. On 21 occasions Heiji emitted “whistle series” vocalizations while looking at his face-on image, compared to only 4 times while looking at his side-on image. The adult female Theta turned to look back at the face-on image (occasionally under her arm, or through her legs) 38 times, whereas she did this only 4 times with her side-on image.
In Phase 2 the group as a whole looked at the real-time and delayed images on average 35.7 and 29.3 times per session, respectively, a non-significant difference. Mean total duration of looking was longer for the real-time image (915.3 vs. 603.9 frames), F(1, 36) = 13.84, P = 0.02, d = 1.16. The main effect of days on mean bout length approached significance, F(9, 36) = 2.15, P = 0.051: on Day 1 the mean bout was 24.0 frames; this measure peaked at 28.2 frames on Day 3 and declined gradually thereafter. There were no significant interactions.
All adult females looked preferentially at the real-time screen (Zilla: t = 3.82, df = 9, P = 0.004, d = 1.46; Kiki: t = 7.05, df = 9, P < 0.001, d = 2.40; Theta: t = 3.56, df = 9, P = 0.0065, d = 1.44); their frequencies of looking are shown in Figure 3 along with those of the males. Furthermore, individual looks at the real-time image were longer than those towards the delayed image (Zilla: 25.4 vs. 15.4 frames, t = 2.31, df = 9, P = 0.046, d = 1.04; Kiki: 24.0 vs. 18.2 frames, t = 5.56, df = 9, P = 0.0004, d = 0.93; Theta: 22.9 vs. 17.4 frames, t = 2.86, df = 9, P = 0.019, d = 0.89), and total duration of looking was longer (Zilla: 425.6 vs. 126.6 frames, t = 5.31, df = 9, P = 0.0005, d = 1.94; Kiki: 794.4 vs. 286.7 frames, t = 7.58, df = 9, P < 0.0001, d = 2.24; Theta: 1136.3 vs. 645.8 frames, t = 5.81, df = 9, P = 0.0003, d = 1.48). The male Heiji showed no clear bias for either image; he looked at the real-time image for 440.1 frames vs. 267.9 frames for the delayed image, but this difference was not significant, t = 1.92, df = 9, P = 0.088, d = 0.93. Like the adult females, the juvenile male Zinnia looked more frequently at the real-time image, t = 3.00, df = 9, P = 0.015, d = 0.80 (see Figure 3), although duration measures did not differ significantly between screens. Finally, Pigmon looked at the delayed image more frequently, t = −3.5, df = 9, P = 0.008, d = 1.15, and for longer overall (real-time: 698.6 vs. delayed: 1538.7 frames), t = −3.98, df = 9, P = 0.041, d = 1.45.
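Each individual-level comparison above is a paired t-test over one monkey's session scores (10 pairs in Phases 1 and 2, hence df = 9). The following sketch uses only the Python standard library to compute the paired t statistic and one variant of Cohen's d. The session scores are invented for illustration, and because the text does not state how d was calculated, `cohens_d_pooled` (mean difference over the root mean square of the two standard deviations) is an assumption. For simplicity the sketch compares t against a fixed critical value rather than computing an exact P value.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF9 = 2.262  # two-tailed critical t for df = 9 at alpha = 0.05


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-session scores.

    a[i] and b[i] are one monkey's scores for the two screens in session i.
    A negative t means the second condition (b) scored higher.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 sessions")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def cohens_d_pooled(a: list[float], b: list[float]) -> float:
    """Mean difference divided by the root mean square of the two SDs (one common variant of d)."""
    pooled_sd = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical looking frequencies over 10 sessions (real-time vs. delayed screen)
real_time = [40, 38, 35, 37, 36, 34, 33, 35, 32, 30]
delayed = [30, 31, 28, 30, 29, 27, 29, 28, 27, 26]

t, df = paired_t(real_time, delayed)
d = cohens_d_pooled(real_time, delayed)
significant = df == 9 and abs(t) > T_CRIT_DF9
print(f"t = {t:.2f}, df = {df}, d = {d:.2f}, significant: {significant}")
```

For exact two-tailed P values, SciPy's `scipy.stats.ttest_rel` computes the same paired t statistic together with its P value.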
The two screens did not differentially elicit facial expressions, lateral movements/head-cocking (Table 1), vocalizing, or scratching, but there were some notable individual responses. In this phase all 3 females showed some facial expressions. The juvenile Zinnia was again highly responsive, especially to the real-time image. Heiji and Kiki performed 5 and 6 lateral movements/head-cocking responses, respectively, while looking at the real-time screen, and none while looking at the delayed-image screen. Heiji produced three “whistle series” vocalizations while looking at the real-time image, and 11 while looking at his delayed image. Finally, Pigmon scratched himself twice while looking at his real-time image, and 7 times while looking at his delayed image.
There were no significant group-level or individual-level differences in measures of looking at the two screens in Phase 3. Only one facial expression occurred, shown by Zinnia toward his previously recorded image. The same individual was the only one to show lateral movements/head-cocking, which he did three times when looking at the image of the other monkey.
Although Phase 3 consisted of only 4 sessions, frequency and total duration measures of looking exceeded those in Phase 2 (see Table 2), reversing a clear reduction in visual interest shown towards the video images between Phases 1 and 2.
In Phase 1 the capuchin monkeys showed little evidence of a clear visual preference for either face-on or side-on views of themselves. They looked longest at both screens on the first day, when the side-on image was looked at especially frequently. Some clear individual preferences emerged, however: the two adult males looked preferentially at their face-on images, whereas two adult females showed greater interest in side-on images. Almost all facial expressions were directed toward face-on images, either real-time (males in the first phase, both sexes in the second phase) or delayed (both sexes, second phase). In contrast, lateral movements and head-cocking were mostly elicited by the side-on image. These results concur broadly with the literature on primates’ responses to their reflections in mirrors, in particular the initial peak in interest followed by habituation (Anderson 1984, 1994), and the greater social responsiveness toward face-on reflections compared to side-on or profile views (Anderson and Roeder 1989). Further data are required to confirm whether males and females diverge in their responses to face-on and side-on views.
Compared to mirror-image stimulation, the face-on video images elicited relatively few social responses (see Anderson 1994; Anderson and Roeder 1989; de Waal et al. 2005; Marchal and Anderson 1991). All of the social responses were affiliative, consisting of the eyebrow-raise facial expression, sometimes accompanied by the posture known as “bunny sit”, a posture also shown by males towards their reflection in de Waal et al. (2005). Although their responses appeared affiliative, adult females did not show lipsmacking, unlike in the study by de Waal et al. (2005). In Phase 1, two females appeared more interested in their profile views, and showed lateral movements and head-cocking. In callitrichid monkeys head-cocking signals curiosity (Menzel 1980), whereas in capuchins it is often an affiliative and courtship-related social act (Carossi and Visalberghi 2002). Although repeated movements in front of a mirror are sometimes interpreted as “reality testing” or “contingency testing” (e.g., Eglash and Snowdon 1983; Riviello et al. 1992), the lateral movements and head-cocking observed here looked more like attempts to get a better look at the side-on monkey, possibly to make eye contact with it.
The monkeys showed no evidence of explicit self-recognition; there was no obvious contingency testing or use of the image to explore normally unseen body parts. These observations concur with recent assessments of mirror self-recognition in capuchins (Paukner et al. 2004; Roma et al. 2007). One noteworthy behaviour, shown by a female, was frequent glancing back at the face-on image, but this did not appear to be related to self-recognition. These glances allowed her to make eye contact with her image, something that was not possible with the side-on view. The behaviour appeared exploratory rather than playful, leading us again to suggest that seeking periodic eye contact with the image was the main aim.
The prediction that the capuchins would prefer a delayed image to a real-time image was not supported, with one exception. In fact, the group as a whole, and especially the females, showed a bias for the real-time image. This general picture contrasts with the responses of 7-month-old human infants to real-time vs. delayed images of themselves (see Miyazaki and Hiraki 2006). It has been suggested that for younger human infants contingency in video images of self has no more than a marginal effect on visual attention, and that they might not even be sensitive to delays shorter than 2 sec (Rochat and Striano 2002). In contrast, capuchin monkeys clearly did discriminate between a live image and a 1-sec-delayed image.
Diminished visual attention in Phase 2 probably reflects continued generalized habituation to video images. The juvenile male again showed the most facial expressions, directed preferentially to the real-time image. In Phase 2 the adult females also displayed some facial expressions, but with no bias towards either the live or the delayed image. The data suggest that social responses to video stimuli emerge later in females than males.
It is conceivable that monkeys might perceive a 1-sec delayed image of themselves as imitating, or at least responding to, their movements (see Paukner et al. 2005; Rochat and Striano 2002). However, there was no evidence of this; for example, there was no preferential looking at the delayed image (with one exception) and no contingency testing. The dominant male did emit “whistle series” vocalizations more frequently while looking at the delayed image (Phase 2) and at the face-on image (Phase 1), but what this means is unclear. Whistle series vocalizations play a role in the coordination of spatial movements during travel, and signal food presence (Di Bitetti 2003), neither of which appears relevant here. The second adult male scratched himself more frequently while looking at his delayed image. Scratching is considered a sign of mild tension or arousal (Diezinger and Anderson 1986; Maestripieri et al. 1992), so this male might have been puzzled by his delayed image.
Discrimination between the real-time image and delayed image in Phase 2 raises the issue of what kind of self-awareness capuchin monkeys might have. Three-year-old children, but not 2-year-olds, passed a mark test of self-recognition when viewing 1-sec-delayed video images of themselves; however, a 2-sec delay resulted in failure (Miyazaki and Hiraki 2006). Iriki et al. (2001) stated that real-time images were necessary when training macaques to retrieve food via a video monitor, but no relevant details were presented. Those authors’ claim that video-mediated finding of food equates to visual self-recognition appears too strong in the absence of any complementary evidence that the monkey recognized its hands, or any other body parts, as its own. We cannot discount the possibility that delay-induced differences in the perceived bodily orientation of the monkey on the screens contributed to the monkeys’ discrimination between self-produced real-time and delayed images. However, given existing literature on mirror- and video-induced contingency testing by monkeys (see Introduction), it seems likely that the monkeys recognized an association between kinaesthetic activity and external visual effects. The direction of the preference (for real-time or delayed images) appears unimportant; what counts is the fact that discrimination occurred.
Although the final phase consisted of only 4 sessions, there was an obvious recovery in viewing time compared to earlier phases. As there were no notable differences in responses to the two screens, it is unknown what caused the increased interest: the lack of contingency (both screens) or the change in identity of one stimulus monkey (one screen). Neiworth et al. (2001) reported that cotton-top tamarins showed more interest in video images of unfamiliar conspecifics than in on-line or previously recorded images of themselves. Further work is required to determine the role of different factors in habituation reversal for previously recorded self-images presented simultaneously with images of other conspecifics. We can conclude, however, that in the limited, free-viewing conditions of the present study, there was no clear preference for videos of self (cf. Washburn et al. 1997).
In a review of earlier self-recognition research in nonhuman primates, Gallup (1991) called for a move away from mirror-image stimulation (MIS) studies. The call has gone largely unheeded. Recent studies have used variants of MIS (Paukner et al., 2004), refined behavioral comparisons between MIS and real conspecifics (de Waal et al. 2005), assessed previously unstudied species (e.g., talapoin monkeys, Posada and Colell 2005), modified the classic mark test of self-recognition (Heschl and Burkart 2006), and attempted explicit mirror training (Roma et al. 2007). While such developments are welcome, we agree with Gallup (1991) that they are not enough; a comprehensive understanding of self-awareness requires multiple approaches. The present study has shown that, as in the study of human self-recognition, video images have a role to play. Given the likely role of visual-kinaesthetic matching ability in mirror self-recognition (Mitchell 1997), video-based techniques may offer a promising way to explore self-processing in species that fail to show explicit self-recognition in mirror tests.
This study was supported by JSPS Grants-in-Aid for Scientific Research Nos. 13410026 and 17300085 to KF and 21st-Century COE Program, D-10 to Kyoto University, and by awards from the Carnegie Trust for the Universities of Scotland and the Great Britain Sasakawa Foundation to JRA. This study complied with Kyoto University’s Guide for the Care and Use of Laboratory Primates.