Automated facial tracking was successfully applied to create real-time re-synthesized avatars that were accepted as being video by naive participants. No participant guessed that we were manipulating the apparent video in their videoconference conversations. This technological advance presents the opportunity for studying adaptive facial behaviour in natural conversation while still being able to introduce experimental manipulations of rigid and non-rigid head movements without either participant knowing the extent or timing of these manipulations.
The damping of head movements was associated with increased A–P and lateral angular velocity, and the damping of facial expressions was associated with increased A–P angular velocity. There are several possible explanations for these effects. During the head-movement attenuation condition, naive participants might have perceived the confederate as looking more directly at them, prompting more incidents of gaze avoidance. A conversant might not have received the expected feedback from a low-velocity A–P or lateral angular movement and may have adapted by increasing her or his head angle relative to the conversational partner in order to elicit the expected response. Naive participants may also have perceived the attenuated facial expressions of the confederate as non-responsive and increased the velocity of their head nods in order to elicit a greater response from their conversational partner.
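The attenuation manipulation can be thought of as scaling each pose parameter's deviation from a baseline pose before the avatar is re-synthesized. The sketch below is a minimal illustration of that idea in Python, not the system's actual implementation; the function name, the damping factor and the use of the sequence mean as the baseline are assumptions made for illustration.

```python
import numpy as np

def attenuate(angles, factor=0.5, baseline=None):
    """Scale rigid head-movement deviations towards a baseline pose.

    angles   : per-frame head angles (e.g. pitch, in degrees)
    factor   : 0.0 leaves movement intact, 1.0 freezes the head at baseline
    baseline : reference pose; defaults to the mean of the sequence
    """
    angles = np.asarray(angles, dtype=float)
    if baseline is None:
        baseline = angles.mean()
    return baseline + (1.0 - factor) * (angles - baseline)

# A head nod whose excursion is halved: the frame-to-frame angular
# velocity is halved as well, which is the quantity the naive
# participants appear to have compensated for.
nod = np.array([0.0, 4.0, 8.0, 4.0, 0.0])
damped = attenuate(nod, factor=0.5)
```

Because the transformation is linear, damping displacement by a given factor damps angular velocity by the same factor, which is why attenuated pitch and turn variation translates directly into the reduced velocities discussed above.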
As none of the interaction effects for the attenuated conditions was significant, the confederates appear to have responded to the manipulations to the same degree as the naive participants. Thus, when the avatar's head pitch and turn variation was attenuated, both the naive participant and the confederate responded with increased-velocity head movements. This suggests that there is an expected degree of matching between the head velocities of the two conversational partners. Our findings provide evidence in support of a hypothesis that the dynamics of head movement in dyadic conversation include a shared equilibrium: both conversational partners were blind to the manipulation, and when we perturbed one conversant's perceptions, both partners responded in a way that compensated for the perturbation. It is as if there were an equilibrium energy in the conversation: when we removed energy by attenuation, thereby shifting the equilibrium, the conversational partners supplied more energy in response, returning the equilibrium towards its former value.
These results can also be interpreted in terms of symmetry formation and symmetry breaking. The dyadic nature of the conversants' responses to the asymmetric attenuation conditions is evidence of symmetry formation. But head turns have an independent effect of negative coupling, where greater lateral angular velocity in one conversant was related to reduced angular velocity in the other: evidence of symmetry breaking. Our results are thus consistent with symmetry formation being exhibited in both head nods and head turns, while symmetry breaking is more related to head turns. In other words, head nods may help form symmetry between conversants, while head turns contribute to both symmetry formation and symmetry breaking. One argument for why these relationships would be observed is that head nods may be more related to acknowledgement or attempts to elicit expressivity from the partner, whereas head turns may be more related to new semantic information in the conversational stream (e.g. floor changes) or to signals of disagreement or withdrawal.
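One way to see how positive and negative coupling could yield symmetry formation and symmetry breaking is with a toy discrete-time model in which each conversant's angular velocity relaxes towards the partner's velocity scaled by a coupling coefficient. This is purely an illustrative sketch under assumed dynamics, not the model estimated in this study; the update rule, relaxation rate and coefficient values are hypothetical.

```python
def simulate(coupling, steps=200, rate=0.1, v0=(1.0, 3.0)):
    """Iterate two conversants' angular velocities; each step moves a
    conversant's velocity towards the partner's velocity multiplied by
    the coupling coefficient."""
    a, b = v0
    for _ in range(steps):
        a += rate * (coupling * b - a)
        b += rate * (coupling * a - b)
    return a, b

# Positive coupling (as with head nods): the two velocities converge on
# a shared value, i.e. symmetry formation.
nod_like = simulate(coupling=1.0)

# Negative coupling (as with head turns): the velocities settle at equal
# magnitudes with opposite signs, i.e. symmetry breaking.
turn_like = simulate(coupling=-1.0)
```

In this sketch, damping one conversant's output lowers the shared fixed point, so restoring the original equilibrium requires both partners to inject more velocity, which parallels the equilibrium-energy account given above.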
With the exception of some specific expressions (e.g. Keltner 1995; Ambadar et al. 2009), previous research has ignored the relationship between head movements and facial expressions. Our findings suggest that facial expression and head movement may be closely related. These results also indicate that the coupling between one conversant's facial expressions and the other conversant's head movements should be taken into account. Future research should inquire into these within-person and between-person cross-modal relationships.
The attenuation of facial expression created an effect that appeared to the research team as being that of someone who was mildly depressed. Decreased movement is a common feature of psychomotor retardation in depression, and depression is associated with decreased reactivity to a wide range of positive and negative stimuli (Rottenberg 2005). Individuals with depression or dysphoria, in comparison with non-depressed individuals, are less likely to smile in response to pictures or movies of smiling faces and affectively positive social imagery (Gehricke & Shapiro 2000; Sloan et al. 2002). When they do smile, they are more likely to damp their facial expression (Reed et al. 2007).

Attenuation of facial expression can also be related to cognitive states or social context. For instance, if one's attention is internally focused, the attenuation of facial expression may result. Interlocutors might interpret damped facial expression of their conversational partner as reflecting a lack of attention to the conversation.
Naive participants responded to damped facial expression and head turns by increasing their own head nods and head turns, respectively. These effects may have been efforts to elicit more responsive behaviour in the partner. In response to simulated maternal depression, infants attempt to elicit a change in their mother's behaviour by smiling, turning away, and then turning back towards her and smiling again. When they fail to elicit a change in their mother's behaviour, they become withdrawn and distressed (Cohn & Tronick 1983). Similarly, adults find exposure to prolonged depressed behaviour increasingly aversive and withdraw (Coyne 1976). Had we attenuated facial expression and head motion for more than a minute at a time, naive participants might have become less active following their failed efforts to elicit a change in the confederate's behaviour. This hypothesis remains to be tested.
There are a number of limitations of this methodology that could be improved with further development. For instance, while we can manipulate the degree of expressiveness as well as the identity of the avatar (Boker & Cohn in press), we cannot yet manipulate specific facial expressions in real time. Depression not only attenuates expression, but also makes some facial actions, such as contempt, more likely (Ekman et al. 2005; Cohn et al. 2009). As an analogue for depression, it would be important to manipulate specific expressions in real time. In other contexts, cheek raising (AU 6 in the Facial Action Coding System; Ekman et al. 2002) is believed to covary with communicative intent and felt emotion (Coyne 1976). In the past, it has not been possible to experimentally manipulate discrete facial actions in real time without the source person's awareness. If this capability could be implemented in the videoconference paradigm, it would make possible a wide range of experimental tests of emotion signalling.
Other limitations include the need for person-specific models, restrictions on head rotation and limited face views. The current approach requires manual training of face models, which involves hand labelling about 30–50 video frames. Because this process requires several hours of pre-processing, avatars could be constructed for confederates but not for unknown persons, such as naive participants. It would be useful to have the capability of generating real-time avatars for both conversation partners, and recent efforts have made progress towards this goal (Lucey et al. in press; Saragih et al. 2009). Another limitation is that if the speaker turns more than about 20° from the camera, parts of the face become obscured and the model can no longer track the remainder of the face. Algorithms have been proposed that address this issue (Gross et al. 2004), but it remains a research question. A further limitation is that the current system models the face only from the eyebrows to the chin. A better system would include the forehead, together with some model of the head, neck, shoulders and background, in order to give a better sense of the speaker's placement in context. Adding forehead features is relatively straightforward and has been implemented, and tracking of the neck and shoulders is well advanced (Sheikh et al. 2008).

The videoconference avatar paradigm has motivated new work in computer vision and graphics and has made possible new methodology for experimentally investigating social interaction in ways not possible before. The timing and identity of social behaviour can now be rigorously manipulated in real time, outside of participants' awareness.