Philos Trans R Soc Lond B Biol Sci. 2009 December 12; 364(1535): 3527–3538.
PMCID: PMC2781892

Role of expressive behaviour for robots that learn from people

Abstract

Robotics has traditionally focused on developing intelligent machines that can manipulate and interact with objects. The promise of personal robots, however, challenges researchers to develop socially intelligent robots that can collaborate with people to do things. In the future, robots are envisioned to assist people with a wide range of activities such as domestic chores, helping elders to live independently longer, serving a therapeutic role to help children with autism, assisting people undergoing physical rehabilitation and much more. Many of these activities will require robots to learn new tasks, skills and individual preferences while ‘on the job’ from people with little expertise in the underlying technology. This paper identifies four key challenges in developing social robots that can learn from natural interpersonal interaction. The author highlights the important role that expressive behaviour plays in this process, drawing on examples from the past 8 years of her research group, the Personal Robots Group at the MIT Media Lab.

Keywords: social robotics, human robot interaction, robot learning, expressive behaviour, affective computing

1. Introduction

Studies by the United Nations Economic Commission and International Federation of Robotics forecast a dramatic increase in consumer demand for robots that assist, protect, educate and entertain over the next 20–30 years. In the future, personal robots will be able to help people as capable assistants in their daily activities. Consider cooperative activities such as preparing a meal together, building a structure with teammates or teaching someone a new skill. Through sophisticated forms of social interaction and learning, people are able to accomplish more than they could alone. Socially intelligent robots could have a significant positive impact on real-world challenges, such as helping elders to live independently at home longer, serving as learning companions for children and enriching learning experiences through play, serving a therapeutic role to help children with autism learn communication skills, or functioning as effective members of human–robot teams for disaster response missions, construction tasks and more.

Many of these applications require robots to engage humans in sophisticated forms of social interaction, including human-centred multi-modal communication, teamwork and social forms of learning such as tutelage. Over the past several years, my research has focused on endowing autonomous robots with social intelligence to enable them to engage in the powerful, social forms of interaction and learning in which people readily participate. This vision is motivated by the observation that humans are ready-made experts in social interaction; the challenge is to design robots to participate in what comes naturally to people. By doing so, socially interactive robots could help not only specialists, but anyone.

Today, however, autonomous and semi-autonomous robots are widely regarded as tools that trained operators command and monitor to perform tasks. Beyond robustness and proficiency in the physical world, the promise of personal robots that can partake in the daily lives of people is pushing robotics and AI research in new directions. Whereas robotics has traditionally focused on developing machines that can manipulate and interact with things, the promise of personal robots challenges us to develop robots that are adept in their interactions with people. Further, in contrast to the traditional view of robots as sophisticated tools that we use to do things for us, this new generation of socially intelligent robots is envisioned as partners that collaborate to do things with us.

Over the past several years, new research fields have emerged (i.e. human–robot interaction and social robotics) to address challenges in building robots that are skilful in their interactions with people (Dautenhahn 1995; Fong et al. 2003; Breazeal 2004b; Duffy 2008). Given that social robots are designed to interact with people in human-centric terms within human environments, many are humanoid (e.g. Tanaka et al. 2004; Ogura et al. 2006) or animal-like (e.g. Fujita 2004; Wada et al. 2005) in form, and even the more mechanical-looking robots tend to have anthropomorphic movement or physical features (e.g. Kozima 2006; Tanaka et al. 2006).

A unifying characteristic is that social robots communicate and coordinate their behaviour with humans through verbal, non-verbal or affective modalities. For instance, these might include whole-body motion (e.g. dancing, Duffy 2003; walking hand-in-hand, Lim et al. 2004), proxemics (i.e. how a robot should approach a person, Walters et al. 2008; follow a person, Gockley et al. 2007; or maintain appropriate interpersonal distance, Brooks & Arkin 2007), gestures (e.g. pointing, shrugging shoulders or shaking hands, Miwa et al. 2004a,b; Roccella et al. 2004), facial expressions (e.g. Iida et al. 1998; DiSalvo et al. 2002; Berns & Hirth 2006; Hayashi et al. 2006), gaze behaviour (e.g. Kikuchi et al. 1998; Sakita et al. 2004; Sidner et al. 2005), head orientation and shared attention (e.g. Imai et al. 2001; Fujie et al. 2004), linguistic and paralinguistic cues (e.g. Matsusaka et al. 2003; Fujie et al. 2005) or emotive vocalization (e.g. Cahn 1990; Abadjieva et al. 1993), social touch-based communication (e.g. Stiehl et al. 2005) and how these cues complement verbal communication (e.g. Cassell et al. 2000).

Progress continues in building robots that can learn from people, through observation, imitation or direct tutelage (for reviews see Schaal 1999; Argall et al. 2009). For instance, impressive strides have been made in designing robots that learn new skills (e.g. pendulum swing-up, Atkeson & Schaal 1997b; body schema, Hersch et al. 2008; peg insertion, Hovland et al. 1996; dance gestures, Mataric et al. 1998; communication skills and protocols, Billard et al. 1998; Roy & Pentland 1998; Scassellati 1998) as well as tasks (e.g. stacking objects, Kuniyoshi et al. 1994; Calinon & Billard 2007; fetch and carry, Nicolescu & Mataric 2003; setting a table, Pardowitz et al. 2007 or sorting objects into bins, Saunders et al. 2006; Chernova & Veloso 2008).

Modern robots are beginning to participate as members of heterogeneous teams that cooperate with people in order to achieve shared goals. For instance, a remote human might supervise a distributed team of robots to perform a task (e.g. disaster response or search and rescue, Bluethmann et al. 2004; Murphy et al. 2008). In addition, co-located teamwork has been explored such as a human and a robot working side by side (Adams et al. 2009), or a team of humans and robots working in the same area to assemble a structure (Fong et al. 2005).

Furthermore, as people begin to interact with robots more closely, it is important that robots' behaviour, rationale and motives be easily understood. The more these mirror natural human analogues, the more intuitive it becomes for us to communicate and coordinate our behaviour with robots. Researchers have begun to explore the role of affect (e.g. Picard 2000; Fellous & Arbib 2005; Duffy 2008; Cañamero), perspective taking and theory of other minds (e.g. Scassellati 2001; Johnson & Demiris 2005; Trafton et al. 2005), and even simple forms of empathy (Dautenhahn 1997; Breazeal et al. 2005a) and models of attachment (Cañamero et al. 2006) in generating a robot's behaviour.

A relevant issue underlying these different kinds of interactions is how people form social judgements of robots—are robots perceived as trustworthy, persuasive, reliable, likeable, etc. (e.g. Kidd & Breazeal 2008; Siegel 2008)? A number of groups have also explored how people's social judgements of robots compare to animated agents and even mixed-reality agents (Holz et al. 2009). It is intriguing that the physical presence of robots seems to matter to people as robots often score higher than their virtual counterparts on measures of engagement, social presence, working alliance as well as social influence on human behaviour (e.g. Kidd & Breazeal 2004; Powers et al. 2007; Bainbridge et al. 2008). Researchers have started delving into functional magnetic resonance imaging studies to try to understand these differences and to what extent people attribute human characteristics to robots, including theory of mind (Krach et al. 2008).

2. Robots that learn from people

Within this broader context of human–robot interaction (HRI) and social robotics, this paper summarizes the past 8 years of research from my group (the Personal Robots Group at the MIT Media Lab; http://www.media.mit.edu/~cynthiab; http://robotic.media.mit.edu; Breazeal 2002) with respect to significant lessons we have learned in our quest to build robots that can learn from anyone. My group is recognized for pioneering HRI and social robotics through the development of expressive autonomous robots that socially interact with people in a natural manner (Breazeal 2002). Figure 1 presents the three ‘flagship’ social robots we have developed, starting with Kismet in the late 1990s, Leonardo spanning the early–mid 2000s and our new robot Nexi. Each design is considered state-of-the-art (building upon lessons and technologies of its predecessor) and supports a different set of highly related scientific questions at the intersection of emotion and HRI, social learning, sophisticated forms of social cognition and human–robot teamwork.

Figure 1.

Three examples of social robots used in this research: (a) Kismet, (b) Leonardo and (c) Nexi.

One of my main research interests has been to develop robots that can learn from natural interpersonal interactions. Personal robots of the future will need to quickly learn new tasks and skills from people who are not specialists in robotics or machine learning techniques but possess a lifetime of experience in teaching and learning from one another. A major technical goal is to engineer robots that can leverage social guidance to efficiently and robustly acquire new capabilities from natural human instruction and to do so dramatically faster than they could alone. As an integral part of this endeavour, my group has contributed new knowledge and findings about how humans teach social robots, and the important role that the robot's expressive behaviour plays in this interpersonal process.

In contrast to traditional statistical machine learning approaches that require human expertise to craft a successful large-scale search problem that uses little or no real-time human input, my group's approach recognizes the advantage of designing robots that can leverage the same rich forms of social interaction that people readily use to teach or learn from one another. Human teachers verbally and non-verbally guide the exploration of learners by directing attention, providing feedback, structuring experiences, supporting learning attempts, and regulating the complexity and difficulty of information to push learners a little beyond their current abilities in order to help them acquire new skills and concepts. In turn, learners tune their teachers' instruction and shape subsequent guidance by expressing their current understanding through demonstration and a rich variety of communicative acts. Through this interaction, learner and teacher form mental models of each other that they use to support the learning–teaching process as a richly collaborative activity.

It is actually very difficult to build robotic systems that can successfully learn in real time from the general public. Human teaching behaviour is highly variable and complex, and different people bring different styles of interaction to the table. Today, it is common practice for robots to be taught and evaluated by the same researchers who developed them. Not surprisingly, if the teacher has special technical expertise and knowledge of the underlying learning algorithms that the robot uses, this leads to a strongly machine-centric style of interaction that is neither natural nor intuitive to someone who lacks such expertise. In fact, although there exists quite substantial work in developing robots that learn from people, it is still uncommon to conduct human participant studies with members of the general public to assess the learning performance of a robot when taught by someone who is not an expert in robotics, machine learning or related fields.

My research group is unusual for a robotics group, having conducted over a dozen controlled, in-lab human participant studies with hundreds of participants in order to gain a greater qualitative and quantitative understanding of how people approach the task of teaching a socially responsive machine. Often, we begin an investigation with a human study to learn more details about how people teach each other. Then, computationally modelling this process allows us to identify and explore the use of a variety of social cues, expressive behaviours, skills and cognitive capabilities that support social learning in robots. In this way, we use social robots as a scientific tool for measuring and quantifying human behaviour in new ways. This in turn has allowed us to generate new findings and discover new knowledge that can even inform how people teach and learn from one another. Figure 2 contrasts (a) the traditional machine-centric approach with (b) our human-centric approach.

Figure 2.

(a) The traditional, machine-centric approach to teaching robots where the teacher often has expertise in the robot's learning algorithms. (b) Our new, human-centric approach that supports how ‘ordinary’ people approach the task of ...

3. Challenges in building teachable robots

Applying these results, my group has developed and evaluated how these social behaviours and expressive capabilities enable robots to learn interactively with human participants, as well as how the same social skills address several key challenges in learning from natural human instruction. I highlight several of these challenges below, together with highlights of my group's contributions towards their solution.

(a) Challenge 1

Robots face a fundamental mismatch between their own social and communicative sophistication and that of humans. For effective learning, however, it is important that learners are slightly challenged to push themselves towards new abilities that are within reach, while avoiding situations where they are too overwhelmed to make sense of things. Fortunately, teachers and learners can work together to establish a suitable level of difficulty and to regulate the complexity of the interaction to be suitable for both.

(i) Example: envelope displays

To address this challenge, our research has contributed evidence for the importance of paralinguistic communication cues in HRI, and how they can be used to successfully manage this imbalance in a natural and intuitive manner. Through HRI studies with our robot, Kismet, we found that humans readily entrain to a robot's non-verbal social cues (e.g. envelope displays that regulate the exchange of speaking turns in human conversation) to improve the efficiency and robustness of ‘conversational’ flow by intuitively slowing the rate of turn exchanges to a level that the robot can handle well. For instance, humans tend to make eye contact and raise their eyebrows when ready to relinquish their speaking turn, and tend to break gaze and blink when starting their speaking turn. When these same facial displays are implemented on a robot, we found that they are effective in smoothing and synchronizing the exchange of speaking turns with human subjects, resulting in fewer interruptions and awkward long pauses between turns (Breazeal 2003b).
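
As a rough illustration of how such envelope displays can be scheduled, the sketch below implements a minimal turn-taking controller that emits the facial cues described above when the robot takes or yields the floor. It is a simplified, hypothetical rendering rather than Kismet's actual implementation; the two-state model and cue names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Turn(Enum):
    ROBOT_SPEAKING = auto()
    ROBOT_LISTENING = auto()


@dataclass
class EnvelopeDisplay:
    """Facial cues that signal whose speaking turn it is."""
    eye_contact: bool
    eyebrows_raised: bool
    blink: bool


class TurnTakingController:
    """Minimal envelope-display controller: the robot signals when it takes
    or relinquishes the floor so that the human can entrain to its pace."""

    def __init__(self) -> None:
        self.turn = Turn.ROBOT_LISTENING

    def begin_utterance(self) -> EnvelopeDisplay:
        # Taking the turn: break gaze and blink before starting to speak.
        self.turn = Turn.ROBOT_SPEAKING
        return EnvelopeDisplay(eye_contact=False, eyebrows_raised=False, blink=True)

    def finish_utterance(self) -> EnvelopeDisplay:
        # Ready to yield the turn: re-establish eye contact and raise eyebrows.
        self.turn = Turn.ROBOT_LISTENING
        return EnvelopeDisplay(eye_contact=True, eyebrows_raised=True, blink=False)


if __name__ == "__main__":
    controller = TurnTakingController()
    print(controller.begin_utterance())   # robot starts its speaking turn
    print(controller.finish_utterance())  # robot hands the turn back
```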

(ii) Example: coordination behaviours

Through another series of HRI studies, we examined the use of a number of coordination behaviours where participants guided our robot, Leonardo, using speech and gesture to perform a physical task involving pressing a sequence of coloured buttons ON. Leonardo communicates through gaze (visual attention) and facial expressions (affective state) or explicitly through gestural cues (i.e. pointing). The robot's coordination behaviours include visually attending to the human's actions (e.g. pointing to or pressing a button) to acknowledge their contributions, issuing a short nod to acknowledge the success and completion of the task or subtask (i.e. turning the buttons ON), visually attending to the person's attention directing cues such as to where the human looks or points, looking back to the human once the robot presses a button to make sure its contribution is acknowledged, and pointing to buttons in the workspace to direct the human's attention towards them. Both self-report via questionnaire and behavioural analysis of video support the hypothesis that these non-verbal communication cues positively impact human–robot task performance with respect to understandability of the robot, efficiency of task performance and robustness to errors that arise from miscommunication (Breazeal et al. 2005b).

(iii) Example: emotive displays

In addition, we found that emotive expressions (as governed by the robot's emotion-based models) are interpreted by humans as natural analogues, and thereby can be used by the robot to regulate its interaction with the human—to keep the complexity of the interaction within the robot's perceptual limits and even to help the robot to achieve its goals (Breazeal & Scassellati 2000). Many of these results were first observed with our robot, Kismet, the first robot designed to explore socio-emotive face-to-face interactions with people explicitly (Breazeal 2002). Our research with Kismet was strongly inspired by the origins of social interaction and communication in people, namely that which occurs between carer and infant, through extensive computational modelling guided by insights from developmental psychology and behavioural models from ethology (Breazeal 2003a). It is well established that early infant–carer exchanges are grounded in the regulation of emotion and its expression.

Inspired by these interactions, Kismet's cognitive–affective architecture was designed to implement core proto-social responses exhibited by infants given their critical role in normal social development. Internally, Kismet's models of emotion interacted intimately with its cognitive systems to influence behaviour and goal arbitration. Through a process of behavioural homeostasis, these emotive responses served to restore the robot's internal affective state to a mildly aroused, slightly positive state—corresponding to a state of interest and engagement in people and its surroundings that fosters learning. One purpose of Kismet's emotive responses was to reflect the degree to which its drives and goals were being successfully met. A second purpose was to use emotive communication signals to regulate and negotiate its interactions with people. Specifically, Kismet utilized emotive displays to regulate the intensity of playful interactions with people, ensuring that the complexity of the perceptual stimulus was within a range that the robot could handle and potentially learn from. In effect, Kismet socially negotiated its interaction with people via its emotive responses to have humans help it achieve its goals, satiate its drives and maintain a suitable learning environment (Breazeal 2004a).
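
The regulatory function of these emotive displays can be sketched as a simple homeostasis loop: the robot's arousal and valence drift with the intensity and pleasantness of the stimulus, and when they leave the mildly aroused, slightly positive region the robot selects a corrective display that asks the human to intensify or tone down the interaction. This is a schematic sketch with assumed variable names and thresholds, not Kismet's cognitive–affective architecture.

```python
def update_affect(arousal: float, valence: float,
                  stimulus_intensity: float, stimulus_pleasantness: float,
                  gain: float = 0.5, decay: float = 0.1) -> tuple:
    """Drift the affective state towards the stimulus, decaying back to neutral."""
    arousal += gain * (stimulus_intensity - arousal) - decay * arousal
    valence += gain * (stimulus_pleasantness - valence) - decay * valence
    return arousal, valence


def emotive_display(arousal: float, valence: float,
                    target_arousal: float = 0.3, target_valence: float = 0.2,
                    tolerance: float = 0.4) -> str:
    """Pick a display intended to restore a mildly aroused, slightly positive state."""
    if arousal > target_arousal + tolerance:
        return "distress"   # stimulus too intense: ask the human to back off
    if arousal < target_arousal - tolerance:
        return "boredom"    # stimulus too weak: solicit more engagement
    if valence < target_valence - tolerance:
        return "sadness"    # drives and goals unmet: solicit help
    return "interest"       # within the region that fosters learning


if __name__ == "__main__":
    a, v = 0.0, 0.0
    for intensity, pleasantness in [(0.9, 0.1), (1.2, -0.3), (0.3, 0.4)]:
        a, v = update_affect(a, v, intensity, pleasantness)
        print(f"arousal={a:.2f} valence={v:.2f} -> {emotive_display(a, v)}")
```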

(iv) Summary: joint action

While more established approaches to instructing robots view the interaction as a one-way flow of information from human to machine, this body of work challenges the paradigm by illustrating the myriad of ways in which humans participate in the teaching–learning process as tightly coupled joint action. Humans do not simply provide training inputs as a one-sided interaction to which the learner must react. Rather, people are constantly reading and interpreting numerous behavioural cues of the robot as indicators of its internal state, and are continually adapting and tuning their teaching behaviour to be suitable for the robot learner.

This interaction dynamic has significant implications for the design of robots that learn from people. The robot is not restricted to learning in a complex environment that does not care whether the robot succeeds or fails—a common assumption in robot learning systems. Rather, people view teaching and learning as a partnership with shared goals. Because of this, the robot can proactively improve the quality of its learning environment, tuning the teaching acts of the human to be more suitable, through using communication acts that reveal its learning process to the human teacher.

(b) Challenge 2

Faced with an incoming stream of sensory data, a robot must figure out which of its myriad of perceptions are relevant to the task at hand. This is an important capability for generating coherent behaviour as well as for learning given that the search over state space becomes enormous as perceptual abilities and complexity of the environment increase.

(i) Example: saliency and shared attention

To address this challenge we have identified a set of socially embodied cues and socio-cognitive abilities that assist the robot's determination of saliency when learning a task. These cues and abilities make the robot's underlying attention mechanisms responsive to a human teacher's efforts to highlight a distinct environmental context or change that is relevant to the learning task.

In a series of human studies we have identified a growing set of social cues and socio-cognitive skills that play an effective role in addressing the saliency question.

For instance, we have implemented a multi-modal attention system to enable the robot to leverage the human teacher's desire to direct its visual attention by following the human's pointing gestures or gaze (estimated by head pose). To compute our robot's attentional focus, the attentional system computes the level of saliency (a measure of ‘interest’) per feature channel for objects and events in the robot's perceivable space (Breazeal & Scassellati 1999; Breazeal et al. 2000). For Leonardo, the contributing factors to an object's overall saliency fall into three categories: its perceptual properties (i.e. its proximity to the robot, its colour, whether it is moving, etc.), the internal state of the robot (i.e. whether this is a familiar object, what the robot is currently searching for and other goals) and social reference (if something is pointed to, looked at, talked about or is the referential focus). For each item in the perceivable space, the overall saliency at each time step is the result of the weighted sum for each of these factors. The item with the highest saliency becomes the current attentional focus of the robot, and also determines where the robot's gaze is directed. The gaze direction of the robot is an important communication device to the human, verifying for the human partner what the robot is attending to and thinking about.
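
A minimal sketch of this weighted-sum attention computation is given below. The feature channels mirror the three categories just described, but the particular weights and scores are illustrative assumptions rather than the values used on Leonardo.

```python
from dataclasses import dataclass


@dataclass
class PerceivedItem:
    name: str
    # Per-channel saliency contributions in [0, 1]: perceptual properties,
    # the robot's internal state (familiarity, current goals) and social reference.
    perceptual: float = 0.0
    internal: float = 0.0
    social: float = 0.0


def overall_saliency(item: PerceivedItem, weights=(0.3, 0.3, 0.4)) -> float:
    """Weighted sum of the per-channel contributions for one item."""
    w_p, w_i, w_s = weights
    return w_p * item.perceptual + w_i * item.internal + w_s * item.social


def attentional_focus(items):
    """The most salient item becomes the attentional focus and the gaze target."""
    return max(items, key=overall_saliency)


if __name__ == "__main__":
    scene = [
        PerceivedItem("red button", perceptual=0.6, internal=0.2, social=0.0),
        PerceivedItem("blue lever", perceptual=0.3, internal=0.4, social=0.9),  # pointed at
    ]
    print("attentional focus:", attentional_focus(scene).name)
```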

The human's attentional focus is determined by what he or she is currently looking at. Leonardo calculates this using the head pose tracking data, assuming that the person's head orientation is a good estimate of his or her gaze direction. By following the person's gaze, the shared attention system determines which (if any) object is the attentional focus of the human's gaze. The mechanism by which infants track the referential focus of communication is still an open question, but a number of sources, such as word-learning studies, indicate that looking time is a key factor. For example, when a child is playing with one object and hears an adult say ‘It's a modi’, the child does not attach the label to the object the child happens to be looking at (which is often the adult's face!). Instead the child redirects his or her attention to look at what the adult is looking at, and attaches the label to that object. For our robot, we use a simple voting mechanism to track a relative-looking-time for each of the objects in the robot's and human's shared environment. The object with the highest accumulated relative-looking-time is identified as the referent of the communication between the human and the robot (Thomaz et al. 2005).
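
A minimal version of this voting mechanism might accumulate looking time per object and report the current leader as the referent, as in the sketch below; the decay factor and update rate are assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional


class ReferentTracker:
    """Accumulate relative looking time per object; the object looked at the
    longest is taken as the referent of the human–robot communication."""

    def __init__(self, decay: float = 0.98):
        self.votes = defaultdict(float)
        self.decay = decay

    def observe_gaze(self, target: Optional[str], dt: float) -> None:
        # Decay old votes so the referent can shift as attention shifts.
        for obj in self.votes:
            self.votes[obj] *= self.decay
        if target is not None:
            self.votes[target] += dt

    def referent(self) -> Optional[str]:
        return max(self.votes, key=self.votes.get) if self.votes else None


if __name__ == "__main__":
    tracker = ReferentTracker()
    for target in ["ball", "ball", "cup", "ball", None, "ball"]:
        tracker.observe_gaze(target, dt=0.1)
    print("referent:", tracker.referent())  # -> ball
```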

Using these models, we have found that active monitoring of shared visual attention between the human teacher and the robot learner is important in order to achieve robustness in the learning interaction. In a series of human participant studies where human teachers guide a robot to perform a simple task (learning to operate a control panel with a lever, toggle and button), we have found that humans readily coordinate their teaching behaviour with the robot's gaze behaviour—waiting until the robot re-establishes eye contact before offering their next guidance cue, adaptively re-orienting their guidance cue to be in alignment with the robot's current visual focus, actively trying to re-direct the robot's gaze through deictic cues or offering more guidance if the robot's gaze behaviour conveys uncertainty in what to do next (e.g. looking back and forth among several possible alternatives) (Breazeal & Thomaz 2008a; Thomaz & Breazeal 2008). These findings suggest that people read the robot's gaze as an indicator of its internal state of attention as well as solicitations for help, and intuitively coordinate their teaching acts to support the robot's learning process.

(ii) Example: perspective taking

In another series of human and HRI studies, we identified, verified and evaluated mental perspective taking as an important socio-cognitive skill that helps either human or robot learners to focus attention on the subset of the problem space that is important to the teacher by actively considering the teacher's experience such as visual perspective, attentional focus or resource considerations (Berlin et al. 2006). This constrained attention enables the robot learner to overcome the ambiguity and incompleteness that is often present in human demonstrations.

To endow Leonardo with perspective taking abilities, our cognitive–affective architecture incorporates simulation-theoretic mechanisms as a foundational and organizational principle. Simulation theory holds that certain parts of the brain have dual use; they are used not only to generate our own behaviour and mental states, but also to predict and infer the same in others. To try to recognize or infer another person's mental process, the robot uses its own cognitive processes and body structure to simulate the mental states of the other person—in effect, taking the mental perspective of another.

In figure 3, the two concentric bands denote two different modes of operation. In the generation mode (the light band) the robot constructs its own mental states to behave intelligently in the world. In the simulation mode (the dark band) the robot constructs and represents the mental states of its human collaborator based on observing his or her behaviour and taking their mental perspective. By doing so, the mental states of the human and the robot are represented in the same terms so that they can be readily compared and related to one another. For instance, within the perception system, the robot performs a transformation to estimate what the human partner can see from his or her vantage point. Within the motor system, mirror-neuron inspired mechanisms are used to map and represent perceived body positions of the human into the robot's own joint space to conduct action recognition. Within the belief system, belief construction is used in conjunction with adopting the visual perspective of the human partner in order to estimate the beliefs the human is likely to hold given what he or she can visually observe. Finally, within the intention system where goal-directed behaviours are generated, schemas relate preconditions and actions with desired outcomes and are organized to represent hierarchical tasks. Within this system, motor information is used along with perceptual and other contextual clues (i.e. task knowledge) to infer the human's goals and how he or she might be trying to achieve them (i.e. plan recognition).

Figure 3.

The self-as-simulator architecture.

In a learning situation, the robot can take the perspective of the teacher in order to model the task from their perspective. In effect, the robot runs a parallel copy of its task-learning engine that operates on its simulated representation of the human's beliefs. In essence, this focuses the hypothesis generation mechanism on the subset of the input space that matters to the human teacher. This enables the robot to learn what the teacher intends to teach even if the demonstrations are ambiguous.
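
One way to picture this parallel learning engine is as a hypothesis learner that is fed only the objects visible from the teacher's vantage point. The sketch below illustrates the idea on the occluded-block task described in the study below, using an assumed, highly simplified representation rather than the robot's actual task-learning engine.

```python
from dataclasses import dataclass


@dataclass
class Block:
    colour: str
    filled: bool                # whether its hole received a peg
    visible_to_teacher: bool    # occluded blocks are False


def infer_rule(blocks) -> str:
    """Induce the simplest rule consistent with which blocks were filled."""
    unfilled = {b.colour for b in blocks if not b.filled}
    if not unfilled:
        return "fill all of the holes"
    return "fill all but " + ", ".join(sorted(unfilled))


def learn(blocks, perspective_taking: bool) -> str:
    """With perspective taking, learn only from what the teacher could see."""
    data = [b for b in blocks if b.visible_to_teacher] if perspective_taking else blocks
    return infer_rule(data)


if __name__ == "__main__":
    demonstration = [
        Block("red", filled=True, visible_to_teacher=True),
        Block("green", filled=True, visible_to_teacher=True),
        Block("blue", filled=False, visible_to_teacher=False),  # behind the barrier
    ]
    print("NPT rule:", learn(demonstration, perspective_taking=False))  # fill all but blue
    print("PT rule: ", learn(demonstration, perspective_taking=True))   # fill all of the holes
```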

To investigate this, we conducted a human participant study where the participants were asked to engage in four different learning tasks involving foam building blocks. We gathered data from 41 participants, divided into two groups: 20 participants observed demonstrations provided by a human teacher sitting opposite them (the social condition), while 21 participants were shown static images of the same demonstrations, with the teacher absent from the scene (the non-social condition). Participants were asked to show their understanding of the presented skill either by re-performing the skill on a novel set of blocks (in the social context) or by selecting the best matching image from a set of possible images (in the non-social context). Figure 4 (left) illustrates sample demonstrations of each of the four tasks. The tasks were designed to be highly ambiguous, providing the opportunity to investigate how different types of perspective taking might be used to resolve these ambiguities. The subjects' demonstrated rules can be divided into three categories: perspective taking (PT) rules, non-perspective taking (NPT) rules and rules that did not clearly support either hypothesis (other). For instance, task 1 focused on visual perspective taking during the demonstration. Participants were shown two demonstrations with blocks in different configurations. In both demonstrations, the teacher attempted to fill all of the holes in the square blocks with the available pegs. Critically, in both demonstrations, a blue block lay within clear view of the participant but was occluded from the view of the teacher by a barrier. The hole of this blue block was never filled by the teacher. Thus, an appropriate (NPT) rule might be ‘fill all but blue’ or ‘fill all but this one,’ but if the teacher's perspective is taken into account, a more parsimonious (PT) rule might be ‘fill all of the holes’ (see figure 4).

Figure 4.

The four tasks presented to human participants. In task 1, subjects were asked to infer the rule for which blocks received a yellow peg. A visual occlusion presents a different viewpoint of the demonstration between teacher and learner (the teacher cannot ...

The tasks from our human study were used to create a benchmark suite for our architecture. In our simulation environment, the robot was presented with the same task demonstrations as were provided to the study participants. The learning performance of the robot was analysed in two conditions: with the perspective taking mechanisms intact and with them disabled. Table 1 (left) shows the hypotheses entertained by the robot in the various task conditions at the conclusion of the demonstrations. The hypotheses favoured by the learning mechanism are highlighted in italic. For comparison, table 1 (right) displays the rules selected by study participants, with the most popular rules for each task highlighted in italic. For every task and condition, the rule learned by the robot matches the most popular rule selected by humans.

Table 1.

(Left) Rules learned by the robot with perspective taking enabled (PT) or disabled. (Right) The corresponding rules learned by people for the same tasks. The difference in rule choice by subjects between social and non-social condition is highly significant ...

These results support our hypothesis that the robot's perspective taking mechanisms focus its attention on a region of the input space similar to that attended to by study participants in the presence of a human teacher. It should also be noted, as evident in table 1, that participants generally seemed to entertain a more varied set of hypotheses than the robot. In particular, participants often demonstrated rules based on spatial or numeric relationships between the objects—relationships that are currently not yet represented by the robot. Thus, the differences in behaviour between humans and the robot can largely be understood as a difference in the scope of the relationships considered between the objects in the example space rather than as a difference in this underlying space. The robot's perspective taking mechanisms seem to be successful at bringing the robot's focus of attention into alignment with the humans' focus of attention in the presence of a social teacher.

(iii) Example: spatial scaffolding

In other human participant and HRI experiments, we have identified, verified and evaluated a set of simple, prevalent and highly reliable spatial scaffolding cues by which human teachers interactively structure and organize the physical workspace to help direct the attention of the learner (e.g. moving objects nearer or farther from the learner's body to signify their relevance) (Breazeal & Berlin 2008).

For example, we designed a set of tasks to examine how teachers emphasize and de-emphasize objects in a learning environment with their bodies, and how this emphasis and de-emphasis guides the exploration of a learner and ultimately the learning that occurs. In our human study, we gathered data from 72 individual participants, combined into 36 pairs. For each pair, one participant was randomly assigned to play the role of teacher and the other participant was assigned the role of learner for the duration of the study. For all the tasks, participants were asked not to talk, but were told that they could communicate in any way other than speech. The teacher and learner stood on opposite sides of a tall table, with 24 colourful foam building blocks (four different colours and six different shapes) arranged between them on the tabletop. The study tasks were interactive ‘secret constraint’ tasks where one person (the learner) knows the task goal (construct a tangram-like figure out of the blocks) but does not know that there is a secret constraint to accomplish the task successfully. The other person (the teacher) does not know the task goal (the figure) but knows the constraint (e.g. ‘the figure must be constructed using only blue and red blocks, and no other blocks.’). Hence, both people must work together to complete the task successfully.

To record high-resolution data of the study interactions, we developed a data-gathering system that incorporated multiple, synchronized streams of information about the study participants and their environment. For all the tasks, we tracked the positions and orientations of the heads and hands of both participants, recorded video of both participants and tracked all the objects with which the participants interacted such as the positions and orientations of all the foam blocks. To identify the emphasis and de-emphasis cues provided by the teachers in these tasks, an important piece of ‘ground-truth’ information was exploited: for these tasks, some of the blocks were ‘good’ and others were ‘bad.’ In order to complete the task successfully, the teacher needed to encourage the learner to use some of the blocks in the construction of the figure and to steer clear of some of the other blocks.

We observed a wide range of embodied cues provided by the teachers in the interactions for these two tasks as well as a range of different teaching styles. Positive emphasis cues included simple hand gestures such as tapping, touching and pointing at blocks with the index finger. These cues were often accompanied by gaze targeting, or looking back and forth between the learner and the target blocks. Other positive gestures included head nodding, the ‘thumbs up’ gesture and even shrugging. Teachers nodded in accompaniment to their own pointing gestures, and also in response to actions taken by the learners. Negative cues included covering up blocks, holding blocks in place or maintaining prolonged contact despite the proximity of the learner's hands. Teachers would occasionally interrupt reaching motions directly by blocking the trajectory of the motion or even by touching or (rarely) lightly slapping the learner's hand. Other negative gestures included head shaking, finger or hand wagging, or the ‘thumbs down’ gesture.

However, by far the most important set of cues used related to block movement and the use of space. To emphasize blocks positively, teachers would move them towards the learner's body or hands, towards the centre of the table, or align them along the edge of the table closest to the learner. Conversely, to emphasize blocks negatively, teachers would move them away from the learner, away from the centre of the table, or line them up along the edge of the table closest to themselves. Teachers often devoted significant attention to clustering the blocks on the table, spatially grouping the bad blocks with other bad blocks and the good blocks with other good blocks. These spatial scaffolding cues were the most prevalent cues in the observed interactions (Breazeal & Berlin 2008).

To verify the prevalence and usefulness of these spatial scaffolding cues for a robot, we placed our robot Leonardo in the role of the learner (Berlin et al. 2008). The robot's attention system was designed to pay attention to block movement towards and away from its body. In order to give the robot the ability to learn from these embodied cues, we developed a simple, Bayesian learning algorithm. The algorithm was designed to learn rules pertaining to the colour and shape of the foam blocks and maintained a set of classification functions that tracked the relative odds that the various block attributes were ‘good’ or ‘bad’ according to the teacher's secret constraints. Each time the robot observed a salient teaching cue, these classification functions were updated using the learned posterior probabilities, i.e. the odds of the target block being ‘good’ or ‘bad’ given the observed cue. For example, if the teacher moved a green triangle away from the robot, the relative odds of green and triangular being good block attributes would decrease. Similarly, if the teacher then moved a red triangle towards the robot, the odds of red and triangular being good would increase.
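
A rough sketch of this kind of odds update is shown below. The specific likelihood ratios attached to the ‘moved towards’ and ‘moved away’ cues are illustrative assumptions; the cited papers describe the model actually used.

```python
from collections import defaultdict

# Assumed likelihood ratios: how strongly each observed cue favours 'good' over 'bad'.
CUE_LIKELIHOOD_RATIO = {
    "moved_towards_learner": 4.0,     # evidence the block's attributes are good
    "moved_away_from_learner": 0.25,  # evidence the block's attributes are bad
}


class ScaffoldingLearner:
    """Track the relative odds that each block attribute (colour, shape) is
    'good' under the teacher's secret constraint."""

    def __init__(self):
        self.odds = defaultdict(lambda: 1.0)  # odds(good : bad) per attribute

    def observe(self, cue: str, attributes) -> None:
        ratio = CUE_LIKELIHOOD_RATIO[cue]
        for attr in attributes:
            self.odds[attr] *= ratio

    def is_good(self, attributes) -> bool:
        # A block is judged usable if the combined odds of its attributes favour 'good'.
        combined = 1.0
        for attr in attributes:
            combined *= self.odds[attr]
        return combined >= 1.0


if __name__ == "__main__":
    learner = ScaffoldingLearner()
    learner.observe("moved_away_from_learner", ["green", "triangle"])
    learner.observe("moved_towards_learner", ["red", "triangle"])
    print(learner.is_good(["red", "square"]))    # True: red is now favoured
    print(learner.is_good(["green", "circle"]))  # False: green is disfavoured
```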

These simple spatial scaffolding cues proved to be highly effective. We invited 18 participants to teach Leonardo the same ‘secret constraint’ tasks as our human learners. The robot successfully learned the task in 33 of the 36 interactions (92%). These results support the conclusion that the spatial scaffolding cues observed in human–human teaching interactions do indeed transfer to HRIs, and can be effectively taken advantage of by robot learners (Berlin et al. 2008).

(iv) Summary: social filters

Whereas traditional approaches to teaching robots do not model social–cognitive skills and abilities as integral to the learning process, this body of work has identified and verified a number of ways in which internal and external social factors shape how a robot learner filters the incoming perceptual stream to attend to what matters. Human teachers bring many of these same social cues and skills to bear when teaching either humans or robots, and these ‘social filters’ can be used effectively by a robot to identify the most relevant items to consider, thereby making the learning problem significantly more manageable.

(c) Challenge 3

Once the robot has identified salient aspects of the scene, how does it determine what actions it should take? If the robot had a way of focusing on potentially successful actions, its exploration would be more effective. This can be addressed in a number of ways, such as by experimenting on its own as in reinforcement learning (RL). However, for large state–action spaces this typically requires a prohibitively large number of trials.

(i) Example: tutelage-style interaction

To address this issue, we have explored how social skills such as turn-taking enable a human teacher to play an important and flexible role in guiding the robot's exploration. This focuses the robot's selection of the most promising actions in specific contexts to discover solutions more quickly. By participating in a ‘dialogue’ of demonstration followed by feedback and refinement, the human helps the robot to determine what action to try through a communicative and iterative process. We evaluated this approach by comparing it to learning the same task using traditional RL and achieved significant improvements in efficiency without loss of accuracy and with decreased sensitivity to noise (e.g. errors introduced by miscommunication are quickly repaired, which leads to greater robustness) (Breazeal et al. 2004).

(ii) Example: socially guided exploration

Unfortunately, a common limitation of human-teachable robots is that the robot only learns when being explicitly taught. Personal robots, however, will need to learn while ‘on the job’ even when a person is not present or willing to teach them. To address this, we have developed and evaluated a learning system whereby learning opportunities for the robot's hierarchical RL mechanism arise from a combination of intrinsically motivated self-exploration and social scaffolding provided by a human teacher, such as suggesting actions for the robot to try, drawing the robot's attention to relevant contexts and highlighting interesting outcomes (Breazeal & Thomaz 2008b). We have systematically identified and verified our set of social scaffolding mechanisms through a series of HRI studies where a human teacher guides Leonardo's exploration as it discovers a set of behaviours (e.g. opening or closing, playing music, changing the colours of lights) of a ‘smart box’ through pressing buttons, pushing levers and sliding toggles. Over time, Leonardo learns a set of task policies for bringing about each of these behaviours from different starting conditions to ‘master’ the ‘smart box’. We analysed the learning performance of the robot both with and without human teachers and found that learning via self-exploration is slower but more serendipitous, resulting in a broader task suite, whereas learning with a human teacher is more efficient and robust but tends to result in a smaller, more specialized task suite that reflects what the person wanted the robot to learn (Breazeal & Thomaz 2008a,b).
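
The interplay between intrinsic motivation and social scaffolding can be caricatured as an action-selection policy that normally favours intrinsically novel actions but defers to the teacher's suggestion when one is offered. The novelty measure below is an assumption for illustration and is far simpler than the robot's hierarchical RL mechanism.

```python
from collections import Counter
from typing import Optional


class SociallyGuidedExplorer:
    """Select actions by intrinsic novelty, but defer to a teacher's
    suggestion whenever the teacher highlights an action to try."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = Counter()

    def novelty(self, action: str) -> float:
        # Less-practised actions are more intrinsically interesting.
        return 1.0 / (1.0 + self.counts[action])

    def choose(self, teacher_suggestion: Optional[str] = None) -> str:
        if teacher_suggestion in self.actions:
            action = teacher_suggestion                   # social scaffolding
        else:
            action = max(self.actions, key=self.novelty)  # intrinsically motivated exploration
        self.counts[action] += 1
        return action


if __name__ == "__main__":
    explorer = SociallyGuidedExplorer(["press button", "push lever", "slide toggle"])
    print(explorer.choose())              # explores on its own
    print(explorer.choose("push lever"))  # follows the teacher's suggestion
    print(explorer.choose())              # resumes self-exploration
```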

(iii) Summary: intrinsically motivated but guidable learning

Personal robots will need to adapt their learning style to suit the dynamics of a changing learning environment. Sometimes the robot will have to explore on its own, while at other times a teacher might be present to help guide the robot's exploration. Through our studies, we have found that each style of learning has its respective advantages and produces learning products that are synergistic. For instance, what is learned more slowly but serendipitously through intrinsically motivated exploration yields a broader task suite that can come in handy at a later date—especially when the robot encounters a human teacher who helps the robot to rapidly hone and build on its growing skill set through socially guided exploration. Importantly, the mechanisms by which the robot's learning can be guided by the human should be informed by how people are naturally inclined to teach robots.

(d) Challenge 4

Once the robot attempts to perform an action, how can it determine whether it has been successful? How does it assign credit for that success? Further, if the robot has been unsuccessful, how does it determine which parts of its performance were inadequate? It is important that the robot be able to diagnose its errors in order to improve performance.

(i) Example: multi-modal feedback

To address this challenge, our approach recognizes that the teacher can readily help the robot do this given that he or she has a good understanding of the task and knows how to evaluate the robot's success and progress. One way in which a human facilitates a learner's evaluation process is by providing feedback through various communication channels. For instance, we demonstrated the capability of a robot to interpret and appropriately respond to the affective intent in human speech, such as praising or scolding tones of voice (Breazeal & Aryananda 2002). In HRI studies we showed that people refer to the robot's expressive cues to confirm that the robot understood them as well as the strength of the affective intent. We have applied verbal feedback in teaching scenarios to help the robot correct its task model as soon as mistakes are made. Furthermore, the robot provides the human with communicative feedback so that misunderstandings can be detected quickly. Both forms of feedback help to prevent errors from persisting for multiple steps, which could make them more awkward to correct later on. In recent HRI studies, our data suggest that these various forms of feedback contribute to a more fluid, efficient, accurate and robust teaching/learning interaction (Breazeal et al. 2004; Breazeal & Thomaz 2008a,b).

(ii) Example: guidance and understanding intent

Note that for any given feedback channel, it is important to understand what people are trying to communicate through it and how they are trying to make use of it. Our HRI studies with an interactive RL agent revealed that people use the reward signal not only to provide feedback on past actions (as is commonly assumed in the design of RL algorithms) but also to guide future action (Thomaz & Breazeal 2008). Further, we discovered a strong bias of positive over negative feedback over the entire duration of the training, even in the beginning when the agent was doing many things wrong (Thomaz & Breazeal 2008). This suggests that people were using the feedback channel to motivate and encourage the robot. In short, people were naturally inclined to use the reward signal in many ways that the traditional RL framework was not designed to handle. Given our findings, we were then able to adapt the RL agent's algorithm and teaching interface to accommodate how and what people were trying to communicate to the learner. As a result, our modified RL agent learned much more efficiently and robustly in a subsequent series of HRI experiments (Thomaz & Breazeal 2008).
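
As a concrete illustration of separating feedback about the past from guidance about the future, the sketch below shows a tabular Q-learning agent in which a human reward updates the value of the action just taken, while a guidance signal simply biases which action is tried next. This is an assumed, simplified rendering of the idea, not the agent or interface used in the cited studies.

```python
import random
from collections import defaultdict


class GuidableQAgent:
    """Tabular Q-learner that treats human reward as feedback on the last
    action and human guidance as a bias on the choice of the next action."""

    def __init__(self, actions, alpha=0.3, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def feedback(self, state, action, reward, next_state) -> None:
        """Human feedback on the action just taken: a standard Q update."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def act(self, state, guidance=None) -> str:
        """Human guidance about the future: steer exploration rather than update values."""
        if guidance in self.actions:
            return guidance
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])


if __name__ == "__main__":
    agent = GuidableQAgent(actions=["press", "pull", "wait"])
    chosen = agent.act("start", guidance="press")                  # teacher points at the button
    agent.feedback("start", chosen, reward=1.0, next_state="on")  # teacher praises the result
    print(chosen, round(agent.q[("start", "press")], 2))
```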

(iii) Summary: transparency

While traditional approaches to robot training do not consider how a robot can proactively communicate and reveal its learning process to the human teacher, the findings generated by this body of work argue for the importance of transparency in designing interactive robot learners. People are willing and able to help robots address the difficult task of assigning value to their past actions. People are also willing to help guide the robot to select good future actions, to motivate the robot and more. However, human teachers cannot do this well if they lack a good mental model of the robot's learning process or if they are not provided with the right set of communication channels. The robot's behaviour, both its expressive cues and instrumental actions, can play a significant role in shaping the mental model that the human has for the robot. These readily observable expressive and performance-based cues make the robot's learning process transparent to the teacher. Much of our work to date has emphasized the role of the robot's non-verbal cues, such as facial expressions, gestures and use of gaze, in supporting this process. And conversely, our HRI studies have helped us to identify what kinds of intents people want to communicate to the robot through both verbal and non-verbal channels—to help the robot learn by influencing its evaluation process and more.

4. Conclusion

While it might be tempting to compare our outcomes with those of statistical machine learning techniques, my research vision and the challenges I wish to solve are ultimately different. My students and I have built and evaluated autonomous robotic systems that are able to leverage the interplay of social guidance with statistical inference algorithms to learn new tasks and concepts from humans through natural social interactions. For task learning, our robots are able to quickly infer the critical preconditions and desired outcome for each step of the learned task, as well as how these steps relate to one another in the overall task structure, with improved efficiency and robustness to noise without loss of accuracy over traditional statistical machine learning methods (e.g. traditional RL). For concept learning, our robots are able to learn the correct concept from natural interactions by exploiting natural scaffolding cues such as how the teacher uses space to highlight the concept to be learned, or by applying socio-cognitive skills to consider the teacher's perspective in order to learn the appropriate concept in the face of ambiguous demonstrations. The underlying machine learning algorithm can be simple because the robot appropriately leverages the social structure inherent in the teacher's behaviour or the modified workspace to attend to what matters and learn the right thing. Furthermore, the same social cues can be repurposed to support other social capabilities such as multi-modal communication and our research on human–robot teamwork.

To conclude, the field of social robotics is very young but growing rapidly—motivated by the vision of personal robots that help anyone in their daily activities. My dream is to enable machines to engage in the powerful, social forms of interaction, collaboration, understanding and learning that people readily participate in. This vision is motivated by the observation that humans are ready-made experts in social interaction; the challenge is to design robots to participate in what comes naturally to people. By doing so, socially interactive robots could help a wide demographic of people in a broad range of applications and real-world challenges spanning health, therapy, education, communication, security, entertainment and physical assistance. In this article, I have tried to illustrate the myriad of ways in which designing social robots that successfully interact with and learn from ordinary people presents new challenges and opportunities, and have highlighted some of the key lessons and findings learned along the way. We live in an exciting time where so much is possible at the intersection of science and technology. Social robots promise to be not only helpful to us in the future but also a lot of fun. And in the process of building them, we may learn even more about ourselves.

Footnotes

One contribution of 17 to a Discussion Meeting Issue ‘Computation of emotions in man and machines’.

References

  • Abadjieva E., Murray I. R., Arnott J. L. 1993. Applying analysis of human emotional speech to enhance synthetic speech. Proc. Eurospeech ’93, Berlin, Germany, pp. 909–912
  • Adams J. A., Humphrey C. M., Goodrich M. A., Cooper J. L., Morse B. S., Engh C., Rasmussen N. 2009. Cognitive task analysis for developing UAV wilderness search support. J. Cogn. Eng. Dec. Mak. 3, 1–26 (doi:10.1518/155534309X431926)
  • Argall B., Chernova S., Veloso M. 2009. A survey of robot learning from demonstration. Robot. Autonomous Syst 57, 469–483 (doi:10.1016/j.robot.2008.10.024)
  • Atkeson C. G., Schaal S. 1997b. Robot learning from demonstration. In Int. Conf. on Machine Learning, pp. 11–73
  • Bainbridge W., Hart J., Kim E., Scassellati B. 2008. The effect of presence on human–robot interaction. IEEE Int. Symp. on Robot and Human Interactive Communication, Munich, Germany
  • Berlin M., Gray J., Thomaz A. L., Breazeal C. 2006. Perspective taking: an organizing principle for learning in human–robot interaction. Proc. 21st Natl. Conf. on Artificial Intelligence (AAAI-06), Boston, MA
  • Berlin M., Breazeal C., Chao C. 2008. Spatial scaffolding cues for interactive robot learning. Proc. 2008 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2008), Nice, France
  • Berns K., Hirth J. 2006. Control of facial expressions of the humanoid robot head ROMAN. Proc. 2006 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Beijing, China, pp. 3119–3124
  • Billard A., Dautenhahn K., Hayes G. 1998. Proceedings of the Socially Situated Intelligence Workshop, Zurich, Switzerland. As part of the Int. Conf. on Simulation of Adaptive Behavior
  • Bluethmann W., et al. 2004. Building an Autonomous Humanoid Tool User. Proc. IEEE/RAS Fourth Int. Conf. on Humanoid Robots (Humanoids 2004), Santa Monica, CA, pp. 402–421
  • Breazeal C. 2002. Designing sociable robots Cambridge, MA, USA: MIT Press
  • Breazeal C. 2003a. Emotion and sociable humanoid robots. Int. J. Hum. Comput. Interact. 59, 119–155
  • Breazeal C. 2003b. Regulation and entrainment for human–robot interaction. Int. J. Exp. Robot. 21, 883–902
  • Breazeal C. 2004a. Function meets style: insights from emotion theory applied to HRI. IEEE Trans. Syst. Man Cybernet. Part C 34, 187–194 (doi:10.1109/TSMCC.2004.826270)
  • Breazeal C. 2004b. Social interactions in HRI: the robot view. IEEE Trans. Syst. Man Cybernet. Part C 34, 181–186 (doi:10.1109/TSMCC.2004.826268)
  • Breazeal C., Aryananda L. 2002. Recognizing affective intent in robot directed speech. Autonomous Robots 12, 85–104 (doi:10.1023/A:1013215010749)
  • Breazeal C., Berlin M. 2008. Spatial scaffolding for sociable robot learning. Proc. 23rd Conf. on Artificial Intelligence (AAAI-08), Chicago, IL
  • Breazeal C., Scassellati B. 1999. A context-dependent attention system for a social robot. Proc. 16th Int. Joint Conf. on Artifical Intelligence (IJCAI 99), Stockholm, Sweden, pp. 1146–1151
  • Breazeal C., Scassellati B. 2000. Infant-like social interactions between a robot and a human caregiver. Adapt. Behav. 8, 47–72
  • Breazeal C., Thomaz A. 2008a. Experiments in socially guided exploration: lessons learned in building robots that learn with and without human teachers. Connection Sci. 20, 91–100
  • Breazeal C., Thomaz A. L. 2008b. Learning from human teachers with socially guided exploration. Proc. 2008 IEEE Int. Conf. on Robotics and Automation (ICRA-08), Pasadena, CA
  • Breazeal C., Edsinger A., Fitzpatrick P., Scassellati B., Varchavskaia P. 2000. Social constraints on animate vision. IEEE Intell. Syst. Special Issue on Humanoid Robotics 15, 32–37
  • Breazeal C., Hoffman G., Lockerd A. 2004. Teaching and working with robots as a collaboration. Proc. Third Int. Joint Conf. on Autonomous Agents and Multi Agent Systems (AAMAS), pp. 1030–1037
  • Breazeal C., Buchsbaum D., Gray J., Gatenby D., Blumberg B. 2005a. Learning from and about others: towards using imitation to bootstrap the social understanding of others by robots. Artif. Life 11, 1–32 [PubMed]
  • Breazeal C., Kidd C., Thomaz A. L., Hoffman G., Berlin M. 2005b. Effects of nonverbal communication on efficiency and robustness in human–robot teamwork. Proc. Int. Conf. on Intelligent Robotics and Systems
  • Brooks A. G., Arkin R. C. 2007. Behavioral overlays for non-verbal communication expression on a humanoid robot. Autonomous Robots 22, 55–74 (doi:10.1007/s10514-006-9005-8)
  • Cahn J. E. 1990. The generation of affect in synthesized speech. J. Am. Voice Input/Output Soc. 8, 1–19
  • Calinon S., Billard A. 2007. What is the teacher's role in robot programming by demonstration?—toward benchmarks for improved learning. Interaction Stud. Special Issue on psychological benchmarks in human–robot interaction 8, 441–464
  • Cañamero L. Animating affective robots for social interaction. In Animating expressive characters for social interaction (eds Cañamero L., Aylett R.), pp. 103–122. Advances in Consciousness Research Series. John Benjamins Publishing Co.
  • Cañamero L., Blanchard A., Nadel J. 2006. Attachment bonds for human-like robots. Int. J. Humanoid Robot 3, 301–320
  • Cassell J., Sullivan J., Prevost S., Churchill E. (eds) 2000. Embodied conversational agents Boston, MA, USA: MIT Press
  • Chernova S., Veloso M. 2008. Teaching collaborative multi-robot tasks through demonstration. IEEE-RAS Int. Conf. on Humanoid Robots, Daejeon, Korea, December 2008
  • Dautenhahn K. 1995. Getting to know each other—artificial social intelligence for autonomous robots. Robot. Autonomous Syst. 16, 333–356 (doi:10.1016/0921-8890(95)00054-2)
  • Dautenhahn K. 1997. I could be you: the phenomenological dimension of social understanding. Cybernet. Syst. 28, 417–453 (doi:10.1080/019697297126074)
  • DiSalvo C., Gemperle F., Forlizzi J., Kiesler S. 2002. All robots are not created equal: the design and perception of humanoid robot heads. Designing interactive systems. Proc. Fourth Conf. on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, pp. 321–326
  • Duffy B. 2003. Anthropomorphism and the social robot. Robot. Autonomous Syst. 42, 177–190
  • Duffy B. R. 2008. Fundamental issues in affective intelligent social machines. Open Artif. Intell. J. 2, 21–34
  • Fellous J.-M., Arbib M. A. (eds) 2005. Who needs emotions?: The brain meets the robot. Oxford, UK: Oxford University Press
  • Fong T., Nourbakhsh I., Dautenhahn K. 2003. A survey of socially interactive robots. Robot. Autonomous Syst. 42, 143–166 (doi:10.1016/S0921-8890(02)00372-X)
  • Fong T., Nourbakhsh I., Ambrose R., Simmons R., Schultz A., Scholtz J. 2005. The peer-to-peer human–robot interaction project. Proc. AIAA Space 2005, September, 2005.
  • Fujita M. 2004. On activating human communications with pet-type robot AIBO. Proc. of IEEE 92, pp. 1804–1813
  • Fujie S., Ejiri Y., Nakajima K., Matsusaka Y., Kobayashi T. 2004. A conversation robot using head gesture recognition as paralinguistic information. Proc. IEEE RO-MAN 2004, September 2004, pp 158–164
  • Fujie S., Fukushima K., Kobayashi T. 2005. Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system. Proc. Interspeech 2005, September 2005, pp 889–892
  • Gockley R., Forlizzi J., Simmons R. 2007. Natural person-following behavior for social robots. Proc. of Human-Robot Interaction, March, 2007, pp 17–24
  • Hayashi K., Onishi Y., Itoh K., Miwa H., Takanishi A. 2006. Development and evaluation of face robot to express various face shape. Proc. of IEEE Int. Conf. on Robotics and Automation, pp. 481–486
  • Hersch M., Sauser E., Billard A. 2008. Online learning of the body schema. Int. J. Humanoid Robot. 5, 161–181 (doi:10.1142/S0219843608001376)
  • Holz T., Dragone M., O'Hare G. M. P. 2009. Where robots and virtual agents meet: a survey of social interaction across Milgram's reality–virtuality continuum. Int. J. Soc. Robot. 1, 83–93
  • Hovland G. E., Sikka P., McCarragher B. J. 1996. Skill acquisition from human demonstration using a hidden Markov model. IEEE Int. Conf. on Robotics and Automation, Minneapolis, MN, pp. 2706–2711. IEEE Press
  • Iida F., Tabata M., Hara F. 1998. Generating personality character in a Face Robot through interaction with human. Proc. of Seventh IEEE Int. Workshop on Robot and Human Communication, pp. 481–486
  • Imai M., Ono T., Ishiguro H. 2001. Physical relation and expression: joint attention for human–robot interaction. Proc. RO-MAN 2001, pp. 512–517
  • Johnson M., Demiris Y. 2005. Perceptual perspective taking and action recognition. Int. J. Adv. Robot. Syst. 2, 301–308
  • Kidd C., Breazeal C. 2004. Effect of a robot on user perceptions. Proc. 2004 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Sendai, Japan, vol. 4. pp 3559–3564
  • Kidd C., Breazeal C. 2008. Robots at home: understanding long-term human–robot interaction. Proc. 2008 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2008), Nice, France
  • Kikuchi H., Yokoyama M., Hoashi K., Hidaki Y., Kobayashi T., Shirai K. 1998. Controlling gaze of humanoid in communication with human. Proc. IROS, October 1998, pp 255–260
  • Kozima H. 2006. An anthropologist in the children's world: a field study of children's everyday interaction with an interactive robot. Proc. Int. Conf. on Development and Learning, ICDL-2006, Bloomington, IN, USA
  • Krach S., Hegel F., Wrede B., Sagerer G., Binkofski F., Kircher T. 2008. Can machines think? Interaction and perspective taking with robots investigated via fMRI. PLoS ONE 3
  • Kuniyoshi Y., Inaba M., Inoue H. 1994. Learning by watching: extracting reusable task knowledge from visual observation of human performance. IEEE Trans. Robot. Automat. 10, 799–822 (doi:10.1109/70.338535)
  • Lim H., Ishii A., Takanishi A. 2004. Emotion-based biped walking. Int. J. Inform., Educ. Res. Robot. Artif. Intell. 22, 577–586
  • Mataric M. J., Zordan V. B., Mason Z. 1998. Movement control methods for complex, dynamically simulated agents: Adonis dances the Macarena. Proc. Agents 1998, pp. 317–324
  • Matsusaka Y., Tojo T., Kobayashi T. 2003. Conversation robot participating in group conversation. Trans. IEICE E86-D, 26–36
  • Miwa H., Itoh K., Matsumoto M., Zecca M., Takanobu H., Roccella S., Carrozza M. C., Dario P., Takanishi A. 2004a. Effective emotional expressions with emotion expression humanoid robot WE-4RII. Proc. 2004 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 2203–2208
  • Miwa H., Itoh K., Takanobu H., Takanishi A. 2004b. Design and control of 9-DOFs emotion expression humanoid arm. IEEE Int. Conf. on Robotics and Automation, New Orleans, pp. 128–133
  • Murphy R. R., Tadokoro S., Nardi D., Jacoff A., Fiorini P., Choset H., Erkmen A. M. 2008. Search and rescue robotics. In Handbook of robotics (eds Zelinkski A., Siciliano B.), pp. 1151–1173
  • Nicolescu M. N., Mataric M. J. 2003. Natural methods for robot task learning: instructive demonstrations, generalization and practice. Proc. AAMAS 2003, pp. 241–248
  • Ogura Y., Aikawa H., Shimomura K., Kondo H., Morishima A., Lim H., Takanishi A. 2006. Development of a new humanoid robot WABIAN-2. Proc. IEEE Int. Conf. on Robotics and Automation, pp. 76–81
  • Pardowitz M., Zoellner R., Knoop S., Dillmann R. 2007. Incremental learning of tasks from user demonstrations, past experiences and vocal comments. IEEE Trans. Syst., Man Cybernet., Part B 37, 322–332
  • Picard R. 2000. Affective computing. Cambridge, MA: MIT Press
  • Powers A., Kiesler S., Torrey C., Fussell S. 2007. Comparing a computer agent with a humanoid robot. Proc. Second ACM/IEEE Int. Conf. on Human–Robot Interaction (HRI 2007)
  • Roccella S.A., et al. 2004. Design, fabrication and preliminary results of a novel anthropomorphic hand for humanoid robotics: RCH-1. Proc. of 2004 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Sendai, pp. 266–271
  • Roy D., Pentland A. 1998. Learning audio-visually grounded words from natural input. AAAI Workshop on the Grounding of Word Meaning: Data and Models, Madison, WI
  • Sakita K., Ogawara K., Murakami S., Kawamura K., Ikeuchi K. 2004. Flexible cooperation between human and robot by interpreting human intention from gaze information. Proc. Int. Conf. on Intelligent Robot and Systems, pp. 846–851
  • Saunders J., Nehaniv C., Dautenhahn K. 2006. Teaching robots by moulding behavior and scaffolding the environment. Proc. ACM SIGCHI/SIGART Conf. on Human–Robot Interaction (HRI), pp. 118–125
  • Scassellati B. 1998. Imitation and mechanisms of joint attention: a developmental structure for building social skills on a humanoid robot. In Computation for metaphors, analogy and agents (ed. Nehaniv C.), Springer Lecture Notes in Artificial Intelligence, vol. 1562. Berlin, Germany: Springer-Verlag
  • Scassellati B. 2001. Theory of mind for a humanoid robot. Autonomous Robots 12, 13–24
  • Schaal S. 1999. Is imitation learning the route to humanoid robots? Trends Cogn. Sci. 3, 233–242 (doi:10.1016/S1364-6613(99)01327-3)
  • Sidner C. L., Lee C., Kidd C. D., Lesh N., Rich C. 2005. Explorations in engagement for humans and robots. Artif. Intell. 166, 140–164 (doi:10.1016/j.artint.2005.03.005)
  • Siegel M. 2008. Persuasive robotics: towards understanding the influence of a mobile humanoid robot over human belief and behavior. SM thesis, Media Arts and Sciences, MIT, Cambridge, MA, September 2008
  • Stiehl W., Lieberman J., Breazeal C., Basel L., Lalla L., Wolf M. 2005. Design of a therapeutic robotic companion for relational, affective touch. Proc. of 14th IEEE Workshop on Robot and Human Interactive Communication, pp. 408–415
  • Tanaka F., Noda K., Sawada T., Fujita M. 2004. Associated emotion and its expression in an entertainment robot QRIO. Proc. of the Third Int. Conf. on Entertainment Computing, Eindhoven, pp. 499–504
  • Tanaka F., Movellan J. R., Fortenberry B., Aisaka K. 2006. Daily HRI evaluation at a classroom environment: reports from dance interaction experiments. Proc. of the First Annual Conf. on Human–Robot Interaction (HRI 2006), Salt Lake City, USA, March 2006, pp. 3–9
  • Thomaz A. L., Breazeal C. 2008. Teachable robots: understanding human teaching behavior to build more effective robot learners. Artif. Intell. (AIJ) 172, 716–737 (doi:10.1016/j.artint.2007.09.009)
  • Thomaz A. L., Berlin M., Breazeal C. 2005. An embodied computational model of social referencing. Proc. 14th IEEE Workshop on Robot and Human Interactive Communication, Nashville, TN
  • Trafton J. G., Cassimatis N. L., Bugajska M. D., Brock D. P., Mintz F. E., Schultz A. C. 2005. Enabling effective human–robot interaction using perspective-taking in robots. IEEE Trans. Syst. Man Cybernet. Part A: Syst. Hum. 35, 460–470
  • Wada K., Shibata T., Sakamoto K., Tanie K. 2005. Long-term interaction between seal robots and elderly people—robot assisted activity at a health service facility for the aged. Proc. of the Third Int. Symp. on Autonomous Minirobots for Research and Edutainment, pp. 325–330
  • Walters M. L., Dautenhahn K., Woods S. N., Koay K. L. 2008. Robotic etiquette: results from user studies involving a fetch and carry task. Proc. of the Second ACM/IEEE Int. Conf. on Human–Robot Interaction, pp. 317–342
