With the Centers for Disease Control and Prevention estimating the prevalence of autism spectrum disorder (ASD) in children at 1 in 88, the identification and effective treatment of ASD are often characterized as a public health emergency. There is an urgent need for more efficacious treatments whose realistic application will yield more substantial impact on the neurodevelopmental trajectories of young children with ASD. Robotic technology appears particularly promising for potential application to ASD intervention. Initial results applying robotic technology to ASD intervention have consistently demonstrated a unique potential to elicit interest and attention in young children with ASD. As such, technologies capable of intelligently harnessing this potential, along with capabilities for detecting and meaningfully responding to young children’s attention and behavior, may represent intervention platforms with substantial promise for impacting early symptoms of ASD. Our current work describes the development and application of a novel adaptive robot-mediated interaction technology for facilitating early joint attention skills for children with ASD. The system is composed of a humanoid robot endowed with a prompt decision hierarchy to alter behavior in concert with reinforcing stimuli within an intervention environment to promote joint attention skills. Results of implementing this system over time with 6 young children with ASD, including specific analyses of attentional bias and performance enhancement, are presented.
Autism Spectrum Disorders (ASD) are characterized by difficulties in social communication as well as repetitive and atypical patterns of behavior. An estimated 1 in 88 children and an estimated 1 out of 54 boys in the United States have ASD. ASD is associated with enormous individual, familial, and social cost across the lifespan. The cumulative ASD literature suggests earlier and more intensive behavioral interventions are efficacious for many children. However, many families and service systems struggle to provide intensive and comprehensive evidence-based early intervention due to extreme resource limitations [5, 6]. As such, there is an urgent need for more efficacious treatments whose realistic application may yield more substantial impact on the neurodevelopmental trajectories of young children with ASD within resource strained environments.
Robotic technology appears particularly promising for potential application to ASD intervention [7, 8]. Initial results applying robotic technology to ASD intervention have consistently demonstrated a unique potential to elicit interest and attention in young children with ASD [9–11]. Our research is motivated by this highlighted potential of robotic technology and designs and tests a potentially transformative co-robotic technological paradigm for future ASD intervention. In particular, it focuses on developing a co-robotic intervention platform and environment specifically designed to accelerate improvements in early joint attention skills [12, 13]. Joint attention skills are thought to be fundamental, or pivotal, social communication building blocks that are central to the etiology and treatment of ASD [14, 15]. At a basic level, joint attention refers to the development of specific skills that involve sharing attention with others (e.g., pointing, showing objects, and coordinating gaze). These exchanges enable young children to socially coordinate their attention with other people to more effectively learn from their environments. Fundamental differences in early joint attention skills have been demonstrated to underlie the deleterious neurodevelopmental cascade of the disorder and successful treatment of these deficits has been demonstrated to substantially improve numerous developmental skills across settings [12, 13, 15].
The present work builds upon our previous work [16, 17], where we developed and piloted a robot-mediated autism intervention architecture called ARIA (Adaptive Robot-mediated Intervention Architecture) and demonstrated three significant findings (see Section II.B for details): i) children with ASD demonstrated an attentional bias toward the robot as opposed to a human therapist; ii) it was possible to develop a closed-loop autonomous robot-mediated joint-attention intervention system that could dynamically adapt interaction based on the performance of the child; and iii) this system performed as well as a therapist on a small sample of children with ASD over a very limited time course (1 session). In the present work, we expand upon our previous work to test two important questions: i) whether repeated interactions with the robotic system would impact performance regarding early joint attention skills, and ii) whether the initial attentional bias and preference for the robot would hold over time. These questions are extremely important for testing the ultimate value of robotic interactions for children with ASD: if the initial attentional bias quickly habituates, or if repeated exposure is unable to utilize initial attentional preferences to promote skills, robotic interactions may be of more limited value.
The rest of the paper is organized as follows. In Section II we discuss relevant literature and our own previous work that provides the motivation for the current work. We present the system architecture in Section III. The experimental investigation is discussed in Section IV. Finally we summarize our contributions and discuss its impact and future directions in Section V.
A number of research groups have studied the response of children with ASD to both humanoid robots [9, 18–21] and non-humanoid toy-like robots [22, 23]. Data from these groups have demonstrated that many individuals with ASD show a preference for robot-like characteristics over non-robotic toys [24, 25], and in some circumstances even respond faster when cued by robotic movement than human movement [26, 27]. Further, a number of studies have indicated the advantages of robotic systems over animated computer characters for skill learning and optimal engagement likely due to the capability of robotic systems to utilize physical motion in a manner not possible in screen technologies [28, 29].
Despite this hypothesized advantage, there have actually been relatively few systematic and adequately controlled applications of robotic technology to investigate the impact of directed intervention and feedback approaches [22, 30–32]. Scassellati and his colleagues demonstrated that children with ASD spoke more to an adult confederate when asked by a robot than when asked by another adult or by a computer. Another study showed that children who were paired with a robot exhibited more shared attention than those who were paired with a human. Goodrich and colleagues reported that a low-dose robot-assisted ASD exposure with a humanoid robot yielded enhanced positive child-human interactions immediately afterwards. Feil-Seifer and Mataric showed that when a robot acted contingently during an interaction with a child with ASD, it had a positive effect on the child’s social interaction. Although these approaches have certainly demonstrated the potential and value of robots for more directed intervention, there remains a need for robotic systems applicable to intervention settings that require extended and meaningful adaptive interactions. We believe that adaptive interaction, operationalized as within-system changes in response to measured behaviors, is important for individualization of intervention and ultimately for addressing core deficits of ASD. Only a few adaptive robotic interaction works with children with ASD have appeared in the literature: proximity-based closed-loop robotic interaction, haptic interaction, adaptive robot-assisted play, and our own closed-loop adaptive interaction work based on affective cues inferred from physiological signals. While all these works were able to put forth robust systems for adaptive interaction, the paradigms explored had very little direct relevance to the core deficits of ASD in that they were focused on simple task and game performance.
The current work explicitly focuses on realizing a co-robotic interaction architecture capable of measuring behavior and adapting performance in a way that addresses a fundamental early impairment of ASD (i.e., joint attention skills).
In order to determine the feasibility and potential value of an adaptive robotic intervention system for young children, we developed the prototype ARIA system capable of administering joint attention tasks to young children with ASD (Fig. 1) [16, 17, 38, 39]. We developed a test-bed that consisted of a humanoid robot NAO, 3 infrared (IR) cameras mounted in the test room, and a series of 23 inch networked computer monitors capable of displaying relevant recorded task stimuli. We also instrumented a baseball hat with arrays of IR LEDs (Light Emitting Diodes) and designed a gaze inference algorithm based on real-time image processing of the camera images obtained from the LED arrays. The algorithm could detect gaze with both head pitch and yaw angles with an error bounding box of 2.6 cm × 1.5 cm from a distance of 1.2 meters. We performed an initial feasibility study comparing performance and gaze detection for a sample of 6 typically developing (TD) children and 6 children with a clinically confirmed ASD diagnosis (ages 3–5; IQ range = 49–102) and variable baseline skills regarding response to joint attention (ADOS RJA Item range: 0–3, mean = 1.2 (1.1)). Within the system, a series of joint attention prompts were administered via either a human administrator (x2) or the humanoid robot (x2) with randomized presentation to control for order effects. The child sat in a chair across from the robot or interventionist for the trial block and was instructed through a hierarchy of prompts (i.e., head/gaze shifts, pointing, target activation) to look to a target.
The system registered gaze across all trials and provided reinforcement for looking through a simple reinforcement protocol (e.g., praise and target activation). Available data suggest that children with ASD spent approximately 27% more time looking toward the robot administrator than the human administrator, that they did not fixate on either robot or target, and ultimately directed gaze correctly to the target for 95.83% of the total 48 trials, a rate equal to TD success. Further, children successfully oriented to robotic prompting, meaning they responded to robot prompts prior to target activation, at very high levels (i.e., ASD = 77.08% success; TD = 93.75%). However, note that this was a single-session study and as such did not provide any indication of whether the children would respond similarly over multiple sessions. In terms of tolerability, we anticipated a fairly large fail rate across both the ASD and TD samples in terms of willingness to wear the LED cap. Out of 10 ASD and 8 TD children, 6 ASD and 6 TD children completed the study. The completion rates of 60% (ASD) and 75% (TD) were promising, but ultimately highlighted the need for the development of a non-invasive system for realistic extension to a young ASD population commonly demonstrating sensory vulnerabilities.
In this work we wanted to investigate the impact of such a robot-mediated joint attention intervention on attention and performance. To that end, we modified the ARIA architecture described above in two important ways. First, we wanted to monitor the eye gaze of a child with ASD on the robot. We hypothesized that if the child gets bored with repeated exposure to the robot, then he/she will look less at the robot over multiple trials. In order to capture the eye gaze, we introduced an eye tracker into the architecture (Fig. 2 and Fig. 3) that monitored the gaze of the child on and around the robot. Second, in order to address the previously observed sensory vulnerabilities due to the wearing of the hat, we replaced the hat and the camera system with a human therapist in the loop, who determined when and how the child responded to robotic prompts (Fig. 3).
The robot-mediated intervention system (Fig. 2) is developed around the robot NAO (Fig. 1). The child with ASD is seated in a booster chair. The robot NAO, which is a humanoid robot made by Aldebaran Robotics, stands on a raised platform in front of the child. NAO is the size of a young child (height = 58 cm, weight = approximately 4.3 kg) and is suitable for a robot-child interaction study. Its body is made of plastic and it has 25 degrees of freedom, which allow the user to control its head, fingers and feet independently. Its software modules provide ease of programming and encourage distributed processing. NAO is capable of displaying complex social communication behaviors that are often missing or under-developed in children with ASD. An eye tracker, the Tobii X120, is calibrated around NAO such that it can capture the child’s gaze when the child looks at the robot. The calibration for the eye tracker is done by projecting the calibration image on a screen at the robot’s position. Small cartoon characters were displayed at each calibration point to attract the participant’s eye gaze. After calibration, the screen is removed and the robot is placed at that position. In order to display where the participant is looking on or around the robot in real time, we use a camera to record the robot motion and then superimpose the gaze data on this video. This video is displayed at the monitoring station in another room so that parents and other researchers can intuitively understand the participant’s attention to the robot.
There are two computer display monitors, one at the left and one at the right of the child, where joint attention stimuli are presented. The robot presents joint attention bids, which are discussed in Section IV, and a therapist observing the child’s response indicates correct or incorrect response by pressing a button that is connected to the robot controller. Based on the child’s response, the robot either presents a new joint attention task if the previous task was successful, or increases the prompt level to get the child to look at the stimuli. The system interconnection is shown in Fig. 3. The robot action and the stimuli presentation are coordinated by the Centralized Controller (CC), which performs the following tasks: i) initiate a session by sending messages to the eye tracker to initiate calibration and record timestamped eye gaze data; ii) send message to the robot to initiate joint attention bid; iii) activate the display monitors with appropriate stimuli; iv) continuously monitor signals from the human therapist to determine whether a trial has been successful; v) based on performance, continue with the trial as discussed above; and vi) end the session by sending messages to the robot, eye tracker and the display monitors. The CC is designed as an event-based system and communication between the CC and the different modules of the system has been implemented using socket communication.
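The coordination duties listed above can be illustrated with a minimal event-driven sketch. This is not the authors' implementation: the class name, module names, and message strings are all hypothetical, the `send` callback stands in for whatever socket messaging the real system uses, and moving on to the next trial after the final prompt times out is an assumption that the text does not specify.

```python
from typing import Callable


class CentralizedController:
    """Hypothetical event-based coordinator for robot, eye tracker and monitors."""

    def __init__(self, send: Callable[[str, str], None],
                 trials_per_session: int = 8, max_prompt: int = 6) -> None:
        self.send = send  # send(module, message); e.g. a wrapper around a socket
        self.trials_per_session = trials_per_session
        self.max_prompt = max_prompt
        self.trial = 0
        self.prompt = 0
        self.running = False

    def start_session(self) -> None:
        # Task i: calibrate the eye tracker and start timestamped recording.
        self.running = True
        self.send("eye_tracker", "calibrate")
        self.send("eye_tracker", "start_recording")
        self._start_trial()

    def _start_trial(self) -> None:
        self.trial += 1
        self.prompt = 1
        self._issue_prompt()

    def _issue_prompt(self) -> None:
        # Tasks ii and iii: robot bid plus matching stimulus on the monitors.
        self.send("robot", f"prompt:{self.prompt}")
        self.send("monitors", f"stimulus:{self.prompt}")

    def on_event(self, event: str) -> None:
        """Handle 'hit' (therapist marks a correct response) or 'timeout'."""
        if not self.running:
            return
        if event == "hit":
            self.send("monitors", "reward")
            self._next_trial_or_end()
        elif event == "timeout":
            if self.prompt < self.max_prompt:
                self.prompt += 1  # escalate to the next prompt level
                self._issue_prompt()
            else:
                self._next_trial_or_end()  # assumption: give up on this trial
        else:
            raise ValueError(f"unknown event: {event!r}")

    def _next_trial_or_end(self) -> None:
        if self.trial < self.trials_per_session:
            self._start_trial()
        else:
            # Task vi: tell every module that the session is over.
            self.running = False
            for module in ("robot", "eye_tracker", "monitors"):
                self.send(module, "end_session")
```

In a test harness, `send` can simply append `(module, message)` pairs to a list, which makes the message sequence easy to inspect without any network connection.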
A total of 6 boys completed the tasks with their parents’ consent. The details of the participants are given in Table I. It is a well-documented finding that boys are more commonly affected with ASD than girls, at a rate of almost 5:1. As such, this all-male sample was not atypical for this population. There was no drop-out in this study. Given that the primary aims of the study concerned documenting attention/habituation and performance change over time within the ASD group, we did not include a comparison sample of typically developing controls.
Table I shows the specific age, baseline cognitive skills, and ratings of autism symptoms for the participating children. All participants with ASD were recruited through existing clinical research programs at Vanderbilt University and had an established clinical diagnosis of ASD. The study was approved by the Vanderbilt Institutional Review Board (IRB). To be eligible for the study, participants had to be between 2–4 years of age and had to have an established diagnosis of ASD based on the gold standard in autism assessment, the Autism Diagnostic Observation Schedule (ADOS). The parents completed ASD screening/symptom measurements: the Social Responsiveness Scale (SRS) and the Social Communication Questionnaire (SCQ).
The stimuli presented via the monitors were pictures of interest (e.g., pictures of children/animals), videos of similar content, and discrete audio and visual events relevant to the pictures. These stimuli were adaptively changed in form or content based on participant’s response in order to provide additional levels of prompts toward target and to ensure that they function as reinforcing objects of interest. The pictures, audio, and video clips were carefully selected from children’s TV programs (e.g., Bob the Builder, Dora the Explorer, etc.). Segmented clips of these shows were selected wherein a dance, performance, or other actions were carried out by the character such that the clip could be easily initiated and ended without abrupt start or end. The clips were also selected based on consultant review that the particular segments were developmentally appropriate and potentially reinforcing to our ASD population. Within the study, these clips were selected to be both part of the prompt and feedback structure, utilized to draw attention as needed and reinforce correct looking.
During joint attention tasks, a hierarchy of prompts was presented by the robot. We chose a least-to-most prompt (LTM) hierarchy, a common convention in ASD intervention, which essentially provides support to the learner only when needed. The method allows for independence at the outset of the task, ensures opportunities for successful performance and reinforcement at baseline, and only provides increasing support when the child has been given an opportunity to display independent skills. Table II explicitly demarcates the prompt hierarchy utilized in our preliminary studies of robot-assisted joint attention platforms, which emphasizes utilizing the least level of prompting to achieve success as a methodology for shaping and improving performance over time.
Each participant attended 4 sessions, scheduled on different days to assess the cumulative effect of the experimental sessions on participants’ attention and performance. During each session, participants were told they would be “playing with Mr. Robot NAO.” Instructions were, “If you follow Mr. Robot’s words, he will reward you!” To heighten children’s engagement across sessions, we used different video sets as rewards but kept the main procedure in every session identical. Each session included 8 trials, described below, for a total of 32 trials per participant across all sessions.
In each trial, there were 6 potential prompt levels. After prompt 2, the robot engaged in successively more attention-directing Robot Speech or Robot Motion. The Target Display also became more attention-getting as more prompts were required. For each of the 8 trials, the system randomly put the target on the left or right monitor. The target direction remained the same within each trial. The robot turned its head or turned while pointing to the corresponding target. After the start of each prompt, a 7 second response time window was set. “Target hit” was defined as the participant responding to, i.e., turning to look at the correct target within this 7 seconds. Regardless of the participant response, the robot turned back to the starting neutral posture. If the participant failed to hit the target, the robot proceeded to the next prompt level. If the participant hit the target, the reward video was shown and then the next trial started. Note that during the first 4 prompts, only a static cartoon picture was presented on both left and right targets. If the participant failed to hit the target in the first 4 prompts, first audio and then video were provided to further draw the participant’s attention to the target. If the participant successfully hit the target before prompt 6, they were immediately rewarded with a 10s video displayed on the target (for prompt 6, the video was incorporated into the prompt.) After each successful hit, NAO also gave verbal rewards.
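The least-to-most trial procedure described above can be sketched as a short loop. This is an illustration under stated assumptions rather than the study's code: all names are hypothetical, the participant is modeled as a callback that reports whether the target was hit within the response window at a given prompt level, and mapping audio to prompt 5 and video to prompt 6 is one reading of "first audio and then video".

```python
import random
from typing import Callable, Optional, Tuple

RESPONSE_WINDOW_S = 7  # seconds allowed for a response after each prompt starts
MAX_PROMPT = 6         # number of prompt levels in the hierarchy


def target_mode(prompt: int) -> str:
    """What the target monitor shows at a given prompt level (assumed mapping)."""
    if not 1 <= prompt <= MAX_PROMPT:
        raise ValueError("prompt level must be between 1 and 6")
    if prompt <= 4:
        return "static picture"
    return "audio" if prompt == 5 else "video"


def run_trial(looked_at_target: Callable[[int, str], bool],
              rng: random.Random) -> Tuple[str, Optional[int]]:
    """Run one trial and return (target side, prompt level of the hit or None).

    looked_at_target(prompt, side) is a stand-in for the observed response:
    True if the child looked at the correct target within RESPONSE_WINDOW_S.
    """
    side = rng.choice(["left", "right"])  # side is fixed for the whole trial
    for prompt in range(1, MAX_PROMPT + 1):
        if looked_at_target(prompt, side):
            return side, prompt  # target hit: reward, then next trial
        # No hit: the robot returns to neutral and the prompt level rises.
    return side, None  # no hit at any prompt level
```

Averaging the returned prompt levels over the 8 trials of a session gives the kind of per-session "target hit prompt level" that the results report.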
To evaluate the system’s effect on participants’ responses to the robot, we analyzed i) target hit response rates, and ii) the eye gaze data. The former evaluated whether participants followed the robot’s prompt and looked at the target. The latter evaluated participants’ attention toward the robot during the interaction.
Across all sessions and participants, 99.48% of the 192 trials (32 per participant) ended with a target hit. This illustrates that our robot-mediated joint attention intervention system successfully caught and transferred the attention of children with ASD. The average prompt level before participants looked at the target is shown in Fig. 4.
Fig. 4 displays how participants’ performance, as measured by the number of prompts until target hit, improved from session 1 to session 4. In session 1, the average target hit prompt level was 2.17. As children completed sessions and became more familiar with the “game”, the target hit prompt level decreased, falling to 1.44 in session 4. A two-sided Wilcoxon rank-sum test indicated that the median difference between session 1 and session 4 was statistically significant (p = .0029). In other words, performance improved significantly with more exposure. For prompts 1 to 4, the target was only a static picture (no sound or animation). Therefore, if the participant hit the target, one can assume it was because the participant understood the robot’s instruction and accompanying look/gesture. Fig. 5 shows the average percentage of target hits at prompt 1 and within prompts 1–4. Both trends are increasing. In the first session, 52.8% of trials ended with a target hit on the first prompt, while in session 4 this rate increased to 81.25%. Participants hit the target within the first 4 prompts 87.5% of the time in session 1, and that increased to 95.83% in session 4. Therefore, by session 4, the participants almost always hit the target by following the robot’s gesture and instruction alone, without additional attraction from the target itself.
Fig. 6 shows the individual performance for each of the 6 participants, with the average target hit prompt level for each trial depicted on the y-axis. From Fig. 6, we can see that four out of the six participants’ performance improved, one fluctuated and one decreased.
We defined the robot attention gaze region as a box of 76cm × 58 cm which covered the body and arm movement of NAO. Given the distance from the participant to the calibration screen/robot, the accuracy of gaze detection was about 5cm in both the horizontal and vertical directions. The following analysis explains participants’ attention to the robot in terms of their eye gaze pattern.
The gaze pattern was analyzed in two ways: 1) over the whole session (from the start of the first prompt to the end of the session), and 2) within the 7 second response time window for each prompt in a trial. For example, if the participant hit the target on the 5th prompt in a trial, then the time within the response time window is the sum of looking time across all five 7 second windows from prompt 1 to prompt 5. Examining looking times across all participants and sessions, the average time that participants looked at the robot was 14.75% of total experiment time. Within the 7 second window across all participants and sessions, the average time that the participants looked at the robot was 24.80% (Fig. 7). From session 1 to session 4, participants’ average time looking at the robot was 14.88%, 15.17%, 17.94%, and 11.02% for the whole session, and 22.15%, 26.52%, 28.14%, and 22.41% for the 7 second response window.
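As a rough illustration of how such looking-time percentages could be computed from eye tracker output, the sketch below counts gaze samples that fall inside the robot attention region. Several things here are assumptions and not from the study: the function and constant names are hypothetical, the 76 cm side is taken as the width and the 58 cm side as the height, gaze coordinates are expressed in centimeters relative to the center of the region, samples are evenly spaced in time, and samples where gaze was lost count as not looking at the robot.

```python
from typing import Iterable, Optional, Tuple

ROI_WIDTH_CM = 76.0   # assumed horizontal extent of the robot attention region
ROI_HEIGHT_CM = 58.0  # assumed vertical extent of the robot attention region

# One gaze sample: (x, y) in cm relative to the region center, or None if lost.
Sample = Optional[Tuple[float, float]]


def in_robot_region(x: float, y: float) -> bool:
    """True if a gaze point lies inside (or on the edge of) the region."""
    return abs(x) <= ROI_WIDTH_CM / 2 and abs(y) <= ROI_HEIGHT_CM / 2


def percent_looking_at_robot(samples: Iterable[Sample]) -> float:
    """Percentage of evenly spaced gaze samples that fall on the robot region."""
    samples = list(samples)
    if not samples:
        return 0.0
    on_robot = sum(1 for s in samples if s is not None and in_robot_region(*s))
    return 100.0 * on_robot / len(samples)
```

The whole-session figure would pass every sample of a session to `percent_looking_at_robot`, while the response-window figure would pass only the samples recorded during the 7 second windows.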
A two-sided Wilcoxon rank-sum test showed that the median difference in looking time between any two sessions was not statistically significant, with p-values ranging from 0.1797 to 0.8850 across session pairs. This indicates that the looking patterns of participants did not change significantly across sessions. In other words, the initial interest that the participants showed towards the robot did not change significantly with repeated exposure.
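For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the large-sample normal approximation. It is not the authors' analysis code: the function names are hypothetical, tied values receive average ranks without a tie correction to the variance, and for samples as small as those in this study an exact version of the test may be preferable.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) for the rank-sum test, normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail probability
    return z, p
```

A session-to-session comparison would call `rank_sum_test(session_a_values, session_b_values)` with the per-trial or per-participant measurements from the two sessions; which unit of analysis the study used is not stated here.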
Qualitatively from the therapist’s analysis we found that children’s attention in session 1 was initially described as focused on the robot itself. At first, they were described as attracted to this unusual “playmate’s” special appearance, accent and flashing LED eyes. Sometimes they were described as seemingly distracted by these aspects and ignored the robot’s instructions. Beginning in session 2, participants were described as focusing on the robot’s instructions with increasingly better performance, as previously discussed. In session 4, the participants were quite familiar with the “game”. They stared at the robot less and responded to the target quickly once the robot gave out the instruction, waiting for the reward. Because of this, the average robot looking time was lower in session 4.
Statistical analyses showed that participants looked at the robot more in the response window than after, which explains why the percentage of looking time for the response window is larger than the one for whole session. We also found that in the video task (prompt 6 and video reward), the children usually looked at the monitor instead of the robot. This suggests that the video effectively captured children’s attention for the final prompt as well as the reward condition.
In this work, we studied the development and application of an innovative adaptive robotic system with potential relevance to core areas of deficit in young children with ASD. The ultimate objective of this study was to empirically test the attention and performance of a robotic system capable of administering and altering a joint attention hierarchy based on performance.
Children with ASD showed sustained interest in the humanoid robot over several sessions and demonstrated improved performance within the system regarding joint attention skills. Together, these findings are promising in supporting both system capabilities and potential relevance of application. Robotic systems endowed with enhancements for successfully pushing toward correct orientation to target, either with systematically faded prompting or potentially embedding coordinated action with human partners, might be further capable of taking advantage of baseline enhancements in non-social attention preference [46, 47] in order to meaningfully enhance skills related to coordinated attention.
The current system only provides a preliminary structure for examining ideal instruction and prompting patterns. Future work examining prompt levels, the number of prompts, cumulative prompting, or a refined and condensed prompt structure would likely enhance future applications of any such robotic system. While our data provide preliminary evidence that robotic stimuli and systems may have some utility in preferentially capturing and shifting attention, it remains unclear from the current study how such performance would compare to instruction provided by a human administrator. In many of their current forms, humanoid robots are not as capable of performing sophisticated actions, eliciting responses from individuals, and adapting their behavior within social environments as their human counterparts [10, 48]. Though NAO is a state-of-the-art commercial humanoid robot, its interaction capacities have numerous limits. Its limb motions (driven by servo motors) are not as fluid as human limb motions; it creates noise while moving its hand that is not present in human limb motion; flexibility and degrees of freedom (DOF) limitations produce less precise gestural motions; it cannot move its eyes, so its head orientation is approximated as its gaze; and its embedded vocalizations have inflection and production limits related to its basic text-to-speech capabilities. These limitations may adversely affect task performance. As such, it is unlikely that the mere introduction of a humanoid robot that performs a simple action comparable to that of a human in isolation will drive behavioral change of meaning and relevance to ASD populations. Robotic systems will likely necessitate much more sophisticated paradigms and approaches that specifically target, enhance, and accelerate skills for meaningful impact on this population.
Closed-loop technologies [34, 37] that harness powerful differences in attention to technological stimuli, such as humanoid robots or other technologies may hold great promise in this regard.
There are several methodological limitations of the current study that are important to highlight. The small sample size examined and the limited time frame of interaction are the most significant limitations of the current study. As such, while we are left with data suggesting the potential of the application, the methodology used strongly restricts our ability to realistically comment on the value and ultimate clinical utility of this system as applied to young children with ASD. Eventual success and clinical utility of robot-mediated systems hinge upon their ability to accelerate and promote meaningful change in core skills that are tied to dynamic, neurodevelopmentally appropriate learning across environments. While we did assess learning within the system, we did not systematically compare such improvements to learning with other methods, nor did we examine whether such learning generalized to other interactions. Finally, a pre-intervention evaluation of each child could more clearly determine the impact of the presented intervention.
As such, questions regarding whether such a system could constitute an intervention paradigm remain open. Another important technical limitation was the utilization of a human confederate within the robotic system loop. While this modification from our original closed-loop system resulted in dramatic improvement in terms of tolerability (all children completed the protocol), such wizard-of-oz paradigms carry additional human resource burdens to accomplish. This highlights the need to develop a non-contact remote eye gaze tracker capable of integration into a closed-loop system.
Despite limitations, this work is the first to our knowledge to design and empirically evaluate the usability, feasibility, and preliminary efficacy of an adaptive interactive robotic technology capable of modifying performance regarding joint attention skills for young children with ASD. More importantly, this is the first study to investigate how repeated exposure to robot-assisted ASD intervention impacts habituation and performance. These questions are extremely important for testing the ultimate value of robotic interactions for children with ASD, because if the children lose interest in the robot or if their performance degrades, robot-assisted ASD intervention will have limited utility. Fortunately, although the sample was small, the results show that the children did not lose interest in the robot and their performance improved over time. Few other existing robotic systems [34, 37] for other tasks have specifically addressed how to detect and flexibly respond to individually derived, socially and disorder relevant behavioral cues within an intelligent adaptive robotic paradigm for young children with ASD. Movement in this direction introduces the possibility of realized technological intervention tools that are not simple response systems, but systems that are capable of necessary and more sophisticated adaptations. Systems capable of such adaptation may ultimately be utilized to promote meaningful change related to the complex and important social communication impairments of the disorder itself.
Ultimately, questions of generalization of skills remain perhaps the most important ones to answer for the expanding field of robotic applications for ASD. While we are hopeful that future sophisticated clinical applications of adaptive robotic technologies may demonstrate meaningful improvements for young children with ASD, it is important to note that it is both unrealistic and unlikely that such technology will constitute a sufficient intervention paradigm addressing all areas of impairment for all individuals with the disorder. However, if we are able to discern measurable and modifiable aspects of adaptive robotic intervention with meaningful effects on skills seen as tremendously important to neurodevelopment, or tremendously important to caregivers, we may realize transformative accelerant robotic technologies with pragmatic real-world application of import.
We gratefully acknowledge the voluntary participation of the children and their caregivers in our study. This work was supported in part by the Vanderbilt University Innovation and Discovery in Engineering and Science (IDEAS) grant, in part by the National Science Foundation under Grant 0967170, and in part by the National Institute of Health under Grant 1R01MH091102-01A1.