Work towards the development of a handheld health counseling agent designed to promote physical activity is described. Previous work on automated health counselors is discussed, along with the affordances of mobility and context awareness for health behavior interventions. We present a general-purpose software architecture for the rapid design and deployment of mobile health counseling agents. We also describe the results of an initial field trial in which such a mobile agent plays the role of an exercise coach designed to motivate users to walk more. Results were mixed. We found that the context awareness mechanism that was implemented for detecting walking led to greater user-agent social bonding, but less walking in study participants.
A number of technologies have been developed over the last three decades for effecting health behavior change in users. These systems provide health information and counseling based on a wide variety of health behavior theories, using a variety of communication media to provide the information (desktop computers, automatically tailored print, mobile devices, and automated phone systems), and have been applied to a number of behaviors (physical activity promotion, diet adherence, medication regimen adherence, smoking cessation, chronic disease self-management, and others). Overall, these systems have been shown to be effective in a number of randomized clinical trials.
Effective health behavior change interventions take into account the messy details of people’s lives and deliver tailored motivational and informational messages based on this information, along with user characteristics such as motivational readiness, past behavior, ethnicity and age. Given this complexity and nuance, perhaps the most effective technologies are those which come closest to the “gold standard” of one-on-one, face-to-face counseling with an expert health provider. Automated health dialogue systems emulate this form of conversational interaction to communicate health information to users in a format that is natural, intuitive and dynamically tailored. Health dialogue systems that also use an animated agent in the interface (such as FitTrack, Figure 1) can additionally use nonverbal cues, such as hand gestures, head nods, eye gaze movement and facial displays, as further communication channels to convey information, regulate the structure of the conversation, and communicate the agent’s attitude and affect. These latter aspects are particularly important given that many studies have now demonstrated that health provider empathy and the quality of the provider-patient working relationship are key factors in improving patient satisfaction [6,13,16,37], adherence to treatment regimens [4,5,17,18] and health outcomes [22,23,32,33].
Mobility and context awareness represent important directions of research for such counseling agents, in that these features have unique affordances for health behavior interventions. A mobile/wearable agent could be consulted whenever and wherever the user needs help, for example, with a diet choice or dealing with an unhealthy craving. The ability to sense some aspects of the user’s environment (context awareness), especially those aspects related to health behaviors of interest, provides the agent with the ability to proactively intervene in real-time.
Mobility and context awareness also provide unique affordances for agent-user relationship formation and maintenance. A wearable agent has the potential to be with a user for a significant period of time (even continuously), and frequency of contact alone has been shown to be associated with increased solidarity between people. In addition, simply being with a user during significant portions of his/her life may lead to increased perceptions of familiarity and common ground. Since the agent may also be with the user during private experiences, the user may perceive a high level of intimacy with the agent, even without explicit self-disclosure. Proximity may also work to a wearable agent’s advantage, since, by necessity, a wearable device must be kept physically close to the user, and close proximity is associated with higher immediacy, liking and solidarity [3,31]. Mobile devices also tend to be shared less often than their desktop equivalents, and this increased exclusivity may also lead to greater social bonding. Automatically recognizing and commenting on situations in the user’s life (context awareness) may amplify many of the perceptions described above, including familiarity, common ground, solidarity and intimacy. In addition, an agent’s ability to actively interrupt and help a user in a situation that is automatically sensed by the agent may lead to increased perceptions of trust and caring by the user (assuming this can be done reliably and without annoying the user too much).
In this paper we describe the design and implementation of a mobile conversational agent that plays the role of an exercise counselor who attempts to motivate users to walk more. We first discuss related work and the design of a general-purpose mobile platform for health counseling agent research, followed by a description of a preliminary study to evaluate the impact of context awareness on user-agent relationship formation and health behavior change outcomes in a mobile exercise promotion agent.
We briefly review previous research in the development of context aware mobile computing devices for health interventions, relational agents for health behavior change, and health counseling agents on mobile devices.
Mobile, context-aware devices have been developed for several health interventions. Amft et al. used an ear-worn microphone to determine information about a user’s eating habits for a dieting intervention. Intille et al. described a prototype system for the prevention of congestive heart failure which used a context-aware device to ask patients questions at the time and place when the answers would be most relevant to a health professional in making a correct diagnosis. The system sensed user context by combining readings from both physiologic and location sensors, inferring higher-level information, and using that information to determine which health-related questions to ask at a given time. Sung et al. developed LiveNet, a wearable computing platform for assisting patients with physical rehabilitation that combined physiological and contextual sensors to build a rich profile of a user’s physical health over time. LiveNet also provided relevant real-time feedback to users in order to promote healthy behavior at the point of behavioral decision making. There are also a growing number of wearable commercial products in the self-help health market that provide feedback to users during or after physical activity, based on physiological sensors (typically accelerometers and heart rate sensors). The Nike+iPod system, for example, integrates a foot-worn pedometer with a portable music player to monitor and provide feedback regarding a user’s exercise activities.
Relational agents are computational artifacts designed to establish long-term, social-emotional relationships with their users. Some of the earliest research in this area was performed by Reeves and Nass in their “Computers as Social Actors” studies, laboratory experiments involving very simple agents with minimal text-based interfaces. Some of the many relational effects they demonstrated include: increased liking for agents that flatter the user; increased liking for agents that match users in personality; and increased liking for agents that praise others rather than praising themselves. Many studies in this same paradigm have since been conducted by other researchers, demonstrating, for example, that agents that use humor, self-disclosure, or empathy are better liked than equivalent agents that do not.
Bickmore has explored user-agent relationships as mediators in task performance, specifically within health behavior change interventions, using animated health counseling agents (Figure 1) [8,12]. In two longitudinal studies he demonstrated the ability of these agents to establish a social bond (“therapeutic alliance”) with users during repeated interactions over time, although these bonds have yet to be conclusively shown to lead to improvements in health outcomes.
A few researchers have developed health counseling agents for use on mobile devices. Johnson et al. developed DESIA, a counseling agent on a handheld computer that teaches problem-solving skills to mothers of pediatric cancer patients. The system features an animated conversational agent that uses balloon text or recorded speech output for its utterances. Outcome evaluations have yet to be reported.
Bickmore conducted a study of the nonverbal conversational behavior exhibited by users when interacting with a health counseling agent on a handheld device, and found that they exhibit the same range of behavior as when conversing with a person, but at a significantly lower frequency. More recently, he conducted a study comparing an animated health counseling agent on a mobile device to equivalent agents that had text only or text and static image representations, and found that the animated versions led to significantly better social bonding with users. He also conducted a series of studies exploring the curvilinear relationship between the perceived politeness of a protocol for interrupting the user with a health behavior request and long-term adherence to a desired health regimen.
We have developed a general-purpose health counseling agent interface for use on Personal Digital Assistants (PDAs) (Figure 2). The animated agent appears in a fixed close-up shot, and is capable of a range of nonverbal conversational behavior, including: facial displays of emotion; head nods; eye gaze movement; eyebrow raises; posture shifts and “visemes” (mouth shapes corresponding to phonemes). These behaviors are synchronized in real time with agent output utterances, which are displayed in a text balloon rather than spoken aloud, for privacy reasons. The words in the agent utterance are individually highlighted at normal speaking speed and the nonverbal behavior is displayed in synchrony. As reported in the previous section, this form of interface was found to be superior for social bonding, compared to a static image of the agent or a text-only display. User inputs are constrained to multiple choice selections and time-of-day specifications at the bottom of the display.
Interaction dialogues are scripted in an XML-based hierarchical state-transition network, which allows for the rapid development and modification of system behavior. Scripts consist primarily of agent utterances (written in plain text), the allowed user responses to each agent utterance, and instructions for state transitions based on these responses and other system events (timers, sensor input, etc.). Scripts are authored using an Eclipse-based visual design tool, which enables non-programmers to create health dialogue content. Figure 3 shows a fragment of the script language specification. Of particular interest are “wait” commands, which blank the PDA’s screen and suspend execution until one of the specified conditions occurs. Triggering events include the passage of a specified length of time or a specified variable being set to a particular value. A background process continually updates a small set of variables that track the user’s bout status based on accelerometer readings, and these variables can be specified in the triggering condition of a script wait command, allowing script execution to resume once a specified walking state has been achieved.
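The wait mechanism can be illustrated with a small Python sketch. The class `WaitCommand`, the function `poll_wait`, the variable name `bout_status` and the state names are all hypothetical stand-ins rather than the platform's actual script syntax (defined in Figure 3); the sketch only shows how a suspended script might resume on either a timer or a sensor-driven variable.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class WaitCommand:
    """Hypothetical model of a script 'wait' command (screen blank, execution suspended)."""
    timeout_s: Optional[float] = None  # resume after this many seconds...
    var: Optional[str] = None          # ...or when this variable...
    value: Any = None                  # ...is set to this value
    on_timeout: Optional[str] = None   # state to enter if the timer fires
    on_match: Optional[str] = None     # state to enter if the variable matches


def poll_wait(cmd: WaitCommand, elapsed_s: float,
              variables: Dict[str, Any]) -> Optional[str]:
    """Return the state in which execution resumes, or None to keep waiting.

    `variables` stands in for the shared values that a background process
    keeps updated from the accelerometer (e.g. the user's bout status).
    A variable match is checked before the timeout."""
    if cmd.var is not None and variables.get(cmd.var) == cmd.value:
        return cmd.on_match
    if cmd.timeout_s is not None and elapsed_s >= cmd.timeout_s:
        return cmd.on_timeout
    return None


# Wait until the current walking bout ends, or give up after an hour.
wait = WaitCommand(timeout_s=3600, var="bout_status", value="ended",
                   on_timeout="CheckIn", on_match="PraiseWalk")
shared = {"bout_status": "walking"}
print(poll_wait(wait, 120, shared))    # None: still walking
shared["bout_status"] = "ended"        # background process observes the bout end
print(poll_wait(wait, 125, shared))    # PraiseWalk
```

In a real run loop, `poll_wait` would be called on each timer tick or sensor update while the display stays off.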
Once a script is written, it is preprocessed using the BEAT text-to-embodied-speech engine, which automatically adds specifications for agent nonverbal behavior. In addition, each word of each utterance is processed by a viseme generator (based on the FreeTTS text-to-speech engine) that provides the appropriate sequence of mouth shapes the agent must form in order to give the appearance of uttering that word.
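As a rough illustration of what a viseme generator does, the sketch below maps each phoneme of a word to a mouth-shape class and merges consecutive duplicates. The phoneme symbols follow the ARPAbet style of the CMU pronouncing dictionary, but the viseme classes, the tiny `LEXICON` and the function `visemes_for` are assumptions for illustration; this does not use or reproduce the FreeTTS API.

```python
from typing import Dict, List

# Hypothetical reduced viseme classes keyed by ARPAbet-style phonemes.
PHONEME_TO_VISEME: Dict[str, str] = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open", "AH": "open",
    "IY": "wide", "EH": "wide",
    "UW": "round", "OW": "round", "W": "round",
    "T": "tongue", "D": "tongue", "N": "tongue", "L": "tongue",
    "K": "back", "G": "back",
    "S": "teeth", "Z": "teeth",
}

# Tiny hypothetical lexicon standing in for the TTS engine's pronunciation lookup.
LEXICON: Dict[str, List[str]] = {
    "walk": ["W", "AA", "K"],
    "more": ["M", "AO", "R"],
    "great": ["G", "R", "EY", "T"],
}


def visemes_for(word: str) -> List[str]:
    """Return the mouth shapes for one word, merging consecutive duplicates."""
    phonemes = LEXICON.get(word.lower().strip(".,!?"), [])
    shapes: List[str] = []
    for ph in phonemes:
        shape = PHONEME_TO_VISEME.get(ph, "neutral")  # unmapped phonemes: neutral mouth
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return shapes


for word in "Walk more!".split():
    print(word, visemes_for(word))
```

Merging repeated shapes keeps the mouth from "stuttering" when adjacent phonemes look alike; each resulting shape would then be timed against the word's highlighting in the text balloon.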
Protocols for interrupting the user can be very flexibly defined using a variety of wait states and state transitions conditioned on events. During specified wait states, the PDA’s display shuts off and the interface remains dormant until some condition is met; sensor inputs and other background processes, however, remain active. Example wake-up conditions include specific times of day, changes in user behavior as measured by sensor input, or hardware key presses by the user. The particular modality of an interruption can consist of various combinations of audio tones and/or visual cues presented on an arbitrarily complex schedule. User failure to respond to an interruption (or any agent utterance) can also be handled in a flexible manner.
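An interruption protocol of this kind can be modeled as a schedule of cues plus a fallback for non-response. This is a minimal sketch assuming a hypothetical `Interruption` class, cue names and state names; on the actual platform such protocols are expressed as wait states and transitions in the XML scripts.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Interruption:
    """Hypothetical interruption protocol: cues played on a schedule after the
    agent wakes, plus a fallback state if the user never responds."""
    schedule: List[Tuple[float, str]]  # (seconds after wake-up, cue), sorted by time
    give_up_after_s: float             # stop alerting after this long
    on_response: str                   # state entered when the user responds
    on_no_response: str                # state entered if the alert is ignored
    fired: List[str] = field(default_factory=list)
    next_cue: int = 0

    def tick(self, elapsed_s: float, responded: bool) -> Optional[str]:
        """Advance the protocol; return the next script state, or None to keep alerting."""
        if responded:
            return self.on_response
        if elapsed_s >= self.give_up_after_s:
            return self.on_no_response
        # Play every cue whose scheduled time has arrived and has not played yet.
        while (self.next_cue < len(self.schedule)
               and self.schedule[self.next_cue][0] <= elapsed_s):
            self.fired.append(self.schedule[self.next_cue][1])  # e.g. sound a tone
            self.next_cue += 1
        return None


alert = Interruption(schedule=[(0, "tone"), (30, "tone+flash"), (90, "long tone")],
                     give_up_after_s=300, on_response="Dialogue",
                     on_no_response="LogMissedAlert")
alert.tick(0, responded=False)
alert.tick(45, responded=False)
print(alert.fired)                       # ['tone', 'tone+flash']
print(alert.tick(60, responded=True))    # Dialogue
```

Separating the cue schedule from the response handling lets the same dialogue be reached through gentle or insistent alerting, which is the kind of politeness trade-off explored in earlier interruption studies.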
The architecture of the run-time system on the handheld is shown in Figure 4. The actions of the system are primarily controlled by a finite state machine, which is built at run time according to the XML script. The Agent/Interface module comprises the relational agent itself (graphics, animations, audio, etc.), as well as areas for text output and user input in the form of clickable buttons which effect state transitions. During time periods in which the script does not explicitly specify agent actions, the idle action system takes over control of the agent, randomly performing various idle behaviors (eye blinks, posture shifts, etc.).
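Building the state machine at run time from the XML script can be sketched with Python's standard `xml.etree.ElementTree` parser. The element and attribute names (`state`, `say`, `response`, `next`, `start`) form a hypothetical schema invented for this example, and `idle_action` is only a stand-in for the idle action system.

```python
import random
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical script format; the real schema (Figure 3) is not reproduced here.
SCRIPT = """
<script start="Greet">
  <state name="Greet">
    <say>Hi Joe! Ready to go for a walk today?</say>
    <response text="Sure!" next="Encourage"/>
    <response text="Not today." next="Bye"/>
  </state>
  <state name="Encourage">
    <say>Great, let me know when you are done.</say>
    <response text="OK" next="Bye"/>
  </state>
  <state name="Bye">
    <say>Talk to you later.</say>
  </state>
</script>
"""

# Hypothetical idle repertoire used when the script specifies no action.
IDLE_BEHAVIORS = ["blink", "posture_shift", "glance_away"]


def build_fsm(xml_text: str) -> Tuple[str, Dict[str, dict]]:
    """Parse the script into {state name: {"say": text, "responses": {button: next}}}."""
    root = ET.fromstring(xml_text.strip())
    states: Dict[str, dict] = {}
    for node in root.findall("state"):
        states[node.get("name")] = {
            "say": node.findtext("say", default=""),
            "responses": {r.get("text"): r.get("next")
                          for r in node.findall("response")},
        }
    return root.get("start"), states


def idle_action(rng: random.Random) -> str:
    """Pick a random idle behavior (eye blink, posture shift, ...)."""
    return rng.choice(IDLE_BEHAVIORS)


start, fsm = build_fsm(SCRIPT)
state = start
print(fsm[state]["say"])                    # agent utterance shown in the balloon
state = fsm[state]["responses"]["Sure!"]    # user taps the "Sure!" button
print(state)                                # Encourage
print(idle_action(random.Random(0)))        # filler while no scripted action runs
```

Each clickable button simply names the next state, so authoring a new dialogue means editing data rather than code, which is the property the script-based design is meant to provide.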
Details of the animation and action subsystem are provided in Figure 5. Upon a transition to a new state, the contents of that state are broken down into individual agent actions, as well as system directives (store a piece of information, wait for a particular time to pass, etc.). These actions are then placed, in order, into an Action Queue. Items are pulled from the front of the queue based on the current state of the agent, and executed; upon a given action’s completion (e.g. the end of a sequence of animation frames is reached), the next action is pulled from the queue, and so on.
The animation system will also automatically add animation segments required to transition the agent from its current state into a required state in order to execute a specified command. For example, if the agent’s head is turned away and an eyebrow raise is commanded (and there does not exist an animation sequence depicting an eyebrow raise with head turned), the system recognizes this fact and inserts appropriate actions into the front of the queue. In this particular case, a face-forward action would be added ahead of the raise-eyebrows action before execution commences.
If necessary, the animation subsystem can be interrupted or overridden at any time by system events or other conditions. The Action Queue is automatically flushed whenever a state transition occurs to ensure that animations and other display directives do not get out of sync relative to script execution.
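The Action Queue behavior described in the last three paragraphs (in-order execution, automatic insertion of transitional actions, and flushing on state transitions) can be sketched as follows. The action names, the pose model and the `REQUIRES`/`TRANSITION`/`EFFECT` tables are hypothetical; a real implementation would advance the queue when an animation's last frame plays rather than in a loop.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

# Hypothetical animation constraints: the pose an action needs, the
# transitional action that produces that pose, and each action's effect.
REQUIRES: Dict[str, Tuple[str, str]] = {"raise_eyebrows": ("head", "forward")}
TRANSITION: Dict[Tuple[str, str], str] = {("head", "forward"): "face_forward"}
EFFECT: Dict[str, Tuple[str, str]] = {
    "face_forward": ("head", "forward"),
    "turn_away": ("head", "away"),
}


@dataclass
class ActionQueue:
    pose: Dict[str, str] = field(default_factory=lambda: {"head": "forward"})
    queue: Deque[str] = field(default_factory=deque)
    played: List[str] = field(default_factory=list)

    def enter_state(self, actions: List[str]) -> None:
        """On every state transition, flush stale actions, then enqueue the new ones."""
        self.queue.clear()
        self.queue.extend(actions)

    def step(self) -> bool:
        """Run the action at the front of the queue; False when the queue is empty."""
        if not self.queue:
            return False
        action = self.queue.popleft()
        need = REQUIRES.get(action)
        if need is not None and self.pose.get(need[0]) != need[1]:
            # No animation exists for this pose: push the action back and
            # put the transitional action in front of it.
            self.queue.appendleft(action)
            self.queue.appendleft(TRANSITION[need])
            return True
        self.played.append(action)  # stands in for playing the animation frames
        if action in EFFECT:
            part, value = EFFECT[action]
            self.pose[part] = value
        return True


q = ActionQueue()
q.enter_state(["turn_away", "raise_eyebrows"])
while q.step():
    pass
print(q.played)  # ['turn_away', 'face_forward', 'raise_eyebrows']
```

Clearing the queue inside `enter_state` is what keeps leftover gestures from a previous state from playing over a new utterance.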
The run-time software was developed entirely in Macromedia Flash and is deployed in “kiosk” mode, so the PDAs can only be used to run our software. We are using Dell Axim X30 Pocket PC computers for development and experimentation, and all important interaction and study data are logged to a flash memory card in the device. We have also developed a custom case for the PDA, which can be worn either on the waist like a large pedometer or pager, or in a shoulder harness. The initial application domain for the handheld agent is exercise promotion. An integrated 2D accelerometer lets the agent tell whether the user is currently walking at moderate intensity (65% of heart rate reserve, estimated from walking speed and calibrated for each user). For this application, the PDA is worn either on the waist or in the shoulder harness so that the accelerometer can function properly (Figure 6).
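The 65% threshold refers to heart rate reserve, the difference between maximum and resting heart rate. Using the standard Karvonen formulation, the target heart rate works out as below. The numbers in the second line are illustrative assumptions only (a resting rate of 70 bpm and a maximum estimated as 220 minus age for a 21-year-old); the text does not say how maximum heart rate was obtained.

```latex
\begin{aligned}
\mathrm{HR}_{\text{target}} &= \mathrm{HR}_{\text{rest}} + 0.65\,(\mathrm{HR}_{\max} - \mathrm{HR}_{\text{rest}})\\
&= 70 + 0.65\,(199 - 70) = 70 + 83.85 \approx 154~\text{bpm}
\end{aligned}
```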
In order to explore the impact of context awareness on health behavior and on the perceived relationship with a mobile health counseling agent, we conducted a pilot study comparing a mobile context-aware agent with an otherwise identical agent without sensing ability. The application domain was exercise promotion: both agents attempted to motivate users to walk more, but in one condition (AWARE) the agent could sense whether the user was walking at moderate intensity and automatically and proactively provided feedback whenever the user finished a walk, while in the other condition (NON-AWARE) the user had to explicitly tell the agent when they were starting or ending a walk.
The primary hypotheses evaluated were that: (H1) context awareness would lead to greater social bonding; and (H2) context awareness would lead to increased physical activity. These hypotheses were evaluated in an eight-day field study in which the two treatments were compared in a counterbalanced within-subjects design, with each treatment lasting four days. The study was approved by the Northeastern University IRB.
Two interaction scripts were authored for the mobile platform described in Section 3 to support the experiment. In order to minimize extraneous factors, the interactions were kept as minimal as possible except for the desired manipulations.
In the AWARE condition, the system continuously monitors for the start of moderate intensity walking (based on accelerometer readings). When the PDA detects that the user has stopped walking at moderate intensity for at least one minute, it interrupts the user with an audio tone and the agent either: (a) provides positive reinforcement if the bout was 10 minutes or longer, of the form “You walked for X minutes and got some good exercise. Great job, Joe!”; or (b) if the bout was less than 10 minutes long, simply states “You walked for X minutes.” In addition, if the user pushes a hardware button on the PDA, the agent displays the number of steps walked since midnight and, if the user is currently walking at moderate intensity, provides the encouraging message “Looks like you are walking, that is great!”
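The AWARE-condition logic can be summarized in a short sketch. It assumes, hypothetically, that the accelerometer pipeline delivers one "moderate intensity or not" flag per second and that a bout's length runs from its first to its last moderate-intensity second, including short pauses; the function names are illustrative, while the one-minute grace period, the 10-minute praise threshold and the two message templates come from the text.

```python
from typing import Iterable, Iterator, Tuple

GRACE_S = 60     # a bout ends after one minute below moderate intensity
PRAISE_MIN = 10  # bouts of at least this many minutes earn praise


def detect_bouts(moderate: Iterable[bool],
                 grace_s: int = GRACE_S) -> Iterator[Tuple[int, int, int]]:
    """Scan a 1 Hz stream of 'walking at moderate intensity?' flags.

    Yields (start_s, end_s, alert_s): the bout spans [start_s, end_s), and
    alert_s is when the agent can first interrupt, grace_s seconds after the
    last moderate-intensity second. Pauses shorter than grace_s (stoplights,
    etc.) stay inside the bout. A bout still open when the stream ends is
    not reported."""
    start = last = None
    for t, is_moderate in enumerate(moderate):
        if is_moderate:
            if start is None:
                start = t
            last = t
        elif start is not None and t - last >= grace_s:
            yield start, last + 1, t
            start = last = None


def feedback(duration_s: int, name: str) -> str:
    """The two AWARE-condition messages, keyed on bout length."""
    minutes = duration_s // 60
    if minutes >= PRAISE_MIN:
        return (f"You walked for {minutes} minutes and got some good "
                f"exercise. Great job, {name}!")
    return f"You walked for {minutes} minutes."


# 12 minutes of walking, a 30 s stoplight pause, 1 more minute, then stop.
stream = [False] * 5 + [True] * 720 + [False] * 30 + [True] * 60 + [False] * 120
for start, end, alert in detect_bouts(stream):
    print(feedback(end - start, "Joe"), f"(alert at t={alert}s)")
```

Because the bout only closes after the full grace period, the alert in this example arrives 60 seconds after the last moderate-intensity second.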
In the NON-AWARE condition, the accelerometer is not used for anything visible to the user (including step counts). When the user pushes the hardware button on the PDA, the agent appears and the user can tell it “I’m starting a walk now.” Once a walk is completed, the user can push the hardware button again; the agent appears and, just as in the AWARE condition, either provides positive reinforcement for a bout of 10 minutes or more, or gives a simple minute count for shorter bouts. In this condition, the PDA alerts the user only if it appears he or she has forgotten to indicate that a walk is complete, in which case the agent interrupts after 30 minutes to ask whether the walk is finished.
In addition to the above interactions, the agent can be asked for a “webcode” at any time after 6PM. The webcode is a 15-character string that encodes all of the significant study data for the current day (e.g., number of steps, number of walking bouts, study day, etc.) along with a checksum. A website was also created that can be used by participants to report these webcodes on a daily basis, so that we can incrementally capture data during a longitudinal study in case a PDA is lost, stolen or malfunctions. The website also delivers self-report questionnaires to study participants at appropriate times during the study, and provides help and a feedback mechanism.
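One way to pack a day's data into a 15-character webcode is to treat the fields as a bit string, write it in base 32 using 14 characters, and append one checksum character. Everything here is an assumption for illustration: the field list and bit widths, the 32-symbol alphabet (which omits I, L, O and U to avoid misreadings), and the weighted modulo-31 checksum; the text does not specify the actual encoding.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 symbols, no I/L/O/U

# Hypothetical field layout: (name, bit width). 37 bits in total.
FIELDS = [("day", 4), ("condition", 1), ("steps", 17), ("bouts", 5), ("mod_minutes", 10)]
PAYLOAD_CHARS = 14  # 14 * 5 = 70 bits of room; one more char is the checksum


def _checksum(chars: str) -> str:
    """Position-weighted sum mod 31, so most typos and swaps change it."""
    total = sum((i + 1) * ALPHABET.index(c) for i, c in enumerate(chars))
    return ALPHABET[total % 31]


def encode_webcode(record: dict) -> str:
    value = 0
    for name, bits in FIELDS:
        v = record[name]
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{name}={v} does not fit in {bits} bits")
        value = (value << bits) | v
    chars = []
    for _ in range(PAYLOAD_CHARS):  # emit base-32 digits, least significant first
        chars.append(ALPHABET[value & 31])
        value >>= 5
    payload = "".join(reversed(chars))
    return payload + _checksum(payload)


def decode_webcode(code: str) -> dict:
    code = code.strip().upper()
    if len(code) != PAYLOAD_CHARS + 1:
        raise ValueError("webcode must be 15 characters")
    payload, check = code[:-1], code[-1]
    if _checksum(payload) != check:  # also raises ValueError on unknown symbols
        raise ValueError("checksum mismatch; please re-enter the code")
    value = 0
    for c in payload:
        value = (value << 5) | ALPHABET.index(c)
    record = {}
    for name, bits in reversed(FIELDS):  # unpack in reverse order of packing
        record[name] = value & ((1 << bits) - 1)
        value >>= bits
    return record


day3 = {"day": 3, "condition": 1, "steps": 8421, "bouts": 2, "mod_minutes": 37}
code = encode_webcode(day3)
print(code, decode_webcode(code) == day3)
```

Decoding normalizes case and rejects codes whose checksum does not match, so a mistyped character is usually caught when the participant submits it rather than silently corrupting the study data.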
The PDA automatically records the following measures of Physical Activity, based on data from the accelerometers:
The following self-report measures of User-Agent Relationship were acquired via the scheduled web questionnaires:
In addition, one self-report item was included as a Manipulation Check:
Eight participants were recruited from the Northeastern University campus via fliers, and were compensated for their time. Participants were screened with the Physical Activity Readiness Questionnaire (PAR-Q) for any health conditions that might contraindicate beginning an exercise program. Participants were all male, ranged from 19–23 years of age, and had moderately high experience with computer technology (average 3.5 on a scale from 1=“never use a computer” to 5=“expert”). Participants had body mass indices (BMIs) of 25.5 (±5.0), with 3 individuals categorized as “overweight” and 1 categorized as “obese”. Relative to the current US government recommendations for physical activity, and using a standard measure of progress in health behavior change, 3 individuals were in the contemplation stage of change, 3 in preparation, and 2 in action.
Upon completion of the study, one participant was found to have never used his PDA (zero steps and zero minutes of moderate intensity walking recorded for every day) and several of his self report measures were significant outliers relative to the others in the study, so his data was removed from subsequent analysis.
Participants held an initial meeting with a research assistant in our HCI lab. After initial screening, they gave consent and then filled out a demographic questionnaire. They were then asked to walk on a treadmill while wearing the PDA so that the walking frequency at which they reached 65% of their heart rate reserve (corresponding to “moderate intensity” physical activity) could be calibrated. Finally, they were trained in the use of the PDA for the first study condition, told how to access and use the website (via user ID and password), and sent home with the PDA.
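The per-user calibration can be sketched as a linear interpolation over treadmill stages: find the step frequency at which heart rate reaches the user's target (65% of heart rate reserve). The staged protocol, the interpolation, the function names and the example numbers are assumptions; the text says only that the walking frequency corresponding to 65% of heart rate reserve was calibrated on a treadmill.

```python
from typing import List, Tuple


def calibrate_step_frequency(stages: List[Tuple[float, float]],
                             target_hr: float) -> float:
    """Step frequency (steps/min) at which heart rate first reaches target_hr.

    stages: (step_frequency, steady-state heart rate) for each treadmill
    stage, in order of increasing speed. Linearly interpolates between the
    two stages that bracket the target."""
    for (f0, h0), (f1, h1) in zip(stages, stages[1:]):
        if h0 <= target_hr <= h1:
            if h1 == h0:
                return f0
            return f0 + (target_hr - h0) * (f1 - f0) / (h1 - h0)
    raise ValueError("target heart rate not bracketed by the treadmill stages")


def is_moderate(step_frequency: float, threshold: float) -> bool:
    """Classify current walking against the calibrated per-user threshold."""
    return step_frequency >= threshold


# Illustrative numbers only; the target stands for 65% of heart rate reserve.
stages = [(90, 110), (100, 130), (110, 150), (120, 170)]
threshold = calibrate_step_frequency(stages, target_hr=154)
print(threshold)  # 112.0
```

After calibration, only the threshold needs to live on the PDA; the accelerometer's step frequency is compared against it to decide whether the user is walking at moderate intensity.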
For the next eight days each participant wore their PDA continuously during the day, logged into the website each evening to report their webcode, and plugged the PDA into a charger overnight. On the fourth evening, each participant was given instructions for the next study condition that began automatically the following morning. On the fourth and eighth evenings the website also administered a questionnaire with the self-report measures described in Section 4.2.
Following their eighth day, participants returned to the HCI lab to return their PDA and for debriefing and payment.
Repeated measures ANOVAs were run on all measures with BMI and stage of change as covariates.
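With one participant excluded, n = 7 participants were analyzed. Assuming the two covariates (BMI and stage of change) enter the repeated-measures model as between-subjects covariates, each with one degree of freedom, the degrees of freedom for the two-level within-subjects factor (k = 2, c = 2 covariates) work out as follows, matching the F(1,4) statistics reported in the results.

```latex
\begin{aligned}
df_{\text{effect}} &= k - 1 = 2 - 1 = 1\\
df_{\text{error}} &= (n - 1 - c)(k - 1) = (7 - 1 - 2)(2 - 1) = 4
\end{aligned}
```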
Participants scored the AWARE condition of the study higher on perceived awareness, F(1,4)=8.6, p<.05, confirming the manipulation.
On relational measures, participants rated themselves as having a closer relationship with the AWARE agent compared to the NON-AWARE agent (RSHIP), F(1,4)=8.6, p<.05, but differences on all other relational measures (including WAI) were not significant.
All measures of physical activity indicated that participants did more walking in the NON-AWARE condition of the study: on STEPS, F(1,4)=146.5, p<.001; on MODP, F(1,4)=74.3, p<.001; and on BOUTS, F(1,4)=92.0, p<.001.
No significant order effects were found.
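The reported test statistics can be checked against their significance thresholds without a statistics library. An F statistic with (1, ν) degrees of freedom is the square of a t statistic with ν degrees of freedom, and for ν = 4 the two-tailed tail probability of t has a closed form. The sketch below (the function name `p_value_f1_4` is ours) applies it to the F values reported above.

```python
import math


def p_value_f1_4(f: float) -> float:
    """Exact p-value for an F statistic with (1, 4) degrees of freedom.

    F(1, v) equals t**2 for a t statistic with v degrees of freedom, and for
    v = 4 the two-tailed t tail probability has the closed form
    (1 - u)**2 * (2 + u) / 2 with u = t / sqrt(t**2 + 4) = sqrt(F / (F + 4))."""
    if f < 0:
        raise ValueError("F must be non-negative")
    u = math.sqrt(f / (f + 4.0))
    return (1.0 - u) ** 2 * (2.0 + u) / 2.0


# (F statistic, threshold reported in the text)
REPORTED = {
    "perceived awareness": (8.6, 0.05),
    "RSHIP": (8.6, 0.05),
    "STEPS": (146.5, 0.001),
    "MODP": (74.3, 0.001),
    "BOUTS": (92.0, 0.001),
}

for name, (f, alpha) in REPORTED.items():
    p = p_value_f1_4(f)
    print(f"{name}: F(1,4)={f}, p={p:.3g}, below {alpha}: {p < alpha}")
```

Run as-is, it gives p ≈ 0.043 for F = 8.6 and values below .001 for the three physical activity measures, with MODP (F = 74.3, p ≈ 0.000996) only just under the .001 threshold.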
We expected our manipulation to lead to an increased perception of the agent’s awareness of the user’s behavior, which would lead to increases in relational bonding, which, in turn, would result in participants performing more walking in the AWARE condition. We discuss the results for each of these three variables in turn, with supporting comments from participants.
Our one measure of awareness indicated that users did perceive our awareness manipulation:
“She was aware, she was observant of my movements.”
“… she was observant and perceptive to how much walking I did and she would ask questions that showed she was aware of what I was doing.”
However, we feel that there were several mitigating factors that potentially weakened the effect of our manipulation. First, several participants felt that the system’s walking detection was not very reliable:
“There were times that I wasn’t sure if it was picking up if I was walking briskly because I figured that I had to hit a certain speed. And I wasn’t sure exactly what speed that was. And sometimes I would walk briskly and I don’t think it did pick it up.”
“I wasn’t sure if it was always there.”
Second, participants felt that the NON-AWARE system was aware of their behavior as well, but in different ways:
“Well, I mean, obviously when I was telling her I was going for a walk she was aware [of my walking].”
(Researcher): “How did you feel that she was aware?” … “Just because she would give me feedback at the end of every walk, and if I forgot to hit when I was done it would start beeping: ‘you’re done walking right?’”
Although one of our relational measures indicated increased social bonding in the AWARE condition, most of the measures, including the one that should have been the most sensitive and comprehensive—the WAI—did not indicate that there were significant differences between treatments. A few participants mentioned that they preferred the AWARE condition:
“I liked that, it was definitely better to use than having to say ‘I’m starting now’ or you know, start and stop.”
However, in addition to the weak manipulation of awareness discussed above, the perception of low reliability of the sensing mechanism may have directly contributed to social distancing:
“I liked the second one, or, I mean the first one better just because of that insurance I had that it was working.”
Finally, the audio interruption at the end of a walk in the AWARE condition, to signal to the user that the agent was ready to provide positive reinforcement, was actually found to be annoying by at least one participant:
“It beeps, which is a little bit annoying…”
The physical activity results from the study were the exact opposite of those we hypothesized. In hindsight there are several reasons why this may have happened, in addition to the problems listed above. First, the bout detection algorithm waits one minute after a user transitions from moderate intensity walking to sub-moderate intensity before signaling the end of a bout, in order to allow for users needing to pause temporarily (at stoplights, etc.) during their walk. Thus, in the AWARE condition, the agent only offered positive reinforcement one full minute after the user had finished a walking bout. In contrast, there were no such delays in the NON-AWARE condition, leading to more immediate feedback and reinforcement. One of the tenets of behavioral reinforcement is that rewards should follow the reinforced behavior as soon as possible, so this difference in delay could have been significant.
The perception of low reliability may have also directly and negatively impacted physical activity in the AWARE condition; if participants did not feel that the walking they did do was being accurately monitored, they may have lost the motivation to walk altogether.
Finally, the act of explicitly telling the agent that they were going for a walk may have represented an independent positive influence on participants’ motivation in the NON-AWARE condition:
“I think that the one where you were saying ‘I’m going for a walk’ was better, as opposed to just passively monitoring. … Because, you know, you are saying to your self ‘I am going to go walking at that pace now, and get a little exercise.’”
We have described our experimental platform for mobile health counseling agent research, and the results from a preliminary pilot study evaluating the impact of context awareness for these agents. Obviously, much work remains. Health counseling agents that are used not just in the laboratory, or even on the desktop, but in the middle of people’s everyday lives are extremely challenging to design so that they reliably produce their intended effects. One significant lesson from the pilot is that the perception of agent reliability (at least for critical functionality) is a prerequisite not only for effective health outcomes but for relational bonding as well.
The study described here is a pilot for a larger field study in preparation that will evaluate the efficacy of “just in time” counseling information on health behavior change. We will be comparing positive reinforcement and “problem solving” (helping users overcome obstacles) delivered either in real-time, on the basis of sensors, or in an end-of-day retrospective review session. This larger study will avoid many of the problems encountered in the pilot by having participants schedule when they will try to go for walks, and focusing sensing, intervention and interaction within these time windows. In addition, during these scheduled windows, the PDA will explicitly notify users via an audio tone when they are or are not walking at moderate intensity, which should improve the perceived reliability of the system’s walking detection.
Significant opportunities exist in making health counseling agents that are mobile, can sense the user’s environment, and that have the social and relational competencies required to maintain on-going relationships with their users.
Thanks to Brian Sweeney and Francisco Crespo for their help in preparing and running the evaluation studies. Stephen Intille (MIT House_N project) developed the signal processing software for the accelerometer, and provided significant input on the design of the system, as did Dr. Amanda Gruber (Harvard Medical School). Erin Rapacki developed the custom PDA cases. Jennifer Smith provided many helpful comments on the paper. This work was supported by NIH National Library of Medicine grant R21LM008553.
Timothy Bickmore, Ph.D., is an Assistant Professor in the College of Computer and Information Science at Northeastern University. Dr. Bickmore’s research involves the development and evaluation of technologies for health education and health behavior change, such as exercise promotion, smoking cessation and medication adherence. These technologies emulate face-to-face interaction between health professionals and patients spanning multiple interactions over extended periods of time in home-based and hospital-based interventions. Dr. Bickmore is the recipient of multiple grants from NSF, NIH and healthcare companies. Dr. Bickmore received his Ph.D. in Media Arts & Sciences from the Massachusetts Institute of Technology, where he was a member of the Gesture & Narrative Language and Affective Computing research groups at the Media Laboratory. Before joining Northeastern, Dr. Bickmore was an Assistant Professor at the Boston University School of Medicine.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.