The recent explosion of technology has moved the field of patient reported outcomes (PROs) into a new era. Use of paper-and-pencil questionnaires administered before and after treatment has been eclipsed by highly sophisticated random prompts for symptom ratings at multiple points throughout the day, a method known as ecological momentary assessment (EMA). During the last 25 years, research has demonstrated that retrospective ratings are subject to a variety of cognitive heuristics that can distort the report. Initially, this was addressed by adopting paper diary protocols involving multiple ratings in a day or across a week. Technology was also advancing, and some researchers began to utilize electronic platforms for EMA assessment. A good deal of research has been conducted comparing paper and electronic formats. Issues of compliance have been particularly problematic for paper diaries. Electronic technologies can be expensive and require expertise in programming and data management. Not all research questions will require intensive momentary assessment, and end-of-day ratings may be adequate for many applications. What is required of the investigator is familiarity with the strengths and weaknesses of the methods and platforms available as well as a reasoned decision to elect a particular methodology for the study question at hand.
The historical roots of diaries as a form of data collection go back to at least the early part of the 20th century. In order to better understand the etiology and course of symptoms, patients were asked to keep an ongoing symptom record that could be reviewed by the physician. However, it was not until later in the century that the method gained scientific attention. Csikszentmihalyi, while at the University of Chicago, developed the “Experience Sampling Method” during his seminal work on the “flow” of personal experience. Rather than limiting the characterization of experience to a single rating or set of ratings provided retrospectively, Csikszentmihalyi asserted that the unfolding of experience on a moment-to-moment basis contained unique information not available from traditional self-report methods. At the same time, the blossoming of the Skinnerian behavioral movement in psychology in the 1970s viewed human behavior as the result of learned experiences happening in the moment in the person's environment. Consequently, behavioral observation of momentary events and of an individual's reactions was viewed as theoretically and methodologically superior to traditional questionnaire assessment. Shortly thereafter, psychologists in Europe and the United States began to apply this new approach in a variety of areas, such as schizophrenia, the relationship between stress and the onset of physical illness symptoms, and daily experiences and mood. The term “Ecological Momentary Assessment (EMA)” became widely adopted in the United States to refer to methods involving multiple random assessments within a day. Initially, diary studies gathered observations using paper-and-pencil diaries. However, once hand-held computers became widely available and relatively inexpensive, electronic diaries began to be used.
From a practical and economic point of view, traditional baseline and post-treatment assessment of symptomatology with a questionnaire is by far the easiest. Many self-report instruments have a long history of use in clinical trials and good reliability and validity data associated with them. They are inexpensive for the researcher to use and involve minimal burden on the patient. So, why has there been a movement to more burdensome diaries? The most important reason stems from accumulating evidence of bias in retrospective questionnaire ratings. Most assessment instruments have recall periods that range from days to weeks or months. Thus, patients are being asked to generate a summary rating of PROs, such as pain, fatigue, sleepiness, or nausea, over some period of time. This recall task assumes that the patient has access to all of the relevant symptom experiences over the recall period and is able to aggregate them to generate an accurate representation of their symptom experience. Emerging research suggests that both of these assumptions are faulty.[8, 9] Moreover, there is evidence that as patients attempt to remember and report their symptoms, cognitive heuristics become operative and result in distorted reports. Diaries that ask patients to simply report on their experience at the moment theoretically avoid these threats to the validity of the assessment.
While diaries can avoid recall bias, the high frequency of some momentary assessment protocols has raised concern about reactive effects. Does the process of repetitive monitoring and reporting alter the phenomenon being assessed? A few studies have examined this question and have failed to uncover evidence of reactivity.[12, 13] However, there may well be instances in which reactivity could take place. Likewise, the issue of stereotypic responding as the protocol extends over long time periods needs to be investigated.
When patients are recalling experiences retrospectively, research has shown that several cognitive heuristics can come into play even over short recall periods (see Fredrickson for review). Two heuristics are known as the “peak” and “end” effects. Redelmeier and Kahneman conducted a study with patients who remained awake during a colonoscopy or lithotripsy. Patients rated their pain throughout the procedure, and at the end of the procedure they provided an overall rating of their pain. Analyses indicated that the recalled pain was not a simple arithmetic average of the pain, but instead was more heavily weighted by the peak pain and the pain during the last three minutes of the procedure. A second study by these investigators randomly assigned colonoscopy patients to a usual colonoscopy procedure or to one in which, at the end of the procedure, the colonoscope was left in the rectum without moving for several minutes – thus producing no pain. Despite the fact that the second condition was longer, patients rated the colonoscopy as less aversive compared with the shorter, usual procedure. Moreover, those patients indicated that they were significantly more willing to undergo another colonoscopy at a future date. Thus, the “end” pain had a significant impact on patients' ratings. We examined these phenomena in naturalistic, prospective studies of patients with chronic pain. In the first study, of rheumatoid arthritis patients who completed up to 7 pain ratings a day for 7 days, a combined peak-and-end variable derived from the momentary ratings corresponded significantly more closely to the 7-day recall at the end of the week than did the mean of all momentary reports. In a second study, using data collected over two weeks by patients with chronic pain, we also found evidence that one-week recall of pain was over-weighted by the peak of pain during that week.
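The contrast between a simple arithmetic average and a peak-and-end summary can be sketched numerically. The ratings and the equal weighting of peak and end below are invented for illustration; they are not the model estimated in the studies above.

```python
# Illustrative sketch (invented data, hypothetical weighting): contrasts the
# simple mean of momentary pain ratings with a peak-and-end summary in which
# recall is assumed to average the highest rating and the final rating.

def mean_rating(ratings):
    """Simple arithmetic average of all momentary ratings."""
    return sum(ratings) / len(ratings)

def peak_end_rating(ratings):
    """Hypothetical peak-and-end summary: average of the peak and the last rating."""
    return (max(ratings) + ratings[-1]) / 2

# Momentary pain ratings (0-10) across a procedure: pain spikes mid-way
# and is low at the end.
ratings = [2, 3, 8, 6, 4, 1]

print(mean_rating(ratings))      # 4.0
print(peak_end_rating(ratings))  # 4.5
```

With this series, a recall driven by peak and end would exceed the true average; had the series ended on the peak, the gap would be larger still.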
We also observed that patients with more variability in their momentary pain gave higher recall pain ratings, even though their mean levels were not different from those with less variability. Other researchers studying other populations have observed similar outcomes.[19, 20]
These findings demonstrate that retrospective ratings are the product of multiple selected aspects of the experience being reported. Robinson and Clore have written eloquently about the types of memory accessible across different time frames. When patients cannot access in memory some or all of the specific experiences being assessed, there is evidence that they rely upon general beliefs or knowledge about their experiences. Even though beliefs at times may be generally good estimates of experience during a targeted interval, some situations may evoke distortion. Of relevance to clinical trials is the finding that patients who want to believe that a treatment was helpful may distort their memory of baseline status to yield an exaggerated view of improvement. Two studies have demonstrated that, at post-treatment, pre-treatment status can be recalled as worse than it actually was, thus inflating the apparent effectiveness of the treatment.[22, 23]
Currently, there is controversy surrounding the choice of paper or electronic diaries. Paper diaries preceded the technological advancements that led to the availability of small, hand-held computers to use as electronic diaries. The advantages of paper diaries include inexpensive assembly and the little training required for patients in their use. A major disadvantage that has become apparent from studies in recent years is poor compliance with the scheduled timing of paper diary entries. Two studies have found significant non-compliance with paper diaries over a three-week recording period in samples of patients with chronic pain.[24, 25] In the first study, with a sampling density of three fixed-time assessments per day, compliance with paper was directly compared with electronic diaries in patients with chronic pain randomized to one of the two data collection methods. The paper diaries were surreptitiously embedded with a microchip that could determine when the diary was opened to make an entry. Although patients indicated on the paper diary cards that they completed the entry at the scheduled assessment 90% of the time, in fact, the actual compliance was only 11% (allowing a ± 15 minute response window). When the compliance window was expanded to ± 45 minutes, the verified compliance was 20%. Patients recorded entries dated for more than 90% of the days, yet the electronic chip determined that the binder was not even opened on 32% of the study days, indicating that patients filled out the ratings for those days on another day. Moreover, our data found evidence of both backfilling as well as forward filling of symptom ratings. These outcomes were in contrast to 94% verified compliance in the patients randomized to the electronic diary condition.
Our second study, drawn from the same pool of patients, attempted to improve paper diary compliance by conducting the same protocol and adding a watch with pre-set alarms to remind the patient to complete the diary at each of the three assessment times during the day. As in the first study, self-reported compliance was over 85%; however, verified compliance was 29% for the ± 15 minute response window and 39% for the ± 45 minute response window. Thus, signaling did provide some improvement in compliance. Nevertheless, less than half of the patients opened the paper diary at least once every day of the 21 days of the study. Furthermore, verified compliance dropped significantly after the first week of symptom ratings.
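The verification logic used in these compliance studies can be sketched simply: an entry counts as compliant when its electronically verified timestamp falls within a response window around a scheduled assessment time. The schedule and timestamps below are invented for illustration.

```python
# Hypothetical sketch of verified-compliance scoring (invented data). Times
# are expressed as minutes since midnight.

def verified_compliance(scheduled, actual, window):
    """Fraction of scheduled assessments with an entry within +/- window minutes."""
    compliant = 0
    for s in scheduled:
        # The entry is compliant if any verified timestamp lands in the window.
        if any(abs(a - s) <= window for a in actual):
            compliant += 1
    return compliant / len(scheduled)

# Three fixed assessment times for one day (09:00, 14:00, 20:00) and the
# timestamps actually captured by the device.
scheduled = [540, 840, 1200]
actual = [545, 1230]  # entries at 09:05 and 20:30; the 14:00 prompt was missed

print(verified_compliance(scheduled, actual, 15))  # 1 of 3 scheduled times
print(verified_compliance(scheduled, actual, 45))  # 2 of 3 scheduled times
```

Widening the window from ± 15 to ± 45 minutes raises scored compliance, mirroring the 11% versus 20% (and 29% versus 39%) pattern reported above.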
To date, these are the only studies that have empirically examined compliance with paper diaries. The results have been sobering and have stimulated a spirited dialogue. Other researchers have presented data to suggest that compliance with paper diaries might be higher, but the evidence is indirect and unverified. A noteworthy commentary by Tennen and colleagues suggested that acceptable compliance with paper diaries is probably possible in some samples and under some circumstances. The research participant's motivation for being in the study, the perceived importance of the study, the collaboration established between researcher and participant, the characteristics of preparation and training to provide the ratings, and the length and burden of the questions are likely important determinants of how successful participants will be in adhering to the assessment protocol. Nevertheless, without having in place some procedures for verifying when questions were answered, the researcher will have to accept on faith participants' self-report of compliance.
A recent systematic review of nine health-related studies comparing data collected on paper with data collected electronically examined the feasibility, compliance, data accuracy, and respondent preference for the two methods. Five of the nine studies reported occasional technical difficulties with the electronic diary, most often battery/power problems as well as software or hardware malfunctions. Two studies monitored the hours required for data entry and found that the electronic diary took only a fraction of the time compared with paper (e.g., 4 hours versus 96 hours). However, the review's authors appropriately commented that the costs of electronic hardware as well as the time for software programming and for uploading and downloading data on the electronic units can be substantial. There was a clear pattern across most studies indicating less missing data and fewer entry errors in the electronic version compared with paper. Four of the nine studies reported evidence that patients were falsifying paper entries, whereas this was precluded in the electronic version. Finally, patients rated both methods easy to use, but there was a clear trend toward preference for the electronic method.
There are also some instances in which use of electronic data capture may be preferable to paper based on the respondent's view of confidentiality of the ratings. Turner and colleagues found that adolescents reported more sexual behavior in a computer assessment than in a paper version.
Of course, as the technology advances, so does the capacity of electronic diaries to conduct highly sophisticated assessment protocols involving branching of queries based upon an initial response (e.g., presence of headache), context-driven assessments (work versus home), and biological state (glucose level, blood pressure, or alcohol intoxication). The platform for electronic data capture is also expanding. Cell phones and smartphones (e.g., BlackBerry) are being used to signal and record patient reports, as are interactive voice response telephone systems,[32-34] including direct voice capture of responses. Finally, with rapidly increasing access to the internet across the socioeconomic continuum, many patients can complete diary ratings on websites provided to them.
The primary rationale underlying momentary assessment is to collect a patient report that is not distorted by recall bias or memory deficit. As mentioned above, some questions require precise collection of ratings in order to investigate the relationship between phenomena that change across hours. Such studies require a high resolution of the PROs being investigated. Furthermore, they require random sampling of experiences in order to prevent reporting of selected experiences due to their salience, meaningfulness, or distress to the respondent. Non-random, selected reporting could easily undermine the representativeness of the data being collected. With higher levels of sampling density of ratings, the possibility of examining prospective relationships between variables is created. This can be done using time series or cross-lagged panel analyses of experiences at Time 1 predicting experiences at Time 2. Examples of such studies include determining the effect of exercise on pain, the associated characteristics and course of migraine headaches, the relationship between activity level and onset of pain, emotional reactivity and depressed mood, the relationship between mood and pain, and the antecedents of cigarette smoking. These studies have yielded data that have confirmed existing conceptualizations (e.g., the course and associated symptoms of migraine headaches) and have also challenged them, generating new insights into the variables being studied (e.g., antecedents of smoking a cigarette).
There are other applications that may tolerate a less intensive sampling density, such as the monitoring of asthma attacks. When infrequent events are being assessed or when lagged relationships are not the focus, end-of-day reports may suffice for a variety of PROs.[42, 43] These may be collected via paper questionnaire and mailed, and they may yield adequate compliance when next-day postmarks are monitored. Although this method does not guarantee that the questions were answered on the correct day, it does substantially reduce hoarding and backfilling problems. Alternatively, end-of-day electronic assessments can be used with automatic time- and date-stamping of the entry. Our laboratory has just completed a study examining the relationship between six daily momentary ratings of pain and of fatigue and end-of-day ratings (both collected electronically) across 28 days for patients with chronic rheumatological conditions, and we found that an end-of-day rating provides a good representation of these symptoms for the day. Finally, there may be instances when end-of-day reporting is preferable to momentary assessment, as in cases where episodes of infrequent events are targeted and could be missed by momentary ratings, or where the burden of frequent assessment would not be justified.
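One simple way to gauge whether an end-of-day rating represents the day, in the spirit of the comparison described above, is to contrast it with the mean of that day's momentary ratings. The data below are invented for illustration.

```python
# Illustrative comparison (invented data) of end-of-day ratings with the
# mean of the same day's momentary ratings.
from statistics import mean

# Six momentary ratings per day and one end-of-day rating, for three days.
momentary = {
    "day1": [4, 5, 3, 4, 6, 4],
    "day2": [2, 3, 2, 2, 1, 2],
    "day3": [6, 7, 5, 6, 6, 6],
}
end_of_day = {"day1": 4, "day2": 2, "day3": 6}

for day, ratings in momentary.items():
    # Small discrepancies suggest the end-of-day report represents the day well.
    diff = end_of_day[day] - mean(ratings)
    print(day, round(diff, 2))
```

A real evaluation would use correlations or agreement statistics across many days and patients rather than raw differences, but the logic is the same.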
Apart from literacy problems and substantial cognitive or physical deficits, diaries can be completed by most individuals. Diary studies have been conducted successfully with children and with older, chronically ill adults.[12, 40] Even patients with serious illness, such as cancer patients undergoing chemotherapy, are able to provide diary assessments. Concerns about older, computer-naive adults being unable to learn the diary task are largely unfounded. The evidence suggests that they are able to comfortably engage in diary methodologies using paper and electronic means.
The particular questions being addressed by a study and the natural course of the symptoms or behavior being monitored should be the determinants of the assessment protocol. Interested readers can find guidelines for the design of momentary assessment protocols in several review articles.[42, 46-48]
Another important consideration for studies using diaries that generate multiple within-person measurements is the issue of data analysis. Although in some cases it may be appropriate to aggregate across ratings to create a single, mean rating for the reporting period, in many cases the complexity of the data issues will dictate a more sophisticated approach using multi-level modeling. Several reviews of these methods are available.[42, 49, 50] Given the multiple ratings generated by diaries, the reliability of the construct being measured and the variability across time can be examined in ways not available in traditional single-point assessment methods.
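A back-of-the-envelope sketch can convey what multi-level models formalize: partitioning diary ratings into between-person differences in average level and within-person fluctuation across time. The data below are invented, and a real analysis would use mixed-effects software rather than this simple decomposition.

```python
# Illustrative variance decomposition of diary data (invented ratings).
from statistics import mean, pvariance

# Daily pain ratings for three patients over five days.
ratings = {
    "p1": [2, 3, 2, 4, 3],
    "p2": [6, 5, 7, 6, 6],
    "p3": [4, 4, 3, 5, 4],
}

person_means = {p: mean(r) for p, r in ratings.items()}

# Between-person variance: how much patients' average levels differ.
between = pvariance(person_means.values())

# Within-person variance: day-to-day fluctuation around each patient's mean.
within = mean(pvariance(r) for r in ratings.values())

print(round(between, 2), round(within, 2))  # 1.74 0.45
```

With a single-point assessment, only the between-person component is visible; the repeated ratings are what make the within-person variability, and hence reliability over time, estimable at all.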
The methodological literature exploring the characteristics of different methods of collecting PROs has increased in recent years, primarily due to the widely expanded options afforded by new technology. Much of the debate has been cast as a horse race across methods that leads to conclusions framed in terms of “winners” and “losers.” The question is not which method is superior; rather, the researcher needs to become acquainted with the factors that are at play in the methods that are available. Different methods bring different strengths and weaknesses, including patient burden, researcher burden, cost, and compliance. Moreover, the phenomena and inter-relationships being examined will necessitate some method characteristics while allowing leeway in others. Highly sensitive information may benefit from the impersonal and protected nature of electronic assessment compared with paper diaries or personal interviews. PROs of chronic, stable conditions may be well served by paper or electronic end-of-day assessments. What has become clear is that simply asking patients to provide PROs in a particular manner does not guarantee that it will be accomplished according to protocol. The extent to which these deviations may impact the study question must be evaluated in each case. It is also important to bear in mind that employing electronic methods does not guarantee the validity of the measurement. A sampling density that is insufficient may miss important events. And electronic methods are still self-report and may be poor indicators of some phenomena compared with physiological monitoring (e.g., hot flashes). In all cases, the selection of an assessment methodology should be thoughtful and informed, and the rationale for the chosen method should be outlined in study reports.
The writing of this paper was supported in part by an investigator-initiated grant from the National Institutes of Health (1 U01-AR052170-01; Arthur A. Stone, principal investigator). The NIH has neither been involved in the design or interpretation of our research, nor in the preparation or review of this manuscript.
Conflict of interest disclosure: the author has a financial interest in invivo data, inc, Pittsburgh, PA, a company providing electronic data services to the pharmaceutical community. There were no other contributors to this paper.