Survey self-reports of health status, symptoms (pain and fatigue), and life satisfaction often serve as outcomes in clinical trials. Prior studies have shown, however, that such reports can be subject to context effects, which could threaten their validity.
We examined the impact of 2 context effects: the effect of the reporting period associated with a question (No Period Specified; Last Month; Right Now) and the effect of whom the respondent compared themselves to in answering a question (None Specified; Compared with Others in the US; Compared with Others Your Age).
One thousand four hundred seventy-one community adults aged 20 through 70 years, who were members of an internet panel, responded to 1 of 9 questionnaires formed by crossing the 2 context variables. A significant effect of Reporting Period was observed, indicating that higher levels of Pain and Fatigue were associated with the Last Month reporting period. When no reporting period was specified, symptom levels were equivalent to the Right Now levels of symptoms. Reporting Period had no effect on the other outcomes. Educational level did not interact with these main effects, with 1 exception. None of the predicted effects were found for Comparison Group, although Pain was significantly associated with this factor.
Reporting period in survey questions is a factor that influences responses and should be considered by survey researchers in their study designs.
Self-reports of overall health status, symptomatology, and life satisfaction are frequently assessed outcome variables in clinical trials testing new interventions. They are also included in numerous national health surveys around the world to monitor population health. As is the case for all self-reports, respondents’ answers to questions about their health can be affected by contextual variables, including the wording of the question, the format of the response alternatives, and the nature of preceding questions. Although the cognitive and communicative processes underlying self-reports in general are increasingly well understood (for overviews see Refs. 1–5), much remains to be learned about their impact on health reports. The present research contributes to this goal by exploring how different reporting periods and comparison frames influence self-reports of health and symptomatology.
Health inventories use reporting periods of different lengths, ranging from “right now” or “today” to “last week” or “last month.” Other inventories specify no reporting period at all, leaving the timeframe up to respondents. Little is known about the time frame that respondents are likely to consider spontaneously in the latter case.
Several lines of research converge on the prediction that longer reporting periods will foster reports of more severe and intense symptoms.5 First, retrospective reports of the intensity of subjective experiences over extended periods of time are influenced by 2 salient and memorable moments, namely, the peak of the experience (eg, the moment of most intense pain), and its ending.6 Because longer reporting periods have a higher chance of including more intense moments,7 they tend to yield self-reports of more intense symptoms—even when the question asks about “usual” or “typical” level of the symptom and the extreme episode is not representative. Second, compounding this effect, memory for mild experiences decays faster than memory for intense experiences,8 which attenuates the contribution of milder episodes as time passes. Finally, respondents assume that the researcher is interested in relatively frequent experiences when the reporting period is short and recent (eg, yesterday), but wants to learn about less frequent experiences when the reporting period pertains to a longer timeframe (eg, last year9). All of these influences support the prediction that longer reporting periods result in reports of more intense symptoms.
Consistent with this prediction, we have previously observed that questions referencing “last month” result in a 15% increase in pain intensity ratings, and a 10% increase in fatigue intensity ratings, relative to questions referencing respondents’ immediate (“right now”) symptoms.10 The present research builds on this finding and explores its robustness and limitations; it also addresses which time frame respondents consider when the question does not specify a reporting period.
All evaluative judgments are made relative to some standard, including evaluations of one’s health. When a respondent reports that her health is “very good,” a crucial question is: relative to what or to whom? Ubel et al11 asked a group of respondents of the Health and Retirement Study (HRS), aged 50 and older, “How would you rate your current health on a scale from 0 to 100 in which 0 represents death and 100 represents perfect health?” A second group received a version of the question that added the phrase “for someone your age” to this standard wording, and a third group received a version that added the phrase “for a 20-year-old.” Despite health equivalence among the 3 groups (established on the basis of other health data from the HRS interview), respondents reported significantly lower current health ratings when the question introduced a “20-year-old” as the comparison standard (mean = 66) than in the 2 other groups (both means = 73). This pattern (1) indicates that the specification of a comparison group influences the answers obtained and (2) suggests that respondents spontaneously use others their age as the relevant frame of reference when none is specified. Given that respondents’ comparisons are based on their own age group, their health reports should capture health differences within their age group, but not differences between age groups, unless a fixed-age comparison is specified for all respondents. Confirming these predictions, (3) those designated as sick on the basis of the HRS Activities of Daily Living data reported poorer health in all age groups, independent of the wording of the question, whereas (4) the expected age-related decline in self-reported health was only observed when the question specified 20-year-olds as the comparison, that is, when the question included a fixed-age comparison.
The present study builds on the above findings in several ways. First, we examined the influence of explicit or implicit comparison groups and reporting periods on commonly used and well-validated health, symptom, and satisfaction questions to evaluate the practical consequences of previously identified contextual influences. Our design replaced the comparison with 20-year-olds with a less extreme standard, asking respondents to compare themselves to the US population. This somewhat abstract, but presumably more relevant, standard may avoid some of the problems associated with an extreme standard, like healthy 20-year-olds. Second, we extended the content of the questions to include specific symptoms (Pain and Fatigue) and Life Satisfaction. Our health question is a version of one commonly used in clinical trials; unlike the question used by Ubel et al, it does not include the descriptor “perfect” health. Third, we broadened the population of respondents to include younger adults, covering the age range of 20–70 years. These decisions resulted in a factorial study that crossed 3 levels of comparison group (none specified; compared with others in the United States; compared with others your age) with 3 reporting periods (none specified; right now; last month). The study design embedded these 9 groups into each of 3 age categories of respondents (20–30, 40–50, and 60–70 years of age).
With regard to reporting periods, our primary hypothesis was that longer reporting periods would result in reports of higher symptom intensity. Specifically, we predicted that respondents would report more intense pain and fatigue when the question pertains to “last month” rather than “right now.” Some of the standard forms of both questions do not specify a reporting period, and it is unknown whether respondents draw on their current symptoms or on a more extended time period when answering the standard questions. A comparison of the questions without a specified reporting period with the “right now” and “last month” questions would provide insight into this issue.
The content of the self-report question has implications for the reference period. Questions pertaining to general states of health or satisfaction by definition reference longer periods of time.12 Hence, we expected the reporting period to exert little influence on respondents’ answers to these questions. In contrast, questions about the severity of one’s symptoms could plausibly pertain to the current moment as well as to more extended periods.
With regard to comparison information, our primary hypothesis was that respondents would compare themselves to others their age when no comparison is specified, which would replicate Ubel et al,11 but that they would take different age groups into account when asked to do so. Extreme standards may cloud differences of interest, because respondents view themselves at the opposite end of the distribution, which effectively compresses their ratings. We explored the usefulness of an “average” standard and asked respondents to compare themselves to the US population. The impact of this request should depend on respondents’ age: older respondents (aged 60–70) should assume that they are above the age of the comparison group, resulting (on average) in reports of lower health and more severe symptoms than in the other age groups; conversely, younger respondents (aged 20–30) should assume that they are below the age of the comparison group, resulting in reports of better health and less severe symptoms; and middle-aged respondents (aged 40–50) should fall in between these extremes. It is conceivable, however, that most respondents may think of similar others when they think of the population in general, which would render a population standard unsuitable for general population surveys; the present data will illuminate this issue.
The study follows a 3 (age of respondents: 20–30; 40–50; 60–70 years) × 3 (reporting periods: none specified; last month; right now) × 3 (comparison group: none specified; others your age; US population) factorial design. We used an established internet panel maintained by Polimetrix (Stanford, CA) for this study. The Polimetrix panel includes over 1 million individuals and is widely used in public opinion and market research. As is typical for internet surveys, the entire panel provides a broad cross-section of the US population, but its representativeness is difficult to establish because of potential self-selection problems. The survey firm sent out offers of participation according to a prespecified sampling plan that included the 3 age groups and randomly distributed the 9 versions of the questionnaire (formed by crossing 3 reporting periods and 3 comparison group instructions) within each age group. We contracted to receive 150 completed web-based questionnaires for each of the 9 groups, yielding a total of 1350 participants; in fact, the sampling procedure resulted in a final N of 1471. Information about the response rate is not available. Note, however, that these limitations are of minor importance for a randomized experiment, where the key interest is in differences between conditions. As shown in Table 1, randomization across conditions was successful.
Nine questionnaires were constructed using the following structure as the basic template. Five questions concerning age, gender, zip code, and the current day of week and time of day preceded the questions of interest. For the primary 4 outcome questions, we borrowed the stem of the item from well-established assessments and then altered the comparison and reporting period by adding short phrases to the stem. From the SF-3613 we took the question, “In general, would you say your health is:” followed by 5 response options: Excellent, Very Good, Good, Fair, or Poor. From the same questionnaire we also took, “How much bodily pain do you have?” which had 6 response options: None, Very Mild, Mild, Moderate, Severe, or Very Severe. From the Functional Assessment of Cancer Therapy Fatigue Scale14 we took, “How true is the following statement: I feel fatigued” which had 5 options: Not at All, A Little Bit, Somewhat, Quite a Bit, and Very Much. Finally, from the Diener Life Satisfaction Scale15 we took, “How satisfied are you with your life as a whole?” which was presented with a 10-point numeric rating scale anchored with Completely Dissatisfied (1) and Completely Satisfied (10).
For the comparison group manipulation, the first level was a “general” form of the questions, which does not specify a comparison group (General). The second level used “Compared with Others in the United States,” as a prefix (US Comparison). The third level included the phrase “Compared with Others your Age” (Same Age Comparison). For the Reporting Period manipulation there were also 3 levels. The first level was no specific timeframe (Not Specified), the second condition specified “Over the Last Month” (Last Month), and the third condition specified “Right Now” (Right Now).
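The construction of the 9 questionnaire versions amounts to crossing the two 3-level manipulations. A minimal sketch, using the phrase wordings given above; the exact placement of the added phrases within the fielded items is an assumption for illustration, not the instrument’s verbatim layout:

```python
from itertools import product

# Levels of the two context manipulations (wordings from the text).
# None stands for the unmodified "general" form of an item.
COMPARISONS = [None,
               "Compared with Others in the United States",
               "Compared with Others your Age"]
PERIODS = [None, "Over the Last Month", "Right Now"]

def build_item(stem, comparison, period):
    """Prepend the comparison frame and reporting period (when given)
    to a standard item stem, e.g. the SF-36 bodily-pain question."""
    prefix = ", ".join(p for p in (comparison, period) if p)
    return f"{prefix}, {stem}" if prefix else stem

# Crossing the two 3-level factors yields the 9 questionnaire versions.
versions = list(product(COMPARISONS, PERIODS))
assert len(versions) == 9

print(build_item("how much bodily pain do you have?",
                 COMPARISONS[2], PERIODS[1]))
```

Each of the 9 (comparison, period) pairs was then applied uniformly to the 4 outcome items to form one questionnaire version, randomly assigned within each age group.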
These questions and a brief description of the study (approved by the Stony Brook University Institutional Review Board) were converted by Polimetrix into a web questionnaire using their proprietary systems.
The average age of respondents was 43.8 ± 16.6 years, 52% of the sample was male, and the sample was highly educated with 95% having graduated from high school and 36% having graduated with at least a 2-year college degree (Table 1); the issue of education is considered below. There was very little difference in age, sex, and education among the 9 cells of the design. Regarding the age groupings, 546 (37%) were in the 20–30 year old group, 462 (31%) in the 40–50 year old group, and 463 (32%) in the 60–70 year old group. All statistical analyses were performed with SAS.
The means and standard deviations for the outcome variables are: for Health, 2.45 ± 1.03 (1–5 point scale, lower is better health); Pain, 2.52 ± 1.19 (1–6 point scale, higher is more pain); Fatigue, 1.44 ± 1.08 (0–4 point scale, higher is more fatigue); and, Life Satisfaction, 6.80 ± 1.93 (1–10 point scale, higher is more satisfaction). Hypotheses were tested by examining results from 2-way analyses of variance (ANOVAs) with Comparison Group and Reporting Period as the 2 between-subject factors followed by the addition of the Age Grouping variable and its interactions with the other factors and by post hoc contrasts. Table 2 presents the results of these analyses. Because none of the interaction terms containing Age Grouping were significant, they have been omitted from the model and from Table 2.
We predicted that reported symptom intensity would increase with the length of the reporting period, resulting in higher reports of Pain and Fatigue for the Last Month than for the Right Now group. This hypothesis was tested with single-degree-of-freedom contrasts between the 2 groups embedded in a Timeframe by Comparison Group model and was confirmed: the mean level of Pain was 2.67 in the Last Month condition and 2.43 in the Now condition, a significant difference [F (1, 1468) = 9.68; P < 0.002]. The reported mean level of Fatigue was 1.56 in the Last Month condition and 1.39 in the Now condition, also a significant difference [F (1, 1468) = 5.89; P < 0.02]. Of particular interest was the finding that participants in the condition that did not specify a reporting period responded to the survey items in the same way as participants in the Right Now condition [see Table 2 for results of an overall 2-factor (Reporting Period and Comparison Group) ANOVA and simple effects].
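As an illustration of the single-degree-of-freedom contrast used here: a contrast between two cell means is equivalent to a pooled two-sample t test, whose squared t statistic is the contrast’s F. The sketch below uses synthetic ratings (sized and centered roughly like the study’s cells, not the actual data), and its error degrees of freedom come from the two groups alone rather than the full model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic pain ratings (1-6 scale) for the Last Month and Right Now
# groups; means, SD, and cell sizes are merely plausible, not the study's.
last_month = np.clip(np.round(rng.normal(2.67, 1.19, 490)), 1, 6)
right_now = np.clip(np.round(rng.normal(2.43, 1.19, 490)), 1, 6)

# A single-degree-of-freedom contrast between two cell means reduces to
# a pooled two-sample t test; its F statistic equals t squared.
t, p = stats.ttest_ind(last_month, right_now, equal_var=True)
F = t**2
df_error = len(last_month) + len(right_now) - 2
print(f"F(1, {df_error}) = {F:.2f}, p = {p:.4f}")
```

In the study itself the contrast was embedded in the full factorial model, so the error term (and its 1468 degrees of freedom) came from that model rather than from the two cells alone.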
We further predicted that reporting period would not affect respondents’ answers to questions that inherently specify a longer timeframe, namely, general Life Satisfaction, and overall health status. Consistent with this prediction, no significant effects of reporting period (Last Month vs. Now) were observed for these measures [Satisfaction: F (1, 1468) = 0.02, P = ns; Health: F (1, 1468) = 0.00, P = ns].
We predicted that older respondents would report more, and younger respondents less, symptom intensity when the question instructed them to compare their symptomatology with the US population rather than with others of their own age. In the context of an ANOVA, this hypothesis predicts a 2-way interaction between Age Group and Comparison Group. This analysis was computed by creating a contrast for the Age Group by Comparison Group interaction that excluded the condition in which no comparison group was specified. For the symptoms Pain and Fatigue, the predicted interactions were not observed, F (1, 1460) = 0.01, P = ns; and F (1, 1460) = 0.04, P = ns, respectively (see Table 3 for means).
Although we had a priori hypotheses about the symptom questions, an exploratory hypothesis associated with the comparison instructions concerned the effect of Comparison Group on Health and Life Satisfaction. There was no evidence of interaction effects for these variables either; F (2, 979) = 0.15, P = ns; and F (2, 979) = 0.03, P = ns, respectively. A second exploratory question concerned the performance of the survey items that did not specify a comparison frame. Respondents’ age is not relevant to this question, and therefore 2-way ANOVAs were performed to gauge the effect of Comparison Group (with Reporting Period and Age Category as covariates in the model) and Bonferroni-protected t tests were used to assess group differences. No significant main effects were observed for Fatigue, Life Satisfaction, or Health. There was, however, a significant effect for Pain. Post hoc tests (Table 2) indicated that pain reports were higher when participants compared themselves with Same Age others rather than with the US population or when no comparison was specified; the latter 2 conditions did not differ from one another.
Our respondent sample was highly educated; it is therefore important to examine whether the findings apply to both well-educated and less well-educated samples. To address this question we created a new variable based on educational level, coded 0 for high school education or less (n = 377) and 1 for greater than high school education (n = 1088), to achieve a reasonable N in each group. The main findings reported above were tested as interactions with this new variable; significant interactions would be interpreted as showing a difference in effects by education level.
Regarding the primary timeframe effects, there was no indication of an education interaction with timeframe (Last Month vs. Now) for Pain [F (1, 1447) = 0.04, P = 0.85], Fatigue [F (1, 1447) = 0.07, P = 0.79], or Health [F (1, 1447) = 0.97, P = 0.32], indicating no difference in the effect of timeframe by education level. However, for Satisfaction a significant interaction was found [F (1, 1447) = 4.55, P = 0.04]. Timeframe did not influence satisfaction ratings for the more highly educated group (Last Month 7.2 vs. Now 7.3), whereas higher satisfaction was associated with the longer recall period (Last Month 7.1 vs. Now 6.6) for the less educated group, paralleling the pattern we observed for Pain and Fatigue in the whole sample.
Regarding the Comparison Group results, the original test was the Age Group by Comparison Group interaction, which was consistently nonsignificant, as reported above. A 3-way interaction was computed to determine whether education level moderated these results. None of the interactions with education were significant [Pain, F (1, 1445) = 1.87, P = 0.17; Fatigue, F (1, 1445) = 1.44, P = 0.23; Satisfaction, F (1, 1445) = 2.55, P = 0.11; Health, F (1, 1445) = 2.49, P = 0.12].
We first consider the influence of Reporting Periods. Respondents reported higher levels of pain and fatigue when the symptom questions pertained to Last Month rather than Right Now, consistent with predictions. Last Month pain level was about 10% higher than pain Right Now, and Last Month fatigue was about 12% higher than Right Now fatigue. These differences in Pain and Fatigue reports are associated with effect sizes of Cohen’s d = 0.20 and 0.17, respectively (mean difference divided by the standard deviation). These effect sizes indicate “small” effects, which could nevertheless impact the interpretation and significance of a study. The degree of difference was similar to that observed previously by our research group.10 This observation presumably reflects the operation of multiple processes as reviewed above. Second, some versions of pain and fatigue questions do not specify a reporting period, leaving it up to respondents to determine which timeframe the question refers to. Our findings suggest that respondents resolve this issue by drawing on the most accessible information, namely, their current symptomatology. This is reflected in the lack of difference between the No Timeframe and the Right Now versions of the pain and fatigue questions. With only 1 exception (the Satisfaction question by Reporting Period, where the pattern for the less educated group paralleled the results for Pain and Fatigue), these results were consistent across education level, an important consideration given the high education level of this sample. Finally, survey items that pertain to states/conditions that are commonly assumed to extend in time were not affected by the reporting period specified in the question. This was the case for self-ratings of general health status as well as for reported satisfaction with one’s life as a whole.
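The effect-size and percentage figures above can be reproduced from the reported cell means and the whole-sample standard deviations given in the Results. Using the whole-sample SD as the divisor is our reading of the parenthetical definition in the text; the small gap for fatigue (0.16 vs. the reported 0.17) presumably reflects unrounded source values:

```python
# Cell means (Last Month vs. Right Now) and whole-sample standard
# deviations as reported in the Results section of the paper.
pain_lm, pain_now, pain_sd = 2.67, 2.43, 1.19
fat_lm, fat_now, fat_sd = 1.56, 1.39, 1.08

# Cohen's d as defined in the text: mean difference / standard deviation.
d_pain = (pain_lm - pain_now) / pain_sd  # ~0.20, matching the reported value
d_fat = (fat_lm - fat_now) / fat_sd      # ~0.16 (reported as 0.17)

# Relative increases associated with the Last Month condition.
pct_pain = 100 * (pain_lm - pain_now) / pain_now  # ~10%
pct_fat = 100 * (fat_lm - fat_now) / fat_now      # ~12%

print(f"d(pain) = {d_pain:.2f}, d(fatigue) = {d_fat:.2f}")
print(f"pain +{pct_pain:.0f}%, fatigue +{pct_fat:.0f}%")
```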
These findings have important applied implications. They suggest that questions with an unspecified reporting period may capture current symptomatology. If this is the desired timeframe, it would be preferable to specify it explicitly. These results also indicate that studies using different reporting timeframes will not yield comparable data and, perhaps more importantly, that the same timeframe should be used for all questions within a study to ensure comparability.
Turning to the effect of comparison groups, we failed to observe any influence of asking respondents to compare themselves to others their age versus to the US population. We expected that respondents would spontaneously compare themselves to others their age when no specific comparison group was specified, as observed previously by Ubel et al.11 The largely identical answers in the unspecified and others-your-age conditions are consistent with Ubel et al’s findings. However, we also expected that younger respondents would evaluate their health more favorably, and older respondents less favorably, when asked to compare themselves to the US population. This was not the case, suggesting that all respondents may see themselves as relatively similar to the population, essentially exhibiting a false consensus bias.16 We conclude from this observation that any attempt to standardize the comparison group will need to draw on an extreme standard, as was the case in the Ubel et al11 study, which asked for comparisons with a “healthy 20-year-old” and included “perfect health” as a scale anchor, further increasing the extremity of the positive endpoint.
As Ubel et al11 concluded, using a standard that is shared by all respondents is important to ensure that answers are comparable across groups. When a 40-year-old and an 80-year-old describe their health as “good,” they are probably describing very different health states, but arrive at the same rating based on comparisons with others of their own age. To render their reports comparable, we must ensure that they are using similar standards; in fact, Ubel et al11 only observed age-related declines in self-reported health when such a standard was specified. Future research should explore which standards can serve this purpose without imposing the disadvantages associated with extreme standards.
Finally, there are several limitations of this study that should be noted. The sample was based on a highly educated internet panel and we have no information about nonresponders to the questionnaire. These sampling issues raise a question about the generalizability of the findings. The results would be more robust if cognitive interviews had been conducted after completion of the questionnaire to determine how questions were interpreted by participants, and we recommend this technique for subsequent studies. Such information would have been especially useful for understanding responses to the reference groups comprising the Comparison Group factor and for interpreting the meaning of the questions to demographically distinct groups.