A positive outcome in self-reported behavior could be detected erroneously if an intervention caused over-reporting of the targeted behavior. Data collected from a multi-site randomized trial were examined to determine if adolescent girls who received a physical activity intervention over-reported their activity more than girls who received no intervention.
Activity was measured using accelerometers and self-reports (3-Day Physical Activity Recall, 3DPAR) in cross-sectional samples pre- (6th grade, n=1464) and post-intervention (8th grade, n=3114). Log-transformed accelerometer minutes were regressed on 3DPAR blocks, treatment group, and their interaction, while adjusting for race, BMI, and timing of data collection.
Pre-intervention, the association between measures did not differ between groups, but post-intervention 3DPAR blocks were associated with fewer log-accelerometer minutes of moderate-vigorous physical activity (MVPA) in intervention girls than control girls (p = 0.002). The group difference was primarily in the upper 15% of the 3DPAR distribution, where control girls had >1.7 more accelerometer minutes of MVPA than intervention girls who reported identical activity levels. Group differences in this sub-sample were 8.5–16.2% of the mean activity levels; the intervention was powered to detect a difference of 10%.
Self-report measures should be interpreted with caution when used to evaluate a physical activity intervention.
Self-report measures of physical activity are a popular means of evaluating physical activity interventions because they are more feasible and less expensive than objective measures of activity (e.g. accelerometers). Among the 76 physical activity intervention studies in a recent review by Salmon et al., 51 relied exclusively on self-report measures (1). Some experts have argued, however, that self-report measures are insufficient for intervention studies because of their potential for misclassification (2, 3). Jacobs et al. reviewed ten of the most commonly used activity self-report questionnaires and found that all were poorly correlated with accelerometer-measured activity (4). Sirard et al. reviewed common self-report measures of activity in children and adolescents and found a wide range of correlations with objective measures of activity (r = −0.10 to 0.88) (5).
Observational studies of self-reported physical activity often acknowledge such misclassification as a limitation, but argue that it is non-differential and assume that effect estimates are conservative. The same logic cannot necessarily be applied to intervention studies because differential misclassification could result from social desirability bias induced by the intervention. Social desirability is “the defensive tendency of individuals to respond in a manner that is consistent with social norms or beliefs” (6). Comparisons of self-reported to objective measures of physical activity have indicated that social desirability is associated with over-reporting among females (7, 8). Interventions, in their effort to encourage individuals to change behavior, may promote social desirability and inadvertently increase over-reporting of that behavior.
Such differential misreporting has been noted in behavioral intervention studies aimed at changing diet behaviors. Espeland et al. (9) reported that participants who were randomized to receive a sodium reduction and weight loss intervention under-reported their sodium intake more than participants randomized to other groups. Harnack et al. (10) similarly found that girls randomized to receive an obesity prevention intervention under-reported their fat and total energy intake compared to control girls.
To our knowledge, no study has examined whether differential over-reporting of physical activity occurs within trials that test interventions to promote physical activity. Identification of such systematic error would contribute to our knowledge of the cognitive and social processes that underlie self-reports of physical activity, and could lead to improvements in the design of instruments (11, 12). Furthermore, if the bias were sufficiently large it could lead to incorrectly concluding that an intervention was efficacious. The purpose of this study was to determine if adolescent girls randomized to receive a physical activity intervention over-reported their activity levels at the conclusion of the intervention compared to girls who received no intervention.
The data were obtained as part of the Trial of Activity for Adolescent Girls (TAAG). TAAG was a multi-center, group-randomized controlled trial, initiated by the National Heart, Lung, and Blood Institute (NHLBI), designed to test an intervention to reduce the decline of physical activity levels in middle school girls (13). TAAG was a collaborative trial among six field centers (the Universities of Arizona, Maryland, Minnesota, and South Carolina, San Diego State University, and Tulane University), the coordinating center at the University of North Carolina, Chapel Hill, and the NHLBI. A Data and Safety Monitoring Board provided oversight and performed an advisory role. Six middle schools were recruited from each field center, for a total of thirty-six schools.
After baseline measurements were collected, three schools at each of the six field centers were randomized to intervention and three to the control condition. The intervention was based on a social-ecological framework (14); intervention activities focused on promoting environmental and policy changes that supported physical activity, training physical education teachers to increase active participation of girls in class, teaching behavioral skills associated with activity in other classes, promoting collaborations with community agencies to increase activity outside of school, and providing cues, messages, and incentives within the school to encourage girls to be active. Further details of the intervention strategies are provided by Elder et al. (14). The intervention component that was directed by TAAG staff began in fall 2003 and lasted through spring 2005, at which time the primary outcome data were collected. The intervention was continued by school and community personnel through spring 2006, but data analyses for this paper only included data through 2005.
Data were obtained from independent cross-sectional samples of girls recruited for measurement at baseline (6th grade, spring 2003) and follow-up (8th grade, spring 2005). Prior to the intervention, a sample of 60 girls per school was randomly selected by the coordinating center and invited to participate in data collection. Of the 2,160 selected, 1,721 consented and participated in data collection (80%), and 1,464 provided complete physical activity data from days that were measured by both accelerometer and self-report (68%). Following the intervention, a sample of 90–120 girls per school, depending on the school size, was randomly selected. Of the 4,123 girls selected, 3,504 consented and participated in data collection (85%), and 3,114 provided complete activity data from days that were measured by both accelerometer and self-report (76%). Parental consent and student assent were obtained prior to each measurement period.
The MTI Actigraph® accelerometer (Ft. Walton Beach, FL) was used to measure physical activity objectively. The MTI Actigraph has been the most widely used accelerometer in youth (15–18). Trost et al. reported high correlations between MTI counts and energy expenditure as measured by treadmill ambulation in children age 10–14 (r = 0.87) (18). Janz reported that MTI counts showed moderate to strong correlation with heart rate telemetry in children age 7–15 in uncontrolled, free-living conditions (r = 0.50–0.74) (15).
Activity was measured by the accelerometer in 30-second epochs. The count thresholds for moderate physical activity (MPA) were set at 1,500–2,600 counts·30 sec⁻¹. This lower threshold is equivalent to 4.6 METs and was found to effectively distinguish between slow and brisk walking (19). Activity measured at >2,600 counts·30 sec⁻¹, or 6.5 METs, was classified as vigorous physical activity (VPA). This threshold was found to effectively distinguish between brisk walking and running (19).
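The threshold rule above can be sketched in a few lines of Python. This is an illustration only, not the TAAG processing code: the function names and the example counts are hypothetical, and treating a count of exactly 2,600 as moderate is an assumption based on the stated range of 1,500–2,600 for MPA and >2,600 for VPA.

```python
# Thresholds quoted in the text, in counts per 30-second epoch.
MPA_LOWER = 1500      # lower bound for moderate activity (about 4.6 METs)
VPA_LOWER = 2600      # counts above this are vigorous (about 6.5 METs)
EPOCH_MINUTES = 0.5   # each epoch lasts 30 seconds


def classify_epoch(counts):
    """Label one 30-second epoch as 'VPA', 'MPA', or 'below'."""
    if counts > VPA_LOWER:
        return "VPA"
    if counts >= MPA_LOWER:
        return "MPA"
    return "below"


def daily_minutes(epoch_counts):
    """Sum minutes of MVPA (moderate plus vigorous) and VPA for one day."""
    labels = [classify_epoch(c) for c in epoch_counts]
    vpa = labels.count("VPA") * EPOCH_MINUTES
    mpa = labels.count("MPA") * EPOCH_MINUTES
    return {"MVPA": mpa + vpa, "VPA": vpa}


# Hypothetical epoch counts, chosen to exercise each boundary.
example = [200, 1500, 1800, 2600, 2601, 3000, 900, 2599]
result = daily_minutes(example)
print(result)
```

Because each epoch is half a minute, minutes of MVPA are simply half the number of epochs at or above the lower threshold.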
Participants reported their engagement in specific types of physical activity using the Three-Day Physical Activity Recall (3DPAR). Participants recalled the activities performed over the previous three days, choosing from a list of 70 activities (16, 20). Each day is divided into eighteen 30-minute segments (“blocks”) and the participant reports the “main activity” performed during each block. Participants could report the activity even if they were not engaged for the full 30 minutes. Participants also report the perceived intensity of the activities, choosing from “light,” “moderate,” “hard,” and “very hard.” An algorithm that combined perceived intensity with MET values for each activity (21) was used to identify blocks that were “moderate” or “vigorous.” The 3DPAR is adapted from the Previous Day Physical Activity Recall, which had the highest correlation with objective measures of activity among youth (r = 0.88) of any self-report measure reviewed by Sirard et al. (5). Pate et al. reported a significant, though lower, correlation between the 3DPAR and accelerometer in adolescent girls (r = 0.28–0.46) (20).
Each girl was instructed to wear an accelerometer during waking hours for seven consecutive days. They were instructed on its use and care, and told to remove it only for sleep, for activities in which it could get wet, or when competitive sports required its removal. Accelerometers were initialized to begin collecting data at 5:00 AM on the day after they were distributed, providing six complete days of data. Upon completion of the seven-day period, monitors were returned and participants completed the 3DPAR under the supervision of a trained research assistant who was not involved in delivery of the intervention. All assessments occurred at the student’s school, and all procedures were approved by institutional review boards.
Regression analyses based on the general linear mixed model were conducted to test whether the association between activity measures differed between treatment groups. To account for group randomization, school and site were included in all models as random effects. All other variables were treated as fixed effects. All analyses were conducted with SAS version 9.1 (SAS Institute, Cary, NC).
For each girl, only data from days that were measured by both the 3DPAR and accelerometer were included. We computed each girl’s mean daily moderate to vigorous physical activity (MVPA) and vigorous physical activity (VPA), as measured by the 3DPAR and accelerometer separately. Accelerometer minutes were regressed on 3DPAR blocks, treatment group, body mass index (BMI), race/ethnicity, the % of measured days that were on the weekend (Saturday or Sunday), and an interaction term between 3DPAR blocks and treatment group. All variables were modeled as continuous variables except race/ethnicity, which was categorized as non-Hispanic White, non-Hispanic Black, Hispanic, or non-Hispanic Other. MVPA and VPA were analyzed separately using identical models. The residuals from these models were found to be highly skewed, and thus a log transformation was performed for accelerometer minutes after adding 0.1 to each girl’s number of minutes. Analyses were repeated with log-accelerometer minutes as the dependent variable.
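For readers who prefer notation, the log-transformed model described above can be written roughly as follows. The symbols are ours rather than the authors': A is accelerometer minutes, R is 3DPAR blocks, T is treatment group, W is the percentage of measured days falling on a weekend, and i, j, and k index girls, schools, and sites. Nesting schools within sites and assuming normally distributed random effects and residuals are assumptions of this sketch, not details stated in the text.

```latex
\log\left(A_{ijk} + 0.1\right)
  = \beta_0
  + \beta_1 R_{ijk}
  + \beta_2 T_{jk}
  + \beta_3 \left(R_{ijk} \times T_{jk}\right)
  + \beta_4 \mathrm{BMI}_{ijk}
  + \boldsymbol{\beta}_5^{\top} \mathbf{race}_{ijk}
  + \beta_6 W_{ijk}
  + u_k + v_{jk} + \varepsilon_{ijk}
```

Here u_k and v_{jk} are the random effects for site and school. The coefficient of interest is \beta_3: a value different from zero indicates that the slope relating self-reported blocks to log-accelerometer minutes differs between treatment groups.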
Characteristics of intervention and control girls at both time points are presented in Table 1. Ages ranged from 10–14 at baseline, with 94.3% of girls in the baseline sample age 11 or 12. Similarly, age ranged from 12–16 at follow-up, with 95.6% of girls age 13 or 14. Distributions of age were similar between treatment groups. The samples were diverse, with 45.0% non-Hispanic White, 21.5% non-Hispanic Black, 21.7% Hispanic, and 12.0% “non-Hispanic Other” at baseline. Racial distributions were similar at follow-up.
Mean activity levels, as measured by the 3DPAR and accelerometer, are presented in Table 2. The association between self-reported and objectively-measured activity was low, as the baseline correlation coefficient between measures was 0.17 for MVPA and 0.14 for VPA in the total sample. The association between measures improved slightly over time overall, but showed different temporal patterns across treatment groups. Correlation coefficients doubled from baseline to follow-up among control girls, but changed little among intervention girls.
Table 3 presents results of the multilevel regression analyses that tested for differential association between 3DPAR and accelerometer. In all analyses, the association between 3DPAR and accelerometer was highly significant (p < 0.0001). Before the intervention, one 3DPAR block of MVPA was associated with 0.041 log-accelerometer minutes of MVPA in the control group and 0.063 log-accelerometer minutes of MVPA in the intervention group. The difference between groups was not statistically significant (p = 0.10). Associations were slightly higher for VPA – one 3DPAR block of VPA was associated with 0.073 and 0.094 log-accelerometer minutes in the control and intervention groups, respectively, but again the group difference was not statistically significant (p = 0.41).
After the intervention, the trend across groups reversed as the association between measures became significantly lower in the intervention group compared to the control group for both MVPA and VPA. One 3DPAR block of MVPA was associated with 0.078 and 0.049 log-accelerometer minutes in the control and intervention groups, respectively. One 3DPAR block of VPA was associated with 0.129 and 0.084 log-accelerometer minutes in the control and intervention groups, respectively.
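One way to read these slopes is as multiplicative effects. The short Python sketch below converts each follow-up slope into an approximate percentage change per additional 3DPAR block. It assumes the transformation used natural logarithms, which the text does not state explicitly, and the change applies to accelerometer minutes plus the 0.1 offset rather than to raw minutes. The function name is hypothetical.

```python
import math

# Follow-up slopes quoted in the text: log-accelerometer minutes per 3DPAR block.
followup_slopes = {
    ("MVPA", "control"): 0.078,
    ("MVPA", "intervention"): 0.049,
    ("VPA", "control"): 0.129,
    ("VPA", "intervention"): 0.084,
}


def percent_change_per_block(slope):
    """Percentage change in (minutes + 0.1) per extra block, assuming natural logs."""
    return (math.exp(slope) - 1.0) * 100.0


for (outcome, group), slope in followup_slopes.items():
    pct = percent_change_per_block(slope)
    print(f"{outcome:4s} {group:12s} {pct:.1f}% per block")
```

Under that assumption, each additional self-reported block of MVPA corresponds to roughly 8% more accelerometer minutes in control girls and roughly 5% more in intervention girls.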
The log-transformation substantially improved the model fit to provide a more valid test of a group difference, but using log-minutes as the outcome complicates the interpretation. It is difficult to evaluate whether such a difference in log-minutes would be sufficient to obscure any treatment effect. To generate a more meaningful interpretation, we exponentiated the distribution of log-minutes predicted by the model to estimate the group difference in accelerometer minutes predicted by the 3DPAR. Figures 1 and 2 illustrate the results of this analysis. Throughout the 3DPAR distribution, in increments of 5 percentile points, we calculated the predicted log-accelerometer minutes for white, non-Hispanic girls at the median BMI and median % of weekend days in each treatment group. At each point in the distribution, we used the predicted value and the standard deviation of the residuals from the model to generate a normal distribution of log-accelerometer minutes. We then exponentiated every observation in this distribution to generate a distribution of accelerometer minutes, and used the mean of the latter distribution as the estimate of accelerometer minutes. Figures 1 and 2 illustrate the estimates for each treatment group to demonstrate the absolute group difference in the association between measures.
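The back-transformation step can be sketched in Python as a simulation. This is one plausible reading of the procedure, not the authors' SAS code: the predicted log value (3.0) and residual standard deviation (0.6) below are made-up inputs, natural logarithms are assumed, and the 0.1 offset added before the log transformation is ignored. The closed-form mean of a lognormal distribution is included only as a cross-check on the simulation.

```python
import math
import random


def mean_minutes_by_simulation(pred_log, resid_sd, n_draws=200_000, seed=0):
    """Draw normal log-minutes, exponentiate each draw, and average the results."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += math.exp(rng.gauss(pred_log, resid_sd))
    return total / n_draws


def mean_minutes_closed_form(pred_log, resid_sd):
    """Mean of a lognormal distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(pred_log + resid_sd ** 2 / 2.0)


# Hypothetical inputs: a predicted log-minutes value and a residual SD.
pred_log, resid_sd = 3.0, 0.6
simulated = mean_minutes_by_simulation(pred_log, resid_sd)
analytic = mean_minutes_closed_form(pred_log, resid_sd)
naive = math.exp(pred_log)  # exponentiating the prediction alone

print(f"simulated mean: {simulated:.2f}")
print(f"analytic mean:  {analytic:.2f}")
print(f"naive exp(mu):  {naive:.2f}")
```

The comparison with the naive value shows why the residual spread matters: exponentiating the predicted log value alone estimates the median of the minutes distribution, which is lower than its mean.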
In Figure 1, the curves began to diverge at approximately the 60th percentile of the 3DPAR, indicating that intervention girls who reported 1.33 or more blocks of MVPA/day had fewer accelerometer minutes than control girls who reported the same amount. The gap between treatment groups increased steadily up to the 85th percentile (3 self-reported blocks/day), beyond which it increased rapidly. Among girls at the 85th percentile, intervention girls had 1.77 fewer accelerometer minutes/day than control girls (22.44 vs. 24.21 minutes), or 8.5% of the mean activity level in the total sample. At the 90th and 95th percentiles (4 and 5 blocks/day, respectively), the group difference increased to 2.44 and 3.39 minutes, respectively (11.7 and 16.2% of the mean activity level, respectively). The patterns were similar for VPA (Figure 2), as intervention girls who reported 0.67 or more VPA blocks per day (60th percentile) had fewer accelerometer minutes than control girls who reported the same amount. The group difference in accelerometer minutes of VPA at the 85th, 90th, and 95th percentiles (2, 2.67, and 3.67 self-reported blocks/day, respectively) increased from 0.38 to 0.54 to 0.92 minutes/day, respectively. These differences are equivalent to 7.7, 11.0, and 18.7% of the mean activity level, respectively, in the total sample.
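The arithmetic behind these percentages can be checked with a short Python sketch. The group differences and percentages are taken from the text; the mean activity levels are not quoted in this passage, so the values computed here are back-calculated from those figures and should be read as implied rather than reported. The function name is hypothetical.

```python
# (group difference in minutes/day, % of mean activity level) at the
# 85th, 90th, and 95th percentiles of the 3DPAR, as quoted in the text.
mvpa = [(1.77, 8.5), (2.44, 11.7), (3.39, 16.2)]
vpa = [(0.38, 7.7), (0.54, 11.0), (0.92, 18.7)]


def implied_mean(difference, percent_of_mean):
    """Back-calculate the sample mean implied by a difference and its percentage."""
    return difference / (percent_of_mean / 100.0)


# The 85th-percentile MVPA difference follows from the two predicted values.
difference_85th = round(24.21 - 22.44, 2)

mvpa_means = [implied_mean(d, p) for d, p in mvpa]
vpa_means = [implied_mean(d, p) for d, p in vpa]

print("MVPA difference at 85th percentile:", difference_85th)
print("implied mean MVPA minutes/day:", [round(m, 1) for m in mvpa_means])
print("implied mean VPA minutes/day: ", [round(m, 1) for m in vpa_means])
```

The three percentiles imply nearly the same mean within each outcome, which indicates that the reported differences and percentages are internally consistent.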
Secondary analyses were conducted to determine if the two activity measures would lead to different conclusions about the effect of TAAG. Using follow-up data, the measures were individually regressed on treatment group, BMI, race/ethnicity, and % of weekend days. These models tested for a group difference in physical activity following the intervention for each measure. Table 4 presents the results of these analyses. As reported elsewhere (22), accelerometer results suggested that control girls were slightly more active, but the group difference was non-significant. 3DPAR results suggested that intervention girls had slightly more VPA while control girls had slightly more MVPA, but again the group differences were non-significant.
In this study of adolescent girls, a physical activity intervention appeared to cause differential measurement error between the treatment groups. Despite using a self-report instrument that had been previously validated, girls in the intervention group over-reported their activity levels significantly more than girls in the control group.
One of the purposes of the TAAG intervention was to create an environment where physical activity among girls is more socially acceptable (23). Policy, organizational, and environmental changes were designed to provide opportunities for girls to be active, while enhancing social support through encouragement from school staff, community organizations, and peers (14). Promoting activity as a social norm may have the unfortunate byproduct of promoting over-reporting, given that females who exhibit social desirability traits are prone to over-report their activity (7, 8). Other studies have found a similar intervention-induced bias in self-reported diet (9, 10), and this type of bias may occur in other types of behavioral interventions.
When model results were used to predict the number of accelerometer minutes throughout the 3DPAR distribution, the predictions suggested that differential over-reporting was limited to girls above the 60th percentile in the 3DPAR distribution. The significant group difference thus appeared to be due to a minority of intervention girls over-reporting by a relatively large amount rather than the entire sample over-reporting by a small amount. Above the 85th percentile, the group difference was approximately 8–19% of the mean activity level in the total sample; TAAG was powered to detect a 10% difference in activity between treatment groups (24). Overall, however, the magnitude of over-reporting in the intervention group was not large enough to change the conclusion about the intervention’s effect, which was non-significant according to both self-report and accelerometer.
Strengths of this study included the use of both self-report and objective measures that are widely used and have been validated in other samples. This study also utilized female- and age-specific cut-points for different activity intensity levels, based on data from the study population rather than relying on cut-points in the literature that are largely based on mixed-sex samples or a wider age range (25). Finally, our study sample was from a multi-site intervention and was diverse in terms of both race/ethnicity and socioeconomic status.
A key limitation of the study was that the 3DPAR required participants to identify only the “main activity” of each 30-minute block. A self-reported block could represent any portion of 30 minutes, and thus we could not compare the absolute quantity of activity, as assessed by self-report and accelerometry, on a uniform scale. We can, however, conclude that when comparing girls who reported the same level of activity, intervention girls had less objectively measured activity than control girls after the intervention. The correlation between measures may have been limited by the narrow range of values on the 3DPAR, but this is unlikely to induce a group difference unless activity levels differed between groups. The baseline correlation may have also been limited because the accelerometer thresholds were based on a sample of 13–14 year-old girls, and thus may have been less valid in a younger sample. Finally, the accelerometer cannot capture certain types of activity (e.g. swimming, cycling) and relies on participant compliance. We examined self-reported swimming and cycling participation in 8th grade by treatment group to determine if group differences in these activities could account for our results, and found no such differences in participation.
The results of this study further question the use of self-report activity measures that have been doubted by experts (2–5). Future intervention studies should not dismiss measurement error from self-report instruments as non-differential and assume that any effect estimates are conservative. Evidence from self-report instruments indicating that an intervention produced behavior change should be interpreted with caution as it is possible that the intervention promoted over-reporting of the targeted behavior. At a minimum, investigators should acknowledge this potential bias as a weakness and be careful to differentiate between objectively measured and self-reported results (3).
This study was supported by funds from the NHLBI grant #s HL66845, HL66852, HL66853, HL66855, HL66856, HL66857, HL66858.
The authors thank the faculties and staffs of the 36 schools that participated in the trial, as well as the investigators and support staff at the various study sites. The Project Office at the National Heart, Lung, and Blood Institute was an equal partner in the design and implementation of the trial and in the analysis and interpretation of the data. They also approved the final version of the manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.