The primary study aim was to evaluate associations of estimated weekly minutes of moderate-to-vigorous intensity exercise from self-reports of the telephone-administered 7-Day Physical Activity Recall (PAR) with data captured by the RT3 triaxial accelerometer.
This investigation was undertaken as part of the FRESH START study, a randomized clinical trial that tested an iteratively-tailored diet and exercise mailed print intervention among newly diagnosed breast and prostate cancer survivors. A convenience sample of 139 medically-eligible subjects living within a 60-mile radius of the study center provided both 7-Day PAR and accelerometer data at enrollment. Ultimately n=115 substudy subjects were found eligible for the FRESH START study and randomized to one of two study treatment arms. Follow-up assessments at Year 1 (n=103) and Year 2 (n=99) provided both the 7-Day PAR and accelerometer data.
There was moderate agreement between the 7-Day PAR and the accelerometer with longitudinal serial correlation coefficients of .54 (baseline), .24 (Year 1) and .53 (Year 2), all P-values < .01, though the accelerometer estimates for weekly time in moderate-to-vigorous physical activity were much higher than those of the 7-Day PAR at all time points. The two methods were poorly correlated in assessing sensitivity to change from baseline to Year 1 (rho=.11, P=.30). Using mixed models repeated measures analysis, both methods exhibited similar non-significant treatment arm X time interaction P-values (7-Day PAR=.22, accelerometer=.23).
The correlations for three serial time points were in agreement with findings of other studies that compared self-reported time in exercise with physical activity captured by accelerometry. However, these methods capture somewhat different dimensions of physical activity and provide differing estimates of change over time.
The United States Centers for Disease Control and Prevention (CDC) proposes that every demographic and social group in America can benefit from participating in moderate intensity physical activity (6). Recent studies have suggested that individuals who engage in a consistent regimen of moderate intensity activities can realize substantial improvements in weight control and lipoprotein profiles (27). Further, the Health and Human Services 2005 Dietary Guidelines Advisory Committee recommends that Americans engage in at least 30 minutes of moderate-intensity physical activity, above usual activity, at work or home on most days of the week (10). Updated joint recommendations from the American College of Sports Medicine and the American Heart Association suggest that adults should exceed the previous guidelines in order to improve fitness and reduce risk for chronic disease and prevent unhealthy weight gain (18). In the fall of 2008, it is anticipated the first National Guidelines for Physical Activity for Americans will be adopted and that these will reflect these aforementioned recommendations for regular physical activity of at least moderate intensity for a minimum of 150 accumulated minutes per week above normal activities of daily living. Clearly, improving the adherence to these guidelines is a priority, as are methods to monitor adherence within populations of interest.
To capture the dimensions of frequency, duration and intensity of exercise, researchers often employ self-report measurements and accelerometry-based technologies. Self-report of exercise is utilized frequently because it is relatively inexpensive to capture and can be administered via multiple modalities, namely face-to-face or telephone interviews and respondent-completed mail or Internet-administered instruments. Among the self-report instruments are those that have been developed for specific populations, such as older adults (for example, the Community Healthy Activities Model Program for Seniors, or CHAMPS [28]) and children (31). In addition, there are self-report instruments such as the 7-Day Physical Activity Recall (7-Day PAR) (25) that have been used in various populations and widely validated. The primary limitation of any self-report measure is the inaccuracy and bias inherent in subject recall (12). Subgroups and individuals may differ in their activity patterns and cognitive abilities, which adds varying amounts of measurement error to the self-reported estimate of exercise frequency, duration and intensity.
Accelerometry-based technology is a frequently-used methodology to objectively measure physical activity. The use of accelerometry-based physical activity monitors in research studies has been increasing in recent years, and while the technology has improved in terms of providing objective estimates of exercise frequency, duration and intensity, questions remain regarding the reliability, accuracy and ease of use of accelerometer technology in a research environment and in field studies (14, 23).
The primary aim of the present study was to evaluate the association between estimated weekly minutes of exercise from a self-reported instrument (7-Day PAR) and an accelerometer (RT3 triaxial, Stayhealthy, Inc, Monrovia, CA). This study was undertaken within a convenience sub-sample of subjects from the FRESH START study, a randomized clinical trial that tested an iteratively tailored diet and exercise print-based intervention delivered by mail (9). The goals of the FRESH START study were to increase fruit and vegetable intake, reduce total and saturated dietary fat intake and increase time spent in dedicated moderate or vigorous intensity exercise among cancer survivors. Secondary aims of the current study were to evaluate the sensitivity to change from baseline to the 1-year post-intervention period assessment for both the PAR and RT3 and to evaluate the effect of the FRESH START treatment arm on the 1 year measures of exercise in both the PAR and RT3.
The methods and main outcomes for this trial have been reported elsewhere (8, 9). In brief, from July 2002 through August 2004, breast and prostate cancer survivors who were diagnosed with early stage (in situ, localized, or regional) cancer within the previous nine months were recruited to participate in FRESH START. Institutional Review Board approval was obtained from all sites involved in this study, and subjects were recruited from 39 states within the United States and two provinces in Canada. Survivors (n=762) who expressed interest in response to a letter of invitation or to study advertisements provided informed consent and completed a brief mailed screening instrument. Subjects indicating medical or physical conditions precluding unsupervised exercise (i.e., severe orthopedic conditions or imminent [within 6 months] hip or knee replacement, paralysis, end-stage renal disease, dementia, unstable angina, or recent heart attack, congestive heart failure or pulmonary conditions that require oxygen or hospitalization within 6 months) or a high fruit and vegetable diet (i.e., renal insufficiency or pharmacologic warfarin use) were excluded. Subjects (n=154) who lived within a 60-mile radius of Duke University Medical Center (DUMC) and agreed to participate in the substudy reported to the General Clinical Research Center and were provided an accelerometer to wear for seven days. The accelerometer was programmed for seven days, and the telephone survey that administered the 7-Day PAR was scheduled so that both instruments captured data for the same time period. Subjects who reported practicing two or more of the following study target behaviors were deemed ineligible: 1) eating 5 or more fruits and vegetables daily; 2) consuming a diet with <30% of total energy from fat daily; and/or 3) exercising (moderate to vigorous intensity) for 150 minutes or more weekly.
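The behavioral eligibility rule described above can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and chosen only for illustration; the thresholds are those stated in the text, and the sketch ignores the separate medical exclusions.

```python
def count_target_behaviors(fv_servings_per_day, pct_energy_from_fat,
                           exercise_min_per_week):
    """Count how many of the three study target behaviors are already practiced.

    Parameter names are hypothetical; thresholds follow the text above.
    """
    return sum([
        fv_servings_per_day >= 5,      # 5 or more fruits and vegetables daily
        pct_energy_from_fat < 30,      # <30% of total energy from fat
        exercise_min_per_week >= 150,  # 150+ min/week moderate-to-vigorous exercise
    ])


def is_behaviorally_eligible(fv_servings_per_day, pct_energy_from_fat,
                             exercise_min_per_week):
    """Subjects practicing two or more target behaviors were deemed ineligible."""
    practiced = count_target_behaviors(
        fv_servings_per_day, pct_energy_from_fat, exercise_min_per_week)
    return practiced < 2
```

Under this reading, a subject who already exercises 150 minutes or more per week but meets neither dietary goal would remain eligible, because only one target behavior is practiced.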
Due to RT3 unit malfunctions or misapplication of equipment by subjects, pre-randomization data from 15 of the 154 subjects could not be included in the analysis, leaving 139 eligible for analysis at the first time point (“baseline”). Although 24 of the 139 subjects were subsequently screened out of the larger FRESH START trial due to ineligibility, their data were retained for this cross-sectional substudy analysis. As a result of these ineligible subjects, data for the longitudinal substudy analysis were potentially available for only 115 subjects.
Eligible participants were randomly allocated to either the experimental (tailored materials) or the attention control (standardized materials) arm. Both study arms consisted of a ten-month protocol featuring a set of seven periodic mailings aimed at improving diet and exercise behaviors. Participants in the intervention arm were mailed an initial workbook followed by seven tailored newsletters at six-week intervals. As detailed previously (9), the tailoring of these mailings was informed by demographic characteristics, cancer coping style (30), stage of readiness, barriers to lifestyle changes, and self-reported status in achieving the threshold level for the three study target behaviors, as described above. Each participant in the experimental group received two modules that pertained to lifestyle behaviors that were not practiced at goal level. Attention control participants received a FRESH START study workbook that included the “Facing Forward” booklet (National Cancer Institute) and were subsequently mailed non-tailored and generally available education materials on the benefits of eating a healthy diet and exercising (9). Between the Year 1 and Year 2 assessments, there was no contact with the subjects.
At the baseline, Year 1, and Year 2 assessments, participants in both arms completed a two-part survey via telephone, with each part taking 45–55 minutes. The array of measures has been detailed previously (8, 9). Of interest to the current study is the self-reported measure of exercise.
The 7-Day PAR is a self-report recall instrument to assess physical activity (5, 25). The instrument was modified to be administered via a telephone interview in which the interviewer recorded responses in a computer database. Respondents were asked to recall sessions of moderate, hard, or very hard exercise (at least 5 metabolic equivalents [METs; kcal/kg/hr]) that were practiced for at least 10 consecutive minutes during each of the previous 7 days. In addition, they were asked to recall how much they slept each night. The remaining time for the week was presumed to have been spent in light activities. For the FRESH START study, adherence to national exercise thresholds was expressed as the total number of minutes of moderate, hard and very hard exercise per week.
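As an illustration of how the weekly PAR total described above could be tallied, the following Python sketch sums the qualifying sessions. The record format (a list of intensity and duration pairs) and all names are assumptions made for this example; they do not describe the study database.

```python
# Intensity categories that count toward the weekly total (from the text above).
QUALIFYING_INTENSITIES = {"moderate", "hard", "very hard"}


def par_weekly_minutes(sessions, min_bout_minutes=10):
    """Sum minutes of qualifying exercise sessions recalled for the past 7 days.

    `sessions` is a hypothetical list of (intensity, minutes) tuples.
    A session counts only if it is of moderate or higher intensity and
    lasted at least `min_bout_minutes` consecutive minutes.
    """
    total = 0
    for intensity, minutes in sessions:
        if intensity in QUALIFYING_INTENSITIES and minutes >= min_bout_minutes:
            total += minutes
    return total
```

For example, a 30-minute moderate session and a 20-minute very hard session would contribute 50 minutes, whereas an 8-minute hard session or any light activity would contribute nothing.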
Subjects who agreed to participate in the substudy were asked to wear an RT3 Tri-axial Research Tracker accelerometer (Stayhealthy Inc., Monrovia, CA) for an entire seven-day period, with the exception of sleep time or engaging in activities that involved water (e.g., bathing, swimming). This seven-day period was scheduled to precisely overlap with the same seven days of activity that was captured via the 7-Day PAR. Participants in the substudy came to the Duke study site and were instructed in the use and care of the RT3 and then also provided with a postage-paid mailer in order to return the RT3 to the study office. At Year 1 and Year 2 follow-up, these procedures were repeated and subjects were provided with the same RT3 instrument (whenever possible). Upon mailed receipt, the RT3 data were downloaded to a study computer using a docking station provided with the accelerometer.
Data downloaded from the RT3 devices were stored as comma-delimited files and converted to Microsoft Excel files. Triaxial (in three directions) activity was captured as “counts” by the RT3 devices and recorded on a per-minute basis. For each day there were 1440 records (60 minutes per hour for 24 hours), each with a count value that reflected the amount of activity captured by the RT3 for that minute. A SAS (SAS Institute Inc., Cary, NC) program was written specifically to convert RT3 counts to an array of measures that estimate energy expenditure on a daily summed basis. If analysis of the count data showed that the RT3 was either not worn by the subject or “not active” for more than 720 minutes in a day (i.e., half or more of the possible 1440 minutes), counted in blocks of 20 or more consecutive minutes, then the RT3 data for that day were considered invalid and excluded from the analysis. In general, monitors functioned properly, but on occasion the count data showed extremely high activity recordings, due either to monitor malfunction or to non-human movement from vibrations in the accelerometer's environment (e.g., riding in a car, being placed on top of a clothes dryer); these data were considered invalid. As a result, at each time point (Baseline and Years 1 and 2) subjects could have a minimum of zero and a maximum of seven valid days of RT3 data available for processing. Nearly 85% of the subjects provided four or more valid days of RT3 data. For these data, the final analytic outcome variable, the total number of weekly exercise minutes of moderate or higher intensity, was calculated as: [(sum of moderate, hard and very hard minutes for all valid days)/(number of valid days)] multiplied by seven.
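The day-validity rule and the weekly extrapolation described above can be sketched in Python as follows. This is one possible reading, not the SAS program used in the study: it assumes that a "not worn or not active" minute is a minute with a zero count, that only runs of 20 or more consecutive such minutes are tallied, and that a day is invalid when the tally exceeds 720 minutes. The function names are hypothetical, and the screening of implausibly high counts is omitted because no threshold is given in the text.

```python
from itertools import groupby


def inactive_minutes_in_long_runs(counts, min_run=20):
    """Total minutes falling in runs of >= min_run consecutive zero counts."""
    total = 0
    for is_zero, run in groupby(counts, key=lambda c: c == 0):
        length = sum(1 for _ in run)
        if is_zero and length >= min_run:
            total += length
    return total


def is_valid_day(counts, max_inactive=720, min_run=20):
    """A day is valid unless long inactive runs exceed max_inactive minutes.

    Assumption: "more than 720 minutes" is read as a strict inequality.
    """
    return inactive_minutes_in_long_runs(counts, min_run) <= max_inactive


def weekly_mvpa_minutes(valid_day_minutes):
    """Extrapolate to one week: (sum over valid days / number of valid days) * 7.

    `valid_day_minutes` holds the moderate-or-higher minutes for each valid day.
    Returns None when no valid days are available.
    """
    if not valid_day_minutes:
        return None
    return sum(valid_day_minutes) / len(valid_day_minutes) * 7
```

With four valid days of 30, 40, 20 and 50 minutes, the daily mean is 35 minutes and the weekly estimate is 245 minutes. Scaling by the number of valid days keeps subjects with fewer than seven valid days on the same weekly scale as those with complete data.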
For the subset of subjects with valid data at baseline, Year 1 and Year 2, summary estimates of the summed number of minutes of moderate, hard and very hard exercise from both the self-reported 7-Day PAR and the RT3 accelerometer were used. The three analytic objectives were addressed separately. The primary objective, to estimate the association between the RT3 activity monitor and self-reported 7-Day PAR estimates of total weekly minutes of moderate, high and very high exercise activities, was explored using correlation coefficients at each of the three time points. In evaluating this objective, we explored both the parametric Pearson and the non-parametric (rank) Spearman correlations and determined that there were no substantive differences in inference between them. We present the Pearson correlations, along with the means and standard deviations of the minutes of exercise, for each time point. To evaluate sensitivity to change, defined as the difference from baseline (pre-intervention) to Year 1 follow-up (post-intervention) for both the RT3 and the PAR, difference scores were calculated as the Year 1 value minus the Baseline value for each subject. Correlation coefficients were then used to evaluate the extent of association between the difference scores. The final objective was to determine whether the two methods of measuring minutes of exercise differed in the assessment of the effect of each study arm over the three time points. This objective was evaluated by comparing the arm by time interaction term from a repeated measures mixed model for each method. All analyses were conducted in SAS, version 9.1 (Cary, NC).
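The correlation and difference-score calculations described above can be sketched in plain Python. This is an illustrative sketch only and is not the SAS code used for the analysis; the function names are hypothetical, and ties in the Spearman ranking are handled by average ranks, which is a common convention assumed here.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def change_scores(baseline, year1):
    """Per-subject difference scores: Year 1 value minus Baseline value."""
    return [y1 - b for b, y1 in zip(baseline, year1)]
```

To assess sensitivity to change, one would compute `change_scores` separately for the PAR and the RT3 and then pass the two resulting lists to `pearson` (or `spearman`), using only subjects with usable data from both methods at both time points.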
As stated previously, this analysis utilizes data from the 139 subjects with valid data at pre-randomization, then post-randomization baseline (n=115) and Year 1 (n=103) and Year 2 (n=99) time points. Table 1 displays selected characteristics of the pre-randomization subsample presented for comparison alongside those from the full set of 543 randomized participants in the FRESH START study. Since the subsample was a convenience sample of subjects who resided within 60 miles of Durham, NC, it cannot be assumed that it would be as representative as a random sample of the full set of 543 randomized subjects from the FRESH START study. It is important to note that 115 subjects of the 139 pre-randomization subsample were also in the n=543 set of randomized subjects, so a formal statistical test for differences in characteristics between the groups cannot be conducted. Upon inspection, the general profile of the subsample is similar to that of the full set with respect to age, gender, marital status, education, income and number of co-morbidities. However, when compared to all randomized FRESH START participants, the subsample comprised a higher proportion of blacks (18.7% vs 13.3%). In addition, the subsample was noticeably different from the full set in allocation to treatment group, with just 39.1% compared to 49.9%. Finally, the subsample also reported more minutes per week of moderate or greater exercise than the full set, reporting an average of 59.2 minutes per week compared to 49.0 minutes. Of the 24 subsample subjects subsequently excluded from the FRESH START study, 16 reported exercising at least 150 minutes/week. Overall, more than 16% of the subsample reached the national guideline threshold of 150 minutes per week, compared with 10.7% of the full set.
The primary objective of the current study was to estimate the association between the RT3 activity monitor and self-reported 7-Day PAR estimates of total weekly minutes of moderate, high and very high exercise. This was conducted as a series of three cross-sectional correlations, without respect to the arm to which the subjects were allocated. Of the 139 subjects assessed at baseline, 103 provided usable RT3 data for the Year 1 measurement and 99 at Year 2. Table 2 displays the mean values for total exercise minutes from moderate and vigorous (high and very high) activities from both the self-reported 7-Day PAR and the RT3 accelerometers, and the Pearson correlation coefficients for each of the three time points. At all three time points, the two measures of minutes of exercise were positively and significantly correlated (all P-values < .01), with stronger associations at Baseline (rho = .54) and Year 2 (rho = .53) than at Year 1 (rho = .24).
A secondary objective was to evaluate the association between the 7-Day PAR and the RT3 in sensitivity to change, defined as the difference from baseline to Year 1 follow-up. On average, for the group of subjects with usable data for both the 7-Day PAR and the RT3 at Baseline and Year 1, the 7-Day PAR estimated an increase of 44.9 minutes per week (SD=104.4), whereas the RT3 measured a decrease of 1.1 minutes per week (SD=29.7) (data not shown). The Pearson correlation coefficient for the association of these two change scores was .11, with a p-value of 0.30.
The final analytic objective of this study was to explore the effect of the intervention on minutes of exercise as measured by the self-reported 7-Day PAR and by RT3 accelerometers at the three time points in the subsample of participants (Baseline, Year 1, and Year 2). The results are summarized in Figure 1. The p-values for the separate arm by time repeated measures mixed models were p=0.22 for the 7-Day PAR estimates and p=0.23 for the RT3 estimates, indicating that despite the RT3 estimating much higher levels of moderate and vigorous physical activity than the self-report from the 7-Day PAR, both measures of minutes of exercise provided similar inferences about the effect of the FRESH START intervention over time in the subsample of participants.
In this FRESH START substudy, at each of the three serial time points, there was a significant positive correlation between self-reported minutes of moderate and vigorous exercise using the 7-Day PAR and the total moderate and vigorous minutes of physical activity captured via the RT3 accelerometer. The correlation coefficients at baseline and two years are very similar and substantive (rho=.54 and .53, respectively), and at the Year 1 assessment the association is lower but still significantly different from random association (rho=.24). Taken as a whole, these correlations suggest a moderate association between the RT3 and PAR, but this association may be limited because each of these measures pertains to different types of physical activity. The PAR is designed to estimate the weekly total minutes spent in dedicated moderate-to-vigorous exercise of at least 10 consecutive minutes in duration, whereas the RT3 delivers estimates (in “counts” per minute) of any physical activity detected on a minute-by-minute basis.
There are a limited number of studies that have explicitly compared measures of self-reported exercise with similar estimates derived from activity monitors. The correlation coefficients between the two methods of capturing exercise and physical activity observed in this study at the three cross-sectional time points (.54, .24, and .53) were higher than those reported by Ainsworth et al. (1), but similar to those found by Hayden-Wade et al. (19), and lower than those reported by others (21, 24). Ainsworth et al. (1) classified self-reported physical activity (PA) minutes per day as non-occupational walking, moderate or hard/very hard, and compared those with “counts” from a CSA uniaxial accelerometer. They reported rho=.26 for non-occupational walking plus moderate exercise and rho=.32 for hard/very hard PA. Leenders et al. (21) compared total energy expenditure (expressed as kcal/kg/day) derived from the self-reported PAR with the same measure estimated from two activity monitors: the uniaxial CSA (rho=.82) and the triaxial Tritrac (rho=.89). In the current study, we conducted three cross-sectional analyses comparing self-reported minutes per week of moderate or vigorous (hard or very hard) dedicated exercise and found correlations intermediate in magnitude as compared to the studies mentioned above.
As a secondary aim, we compared the difference in minutes, expressed as Year 1 minus Baseline, between the self-report PAR and the RT3, and found essentially no difference from a random association (rho=.11, p=0.30). This finding does little to inform as to which method is preferable for evaluating the efficacy of an exercise or PA intervention, as both methods have strengths and weaknesses which contribute to bias and error (discussed below).
Finally, we compared the two randomized group trajectories using the three time points (Baseline, Year 1 and Year 2) from the FRESH START study (Fig. 1). The p-values for the arm by time interaction are roughly equal for the PAR (p=0.22) and the RT3 (p=0.23) methods, so both methods provide similar evaluative information. It should be noted that in this 23% subsample of FRESH START, we were statistically underpowered to detect an arm by time interaction of moderate effect, thereby making this interaction analysis exploratory. However, when the arm by time interaction term is dropped from the analytic model, the PAR model detects no differences between the arms (p=0.85) and a significant linear increase over time (β=32.1, p<.0001), while the RT3 model finds a main effect difference between the arms (β=−56.1, p=0.03), but no significant average linear change over time (β=−15.9, p=0.08). The functional forms of the intervention trajectories for both the PAR and RT3 methods are similar in that they both peak at the Year 1 assessment (at the end of the intervention period) and then decline at Year 2, much less pronounced in the case of the PAR than for the RT3. However, the trajectories for the control arms are quite different between methods, in that the PAR control group monotonically increases at each time point, while the RT3 control trajectory remains fairly flat.
There are several limitations to this study. First, the 139 subjects for whom RT3 data were usable at baseline were a convenience sample of FRESH START participants who happened to reside within 60 miles of Duke University in Durham, NC, and as such are not representative of the entire FRESH START sample. Thus, the results from this analysis may not be generalizable outside of this subsample.
Second, there are limitations with both methods being compared in this study. Limitations with self-reported physical activity and exercise are well-known (2, 12, 16, 26) and include decreased reliability due to recall biases, social desirability response, cognitive and memory processing which can vary due to age, gender and other attributes. Furthermore, subjects can have difficulty determining whether a particular activity falls into low, moderate or vigorous intensity. Even in a highly structured interviewer administered instrument like the PAR, subjects are likely to overestimate their actual exercise level (15). Motion sensors such as accelerometers are generally considered to provide more objective and reliable estimates of the true level of PA than self-report instruments (1,15) but waist-mounted accelerometers have limitations in estimating moderate-intensity activity (1, 29), as well as for several lifestyle activities such as raking, shoveling and sweeping and for static activities (4, 20, 31). In addition, because RT3 units need to be kept dry, they were not usable during episodes of swimming or water exercise, which also contributes to non-concordance with the PAR. In FRESH START we observed that several RT3 units were returned for data processing with unusable count data at baseline (n=15) due to either user error or mechanical issues with the RT3.
Third, not all accelerometers are equal in accuracy and reliability (13, 14), so care must be used to choose the best model to limit measurement error. Despite the widespread use of accelerometers, there still are no standardized methods to process and summarize data from them, so that different algorithms impact important outcome measures differently (23). Some recent studies (for example, 7,17) have shown poor agreement of both accelerometers and 7-Day recalls with doubly labeled water and heart rate monitors, measures which are expensive to implement, but considered to be closer to being a “gold standard”.
This study has strengths that contribute to the understanding of the relative benefits of using the PAR and the RT3 in measuring dedicated exercise and PA in an intervention study. First, although this subsample was a convenience sample, it is similar in demographic profile to the sample of the FRESH START study, spanning both elderly and younger adults, men and women, and a variety of ethnicities. Next, we observed that the PAR and the RT3 showed moderate-to-good association when evaluated at three separate cross-sectional time points. This may be the first attempt to provide three longitudinal cross-sectional correlations between concurrent measures of PAR and accelerometer measurements. The PAR and RT3 methods delivered different results, however, when evaluating the change in exercise from baseline to Year 1. Finally, both methods yielded similar trajectories of intervention arm by time interactions but conflicting main effects. Additional research is needed, and ongoing, to determine which method, and which instrument, is best suited for use in intervention trials. In the end, it is up to the investigator to weigh the benefits of each method (validity, reliability, ease of administration, etc.) against the costs (bias, expense, difficulty of data processing, etc.) when deciding which to choose for any particular study.
Supported by National Institutes of Health Grants CA81191, CA74000, CA63782, and M01-RR-30, and current salary support for RS under P30AG028716. We would like to thank James Topping, Robert Hollowell, MD and Manjushri Bhapkar, MS for their careful and skillful programming and coding of the RT3 count data. The authors would like to dedicate this publication to the memory of Dr. Elizabeth C. Clipp, who inspired us all.