Accelerometry-based portable PA monitors are a feasible and objective means of detecting PA patterns. Many studies have developed and validated models with various accelerometers to predict activity EE; however, to our knowledge, the scopes of such studies were largely limited to short protocols consisting of structured intermittent bouts of PA. These validation protocols, while in many cases similar to protocols employed in the development of the regression equations we tested, may not mimic free living because they include a limited number of PA types and assume that all are equally likely to be present in free living. This may lead to larger prediction errors when using these equations for longer and free-living studies than we experience in the laboratory. In this study using a whole-room indirect calorimeter, we validated the ability of the ActiGraph, Actical, and RT3 activity monitors to accurately report summary statistics relating to time spent in specific PA intensity categories in a heterogeneous group of healthy men and women. Previously published regression equations for each device were explored to discover their relative strengths and weaknesses. The long study duration (~22 continuous hours) presents a bridge between short laboratory PA protocols, where all exercise intervals are explicitly specified, and free-living studies by allowing subjects to engage in both prescribed and spontaneous bouts of PA while still providing minute-by-minute EE measurements from the room calorimeter. Analyses were designed to highlight features that would be of interest to researchers examining long durations (weeks) of free-living data or data collected from a large number of subjects, where minute-by-minute prediction accuracy is less important than reliable summary measures of each day.
PAL is a measure of the mean EE above REE. It is an attractive daily PA outcome because it rises proportionally to the number and intensity of active minutes in each day while being comparable between subjects, because data from each subject is normalized by REE. Mathematically, accurate predictions of PAL require that intervals in which activity counts are close to zero be assigned an EE close to or equivalent to the REE. Thus, the Hendelman (AG 2) equation is a poor choice because of its high y-intercept. The Swartz ActiGraph regression (11) also has a large y-intercept and was not considered in this study because of its performance similarities to AG 2. The regressions that best estimated PAL contain the most physiological intercepts. In the cases of AG 3 and RT3 2, which predict EE in METs, the intercepts are slightly greater than one, whereas in RT3 2 activity EE is forced to zero when activity counts are zero. Using other regressions, PAL was, on average, underpredicted, which highlights potential limitations in the regression forms and also reflects that there are some increases in EE that were measured by the calorimeter but do not have an associated acceleration response to be detected by the accelerometers (thermic effect of food, limb movements, and isometric muscle contractions). The higher predicted PAL that was observed for most of the RT3 regressions could be due to measurement sensitivity of each device, represented by the lower proportion of measured zeros by the RT3 (0.50) relative to the Actical (0.59) and ActiGraph (0.61), or could be due to characteristics of this regression, such as an overpredicted baseline value, higher slope, or a nonlinear model form. In the case of the proprietary RT3 regression, it is difficult to isolate the source of any potential benefits or artifacts because the form of the regression is not disclosed.
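The sensitivity of PAL to the regression intercept can be illustrated with a minimal sketch. The linear form, slope, intercepts, and toy day of activity counts below are hypothetical values chosen for illustration and are not taken from any of the published regressions. EE is expressed in METs, so the day's mean is already normalized by REE.

```python
from statistics import mean

def predicted_mets(counts_per_min, slope, intercept):
    """Hypothetical linear regression: METs = intercept + slope * counts."""
    return [intercept + slope * c for c in counts_per_min]

def pal(mets_per_min):
    """PAL taken here as the mean of minute-by-minute EE expressed in multiples of REE."""
    return mean(mets_per_min)

# Toy day (illustrative only): 60% of minutes with zero counts,
# 40% of minutes at 1,000 counts/min.
counts = [0] * 60 + [1000] * 40

# Intercept close to REE (1 MET) versus a high intercept (2 METs).
pal_physiological = pal(predicted_mets(counts, slope=0.001, intercept=1.0))
pal_high_intercept = pal(predicted_mets(counts, slope=0.001, intercept=2.0))

print(f"PAL, intercept 1.0 MET: {pal_physiological:.2f}")
print(f"PAL, intercept 2.0 MET: {pal_high_intercept:.2f}")
```

Because more than half of the minutes in a day typically have zero counts, any offset in the intercept is carried almost directly into the daily mean, regardless of how well the slope fits active minutes.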
Time spent in MET categories is a summary metric that characterizes the intensity distribution of daily PA, and is a useful tool for assessing whether a daily PA goal has been met in the field. Although differences between predicted and measured intensity distributions were generally small in the moderate and vigorous intensity categories (<2%), discrimination between sedentary and light PA had a much higher error rate (generally around 10%). Because, on average, the total difference between the models and the calorimeter for the combined category of 1–3 METs is small (<2%), we may be able to infer that the form of the regressions we tested may not be appropriate for low intensity PA, as adjustments in the slope alone would change the amount of total time classified in this intensity region. Recently, an approach has been presented to attempt to mitigate these problems by using a different regression form for these activities (21); however, this equation was not tested here because it requires that data be acquired in 1-s epochs. As in previous work (15), our largest errors were observed using the Hendelman (AG 2) regression. However, because it is one of the only regression equations developed primarily using lifestyle activities, which are the most prevalent activities in our protocol, we felt inclusion of this equation was important to show the way errors observed in short protocols, those with experimental duration of 2–3 h, propagate when data is considered over the course of a day.
In addition to the regressions presented for all of our analyses, we also computed the time spent in sedentary (1–1.5 METs), light (1.5–3 METs), and moderate/vigorous PA (>3 METs) using the Matthews ActiGraph moderate/vigorous cutoff point (760 counts/min) (13). This cutoff point was developed using combined data from several subjects. We coupled this cutoff point with a sedentary/light cutoff point of 100 counts/min, which has been previously suggested for adolescents (23). Using these cutoff points there was only a small difference between the activity monitor predicted time spent in sedentary PA and the calorimeter (2.9%), although this difference was significant (P < 0.001). This difference was markedly smaller than differences observed using the other ActiGraph cutoff points and regressions tested. Although the time spent in light PA estimated using the Matthews cutoff point showed the best agreement with the calorimeter of any ActiGraph equation tested (mean time spent in light PA = 11.93%), there was still a significant underprediction in this category and a corresponding overprediction of the time spent in moderate/vigorous activity.
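As a minimal sketch, the cutoff-point classification described above can be written as follows. The two thresholds (100 and 760 counts/min) are those stated in the text; the function name, the handling of counts that fall exactly on a threshold (assigned to the higher category), and the sample data are assumptions made for illustration.

```python
def intensity_distribution(counts_per_min, sed_light_cut=100, light_mvpa_cut=760):
    """Return the percent of minutes in each intensity category.

    Assumption: a minute whose count equals a cutoff point is assigned
    to the higher category.
    """
    if not counts_per_min:
        raise ValueError("at least one minute of data is required")
    totals = {"sedentary": 0, "light": 0, "moderate_vigorous": 0}
    for c in counts_per_min:
        if c < sed_light_cut:
            totals["sedentary"] += 1
        elif c < light_mvpa_cut:
            totals["light"] += 1
        else:
            totals["moderate_vigorous"] += 1
    n = len(counts_per_min)
    return {category: 100.0 * minutes / n for category, minutes in totals.items()}

# Illustrative 10-min record of 1-min epoch counts (not study data).
sample_counts = [0, 50, 100, 500, 760, 2000, 0, 0, 0, 0]
distribution = intensity_distribution(sample_counts)
print(distribution)
```

The same function applied to the calorimeter-derived categories would give the measured distribution against which the predicted percentages are compared.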
When considering these results, it is important to remember that even small percentage differences between predicted and measured time in each intensity category can cause problems in assessing subjects' adherence to public health recommendations, which typically require between 20 and 45 min of moderate-to-vigorous PA. If only the discrimination between light (1.5–3 MET) PA and all others is considered, we can determine the potential of each regression to correctly characterize whether such a goal has been achieved. Because 1% of our average study visit corresponds to ~13 min, even regressions that demonstrated this seemingly high mean agreement with the measured intensity distribution often had wide ranges of agreement. These errors would likely render reliable determination of the time an individual spent engaged in moderate-to-vigorous PA difficult, unless the subject exceeded the specified amount of PA by 20–30 min. This would suggest that current accelerometer regressions should be used only for assessing compliance within a population and not on an individual basis. This could be because the forms of the current regressions are too simple to account for the inter-individual variability in PA performance, suggesting that either more flexible modeling techniques or individual calibrations should be considered.
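The conversion from a percentage difference to minutes can be made explicit with a short calculation. It assumes a visit of exactly 22 h, whereas the study duration is given only approximately, so the values are indicative.

```python
STUDY_MINUTES = 22 * 60  # assumes a visit of exactly 22 h (1,320 min)

def percent_to_minutes(percent, total_minutes=STUDY_MINUTES):
    """Convert a percentage of the study visit into minutes."""
    return percent / 100.0 * total_minutes

one_percent = percent_to_minutes(1.0)
two_percent = percent_to_minutes(2.0)

print(f"1% of the visit is about {one_percent:.1f} min")
print(f"2% of the visit is about {two_percent:.1f} min")
```

Under this assumption, a 2% difference in the moderate-to-vigorous category is roughly 26 min, which is of the same order as the 20–45 min recommendations themselves.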
There are some limitations in this study. First, we did not evaluate all predictive equations available for all the monitors we tested. We restricted our search to commonly used regressions, developed using 1-min epoch data. Further, whenever possible we used equations that are built into activity monitor software, thereby attempting to isolate the equations that would be most accessible to researchers in the field. A new, nonlinear regression for the ActiGraph has been recently published (21); however, because the data collection epoch was 1 s, we were unable to validate its performance using this data set. Also, although we frequently referred to our predictions in terms of METs, they are more truly PA ratios because each subject's EE was normalized by a measured REE (24). This difference could explain some discrepancies with regression equations developed using a constant 3.5 ml O2/kg/min as the normalization factor. However, there is some recent evidence that the constant normalization factor is not valid for all subjects (25), and PA ratio may be a more meaningful summary metric. We explored the impact of the value used for normalization by analyzing all of our data using an REE computed with the Harris–Benedict equation. Resulting statistical trends for PAL and percent of time spent in each intensity category were unchanged.
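For readers who wish to repeat this sensitivity check, a minimal sketch of normalizing EE by a Harris–Benedict REE follows. The text does not state which version of the equation was used; the coefficients below are the commonly cited original (1919) values and are an assumption here, as are the function names and the example subject.

```python
def harris_benedict_ree(sex, weight_kg, height_cm, age_yr):
    """Estimate REE in kcal/day with the original Harris-Benedict coefficients.

    Assumption: original 1919 coefficients; the study may have used a
    different published version.
    """
    if sex == "male":
        return 66.4730 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.7550 * age_yr
    if sex == "female":
        return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_yr
    raise ValueError("sex must be 'male' or 'female'")

def pa_ratio(ee_kcal_per_min, ree_kcal_per_day):
    """PA ratio: minute EE divided by REE expressed per minute."""
    ree_kcal_per_min = ree_kcal_per_day / 1440.0  # 1,440 min per day
    return ee_kcal_per_min / ree_kcal_per_min

# Hypothetical subject, for illustration only.
ree_example = harris_benedict_ree("male", weight_kg=70, height_cm=175, age_yr=30)
ratio_example = pa_ratio(ee_kcal_per_min=2.4, ree_kcal_per_day=ree_example)

print(f"Estimated REE: {ree_example:.0f} kcal/day")
print(f"PA ratio at 2.4 kcal/min: {ratio_example:.2f}")
```

Substituting this estimate for the measured REE changes only the denominator of the PA ratio, so any effect on PAL or on the intensity categories comes from the difference between estimated and measured REE.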
It should also be noted that due to our relatively large sample size, both with respect to number of subjects and duration of data collection, many statistically significant differences were detected that may result from absolute differences that are too small to be clinically relevant. Thus, each researcher must examine the magnitude of the difference between predicted and measured values to determine whether observed differences are meaningful in the context of their work. Underprediction in PAL may be important, even if absolute differences are small, because a threshold value for an active day may not be predicted even if it is achieved using these approaches. For intensity categorizations, small percentage differences can correspond to enough minutes of erroneous prediction as to restrict their ability to detect whether an exercise goal has been met, or worse, can predict that an exercise goal has been met when it has not.
In this study, we compared three commercially available accelerometry-based activity monitors and seven EE prediction equations with measured values using a room indirect calorimeter. Mean PAL was underpredicted by four regressions (AC 1, AC 2, AG 1, and RT3 1), overpredicted by one (AG 2) and was not different from the criterion measure in three cases (AG 3, RT3 2, and RT3 3). Despite many performance similarities across monitor types and regressions, specific strengths and weaknesses were found for each, suggesting that no one equation or monitor is superior in all circumstances. For example, the RT3 regressions had the most comparable PAL values to those measured, whereas the Actical single-regression model (AC 1) was generally good at estimating the time spent in moderate and vigorous PA. Consequently, researchers should consider their outcome goal in determining not only the instrument they use to collect data but also their postcollection processing method. Because data can be safely analyzed using multiple regression approaches, researchers who are interested in more than one type of outcome may determine that more than one regression approach should be employed within a study in order to produce the most accurate results for each measurement variable of interest.