We evaluated the accuracy of three automated accelerometer wear-time estimation algorithms against self-report. Direct effects on sedentary time (<100 counts per minute; cpm) and indirect effects on moderate-to-vigorous physical activity time (MVPA, ≥1952 cpm) were examined.
A sub-sample from the 2004/05 Australian Diabetes, Obesity and Lifestyle Study (n=148) completed activity logs and wore accelerometers for a total of 987 days. A published algorithm that allows movement within non-wear periods (Algorithm 1) was compared to one that allows less movement (Algorithm 2), or no movement (Algorithm 3). Implications for population estimates were examined using 2003/04 US National Health and Nutrition Examination Survey data.
Mean difference per day between the criterion and estimated wear time was negligible for all three algorithms (≤11 minutes), but 95% limits of agreement (LOA) were wide (± ≥2 hours). Respectively, the algorithms (1, 2 and 3) misclassified sedentary time as non-wear on 31.9%, 19.4% and 18.0% of days and misclassified non-wear time as sedentary on 42.8%, 43.7%, and 51.3% of days. Use of Algorithm 2 (compared with 1) affected population estimates of sedentary time (higher by 20 minutes per day) but not MVPA time. Agreement between Algorithms 1 and 2 was good for MVPA time (mean difference −0.08, LOA: −2.08, 1.91 minutes), but not for wear time or sedentary time.
Accelerometer wear time can be estimated accurately on average; however, misclassification can be substantial for individuals. Algorithm choice affects estimates of sedentary time. Allowing very limited movement within non-wear periods can improve accuracy.
Accelerometers are increasingly used to provide valid, objective assessments of sedentary time and physical activity [2, 3] in free-living populations, including in large-scale population monitoring.[4, 5] Good correlation or agreement between accelerometer output and activity intensity or energy expenditure has been established; however, other issues need to be considered, particularly how best to determine accelerometer wear time. Study protocols typically specify wearing the accelerometer during waking hours only, and removing it for any water-based activities. Thus, wear time varies between days and between participants. Collecting self-report data can assist in determining wear time, but adds to participant and researcher burden. Automated wear-time estimation algorithms are thus especially desirable for large-scale studies, but are not standardised and their accuracy remains to be established.
Automated estimations classify prolonged periods of non-movement (e.g. ≥60 minutes or ≥20 minutes at zero intensity) as non-wear time. However, non-moving periods could be either non-wear time or sedentary time. Estimations cannot detect non-wear periods that are shorter than the minimum length of the algorithm criteria (e.g. <60 or <20 minutes) or of a higher intensity than the algorithm allows (e.g. where failure of the accelerometer filtration process or external movement has occurred). To overcome this latter problem, spurious data can be defined and removed; alternatively, some wear-time estimation algorithms allow a limited amount of movement (non-zero counts) to occur within a block of non-wear time.
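The general approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in this study: the function name find_nonwear and its parameters (min_len, max_interrupt_minutes, interrupt_ceiling) are hypothetical, and the greedy left-to-right scan is one of several reasonable ways to apply such a rule.

```python
from typing import List, Tuple


def find_nonwear(counts: List[int],
                 min_len: int = 60,
                 max_interrupt_minutes: int = 0,
                 interrupt_ceiling: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) minute indices (end exclusive) of estimated non-wear.

    A block starts on a zero-count minute and is extended while minutes are
    zero, or non-zero but <= interrupt_ceiling, for at most
    max_interrupt_minutes interruptions (hypothetical parameters).
    Blocks shorter than min_len minutes are treated as wear time.
    """
    blocks = []
    i, n = 0, len(counts)
    while i < n:
        if counts[i] != 0:
            i += 1
            continue
        start, j, used, last_zero = i, i, 0, i
        while j < n:
            c = counts[j]
            if c == 0:
                last_zero = j
            elif c <= interrupt_ceiling and used < max_interrupt_minutes:
                used += 1  # tolerate a small amount of movement
            else:
                break
            j += 1
        end = last_zero + 1  # trim any trailing interruption minutes
        if end - start >= min_len:
            blocks.append((start, end))
        i = end
    return blocks


# One day fragment: 30 zero minutes, one low count, 40 zeros, real movement, 20 zeros.
counts = [0] * 30 + [40] + [0] * 40 + [500] + [0] * 20
strict = find_nonwear(counts)  # no movement allowed within non-wear
lenient = find_nonwear(counts, max_interrupt_minutes=2, interrupt_ceiling=49)
```

With no interruptions allowed, the single low count splits the quiet period into runs shorter than 60 minutes, so nothing is flagged as non-wear; with limited interruptions allowed, the first 71 minutes form one non-wear block. Whether that block is truly non-wear or simply very sedentary time is exactly the ambiguity discussed above.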
In addition to the direct impact on sedentary time estimates, achieving the criteria for valid data can be affected by wear time estimation. Thus, all accelerometer measures can be impacted, even if they do not directly include very low-intensity activities in their calculation (e.g. moderate to vigorous physical activity [MVPA] time). Studies often employ a minimum daily wear-time criterion (typically 10 hours) and often require a minimum number of valid days (commonly ≥ 4). Potentially, bias can be introduced if the automated estimation process erroneously excludes days on which participants are most sedentary (and possibly also least physically active) and/or excludes the participants who are most sedentary (and least physically active).
Wear-time estimation algorithms produce varied results.[7, 11] Defining non-wear time as all blocks of non-movement ≥20 minutes, relative to ≥60 minutes, leads to lower estimates of wear time, lower sedentary time, and higher average counts (a marker of overall activity intensity). One study concluded that ≥60 minutes was preferable to the ≥20 minutes criterion, but lacked a referent assessment. No studies that assessed automated estimation against a criterion method in free-living persons were found in database searches of PubMed and ProQuest (June 2010).
We examined the validity and compared the performance for adults of three automated accelerometer wear-time estimation algorithms. We also examined whether misclassification varied with socio-demographic characteristics or bodyweight. Finally, we examined the net impact of misclassification on population estimates of sedentary time, and also on MVPA time.
All accelerometer data were analysed in SAS 9.1 via a program developed by the National Cancer Institute. The program's wear-time component (Algorithm 1) has been used to generate US population estimates of sedentary time and physical activity. The program was adapted to produce Algorithms 2 and 3. In view of the apparent advantage of longer (≥60 minutes) over shorter (≥20 minutes) criteria for identifying non-wear periods, we focused on algorithms that use the ≥60 minutes criterion, and compared the effect of the extent of interruptions they allowed within the non-wear period. Allowing more interruptions means that less spurious data is retained as wear time, but also that more true sedentary time is discarded as non-wear; allowing fewer or no interruptions has the opposite effect. The interruptions allowed by the algorithms were:
We used data from a sub-study of the 2004/05 Australian Diabetes Obesity and Lifestyle Study (AusDiab)  and the 2003/04 United States (US) National Health and Nutrition Examination Surveys (NHANES; www.cdc.gov/nchs/nhanes.htm). Both studies obtained ethical approval from relevant parties and written informed consent from participants. For both studies, accelerometers (Actigraph model 7164; Actigraph LLC, Fort Walton Beach, Florida) were set to record in one-minute epochs and participants were instructed to wear the accelerometer on the right hip for seven consecutive days during all waking hours, unless doing water-based activities.
Detailed methods for the AusDiab sub-study are reported elsewhere. Participants (n=202) recorded the times that they wore the accelerometer (on/off times) in an activity log, plus times that they removed it for a period of 15 minutes or more. Data were available for analyses from 148 participants with 987 matching days for accelerometer wear and the activity log. Data were excluded if the accelerometer failed (n=6) or the participant withdrew (n=7), and from all observed days where the accelerometer was not worn or the activity log was poorly completed (n=41 participants, n=336 days). Specifically, data were excluded that were suspicious (e.g. all on/off times occurred on the hour), ambiguous (e.g. cannot be certain that the times recorded referred to AM or PM), or missing time or date information.
The beginning and end-times of periods classed as non-wear by the algorithms were extracted and compared to the activity logs. Self-report was used as a criterion of whether participants were wearing the accelerometer or not at any given time; however, allowances were made for imprecision in self-reported times (of up to 30 minutes). The beginning and end times of our criterion non-wear periods were defined in one of three ways depending on the discrepancies between self report times from the log and those derived from the algorithms (Figure 1). If the discrepancy was less than 30 minutes (assumed to be imprecise time reporting), then the times identified by whichever algorithm most closely matched the activity log were used. If the discrepancy was 30 minutes or more, the algorithm was assumed to have failed and the times reported in the activity log were used. Times from the activity log were also used if the algorithms did not detect a reported non-wear period.
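The reconciliation rule described above can be expressed as a small function. This is a sketch under assumptions: times are taken as minutes since midnight, the name criterion_time is hypothetical, and the "discrepancy" is interpreted as the gap between the logged time and the closest algorithm-derived time, which is one plausible reading of the text.

```python
from typing import Optional, Sequence


def criterion_time(log_time: int,
                   algorithm_times: Sequence[Optional[int]],
                   tolerance: int = 30) -> int:
    """Pick the criterion start or end time of a non-wear period.

    log_time: self-reported time (minutes since midnight).
    algorithm_times: matching times from each algorithm, or None where an
        algorithm did not detect the reported period.
    """
    detected = [t for t in algorithm_times if t is not None]
    if not detected:
        return log_time  # no algorithm detected the period: trust the log
    closest = min(detected, key=lambda t: abs(t - log_time))
    if abs(closest - log_time) < tolerance:
        return closest  # treat the gap as imprecise self-report
    return log_time  # gap of 30+ minutes: algorithm assumed to have failed


print(criterion_time(600, [590, 640, None]))  # closest algorithm time is within 30 minutes
print(criterion_time(600, [640, 650, 700]))   # all algorithm times are 30+ minutes away
```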
We calculated the overall agreement between the algorithms and the criterion measure in their assessment of each epoch as non-wear/wear. Although agreement with our imperfect criterion should not be interpreted in terms of diagnostic accuracy, we reported our results in terms of sensitivity and specificity instead of the traditionally used Kappa, as these statistics can distinguish between types of misclassification and are unaffected by the amount of time that is non-wear according to the criterion. The best combination of sensitivity and specificity was judged by the highest Youden's J (sensitivity + specificity − 1). In view of the non-independence of observations, we used a cluster bootstrap method to assess 95% confidence intervals (STATA v11). To explore the consistency of the performance of the algorithms, we also calculated sensitivity and specificity for each day, and report the range observed.
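As an illustration of these statistics, the sketch below computes epoch-level sensitivity, specificity and Youden's J, treating non-wear as the "positive" class. The function name and the toy labels are hypothetical, and the cluster bootstrap used for confidence intervals is not reproduced here.

```python
from typing import Sequence, Tuple


def epoch_agreement(criterion: Sequence[bool],
                    estimated: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, Youden's J) for non-wear detection.

    Each element is True if the epoch is non-wear, False if it is wear.
    A statistic is NaN when its denominator is zero (e.g. a day with no
    criterion non-wear has undefined sensitivity).
    """
    pairs = list(zip(criterion, estimated))
    tp = sum(1 for c, e in pairs if c and e)          # non-wear, detected
    fn = sum(1 for c, e in pairs if c and not e)      # non-wear, missed
    tn = sum(1 for c, e in pairs if not c and not e)  # wear, kept as wear
    fp = sum(1 for c, e in pairs if not c and e)      # wear, flagged non-wear
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity, sensitivity + specificity - 1


criterion = [True, True, True, True, False, False, False, False, False, False]
estimated = [True, True, True, False, False, False, False, False, False, True]
sens, spec, j = epoch_agreement(criterion, estimated)
```

In this toy example one non-wear epoch is missed and one wear epoch is wrongly flagged, giving a sensitivity of 0.75 and a specificity of 5/6.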
We descriptively compared Algorithms 2 and 3 with Algorithm 1 in terms of the percentage of days in which misclassification occurred and the amount of misclassification for days in which misclassification occurred. Due to skewness, the latter were reported as medians and ranges. Because of the direct substitution between non-wear and sedentary time, we report wear time that was misclassified as non-wear and sedentary time that was misclassified as non-wear. Similarly, a failure to detect non-wear time was reported as non-wear time that was misclassified as sedentary. Using Bland-Altman analysis  we also looked at mean differences and 95% limits of agreement (LOA) between estimated and criterion wear time for all three algorithms. To explore whether misclassification was systematic, we looked at bivariate associations of sociodemographic characteristics (age, gender, employment, education, income), and body mass index (BMI, kg.m−2) with the amount of sedentary time misclassified as non-wear (0, < 1 hr, 1–<2hrs, 2–<3 hrs or 3 hrs+) using Generalised Estimating Equations (in view of the repeated measures).
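For readers unfamiliar with Bland-Altman analysis, the following sketch shows the basic calculation of the mean difference and 95% limits of agreement (mean ± 1.96 SD of the differences). The function name and the example values are hypothetical and are not study data.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(estimated: Sequence[float],
                 criterion: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower LOA, upper LOA) for paired measures."""
    diffs = [e - c for e, c in zip(estimated, criterion)]
    md = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return md, md - 1.96 * sd, md + 1.96 * sd


# Hypothetical daily wear time (minutes) for four days.
estimated_wear = [600, 610, 590, 620]
criterion_wear = [605, 600, 600, 610]
md, lower, upper = bland_altman(estimated_wear, criterion_wear)
```

A mean difference near zero with wide limits, as in this toy example, is the pattern that indicates accuracy on average but poor agreement for individual days.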
NHANES used a complex, multi-stage design, and its 2003–2004 cycle included an accelerometer component  for which all ambulatory participants at least six years of age who attended the Mobile Examination Centre (MEC) were eligible. Since we examined algorithm validity for adults, we focus only on data for adults (n= 4,741 MEC participants aged ≥ 20 years). To estimate the potential impact of algorithm choice on population estimates, we compared the algorithm that has to date been used on the NHANES data (Algorithm 1) [1, 5] with the algorithm that agreed most with the criterion. Daily values (minutes) and valid averages were calculated for: wear time, total sedentary time (worn time of intensity <100 cpm), and MVPA time (worn time of intensity ≥ 1952 cpm). As with other analyses of the NHANES accelerometer data,[1, 5] valid averages include only data from monitors that were returned in calibration and from days with ≥ 10 hours of wear-time. We further removed days where excessively high counts were encountered (≥ 20,000 cpm) as these may indicate unreliable data.
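The daily summary measures defined above can be sketched as follows. The thresholds come from the text; the function name, the dictionary keys, and the choice to check for excessive counts across every recorded minute of the day are assumptions made for illustration.

```python
from typing import Dict, Sequence

SEDENTARY_BELOW = 100   # cpm; sedentary is worn time below this
MVPA_AT_LEAST = 1952    # cpm; MVPA is worn time at or above this
MIN_WEAR_MINUTES = 600  # 10 hours of wear for a valid day
SPURIOUS_AT_LEAST = 20000  # cpm; days with such counts are removed


def summarise_day(counts: Sequence[int], worn: Sequence[bool]) -> Dict[str, object]:
    """Summarise one day of one-minute epochs.

    counts: counts per minute for each epoch.
    worn: True where the wear-time algorithm classed the epoch as worn.
    """
    worn_counts = [c for c, w in zip(counts, worn) if w]
    wear = len(worn_counts)
    spurious = any(c >= SPURIOUS_AT_LEAST for c in counts)  # assumption: any minute
    return {
        "wear": wear,
        "sedentary": sum(1 for c in worn_counts if c < SEDENTARY_BELOW),
        "mvpa": sum(1 for c in worn_counts if c >= MVPA_AT_LEAST),
        "valid": wear >= MIN_WEAR_MINUTES and not spurious,
    }


# Hypothetical day: overnight non-wear, 700 worn minutes, then non-wear again.
counts = [0] * 300 + [50] * 400 + [500] * 200 + [2000] * 100 + [0] * 440
worn = [False] * 300 + [True] * 700 + [False] * 440
day = summarise_day(counts, worn)
```

Because sedentary time is counted only within worn epochs, any change in the wear-time classification feeds directly into the sedentary total, whereas MVPA minutes are rarely located in the low-count periods that the algorithms dispute.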
The algorithms were compared descriptively in terms of the population averages estimated, and the valid sample of participants and days from which these averages were derived. Population figures were calculated using linearized methods and appropriate sample weights. Population averages were based on all available valid data. Bland-Altman analysis was used to examine agreement between the algorithms in their estimates of: wear-time (minutes per day), MVPA (minutes per day), and sedentary time (minutes per day, percentage of time worn, and as minutes per day with correction for wear time by the residuals method ). Agreement was examined for the 3,078 participants with sufficient data for reliable estimates; that is, at least four valid days according to both estimation algorithms.
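One common form of the residuals method regresses sedentary time on wear time across participants and adds each participant's residual to the sample mean of sedentary time. The sketch below illustrates that form only; it is an assumption about the general technique and may differ in detail from the implementation used in this study. The function name and data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def residual_corrected(sedentary: Sequence[float],
                       wear: Sequence[float]) -> List[float]:
    """Return sedentary time corrected for wear time by the residuals method.

    Fits sedentary = a + b * wear by ordinary least squares, then returns
    residual + mean(sedentary), which equals sedentary - b * (wear - mean wear).
    """
    mw, ms = mean(wear), mean(sedentary)
    sxx = sum((w - mw) ** 2 for w in wear)
    sxy = sum((w - mw) * (s - ms) for w, s in zip(wear, sedentary))
    slope = sxy / sxx
    return [s - slope * (w - mw) for s, w in zip(sedentary, wear)]


# Hypothetical participant averages (minutes per day).
wear = [800, 900, 1000]
sedentary = [400, 500, 540]
corrected = residual_corrected(sedentary, wear)
```

The corrected values keep the same sample mean as the uncorrected ones but are uncorrelated with wear time, so two algorithms can still disagree on corrected sedentary time whenever they disagree on which minutes were worn.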
Participants from the AusDiab accelerometer sub-study were aged between 30 and 87 years (mean=54.2 years, SD 12.0). The sample included men and women diverse in age range, weight status and socio-demographic characteristics (Table 1).
Agreement between the algorithms and the criterion was excellent for all three algorithms. On average, all three algorithms had high sensitivity and specificity (all >95%); however, the values observed for the days with least sensitivity (0% to 43%) indicate that performance was not consistently good (Table 2). Algorithm 1 had the most sensitivity and the least specificity, Algorithm 3 had the least sensitivity and the most specificity, while Algorithm 2 showed the best balance of both, by a very small amount. When misclassification occurred, sedentary time was misclassified as non-wear for 72 to 78 minutes on average whereas only 44 to 50 minutes on average of non-wear time was misclassified as sedentary time.
Misclassification of sedentary time as non-wear occurred on nearly one third (31.9%) of all observed days using Algorithm 1. Algorithm 2 reduced this to 19.4% without altering the percentage of days on which non-wear time was misclassified as sedentary time. Algorithm 3 also reduced the occurrence of misclassified sedentary time (to 18.0%) but at the same time increased misclassification of non-wear time as sedentary time (51.3% versus 42.8% for Algorithm 1).
For all algorithms, the mean differences between estimated and criterion wear time were negligible (≤11 minutes) but LOA were wide, spanning approximately two to three hours. The LOA were wider for Algorithm 1 than the others (Table 2). The Bland-Altman plots (Figure 2) further indicated that all three algorithms tended to have more underestimation of wear time than overestimation, and that the most extreme underestimation occurred at lower values of wear time.
Significantly more misclassification occurred among overweight or obese participants compared with those of normal or underweight BMI (Table 2). Based on the crude percentages (with differences of ≥5% considered noteworthy) and the GEE analysis, there was otherwise very little evidence that the amount of misclassification of sedentary time by the original method (Algorithm 1) varied across socio-demographic groups (Table 3). Noteworthy, but non-significant differences in crude percentages were seen only for participants who were working full time and those in the 40–49 year age bracket, when compared with their respective counterparts.
Table 4 compares estimates for the adult US population that result from using Algorithm 2 versus 1. Compared to Algorithm 1, Algorithm 2 generated higher estimates of wear time on average and consequently classified 581 more observed days as valid, and more participants as having valid data (using either a one-day or four-day criterion). Mean population estimates of sedentary time were higher when using Algorithm 2 than Algorithm 1, even when correcting for wear time or when examining sedentary time as a percentage of worn time. The magnitude of the difference in estimates was modest: approximately twenty minutes or one percent. By contrast, estimates of average time spent in MVPA were not affected by choice of wear-time algorithm, with the median [minimum, maximum] for Algorithms 1 and 2 being 18.0 [0, 215.5] and 17.6 [0, 215.5] minutes, respectively.
Figure 3 shows the Bland-Altman plots of agreement between Algorithms 1 and 2 for wear time, sedentary time, and MVPA. Agreement between the algorithms was poor for wear time and for all measures of sedentary time. For wear time, sedentary time and percentage sedentary time, some heteroscedasticity was evident (i.e. the amount of misclassification increased with the mean), so the Bland-Altman plots are displayed for the log transformed data. The back-transformed mean difference (1.02) and LOA (0.94, 1.11) indicate that Algorithm 2 produced estimates of wear time that were 2% higher on average than Algorithm 1, and anywhere between 6% lower and 11% higher for 95% of people. Relative to Algorithm 1, Algorithm 2 also generated higher estimates of sedentary time (+4%, LOA −8%, +17%) and percentage worn time spent sedentary (+2%, LOA −3%, +7%). For corrected sedentary time, the mean difference was 20.4 minutes (LOA: −12.7, 53.5). While mean differences were small, the wide LOA and large outliers show the two algorithms do not yield equivalent estimates of sedentary time or wear time. In contrast, there was good agreement for MVPA estimated by the two algorithms (mean difference −0.08, LOA: −2.08, 1.91 minutes).
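The back-transformation used for the log-scale plots can be illustrated as follows: differences of natural logs are exponentiated to give ratios, which are then read as percentage differences. This is a sketch with hypothetical function names and toy data; only the three ratios passed to ratio_to_percent at the end are taken from the text.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def log_ratio_agreement(a: Sequence[float],
                        b: Sequence[float]) -> Tuple[float, float, float]:
    """Bland-Altman on the log scale, back-transformed to ratios of a to b.

    Returns (mean ratio, lower LOA ratio, upper LOA ratio).
    """
    diffs = [math.log(x) - math.log(y) for x, y in zip(a, b)]
    md, sd = mean(diffs), stdev(diffs)
    return math.exp(md), math.exp(md - 1.96 * sd), math.exp(md + 1.96 * sd)


def ratio_to_percent(ratio: float) -> float:
    """Express a ratio as a percentage difference (1.02 -> about +2%)."""
    return (ratio - 1) * 100


# Hypothetical wear-time estimates from two algorithms for three people.
ratio, lo_ratio, hi_ratio = log_ratio_agreement([110, 95, 105], [100, 100, 100])

# Reported ratios for wear time: mean 1.02, LOA 0.94 to 1.11.
reported = [round(ratio_to_percent(r)) for r in (1.02, 0.94, 1.11)]
```

Applied to the reported ratios, the conversion gives +2%, −6% and +11%, matching the interpretation given above.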
All three wear-time estimation algorithms showed excellent agreement with the criterion. However, for some days, the algorithms had poor sensitivity and specificity and generated estimates that were incorrect by several hours. In addition to problems with short periods of spurious data, long bouts of time were often misclassified. Reducing the amount of movement permitted within non-wear time (Algorithm 2) reduced the misclassification of sedentary time as non-wear without affecting the detection of non-wear time. By contrast, allowing no movement within non-wear periods (Algorithm 3) had a similar effect on misclassification of sedentary time, but at a cost of failing to detect 'true' non-wear time. One problem for all algorithms was true non-wear periods shorter than 60 minutes, which commonly occurred when the accelerometer was removed after 11pm. Overall, allowing very limited interruptions (i.e. <50 cpm, no more than two per non-wear period) appeared optimal, although the benefit over no interruptions (i.e. 0 cpm) was only slight.
It is possible that misclassification resulting from wear-time estimation algorithms is differential; however, our analysis of this issue was limited by the small sample. The association of misclassification with BMI suggests that the algorithms may perform better for normal weight adults than for adults who are overweight or obese. One possible explanation is under-detection of movement for overweight or obese persons on whom accelerometers tend to sit at the wrong angle.
Algorithm 1 was developed for surveillance purposes to estimate population means, primarily MVPA, in large-scale population studies, such as NHANES. Using Algorithm 1 rather than Algorithm 2, which has also been used in analysing the NHANES data, affected the estimates of total wear time, and thus the number of valid days and participants included in analysis. This did not translate into an impact on MVPA time, and the impact on population estimates of sedentary time was modest. However, agreement in sedentary time estimates across algorithms was poor, even when wear time was supposedly 'controlled' either via the residuals method or by conversion to percentages. Overall, the implications for research studies are that algorithm choice may be of little importance when obtaining descriptions of population levels, but studies aiming to examine factors associated with sedentary time or to detect within-person change may be affected by misclassification.
This study adds to the relevant research literature [7, 11] by using a referent assessment method in free-living participants. Our findings complement those of a recent laboratory study, which established that several automated estimations with long minimum durations (60 minutes, or 90 minutes) perform adequately, particularly when allowing interruptions. There is no gold-standard criterion for free-living populations and, although our criterion was not ideal, it was unlikely to favour any particular algorithm and was likely adequate for comparison purposes. The sample was not fully population representative; thus, generalizability is not certain. Other populations, including children, adolescents, young adults, and people from various racial and ethnic backgrounds should also be examined. Given the rapid advances in accelerometer technology (which includes the collection of large amounts of raw data), appropriate algorithms also need to be validated for different epoch lengths and also for accelerometers using dual-axis or triaxial modes. This study used three 'cut-points' for allowable interruptions; further exploration may reveal better algorithms.
Automated accelerometer wear-time estimation has acceptable validity for adults for many purposes, with the better results achieved by allowing non-wear periods to contain very limited movement (Algorithm 2) rather than extensive interruptions (Algorithm 1). However, further improvements are achievable and are needed, particularly when accurate sedentary time measures are necessary. These include identifying and removing spurious data, and reducing the failure to detect short non-wear periods (<60 minutes) by allowing non-wear bouts to continue past midnight. Estimation algorithms are a time-efficient and feasible option for large-scale population monitoring, but associated measurement error in sedentary time is substantial and needs consideration.
Clark, Gardiner, Healy, Winkler and Owen are supported by a Queensland Health Core Research Infrastructure grant and by NHMRC Program Grant funding (#569940). Gardiner is supported by a Heart Foundation of Australia (# PP 06B 2889). Clark is supported by an Australian Post-graduate Award. Healy is also supported by a NHMRC (#569861) / National Heart Foundation of Australia (PH 374 08B 3905) Postdoctoral Fellowship. Data from the AusDiab study were used (for full acknowledgments of the many funding sources, see ). Data used in this study (NHANES) were collected by the National Center for Health Statistics, Centers for Disease Control and Prevention.
Funding: This study was supported by funding from a Queensland Health Core Research Infrastructure grant and the National Health and Medical Research Council of Australia (#569940).
Competing interests: There are no competing interests to declare.