The use of movement monitors (accelerometers) for measuring physical activity (PA) in intervention and population-based studies is becoming a standard methodology for the objective measurement of sedentary and active behaviors and for validation of subjective PA self-reports. A vital step in PA measurements is classification of daily time into accelerometer wear and nonwear intervals using its recordings (counts) and an accelerometer-specific algorithm.
To validate and improve a commonly used algorithm for classifying accelerometer wear and nonwear time intervals using objective movement data obtained in the whole-room indirect calorimeter.
We conducted a validation study of a wear/nonwear automatic algorithm using data obtained from 49 adults and 76 youth wearing accelerometers during a strictly monitored 24-h stay in a room calorimeter. The accelerometer wear and nonwear time classified by the algorithm was compared with actual wearing time. Potential improvements to the algorithm were examined using the minimum classification error as an optimization target.
The recommended elements in the new algorithm are: 1) zero-count threshold during a nonwear time interval, 2) 90-min time window for consecutive zero/nonzero counts, and 3) allowance of 2-min interval of nonzero counts with the up/downstream 30-min consecutive zero counts window for detection of artifactual movements. Compared to the true wearing status, improvements to the algorithm decreased nonwear time misclassification during the waking and the 24-h periods (all P < 0.001).
The accelerometer wear/nonwear time algorithm improvements may lead to more accurate estimation of time spent in sedentary and active behaviors.
A sedentary lifestyle and lack of physical activity (PA) are major causes of obesity and related health risk factors in children, adolescents, and adults (16, 29, 34). Thus, precise measurements of PA are important in order to estimate the association between PA and health (3, 10–12). Several methods such as self-reports and accelerometry have been proposed to assess PA, but a single and comprehensive measure of PA that is applicable to surveillance, epidemiology, clinical and intervention research still does not exist (31). For example, PA records have been shown to be quite accurate for capturing total activity and can provide desired details regarding activity context and the type of activity (e.g., aerobic, strengthening, or flexibility exercises). However, they are burdensome to respondents since they are required to record each time there is a change in activity throughout the day. Other methods such as portable indirect calorimetry and doubly labeled water are costly. Recent technological advances allow objective measurement of PA frequency, duration and intensity by wearable monitors that can record movement and/or heart rate. A challenge with the use of monitors is capturing total activity since activity monitors selectively record movement of the part of the body to which they are attached (19, 32).
Nevertheless, accelerometers have been frequently used for the measurement of PA using criteria such as time spent in activities performed at various intensities and for the prediction of energy expenditure (EE) associated with PA (5, 26). Accelerometers have been also widely used to obtain measurements of body movements (intensity, duration, frequency, and type) and to relate them to various health parameters in free-living populations; their applications range from clinical interventional trials (2) to epidemiological studies (31).
Data collected by accelerometers such as Actigraph in a natural free-living environment can be divided into wear and nonwear time intervals. Nonwear time intervals include periods during which participants are asked not to wear their monitor, such as sleeping, showering, and aquatic activities. Wear time usually includes all waking periods (31) and requires a specific number of hours of wearing for a day to be considered valid (4, 8). Another approach frequently used in children is to select a time period during the day (e.g. 6 am to 9 pm) (21, 27). Since the length of wear time is a base for assessing time spent at various PA intensities and consequently duration, patterns, and amount of PA, it is critical to correctly classify wear and nonwear time intervals. Typically, an automated algorithm uses monitor-specific criteria to detect and eliminate the nonwear time intervals, during which no activity is detected (14, 21, 30). Distinguishing between the two can be difficult since continuous zero reading may occur for several reasons such as removal of the accelerometer during certain activities (e.g., water activities, sports) or for no reason, sleeping or sitting still with or without the accelerometer for long periods (18).
The most commonly used automated algorithm for Actigraph data is based on criteria proposed by Troiano et al. (31). This algorithm has been used in several population-based studies including the National Health and Nutrition Examination Survey (NHANES) (20). The algorithm was implemented using SAS® software and is available for downloading from the National Cancer Institute (22). It was designed to detect nonwear during the waking period; NHANES participants were asked to remove the Actigraph for a night’s sleep and to replace it in the morning. Because the algorithm was not optimized to detect wear/nonwear during sleep, it would not be expected to perform well in that condition. Although to our knowledge the algorithm has never been independently validated in a laboratory-based study, it has proven useful for automatically classifying wear/nonwear time intervals (35).
Thus, the goal of this study was to assess performance of the current algorithm by comparing its accuracy when classifying wear and nonwear time intervals in data collected in a whole-room indirect calorimeter with known accelerometer-wearing status. Since it is uncommon for accelerometers to be worn during sleep, the classification of nonwear time intervals during the waking period (not sleep) was performed separately. The benefits of using the room calorimeter in this validation study included strictly monitored, accurately measured actual accelerometer-wearing time, and the possibility of assessing the outcomes’ clinical relevance by measurement of PA-related EE. We also explored potential improvements in algorithm accuracy by optimizing criteria for wear and nonwear classification.
Participants aged 10 to 67 years were recruited from the Nashville, Tennessee area using flyers, email distribution, and word of mouth. They participated in a prospective study focused on methodological aspects of PA measurement in youth and adults (6, 25). Personal characteristics of the study participants are in Table 1. Before the study, participating adults, youth and their parents or guardians signed an informed consent approved by the Vanderbilt University Medical Center Institutional Review Board.
Study participants spent an approximately 24-h period in a whole-room indirect calorimeter (28) and followed a structured protocol for simultaneous measurements of PA and EE. The protocol included a broad range of pursuits ranging from moderate and vigorous to light and sedentary tasks, including eating meals and snacks and self-care activities. During times (30 to 120 minutes) when no activity was specifically scheduled, the participants were asked to engage in their normal daily routine as much as possible without specific suggestions. Sleep was defined as the period of time spent lying on a mattress at night between 9:00 pm and 6:00 am without any significant movement as determined by the floor (force platform) in the room calorimeter. The participants were instructed how to record their activities in a provided diary with a detailed schedule and a timeline. They checked off each scheduled activity and reported any episodes of accidental monitor nonwear intervals and other relevant information (e.g. treadmill speed) or comments. During the day, staff were available for assistance, and the diary was discussed with each participant after the study was finished.
Body weight was measured to the nearest 0.01 kg with a digital scale and height was measured using a wall-mounted stadiometer. The minute-to-minute EE was calculated from the rates of oxygen consumption and carbon dioxide production (33). Nonwear EE was calculated by summing EE measured by the room calorimeter during time intervals detected as nonwear by each algorithm.
PA was measured with a commercially available Actigraph GT1M accelerometer (ActiGraph, Pensacola, FL), calibrated by the manufacturer and placed on the anterior axillary line of the hip on the dominant side of the body. Among commercially available accelerometers, the Actigraph used in the present study provides consistent and high quality data, supported by its feasibility, reliability and validity (9). The monitor reports counts from the summation of the measured accelerations over a specified epoch (1). Actigraph data were collected at a 1-second epoch and summed as counts per minute.
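The summation of 1-second epochs into counts per minute can be sketched as follows. This is a minimal illustration, not the authors' program: the function name `counts_per_minute` and the choice to drop a trailing partial minute are assumptions made here.

```python
def counts_per_minute(counts, epoch_s=1):
    """Sum short-epoch counts into 1-minute totals.

    counts  : sequence of counts, one value per epoch
    epoch_s : epoch length in seconds (must divide 60)

    Assumption: a trailing partial minute is dropped rather than padded.
    """
    if 60 % epoch_s != 0:
        raise ValueError("epoch_s must divide 60")
    per_min = 60 // epoch_s                 # epochs that make up one minute
    n_full = len(counts) // per_min         # number of complete minutes
    return [sum(counts[m * per_min:(m + 1) * per_min]) for m in range(n_full)]


# Two full minutes of 1-s data plus 30 leftover seconds (dropped).
example = [1] * 120 + [5] * 30
print(counts_per_minute(example))  # [60, 60]
```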
Consistent with our experimental design, participants wore the accelerometer for the duration of their stay in the room calorimeter. Any time interval classified by the algorithm as nonwear was considered an error or misclassification. The current algorithm and its major components with the default settings are summarized in Table 2. We systematically evaluated: (1) threshold for nonzero counts allowed during a nonwear time interval; (2) minimum length of time window for consecutive zero counts (and nonzero counts below the threshold) to be considered a nonwear time interval; and (3) maximum length of artifactual movement interval allowed for nonzero counts during a nonwear time interval.
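To make the three components concrete, the following sketch gives one simplified reading of the current algorithm's criteria as named in this paper (100 counts/min threshold, 60-min window, up to 2 consecutive minutes of sub-threshold nonzero counts). It is not the NCI SAS program; the function name `classify_wear_current`, its parameter names, and the handling of interval boundaries are assumptions for illustration.

```python
def classify_wear_current(counts, window=60, threshold=100, max_interrupt=2):
    """Return a list of booleans, True = wear, one per minute.

    Simplified reading (assumption): a nonwear interval starts at a
    zero-count minute and continues through zeros; up to `max_interrupt`
    consecutive minutes with 0 < count <= threshold are tolerated. The
    interval ends at a count above the threshold or when the tolerated
    run is exceeded, and it counts as nonwear only if it spans at least
    `window` minutes.
    """
    n = len(counts)
    wear = [True] * n
    i = 0
    while i < n:
        if counts[i] != 0:
            i += 1
            continue
        j, end, interrupt = i, i, 0      # end = one past the last zero minute
        while j < n:
            c = counts[j]
            if c == 0:
                interrupt = 0
                end = j + 1
            elif c <= threshold and interrupt < max_interrupt:
                interrupt += 1           # tolerated sub-threshold minute
            else:
                break
            j += 1
        if end - i >= window:
            wear[i:end] = [False] * (end - i)
        i = max(end, i + 1)
    return wear


# Sitting still for an hour with a brief low-intensity movement in the middle.
trace = [0] * 30 + [40, 40] + [0] * 30
print(sum(not w for w in classify_wear_current(trace)))               # 62
print(sum(not w for w in classify_wear_current(trace, threshold=0)))  # 0
```

With the default threshold the whole 62-min stretch is flagged as nonwear even though the monitor was worn; lowering the threshold to zero removes that error in this example, which is the kind of trade-off evaluated in this study.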
The validity of the default threshold (100 counts/min) of the current algorithm was evaluated by examining accuracy of nonwear time interval classification for the thresholds ranging from 0 to 100 counts/min. The validity of the default 60-min time window was evaluated based on frequency distribution of the classified nonwear time intervals. We identified and tested two parameters to process the artifactual movement in the new algorithm: (1) artifactual movement interval defined as maximum time during a nonwear time interval for the detected movement to be classified as artifactual; (2) window-2 defined as the minimum time before and after an artifactual movement interval with no counts detected. We tested the artifactual movement interval ranging from 2 to 10 minutes and the window-2 from 15 to 60 minutes.
The current algorithm’s SAS program for classifying wear/nonwear time intervals was downloaded from the National Cancer Institute (22), and the current algorithm was also implemented using the programming language R (23). A randomly chosen subset of the adult and youth data was processed using the two programs, and the outputs were compared for validation.
The distribution of nonwear time intervals classified by each algorithm was examined using frequency histograms. The minimum classification error was used as the optimization target for the threshold. Nonwear time intervals classified during the waking period and during the 24-h room calorimeter stay were used to compare the algorithm’s performance. As a secondary analysis, the EE measured by the room calorimeter during the misclassified nonwear time intervals was calculated. The nonwear time intervals classified by each algorithm and the corresponding EE during these intervals were summarized for each individual, and the performances of the two algorithms were compared using Wilcoxon signed-rank tests. The percent of participants having at least one misclassified time interval with lengths ≥ 60-min was also calculated and compared using two-sample test for binomial proportions.
The same analyses were performed separately for adults and youth. Data are presented as means and standard deviations (SD), ranges, and/or interquartile range (IQR). The programming language R version 2.7.0 (23) was used to develop and implement the algorithms, and to perform the statistical analyses.
Figure 1 shows the distributions of time intervals incorrectly classified as nonwear by the current algorithm with threshold ranging from 0 to 100 (current threshold) counts/min during the waking period. Each plot also presents the percent of participants having at least one misclassified time interval with lengths of ≥ 60-min. The x-axis represents categories for the length (minutes) of misclassified nonwear time intervals and the y-axis represents their frequency (the number of nonwear time intervals at each x-axis category). The frequency of nonwear detection and the percent of the misclassified participants decreased as the count threshold decreased; the least misclassifications were obtained with zero-count threshold. To handle artifactual movement detection, the current algorithm allows a time interval with sporadically occurring nonzero counts within a 100-counts threshold to be classified as nonwear. As such, there is a trade-off between the two features; if the threshold is lower, the misclassification rate will be lower, but it may not be able to handle artifactual movement detection. To detangle these two features, we chose zero-count threshold (no-threshold), and added a new component as a separate feature (the 3rd component) for further algorithm improvement and development.
Figure 2 shows the distribution of nonwear time intervals during the waking period, classified by the improved algorithm (using zero-count threshold) with the 20-, 60- and 90-min time windows. For example, in adults the improved algorithm with a 20-min time window detected 83 nonwear time intervals from 20 to 29 minutes. The improved algorithm with a 60-min window detected 3 nonwear intervals with lengths of ≥ 60-min. The number of misclassified nonwear time intervals sharply decreased with the 60-min window for both adults and youth. The optimal window during the waking period was the 90-min window with no nonwear intervals detected in adults and one in youth. We do not recommend a larger time window, especially for the current algorithm, because a longer time window could increase the chance of detecting true nonwear time intervals as wear due to the current implementation method described below.
A wide time window-2 (e.g. 60-min) did not determine the artifactual movements that occurred within relatively shorter periods, and a short window (e.g. 15-min) caused frequent misclassification of some low intensity activities as nonwear time intervals (data not shown), prompting the choice of the 30-min window as the default. In handling artifactual movements, artifactual movement intervals from 2 up to 5 minutes performed similarly. We set the 2-min interval as the default, which is similar to the 2 consecutive intervals in the current algorithm. We confirmed the new algorithm’s ability to handle artifactual movement in two separate 3-day experiments (data not shown).
Table 2 presents a side-by-side comparison of the current algorithm and the proposed improvements. The new algorithm sets a zero-count threshold and a 90-min default time window-1, compared with the 100-counts threshold and 60-min time window in the current algorithm. The new components for artifactual movement detection include the 2-min artifactual movement interval and the up/downstream 30-min window-2.
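The three recommended elements can be sketched in a few lines. This is an illustrative reading of the criteria stated in the text, not the authors' R program; the names `classify_wear_new`, `window1`, `artifact_max`, and `window2` are hypothetical, and the rule that an artifact near the start or end of the record needs a complete 30-min window on each side is an assumption.

```python
def _runs(flags):
    """Yield (start, stop) for each maximal run of True values in flags."""
    start = None
    for i, f in enumerate(list(flags) + [False]):   # sentinel closes last run
        if f and start is None:
            start = i
        elif not f and start is not None:
            yield start, i
            start = None


def classify_wear_new(counts, window1=90, artifact_max=2, window2=30):
    """Return a list of booleans, True = wear, one per minute."""
    n = len(counts)
    zero_like = [c == 0 for c in counts]            # zero-count threshold

    # Treat a short nonzero run as artifactual movement only if it is
    # flanked by window2 minutes of zero counts on both sides.
    for start, stop in _runs([c != 0 for c in counts]):
        if stop - start > artifact_max:
            continue
        up = counts[max(0, start - window2):start]
        down = counts[stop:stop + window2]
        if (len(up) == window2 and len(down) == window2
                and not any(up) and not any(down)):
            for k in range(start, stop):
                zero_like[k] = True

    # Nonwear = at least window1 consecutive zero-like minutes.
    wear = [True] * n
    for start, stop in _runs(zero_like):
        if stop - start >= window1:
            for k in range(start, stop):
                wear[k] = False
    return wear


# Monitor left on a table for 101 min and nudged once at minute 50.
on_table = [50] * 10 + [0] * 40 + [5] + [0] * 60 + [50] * 10
print(sum(not w for w in classify_wear_new(on_table)))  # 101
```

Because every parameter is an argument, the same function can be rerun over a grid of window and interval values when searching for the setting with the fewest misclassified minutes.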
Figure 3 shows representative data sets for two adults (panel A) and two youth (panel B) during the 24-h room calorimeter stay and the nonwear time intervals misclassified by the current and new algorithms. Plots for the measured EE (gray lines) and the raw counts (black lines) were overlaid in a normalized scale from 0 to 1. The horizontal short solid lines and values above the lines show the length of the misclassified nonwear time intervals, and the dashed boxes represent the classified wear time intervals. Each individual’s waking periods are presented (thick solid lines); dotted lines corresponding to 1.5 MET are also added as a reference for sedentary PA periods. The plots show that the performance of the two algorithms differs mainly during sedentary PA periods (< 1.5 MET). The current algorithm falsely classified sedentary PA (< 1.5 MET) as a nonwear more often than the new algorithm, including during the waking period.
The total nonwear times misclassified by the two algorithms and the corresponding EE were calculated for each participant, and the summary statistics are presented in Table 3. The new algorithm significantly lowered the nonwear time misclassification for both adults and youth during the waking and 24-h periods (Wilcoxon signed-rank tests, all P <0.001). Similar improvements to the PA-related EE were also observed for both adults and youth during the waking period and the 24-h stay (Wilcoxon signed-rank tests, all P < 0.001). The current algorithm misclassified nonwear time of at least 60 minutes in more participants compared to the new algorithm (during the waking period: 47% versus 12% of adults, P = 0.0002; 22% versus 4% of youth, P = 0.0008). Although the new algorithm performed better than the current algorithm, overall performance of the current algorithm was relatively good during the waking period. This could be expected because the current algorithm was developed to classify nonwear/wear during non-sleep periods. The current algorithm appeared to perform better in youth compared to adults during the waking period but not during the 24-h stay; an explanation might be that youth are usually more active than adults during the waking period, but not during sleeping. Thus, our results suggest that the current algorithm would perform better for people (such as youth) with a more active lifestyle compared to people having a relatively sedentary lifestyle.
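The comparison of proportions reported above can be reproduced approximately with a pooled two-sample z-test. The sketch below is not the authors' R code. The participant counts (23 and 6 of 49 adults, 17 and 3 of 76 youth) are back-calculated here from the rounded percentages and group sizes, so they are assumptions, and the absence of a continuity correction is likewise assumed.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for binomial proportions (two-sided).

    Returns (z, p_value). No continuity correction is applied.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value


# Counts inferred from the rounded percentages in the text (assumption).
z_adult, p_adult = two_proportion_z_test(23, 49, 6, 49)    # 47% vs 12%
z_youth, p_youth = two_proportion_z_test(17, 76, 3, 76)    # 22% vs 4%
print(f"adults: z = {z_adult:.2f}, P = {p_adult:.4f}")
print(f"youth:  z = {z_youth:.2f}, P = {p_youth:.4f}")
```

Under these assumptions both P values come out below 0.001, in line with the values reported in the text.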
We confirmed that the nonwear time intervals classified by the SAS and our R programs were not different. However, we noted that the current algorithm implemented in SAS classifies nonwear time intervals on a 24-h basis (midnight-to-midnight). That is, the algorithm classifies nonwear time intervals until midnight and summarizes them for the day. It then classifies again starting from just after midnight and ending at midnight of the next day, and so on. Thus, for individuals who go to bed late, say after 11 pm, the algorithm can falsely classify this true nonwear time as wear because the interval before midnight is shorter than 60 minutes and therefore does not meet the time window criterion for nonwear time (see Figure SDC-1, Supplemental Digital Content 1, which illustrates this problem). Use of the current algorithm could therefore lead to misclassification of nonwear intervals < 60-min before midnight as wear. To avoid this problem, the new algorithm processes data continuously without the midnight break and adds a day stamp after the wear/nonwear classification.
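The midnight-break effect can be illustrated with a deliberately simplified classifier that flags any run of at least 60 consecutive zero counts as nonwear. This sketch is not the SAS or R program; the function names and the synthetic two-day trace (monitor removed at 23:20 and replaced at 07:00) are assumptions for illustration.

```python
MIN_PER_DAY = 1440


def nonwear_mask(counts, window=60):
    """True = nonwear for runs of >= window consecutive zero counts."""
    mask = [False] * len(counts)
    start = None
    for idx, c in enumerate(list(counts) + [1]):    # sentinel closes last run
        if c == 0:
            if start is None:
                start = idx
        else:
            if start is not None and idx - start >= window:
                mask[start:idx] = [True] * (idx - start)
            start = None
    return mask


def nonwear_mask_by_day(counts, window=60):
    """Classify each midnight-to-midnight segment independently."""
    mask = []
    for d in range(0, len(counts), MIN_PER_DAY):
        mask.extend(nonwear_mask(counts[d:d + MIN_PER_DAY], window))
    return mask


# Day 1: worn until 23:20 (minute 1400), then removed.
# Day 2: still off until 07:00 (minute 420), then worn.
two_days = [50] * 1400 + [0] * 40 + [0] * 420 + [50] * 1020

print(sum(nonwear_mask(two_days)))          # 460: full interval detected
print(sum(nonwear_mask_by_day(two_days)))   # 420: 40 min before midnight lost
```

Classifying the whole record first and assigning day stamps afterwards keeps the 40 minutes before midnight attached to the interval they belong to.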
In this study, we systematically evaluated the automated algorithm most commonly used in population-based studies to classify accelerometer wear and nonwear time intervals and proposed improvements to the algorithm. The recommended elements in the new algorithm are: 1) zero-count threshold during a nonwear time interval, 2) 90-min time window for consecutive zero/nonzero counts, and 3) allowance of a 2-min interval of nonzero counts with up/downstream 30-min consecutive zero counts windows for artifactual movement detection. These improvements would mostly affect the misclassification of time intervals spent in sedentary behaviors that do not pass the wear/nonwear classification criteria for the low activity counts. Thus, studies in populations with low-activity, highly sedentary PA patterns would likely benefit from these improvements.
In the current algorithm, classification of time intervals to wear/nonwear depends on three major criteria: 1) nonzero counts threshold, 2) time window for zero/nonzero counts, and 3) artifactual movement detection. In this study, to test the validity of nonzero counts threshold criterion, we examined counts threshold ranging from 100 to 0 counts/min since the 100-counts threshold is the default in the original version of the algorithm and specific for the accelerometer used in the analysis (31). The number of misclassified nonwear time intervals decreased as the counts threshold decreased, resulting in the optimal threshold at zero count. Thus, we have chosen the zero-count threshold for further testing and potential algorithm improvement. We also evaluated time windows; the number of misclassified nonwear time intervals sharply decreased until the current default 60-min window and reached an optimal 90-min for both youth and adults. However, due to the implementation method in the current algorithm, the 90-min window could increase false detection of wear time intervals around midnight. Thus, the 60-min time window was used for comparison of the current and new algorithms.
The third criterion in the algorithm is a procedure for proper classification of nonzero counts potentially caused by artifactual monitor movements during nonwear periods, which may be caused by accidental movement of the monitor (e.g. nudged or touched while sitting on a table or nightstand). During validation, we found that the current algorithm misclassified nonwear/wear time intervals, especially in sedentary behaviors (<1.5 MET). A plausible explanation is that it is difficult to distinguish between artifactual movement and a sporadic movement during sedentary PA. To mitigate this misclassification, in addition to the 2-min interval (artifactual movement interval) in the current algorithm, we included a second criterion termed window-2 in the new algorithm.
We validated the current algorithm using the programming language R (23) and compared it with the SAS program available from the NHANES (20) website using the same data sets. We found that both algorithms generated the same result when the entire monitoring study period is classified without daily segments. During the validation, we also examined the effect of daily summation of wear/nonwear classification with the midnight time break used by the SAS program. This is a potential source of misclassification of wear/nonwear time periods in cases when the actual wear stops and nonwear starts after 11 pm. Thus, we suggest that the minute-to-minute output for the entire monitoring study period be classified into wear/nonwear intervals and then further categorized into daily segments. This approach is implemented in the new algorithm.
To make the new algorithm applicable for specific studies and other types of accelerometer, the parameters in Table 2 (i.e. windows 1 & 2 and artifactual movement interval) can be defined by users depending on their experimental needs. In addition, our R program is readily applicable for data collected with various (e.g. 1-second) epochs and can be provided to interested investigators upon request.
A potential clinical consequence of wear time underestimation by the algorithm could be miscalculation of time spent in sedentary, light, moderate and vigorous PA categories, expressed as a proportion of the total wear time, which is very often used to quantify PA in population-based studies. In the context of the current study, we estimated that the observed mean of 50-min misclassification of wear time as nonwear during waking period could lead to approximately 8% underestimation of time spent in sedentary behaviors. Subsequently, the calculated percentage of time spent in light and/or moderate intensity activities would be overestimated. In addition, misclassification of wear/nonwear time could cause bias in the prediction of PA-related EE. Although the estimated EE daily bias might be relatively small, the differences could be substantially higher when extrapolated over longer periods such as week, month, or year in epidemiological and cohort studies. Assuming that 30% of total energy is spent for PA, we estimated that the difference in misclassification of wearing time during sedentary activities as rest (~60 kcal/day) would create an approximately 20 kcal/day gap between the two assessments. For an average person this would equate to ~7,000 kcal/year overestimation of energy spent on PA, the amount often linked to the current obesity epidemic and related health consequences (13).
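The extrapolation above can be written out explicitly. The first line is one possible reading of how the ~20 kcal/day figure follows from the stated 30% assumption and the ~60 kcal/day difference; the text does not give the intermediate step, so that reading is an assumption.

```latex
\begin{align*}
0.30 \times 60\ \text{kcal/day} &= 18\ \text{kcal/day} \approx 20\ \text{kcal/day},\\
20\ \text{kcal/day} \times 365\ \text{days/year} &= 7300\ \text{kcal/year} \approx 7{,}000\ \text{kcal/year}.
\end{align*}
```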
Thus, even modest increases in the accuracy of wear/nonwear time classification have the potential to improve our understanding of the relationships between PA, PA-related EE, and health outcomes. The proposed algorithm improvements might be especially important in cohort studies in which the baseline PA assessment is often linked to longitudinal health risks and disease outcomes (7, 15, 17, 24).
Our study has several strengths. First, the room calorimeter allowed us to validate wear/nonwear intervals and assess the clinical importance of the improvements by measuring PA-related EE. Second, we used a relatively large group of males and females with a wide range of ages (10 to 67 years old) and BMI (16–52 kg/m²). Finally, since our R program uses modifiable arguments, it could be easily modified for each study’s needs and adapted to other accelerometers (e.g. RT3, Actical, Actiwatch).
We also recognize that our methodology has some limitations. First, the standardization of activity bouts performed in the room calorimeter may not accurately represent individuals’ habitual daily PA patterns. This could include variations in sleep patterns and longer periods of sedentary PA that may be misclassified, causing overestimation of nonwear time intervals. Second, although we confirmed that adding criteria for artifactual movement detection improved the correct classification of artifactual movement, the proposed criteria should be validated in other studies with a larger number of participants in various free-living settings. Third, data were collected for a single 24-h period. A longer observation period (e.g. 36 h in total) would provide additional wear data from daytime activities. Finally, we did not find substantial differences in the criteria for the new algorithm between men and women or adults and youth. While it is possible that differences may emerge in larger studies, or in studies of older adults, our results do not suggest a compelling need for gender- or age-specific algorithms. More work is needed to verify and/or optimize the algorithm in studies of older adults.
In conclusion, we found that the classification of Actigraph nonwear and wear time intervals could be improved by modifying the currently used algorithm. The improvements include eliminating nonzero counts threshold during a nonwear time interval along with 90-min window for consecutive zero/nonzero counts, and handling artifactual movement detection using an additional component. Application of the improved algorithm in population-based studies may lead to a better prediction of time spent in PA and especially sedentary behaviors.
Disclosure of funding: National Institutes of Health (NIH).
We acknowledge the contribution of Elizabeth Booth, Stephane Daphnis, Cindy Dorminy, Kristen Jevsevar, Natalie Meade, Elizabeth Provenzano, Jason Rapaport, and Lauren Whitaker from the Energy Balance Laboratory as well as Clinical Research Center at Vanderbilt University Medical Center for help with conducting the study. We also thank Peggy Schuyler for editorial help.
This study was supported in part by R01 HL082988, R01 DK69465, the Vanderbilt Institute for Clinical and Translational Research (VICTR) grant 1UL1 RR024975 from NCRR/NIH, and Vanderbilt Diabetes Research and Training Center grant DK20593.
Conflict of Interest: none.
The results of the present study do not constitute endorsement by American College of Sports Medicine.