This study had two main findings. First, the cut points derived from the different statistical analyses deviated considerably from each other. Second, most cut points were substantially lower than those found in earlier studies. This was a result of using individual MET values, rather than the standardized value of 3.5 mL/kg/min, to define walking intensity and of the older age of our sample compared to earlier studies. Together, this indicates that age- and weight-specific cut points may be appropriate.
Most calibration studies have used an OLR analysis to establish cut points for the work rate. This approach is inappropriate because each subject is measured repeatedly, which violates the independence assumption of this statistical procedure. Therefore, Welk [28] has suggested applying a MIX analysis or ROC curves to analyze data from such calibration protocols. Consistent with the present study, however, Welk found that the cut points derived from the regression analyses were fairly similar, while the results from the ROC analyses deviated from the regression models. Further, in the present study, the cut points yielded quite different results for sensitivity and specificity. By definition, cut points derived from the ROC analyses had a balanced tradeoff between sensitivity and specificity. However, the cut points derived from the regression analyses for MPA yielded high sensitivity (>0.90) and low specificity (<0.51), while those for VPA yielded low sensitivity (<0.56) and high specificity (>0.91).
However, a ROC analysis based on maximized sensitivity and specificity may not have been the method of choice for the present study due to the large class-skewing effect, especially for the VPA cut point [32]. Therefore, one could argue that applying an accuracy definition (ROC 2) that accounts for both the positive and negative cases would be a more balanced and thus the most appropriate approach. However, this was seemingly not the case, as indicated by the suggested cut point for VPA of 7220 counts/min using the ROC 2 approach. As shown in , only one person exceeded accelerometer counts of 7000 counts/min, and this cut point clearly does not fit the observations well. Thus, this result is not meaningful. Due to the lack of data at high intensities (at or above 6 METs), we also analyzed cut points for 5 METs, where data were less skewed (not shown). For this intensity threshold, the results from the ROC 2 analysis were consistent with those of the regression analyses. This finding demonstrates that the ROC 2 analysis worked well for intensities that had more observations. Thus, because the ROC models are sensitive to class skew and to the number of observations near the threshold of the state variable, they will perform poorly if few subjects reach the 6 MET level, which could be a problem for calibration studies in populations with low fitness levels. However, neither a higher walking speed (>6 km/h) nor running was suitable in the present study, which was performed in severely obese subjects. Regression models would probably be a better choice in such cases, as extrapolation in these models will work reasonably well if the data can be assumed to be more or less linear within a narrow range. In addition, regression models allow for examination of covariates, which makes these models more informative and useful. As such, we preferred to use the regression models to establish the cut points in our sample.
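The sensitivity of ROC-derived cut points to class skew can be illustrated with a minimal Python sketch. It is not the analysis used in this study: the exact ROC 1 and ROC 2 definitions are not reproduced in this section, so the sketch assumes Youden's index (sensitivity + specificity - 1) for the first criterion and overall accuracy for the second. The data are synthetic and purely illustrative, and all function and variable names are hypothetical.

```python
def confusion(counts, is_positive, cut):
    """Return (tp, fn, tn, fp) when 'count >= cut' is classified as positive."""
    tp = fn = tn = fp = 0
    for c, positive in zip(counts, is_positive):
        predicted = c >= cut
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def youden(counts, is_positive, cut):
    """Assumed stand-in for ROC 1: sensitivity + specificity - 1."""
    tp, fn, tn, fp = confusion(counts, is_positive, cut)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1


def accuracy(counts, is_positive, cut):
    """Assumed stand-in for ROC 2: share of all cases classified correctly."""
    tp, fn, tn, fp = confusion(counts, is_positive, cut)
    return (tp + tn) / len(counts)


def best_cut(counts, is_positive, criterion):
    """Pick the observed count value that maximizes the given criterion."""
    candidates = sorted(set(counts))
    return max(candidates, key=lambda cut: criterion(counts, is_positive, cut))


# Synthetic, skewed data: only 3 of 11 observations reach the VPA level.
counts = [1000, 2000, 3000, 4000, 4500, 5000, 5200, 5500, 6000, 6500, 7200]
is_vpa = [False, False, False, False, True, False, True, False, False, False, True]

cut_youden = best_cut(counts, is_vpa, youden)
cut_accuracy = best_cut(counts, is_vpa, accuracy)
print(cut_youden, cut_accuracy)
```

With these synthetic data the accuracy criterion selects the highest observed count, because classifying almost every case as negative is rewarded when positives are rare. This mirrors the pattern described above for the ROC 2 cut point for VPA.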
Regarding which regression model to use, two arguments can be made in favor of the MIX model. First, the repeated-measures structure of the data did not meet the assumptions underlying the OLR procedure; thus, the MIX model is clearly the correct method. Second, a significant quadratic relationship between counts/min and MET was found using the MIX model. This was not captured by the OLR model, in which dependency among the observations was ignored. As such, we argue that a linear mixed model should be used in future calibration studies of this type. Hence, the results obtained from the mixed model are discussed in the following sections.
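Once the fixed-effect coefficients of such a model are available, a cut point follows by solving the fitted curve for the MET threshold of interest. The Python sketch below assumes that MET was regressed on counts/min with an optional quadratic term, MET = b0 + b1·x + b2·x², and that the smallest non-negative root is the relevant one. The coefficients in the usage lines are hypothetical and are not the estimates from this study.

```python
import math


def cut_point(target_met, b0, b1, b2=0.0):
    """Counts/min at which the fitted curve b0 + b1*x + b2*x**2 equals target_met.

    Assumption: the smallest non-negative root is the meaningful cut point.
    """
    if b2 == 0.0:
        # Linear model: a single solution.
        return (target_met - b0) / b1
    discriminant = b1 ** 2 - 4.0 * b2 * (b0 - target_met)
    if discriminant < 0:
        raise ValueError("the fitted curve never reaches the target MET")
    root = math.sqrt(discriminant)
    solutions = [(-b1 + sign * root) / (2.0 * b2) for sign in (1.0, -1.0)]
    valid = [x for x in solutions if x >= 0]
    if not valid:
        raise ValueError("no non-negative counts/min value reaches the target MET")
    return min(valid)


# Hypothetical coefficients, for illustration only.
mpa_linear = cut_point(3.0, b0=2.0, b1=0.001)
mpa_quadratic = cut_point(3.0, b0=2.0, b1=0.0005, b2=1e-7)
print(round(mpa_linear), round(mpa_quadratic))
```

Covariates such as age or weight could be handled in the same way by folding their contribution into the intercept before solving.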
The cut points in our study were ~1400 (MPA) and ~1000 counts/min (VPA) lower than those found in earlier calibration studies using treadmill walking and running in young normal-weight subjects [3–7]. This was mainly an effect of using individual MET values instead of the standardized value of 3.5 mL/kg/min and of having an older sample. As the resting metabolic rate expressed per kg body weight declines with increasing BMI or fat mass [18], the metabolic cost of PA will be systematically biased toward an underestimation in obese individuals when the standardized value is used. Thus, we believe our correction for individual resting metabolic rate in this sample is an important step forward. Such correction is also recommended in children to avoid an overestimation of PA level due to their higher resting metabolic rate (4–6 mL/kg/min) [25, 26]. An alternative strategy is to assign different MET cut points (e.g., 4 and 7 METs for MPA and VPA, resp., in children [33]) to compensate for differences in resting values. However, this procedure will not capture individual differences, so we believe the use of individual MET values is a more valid approach.
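The effect of the reference value can be made concrete with a short arithmetic sketch in Python. The resting values of 3.26 and 2.94 mL/kg/min are the mean values reported for men and women in this study; the walking oxygen uptake of 10.5 mL/kg/min is a hypothetical figure chosen so that the standardized calculation gives exactly 3 METs.

```python
STANDARD_RESTING_VO2 = 3.5  # mL/kg/min, the conventional 1 MET value


def met(vo2, resting_vo2=STANDARD_RESTING_VO2):
    """Express an oxygen uptake (mL/kg/min) as a multiple of a resting value."""
    return vo2 / resting_vo2


walking_vo2 = 10.5  # hypothetical oxygen uptake during walking, mL/kg/min

standard_met = met(walking_vo2)          # relative to 3.5 mL/kg/min
men_met = met(walking_vo2, 3.26)         # relative to the mean resting value in men
women_met = met(walking_vo2, 2.94)       # relative to the mean resting value in women
print(round(standard_met, 2), round(men_met, 2), round(women_met, 2))
```

Because the individual resting values are below 3.5 mL/kg/min, the same walking effort corresponds to a higher MET value, so the 3 and 6 MET thresholds are reached at lower counts/min.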
Our results were expected, because both age and weight increased the metabolic cost of walking without a corresponding increase in the accelerometer counts, as also shown by others [16, 17]. This is partly consistent with an earlier study in moderately obese middle-aged to older subjects (BMI 31 ± 5.17; age 62.6 ± 6.5 years), which found cut points of 1240 and 2400 counts/min for MPA and VPA, respectively [10]. However, Lopes et al. [10] reported a very low VPA cut point. This could (at least partly) be explained by their substantially steeper slope for counts/min compared to earlier studies (0.0013 versus ~0.0006–0.0008 [3–7]), which could be due to their use of a single accelerometer unit. This reduces the external validity of their results.
We found a marked effect of age, with increased age giving lower cut-point thresholds. This is a result of age being positively related to the relative oxygen consumption (indicating lower work economy with increased age) and negatively related to counts/min (indicating lower trunk vertical accelerations with increased age) during walking. The lower work economy with increased age is in line with previous studies [14, 15]. However, the negative effect of age on accelerometer counts contrasts with an earlier study [17], which did not find any difference between groups of 20–29-, 40–49-, and 60–69-year-old subjects over a wide range of speeds. We have no explanation for this discrepancy. Clearly, this indicates a need for future calibration studies that include large samples of men and women differing in body size and age.
A possible factor explaining some variation between studies may be the use of different generations of the Actigraph accelerometer, given that all studies in normal-weight subjects have used the earlier CSA 7164 model. Although the GT1M model has been shown to be slightly less responsive to low accelerations and walking speeds compared to earlier models, this difference is probably of minor importance for explaining the contrasting results [34–36].
The attachment of single-axis accelerometers becomes more difficult as body fat increases because the abdominal fat mass may increase the likelihood of accelerometer tilting. This effect results in lower count values [37] and may have led to an underestimation or a larger count variation in the present study than has been seen in studies of normal-weight subjects. Earlier studies using walking protocols have reported similar (0.59) [8] or somewhat larger (0.74) [7] explained variances in energy expenditure. These findings indicate the possible influence of instrument tilting, although gait patterns, as observed in the lab, are probably more important for explaining this variation in the data.
One important implication of our findings is that, in terms of energy expenditure, obese and middle-aged to older individuals may in fact be more active than is currently believed. Current evidence suggests that overweight, obese, and older individuals are less active than normal-weight and young individuals [33, 38–40]. For example, an analysis of the free-living PA level in our nonrepresentative sample of obese subjects shows that 14% exceeded the recommended PA level (30 minutes of MPA/day in bouts of 10 min) using the MPA cut point of 2020 counts/min [11] and that 69% exceeded this level using our established cut point of 612 counts/min. If our findings are valid, comparison of PA level between different populations using the same cut points is problematic and may systematically bias the results toward an underestimation in middle-aged, older, overweight, and obese subjects.
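How strongly the choice of cut point affects whether a person meets the recommendation can be sketched in Python. The example assumes that a bout is a run of at least 10 consecutive minutes at or above the cut point, with no allowance for interruptions; the bout rules actually applied in the free-living analysis are not described in this section. The minute-by-minute record is synthetic, and only the two cut points (2020 and 612 counts/min) come from the text.

```python
def bout_minutes(counts_per_min, cut, min_bout=10):
    """Total minutes spent in runs of at least min_bout consecutive minutes >= cut."""
    total = 0
    run = 0
    for count in counts_per_min:
        if count >= cut:
            run += 1
        else:
            if run >= min_bout:
                total += run
            run = 0
    if run >= min_bout:  # close a bout that lasts until the end of the record
        total += run
    return total


def meets_recommendation(counts_per_min, cut, required_minutes=30):
    """True if bouted minutes at or above the cut point reach the daily target."""
    return bout_minutes(counts_per_min, cut) >= required_minutes


# Synthetic record: 35 min of slow walking, 20 min of rest, 12 min of brisk walking.
day = [800] * 35 + [100] * 20 + [2500] * 12

minutes_2020 = bout_minutes(day, 2020)
minutes_612 = bout_minutes(day, 612)
print(minutes_2020, meets_recommendation(day, 2020))
print(minutes_612, meets_recommendation(day, 612))
```

In this synthetic record the same day fails the recommendation under the 2020 counts/min cut point and meets it under the 612 counts/min cut point, which is the kind of reclassification behind the 14% versus 69% figures above.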
4.1. Strengths and Weaknesses
The main strength of the present study was the comparison of three different statistical methods for determining the cut points from the accelerometer data. Further, we believe the inclusion of a sample varying in age and obesity status strengthens the validity of our results, although we did not include a young normal-weight comparison group. Importantly, our sample contrasts with the samples of earlier studies performed in young normal-weight subjects and advances our understanding of PA measurements using accelerometry.
One weakness of the present study may have been the protocol used for establishing the individual resting oxygen consumption. Compared to the findings of Byrne et al. [18], the 1 MET values in the present study seem to have been overestimated. By applying the suggested regression equation from that study and correcting for sitting (multiplying by a factor of 1.08) in our sample, 1 MET values of 2.60 and 2.31 mL/kg/min would be predicted for men and women, respectively. These values are lower than our corresponding findings of 3.26 and 2.94 mL/kg/min, which is likely partly due to the fewer restrictions that we placed on subjects prior to testing and the relatively short duration of the resting protocol (10 minutes). This effect may have led to overestimation of our 1 MET values and our suggested cut points. A second weakness may be that the GT1M accelerometer has been replaced with new tri-axial accelerometers (GT3X/GT3X+), which may reduce interest in our findings. However, cut points established in earlier accelerometer models are valid for use with the vertical axis of the new accelerometers, and only one study has to date published vector magnitude cut points [41]. A third weakness is that the study was not powered to explore sex differences. Although we did not aim to compare men and women, the analysis indicated that differences between the sexes may exist (not shown), which could have been revealed by a balanced or larger sample. Previous studies have reported similar cut points in men and women [3–5, 7]. However, those studies used a standardized MET value as the reference for metabolic cost. The difference in resting metabolic rate between men and women in the present study challenges the use of common cut points, as lower resting metabolic rate in the women causes somewhat lower cut points (481 and 4717 counts/min for MPA and VPA, resp.) than in the male group (1067 and 5314 counts/min for MPA and VPA, resp.) (calculated with the MIX approach). Because 74% of the subjects in the present study were women, it should be kept in mind that the cut points reported might pertain to women more than men. However, cut points for the mixed group of men and women are reported so that the results are comparable with previous studies.
Further research should determine intensity cut points in a larger sample of men and women who vary in their age and degree of obesity. Additional studies of this type should also be performed using the linear mixed model regression procedure. Finally, calibration studies should be continuously performed as new makes and models of the accelerometer are brought into use.