The absolute PA question had better test-retest reliability than the relative PA question. Paradoxically, evidence for convergent validity was stronger for the relative PA question than for the absolute PA question. For both questions, results indicated evidence for discriminant validity. Overall, the relative PA question had the best combination of test-retest reliability, convergent validity and discriminant validity. Specifically, there was moderate agreement when this question was re-administered seven days later, fair to moderate or good associations when compared with indicators of physical function, and little to no association when compared with measures hypothesized to be theoretically unrelated to PA. Although we were unable to evaluate the five-level form of the relative PA question, a previous study examining the validity of a similar question from the National Health Interview Survey (NHIS) found that very little was gained with the 5-level question compared to the 3-level question [12].
Indicators of physical function, often referred to as indirect measures of PA, have not been commonly used to evaluate the convergent validity of single-item PA questions in older adults, despite recommendations for their use [
16,
17]. One study, evaluating two different PA questions in older adults, examined convergent validity against indicators of health (i.e., health conditions such as heart attack, stroke, and diabetes). This study did not report any type of validity coefficients, making comparisons with our findings difficult [
9].
Two other studies, which evaluated an additional four PA questions in populations of older adults, examined validity by comparing the questions with summary measures from PA recall questionnaires. In the first study, a PA question designed to be used as a screening question in primary care was evaluated in a population of older women [
11]. This question, "As a rule, do you do at least half an hour of moderate or vigorous exercise (such as walking or sport) on five or more days of the week?", was compared to two summary scores from the New Zealand Physical Activity Questionnaire - Long Form. Results indicated moderate agreement (κ = 0.46 to 0.56). In the second study, three PA questions from the NHIS (job-related activity, main daily activity, and activity compared to peers) were compared with summary measures from a detailed PA question set [
12]. The main daily activity question asked, "How much hard physical work is required in your main daily activity? Would you say a great deal, a moderate amount, a little, or none?" The activity compared to peers question, "Would you say that you are physically more active, less active, or about as active compared to other persons your age?", was also expanded to a 5-level question with the following response options: a lot more, a little more, about the same, a little less, a lot less. For participants 65 years of age or older, correlation coefficients ranged from 0.17 to 0.21 for the main daily activity question and from 0.24 to 0.28 for the activity compared to peers question. The validity results from the present study, in particular for the relative PA question, were similar to or better than those of previous studies of single-item PA questions in older adults.
At least two studies have evaluated test-retest reliability of single-item PA questions in older populations. In the first study, researchers found intraclass correlation coefficients (ICCs) ranging from 0.75 to 0.80 for two PA questions that asked regular exercisers about their frequency and intensity of activity [
9]. Another study evaluated the test-retest reliability of three different PA questions (work PA, strenuous PA, and moderate PA) in a sample of participants from the Canadian Multicentre Osteoporosis Study [
10]. The kappa statistic was 0.57 (0.47 to 0.68) for the strenuous PA question and 0.30 (0.23 to 0.37) for the moderate PA question.
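For readers less familiar with the kappa statistic reported in these studies, the following is a minimal sketch of how unweighted Cohen's kappa can be computed from paired test and retest responses. The function name `cohen_kappa` and the ten example responses are hypothetical and are not data from any of the studies cited; the cited studies may also have used a weighted form of kappa, which this sketch does not implement.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two paired lists of categorical responses."""
    if len(first) != len(second) or not first:
        raise ValueError("responses must be paired and non-empty")
    n = len(first)
    # Observed agreement: share of participants giving the same answer twice.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    count_1, count_2 = Counter(first), Counter(second)
    expected = sum(count_1[c] * count_2[c] for c in count_1) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical answers from ten participants at baseline and seven days later.
baseline = ["more", "more", "same", "same", "same", "less", "less", "more", "same", "less"]
retest = ["more", "same", "same", "same", "less", "less", "less", "more", "same", "same"]

kappa = cohen_kappa(baseline, retest)
print(round(kappa, 2))
```

In this illustrative example, seven of the ten participants gave the same answer on both occasions (observed agreement 0.70), while agreement expected by chance from the marginal distributions is 0.35, so kappa corrects the raw agreement downward.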
Reliability results achieved in the present study for the relative PA question were similar to or better than those reported by Nadalin et al. [
10] but worse than those reported by Davis et al. [
9]. Comparing the results in the present study to those reported by Davis et al. [
9] is problematic, however, since the PA questions evaluated in that study were posed only to participants who had already reported engaging in regular exercise.
Indicators of physical function have been used to evaluate the convergent validity of many PA recall questionnaires designed for older adults. For a number of the most well-known questionnaires, evidence for convergent validity is not substantially stronger than that obtained in this study; in fact, in some instances, the relative PA question evaluated in this study performed better. For example, correlations between summary scores from the Community Healthy Activities Model Program for Seniors (CHAMPS) Physical Activity Questionnaire and various measures of physical functioning ranged between 0.10 and 0.54 [16, 18-20]. For the CHAMPS Physical Activity Questionnaire and the Yale Physical Activity Survey, test-retest reliability was evaluated over an interval similar to that of this study (one to two weeks), and ICCs ranged from 0.55 to 0.79 [
18,
19,
21].
The intent of both the absolute and the relative PA questions was to quickly and easily classify older adults by their activity level. Because the relative PA question does not reference specific details of frequency, duration and intensity, it will remain applicable for assessment even when PA recommendations for older adults are revised, as occurred in the United States in 2007 [22] and in Canada in 2011 [23]. The relative PA question may also be less prone to recall error than the absolute PA question, since participants do not need to remember the duration or frequency of their typically performed activities.
In general, people tend to over-report their PA levels [24]. The self-reported health literature notes that, with increasing age, people tend to overestimate their own health when comparing themselves to others or, alternatively, to underestimate the health of others [
14]. Thus, it is plausible that the participants in this study may have overestimated their PA, and perhaps to a greater extent when responding to the relative PA question. This should be kept in mind when interpreting the results of this study and when considering the merits of measuring PA using an absolute or a relative question.
The present study has other limitations. Participants were Canadian veterans of World War II or the Korean War and their caregivers, a highly selected group of older adults. In addition, some of the validation measures were available only for participants who had reported at least one modifiable fall risk factor. The Project to Prevent Falls in Veterans (PPFV) began as a randomly selected sample; however, only 13% of the original participants were included as part of the risk factor modification trial, and a smaller percentage completed the second clinical assessment and the final telephone interview. As a result, it is likely that the participants included in this study differ from the general population of older Canadian veterans and their caregivers. Caution should be taken in generalizing results from our study to populations that may differ clinically and demographically.
The present analyses were conducted because the available data permitted these comparisons; they were not part of a validation study planned a priori. It is therefore possible that the modest validity correlations achieved may be partially due to the measures selected for validation. Since there is no widely accepted criterion of PA [24], we chose to evaluate the convergent validity of two single-item PA questions by comparing them with indicators of physical functioning. We recognize, however, that capacity to perform PA does not equal actual performance. As a result, correlation coefficients indicating more than a moderate association may not be possible when using indicators of physical functioning as validation measures. A related limitation is that while some of the indicators of physical function were objectively measured performance-based outcomes, others were measures of self-reported functional ability. Self-reported measures can be affected by factors such as cognitive impairment and guessing among older populations [
25]. Additionally, it would have been preferable if the indicators of physical functioning were measured at the same time as the PA questions. Even so, we hypothesize that any resulting bias is likely toward the null, indicating that correlations may have been stronger if these measures had been conducted closer in time.