The workgroup discussions summarized in this paper focused on: 1) recommendations for researchers, practitioners, and organizations to minimize errors with self-report questionnaires; 2) identifying information that questionnaire users should know and/or should ask to reduce measurement error; and 3) providing suggestions to enhance the use of questionnaires to assess physical activity behavior in research and practice settings.
A conceptual framework for reducing errors with self-report assessment of physical activity is presented to summarize the workgroup discussions. The framework identifies six steps of the physical activity assessment process at which efforts might be made to reduce measurement error with self-report: identifying the need to measure physical activity, selecting an instrument, collecting data, analyzing data, developing a summary score, and interpreting data. Underlying the first four steps are behavioral parameters that comprise the primary aspects to be assessed in physical activity questionnaires. The first four parameters describe the activity itself (type, intensity, frequency, and duration), and the remaining two are the domain and location of activity participation. A research team and/or investigator must have a thorough understanding of all steps in the framework to minimize errors related to self-reported physical activity assessment. Because the workgroup endorsed the overarching focus of the framework, the findings are presented across the six steps of the assessment process.
Step 1 - Identifying Need
The goal of this step is to reduce investigator error by having a clear understanding of the purpose of the questionnaire application, the setting where the behaviors occur, and knowing which behaviors should be assessed. Researchers should have a clear idea of the physical activity behavior focus of their study, whether they intend to relate the assessment to a particular outcome(s), the target population for assessment, and the characteristics of the physical activity environment in which the behavior can be measured [2].
The type of study performed and the study aims should dictate the questionnaire to be employed. Surveillance systems aim to identify what proportion of a population is meeting a set of parameters, most often national recommendations for health-enhancing physical activity. Intervention studies aim to determine the effectiveness of a particular strategy in increasing physical activity. Etiology studies seek to understand how physical activity behaviors modify the physiological mechanisms involved in the disease process, or alter the risk of developing incident diseases or adverse health outcomes, such as diabetes or disability. Screening is used to triage persons in the clinical setting, to determine eligibility for research studies, or to identify persons who need specific interventions to modify their physical activity behaviors. Specific recommendations for identifying the need to assess physical activity behaviors were: (a) matching the questionnaire format with a viable and valid interpretation of the resulting summary score of that questionnaire, and (b) being clear about the types of behaviors measured and how they are expected to relate to the outcome measures of interest. Appropriate questions to ask are: Is the questionnaire needed for surveillance, for an intervention, or to uncover associations between physical activity and disease? Who is the population to be measured? Are there characteristics of the environment that need to be considered, such as weather or location?
Step 2 - Instrument Selection
The goal of this step is to reduce errors made by the investigator and interviewers by ensuring that the questionnaire is appropriate for the abilities and interests of the target population. Most importantly, is there evidence that the questionnaire is a valid assessment instrument for the population of interest? In selecting an instrument, researchers should consider factors that may influence the ability of the respondents to answer the questions and the relevance of the questionnaire to the study question.
Prior to use, every physical activity questionnaire should be evaluated for its psychometric properties. This includes selecting a questionnaire that supports the interpretation sought by the study (i.e., making proper inferences in light of the validity of the questionnaire). This evaluation includes an assessment of existing validity evidence to determine the constructs of physical activity that the questionnaire captures (construct validity evidence), how the questionnaire information converges with other sources of data shown to reflect different types of physical activity (convergent validity evidence), how well the types of activities included in a questionnaire reflect the types of physical activity of interest (content validity evidence) [3], and whether a questionnaire is able to capture changes in physical activity behaviors over time, also referred to as sensitivity to change [4, 5]. The reliability evidence, defined as the consistency of a measurement from one time to another, also must be known prior to selecting a physical activity questionnaire, because it also influences the accuracy of a score [6]. The reliability of the questionnaire or instrument should not be confused with the stability of the respondents' physical activity behaviors.
A questionnaire should be appropriate for the survey population and specific to the behaviors assessed. Workgroup members proposed two ways to identify appropriate questionnaires to assess the physical activity behaviors identified for a study purpose: (a) item banks and (b) questionnaire clearinghouses. An item bank is a particularly attractive idea because researchers could select from a collection of questions organized by the content of each item as well as by the measurement characteristics of the item, such as validity, reliability, recall period, cognitive difficulty, and whether items share the same response scale. An item bank may also be useful for computerized administration of self-report instruments, in which items are presented in a decision-tree format based on a respondent's previously reported physical activity behaviors. If the questions in an item bank are independently validated, subsets of items may be assembled for use in specific settings or to answer specific questions, such as identifying the types of behaviors associated with increased obesity risk. An example of an item bank for chronic disease is the PROMIS network, designed by the National Institutes of Health for research on patient-reported health status involving physical, mental, and social well-being [7]. In contrast, questionnaire clearinghouses may provide valid and reliable self-report instruments tailored for specific populations and settings. Clearinghouses may serve to prevent the unnecessary development of new questionnaires to assess physical activity and encourage researchers to evaluate the psychometrics of existing questionnaires for any study purpose. The Physical Activity Resource Center for Public Health provides a clearinghouse for physical activity instruments and interventions [8]. A major limitation of a clearinghouse of existing questionnaires is that comparing scores across different questionnaires is difficult unless score equivalencies are established, an issue that has yet to be solved.
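The item-bank idea can be illustrated with a minimal Python sketch. The `Item` fields, the item wording, and the reliability values below are hypothetical and are not drawn from PROMIS or any existing bank. The sketch shows only that storing measurement characteristics alongside item content lets a researcher filter the bank for a given population and study design.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Item:
    """One question in a hypothetical physical activity item bank."""
    item_id: str
    text: str
    domain: str          # e.g. "leisure", "occupational", "household"
    intensity: str       # e.g. "moderate", "vigorous"
    recall_days: int     # recall period in days
    reliability: float   # hypothetical test-retest coefficient
    validated_in: frozenset = field(default_factory=frozenset)  # populations with validity evidence

def select_items(bank, population, domains, max_recall_days, min_reliability):
    """Keep items with validity evidence in `population` that fit the study's needs."""
    return [
        item for item in bank
        if population in item.validated_in
        and item.domain in domains
        and item.recall_days <= max_recall_days
        and item.reliability >= min_reliability
    ]

bank = [
    Item("LT01", "On how many of the past 7 days did you do vigorous leisure-time activities?",
         "leisure", "vigorous", 7, 0.80, frozenset({"adults", "older adults"})),
    Item("LT02", "On how many of the past 7 days did you walk for leisure?",
         "leisure", "moderate", 7, 0.65, frozenset({"adults"})),
    Item("HH01", "In the past month, how many hours per week did you do heavy housework?",
         "household", "moderate", 30, 0.72, frozenset({"adults", "older adults"})),
]

chosen = select_items(bank, population="older adults", domains={"leisure", "household"},
                      max_recall_days=7, min_reliability=0.70)
print([item.item_id for item in chosen])  # ['LT01']
```

A real bank would also carry the response scale and cognitive-difficulty ratings mentioned above. The same filtering logic could drive the branching of a computerized, decision-tree administration.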
Questionnaires are used in a variety of study settings. Questionnaires used for screening should be short and focused on the types of behaviors to be assessed. They should also correctly and efficiently classify persons by level of activity or, perhaps, by the time they spend in specific types of physical activities. The workgroup members recommended that surveillance questionnaires reflect the types of behaviors to be tracked and include information that maximizes understanding of the questionnaire. If the behaviors can be tracked with short, easily recalled questionnaires, surveillance instruments should ask short questions and focus on activities that are easy to recall (i.e., vigorous-intensity and structured activities). They should also allow respondents to report individual activities rather than grouping types or intensities of activities into one question, and use words that best capture the behaviors of interest. The questionnaires also should have introductions that maximize understanding of a questionnaire's objectives and that capture the respondent's interest and commitment to answer the entire questionnaire.
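How correctly a screening questionnaire classifies persons is commonly summarized by sensitivity and specificity against a criterion classification, for example one derived from accelerometry. The sketch below is illustrative only. It assumes both classifications have already been dichotomized (active vs. not active), and the ten responses are hypothetical.

```python
def classification_accuracy(screen, criterion):
    """Sensitivity, specificity and overall agreement of a screening questionnaire.

    screen, criterion: equal-length sequences of booleans, True = classified "active".
    """
    if len(screen) != len(criterion):
        raise ValueError("screen and criterion must have the same length")
    pairs = list(zip(screen, criterion))
    tp = sum(1 for s, c in pairs if s and c)          # active by both
    tn = sum(1 for s, c in pairs if not s and not c)  # inactive by both
    fp = sum(1 for s, c in pairs if s and not c)      # questionnaire over-classifies
    fn = sum(1 for s, c in pairs if not s and c)      # questionnaire misses an active person
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "agreement": (tp + tn) / len(pairs),
    }

# Hypothetical screening answers vs. a criterion classification for ten people
screen    = [True, True, True, False, False, True, False, False, True, False]
criterion = [True, True, False, False, False, True, True, False, True, False]
print(classification_accuracy(screen, criterion))
# {'sensitivity': 0.8, 'specificity': 0.8, 'agreement': 0.8}
```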
Questionnaires used in intervention studies need to be able to characterize behaviors that differentiate persons with and without a particular health outcome. They also should be sensitive to changes in physical activity behaviors, although few questionnaires are truly able to detect such changes [9, 10]. The workgroup members recommended more research to identify questionnaires that are sensitive to change in behaviors. As it currently stands, many questionnaires are used for purposes other than those for which they were developed, and without validity evidence to support the inferences made with the instrument. For example, the International Physical Activity Questionnaire (IPAQ) was developed for use in surveillance settings but has been used to assess changes in physical activity behaviors without tests of its sensitivity to change [11].
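One common way to quantify sensitivity to change (responsiveness) is the standardized response mean (SRM): the mean within-person change divided by the standard deviation of that change. The sketch below uses hypothetical pre- and post-intervention minutes/week. The SRM is offered as one illustrative index, not as a statistic the workgroup prescribed.

```python
from statistics import mean, stdev

def standardized_response_mean(before, after):
    """SRM = mean(change) / SD(change) for paired pre/post questionnaire scores."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    change = [a - b for b, a in zip(before, after)]
    return mean(change) / stdev(change)  # sample SD of the change scores

# Hypothetical minutes/week reported before and after an intervention
before = [60, 90, 120, 30, 150]
after  = [90, 100, 160, 70, 170]
print(round(standardized_response_mean(before, after), 2))  # 2.15
```

A questionnaire that is insensitive to change would yield an SRM near zero even when a criterion measure shows real change in the same participants.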
Questionnaires should be culturally relevant and appropriate to the cognitive abilities of the respondent population (e.g., children). For persons with lower reading and numeracy levels, questions should avoid wording that combines multiple activities and/or behaviors into a single question or that asks respondents to average the time spent in multiple activities or across many days. Questionnaire items should relate to the types of physical activities that respondents value and perform. The lead investigator and interviewers should be familiar with the types of questionnaires available so that they can match their study needs with the respondents' abilities to complete the questionnaire as designed. Because there is no questionnaire design that is ideal for all study populations, nor will there ever be one, a bank of valid and reliable questionnaires or questions may be useful to reduce measurement error.
Specific recommendations for selecting a questionnaire were to: (1) match the questionnaire with the study purpose (e.g., ensure the instrument can validly and specifically correspond to the outcome(s) under investigation), (2) develop clearinghouses of valid and reliable items/questionnaires for use in various settings, and (3) assess and adapt questionnaires as needed for use in different cultures and languages using standard translation methods [12]. Thus, appropriate questions to ask are: Is the questionnaire a good match for the participants relative to the types of activities they perform and find relevant? Is the questionnaire interviewer-administered or self-administered? A questionnaire designed for interviewer administration may not have the same validity and reliability properties when it is self-administered. Are the resources needed to use an interviewer-administered questionnaire available? Has the questionnaire been subjected to a thorough psychometric evaluation of its validity and reliability in populations similar to the target population? Do the validity data support the interpretation that is sought by the study (i.e., making proper inferences based on the validity of the questionnaire)? What are the cognitive demands of the questionnaire relative to the recall of information? Does the literacy level of the participants match the literacy and cognitive demands of the questionnaire? Are the questions to be asked appropriate for the cultural experiences of the target population? Has the validity of the questionnaire changed following translation to a different language?
Step 3 - Data Collection
The goal of this step is to reduce reporting errors made by the participants and to assure that interviewers, if there are any, are effective and consistent in their questionnaire delivery methods. To assure fidelity at this step, researchers should identify elements of the data collection process that may increase measurement error. These include using the wrong time frame for the recall of physical activity, using recall prompts with jargon that respondents do not understand, providing inappropriate examples, and using incorrect modes of administration (e.g., using self-report methods when questionnaire instructions require interviewer-administered methods). Interviewers need to be trained and tested, and the consistency of their interviewing monitored.
These elements have the potential to increase inter- and intra-participant error, resulting in wide variability of summary scores within the group and inconsistency of responses within an individual over repeated measures. The workshop participants noted that reducing error in the data collection step may be extremely difficult. In fact, some have argued that it may not be possible to reduce the error substantially, and self-reports are often regarded as better suited for ranking individuals than for quantifying their absolute level of activity [13].
Although workshop participants noted that measurement error associated with data collection is not unique to the field of physical activity, they considered it important to address some of the data collection issues to help reduce errors and improve the accuracy of self-report in estimating true behavior. One way questionnaire developers have tried to reduce variability in data collection is to have trained interviewers administer the questionnaire in a standardized format, clarifying terminology or an item's intent for the respondent when a question is misunderstood. For example, an interviewer may be able to help a respondent distinguish between light, moderate, and vigorous intensity physical activity if the respondent appears to be confused by the definitions provided in the questionnaire. Standardized wording and the use of prompts may help respondents understand the intention of the questions. A prompt may be an example of an intensity of activity or an example of the types of activities surveyed.
Many self-report questionnaires are used to identify whether people meet national guidelines for physical activity [14] and Healthy People objectives [15] to increase moderate- and vigorous-intensity physical activity. Most of the validation data for questionnaires have been accumulated by correlating the questionnaire with a standard, such as an accelerometer or doubly labeled water. This only indicates that the self-report provides a valid ranking, not that the questionnaire is able to assess true behaviors [16]. The need to assess whether a guideline is met increases the difficulty of recalling the frequency and duration of physical activity behaviors when the respondent is asked to consider simultaneously all activities that s/he performs. For example, when the goal is to assess whether respondents are meeting physical activity recommendations/objectives that require a daily or weekly goal for a given activity intensity (e.g., 30 min/day on 5 days/week, or 150 min/week of moderate-intensity activities), they are challenged with the need to recall the frequency and duration of many behaviors at the selected intensities [17]. As a potential solution, scoring algorithms can create an index that approximates the objective or guideline from a larger battery of questions, although the goal of assessing the guideline with only one question is then lost.
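A scoring index of this kind can be sketched as follows. Each reported activity contributes frequency × duration minutes per week. Vigorous minutes are counted double, a common convention when combining moderate and vigorous activity against a 150 min/week moderate-intensity goal, but treated here as an assumption to check against the scoring protocol of the questionnaire actually used. The `ActivityReport` structure and the example responses are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ActivityReport:
    name: str
    intensity: str        # "moderate" or "vigorous"
    days_per_week: int    # frequency
    minutes_per_day: int  # duration on a typical day

def guideline_minutes(reports, vigorous_weight=2):
    """Moderate-equivalent minutes/week summed over all reported activities.

    vigorous_weight=2 is an assumed convention (1 vigorous min = 2 moderate min).
    """
    total = 0
    for r in reports:
        weekly = r.days_per_week * r.minutes_per_day
        if r.intensity == "vigorous":
            total += vigorous_weight * weekly
        elif r.intensity == "moderate":
            total += weekly
        # any other intensity (e.g. "light") is not counted toward the guideline
    return total

def meets_guideline(reports, threshold=150):
    return guideline_minutes(reports) >= threshold

reports = [
    ActivityReport("brisk walking", "moderate", 3, 20),
    ActivityReport("cycling to work", "moderate", 2, 15),
    ActivityReport("running", "vigorous", 1, 25),
]
print(guideline_minutes(reports), meets_guideline(reports))  # 140 False
```

The example shows why the recall task is hard. No single item approaches the goal, and the classification depends on correctly recalling the frequency and duration of every activity.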
Also, social desirability (respondents' wish to be viewed as “physically active”) may lead to over-reporting of physical activity behaviors, producing a potential “intensity bias” wherein respondents define intensities differently than the investigators intend. Another form of intensity bias may occur when a questionnaire asks about intensities in absolute terms (i.e., assigning an activity the same MET value for every respondent) while respondents report intensities relative to their personal characteristics, such as sex, age, physiological capacities, and movement experiences. This bias results in incomplete ascertainment of physical activity behaviors and makes it very difficult to compare results between accelerometry and self-report methods of physical activity [18].
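The gap between absolute and relative intensity can be shown with simple arithmetic. Relative intensity is approximated here as the activity's MET value divided by the person's maximal capacity in METs. This is a simplification, since reserve-based measures are also used. The 4.0-MET walking value and both capacities are hypothetical.

```python
def relative_intensity(activity_mets, max_mets):
    """Activity MET value as a percentage of the person's maximal capacity in METs."""
    return 100 * activity_mets / max_mets

walking_mets = 4.0  # hypothetical absolute value assigned to the same walk for everyone
capacities = {"fit younger adult": 12.0, "older adult": 6.0}  # hypothetical max METs

for who, capacity in capacities.items():
    print(f"{who}: {relative_intensity(walking_mets, capacity):.0f}% of capacity")
# fit younger adult: 33% of capacity
# older adult: 67% of capacity
```

In absolute terms, both respondents did a moderate-intensity walk. The older adult, however, worked at about two-thirds of capacity and may reasonably report the walk as vigorous, which produces the mismatch described above.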
Another concern in collecting data is the length of the questionnaire's recall time frame: the past day, week, month, year, or lifetime. Because recall error increases with the length of the recall period, and prospective studies are time consuming and impose an extensive participant burden for multiple recalls, it is useful to know the shortest recall period that characterizes usual physical activity behaviors. This is analogous to studies that identify minimum wear times for pedometers [19] and accelerometers [20]. Such an approach may be useful to answer two questions: how far into the past a respondent can recall his or her physical activity, and how many assessments are needed in a prospective study to obtain accurate and reliable responses for a stable physical activity exposure. The optimal methods to make these estimates are evolving [21]. By knowing the minimal number of days needed to obtain a reliable estimate of physical activity, accuracy may be improved and the respondent recall burden lessened. This often requires one to monitor physical activity with records, logs, or previous-day recalls. While some types of physical activities are routine and performed daily, such as self-care activities, work, and selected household activities, others are intermittent and may depend on the day of the week, the month, or the season [22]. Also, the frequency, duration, intensity, and types of activities performed will likely vary by age, sex, culture, and geographic residence, making it difficult to generalize about the minimum recall period needed to reflect the usual physical activity behaviors of specific segments of a population.
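The minimal-number-of-days question is often addressed with the Spearman-Brown prophecy formula, as in monitoring-day studies for accelerometers and pedometers. Given a single-day reliability r (for example, an intraclass correlation from repeated daily measures) and a target reliability R, the number of days needed is N = [R/(1 - R)] × [(1 - r)/r]. The sketch assumes that days are interchangeable, which the day-of-week and seasonal variation noted above may violate. The single-day ICC of 0.4 is hypothetical.

```python
import math

def mean_reliability(single_day_icc, days):
    """Spearman-Brown: reliability of a mean taken over `days` days."""
    r = single_day_icc
    return days * r / (1 + (days - 1) * r)

def days_needed(single_day_icc, target_reliability):
    """Smallest number of days whose mean reaches `target_reliability`."""
    r, R = single_day_icc, target_reliability
    if not (0 < r < 1 and 0 < R < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = (R / (1 - R)) * ((1 - r) / r)
    # round before ceil so float noise (e.g. 6.000000000000001) does not add a day
    return max(1, math.ceil(round(n, 9)))

print(days_needed(0.4, 0.8))               # 6
print(round(mean_reliability(0.4, 6), 3))  # 0.8
```

The second call is a consistency check: averaging six days with a single-day reliability of 0.4 reaches the 0.8 target.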
Compounding the potential for measurement error is the challenge of recalling the hours, minutes, and types of physical activity behaviors that are performed intermittently. Some workshop participants thought it may be better simply to identify questionnaire items that provide an index of “moving about,” as was done in the Yale Physical Activity Survey [23]. Such an index may classify physical activity levels as inactive, low active, active, or highly active more accurately than summary scores computed by summing estimates of minutes at varying intensities [27]. The Stanford Brief Activity Survey (SBAS) [24–26] uses a similar approach to classify patterns of activity at work and in leisure settings. However, because this method does not assess the minutes of physical activity performed, it cannot be used to assess national activity goals or intervention aims based on a minutes/week goal.
A specific recommendation for data collection was to conduct more qualitative research and cognitive interviews [28] across sex, age, and race/ethnicity segments of the population to understand how respondents interpret the physical activity concepts measured. Appropriate questions to ask are: What types of physical activities does the questionnaire contain? What are the recall time frames? Will the recall have prompts to encourage recall of general and specific activities? Is the questionnaire format interactive, providing feedback to the participant about their responses, or one-way without feedback?
Step 4 - Data Analysis
The goal of this step is to reduce error in data analysis by using correct scoring procedures and analytic methods. Elements of this step include applying statistical methods to identify and correct for measurement errors that occur in completing physical activity questionnaires. This may require fitting measurement error correction models and using information derived from those models to estimate attenuation factors [29].
The goal in the data analysis phase is to identify, reduce, and control errors using quantitative and qualitative methods as appropriate. A common method to reduce measurement error in dietary analyses is the use of analytic error correction techniques. This method was originally based on the premise that nutrient intake is measured with error when using recall instruments, such as a 24-hour recall or a food frequency questionnaire. A validation study can collect criterion or reference measures (e.g., doubly labeled water, dietary records) over the same period as the recall questionnaire. Regression models can then be used to identify the sources and magnitude of the recall error for selected nutrients, and this information can be used to estimate attenuation factors that allow these errors to be corrected when assessing the relation between the exposure and the outcome of interest [30]. In assessing diet or physical activity behaviors with a recall questionnaire, the estimated impact of the exposure on disease risk is generally believed to be attenuated by the recall error, although under certain circumstances the reverse could be true (i.e., errors could induce an association). Measurement error methods offer a way to estimate quantitatively the magnitude of the various sources of error that may influence the results of physical activity and health studies. This approach has been applied by Ferrari and colleagues [31] and described by Nusser [29] to quantify specific sources of measurement error and to estimate attenuation factors associated with measurement error in self-reports of physical activity. More work in this area is needed to further develop this methodology for physical activity studies.
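The attenuation-factor idea can be sketched with regression calibration, one standard error-correction technique. In a validation substudy, the reference measure is regressed on the questionnaire score, and the slope λ is the attenuation factor. An exposure-outcome coefficient estimated with the questionnaire in the main study is then divided by λ to approximate the coefficient for true exposure. This assumes that the reference measure's errors are independent of the questionnaire's errors and that questionnaire errors are nondifferential with respect to the outcome. All numbers are hypothetical, and the sketch does not reproduce the Ferrari or Nusser models.

```python
from statistics import mean

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical validation substudy (MET-h/week): questionnaire vs. reference measure
questionnaire = [10, 20, 30, 40, 50]
reference     = [14, 18, 26, 27, 35]

lam = ols_slope(questionnaire, reference)  # attenuation factor (lambda)
beta_observed = -0.02                      # hypothetical log-risk per MET-h/week, main study
beta_corrected = beta_observed / lam       # de-attenuated association

print(f"lambda = {lam:.2f}, corrected beta = {beta_corrected:.4f}")
# lambda = 0.51, corrected beta = -0.0392
```

With λ near 0.5, the questionnaire-based association is roughly halved by recall error, and the correction approximately doubles it. In practice, the standard error of the corrected coefficient must also account for the uncertainty in λ.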
If physical activity questionnaires are to be comparable between studies, they must use the correct scoring protocols. Many physical activity questionnaires can be accessed on the web, in books, and in publications. However, scoring methods are rarely described in enough detail to guide the user in scoring the questionnaire; the Collection of Physical Activity Questionnaires is an exception [32]. A clearinghouse for physical activity questionnaires would be useful if modeled after the National Human Genome Research Institute's consensus measures for Phenotypes and Exposures (PhenX) (www.phenX.org), the National Cancer Institute's (NCI) Grid Enabled Measures Database (GEM) (www.gem-beta.org), and the National Collaborative on Childhood Obesity Research (NCCOR) (http://tools.nccor.org/measures/). These websites provide detailed protocols, including scoring methods, for many measures used in research settings. The NCI also provides a list of 102 physical activity questionnaires and their related references, and details of 72 physical activity questionnaire validation studies, on its website at http://appliedresearch.cancer.gov/tools/paq. An optimal website would include a list of questionnaires with downloadable materials describing each questionnaire's intended purpose and literacy requirements. It would also include detailed scoring protocols with available scoring syntax, recommended strategies to identify bias and correction methods for measurement error, and guidance on how best to analyze the questionnaire data for different purposes. Finally, it would list references to the validity and reliability studies and to publications using the questionnaire. Making existing SAS or SPSS code available to analyze physical activity questionnaire data, including application of the error correction models, would also help to provide standardized questionnaire scores. A model for this resource is the SAS code provided by the National Cancer Institute to score the ActiGraph accelerometer data collected in the National Health and Nutrition Examination Survey (http://riskfactor.cancer.gov/tools/nhanes_pam/).
Specific recommendations to reduce data analysis error were to: (a) put more effort into developing standardized statistical methodologies to adjust for reporting errors in physical activity questionnaires, and (b) create a resource website that provides access to, and education about, the correct ways to score physical activity questionnaires, identify and correct for measurement error, and prepare data for subsequent reporting of results. Appropriate questions to ask to reduce data analysis error are: How do I identify error in my data? And how do I select and apply error correction models?
Step 5 - Summary Scores
A goal of this step is to select the correct summary score for the study aim and to score the questionnaire correctly; both are critical to the success of a study. The summary score can be presented in various ways, including minutes or hours per day or week; MET-minutes or MET-hours per day or week; kcal; kcal per kg per day; points; and so forth. A questionnaire might be used only for classification purposes, as with the Behavioral Risk Factor Surveillance System [33] or the SBAS [24], especially to assess progress in meeting physical activity guidelines [34, 35]. The summary score also can add to measurement error. If minutes are the raw units reported, transforming them into MET-minutes (by multiplying the minutes reported by a MET value) may increase the measurement error if the MET value assigned to the estimated intensity differs from the MET value that would be measured using indirect calorimetry.
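How an assigned MET value propagates into the summary score can be shown with a short calculation. The sketch assumes the reported minutes are exactly right, so that the only error comes from the MET values. Both the assigned and the "measured" MET values are hypothetical and are not taken from the Compendium.

```python
def met_minutes(activities):
    """MET-minutes/week: sum of reported minutes/week multiplied by a MET value."""
    return sum(minutes * mets for minutes, mets in activities)

minutes_reported = [150, 60]  # walking, gardening (minutes/week, as reported)
assigned_mets    = [3.0, 4.0] # values chosen by the scorer (hypothetical)
measured_mets    = [4.5, 4.0] # hypothetical values from indirect calorimetry

scored    = met_minutes(zip(minutes_reported, assigned_mets))
reference = met_minutes(zip(minutes_reported, measured_mets))
print(scored, reference, f"{100 * (scored - reference) / reference:.1f}%")
# 690.0 915.0 -24.6%
```

A single mismatched MET value for a commonly reported activity is enough to bias the score by about a quarter, independent of any recall error.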
To maintain the validity of a questionnaire, investigators must use the scoring methods developed for the questionnaire, ideally with standardized physical activity intensity values. The Compendium of Physical Activities was developed to reduce the variability in assigning MET values between questionnaires [36, 37]. Three versions of the Compendium have been published; the 2011 version uses an evidence-based approach to report measured MET values from published sources [38] (https://sites.google.com/site/compendiumofphysicalactivities). Also, error in estimating the time spent at different intensities can be reduced by using common MET cut-off points for absolute intensity ranges. For adults, these include sedentary (≤ 1.5 METs), light intensity (1.6–2.9 METs), moderate intensity (3.0–5.9 METs), and vigorous intensity (≥ 6.0 METs), recognizing that such a scale uses a healthy, younger adult as the reference standard [14, 39]. Such practices allow for direct comparison of questionnaire metrics between studies when absolute intensity is the desired frame of reference. On occasion the Compendium provides multiple MET values for the same activity; investigators need to provide a rationale for the value they select.
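Applying common absolute cut-off points can be sketched as a small classification function. The sketch treats the published adult ranges as contiguous and counts values up to and including 1.5 METs as sedentary. Both choices are assumptions about how the boundaries are handled. The activity log is hypothetical, and its MET values are illustrative rather than Compendium entries.

```python
from collections import Counter

def intensity_category(mets):
    """Absolute-intensity category for adults, treating the ranges as contiguous."""
    if mets <= 1.5:
        return "sedentary"
    if mets < 3.0:
        return "light"
    if mets < 6.0:
        return "moderate"
    return "vigorous"

def minutes_by_intensity(log):
    """Total minutes per category from (minutes, MET value) pairs."""
    totals = Counter()
    for minutes, mets in log:
        totals[intensity_category(mets)] += minutes
    return dict(totals)

# Hypothetical day: (minutes, MET value assigned to the activity)
day = [(480, 1.3), (60, 2.5), (30, 3.5), (20, 8.0), (15, 1.5)]
print(minutes_by_intensity(day))
# {'sedentary': 495, 'light': 60, 'moderate': 30, 'vigorous': 20}
```

Because every study using these thresholds bins minutes identically, the resulting totals can be compared directly. This holds only for absolute intensity, as noted above.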
Specific recommendations to reduce error in selecting summary scores were to: (a) use the scoring methods developed for the questionnaire, (b) use the same MET value for the same activities, and (c) use common age-appropriate cut-off points to assess relative intensity levels. An appropriate question to ask is: What summary score do I need to address my study aims?
Step 6 - Data Interpretation
The goal of this step is to reduce error by interpreting the data correctly. The investigator selecting a questionnaire must have a clear understanding of how the summary scores will be applied to the outcome of interest. In collaboration with the data analyst, the investigator must also understand how choosing one analytic method over another affects the success of the study aims. Physical activity data are used in many settings, including linking with other behavioral data such as dietary intake, assessing health status, identifying dose-response relationships, determining the prevalence of meeting recommendations, and identifying behavior changes. Investigators and data analysts therefore must understand how the analytic methods were selected and how uncontrolled measurement error, confounding, and effect modification can influence the interpretation of study findings.
Specific recommendations to avoid error in interpreting the data were: (a) to have a clear idea of how the questionnaire data will be used to answer the study aims, and (b) to know how the study framework can assist in making correct interpretations of study outcomes. Appropriate questions to ask are: Am I interpreting the physical activity scores correctly? And is there unaccounted-for error that may influence the interpretation of the results?
The value and utility of self-report measures are currently misunderstood because many researchers have employed self-report incorrectly (e.g., employing a measure in an intervention context when the measure has not been shown to be valid for detecting change in behavior). The workgroup participants recommended that self-report measures keep their place in the battery of physical activity assessments. Researchers, however, need to carefully select an instrument that meets the purpose of their study and that has validity evidence supporting the interpretation they seek.