Most adults do not achieve adequate physical activity. Despite the potential benefits of worksite health promotion, no previous comprehensive meta-analysis has summarized health and physical activity behavior outcomes from these programs. This comprehensive meta-analysis integrated the wide range of extant worksite physical activity intervention research.
Extensive searching located published and unpublished intervention studies reported from 1969 through 2007. Results were coded from primary studies. Random-effects meta-analytic procedures, including moderator analyses, were completed in 2008.
Effects on most variables were substantially heterogeneous because diverse studies were included. Standardized mean difference (d) effect sizes were synthesized across approximately 38,231 subjects. Significantly positive effects were observed for physical activity behavior (0.21), fitness (0.57), lipids (0.13), anthropometric measures (0.08), work attendance (0.19), and job stress (0.33). The significant effect size for diabetes risk (0.98) is more tentative given small sample sizes. Significant heterogeneity documents that intervention effects varied across studies. The mean effect size for fitness corresponds to a treatment-minus-control difference in mean VO2max of 3.5 mL/kg/min; for lipids, −0.2 on total cholesterol:HDL; and for diabetes risk, −12.6 mg/dL on fasting glucose.
These findings document that some workplace physical activity interventions can improve both health and important worksite outcomes. Effects were variable for most outcomes, reflecting the diversity of primary studies. Future primary research should compare interventions to confirm causal relationships and further explore heterogeneity.
Although strong evidence shows that exercisers are healthier than nonexercisers, most adults do not perform enough physical activity to achieve health and well-being benefits.1 Workplaces may implement physical activity programs in hopes of keeping workers healthy and reducing healthcare costs.2 Because employed adults spend about half of their waking hours on workdays at their workplaces, offering physical activity programs at work may be an efficient strategy to increase physical activity.3–5 Convenience, group support, existing patterns of formal and informal communication among employees in a worksite, and possible corporate behavior norms are potential advantages of worksite programs over other approaches.6–8 Workplace programs may be especially important because the imbalance between physical activity and energy intake at work may contribute to the obesity epidemic.4 This meta-analysis addresses the need to quantitatively synthesize the rapidly growing literature reporting workplace physical activity programs.
Despite the potential health and economic benefits of worksite health promotion,2 no previous comprehensive meta-analysis has summarized health and physical activity behavior outcomes from these programs. Several previous narrative reviews were limited in scope and unable to address either the magnitude of outcomes or potential workplace moderators of outcomes.4,5,9,10 The broadest narrative review was conducted using studies published before 1995.11 Two previous meta-analyses addressed physical activity behavior outcomes across some studies included in this project. One 1998 meta-analysis of 26 studies reported an effect size consistent with a standardized mean difference of 0.22, which was not significantly different from 0. The authors noted that their attempted moderator analyses suffered from inadequate statistical power.3 A 1996 meta-analysis synthesized data for diverse adults and reported a workplace effect size consistent with a standardized mean difference of 0.35.12
This meta-analysis moves beyond the previously reported quantitative syntheses by greatly expanding the search strategies to ensure a more comprehensive synthesis, addressing both physical activity behavior and health outcomes, examining work-related outcomes, and conducting exploratory moderator analyses. The research questions were:
Standard strategies for quantitative systematic reviews were used to locate and secure potential primary studies, determine eligibility, extract data from research reports, meta-analyze primary study results, and interpret findings.
A comprehensive search was completed using multiple strategies to move beyond previous reviews and limit bias.13 An experienced health sciences reference librarian used broad search terms in 11 computerized databases (e.g., MEDLINE, PsycINFO, EMBASE, Cochrane Controlled Trials Register, Dissertation Abstracts International). Multiple research registers were examined, including the National Institutes of Health Computer Retrieval of Information on Scientific Projects; the Australian/New Zealand Clinical Trials Registry; and the metaRegister of Controlled Trials (mRCT), which includes 14 active and 16 archived registers. Computerized database searches on principal investigators of funded studies and on the first three authors of eligible primary studies were completed. Hand searches were conducted in 114 journals. Ancestry searches were completed on previous reviews and eligible studies. These comprehensive search strategies yielded 7,251 papers, reports, and reviews that were examined to locate eligible primary studies.
Primary studies of interventions to increase physical activity that were reported in English between 1969 and late 2007 were included. Reports with adequate data to calculate an effect size for at least 3 subjects were included. Studies focused on chronically ill workers were excluded. Published and unpublished studies were eligible because syntheses using only published studies may overestimate the effect size.14 Small-sample studies, which often lack statistical power to detect treatment effects, were included because they may report on novel interventions or may include difficult-to-recruit subjects.14
Studies with varied designs were included. Randomized controlled trials may be especially difficult to implement at worksites because of employee resistance to randomization and potential contamination among workers with extensive contact.11 Some pre-experimental studies compare programs developed at workplaces. Some investigators find it unethical to withhold treatment when interventions are thought to be beneficial.15 Separate analyses were conducted for single-group and two-group comparisons. Including unpublished reports, small-sample studies, and pre-experimental research brought a richer variety of interventions and samples into the synthesis.
A coding frame to record primary study characteristics and results was developed, pilot tested, and refined. Workplace characteristics of company size, inclusion of multiple companies in the study, and profit versus nonprofit status were coded. The extent of worksite involvement in the intervention was coded in two ways: whether the interventionist was a workplace employee and if the worksite designed the intervention. Other data coded included whether interventions were delivered during employees' paid time, if data were collected at the workplace, if interventions included fitness facilities at worksites, and whether some form of organizational policy change occurred in association with interventions. Interventions could include motivational/educational sessions and/or supervised exercise sessions.
A priori lists of outcome measures were used to select among multiple possible measures reported in primary studies, as a way of avoiding coder or author bias. For example, if studies presented both objective pedometer (step-counter) measures and self-reports of physical activity, the pedometer values were coded. Physical activity behavior was recorded only if the study clearly measured physical activity behavior separate from any interventionist-supervised exercise. Fitness was coded as maximal oxygen consumption (VO2max). Lipid measures included total cholesterol, high-density lipoproteins (HDL), or the ratio of total cholesterol to HDL. Body mass index (BMI), weight, abdominal girth, and percent body fat were coded for anthropometric measures. Both quality of life and mood (e.g., depression, anxiety) were assessed with self-report measures. Diabetes risk was measured as fasting glucose or insulin levels. Work attendance and health services utilization measures were derived from company records. Job satisfaction and stress were coded from self-report instruments. The data reported most distal from completion of the intervention were recorded, because persistence of intervention effects is most important for long-term benefits to health. To ensure analysis of only independent samples, author lists were crosschecked to locate reports that might contain overlapping samples. When possible, multiple papers describing the same study were used to code comprehensive data. Coding was not masked because evidence indicates that masking does not decrease bias.16 To enhance coding reliability, two extensively trained coders independently extracted all data. A third, PhD-prepared coder validated the effect size data. The first author or another member of the research team resolved any coding discrepancies.
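The selection rules described above (prefer the measure ranked highest on an a priori list, then take the assessment most distal from the end of the intervention) can be sketched as a small function. This is a hypothetical illustration, not the project's coding procedure: the priority list, the instrument labels, the `ReportedMeasure` record, and the choice to apply instrument priority before timing are all assumptions. The source states only that objective step-counter data outranked self-report and that the most distal data were recorded.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical a priori priority list for physical activity behavior
# (earlier = preferred). Only "objective step counts outrank self-report"
# comes from the text; the labels themselves are illustrative.
PA_PRIORITY = ["pedometer", "self_report_questionnaire", "self_report_log"]


@dataclass
class ReportedMeasure:
    instrument: str                  # label of the measure a primary study reported
    weeks_after_intervention: float  # assessment time relative to intervention end
    value: float                     # reported outcome value


def select_measure(measures: List[ReportedMeasure],
                   priority: List[str]) -> Optional[ReportedMeasure]:
    """Return the highest-priority instrument's most distal assessment, or None."""
    eligible = [m for m in measures if m.instrument in priority]
    if not eligible:
        return None
    best_rank = min(priority.index(m.instrument) for m in eligible)
    candidates = [m for m in eligible if priority.index(m.instrument) == best_rank]
    # Among the preferred instrument's records, keep the latest follow-up.
    return max(candidates, key=lambda m: m.weeks_after_intervention)


# Hypothetical study reporting two instruments at several time points.
reported = [
    ReportedMeasure("self_report_questionnaire", 26, 3.1),
    ReportedMeasure("pedometer", 0, 8100),
    ReportedMeasure("pedometer", 12, 7900),
]
chosen = select_measure(reported, PA_PRIORITY)  # the 12-week pedometer record
```

Because instrument priority is applied before timing in this sketch, the 26-week self-report is passed over in favor of the later of the two pedometer assessments.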
Effect sizes were calculated and synthesized with standard meta-analytic approaches, using the standardized mean difference (d) weighted by the inverse of its variance. Exploratory moderator analyses were conducted among two-group post-intervention comparisons. Many potential moderators could not be analyzed because too few studies reported the necessary information (e.g., company focus, such as manufacturing). Analysis details are available from the authors.
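The pooling step can be illustrated with a minimal sketch. It assumes the widely used DerSimonian-Laird estimator for the between-studies variance; the text says only that random-effects procedures with inverse-variance weights were used, and it names neither the estimator nor whether a small-sample (Hedges) correction was applied to d. The function names are hypothetical. The returned `tau2` is the between-studies variance, so its square root corresponds to the estimated between-studies standard deviation that the tables report as δ.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def smd(mean_t: float, mean_c: float, sd_t: float, sd_c: float,
        n_t: int, n_c: int) -> Tuple[float, float]:
    """Standardized mean difference d (treatment minus control) and its variance."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    d = (mean_t - mean_c) / pooled_sd
    var = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, var


@dataclass
class Pooled:
    mean: float  # random-effects mean effect size
    se: float    # standard error of that mean
    q: float     # homogeneity statistic Q
    tau2: float  # between-studies variance (delta = sqrt(tau2))
    i2: float    # I^2: share of variation due to heterogeneity


def random_effects(effects: List[float], variances: List[float]) -> Pooled:
    """DerSimonian-Laird random-effects pooling with inverse-variance weights."""
    k = len(effects)
    w = [1.0 / v for v in variances]             # fixed-effect weights
    sw = sum(w)
    fixed_mean = sum(wi * di for wi, di in zip(w, effects)) / sw
    q = sum(wi * (di - fixed_mean) ** 2 for wi, di in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sws = sum(w_star)
    mean = sum(wi * di for wi, di in zip(w_star, effects)) / sws
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return Pooled(mean, math.sqrt(1.0 / sws), q, tau2, i2)
```

For example, `random_effects([0.0, 1.0], [0.01, 0.01])` pools two precise but conflicting studies: the mean is 0.5 and I² is 0.98, so nearly all of the observed variation reflects heterogeneity rather than sampling error.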
Approximately 38,231 subjects participated in the studies included in the meta-analysis (k=206 comparisons, s=138 reports).17–155 Independent two-group post-test effect sizes included data from 24,520 subjects (k=94, s=71); two-group pre–post effect sizes, from 14,630 subjects (k=80, s=59); and pre–post treatment group comparisons, from 22,413 subjects (k=192, s=125). Sample sizes varied dramatically from 12 to 5,038 subjects.78,155 Multiple treatment groups were common: 34, 10, 3, and 1 paper(s) reported on two, three, four, and six treatment groups, respectively. Twelve unpublished dissertations and one unpublished presentation paper were included. Many studies reported funding (s=59). One report was disseminated before 1970, 5 in the 1970s, 35 in the 1980s, 49 in the 1990s, and 48 after 2000. The earliest study was reported in 1969 and the most recent study in 2007. Analyses were completed in 2008.
Among the studies that reported details about worksites, 55 were for-profit and 50 were not-for-profit companies. Most papers did not report company size (s=80). Among the papers reporting this information, the vast majority were large companies (at least 750 employees), with only five described as small (fewer than 100 employees). Most studies were conducted in single companies at one location (s=87), 17 used multiple locations of one company, and 23 conducted studies at multiple companies. The most common types of companies were education/health services (s=37), government (s=32), and manufacturing (s=17). Few studies reported whether study data were collected at the worksite; among those providing this information, 51 collected data at the workplace and 14 did not. Interventions were more often delivered at the workplace (s=51) than in other locations (s=21). Nearly all of the studies recruited subjects at the worksite (s=121). Only 32 papers reported that interventions were delivered during employees' paid time. Most studies used interventionists employed by the research project (s=101) instead of workplace employees. Only six studies reported including an organizational-level policy change, such as providing free or reduced memberships to fitness centers not located at the worksite. Twenty-six studies involved workplace employees in designing interventions. Thirty-eight papers reported on interventions that included fitness facilities at the worksite. Supervised exercise was used in 27% of the studies while 80% employed motivational/educational sessions. Further details about interventions are found in Table 1.
Visual and statistical assessment of funnel-plot asymmetry, as indicators of possible publication bias, suggested substantial evidence of asymmetry for physical activity, fitness, lipids, and diabetes risk, especially for single-group comparisons. Evidence of asymmetry was weaker but still notable for anthropometric measures and mood. Due to the relatively few effect sizes on quality of life, health services utilization, work attendance, job stress, and job satisfaction, evidence for or against funnel-plot asymmetry was inconclusive for these variables.
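The text does not name the statistical test used for funnel-plot asymmetry. One common choice, assumed here purely for illustration, is Egger's regression: each study's standardized effect (d divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests that small, imprecise studies report systematically different effects. The sketch returns the intercept, slope, and the intercept's standard error; turning these into a p value requires a t distribution with n − 2 degrees of freedom, which is not computed here.

```python
import math
from typing import List, Tuple


def egger_regression(effects: List[float],
                     std_errors: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares of d/se on 1/se (Egger's test regression).

    Returns (intercept, slope, se_intercept). Needs at least three studies.
    """
    if len(effects) < 3:
        raise ValueError("need at least three studies")
    y = [d / se for d, se in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / se for se in std_errors]                 # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)            # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, slope, se_intercept
```

If every study shared the same true effect with no small-study bias, the standardized effects would lie on a line through the origin (intercept near zero); effects that grow as standard errors grow push the intercept away from zero.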
Table 2 presents the overall effects of interventions on physical activity, health, and well-being outcomes. The findings should be interpreted with caution given the small number of studies or subjects for some outcomes. For physical activity behavior, the mean overall effect at post-test comparison in two-group studies was 0.21. The two-group pre–post effect and treatment group pre–post comparisons were of comparable magnitude. The Common Language Effect Size (CLES) of 0.56 for the two-group post-test effect size indicates that 56% of the time a randomly selected treatment subject would have a higher physical activity score than a randomly selected control subject (all CLES values reported are based on the random-effects mean effect size for two-group post-test comparisons). To enhance interpretability, mean physical activity effect sizes were transformed to steps/day using means and standard deviations from appropriate reference groups. For two-group post-test comparisons, the raw mean difference was 612 steps/day, which corresponds to a final mean of 8,869 steps/day for treatment subjects versus 8,257 for control subjects. The homogeneity test and estimated between-studies standard deviation (Q and δ in Table 2) demonstrated significant heterogeneity for all physical activity behavior comparison types. The I² value (Table 2), the percentage of total variation among studies' observed effect sizes that is due to heterogeneity rather than to sampling of participants, also indicates substantial heterogeneity.
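The CLES values reported throughout are consistent with the normal-theory conversion CLES = Φ(d/√2), where Φ is the standard normal cumulative distribution function; this assumes normally distributed outcomes with equal variances in both groups. The text does not state the formula, so treat the sketch below as a reconstruction that checks several reported values.

```python
import math


def cles_from_d(d: float) -> float:
    """Probability that a random treatment subject outscores a random control
    subject, assuming normal outcomes with equal variances: Phi(d / sqrt(2))."""
    z = d / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


# Two-group post-test effect sizes taken from the text.
reported_d = {
    "physical activity": 0.21,
    "fitness": 0.57,
    "diabetes risk": 0.98,
    "job stress": 0.33,
    "healthcare utilization": -0.17,
}
for outcome, d in reported_d.items():
    print(f"{outcome}: CLES = {cles_from_d(d):.2f}")
```

The loop prints 0.56, 0.66, 0.76, 0.59, and 0.45, matching the CLES values given for these outcomes.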
Fitness outcomes also were significantly better among treatment than control subjects, and treatment subjects' post-intervention scores were better than their pre-intervention scores. Mean effect sizes ranged from 0.47 to 0.57 (CLES=0.66). As with steps/day for physical activity, the mean effect size on fitness was transformed to maximal oxygen consumption (VO2max). For two-group comparisons, the raw mean difference was 3.5 mL/kg/min, which corresponds, for example, to a final VO2max mean of 37.7 mL/kg/min for treatment subjects versus 34.2 mL/kg/min for control subjects. Fitness effect sizes were significantly heterogeneous, which indicates that some studies found significantly better fitness outcomes than other studies.
Diabetes risk was significantly reduced by interventions. Mean effect sizes for the two-group comparisons were 0.90 to 0.98 (CLES=0.76). For two-group studies, the calculated raw mean difference was −12.6 mg/dL, which corresponds to a post-intervention fasting glucose mean of 81.0 mg/dL for treatment subjects versus 93.6 mg/dL for control subjects. Both mean values are within the range considered normal for fasting glucose. Diabetes risk effect sizes exhibited significant and substantial heterogeneity. Diabetes risk findings should be considered tentative given the small number of studies that reported this variable (k=6).
Lipid and anthropometric effect sizes were more modest but positive, indicating better scores following interventions among treatment subjects. Lipids mean effect sizes ranged from 0.12 to 0.17 (CLES=0.54). In terms of the ratio of total cholesterol to HDL, the raw mean difference was −0.2, as would occur with mean post-intervention ratios of 4.6 for treatment and 4.8 for control subjects. All of the lipids effect sizes were significantly heterogeneous. Anthropometric mean effect sizes varied from 0.07 to 0.13 (CLES=0.52). For the two-group comparison in terms of BMI, the raw mean difference was −0.3, which would occur if the post-intervention BMI mean were 25.0 for treatment versus 25.3 for control subjects. Anthropometric effect sizes were significantly heterogeneous, except for the two-group pre–post comparisons.
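The conversions from effect sizes to outcome units multiply d by a reference-group standard deviation. Those reference SDs are not reported here, so the sketch below works backward for the three outcomes whose d values the abstract pairs with a raw difference (steps/day, VO2max, and fasting glucose): it recovers the SD implied by each reported pair and checks that the control mean plus the raw difference gives the stated treatment mean. The function names are hypothetical, and the implied SDs are only approximate because the published figures are rounded.

```python
# Reported (outcome, d, raw treatment-minus-control difference, control mean).
# d is signed so that positive favors treatment; for fasting glucose, lower is
# better, so a positive d pairs with a negative raw difference.
REPORTED = [
    ("steps/day", 0.21, 612.0, 8257.0),
    ("VO2max (mL/kg/min)", 0.57, 3.5, 34.2),
    ("fasting glucose (mg/dL)", 0.98, -12.6, 93.6),
]


def raw_difference(d: float, reference_sd: float,
                   lower_is_better: bool = False) -> float:
    """Express a standardized mean difference in outcome units."""
    diff = d * reference_sd
    return -diff if lower_is_better else diff


def implied_reference_sd(raw_diff: float, d: float) -> float:
    """Reference SD implied by a reported (raw difference, d) pair."""
    return abs(raw_diff) / abs(d)


for outcome, d, raw, control_mean in REPORTED:
    sd = implied_reference_sd(raw, d)
    treatment_mean = control_mean + raw
    print(f"{outcome}: implied SD ~ {sd:.2f}, treatment mean {treatment_mean:.1f}")
```

The implied SDs (about 2,914 steps/day, 6.1 mL/kg/min, and 12.9 mg/dL) are back-calculations from rounded numbers, not values reported by the authors; the treatment means (8,869, 37.7, and 81.0) reproduce those in the text.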
Mean effect sizes for both quality of life (0.23) and mood (0.13) two-group comparisons were positive, indicating better outcomes among treatment subjects, but these did not reach statistical significance. Effect sizes for two-group pre–post and pre–post effects were significant with improved quality of life and mood scores following interventions. Most of the quality of life and mood effect sizes exhibited significant heterogeneity.
Estimates and tests for work-related outcomes are reported in Table 3. The two-group post-test comparison of work attendance documented that, on average, treatment subjects had lower mean absenteeism than control subjects (effect size=0.19, CLES=0.55). Although the direction of the effect was similar, mean effect sizes were smaller for both two-group pre–post effects and treatment group pre–post comparisons. Job stress was significantly lower at follow-up among treatment subjects than control subjects (effect size=0.33, CLES=0.59). Job stress effect sizes were positive for other comparisons but were not significant. Job satisfaction was significantly greater following interventions among treatment subjects than controls in the two-group pre–post effect analysis (effect size=0.20, CLES=0.54), but similar findings did not achieve statistical significance for the two-group post-test analysis. Effect sizes for most comparison types on most outcomes were significantly heterogeneous, as documented by Q, estimated between-studies standard deviations, and I² values.
Two-group post-test analyses revealed significantly higher healthcare utilization among treatment subjects than among control subjects (effect size=−0.17, CLES=0.45). The two-group pre–post effect estimate was of similar magnitude (−0.18) but not significant. The pre–post comparison for treatment subjects revealed no utilization differences. Healthcare utilization effect sizes were more homogeneous than those for most other variables in the project. Findings regarding job stress, job satisfaction, and healthcare utilization should be viewed as tentative given the small numbers of studies that reported these variables (k in Table 3).
Analyses of potential workplace moderators were conducted for variables with sufficient cases: physical activity behavior, fitness, lipids, and anthropometric variables. Dichotomous moderator results are presented in Table 4. Profit versus nonprofit company status was not significantly linked with mean effect size for any variable (QB in Table 4). Neither company size nor the inclusion of multiple companies in the study was a significant moderator of mean effect sizes on physical activity behavior, fitness, lipids, or anthropometric outcomes. Three-level moderator analyses were conducted for numbers of companies and locations (results available from first author): the only significant effect was for anthropometric effect size, with a significantly higher mean effect size for interventions conducted in one multi-location company (0.22) than in the other combinations of numbers of companies and locations (both 0.04).
Intervention delivery at the worksite or elsewhere was significant only for anthropometric effect sizes, such that interventions delivered at workplaces yielded a larger mean effect size (0.17) than those delivered elsewhere (0.05). Whether employees received interventions on company paid time was significant for two of the four outcomes: studies with employees paid during intervention reported larger mean effect sizes than those with employees receiving interventions outside company paid time on both fitness (0.92 vs 0.49) and anthropometric measures (0.22 vs 0.02). Interventions with employee interventionists were more effective than those with others as interventionists for fitness (1.03 vs 0.50), lipids (0.59 vs 0.09), and anthropometric measures (0.32 vs 0.05). Workplace participation in designing the interventions, as compared to interventions designed by people not employed by the worksite, was significant for fitness (1.18 vs 0.49) and anthropometric outcomes (0.22 vs 0.06) but not for lipids or physical activity behavior. Neither recruitment location nor data collection location (workplace versus elsewhere) was related to any outcome with adequate data for moderator analyses.
The presence of a fitness facility onsite in the workplace did not affect mean effect sizes on fitness or physical activity behavior. Studies with onsite fitness facilities reported larger mean effect sizes on lipids (0.32) than studies without such facilities (0.07). Anthropometric outcomes also yielded larger mean effect sizes among studies with onsite facilities (0.24) than those without facilities (0.05). Organizational policy change could be analyzed for lipids and anthropometric outcomes only. Lipid effect sizes were unrelated to policy changes while anthropometric outcomes yielded significantly larger mean effect sizes in studies with policy changes (0.24) than those without policy changes (0.03). Whereas for physical activity behavior, fitness, and lipids nearly all moderators left significant residual heterogeneity (QW in Table 4), all but two moderators left nonsignificant residual heterogeneity for anthropometric outcomes. Results of exploratory multiple moderator analyses are available from the corresponding author.
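The QB and QW statistics in Table 4 partition heterogeneity into variation between moderator levels and residual variation within them. The sketch below uses the simpler fixed-effect analog-to-ANOVA partition, in which Q_total = Q_between + Q_within; the authors' random-effects moderator procedure may weight studies differently, so this is illustrative only and the function names are hypothetical. For a dichotomous moderator, Q_between has one degree of freedom, and its p value is the chi-square survival function, which for df = 1 equals erfc(√(Q/2)).

```python
import math
from typing import Dict, List, Optional, Tuple


def weighted_mean_and_q(effects: List[float],
                        variances: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted mean, total weight, and homogeneity Q."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mean = sum(wi * di for wi, di in zip(w, effects)) / sw
    q = sum(wi * (di - mean) ** 2 for wi, di in zip(w, effects))
    return mean, sw, q


def subgroup_q(groups: Dict[str, Tuple[List[float], List[float]]]
               ) -> Tuple[float, float, Optional[float]]:
    """Fixed-effect moderator test: returns (Q_between, Q_within, p_between).

    groups maps each moderator level to (effect sizes, sampling variances).
    p_between is computed only for two levels (df = 1).
    """
    means, weights, q_within = [], [], 0.0
    for effects, variances in groups.values():
        m, sw, q = weighted_mean_and_q(effects, variances)
        means.append(m)
        weights.append(sw)
        q_within += q  # residual heterogeneity summed over levels
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    q_between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means))
    p_between = math.erfc(math.sqrt(q_between / 2.0)) if len(groups) == 2 else None
    return q_between, q_within, p_between


# Hypothetical effect sizes and variances, for illustration only.
qb, qw, p = subgroup_q({
    "onsite facility": ([0.40, 0.25, 0.30], [0.02, 0.03, 0.04]),
    "no facility": ([0.05, 0.10, 0.02], [0.01, 0.02, 0.02]),
})
```

A significant Q_between indicates that mean effect sizes differ across moderator levels, while a significant Q_within indicates heterogeneity left unexplained by the moderator.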
These findings document that some interventions improve physical activity in some subjects, and these changes may in turn improve selected health outcomes, work attendance, and job stress. However, significant heterogeneity requires cautious interpretation of findings.
The physical activity mean effect size of 0.21 is similar to that reported in 26 worksite studies (r=0.11, d=0.22)3 and smaller than the effect size reported for 33 workplace studies (r=0.17, d=0.35).12 This difference might reflect more comprehensive searching that located more studies with small effect sizes. Previous workplace quantitative syntheses have not addressed health, well-being, or work-related outcomes of improved physical activity; the present study therefore constitutes the first published report of the impact of physical activity interventions on these variables. This meta-analysis moved beyond previously reported syntheses by comprehensively searching to obtain far more studies, separating effect sizes for one- and two-group designs, and conducting moderator analyses on two-group studies.3 The results of single-moderator analyses should be interpreted cautiously given the potential for confounding among moderators.
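The paired values quoted for the earlier meta-analyses (r=0.11 with d=0.22, and r=0.17 with d=0.35) are consistent with the standard conversion d = 2r/√(1 − r²), which assumes roughly equal group sizes. The text does not state which conversion was used, so the sketch below is a consistency check under that assumption.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation effect size to a standardized mean difference,
    d = 2r / sqrt(1 - r^2); assumes approximately equal group sizes."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


for r in (0.11, 0.17):
    print(f"r = {r:.2f} -> d = {r_to_d(r):.2f}")
```

The loop prints d = 0.22 and d = 0.35, the values cited for the 1998 and 1996 syntheses.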
Improvement in fitness was documented with an effect size of 0.57. The magnitude of physical activity, fitness, and health benefits appears modest, and it is unclear whether the physical activity dose was sufficient to improve health enough to meet public health goals.5
This meta-analysis was limited by the number of studies located with sufficient data to calculate effect sizes and substantial heterogeneity among studies. Physical activity interventions varied widely, as did methods for assessing some variables. For example, physical activity was rarely objectively measured, leading to difficulties in comparisons across interventions.
Although findings on improved work attendance, job satisfaction, and job stress were mixed, this study suggests that some physical activity programs are effective beyond direct health benefits. Even modest reductions in absenteeism may result in substantial fiscal savings when multiplied by many employees. The findings regarding healthcare utilization should be interpreted cautiously given the very small sample size and the inadequate time between interventions and utilization measurement among these studies. Some programs may have conducted health screening prior to encouraging subjects to begin exercising, which might have prompted needed health care.156 Longer follow-up studies could determine the enduring economic impact of programs.
Well-designed studies evaluating worksite physical activity promotion programs are needed. Direct comparisons between programs that allow employees to participate on paid work time versus those that do not should be investigated. Also necessary are direct comparisons of programs with and without worksite fitness facilities to determine whether the cost of providing onsite facilities is justified by improvements in employee health and productivity. Investigations targeting at-risk subjects would determine whether interventions need to be tailored to specific subgroups of employees. Investigations should also examine the impact of interventions on important worksite-related outcomes that influence worker productivity including absenteeism, stress levels, and job satisfaction.
Financial support provided by a grant from the National Institutes of Health (R01NR009656) to Vicki Conn, principal investigator. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
No financial disclosures were reported by the authors of this paper.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.