The literature search identified 46 trials that met the inclusion criteria, in which 61 different obesity prevention programs were evaluated (12 trials evaluated more than one prevention program and 3 prevention programs were evaluated in 2 trials), resulting in a total of 64 effect sizes for this review. Of these 64 prevention programs, 30 were universal and 34 were selected. The majority focused on both males and females (n = 48), but 14 focused solely on females and 2 focused solely on males. The majority of these interventions were school-based programs (84%). A total of 51 of the 64 prevention programs used random assignment to condition, of which 13% were randomized at the participant level, 2% were randomized at the group level, and 85% were randomized at the school level. Brief descriptions of the samples, program content, and intervention effects are provided in the two descriptive tables for universal and selected prevention programs, respectively. The flow chart shows the number of studies that were omitted because of the various exclusionary criteria.
Descriptions of the Sample, Intervention Content, and Findings from Universal Obesity Prevention Trials
Descriptions of the Sample, Intervention Content, and Findings from Selected Obesity Prevention Trials
Flow chart illustrating the number of articles omitted for the various exclusion criteria.
To assess inter-rater agreement between the two coders responsible for abstracting effect sizes and moderators, we calculated the intraclass correlation coefficient (ICC) for continuous variables and kappa (κ) coefficients for nominal variables (see the inter-rater agreement table). The ICC coefficients ranged from a low of .95 (for the effect size estimates) to 1.0 (for 80% of the continuous variables examined in this report). The κ coefficients ranged from .87 (for whether nested data were modeled incorrectly) to 1.00 (for 75% of the nominal variables examined in this report). These analyses indicate that there was high inter-rater agreement.
Inter-Rater Agreement for all Variables Abstracted for the Present Meta-Analytic Review
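As an illustration of the agreement index used for nominal variables, the following minimal sketch computes Cohen's κ for two coders. The function name and the example codes are hypothetical and are not drawn from the review; the ICC for continuous variables is not shown.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' nominal ratings of the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both coders used a single category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for one nominal moderator across ten trials.
coder_a = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
coder_b = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
kappa = cohens_kappa(coder_a, coder_b)
```

In this made-up example the coders agree on 9 of 10 trials and chance agreement is .50, so κ is (.90 − .50) / (1 − .50) = .80.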
The two moderator tables (universal and selected programs) report the magnitude of effect sizes and provide the participant, intervention, delivery, and design features that were investigated as potential moderators of intervention effects. The effect sizes reflect analyses performed on the entire samples used in these studies, rather than on subgroups such as each gender separately, because such subgroup analyses were not consistently reported across trials.
Universal Programs – Moderator Values and Effect Sizes
Selected Programs – Moderator Values and Effect Sizes
Average Effect Size and Effect Size Heterogeneity
Analyses were conducted on the effect size for change in BMI in the intervention condition versus the control condition. Pearson's r's were first converted to z scores to avoid problematic standard error estimates (Hedges & Olkin, 1985). The SPSS macro developed by Lipsey and Wilson (2001) was then used to estimate the overall inverse variance weighted average effect size for random effects models. All mean values were computed using this method.
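The computation described above can be sketched as follows. This is a minimal illustration, not the Lipsey and Wilson macro: it assumes the usual Fisher z variance of 1 / (n − 3) and a method-of-moments (DerSimonian-Laird style) estimate of the between-study variance, and the function name and example data are hypothetical.

```python
import math


def random_effects_mean(rs, ns):
    """Inverse variance weighted random effects mean of correlations."""
    zs = [math.atanh(r) for r in rs]      # Fisher r-to-z transform
    vs = [1.0 / (n - 3) for n in ns]      # assumed sampling variance of z
    w = [1.0 / v for v in vs]             # fixed effects weights
    sw = sum(w)
    fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    # Homogeneity statistic Q around the fixed effects mean.
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, zs))
    # Method-of-moments between-study variance, truncated at zero.
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(zs) - 1)) / c)
    # Random effects weights add tau2 to each sampling variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    mean_z = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "r": math.tanh(mean_z),           # back-transform to r
        "z_test": mean_z / se,            # test of the mean against zero
        "q": q,
        "tau2": tau2,
    }


# Hypothetical effect sizes (r) and sample sizes for four trials.
result = random_effects_mean([0.10, -0.05, 0.20, 0.02], [120, 300, 60, 450])
```

The back-transformed mean in `result["r"]` corresponds to the kind of average r reported in this section, and `result["z_test"]` to the accompanying z statistic.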
The average effect size across all studies was very small (r = .04), but was significantly larger than zero (z = 2.94, p < .01). The r's for the effect sizes ranged from −.24 to .50. Only 13 of these interventions (1 of which was evaluated in 2 trials), or 21% of the 61 programs evaluated, found significant positive intervention effects based on an alpha level of .05 (Dwyer et al., 1983; Eliakim et al., 2000; Fitzgibbon et al., 2004; Gutin & Owens, 1999; Killen et al., 1988; Lionis et al., 1991; Manios, Moschandreas, Hatzis, & Kafatos, 2002; Robinson, 1999; Stice, Orjada, & Tristan, in press; Stice & Ragan, 2002; Stice, Shaw, Burton, & Wade, in press; Tamir et al., 1990). One intervention (Alexandrov et al., 1992) reported a significant negative effect, which either represented a chance finding or an iatrogenic effect.
There was significant heterogeneity in effect sizes (Q = 204.41, p < .001), indicating that there was statistically meaningful variability across the effect sizes produced by the interventions (i.e., that effects were not equivalent across trials). The heterogeneity in the effects suggests that there may be participant, intervention, delivery, and design features that account for the variability in effect sizes.
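For reference, the homogeneity statistic has the standard form below. The notation is an assumption introduced here rather than taken from the review: z_i is the Fisher-transformed effect size for trial i, v_i its sampling variance, and k the number of effect sizes.

```latex
Q = \sum_{i=1}^{k} w_i \,\bigl(z_i - \bar{z}\bigr)^2,
\qquad
\bar{z} = \frac{\sum_{i=1}^{k} w_i z_i}{\sum_{i=1}^{k} w_i},
\qquad
w_i = \frac{1}{v_i}
```

Under the null hypothesis of homogeneous effects, Q follows a chi-square distribution with k − 1 degrees of freedom. If all 64 effect sizes entered the test, that is 63 degrees of freedom, so the expected value of Q under homogeneity would be 63, well below the observed 204.41.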
Two moderators could not be examined because of severe restrictions in range: because only two studies used credible active control conditions and we located only two unpublished reports, we did not consider type of control condition or publication status further. Two potential confounding variables were not examined because they did not show significant relations to effect sizes: preliminary univariate analyses indicated that length of follow-up (z = 1.58, p = .11, β = .18) and the age range of participants in the trials (z = .80, p = .42, β = .10) were not significantly related to effect size magnitude. Within this context, it should be noted that preliminary analyses also indicated that publication year, a variable commonly included in meta-analytic reviews, was not a significant predictor of effect size (z = 1.44, p = .15, β = .17).
We also conducted preliminary analyses to determine which operationalization of participant ethnicity and psychoeducational content to examine in the models. First, with regard to ethnicity, analyses indicated that neither the percent of the sample that was Black or Hispanic (z = .48, p = .63, β = .06) nor the predominant ethnic group in each sample (which was represented with a series of dummy-coded vectors) was significantly related to effect sizes. Ethnicity dummy variables representing Black (z = .43, p = .67, β = .05), Hispanic (z = .31, p = .76, β = .04), Asian and Pacific Islander (z = 1.51, p = .13, β = .18), and Native American (z = 1.53, p = .12, β = .18) were not statistically significant. We focused exclusively on the former operationalization for this report because the latter had some very small cell sizes (e.g., predominantly Native American participants) and required multiple dummy-coded vectors. Second, because the code representing whether interventions had only psychoeducational content was not systematically related to the effect sizes in a univariate model (z = −.38, p = .71, β = −.04), we limited our analyses to whether the intervention contained any psychoeducational content (which allowed us to use a parallel approach for all of our intervention content variables). Thus, although we initially coded 23 effect size moderators (see ), the moderator analyses focused on the 18 effect size moderators listed in .
Univariate Regression Models for Individual Moderators
Parental involvement was initially analyzed as a four-level moderator with the following levels represented: no parental involvement, psychoeducational material provided to parents, parental attendance of sessions, and parental behavioral change. However, in preliminary analyses dummy-coded variables representing psychoeducational material (z = −1.14, p = .26, β = −.14), parental attendance of sessions (z = −.19, p = .85, β = −.02), and parental behavioral change (z = .38, p = .71, β = .05) were not statistically significant predictors of effect size. We therefore simplified this variable into a dichotomous variable (no parental involvement or psychoeducational material = 0; parental attendance or parental behavioral change = 1) so that we could use a single dummy-coded vector to represent this variable.
Random effects regression models, with inverse variance weights, tested whether the putative moderators were related to observed effect sizes. Random effects models separate variance between effect sizes and variance attributable to individual studies. Inferentially, random effects models can be generalized to a broader set of studies or potential studies, in contrast to fixed effects models, which do not account for variance attributable to a particular study. Regression models were implemented using SPSS macros (Lipsey & Wilson, 2001) for inverse variance weighted regression with maximum likelihood estimation. The correlations between the moderators are presented in the table of correlations among the putative moderators.
Correlations Among the Putative Moderators of Weight Gain Intervention Effects
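A univariate moderator model of the kind described above can be sketched as an inverse variance weighted regression of effect sizes on a single moderator. This is a simplified illustration, not the SPSS macro: the between-study variance `tau2` is passed in as a known value rather than estimated by maximum likelihood, and the function name and data are hypothetical.

```python
import math


def weighted_meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least squares of effect sizes on one moderator.

    Weights are 1 / (sampling variance + tau2). tau2 is an assumed,
    externally supplied between-study variance.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, moderator)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, moderator))
    sxy = sum(
        wi * (x - x_bar) * (y - y_bar)
        for wi, x, y in zip(w, moderator, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With inverse variance weights, the slope's standard error comes
    # directly from the weighted sum of squares of the moderator.
    se_slope = math.sqrt(1.0 / sxx)
    return {"slope": slope, "intercept": intercept, "z": slope / se_slope}


# Hypothetical data: Fisher z effect sizes, their sampling variances, and a
# dummy-coded moderator (1 = female-only trial, 0 = otherwise).
fit = weighted_meta_regression(
    effects=[0.12, 0.15, 0.01, 0.03, -0.02],
    variances=[0.010, 0.020, 0.004, 0.006, 0.008],
    moderator=[1, 1, 0, 0, 0],
    tau2=0.002,
)
```

For a dummy-coded moderator, the slope is the difference between the weighted mean effect sizes of the two groups, and `fit["z"]` plays the role of the z statistics reported for each moderator.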
Moderators were first examined in separate univariate regression models to investigate the bivariate relations between moderators and effect sizes without the complication of collinearity. The moderators that showed significant effects in the univariate models were then entered in a multivariate model to estimate the unique effect of each moderator, controlling for the effects of the other moderators with significant effects. The five continuous moderators (average age, percent Black and Hispanic, intervention duration in hours, intervention duration in weeks, and number of behavioral targets) were standardized as z scores. We tested for linear and quadratic effects for the five continuous moderators, as statisticians recommend testing for such higher order effects to decrease the risk of model misspecification (Hosmer & Lemeshow, 2000). For instance, it is possible that interventions targeting two health behaviors produce larger weight gain prevention effects on average than do interventions targeting fewer or more behavioral targets. Effect sizes were regressed on the linear and quadratic terms. If the quadratic effect was not significant, the quadratic term was removed from subsequent models to ensure that linear effects were not obscured by collinearity between the linear and quadratic terms. When the quadratic term was significant, both the linear and quadratic terms were retained for all subsequent models.
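The linear-plus-quadratic specification can be sketched as follows: the continuous moderator is standardized, squared, and both terms enter a weighted regression. This is an illustrative sketch only; standardizing with the sample standard deviation (n − 1), the helper names, and the example data are assumptions, and significance testing of the quadratic term is omitted.

```python
import math


def standardize(values):
    """Convert raw values to z scores using the sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(effects, weights, moderator):
    """Weighted regression of effects on z and z squared.

    Returns [intercept, linear coefficient, quadratic coefficient].
    """
    z = standardize(moderator)
    rows = [[1.0, zi, zi * zi] for zi in z]
    xtwx = [
        [sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(3)]
        for i in range(3)
    ]
    xtwy = [
        sum(w * r[i] * y for w, r, y in zip(weights, rows, effects))
        for i in range(3)
    ]
    return solve(xtwx, xtwy)


# Hypothetical mean ages, effect sizes, and inverse variance weights.
ages = [6.0, 8.0, 9.5, 10.5, 12.0, 14.0]
effects = [0.08, 0.03, 0.01, 0.00, 0.04, 0.10]
weights = [90.0, 150.0, 200.0, 180.0, 120.0, 60.0]
intercept, linear, quadratic = fit_quadratic(effects, weights, ages)
```

A positive quadratic coefficient describes a U-shaped relation, with larger effect sizes at the low and high ends of the moderator than in the middle.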
Among the five continuous moderators, the model for participant age was the only model in which the quadratic term was significant. Both the linear and quadratic age terms were significant (see the table of univariate regression models). As indicated in the figure relating average participant age to effect size, larger effect sizes tended to emerge in trials involving children and adolescents, but smaller effect sizes occurred in trials involving preadolescents. To probe the form of this curvilinear pattern, we examined mean effect sizes for the three tertiles of age: interventions with an average age less than or equal to 9.23 years exhibited effect sizes that were only marginally different from zero (M r = .03, p = .07, n = 21); interventions with an average age greater than 9.23 and less than or equal to 11 did not exhibit effects significantly different from zero (M r = .01, p = .42, n = 23); and interventions with an average age greater than 11 exhibited effects that were significantly different from zero (M r = .07, p < .05, n = 20).
Relation of the average age of participants to the weight gain intervention effect sizes.
Significantly larger effects were observed in female-only trials than in mixed sex and male-only trials (see the table of univariate regression models). Follow-up analyses revealed that the average effect for programs focusing solely on females was significantly different from zero (M r = .13, p < .01, n = 14), whereas the average effect for programs targeting mixed sex samples and male-only samples was trivial and not significantly different from zero (M r = .02, p = .06, n = 50).
Intervention duration was examined as a function of hours and weeks. While there was not a significant effect for duration in hours, there was a significant negative effect for duration in weeks (see the table of univariate regression models). Interventions below the median of 16 weeks exhibited a mean effect size significantly greater than zero (M r = .06, p < .01, n = 31), in contrast to the interventions at or above the median of 16 weeks, whose mean effect size was not significantly greater than zero (M r = .02, p = .15, n = 33).
The model for number of behavioral targets containing only a linear term had a significant negative coefficient (see the table of univariate regression models), indicating that effect size decreased as the number of non-weight related targets increased. Interventions that targeted only weight change exhibited effect sizes significantly greater than zero (M r = .09, p < .001, n = 27), whereas interventions that targeted other behavioral changes in addition to weight change exhibited effect sizes that were not significantly different from zero (M r = .01, p = .47, n = 37).
Pilot trials of interventions exhibited significantly larger effect sizes than fully powered demonstration trials (see the table of univariate regression models). Follow-up analyses revealed that the average effect for pilot studies was significantly different from zero (M r = .14, p < .001, n = 18), whereas the average effect for interventions evaluated in demonstration trials was not significantly different from zero (M r = .02, p = .07, n = 46).
Trials that used a self-selected recruitment method resulted in significantly larger effect sizes than trials that used population-based recruitment methods (see the table of univariate regression models). Follow-up analyses showed that the average effect for trials using self-selected recruitment was significantly greater than zero (M r = .14, p < .001, n = 16), whereas the average effect for trials using population-based recruitment was not significantly different from zero (M r = .02, p = .10, n = 48).
A multivariate model was estimated containing the moderators that were significant predictors of effect size in the preceding univariate models: the linear and quadratic terms for age, participant gender, number of behavioral targets, duration in weeks, whether the trial was a pilot study, and recruitment method. The linear term (z = −4.14, p < .001, β = −2.06) and the quadratic term (z = 4.56, p < .001, β = 2.36) for age both showed significant unique effects in this model. The only other moderator that remained statistically significant in the multivariate model was self-selected recruitment (z = 2.07, p < .05, β = .30). Participant gender (z = −1.96, p = .05, β = −.33), duration in weeks (z = −.66, p = .51, β = −.08), number of behavioral targets (z = −.43, p = .67, β = −.06), and whether the trial was a pilot study (z = 1.59, p = .11, β = .19) did not show significant unique effects in the multivariate model. The R² for the full model was .42.