This meta-analytic review summarizes obesity prevention programs and their effects and investigates participant, intervention, delivery, and design features associated with larger effects. A literature search identified 64 prevention programs seeking to produce weight gain prevention effects, of which 21% produced significant prevention effects that were typically pre to post effects. Larger effects emerged for programs targeting children and adolescents (versus preadolescents) and females, programs that were relatively brief, programs solely targeting weight control versus other health behaviors (e.g., smoking), programs evaluated in pilot trials, and programs wherein participants must self-select into the intervention. Other factors, including mandated improvements in diet and exercise, sedentary behavior reduction, delivery by trained interventionists, and parental involvement, were not associated with significantly larger effects.
Obesity in adulthood results in an increased risk for future death from all causes, coronary heart disease, atherosclerotic cerebrovascular disease, and colorectal cancer, as well as serious medical problems, including hyperlipidemia, hypertension, gallbladder disease, and diabetes mellitus (Calle, Thun, Petrelli, Rodriguez, & Heath, 1999). Obesity in childhood and adolescence is also associated with serious medical problems, including high blood pressure, adverse lipoprotein profiles, diabetes mellitus, atherosclerotic cerebrovascular disease, coronary heart disease, colorectal cancer, and death from all causes, as well as lower educational attainment and poverty (Dietz, 1998). The prevalence of obesity has increased sharply over the last three decades; currently 65% of adults are classified as overweight or obese (Hedley et al., 2004). The prevalence of obesity has risen even more sharply among adolescents and young adults (Hedley et al., 2004), which is alarming because obesity persists into adulthood for 70% of obese adolescents (Magarey, Daniels, Boulton, & Cockington, 2003). Obesity also carries a high fiscal cost; roughly $100 billion per year is spent on obesity-related health care (Wolf, 1998).
Unfortunately, successful treatments for obesity have been elusive. For adults, the current treatment of choice only results in about a 10% reduction in body weight and virtually all patients regain this weight within a few years of treatment (Jeffery et al., 2000). Obesity treatments for children and adolescents have yielded similar effects, though behavioral family-based interventions have produced more persistent weight loss effects (Epstein, Valoski, Wing, & McCurley, 1990; Flodmark, Ohlsson, Ryden, & Sveger, 1993). Compounding matters, only about 10% of obese children and adolescents seek weight loss treatment (e.g., French, Perry, Leon, & Fulkerson, 1994). Accordingly, much effort has been devoted to developing and evaluating obesity prevention programs, in the hope that this strategy will more effectively curb this pernicious public health problem.
Studies have evaluated four major types of interventions that were expected to produce weight gain prevention effects. These include: (a) multi-focus cardiovascular disease prevention programs that targeted obesity along with other risk factors for cardiovascular disease (e.g., hypertension and smoking), (b) prevention programs that focused solely on the prevention of obesity or weight gain, (c) interventions designed to solely increase physical activity, and (d) eating disorder prevention programs that promote use of healthy weight management skills.
Although numerous evaluations of weight gain prevention programs have been conducted, their results have not been comprehensively reviewed and analyzed with meta-analytic procedures. Several excellent narrative reviews exist (e.g., Dietz & Gortmaker, 2001; Schmitz & Jeffery, 2000; Story, 1999), but meta-analytic techniques were not used to empirically describe effect sizes or investigate potential moderators of intervention effects. One meta-analytic review has been published (Campbell, Waters, O’Meara, Kelly, & Summerbell, 2003), but it used extensive exclusionary criteria that rendered it impossible to examine moderators of intervention effects (only 10 trials were included). Thus, the overarching goal of this paper is to address this important gap in the literature. The first aim of this review is to provide a summary of these prevention programs and their effects. The second aim is to examine participant, intervention, delivery, and design features that are associated with larger intervention effects. Given the heterogeneity in the effects from these interventions, it is important to systematically consider the moderators associated with interventions that produced the largest effects. The third aim is to discuss promising directions for future research in light of the findings from completed trials.
A unique feature of meta-analysis is that it permits an empirical examination of factors associated with variation in effect sizes. Elucidating factors that moderate prevention program effects is informative because it highlights aspects of the participants, intervention, program delivery, and research design that are associated with stronger intervention effects. This information should increase the yield of future prevention efforts by identifying the conditions under which optimal prevention effects occur. As well, this information might identify particular subgroups of individuals for whom alternative obesity prevention programs need to be developed. Analyses of moderators of intervention effects should also advance general theories regarding effective routes to alter maladaptive health behaviors and attitudes. Accordingly, we investigated several potential moderators of intervention effects that were selected based on theory, prior findings, and previous literature reviews.
Researchers have hypothesized that obesity prevention programs are more effective when they are delivered to middle school or high school students versus grade school students (Baranowski, Cullen, Nicklas, Thompson, & Baranowski, 2002). Younger children may find it difficult to grasp the concepts and skills taught in the interventions. They may also be less likely to influence the food purchases made by adults (when eating at home or in restaurants). Thus, we hypothesized that effects will be significantly larger for interventions offered to adolescents versus children.
Results from prior trials suggest that obesity prevention programs that promote a healthier lower-calorie diet (Perry et al., 1998) and those that also attempted to increase physical activity and/or decrease sedentary behavior (Gortmaker et al., 1999; Vandongen et al., 1995) produced larger effects for females than for males. However, another obesity prevention program that promoted healthy lower-calorie diets and increased physical activity found significantly stronger effects for males than for females (Kain, Uauy, Vio, Cerda, & Leyton, 2004) and one obesity treatment trial found that an intervention solely aimed at increasing activity and decreasing sedentary behaviors was more effective for boys than girls, though an intervention solely focusing on increasing activity level was equally effective for boys and girls (Epstein, Paluch, & Raynor, 2001). Although these findings may represent chance findings because most trials did not report that intervention effects were moderated by gender, there was more evidence that obesity prevention programs produced larger effects for females than for males. This finding may have emerged because sociocultural pressures for thinness are greater for females (Thompson, Heinberg, Altabe, & Tantleff-Dunn, 1999), which may amplify the effects of obesity prevention programs for this population. In support, more females than males are dissatisfied with their bodies and the vast majority of adolescent females with body image concerns are dissatisfied because they feel overweight (Thompson et al., 1999). In contrast, the reasons males give for body image concerns are more heterogeneous, and nearly half who indicate that they are dissatisfied with their weight actually wish to gain weight. In addition, females are at higher risk for onset of obesity than males (Solomon & Manson, 1997). 
Because there was more evidence that obesity prevention programs produce larger effects for females than males, we hypothesized that intervention effects for prevention programs might be larger for females.
There is also reason to believe that ethnicity might moderate obesity prevention effects. On the one hand, there is evidence that Black and Hispanic individuals show elevated rates of overweight and obesity, as well as greater increases in weight over development, relative to other ethnic groups (e.g., Burke & Bild, 1996; Kimm et al., 2001), suggesting that programs targeting these high-risk groups might be more effective because there is a greater opportunity to show a prevention effect. On the other hand, overweight and obesity are less stigmatized and are associated with less body dissatisfaction for certain ethnic minority groups (e.g., Duncan, Anton, Newton, & Perri, 2003), particularly Black women, which might attenuate the effects of obesity prevention programs for these populations. Thus, we hypothesized that intervention effects would be different for programs primarily targeting high-risk ethnic minority participants versus those primarily targeting low-risk ethnic groups.
More generally, we have hypothesized (Stice & Shaw, 2004) that interventions are more effective when offered to high-risk participants (selected prevention programs) versus all individuals in a population (universal prevention programs). In the obesity prevention field, selected interventions have been directed at a variety of groups at elevated risk for future weight gain, including Black and Hispanic individuals, students with other cardiovascular disease risk factors (e.g., hypertension), overweight or obese individuals, first-year college students, and females with body dissatisfaction. Theoretically, these high-risk individuals are more motivated to engage in the prevention program content, and thus more likely to benefit. It is also likely that low-risk individuals have less room for change on the outcomes (a floor effect). One narrative review comparing selected and universal school-wide obesity prevention programs concluded that selected interventions may be more effective in reducing pediatric obesity than universal interventions (Resnicow, 1993). In addition, prevention programs for eating pathology (Killen et al., 1993), depression (Clarke et al., 1995), anxiety (Lowry-Webster, Barrett, & Dadds, 2001), behavior problems (Stoolmiller, Eddy, & Reid, 2000), and substance abuse (Murphy et al., 2001) have often produced stronger effects for high-risk sub-samples than for the full sample of individuals enrolled in these universal prevention programs. Thus, we hypothesized that intervention effects would be larger for selected programs versus universal programs. Because the key distinction between selected and universal programs is that the former are offered to high-risk individuals, we use the term risk status of participants to refer to this moderator.
Previous meta-analyses of prevention programs for other problem behaviors have suggested that longer duration multi-session interventions produced larger effects than very brief interventions (Rooney & Murray, 1996; Stice & Shaw, 2004). Theoretically, interventions with a longer duration afford a greater opportunity for presentation of information and behavioral change skills. We hypothesized that intervention effects will be stronger for prevention programs with a longer versus shorter duration.
It has also been suggested that parental involvement leads to more favorable results in obesity prevention, as the family is thought to be key to developing a psychosocial environment that is conducive to healthy eating and physical activity (Story, 1999). Parents are usually responsible for determining food offerings in and away from the home, at least through a certain age, as well as influencing exercise and recreation. Obesity treatment trials suggest that both child and adolescent weight loss programs are more effective when at least one parent is involved (Epstein, Wing, Keoske, & Valoski, 1987; Golan, Weizman, Apter, & Fainaru, 1998). Therefore, we hypothesized that obesity prevention programs with parental involvement would have larger effects than those without parental involvement.
Because research suggests that psychoeducational content is ineffective in producing behavioral change (Helweg-Larsen & Collins, 1997; Larimer & Cronce, 2002), we hypothesized that psychoeducational programs would be associated with weaker intervention effects. Indirect support for this hypothesis was provided by a recent meta-analysis which found that eating disorder prevention programs with psychoeducational content are less effective than those without this content (Stice & Shaw, 2004).
One implication from the energy balance model of obesity is that a reduction in fat and sugar intake and an increase in fruit and vegetable intake will decrease the risk for future weight gain (Epstein, Gordy et al., 2001). Although virtually all obesity prevention programs recommend consumption of low-fat diets, we differentiated between programs that directly manipulated dietary change as part of the intervention versus those that did not. We reasoned that distinguishing between interventions that actually manipulated diet and those that did not would provide the most sensitive test of this moderator. Another benefit is that this coding scheme captures environmental manipulations of the food environment, which is useful because theorists have suggested that the food environment plays a key role in obesity promotion (Wadden, Brownell, & Foster, 2002). The most common example was interventions that directly changed the nutritional content of school lunches (e.g., Donnelly et al., 1996; Luepker et al., 1996). We hypothesized that interventions that involved a direct improvement to dietary intake should produce stronger intervention effects than those that did not.
Another implication from the energy balance model of obesity is that increased physical activity will decrease risk for future weight gain (Wadden, Vogt, Foster, & Anderson, 1998). Although most obesity prevention programs recommend regular physical activity, we distinguished prevention programs that directly manipulated physical activity from those that simply recommended it because we felt this would provide a more sensitive test of this potential moderator. The most common example of programs that manipulated physical activity was school-based interventions that administered a physical education class for students in the intervention condition but not the control condition (e.g., Dwyer et al., 1983; McMurray et al., 2002). We hypothesized that programs that directly increased physical activity would have larger intervention effects than those that did not increase activity.
A third implication of the energy balance model of obesity is that interventions that reduce sedentary behavior, such as television viewing and video game use, should also decrease risk for future weight gain. Indeed, it has been theorized that more effective obesity prevention programs focused on reducing sedentary behavior (Baranowski et al., 2002), and television viewing is considered one of the most modifiable causes of obesity in children (Robinson, 1999). We hypothesized that larger effects would emerge for programs that focused on reducing sedentary behaviors than for programs that did not target this risk factor.
Our review of the literature suggested that the number of health behaviors targeted in an intervention was inversely related to the magnitude of intervention effects for obesity. Specifically, it appeared that interventions that attempted to change a broad array of health behaviors, such as body weight, blood pressure, cholesterol, and smoking, were less effective than programs that focused solely on body weight. Our clinical experience from designing and evaluating prevention programs also suggests that interventions focusing on a few concepts are more effective than those focusing on a broader array of concepts. It may be that the greater the complexity of the message relayed by the intervention, the more difficult it is for participants to process, store, and retrieve information presented in the programs. Consistent with this general impression, a review of school-based cardiovascular disease prevention trials concluded that broad-based programs targeting multiple health behaviors aimed at reducing risks for cardiovascular disease have not been effective for reducing obesity in children (Resnicow & Robinson, 1997). We hypothesized that programs targeting multiple health behaviors would have smaller effects than those solely targeting weight change.
Researchers have suggested that obesity prevention programs are more effective when delivered by dedicated interventionists versus classroom teachers (Baranowski et al., 2002). Theoretically, teachers are not able to devote as much time and energy to providing interventions as dedicated interventionists because teachers have classroom responsibilities that take precedence. Moreover, dedicated interventionists are typically able to provide the intervention several times per school year, allowing them to develop and refine their presentation strategies, whereas teachers typically will only provide the intervention once per year. In addition, teachers rarely receive the amount of specialized training and detailed supervision provided to dedicated interventionists. Thus, we hypothesized that intervention effects will be significantly larger for programs delivered by dedicated interventionists versus classroom teachers.
Meta-analytic reviews of substance abuse (Tobler et al., 2000) and eating disorder (Stice & Shaw, 2004) prevention programs have found that interactive programs produce larger intervention effects than didactic programs. Theoretically, participants in interactive programs show greater intervention effects because this format helps participants engage in the program content, which facilitates skill acquisition and attitudinal change. Interactive programs are also more likely to involve exercises that allow participants to apply the skills taught in the intervention, which should enhance skill acquisition (e.g., particular sports). We predicted that interactive programs would be more effective than didactic programs.
Our review of the prevention and treatment literature for obesity and eating disorders suggested that larger intervention effects were often observed for pilot trials of a new intervention relative to large demonstration trials. Such a pattern of effects might occur because interventionists are more passionate about a new prevention program or because demonstration trials are more methodologically rigorous and are therefore more immune to such experimenter effects (e.g., because they more often use blinded assessors and minimal intervention control conditions). Thus, we hypothesized that intervention effects would be significantly larger for pilot evaluations of new interventions.
Our experience suggests that intervention effects are often larger when prevention programs are delivered solely to participants who have actively self-selected into trials in response to recruitment efforts, such as media advertisements, relative to when prevention programs are offered to all individuals in a defined population (e.g., a particular school). Presumably this is because the former strategy recruits individuals who are more motivated to achieve weight gain prevention effects and therefore engage more effectively in the prevention program. Thus, we hypothesized that intervention effects would be significantly larger for self-presenting volunteers than for participants recruited through population-based recruitment efforts.
We theorized that trials that randomly assigned participants to condition might produce larger intervention effects than trials that used alternative approaches to allocating participants to treatment condition, such as matching. We reasoned that because random assignment is the best approach to generating groups that are equivalent on any potential confounding variables at baseline (with sufficiently large sample sizes), it should minimize the chances that any of these confounding variables are correlated with treatment condition, which should in turn maximize the ability to detect intervention effects if they really occur (i.e., randomization maximizes the signal-to-noise ratio reflected in inferential tests of the intervention effects). Accordingly, we hypothesized that intervention effects may be greater for trials that used random assignment relative to other approaches to assigning participants to condition. However, because the proper analysis of intervention effects involves tests of differential change across conditions, which adjusts for any initial differences at baseline on the outcome, we suspected that this effect might not reach statistical significance. Consistent with this expectation, random assignment did not emerge as a significant moderator of effect sizes in our meta-analysis of eating disorder prevention programs (Stice & Shaw, 2004).
Virtually all parametric inferential tests, such as repeated measures ANOVA, growth curve, and survival models, used to test for intervention effects within randomized trials assume independence of errors. However, when participants are nested within schools, classes, or group-based interventions, the assumption of independence may not hold (Baldwin, Murray, & Shadish, 2005). Participants within these nested groups may be more similar than participants from across these groups, which can artificially reduce the error terms used to test for intervention effects, which increases risk for a false positive finding. Thus, we hypothesized that studies that did not model the nested nature of the data in the trial will produce artificially larger effect sizes for the interventions relative to studies that modeled the nested nature of the data.
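The consequence of ignoring nesting can be illustrated with the standard design effect formula, DEFF = 1 + (m - 1) × ICC, where m is the number of participants per cluster and ICC is the intraclass correlation. This formula is a general survey-sampling result rather than a procedure reported in the trials reviewed here, and the cluster size, intraclass correlation, and standard error below are hypothetical values chosen only for illustration. A minimal Python sketch:

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation caused by clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def adjusted_standard_error(naive_se: float, cluster_size: float, icc: float) -> float:
    """Inflate a standard error that was computed as if observations were independent."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical example: classrooms of 25 students and an ICC of 0.05.
deff = design_effect(25, 0.05)
se = adjusted_standard_error(0.10, 25, 0.05)
print(f"design effect = {deff:.2f}, adjusted SE = {se:.3f}")
```

Even a small intraclass correlation more than doubles the error variance in this hypothetical case, which is why tests that treat nested participants as independent are prone to false positive findings.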
We also investigated three variables that might produce artifacts for the effect sizes and bias our estimates of effect size moderators, with the goal of including these variables as covariates in the models if necessary. First, our review of the eating disorder prevention field suggested that interventions tend to produce larger effect sizes when they are compared against assessment-only or waitlist control conditions relative to when they are compared to active interventions that are credible and structurally matched to the intervention in terms of contact hours (Stice & Shaw, 2004). Theoretically, this pattern of findings occurs because the active comparison groups more effectively control for demand characteristics, participant expectancies, and other non-specific factors that contribute to intervention effects. Thus, we tested whether type of control condition was systematically related to the intervention effect sizes. Second, because effect sizes for prevention programs tend to be smaller when longer follow-up periods are examined relative to shorter follow-up periods or pretest to posttest designs (Stice & Shaw, 2004), we tested whether follow-up length was related to effect size magnitude. Third, because prior meta-analyses have found that unpublished studies often have smaller effects than published studies (Lipsey & Wilson, 2001), we investigated whether publication status was related to intervention effect sizes.
Following the recommendations of Lipsey and Wilson (2001), five procedures were used to retrieve published and unpublished trials of obesity prevention programs. First, a computer search was performed on PsychInfo, MedLine, Dissertation Abstracts, and Cumulative Index to Nursing and Allied Health Literature for the years 1980 – 2005 (through October) using the following keywords: obesity, weight, cardiovascular disease, prevention, preventive, and intervention. Two research assistants and a professional librarian performed independent searches to increase the odds that all relevant articles would be retrieved. The first two authors reviewed the products of all three searches to identify pertinent articles. Second, the tables of content for journals that commonly publish articles in this area were reviewed for this same period (e.g., Preventive Medicine, Journal of Pediatrics, Health Education Quarterly). Third, we consulted narrative reviews of the obesity prevention field to search for additional citations of relevance. Fourth, the reference sections of all identified articles were examined. Finally, established obesity prevention researchers were contacted and asked for copies of unpublished articles (under review or in press) describing prevention trials.
The defining feature of a successful obesity prevention program is that it results in significantly less weight gain or risk for obesity onset than observed in the control group. Thus, we only included trials that used some type of proxy measure of body fat as an outcome. Most trials used the body mass index (BMI = kg/m²) as the primary proxy measure of body fat, but a few studies, particularly older ones, used skinfold thickness. It is important to note that BMI is not a direct measure of body fat. Although this proxy measure tends to show high correlations with the most precise measures of body fat (r = .80 – .90), such as dual energy x-ray absorptiometry (DEXA; Dietz & Robinson, 1998), it has been found to show lower agreement with DEXA measures in large epidemiology samples (r = .71; Ellis, Abrams, & Wong, 1999). Nonetheless, because the BMI is easy to measure, shows high test-retest reliability, is inexpensive, and correlates with health risk markers and diseases, such as elevated blood pressure, adverse lipoprotein profiles, atherosclerotic lesions, serum insulin levels, and diabetes mellitus, it is considered the measurement of choice for large scale studies (Dietz & Robinson, 1998; Freedman & Perry, 2000).
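For readers unfamiliar with the index, BMI is weight in kilograms divided by the square of height in meters. The Python sketch below is purely illustrative; the function name and the example weight and height are hypothetical and are not drawn from any of the reviewed trials.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI as weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical participant: 70 kg and 1.75 m tall.
print(round(body_mass_index(70.0, 1.75), 1))
```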
As noted previously, we included trials that were primarily conceptualized as evaluations of obesity prevention programs, as well as trials that evaluated other interventions that were expected to result in less weight gain or risk for obesity onset but that were not primarily conceptualized as obesity prevention programs (e.g., certain physical activity interventions, eating disorder prevention programs, and psychoeducational interventions). A prior meta-analysis indicated that certain eating disorder prevention programs and psychoeducational interventions produced significant weight gain prevention effects (Stice & Shaw, 2004). We included a wide variety of interventions that were expected to produce weight gain prevention effects in the hope that it would maximize our chances of identifying participant, intervention, delivery, and design features that are associated with the most efficacious obesity prevention programs. If multiple reports of the same trial were published, we selected the one with the longest follow-up period.
This meta-analysis focused solely on effect sizes for weight gain prevention effects, as assessed by differential change in body fat measures. We did not include effect sizes for changes in self-reported dietary intake or physical activity because numerous trials have found significant intervention effects for self-reported dietary intake and physical activity, but no significant effects for weight change (e.g., Baranowski et al., 2003; Luepker et al., 1996; Puska et al., 1982). According to the energy balance model of adiposity, any true reduction in caloric intake and/or increase in physical expenditure should be accompanied by concomitant changes in body mass. Therefore, we interpreted this pattern of findings as suggesting that self-report measures of dietary intake and physical activity are of questionable validity, at least within the context of the demand characteristics of obesity prevention trials. This interpretation dovetails with studies that have found that people under-report caloric intake and over-report activity level (Bandini, Schoeller, Dyr, & Dietz, 1990; Lichtman et al., 1992).
We focused exclusively on prevention programs that were evaluated in controlled trials. We included trials in which participants were randomly assigned to an intervention or to usual-programming (e.g., standard physical education classes), active interventions that were not focused on weight gain prevention (e.g., a general parent training intervention), waitlist, or assessment-only control conditions, as well as trials in which some relevant comparison group was used (e.g., matched controls) in a quasi-experimental design. Random assignment to condition is optimal because it is the best approach to generating comparison groups that are equated on any potential confounding variables at baseline (Shadish, Cook, & Campbell, 2002). Because many confounds are unknown, random assignment is preferable to the use of control groups that are matched to the intervention group on pre-selected dimensions. Nonetheless, carefully selected comparison groups can permit useful inferences regarding intervention effects if analyses test for significant differences in change over time across conditions (i.e., control for initial between-group differences on the outcome; Shadish et al., 2002). We excluded trials that only compared active interventions because it seemed inappropriate to compare them to trials that used a control condition and because it is difficult to determine whether a lack of differential change across active interventions signifies that both prevention programs were effective or that neither was effective.
We also focused exclusively on studies that tested whether the change in the outcomes over time was significantly greater in the intervention group versus the control group. This could take the form of a time-by-condition interaction in a repeated-measures analysis of variance (ANOVA) model, an analysis of covariance (ANCOVA) model that controls for initial levels of the outcome variable, or a growth curve model that controlled for initial levels of the outcome (e.g., the effects were conditional upon the intercept value of the dependent variable coded to reflect the level of the outcome at baseline; Stice & Shaw, 2004). It is necessary to control for initial levels of the outcome variable because otherwise the analyses are not providing a test of differential change over time across conditions. Verifying that the groups do not differ at baseline on the outcome variable does not solve this problem because the objective is to model change from baseline to intervention termination or follow-up, rather than just to conduct between-subjects tests of the groups at termination or follow-up. If the intervention group had higher initial BMI scores than the control group, the analyses may not detect a true intervention effect (a Type II error), whereas if the control group had higher initial BMI scores than the intervention group, the analyses might erroneously suggest that an intervention effect was present when it was not (a Type I error). We also included trials that used logistic regression or survival models to test whether the rates of onset of obesity or overweight were significantly lower in the intervention condition versus the control condition if initially obese or overweight participants, respectively, were excluded from the analyses (Willett & Singer, 1993).
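The importance of controlling for baseline levels of the outcome can be illustrated with a small worked example. The Python sketch below computes the classic ANCOVA-adjusted group difference (the posttest mean difference minus the pooled within-group slope times the baseline mean difference). The function name and the BMI values are hypothetical, constructed so that the intervention group starts one BMI unit higher and the true intervention effect is -0.5. It is a sketch of the general technique, not the analysis used in any particular trial.

```python
from statistics import mean


def ancova_adjusted_difference(pre_ctrl, post_ctrl, pre_tx, post_tx):
    """Posttest difference (intervention minus control) adjusted for baseline.

    Uses the pooled within-group regression slope of posttest on pretest.
    """
    def group_sums(pre, post):
        mx, my = mean(pre), mean(post)
        sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx = sum((x - mx) ** 2 for x in pre)
        return mx, my, sxy, sxx

    mx0, my0, sxy0, sxx0 = group_sums(pre_ctrl, post_ctrl)
    mx1, my1, sxy1, sxx1 = group_sums(pre_tx, post_tx)
    slope = (sxy0 + sxy1) / (sxx0 + sxx1)  # pooled within-group slope
    return (my1 - my0) - slope * (mx1 - mx0)


# Hypothetical data: post = 1 + 0.9 * pre for controls, and 0.5 lower for the intervention group.
pre_ctrl = [18.0, 20.0, 22.0, 24.0]
post_ctrl = [17.2, 19.0, 20.8, 22.6]
pre_tx = [19.0, 21.0, 23.0, 25.0]
post_tx = [17.6, 19.4, 21.2, 23.0]

raw = mean(post_tx) - mean(post_ctrl)
adjusted = ancova_adjusted_difference(pre_ctrl, post_ctrl, pre_tx, post_tx)
print(f"raw posttest difference = {raw:.2f}, adjusted difference = {adjusted:.2f}")
```

In this constructed example the unadjusted posttest comparison suggests that the intervention group ended up heavier, whereas the baseline-adjusted comparison recovers the built-in effect, which mirrors the Type II error scenario described above.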
Studies that only tested for significant changes within condition were not included because this type of analysis does not test whether the changes in the intervention condition are significantly greater than the changes in the control condition. With this latter approach there is no way to separate the effects of the intervention versus those from alternative sources, such as regression to the mean or measurement artifacts.
We excluded trials that were described as obesity treatment programs by the authors because the purpose of the present report was to provide a meta-analytic review of programs that sought to prevent future weight gain or obesity onset. Nonetheless, we included evaluations of programs that sought to prevent future weight gain in overweight or obese samples if they were not referred to as treatment programs by the authors. More generally, we did not exclude studies solely because the average BMI of participants fell above conventional cutoffs for overweight or obese (e.g., over 25 or 30 for young adult samples).
We also restricted our focus to trials that targeted children and adolescents because of our interest in determining whether effective interventions have been designed for developing individuals. We believe that obesity prevention programs should be implemented before most individuals will show onset of obesity. However, we used a broad view of adolescence and included trials with a mean age of participants up to age 22 because this captured college-based obesity prevention programs. College-aged individuals are still developing self-regulation skills, particularly with regard to dietary and exercise behaviors. In addition, many developmental psychologists consider adolescence to span from approximately age 12 through age 24 because most individuals in the US have not settled into adult roles by their early 20s (Arnett, 2000).
We calculated effect sizes for tests of differential change in BMI and risk for obesity onset across the intervention and control conditions because virtually all of the prevention trials included BMI as a primary outcome. Although other proxy measures of adiposity were used in several trials, such as skinfold thickness and waist-to-hip ratios, these latter outcomes were operationalized inconsistently and were collected in only a subset of the trials. We considered averaging the effect sizes from these various adiposity proxy measures, but noted that the intervention effects for these various outcomes were often contradictory and were concerned that averaging across diverse measures would introduce unnecessary error variance into the analyses. Furthermore, the measurement error is considerably lower for the BMI relative to alternative proxy body fat measures, including waist circumference, triceps skinfold, and subscapular skinfold measures (Freedman & Perry, 2000). In the four studies that did not collect BMI data, effect sizes were calculated for alternative proxy measures of body fat; Dwyer et al. (1983) used skinfold measures, Eliakim et al. (2000) used magnetic resonance imaging estimations of percent body fat, and Gutin et al. (1995) and Gutin and Owens (1999) used DEXA estimations of percent body fat.
The correlation coefficient (r) was selected as the index of effect size because of its similar interpretation across different combinations of interval, ordinal, and nominal variables (Pearson’s r, Spearman’s rho, and point biserial; Rosenthal, 1991). Furthermore, this effect size preserved the valence of the effects (unlike measures such as eta-squared). Cohen’s (1988) criteria for small (r = .10), medium (r = .30) and large (r = .50) effects were used.1
If effect sizes were reported in Cohen’s (1988) d, we converted them to r with the formula provided on page 20 of Rosenthal (1991). If effects were reported as odds ratios (OR), they were converted to r with the formula provided on page 194 of Lipsey and Wilson (2001). If no effect sizes were reported, we generated them directly, either by calculating Cohen’s d with the means and standard deviations (from the control group at baseline) reported in the article, which was then converted to r using the Rosenthal formula, or by reconstituting the data using weighted probability values to estimate a χ² test that provided an odds ratio, which was then converted to r using the Lipsey and Wilson formula. If none of these options for generating effect sizes was possible, we estimated effect sizes from the exact p-values reported by the authors using the formula provided on page 19 of Rosenthal (1991). If exact p-values were not reported, they were generated from the test statistics (e.g., F) and degrees of freedom using Microsoft Excel© 2004.
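The conversion steps described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the formulas are common textbook forms rather than a reproduction of the cited pages: the d-to-r conversion assumes equal group sizes, the odds ratio conversion goes through the logit-to-d approximation (an assumption that may differ from the formula actually used), and the p-to-r conversion takes a one-tailed p-value and the total sample size.

```python
import math
from statistics import NormalDist


def d_to_r(d: float) -> float:
    """Convert Cohen's d to r, assuming equal group sizes: r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d * d + 4.0)


def odds_ratio_to_r(odds_ratio: float) -> float:
    """Convert an odds ratio to r via the logit-to-d approximation.

    Assumption: d = ln(OR) * sqrt(3) / pi, followed by the d-to-r step above.
    This is one common route and may not match the formula the authors used.
    """
    d = math.log(odds_ratio) * math.sqrt(3.0) / math.pi
    return d_to_r(d)


def p_to_r(p_one_tailed: float, n: int) -> float:
    """Estimate r from an exact one-tailed p-value: r = Z / sqrt(N)."""
    z = NormalDist().inv_cdf(1.0 - p_one_tailed)  # standard normal deviate
    return z / math.sqrt(n)
```

Under these assumptions an odds ratio of 1 maps to r = 0, and larger samples yield smaller r estimates for the same p-value, which is why the p-based route is treated only as a fallback.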
We were able to use the methods described previously to generate effect sizes or estimates of effect sizes for all trials that reported significant intervention effects and for most trials that reported non-significant effects. However, for the two trials that reported non-significant effects and did not provide any other data with which to estimate the effect size (Fardy, 1996; Willet, 1995), we used full information maximum likelihood (ML) estimation to impute the missing effect sizes because this approach produces more accurate and efficient parameter estimates than list-wise deletion or alternative imputation approaches such as mean substitution (Schafer & Graham, 2002). We selected this approach over the more common strategy of assuming an effect size of zero (Lipsey & Wilson, 2001) because more precise estimates of these missing values can be generated using the conditional probabilities between effect sizes and effect size moderators from the trials that provided complete data on these variables.
Table 1 lists the numeric values used to code each moderator, the operationalization of each moderator and relevant descriptive statistics describing the distribution of the moderators.2 We coded certain moderators two ways, in an effort to ensure that we were not missing the effects of a moderator because we did not operationalize it optimally. First, in addition to coding the average age of participants in the study, we also coded the age range of participants, to determine whether studies focusing on a narrow age range may be better able to deliver an intervention that is developmentally appropriate. Second, with regard to participant ethnicity, we coded both the percentage of participants who were Black or Hispanic (a continuous variable) because these two groups are at particularly high risk for obesity and the dominant ethnic group represented in the sample (a nominal variable). Third, with regard to intervention duration, we coded both the total amount of intervention hours and the total length of the intervention in weeks because these two aspects of intervention duration varied somewhat independently (the r between these two dimensions was only .50). Fourth, with regard to psychoeducational content, we coded both whether each intervention contained psychoeducational content (to stay parallel with the coding used for the other intervention content codes) and whether the intervention only included psychoeducational content, to explore the possibility that these latter types of interventions were uniquely associated with small intervention effects.
One aspect of our coding system was constrained by the distribution of a certain moderator across studies. Specifically, although we were interested in testing whether the intervention effects were significantly larger for females than males, only 33% of the trials that we located reported effect sizes separately for the sexes (and only 21% provided a direct test of whether gender moderated the intervention effects). Accordingly, we tested whether interventions offered solely to females were more effective than those offered solely to males or those offered to both sexes. We took this approach because (a) this variable emerged as a significant predictor of eating disorder prevention program effects (Stice & Shaw, 2004), (b) our initial review of the findings suggested that effects were larger for female-only trials, and (c) this allowed us to include all trials in the analyses. Because only 2 interventions were offered solely to males, we did not feel comfortable estimating average effects for these 2 trials.
There were also a number of other potential moderators that we were unable to code because insufficient information was provided in the articles and reports. We were unable to code average attendance because only 44% of the studies reported this variable. We were unable to code the socioeconomic status of the sample because parallel information (e.g., average parental income) was reported in only 35% of studies. We were unable to code the method of handling missing data (e.g., listwise deletion [completer analysis], last observation carried forward, full information maximum likelihood estimation imputation) because less than 40% of the studies reported this information.
We used a consensus approach to coding the effect size moderators. The first and second authors were each responsible for coding certain moderators, but consulted with each other when questions regarding the coding of particular studies arose. Although this approach allowed for a refinement of the coding system and served to increase inter-rater agreement, we did not use the consensus approach on all data points or double code all studies. Thus, we examined inter-coder agreement by having the first two authors code a randomly selected 30% of the trials examined in this meta-analytic review.
The literature search identified 46 trials that met the inclusion criteria, in which 61 different obesity prevention programs were evaluated (12 trials evaluated more than one prevention program and 3 prevention programs were evaluated in 2 trials), resulting in a total of 64 effect sizes for this review. Of these 64 prevention programs, 30 were universal and 34 were selected. The majority focused on both males and females (n = 48), but 14 focused solely on females and 2 focused solely on males. The majority of these interventions were school-based programs (84%). A total of 51 of the 64 prevention programs used random assignment to condition, of which 13% were randomized at the participant level, 2% were randomized at the group level, and 85% were randomized at the school level. Brief descriptions of the samples, program content, and intervention effects are provided in Tables 2 and 3 for universal and selected prevention programs, respectively. Figure 1 provides a flow chart showing the number of studies that were omitted because of the various exclusionary criteria.
To assess inter-rater agreement between the two coders responsible for abstracting effect sizes and moderators, we calculated the intraclass correlation coefficient (ICC) for continuous variables and kappa (κ) coefficients for nominal variables (see Table 4). The ICC coefficients ranged from a low of .95 (for the effect size estimates) to 1.0 (for 80% of the continuous variables examined in this report). The κ coefficients ranged from .87 (for whether nested data was modeled incorrectly) to 1.00 (for 75% of the nominal variables examined in this report). These analyses indicate that there was high inter-rater agreement.
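For readers unfamiliar with the agreement index used for nominal moderators, the following is a minimal sketch of Cohen's κ for two coders. The function name and the example codes are hypothetical and are not taken from the coding data of this review.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of trials")
    n = len(codes_a)
    # Proportion of trials on which the two coders assigned the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes (1 = feature present, 0 = absent) for four trials.
coder_1 = [1, 1, 1, 0]
coder_2 = [1, 1, 0, 0]
print(cohens_kappa(coder_1, coder_2))
```

A κ of 1.00 indicates perfect agreement, whereas a κ of 0 indicates agreement no better than would be expected from the coders' marginal frequencies alone.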
Tables 5 and 6 report the magnitude of effect sizes and provide the participant, intervention, delivery, and design features that were investigated as potential moderators of intervention effects. The effect sizes reflect analyses performed on the entire samples used in these studies, versus effect sizes for various subgroups such as the different genders, because such subgroup analyses were not consistently reported across trials.
Analyses were conducted on the effect size for change in BMI in the intervention condition versus the control condition. Pearson’s r’s were first converted to z scores to avoid problematic standard error estimates (Hedges & Olkin, 1985). The SPSS macro developed by Lipsey and Wilson (2001) was then used to estimate the overall inverse variance weighted average effect size for random effects models. All mean values were computed using this method.
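The pooling procedure described above can be sketched as follows. This is an illustration under stated assumptions and not the SPSS macro itself: the function name is hypothetical, the sampling variance of Fisher's z is taken as 1 / (n − 3), and the between-study variance is estimated with the DerSimonian-Laird method of moments, which may differ from the estimator the macro applied.

```python
import math


def random_effects_mean_r(rs, ns):
    """Inverse-variance weighted random-effects mean of correlations.

    Returns (mean r, z test of the mean against zero, tau^2 estimate).
    """
    zs = [math.atanh(r) for r in rs]        # Fisher's r-to-z transformation
    vs = [1.0 / (n - 3) for n in ns]        # sampling variance of each z
    w = [1.0 / v for v in vs]               # fixed-effect weights
    sum_w = sum(w)
    mean_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum_w
    # Heterogeneity statistic Q, used for the method-of-moments tau^2.
    q = sum(wi * (zi - mean_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(zs) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in vs]
    mean_z = sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return math.tanh(mean_z), mean_z / se, tau2
```

Back-transforming the weighted mean z with the hyperbolic tangent returns the pooled estimate to the r metric in which the results are reported.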
The average effect size across all studies was very small (r = .04), but was significantly larger than zero (z = 2.94, p < .01). The r’s for the effect sizes ranged from −.24 to .50. Only 13 of these interventions (1 of which was evaluated in 2 trials), or 21% of the 61 programs evaluated, found significant positive intervention effects based on an alpha level of .05 (Dwyer et al., 1983; Eliakim et al., 2000; Fitzgibbon et al., 2004; Gutin & Owens, 1999; Killen et al., 1988; Lionis et al., 1991; Manios, Moschandreas, Hatzis, & Kafatos, 2002; Robinson, 1999; Stice, Orjada, & Tristan, in press; Stice & Ragan, 2002; Stice, Shaw, Burton, & Wade, in press; Tamir et al., 1990). One intervention (Alexandrov et al., 1992) reported a significant negative effect, which either represented a chance finding or an iatrogenic effect.
There was significant heterogeneity in effect sizes (Q = 204.41, p < .001), indicating that there was statistically meaningful variability across the effect sizes produced by the interventions (i.e., that effects were not equivalent across trials). The heterogeneity in the effects suggests that there may be participant, intervention, delivery, and design features that account for the variability in effect sizes.
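As a brief illustration of the heterogeneity test, the Q statistic is the weighted sum of squared deviations of each Fisher-transformed effect size from the weighted mean, and it is referred to a χ² distribution with k − 1 degrees of freedom. The sketch below uses a hypothetical function name and made-up inputs, and again assumes a sampling variance of 1 / (n − 3) for each transformed effect size.

```python
import math


def q_statistic(rs, ns):
    """Return (Q, degrees of freedom) for a set of correlations."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]  # inverse of the sampling variance 1 / (n - 3)
    mean_z = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - mean_z) ** 2 for w, z in zip(ws, zs))
    return q, len(rs) - 1


# Two made-up trials with very different effects: Q far exceeds 3.84,
# the .05 critical value of chi-square with 1 degree of freedom.
print(q_statistic([0.0, 0.5], [103, 103]))
```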
Two moderators could not be examined because of severe restrictions in range; because only two studies used credible active control conditions and because we only located two unpublished reports, we did not consider type of control condition or publication status3 further. Two potential confounding variables were not examined because they did not show significant relations to effect sizes: preliminary univariate analyses indicated that length of follow-up (z = 1.58, p = .11, β = .18) and the age range of participants in the trials (z = .80, p = .42, β = .10) were not significantly related to effect size magnitude. Within this context, it should be noted that preliminary analyses also indicated that publication year, a variable commonly included in meta-analytic reviews, was not a significant predictor of effect size (z = 1.44, p = .15, β = .17).
We also conducted preliminary analyses to determine which operationalization of participant ethnicity and psychoeducational content to examine in the models. First, with regard to ethnicity, analyses indicated that neither the percent of the sample that was Black or Hispanic (z = .48, p = .63, β = .06) nor the predominant ethnic group in each sample (which was represented with a series of dummy-coded vectors) was significantly related to effect sizes. Ethnicity dummy variables representing Black (z = .43, p = .67, β = .05), Hispanic (z = .31, p = .76, β = .04), Asian and Pacific Islander (z = 1.51, p = .13, β = .18), and Native American (z = 1.53, p = .12, β = .18) were not statistically significant. We focused exclusively on the former operationalization for this report because the latter operationalization had some very small cell sizes (e.g., predominantly Native American participants) and necessitated the use of multiple dummy-coded vectors. Second, because the code representing whether interventions had only psychoeducational content was not systematically related to the effect sizes in a univariate model (z = −.38, p = .71, β = −.04), we limited our analyses to whether the intervention contained any psychoeducational content (which allowed us to use a parallel approach for all of our intervention content variables). Thus, although we initially coded 23 effect size moderators (see Table 4), the moderator analyses focused on the 18 effect size moderators listed in Table 8.
Parental involvement was initially analyzed as a four level moderator with the following levels represented: no parental involvement, psychoeducational material provided to parents, parental attendance of sessions, and parental behavioral change. However, in preliminary analyses dummy-coded variables representing psychoeducational material (z = −1.14, p = .26, β = −.14), parental attendance of sessions (z = −.19, p = .85, β = −.02), and parental involvement (z = .38, p = .71, β = .05) were not statistically significant predictors of effect size. We therefore simplified this variable into a dichotomous variable (no parental involvement or psychoeducational material = 0; parental attendance or parental behavior change = 1) so that we could use a single dummy coded vector to represent this variable.
Random effects regression models, with inverse weighted variances, tested whether the putative moderators were related to observed effect sizes. Random effects models separate variance between effect sizes and variance attributable to individual studies. Inferentially, random effects models can be generalized to a broader set of studies or potential studies in contrast to fixed effects models that do not account for variance attributable to a particular study. Regression models were implemented using SPSS macros (Lipsey & Wilson, 2001) for inverse variance weighted regression with maximum likelihood estimation. The correlations between the moderators are presented in Table 7.
Moderators were first examined in separate univariate regression models to investigate the bivariate relations between moderators and effect sizes that were not complicated by collinearity. The moderators that showed significant effects in the univariate models were then entered in a multivariate model to estimate the unique effect of each moderator controlling for the effects of the other moderators with significant effects. The five continuous moderators (average age, percent Black and Hispanic, intervention duration in hours, intervention duration in weeks, and number of behavioral targets) were standardized in a z score format. We tested for linear and quadratic effects for the five continuous moderators, as statisticians recommend testing for such higher order effects to decrease the risk of model misspecification (Hosmer & Lemeshow, 2000). For instance, it is possible that interventions targeting two health behaviors produce larger weight gain prevention effects on average than do interventions targeting fewer or more behavioral targets. Effect sizes were regressed on the linear and quadratic terms. If the quadratic effect was not significant, the quadratic term was removed from subsequent models to ensure that linear effects were not obscured by collinearity between the linear and quadratic terms. When the quadratic term was significant, both the linear and quadratic terms were retained for all subsequent models.
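The regression of effect sizes on linear and quadratic terms of a standardized moderator can be sketched as a weighted least squares fit. This is a simplified illustration, not the macro used in the analyses: the function names are hypothetical, the weights are supplied by the caller (for a random effects model they would be the inverse of the sampling variance plus the between-study variance), the population standard deviation is used for standardizing, and significance tests of the coefficients are omitted.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw moderator values to z scores (population SD assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def weighted_quadratic_fit(moderator, effects, weights):
    """Return [intercept, linear, quadratic] weighted least squares coefficients."""
    x = standardize(moderator)
    design = [[1.0, xi, xi * xi] for xi in x]
    # Weighted normal equations: (X'WX) b = X'Wy.
    xtwx = [[sum(w * row[i] * row[j] for row, w in zip(design, weights))
             for j in range(3)] for i in range(3)]
    xtwy = [sum(w * row[i] * y for row, w, y in zip(design, weights, effects))
            for i in range(3)]
    return solve_linear_system(xtwx, xtwy)
```

A positive quadratic coefficient combined with a negative linear coefficient would describe a U-shaped relation between the moderator and effect size, which is the kind of curvilinear pattern the quadratic terms are intended to detect.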
Among the five continuous moderators, the model for participant age was the only model in which the quadratic term was significant. Both the linear and quadratic age terms were significant (see Table 8). As indicated in Figure 2, larger effect sizes tended to emerge in trials involving children and adolescents, but smaller effect sizes occurred in trials involving preadolescents. To probe the form of this curvilinear pattern, we examined mean effect sizes for the three tertiles of age: interventions with an average age less than or equal to 9.23 years exhibited effect sizes that were only marginally different from zero (M r = .03, p = .07, n = 21); interventions with an average age greater than 9.23 and less than or equal to 11 did not exhibit effects significantly different from zero (M r = .01, p = .42, n = 23); and interventions with an average age greater than 11 exhibited effects that were significantly different from zero (M r = .07, p < .05, n = 20).
Significantly larger effects were observed in female-only trials than were observed in mixed sex and male-only trials (see Table 8). Follow-up analyses revealed that the average effect for programs focusing solely on females was significantly different from zero (M r = .13, p < .01, n = 14), whereas the average effect for programs targeting mixed sex samples and male-only samples was trivial and not significantly different from zero (M r = .02, p = .06, n = 50).
Intervention duration was examined as a function of hours and weeks. While there was not a significant effect for duration in hours, there was a significant negative effect for duration in weeks (see Table 8). Interventions below the median of 16 weeks exhibited a mean effect size significantly greater than zero (M r = .06, p < .01, n = 31) in contrast to the interventions at or above the median of 16 weeks that were not significantly greater than zero (M r = .02, p = .15, n = 33).
The model for number of behavioral targets containing only a linear term had a significant negative coefficient (see Table 8), indicating that effect size decreased as the number of non-weight related targets increased. Interventions that targeted only weight change exhibited effect sizes significantly greater than zero (M r = .09, p < .001, n = 27), whereas interventions that targeted other behavioral changes in addition to weight change did not differ significantly from zero (M r = .01, p = .47, n = 37).
Pilot trials of interventions exhibited significantly larger effect sizes than fully powered demonstration trials (see Table 8). Follow-up analyses revealed that the average effect for pilot studies was significantly different from zero (M r = .14, p < .001, n = 18), whereas the average effect for interventions evaluated in demonstration trials was not significantly different from zero (M r = .02, p = .07, n = 46).
Trials that used a self-selected recruitment method resulted in significantly larger effect sizes than trials that used population-based recruitment methods (see Table 8). Follow-up analyses showed that the average effect for trials using self-selected recruitment was significantly greater than zero (M r = .14, p < .001, n = 16), whereas the average effect for trials using population-based recruitment was not significantly different from zero (M r = .02, p = .10, n = 48).
A multivariate model was estimated containing moderators that were significant predictors of effect size in previous univariate models: the linear and quadratic terms for age, participant gender, number of behavioral targets, duration in weeks, whether the trial was a pilot study, and recruitment method. The linear term (z = −4.14, p < .001, β = −2.06) and the quadratic term (z = 4.56, p < .001, β = 2.36) for age both showed significant unique effects in this model. The only other moderator that remained statistically significant in the multivariate model was self-selected recruitment (z = 2.07, p < .05, β = .30). Participant gender (z = −1.96, p = .05, β = −.33), duration in weeks (z = −.66, p = .51, β = −.08), number of behavioral targets (z = −.43, p = .67, β = −.06), and whether the trial was a pilot study (z = 1.59, p = .11, β = .19) did not show significant unique effects in the multivariate model. The R² for the full model was .42.
The first aim of this review was to summarize the effects of prevention programs that sought to produce weight gain prevention effects. One of the more noteworthy findings was that although numerous prevention programs have been evaluated, most (79%) did not produce statistically reliable weight gain prevention effects. Indeed, the average intervention effect size was an r of .04, which would be considered trivial by most researchers and clinicians. This pattern of findings attests to the difficulty of altering the health behaviors that increase risk for weight gain and obesity onset, and echoes the modest success of treatment programs for obesity in producing lasting changes in body weight (Jeffery et al., 2000).
Although it is tempting to conclude that it is particularly challenging to prevent future weight gain, the percentage of programs that produced significant intervention effects for obesity prevention programs (21%) is similar to that of prevention programs for other public health problems such as HIV (22%; Logan, Cole, & Leukefeld, 2002) and eating disorders (25%; Stice & Shaw, 2004), although smoking prevention programs have a higher rate of significant intervention effects (60%; Skara & Sussman, 2003). The average effect size for obesity prevention programs (r = .04) is also similar to the average effect size observed for prevention programs for other public health problems, such as smoking (r = .07; Hwang, Yeagley, & Petosa, 2004), substance abuse (r = .05; Tobler et al., 2000), HIV (r = .05; Logan et al., 2002), and eating disorders (r = .12; Stice & Shaw, 2004). This broader pattern of modest average returns for prevention programs aimed at a variety of health behaviors implies that most prevention programs are only minimally effective in reducing maladaptive health behaviors.
The above conclusion makes it imperative to focus on the 21% of prevention programs that produced significant weight gain prevention effects. The average effect size for these 13 interventions was r = .22 (p < .001), which corresponds to a medium effect size and is of clinical significance. The effect sizes ranged from a low of .06 (Killen et al., 1988) to a high of .50 (Eliakim et al., 2000), which is a remarkably large effect size for a prevention program.
There are several noteworthy features of the interventions that produced weight gain prevention effects. First, these programs were relatively intensive: on average they involved 40 hours of intervention time (range 3 – 120 hours). However, it was not only the successful interventions that were intensive; the average number of intervention hours was 46 for the programs that did not produce weight gain prevention effects (range 5 – 280 hours). Intervention duration is important to consider because it is difficult to disseminate intensive programs in schools given the competing demands for classroom time. Moreover, the long intervention duration also translates into higher dissemination costs, because both training and delivery costs will be greater. Given the range of intervention durations, it is important to consider the effect per hour of the intervention when comparing the different programs. As indicated in Tables 5 and 6, the average r per hour of interventions for those that produced significant weight gain prevention effects ranged from .001 (Robinson et al., 2003) to .063 (Stice, Shaw et al., in press), suggesting that certain interventions produce more apparent return per hour of intervention, which should facilitate dissemination and lower dissemination cost.
It is also noteworthy that only 2 of the 13 prevention programs that produced significant weight gain prevention effects were primarily conceptualized as obesity prevention programs (Fitzgibbon et al., 2004; Robinson, 1999). The other interventions were described as general health education interventions (Lionis et al., 1991; Manios et al., 2002), cardiovascular disease prevention programs (Killen et al., 1988; Tamir et al., 1990), physical activity interventions (Dwyer et al., 1983; Eliakim et al., 2000; Gutin & Owens, 1999), and eating disorder prevention programs (Stice, Orjada et al., in press; Stice & Ragan, 2002; Stice, Shaw et al., in press). This suggests that there may be many avenues to preventing obesity beyond programs that are directly billed as weight gain prevention programs and that it would be fruitful to follow up on these alternative interventions. Another benefit of these alternative interventions is that they produce effects for additional public health problems beyond obesity (e.g., smoking and eating disorders).
A third noteworthy feature of the 13 effective programs is that only 3 (5% of the total programs evaluated) of these interventions produced weight gain prevention effects that persisted over a significant follow-up period (Fitzgibbon et al., 2004; Stice, Shaw et al., in press; Stice, Orjada et al., in press). The remainder of the programs produced weight gain prevention effects from pretest to posttest. Because virtually all of the weight loss effects observed in obesity treatment studies disappear by 3-year follow-up (Jeffery et al., 2000), it is possible that the weight gain prevention effects did not persist. It will be important to include longer-term follow-ups in future obesity prevention trials.
A fourth feature of the programs that produced weight gain prevention effects is that the positive effects for weight gain have only been replicated in multiple trials for one intervention (Stice & Ragan, 2002; Stice, Orjada et al., in press). Given that independent replication is a necessary step in establishing that a program is efficacious (i.e., that it produces statistically reliable effects in highly controlled trials; American Psychological Association [APA], 1995), this represents another important gap in the literature.
The second aim of the present review was to examine participant, intervention, delivery, and design features that are associated with larger intervention effects. Results indicated that intervention effects were stronger for children and adolescents relative to preadolescents, with the strongest effects emerging for adolescents. The evidence that obesity prevention programs were most effective for adolescents generally conformed to the initial hypothesis (Baranowski et al., 2002). Theoretically, older participants are better able to grasp intervention material and wield control over their food and physical activity choices than younger participants. In addition, adolescence is a developmental period during which individuals often must develop self-regulatory skills because they are becoming more autonomous, and it may be particularly useful to deliver obesity prevention programs at this time. There was some evidence that obesity prevention programs are more effective for children versus preadolescents, which seems inconsistent with the suggestions that obesity prevention programs are less effective for children (Baranowski et al., 2002). An examination of the prevention programs aimed at children suggests that the interventions that produced the largest effect in this age range included a parental involvement component (e.g., Harvey-Berino et al., 2003), which may be a particularly effective way to alter the food environment of children in this age range. However, it is difficult to interpret age effects in a meta-analytic review of intervention programs because the same interventions were not tested across a range of ages (i.e., the age of participants and the content of the intervention both vary). Thus, it may be something about the types of programs that are delivered to adolescents, rather than the adolescent developmental period, that explains why larger effects tended to emerge with adolescents.
Nonetheless, the fact that the effect of participant age remained significant in the multivariate model suggests that this effect did not emerge because of some confound with the other moderators that showed significant effects (e.g., participant gender or intervention duration).
As expected on the basis of a prior meta-analytic review of eating disorder prevention programs (Stice & Shaw, 2004), the univariate model found that obesity prevention programs were more effective when delivered solely to females versus male or mixed gender samples. We theorize that females may be more receptive in general to interventions promoting weight control because of the significantly stronger societal pressures for them to conform to a thin-beauty ideal espoused by Western cultures. It was also noteworthy that, for the trials for which separate effect sizes for the two genders were available, the average effect size for females (r= .06) was larger than the average effect size for males (r = .02) and that only the former differed significantly from zero. This finding suggests that extant programs may be more effective for females and that there is a need to develop alternative interventions for males. However, participant gender did not have a significant unique effect in the multivariate model because participant gender was correlated (see Table 7) with the number of behavioral intervention targets (r = −.47) and whether the trial relied on self-selected recruitment (r = .66). This pattern of findings either implies that the effect of participant gender is actually driven by another moderator that was confounded with gender or that certain participant, intervention, and design features simply tend to co-occur naturally, which has the effect of attenuating the unique effect of each moderator in multivariate models. Although this interpretational ambiguity is not unique to meta-analytic reviews, as it arises with any correlational data (e.g., for prospective risk factor studies), it does signal that caution should be used when interpreting the moderator effects.
Unexpectedly, interventions with a relatively shorter duration (in weeks) produced significantly larger effects than those that were longer in duration. This might suggest that interventions that are long in duration are unappealing to participants, which causes them to drop out of the intervention or to disengage from the program. Intervention duration in weeks did not show a significant unique effect in the multivariate model, which occurred because intervention duration in weeks was correlated (see Table 7) with the number of health behavior targets (r = .51). Because it is logical to expect that interventions targeting multiple health behaviors would be longer in duration, this appears to be natural colinearity that simply functions to attenuate the unique effects of each of these factors.
As hypothesized, effects were significantly larger for interventions that solely focused on obesity prevention than for interventions that focused on additional health behaviors. This finding is consistent with our suggestion that message complexity may curtail the effectiveness of health promotion interventions. It may be necessary to keep health promotion interventions relatively short and simple for maximal intervention effects. The effect for number of behavioral targets became non-significant in the multivariate model because (see Table 7) interventions targeting multiple health behaviors tended to be longer in duration (r = .51 with duration in weeks) and tended not to use self-selected recruitment (r = −.51) or to be pilot trials (r = −.49). Although it is possible that the effect for number of behavioral targets was actually driven by one of these moderators, as noted above, it is also possible that this colinearity is natural and simply attenuates the unique effects for the moderators.
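The attenuation of unique effects by correlated moderators can be illustrated with a simplified two-predictor case. The sketch below uses the textbook formula for standardized partial regression weights; it is not the weighted mixed-effects model used in this review. The correlation of .51 between duration and number of targets is taken from Table 7 as reported above, whereas the bivariate relations of −.30 with effect size and the function name `unique_effects` are hypothetical values chosen only for illustration.

```python
def unique_effects(r_y1, r_y2, r_12):
    """Standardized partial regression weights for two correlated predictors.

    r_y1, r_y2: bivariate correlations of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_12 * r_y2) / denom
    beta_2 = (r_y2 - r_12 * r_y1) / denom
    return beta_1, beta_2


# Hypothetical bivariate relations of -.30 for both moderators (assumption);
# r_12 = .51 is the duration-by-targets correlation reported in the text.
beta_duration, beta_targets = unique_effects(-0.30, -0.30, 0.51)
print(round(beta_duration, 3), round(beta_targets, 3))  # -0.199 -0.199
```

Under these assumed values, each moderator's unique weight shrinks by roughly one third relative to its bivariate relation, even though neither moderator is spurious.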
Also as expected, there was evidence that pilot trials tend to produce larger effects than large demonstration trials. Presumably, this finding emerged because interventionists and researchers are more passionate about new interventions, which contributes to larger effect sizes because of demand characteristics or because demonstration trials are more methodologically rigorous (e.g., are more likely to use blinded assessors), which makes them more immune to experimenter bias. The effect of this moderator also became non-significant in the multivariate model because pilot trials tended to focus on fewer behavioral targets (r = −.51) and more often used self-selected recruitment methods (r = .60), which attenuated the unique effects of the moderators.
Self-selected recruitment also showed a significant relation to the intervention effect sizes in both the univariate and multivariate models. Theoretically this effect emerged because self-presenting participants are more motivated to engage in the program and more likely to make the recommended lifestyle changes, which contributes to the larger effect sizes observed for trials that used this recruitment method.
It was also noteworthy that a number of factors that have been hypothesized to moderate obesity prevention program effects, such as mandated improvements in diet and exercise, sedentary behavior reduction, parental involvement, and delivery by trained professional interventionists (versus teachers), were not significantly related to larger effect sizes. This did not appear to be simply a function of limited statistical power because, based on the procedures described by Hedges and Pigott (2004) for mixed-effects regression models, we had a power of .54 to detect a small effect (r = .10), a power of .89 to detect a medium effect (r = .30), and a power of greater than .99 to detect a large effect (r = .50). These calculations were based on 2-tailed inferential tests and an assumed variance of .1, which was a conservative value that exceeded all observed error variances. The effect sizes in Table 8 confirm that we had sufficient power to detect medium effects. Moreover, the effect sizes for certain moderators, such as participant ethnicity, participant risk status, parental involvement, physical activity increase, use of random assignment, and modeling nested data incorrectly, were so small that it is unlikely that limited power accounts for these null effects. Nonetheless, the fact that the effects for other moderators, such as psychoeducational content and reduced sedentary behavior, were somewhat larger suggests that future studies should continue to investigate these potential moderators.
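As a rough illustration of the logic behind such power calculations, the following sketch computes the power of a generic two-tailed z test for a single moderator coefficient. It is not the Hedges and Pigott (2004) procedure as applied in this review and is not expected to reproduce the power values reported above. The function name `two_tailed_z_power` and the standard error of 0.10 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def two_tailed_z_power(effect, se, alpha=0.05):
    """Power of a two-tailed z test when the true effect is `effect`.

    `effect` is the assumed true coefficient and `se` its standard error;
    both are illustrative inputs, not values taken from the review.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    shift = effect / se                           # expected value of the z statistic
    # Probability that z falls above +critical or below -critical
    upper = 1 - std_normal.cdf(critical - shift)
    lower = std_normal.cdf(-critical - shift)
    return upper + lower


# Hypothetical standard error of 0.10 for the moderator coefficient (assumption)
for true_effect in (0.10, 0.30, 0.50):
    print(true_effect, round(two_tailed_z_power(true_effect, se=0.10), 3))
```

When the true effect is zero, the function returns alpha, and power rises toward 1 as the effect grows relative to its standard error.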
Another possible explanation for the null effects for dietary and physical activity changes is titration, wherein participants in interventions that directly change dietary intake or activity level in schools may compensate for these changes by altering their dietary intake and activity level at other times. In support of this speculation, Donnelly et al. (1996) found that an intervention that mandated increased physical activity during school resulted in increased activity during school, but a significant decrease in activity outside of school. If participants compensate for such mandated health behavior changes by making alterations outside of the program to keep themselves at a particular energy intake or energy expenditure level, these mandated changes may not produce differential effects relative to interventions without mandated behavioral change. Another implication of this possibility is that it is vital to measure behavioral change that may offset any positive behavioral change that occurs during the intervention.
The final aim of this review is to explore directions for future research. First, the finding that most obesity prevention programs that have been evaluated did not produce significant weight gain prevention effects suggests that it will be vital to conduct follow-up trials of enhanced versions of the programs that produced significant weight gain prevention effects and to design new programs that build upon those that worked. This will also provide an opportunity to conduct independent replications of the most successful obesity prevention programs, which is a necessary component of establishing that these interventions are efficacious (APA, 1995).
Second, it will also be important to determine how to better design obesity prevention programs for populations that generally did not derive weight gain prevention effects from extant programs, such as preadolescents and males. Unless efficacious prevention programs are developed for a broad array of participants, it will be difficult for obesity prevention efforts to achieve a meaningful reduction in obesity at the population level.
Third, it will be important for future trials to address methodological limitations of prior trials. Future trials should use random assignment to condition, blinded assessment procedures, direct measures of body fat, and procedures to minimize attrition. Additionally, it would be desirable if they employed active control groups, rather than the assessment-only or waitlist control conditions that are commonly used because these latter control conditions do not rule out the possibility that demand characteristics, expectancy effects, or attention contribute to any apparent intervention effects. Showing that a prevention program outperforms an active placebo control condition or alternative active intervention is also necessary for establishing that a program is efficacious (APA, 1995).
Future trials should also include multi-year follow-ups to ensure that any intervention effects persist beyond the termination of the intervention, as most programs that produced weight gain prevention effects used only pre-post designs. Given that the vast majority of individuals who show successful weight loss in obesity treatment programs regain the lost weight a few years after treatment termination (Jeffery et al., 2000), it is possible that obesity prevention effects likewise erode over time. The fact that three interventions produced weight gain prevention effects that persisted over follow-up suggests it is possible to arm individuals with the skills necessary to avoid unhealthy weight gain in the future, though it may be necessary to offer obesity prevention programs at multiple developmental periods to maximize weight gain prevention effects. It will also be important to test whether there are actually lower rates of onset of clinically significant weight gain (e.g., obesity onset), which is an outcome that is both clinically meaningful and more consistent with the concept of prevention than reductions in average weight gain. Very few past trials have examined this outcome.
It will also be vital to evaluate the mediators that putatively account for any weight gain prevention effects. If the intervention produces change in putative mediators, but no weight gain prevention effects, or produces weight gain prevention effects, but the mediators do not change, this signals that the intervention model may be incorrect or that certain measures are unreliable or invalid. The fact that many obesity prevention programs reported significant effects for reductions in self-reported caloric intake and increases in self-reported exercise, but no significant effects for change in body mass, raises questions about the veracity of these self-report outcomes, as a true reduction in caloric intake or increase in energy expenditure should be accompanied by concomitant changes in BMI. However, it will be difficult to address these questions because extant measures of intake and activity level have limited validity, may not be sufficiently sensitive to detect the small changes in eating and activity promoted in most interventions, and may be too expensive for routine use in large-scale randomized trials (e.g., the doubly labeled water method of assessing energy intake).
Finally, the consistency regarding the limited returns of health behavior change prevention programs suggests that there is a need to develop and evaluate general theories regarding resistance to health behavior change. Such theories have the potential of increasing the return of obesity prevention programs, as well as prevention programs aimed at other health behaviors. The behavioral economics model of obesity (Epstein & Saelens, 1999) seems particularly well suited to understand resistance to change as it expressly recognizes that behavior is a result of the balance between benefits and costs of the behavior. An improved understanding of the benefits of overeating and a sedentary lifestyle may imply ways to overcome barriers to health behavior change. The behavioral economics model also posits that there are individual difference factors that may cause some people to obtain more reinforcement from eating and less reinforcement from exercise, thereby increasing their risk for weight gain. This perspective suggests that it would be advantageous to measure such individual differences and adapt the interventions to better address the specific needs of various subpopulations. In addition, the assertion that health behavior arises from the balance between the benefits versus costs of the behavior identifies a key challenge of preventing obesity – the costs of this behavior will be experienced in the future, whereas the benefits of overeating and sedentary behavior occur in the present. The fact that most youth are more oriented to the current benefits of a lifestyle involving a positive energy balance rather than potential future costs may make obesity prevention a particularly challenging target for prevention programs delivered to children and adolescents.
Understanding more fully the barriers to making health behavior changes may also help improve obesity prevention programs. Research suggests that internal barriers to change, such as a lack of willpower and the perception that one is too busy to make healthy changes, predict failed attempts to change diet and exercise behaviors (Ziebland, Thorogood, Yudkin, Jones, & Coulter, 1998). It has also been suggested that difficulty in impulse control and a denial of the consequences of unhealthy behaviors undermine health change efforts (Sjoberg, 2003).
Another explanation for the relatively modest impact of extant obesity prevention programs is that environmental factors, such as the availability of high fat foods and a scarcity of pleasant places to exercise in many communities, play a key role in obesity promotion (Wadden et al., 2002). If this model is correct, it will be important to attempt to directly manipulate these environmental factors in future obesity prevention trials.
In sum, this meta-analytic review suggests that most interventions do not produce the hypothesized weight gain prevention effects and that the overall average intervention effect was small. Findings also indicated that for most programs that produced significant weight gain prevention effects, the effect sizes are clinically meaningful, but usually confined to pre to post effects. Additionally, results indicated that several prevention programs targeting a variety of health behaviors, such as eating pathology and smoking, produced weight gain prevention effects. These findings are encouraging because they suggest that there may be many efficacious approaches to reducing risk for weight gain and because some of these interventions produce intervention effects for multiple health behaviors. Results did not provide support for several factors that have been hypothesized to differentiate effective from ineffective prevention programs, but did suggest that larger weight gain prevention effects were observed for programs targeting children and adolescents (versus preadolescents), females, and self-presenting samples, programs that were relatively brief, programs solely targeting weight control versus other health behaviors (e.g., hypertension), and programs evaluated in pilot trials. Future trials should follow up promising findings and address methodological limitations of this literature (including the scarcity of long-term follow-up). Although significant progress has been made with regard to preventing this burgeoning public health problem, considerable work lies ahead.
We are very grateful to Amy Greenwold, Krista Heim, and David Huh for their assistance with the literature search and manuscript preparation.
1. We did not focus on effect sizes, such as Cohen’s (1988) d, which focus on posttest mean differences across conditions without correcting for pretest mean differences. Such effect size estimates are not able to rule out the possibility that differences at baseline between the conditions, even if non-significant, artificially amplified or attenuated effect size estimates. This theoretically has the effect of introducing greater error variance in effect size estimates and therefore decreases power in analyses testing heterogeneity of treatment effects and moderators of treatment effects.
2. It might be noted that only 55% of the trials that did not use random assignment to condition used matching to create the groups, suggesting that the variable reflecting random assignment was not simply a surrogate for matching, which would have complicated the interpretation of the former moderator.
3. Even though there were only two unpublished trials included in the present meta-analysis, we confirmed that there was no evidence that the unpublished studies had significantly different effect sizes relative to published studies (z = .03, p = .82, β = .03).
Preparation of this manuscript was supported by research grants (MH/DK61957 and MH70699) from the National Institutes of Health.