This paper reports the effects of a comprehensive elementary school-based social-emotional and character education program on school-level achievement, absenteeism, and disciplinary outcomes utilizing a matched-pair, cluster randomized, controlled design. The Positive Action Hawai‘i trial included 20 racially/ethnically diverse schools (mean enrollment = 544) and was conducted from the 2002-03 through the 2005-06 academic years. Using school-level archival data, analyses comparing change from baseline (2002) to one-year post trial (2007) revealed that intervention schools scored 9.8% better on the TerraNova (2nd ed.) test for reading and 8.8% on math; 20.7% better in Hawai‘i Content and Performance Standards scores for reading and 51.4% better in math; and that intervention schools reported 15.2% lower absenteeism and fewer suspensions (72.6%) and retentions (72.7%). Overall, effect sizes were moderate to large (range 0.5-1.1) for all of the examined outcomes. Sensitivity analyses using permutation models and random-intercept growth curve models substantiated results. The results provide evidence that a comprehensive school-based program, specifically developed to target student behavior and character, can positively influence school-level achievement, attendance, and disciplinary outcomes concurrently.
Education has an urgent need to learn more about the role of behavior, social skills, and character in improving academic achievement (Eccles, 2004; Meece, Anderman, & Anderman, 2006). Since the No Child Left Behind Act passed, education has been focused on teaching to core content standards to improve academic achievement scores, particularly in reading and mathematics, for which schools are being held accountable (Hamilton et al., 2007). Teaching to, and support for, the behavioral, social, and character domains have been relegated to no or limited dedicated instructional time (Greenberg et al., 2003). Nevertheless, schools are expected to prevent violence, substance use, and other disruptive behaviors that are clearly linked to academic achievement (Fleming et al., 2005; Malecki & Elliott, 2002; Wentzel, 1993). The prevalence of discipline problems, for example, correlates positively with the prevalence of violent crimes within a school (Heaviside, Rowland, Williams, & Farris, 1999) which, in turn, affects attendance and academic achievement (Eaton, Brener, & Kann, 2008; Walberg, Yeh, & Mooney-Paton, 1974). Further, mental health concerns become more prevalent as students move into adolescence and can contribute to behavioral problems that detract from academic achievement (Costello, Mustillo, Erkanli, Keeler, & Angold, 2003). Disciplinary problems (Dinks, Cataldi, & Lin-Kelly, 2007; Eaton, Kann et al., 2008; Eisenbraun, 2007) and underachievement abound (Coalition for Evidence-Based Policy, 2002; Perie, Moran, & Lurkus, 2005; Snyder, Dillow, & Hoffman, 2008).
To address these needs, numerous school-based programs have been developed to target problems of academic achievement (Slavin & Fashola, 1998; What Works Clearinghouse, n. d.). In addition, many other types of programs have offered the promise of improving academic performance indirectly through a focus on specific problem behaviors, such as substance use and violence (Battistich, Schaps, Watson, Solomon, & Lewis, 2000; Biglan et al., 2004; DuPaul & Stoner, 2004; Elias, Gara, Schuyler, Branden-Muller, & Sayette, 1991; Flay, 1985, 2009a, 2009b; Horowitz & Garber, 2006; Peters & McMahon, 1996; Sussman, Dent, Burton, Stacy, & Flay, 1995; Tolan & Guerra, 1994). Although some of these programs are promising, most are problem-specific and tend to address only the micro-level or proximal predictors (e.g., attitudes toward a behavior) of a single problem (e.g., violent behavior) (Catalano, Hawkins, Berglund, Pollard, & Arthur, 2002), not the multifaceted ultimate (e.g., safety of neighborhood) and distal (e.g., bonding to parents) factors that influence many other important outcomes (Flay, 2002; Flay, Snyder, & Petraitis, in press; Petraitis, Flay, & Miller, 1995; Romer, 2003). Consequently, programs have had limited success (Catalano et al., 2002; Flay, 2002).
As practitioners, policymakers, and researchers have implemented programs and sought to raise academic achievement and address negative behaviors among youth, an increasing amount of evidence indicates a relationship among multiple behaviors (Botvin, Griffin, & Nichols, 2006; Botvin, Schinke, & Orlandi, 1995; Catalano, Berglund, Ryan, Lonczak, & Hawkins, 2004; Flay, 2002). Several mechanisms involving multiple behaviors have been identified in improving student behavior and performance (Greenberg et al., 2003; Zins, Weissberg, Wang, & Walberg, 2004). This suggests that key behaviors do not exist in isolation from each other. Moreover, prevention research offers ample empirical support showing that many youth outcomes, negative and positive, are influenced by similar risk and protective factors (Catalano et al., 2004; Catalano et al., 2002; Flay, 2002). That is, most, if not all, behaviors are linked (Flay, 2002). For example, the early initiation of alcohol and cigarette use and/or abuse is associated with lower academic test scores (Fleming et al., 2005). Further, early initiation of substance use and sexual activity can place youth at a greater risk of mental health disorders and aggressive behaviors (Gustavson et al., 2007; Hallfors, Waller, Bauer, Ford, & Halpern, 2005) and continuation of substance use through adolescence and into adulthood (Merline, O’Malley, Schulenberg, Bachman, & Johnston, 2004).
Subsequently, there has been a movement toward more integrative and comprehensive programs that address multiple co-occurring behaviors and that involve families and communities. Such programs generally appear to be more effective (Battistich et al., 2000; Catalano et al., 2004; Derzon, Wilson, & Cunningham, 1999; Elias et al., 1991; Flay, 2000; Flay, Graumlich, Segawa, Burns, & Holliday, 2004; Hawkins, Catalano, Kosterman, Abbott, & Hill, 1999; Hawkins, Catalano, & Miller, 1992; Kellam & Anthony, 1998; Lerner, 1995). One of these programs currently being used nationally is the Positive Action (PA) program. PA is a comprehensive school-wide social-emotional and character development (SACD) program (Flay & Allred, 2003; Flay, Allred, & Ordway, 2001) developed to specifically target the positive development of student behavior and character.
Based on quasi-experimental studies, PA has been recognized in the character-education report by the U.S. Department of Education’s What Works Clearinghouse as the only “character education” program in the nation to meet the evidentiary requirements for improving both academics and behavior (What Works Clearinghouse, June, 2007). Preliminary findings indicate that PA can positively influence school attendance, behavior and achievement. Two previous quasi-experimental studies utilizing archival school-level data (Flay & Allred, 2003; Flay et al., 2001) reported beneficial effects on student achievement (e.g., math, reading, and science) and serious problem behaviors (e.g., suspensions and violence rates).
The first study (Flay et al., 2001) used School Report Card (SRC) data from two school districts that had used PA within a number of elementary schools for several years in the 1990’s. Schools were rank ordered on poverty and mobility and each PA school was matched with the best matched non-PA school(s) having similar ethnic distribution. Results indicated that PA schools scored significantly better than the non-PA schools in their percentile ranking of 4th grade achievement scores and reported significantly fewer incidences of violence and lower rates of absenteeism. The second study (Flay & Allred, 2003) used a similar methodological approach but expanded the variables on which PA and non-PA schools were matched to include dependent variables (e.g., reading and math achievement) assessed before the introduction of PA. Results confirmed previous findings and also demonstrated that involvement in PA during elementary school improved academic and disciplinary outcomes at both the elementary and secondary levels.
In sum, the prior quasi-experimental studies provide preliminary evidence regarding the effects of PA on academic achievement and disciplinary outcomes. However, these findings are in need of confirmation utilizing a randomized design (Flay, 1986; Flay et al., 2005), a standard considered vital before an intervention is ready for broad dissemination (Flay et al., 2005). Designs that use matching without random assignment leave open the possibility that variables other than those measured were responsible for observed posttest differences, rather than the intervention itself. Additionally, the previous quasi-experimental studies lacked data on program implementation, a measurement that is desirable to ensure that implementation occurred and, if so, how well it occurred (Domitrovich & Greenberg, 2000; Durlak & DuPre, 2008; Flay et al., 2005).
Utilizing student self-report data from the current randomized trial, Beets and colleagues (2009) examined the preventive benefits of PA on rates of student self-report and teacher reports of student substance use, violence, and voluntary sexual activity. Results indicated lower rates of substance use, violence and sexual activity among students attending PA schools. Overall, this randomized trial 1) replicated findings from quasi-experimental studies regarding violence and substance use and 2) found that PA can also alter other behaviors, such as sexual activity, that the program does not address directly. Hence, even though PA did not teach sexual responsibility, for example, the SACD content produced effects on sexual activity. Previous results suggest a mechanism that leads PA to positively affect multiple outcomes, such as sexual responsibility and academic achievement, even though the program does not include explicit discussion of these outcomes.
The purpose of the present study was to apply a matched-pair, cluster randomized, controlled design to evaluate the effects of PA on school-level indicators of academic achievement, absenteeism, and disciplinary outcomes. School-level data are useful for estimating causal effects but are underutilized (Stuart, 2007). The present study builds on extant research and is the first to report the effects of PA on school-level outcomes from a randomized, controlled design; thus, it provides the most rigorous test yet conducted for whether PA can improve school-level performance, and greatly reduces the possibility that factors other than the PA intervention are responsible for observed posttest group differences. PA was hypothesized to result in decreased absenteeism, disciplinary referrals and grade retentions and improved academic achievement.
The PA Hawai‘i trial was a matched-pair, cluster randomized, controlled trial, conducted during the 2002-03 through 2005-06 school years, with a one-year follow-up in 2007, in Hawai‘i elementary schools. The state is one large school district with diverse ethnic groups and a recognized need for improvement (i.e., low standardized test scores and a high percentage of students receiving free or reduced-price lunch). The trial took place in 20 public elementary (K-5th or K-6th) schools (10 matched-pairs) on three Hawai‘ian islands. Eligible schools for the study were those elementary schools that 1) were located on O‘ahu, Maui or Moloka‘i, 2) were K-5 or K-6 community schools (i.e., not academy, charter, or special education schools), 3) had at least 25% of students receiving free or reduced-price lunch, 4) were in the state’s lower three quartiles of standardized test scores, and 5) had annual student mobility rates under 20%, thereby ensuring that at least 40% of a selected cohort was still in the same school by the end of the trial. To ensure comparability of the intervention and control schools with respect to baseline measures, 2000 SRC data on 111 eligible schools were used to group schools into strata ranked on an index based on 1) demographic variables of percent free or reduced-price lunch, school size, percent stability, and ethnic distribution; 2) characteristics of the student populations such as percent special education, and limited English proficiency; and 3) indicators of student behavior and performance outcomes such as standardized test scores, absenteeism, and suspensions (Dent, Sussman, & Flay, 1993; Flay et al., 2004; Graham, Flay, Johnson, Hansen, & Collins, 1984). Schools were matched based on their index score, resulting in 19 utilizable strata. Matched pairs were randomly selected from within strata, with one school of each pair randomly assigned to either the intervention or control condition before recruitment.
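The link between the 20% mobility ceiling and the 40% retention floor can be checked with a short calculation. The sketch below is illustrative only and rests on an assumption not stated in the text: that the same fraction of a cohort leaves each year over the four school years of the trial and that no students return. The function name `cohort_retention` is hypothetical.

```python
# Illustrative sketch: compound retention of a cohort under a constant
# annual mobility rate (an assumption; the paper does not give the model).

def cohort_retention(annual_mobility: float, years: int) -> float:
    """Fraction of a cohort still in the same school after `years` years,
    assuming the same fraction leaves every year and nobody returns."""
    if not 0.0 <= annual_mobility <= 1.0:
        raise ValueError("annual_mobility must be between 0 and 1")
    return (1.0 - annual_mobility) ** years

# Four school years (2002-03 through 2005-06) at the 20% eligibility ceiling.
retained = cohort_retention(annual_mobility=0.20, years=4)
print(f"Retained after 4 years: {retained:.2%}")
```

Under this assumption, a school exactly at the 20% ceiling retains 0.8^4 of a cohort, which is just above the 40% floor quoted in the eligibility criteria.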
Starting with schools only on O‘ahu (to limit travel costs), intervention schools were asked to implement PA whereas the control schools were asked to continue “business as usual” without making any substantial SACD reforms. Once it was evident that no additional schools could be recruited on O‘ahu, recruitment began using strata from Maui and Moloka‘i. The final sample of schools was representative of Hawai‘ian schools, though with higher stability (as intended) and at higher risk (as intended) as indicated by percent free or reduced-price lunch and standardized test scores, respectively.
Intervention schools were offered the complete PA program free of charge and control schools were offered a monetary incentive during the randomized trial and the PA program upon completion of the trial. Three of the 10 control schools chose to receive the PA program after the formal trial; they were treated as controls at the follow-up to the present study, as anecdotal evidence suggests that they did not fully implement the program, and it is likely that schools need several years to fully implement a comprehensive program to see substantial benefits (Beets et al., 2009; Li et al., 2009).
The Positive Action program (www.positiveaction.net) is a comprehensive, school-wide SACD program designed to improve academics, student behaviors and character. The program, developed in 1977 by Carol Gerber Allred, Ph.D. and revised since then as a result of process and outcome evaluations, is grounded in a broad theory of self-concept (Purkey, 1970; Purkey & Novak, 1970), is consistent with integrative, ecological, theories of health behavior such as the Theory of Triadic Influence (Flay & Petraitis, 1994; Flay et al., in press), and is described in detail elsewhere (Flay & Allred, 2003; Flay et al., 2001). The full PA program consists of K-12 classroom curricula, of which only the elementary curriculum was used in the present randomized trial; a school-wide climate development component, including teacher/staff training by the developer, a PA coordinator’s (principal’s) manual, school counselor’s program, and PA coordinator/committee guide; and family- and community-involvement programs.
The sequenced elementary curriculum consists of 140 lessons per grade, per academic year, each taking 15-20 minutes and delivered by classroom teachers. When fully implemented, the total time students are exposed to the program during a 35-week academic year is approximately 35 hours. Lessons cover six major units on topics related to self-concept (i.e., the relationship of thoughts, feelings, and actions), physical and intellectual actions (e.g., hygiene, nutrition, physical activity, avoiding harmful substances, decision-making skills, creative thinking), social/emotional actions for managing oneself responsibly (e.g., self-control, time management), getting along with others (e.g., empathy, altruism, respect, conflict resolution), being honest with yourself and others (e.g., self-honesty, integrity, self-appraisal) and continuous self-improvement (e.g., goal setting, problem solving, courage to try new things, persistence). The classroom curricula utilize an interactive approach, whereby interaction between teacher and student is encouraged through the use of structured discussions and activities, and interaction between students is encouraged through structured or semi-structured small group activities, including games, role plays and practice of skills. For example, students are asked how they like to be treated. Regardless of age, socioeconomic status, gender or culture, students and adults suggest the same top values of respect, fairness, kindness, honesty, understanding/empathy and love, consistent with others’ findings (Nucci, 2001). These values are then adopted as the code of conduct for the classroom and school (Flay & Allred, in press).
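The exposure figure above follows from the lesson count and lesson length. The sketch below reproduces that arithmetic using only the numbers given in the text (140 lessons, 15-20 minutes each, 35 weeks); the function and constant names are hypothetical.

```python
# Illustrative arithmetic for annual program exposure, using only the
# lesson count, lesson length, and year length stated in the text.

LESSONS_PER_YEAR = 140
WEEKS_PER_YEAR = 35

def exposure_hours(minutes_per_lesson: float) -> float:
    """Total annual exposure in hours if every lesson is delivered."""
    return LESSONS_PER_YEAR * minutes_per_lesson / 60.0

low = exposure_hours(15)    # every lesson at the 15-minute lower bound
high = exposure_hours(20)   # every lesson at the 20-minute upper bound
lessons_per_week = LESSONS_PER_YEAR / WEEKS_PER_YEAR

print(f"{low:.1f} to {high:.1f} hours per year, "
      f"{lessons_per_week:.0f} lessons per week")
```

The "approximately 35 hours" quoted in the text corresponds to the 15-minute lower bound; longer lessons would raise the total accordingly.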
The school-climate kit consists of materials to encourage and reinforce the six units of PA, coordinating school-wide implementation. Included in the kit, the PA coordinator’s (principal’s) manual directs the use of materials such as posters, music, tokens, and certificates. It also includes information on planning and conducting assemblies, creating a PA newsletter, and establishing a PA committee to create a school-wide PA culture. Additionally, a counselor’s program, implemented by school counselors, specializes in developing positive actions with students at higher risk and their classrooms, families, and the school as a whole. The family-involvement program is available in various levels of involvement and promotes the core elements of the classroom curriculum and reinforces school-wide positive actions. The parent manual is designed for parents to use at home and includes materials that parallel the classroom curriculum. The present study did not include the more intensive family kit. The community-development component of PA was not used in this trial.
Prior to the beginning of each academic year, teachers, administrators, and support staff (e.g., counselors) attended PA training sessions conducted by the program developer. The training sessions lasted approximately 3-4 hours for the initial year, and 1-2 hours for each successive year. Booster sessions, conducted by the Hawai‘i-based project coordinator and lasting approximately 30-50 minutes, were provided an average of once per academic year for each school. Additionally, mini-conferences were held in February of each year to bring together 5-6 leaders and staff (e.g., principals, counselors, teachers) from each of the 10 participating schools in order to share ideas and experiences as well as to get answers to any concerns regarding implementing the program.
Archival school-level data were obtained from the Hawai‘i Department of Education (HDE) as part of the state’s SRC data accountability system (Hawai’i Department of Education, n. d.-b), with different indicators available at different time points as shown in Table 1. The SRC data were included in schools’ School Status and Improvement Report, designed to provide information on schools’ performance and progress. Absenteeism, suspensions, retention in grade, and four academic achievement indicators served as the dependent variables for the present study; these were chosen because they were the publicly available indicators of school performance; corresponding classroom- and student-level data were not available due to privacy considerations. School-level performance is an appropriate measure of program effectiveness because the PA Hawai‘i trial tested a school-wide implementation of the program and whole schools were randomized to condition (Stuart, 2007).
The four school-level academic achievement variables included the grade 5 math and reading standardized test (percent scoring average or above; the HDE switched from the Stanford Achievement Test [SAT] to the TerraNova [2nd ed.] test at one-year follow-up during the current study), and the grade 4 math and reading Hawai‘i Content and Performance Standards (HCPS II) (percent proficient). The math and reading SAT and TerraNova (2nd ed.) are national norm-referenced tests that are utilized by school districts in the U.S. to assess achievement of students from kindergarten through high school. The math and reading HCPS II were developed by the HDE through a collaborative process involving teachers and HDE curriculum specialists and represent the HDE performance standards to meet No Child Left Behind mandates (Hawai’i Department of Education, n. d.-a). The archival school-level academic achievement data were available continuously, from 2002 to one-year post trial, as intervention schools continued to implement the PA program. Achievement scores were not reported for one of the 10 pairs of schools because they had too few students at each grade level, so these schools were not included in the primary analysis. There were no missing data for the other dependent variables.
The other three school-level indicators used in this study included: 1) absenteeism (average number of days absent per year), 2) suspensions (percent suspended), and 3) retentions (percent retained in grade, i.e., kept back a grade). Student suspensions may have occurred due to, for example, disorderly conduct, burglary, truancy, and contraband (e.g., possession of tobacco). Suspension data represent all grade levels at each school, and the retention variable included students who were retained in all grades except kindergarten. The archival school-level absenteeism data were available annually from 1997 to 2007; the suspension data from 1999 to 2007; and the retention data from 2002 to 2007.
Thus, the archival data utilized in the present analysis were collected from schools with a different student body each academic year, and intervention schools, over time, had increasing exposure to PA. For example, archival school-level data collected for PA schools during the 2005-2006 academic year represented schools with students who were exposed to the intervention for up to four years compared to the 2002-2003 academic year.
As part of the PA Hawai‘i trial, sufficient data from year-end process evaluation surveys were collected from teachers at the end of the second (2004), third (2005), and final year (2006) of program implementation and are described in detail elsewhere (Beets et al., 2008). We used three school-level implementation indicators related to program exposure and adherence: 1) exposure, measured by seven items (i.e., six items referred to the six units in the PA curriculum and asked about how often the teachers taught the concept throughout the school day, and an additional item assessed the amount of PA workbooks and activity sheets used during a typical day), 2) classroom material usage, measured by three items (i.e., how often teachers used PA materials/activities) and 3) school-wide material usage, measured by three items (i.e., how often PA materials/activities were used throughout the school). All item responses ranged from 1 “never” to 5 “always.” Alpha reliabilities were adequate (Beets et al., 2008).
The three school-level implementation indicators and an overall school-level implementation indicator were calculated at the second (2004), third (2005), and final year (2006) of program implementation using several steps. First, based on teachers’ responses to the items that comprised each of the different implementation indicators, we calculated mean teacher-level indicator scores. Second, using the teacher-level indicator scores, a mean school-level implementation indicator was calculated for every school each year. Lastly, an overall school-level implementation indicator was calculated by computing the mean across all schools for each year of program implementation.
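The three aggregation steps described above (item responses to teacher scores, teacher scores to school scores, school scores to an overall score) can be sketched as follows. The response data and school labels are invented for illustration and do not come from the trial; only the 1 to 5 response scale and the order of averaging are taken from the text.

```python
# Illustrative sketch of the three-step aggregation for one indicator in
# one year. All data below are hypothetical, not values from the trial.
from statistics import mean

# school -> list of teachers -> that teacher's item responses (1 "never" to 5 "always")
responses = {
    "school_A": [[4, 5, 3], [3, 3, 4]],
    "school_B": [[2, 3, 3], [5, 4, 4], [3, 3, 3]],
}

def teacher_scores(teachers):
    """Step 1: mean of each teacher's item responses."""
    return [mean(items) for items in teachers]

def school_score(teachers):
    """Step 2: mean of the teacher-level scores within one school."""
    return mean(teacher_scores(teachers))

# Step 3: mean of the school-level scores across all schools.
school_level = {name: school_score(t) for name, t in responses.items()}
overall = mean(list(school_level.values()))

print(school_level, overall)
```

Averaging in stages gives each school equal weight in the overall indicator regardless of how many teachers responded, which differs from pooling all teacher responses into a single mean.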
During the spring of the final year of the four-year randomized trial, data were collected from one school leader (i.e., principal, vice principal, counselor) from each treatment and control school regarding the SACD programs and/or activities that were conducted in their school during the prior three academic years. Respondents were asked to list up to 16 SACD programs. For each program, respondents indicated the number of weeks the program was offered, the amount of time (minutes) devoted to the program per week, and whether or not teachers attended/received training to deliver the program (yes/no).
For our primary analysis, we used matched paired t-tests, Hedges’ adjusted g as a measure of effect size (Grissom & Kim, 2005; Hedges & Olkin, 1985), and percent relative improvement (RI). To assess the robustness of results, permutation tests and random-intercept growth curve models were used for sensitivity analyses. The random-effects growth curve models provide some statistical control beyond randomization for potentially confounding unmeasured variables in case randomization was not totally successful with 10 schools per condition. This battery of statistical approaches was used separately for each of the outcomes and was applied to end-of-study (2006) and one-year post trial (2007) outcomes.
First, matched paired t-tests of difference scores were used to examine change in school-level outcomes by condition. For each outcome, two difference scores [posttest (2006) – baseline (2002) and one-year post trial (2007) – baseline (2002)] were calculated for each pair of intervention and control schools and a paired t-test was performed. In a randomized design, the difference in means provides an unbiased estimate of the true average intervention effect (Stuart, 2007).
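A minimal sketch of the paired t-test on difference scores is given below. It computes only the t statistic and degrees of freedom; the change scores are hypothetical, and the function name `paired_t` is an illustrative choice rather than anything from the original analysis.

```python
# Illustrative paired t-test on change scores for matched school pairs.
# All numbers are hypothetical; this is not the trial's data or code.
from math import sqrt
from statistics import mean, stdev

def paired_t(pa_change, control_change):
    """Return (t, df) for a paired t-test on within-pair differences.

    Each list holds one change score (later year minus baseline) per
    school, ordered so that index i in both lists is the same matched pair.
    """
    if len(pa_change) != len(control_change):
        raise ValueError("each PA school needs exactly one matched control")
    diffs = [p - c for p, c in zip(pa_change, control_change)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

pa_change = [6.0, 4.0, 7.0, 3.0, 5.0]        # hypothetical PA schools
control_change = [1.0, 2.0, 3.0, 1.0, 1.0]   # hypothetical matched controls
t_stat, df = paired_t(pa_change, control_change)
print(f"t({df}) = {t_stat:.2f}")
```

The resulting t statistic would then be compared against a t distribution with one fewer degrees of freedom than the number of pairs.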
Second, effect sizes for absenteeism, suspensions, retentions and each of the four achievement outcomes were calculated by subtracting the mean difference of control schools from the mean difference of PA schools and dividing by the pooled posttest standard deviation. Hedges’ g (as well as other measures of effect size such as Cohen’s d and Glass’ d) has some positive bias; therefore, Hedges’ approximately unbiased adjusted g was calculated. Moreover, the adjusted g is an appropriate effect size calculation when the sample size is small (Grissom & Kim, 2005). Effect sizes were examined at posttest and at one-year post trial and were interpreted as small (0.2), moderate (0.5) or large (0.8) (Cohen, 1977).
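The effect size calculation described above can be sketched as follows. The small-sample correction used here, multiplying g by 1 - 3/(4·df - 1) with df = n_PA + n_C - 2, is the commonly cited approximation from Hedges and Olkin; the text does not state the exact formula used, so treat that choice as an assumption. The school values are hypothetical.

```python
# Illustrative sketch of Hedges' adjusted g from change scores and posttest
# scores. The correction factor is an assumed standard approximation.
from math import sqrt
from statistics import mean, variance

def pooled_sd(x, y):
    """Pooled standard deviation of two independent groups."""
    nx, ny = len(x), len(y)
    return sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))

def hedges_adjusted_g(pa_change, c_change, pa_post, c_post):
    """Difference in mean change divided by the pooled posttest SD,
    multiplied by the small-sample correction 1 - 3 / (4 * df - 1)."""
    g = (mean(pa_change) - mean(c_change)) / pooled_sd(pa_post, c_post)
    df = len(pa_post) + len(c_post) - 2
    return g * (1.0 - 3.0 / (4.0 * df - 1.0))

# Hypothetical school-level values (five schools per condition).
pa_change, c_change = [6, 4, 7, 3, 5], [1, 2, 3, 1, 1]
pa_post, c_post = [50, 54, 58, 52, 56], [46, 50, 44, 48, 52]
adjusted_g = hedges_adjusted_g(pa_change, c_change, pa_post, c_post)
print(f"adjusted g = {adjusted_g:.2f}")
```

With only a handful of schools per condition the correction factor is noticeably below 1, which is why the adjustment matters for a trial of this size.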
Additionally, we calculated RI as an indicator of effect size that may be more understandable to practitioners. RI is the posttest difference between groups minus the baseline difference between groups, divided by the control group posttest level; that is, RI = [(PA mean – C mean) at posttest – (PA mean – C mean) at baseline] / (C mean at posttest), expressed as a percentage.
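The RI definition translates directly into a few lines of code. The sketch below uses hypothetical group means; the function name `relative_improvement` is an illustrative choice.

```python
# Illustrative sketch of percent relative improvement (RI) as defined in
# the text. The group means used in the example are hypothetical.

def relative_improvement(pa_base, c_base, pa_post, c_post):
    """Posttest group difference minus baseline group difference,
    divided by the control group's posttest mean, as a percentage."""
    if c_post == 0:
        raise ValueError("control posttest mean must be nonzero")
    return ((pa_post - c_post) - (pa_base - c_base)) / c_post * 100.0

# Hypothetical percent-proficient means for PA and control (C) schools.
ri = relative_improvement(pa_base=40.0, c_base=42.0, pa_post=55.0, c_post=50.0)
print(f"RI = {ri:.1f}%")
```

For outcomes where lower values are desirable, such as absenteeism, a negative value from this formula corresponds to an improvement for the PA schools.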
Subsequently, to avoid reliance on t-test assumptions alone and as a sensitivity analysis, permutation tests were conducted with Stata v10 permute, which estimates p-values based on Monte Carlo simulations (Stata Corp., College Station, TX). Both paired t-tests of differences and permutation models have demonstrated good performance in randomized trials when the number of pairs is small (Brookmeyer & Chen, 1998).
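The idea behind a permutation test for matched pairs can be sketched with a sign-flip procedure. This is not the Stata `permute` routine used in the study, which estimates p-values by Monte Carlo simulation; the version below instead enumerates every sign assignment exhaustively, which is feasible for a small number of pairs (10 pairs give 1,024 assignments). The difference scores are hypothetical.

```python
# Illustrative exact sign-flip permutation test for matched pairs.
# A swapped-in technique for exposition, not the study's Stata procedure.
from itertools import product
from statistics import mean

def paired_permutation_p(diffs, tol=1e-12):
    """Two-sided exact p-value for the mean within-pair difference.

    Under the null hypothesis the intervention label within each pair is
    arbitrary, so each difference is equally likely to carry either sign.
    """
    observed = abs(mean(diffs))
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - tol:
            as_extreme += 1
    return as_extreme / total

# Hypothetical PA-minus-control differences in change scores, one per pair.
diffs = [5.0, 2.0, 4.0, 2.0, 4.0]
p_value = paired_permutation_p(diffs)
print(f"exact two-sided p = {p_value:.4f}")
```

Because the reference distribution is built from the data themselves, the test does not rely on the normality assumption behind the paired t-test.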
Lastly, random-intercept growth curve models (see Appendix A) were conducted with Stata v10 xtmixed (Rabe-Hesketh & Skrondal, 2008) to account for all observations and to model school differences. That is, this allows a more complete analysis of the multiple waves of available data (5 waves of data at posttest; 6 waves of data at one-year post trial) and takes into account the pattern of change over time. The random-intercept model allows the intercept to vary between schools, which indicates that some schools tend to have, on average, better outcomes and other schools have worse outcomes. The random coefficient is fixed, which reflects that intervention effects are similar for all schools. To estimate effects with missing values present, full information maximum likelihood estimation was used which utilizes all available data to provide maximum likelihood estimation (Acock, 2005). For the present analyses, each growth curve involved approximately 100 observations (5 waves × 20 schools at posttest; 6 waves × 20 schools at one-year post trial). Although this sample size is at the lower end of some suggested guidelines for this estimator, it is adequate as a supplementary sensitivity analysis, as different views exist regarding appropriate sample size (Singer & Willett, 2003).
For each outcome, from baseline through both posttest and one-year post trial, we tested whether a quadratic term for time was significant using the likelihood-ratio (LR) test (Rabe-Hesketh & Skrondal, 2008). Through posttest, results indicated that a quadratic model provided a significantly better fit for the data on reading HCPS II (LR χ2 = 14.92, p < .001) and absenteeism (LR χ2 = 6.25, p < .05). Through one-year post trial, results showed that a quadratic model fit significantly better for math TerraNova (LR χ2 = 4.04, p < .05), reading TerraNova (LR χ2 = 4.56, p < .05), math HCPS II (LR χ2 = 17.04, p < .001), and absenteeism (LR χ2 = 19.39, p < .001).
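The reported LR statistics can be converted to p-values with a short calculation. The sketch assumes each test has 1 degree of freedom, on the grounds that the quadratic model adds a single term; the text does not state the degrees of freedom, so this is an assumption. For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).

```python
# Illustrative conversion of the reported LR chi-square statistics to
# p-values, assuming 1 degree of freedom per test (an assumption).
from math import erfc, sqrt

def lr_pvalue_1df(lr_stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    if lr_stat < 0:
        raise ValueError("an LR statistic cannot be negative")
    return erfc(sqrt(lr_stat / 2.0))

# LR statistics as reported in the text.
reported = {
    "reading HCPS II (posttest)": 14.92,
    "absenteeism (posttest)": 6.25,
    "math TerraNova (one-year post trial)": 4.04,
    "reading TerraNova (one-year post trial)": 4.56,
    "math HCPS II (one-year post trial)": 17.04,
    "absenteeism (one-year post trial)": 19.39,
}
p_values = {name: lr_pvalue_1df(stat) for name, stat in reported.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.5f}")
```

Under the 1-df assumption, each computed p-value falls below the threshold reported alongside its statistic in the text.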
For the remaining outcomes (school suspensions and retentions), from baseline through both posttest and one-year post trial, we conducted random-intercept Poisson models with Stata v10 xtpoisson (Rabe-Hesketh & Skrondal, 2008). As is common with elementary school-level data, frequency distributions for school suspensions and retentions were skewed at both posttest and one-year post trial. Hence, a random-intercept Poisson model was used to account for this skewed distribution. The mean and variance of the suspension and retention variables were similar through posttest (suspensions [M = 0.95; variance =1.09]; retentions [M = 0.99; variance = 0.92]) and one-year post trial (suspensions [M = 1.07; variance = 1.72]; and retentions [M = 0.94; variance = 0.88]), an assumption of the Poisson model (Snijders & Bosker, 1999); therefore, we did not adjust for overdispersion. Similarly, as discussed above, a LR test was used to compare random-intercept Poisson models with the inclusion of a quadratic term. Only the result for suspensions (LR χ2 = 4.85, p < .05) at one-year post trial demonstrated a quadratic model provided a better fit for the data.
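The comparison of means and variances above can be summarized as a variance-to-mean ratio, which equals 1 for an exactly Poisson-distributed variable. The sketch below computes that ratio from the figures reported in the text; the ratio is offered as one simple descriptive summary and is not described in the original analysis.

```python
# Illustrative variance-to-mean ratios computed from the means and
# variances reported in the text. A ratio of 1 matches the Poisson
# equidispersion assumption; the ratio itself is our descriptive choice.

reported = {
    ("suspensions", "posttest"): (0.95, 1.09),
    ("retentions", "posttest"): (0.99, 0.92),
    ("suspensions", "one-year post trial"): (1.07, 1.72),
    ("retentions", "one-year post trial"): (0.94, 0.88),
}

def dispersion_ratio(mean_value: float, variance_value: float) -> float:
    """Variance divided by mean."""
    if mean_value <= 0:
        raise ValueError("mean must be positive")
    return variance_value / mean_value

ratios = {key: dispersion_ratio(m, v) for key, (m, v) in reported.items()}
for (outcome, wave), ratio in ratios.items():
    print(f"{outcome}, {wave}: variance/mean = {ratio:.2f}")
```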
Additionally, to test whether the pattern of curvilinear change was different in PA and control schools, a year squared by condition interaction term was included in the quadratic models, and a LR test was performed. Results indicated that the inclusion of an interaction term did not significantly improve any of the quadratic models and, hence, was not included in the final models.
At the 2002 baseline no significant differences (p ≥ .05) existed between intervention and control schools on any of the SRC variables (Table 2; Table 4 displays outcome variables). Thus, the methods of developing strata and random selection and assignment were effective for these variables. Schools were racially/ethnically diverse with a mean enrollment of 544 (SD = 276.41).
There was some variability in school-level implementation between schools, with small improvements across years (Table 3). Regarding the three school-level indicators examined, school-wide material usage demonstrated the highest school-level implementation. Implementation was adequate for each indicator; however, results indicated that schools could have implemented PA with greater fidelity.
We found that control schools reported implementing an average of 10.2 SACD programs compared with 4.2 (in addition to PA) in the intervention schools. Teachers in control schools spent an average of 108 minutes per week on SACD-related activities. PA-school teachers spent the expected amount of time on PA (55.1 min/week), yet overall they still spent only 35 min/week more on SACD-related activities than teachers in control schools. Control schools reported that teachers were involved in SACD-related activities for an average of 24 weeks per school year. In contrast, teachers in intervention schools reported delivering PA almost every week of the school year as well as being involved in other SACD-related activities for 25 weeks/year. Both PA and control school teachers reported receiving training to implement approximately half of the SACD-related programs (52.3% and 53.3%, respectively) that they reported implementing other than PA (100% trained).
Raw means for school-level academic achievement, absenteeism, suspensions, and retentions are presented in Figures 1 and 2, respectively. Overall, for the academic achievement outcomes, raw means for PA and control schools were statistically similar at baseline and demonstrated a clearly discernible divergence over time. State averages for academic achievement are shown for comparison. Although the PA schools were well below state averages at baseline (as planned), they nearly met or exceeded the state averages for academic achievement at posttest and one-year post trial.
Likewise, for the other school-level outcomes, PA and control schools diverged between baseline and posttest. For absenteeism and suspensions, pre-baseline years of archival school-level data were available and provide an interrupted time series presentation. As expected, these outcomes were stable for several pre-program years with divergence occurring after the intervention.
The results of the matched paired t-tests of difference scores and effect size calculations at posttest and one-year post trial are presented in Table 4. At posttest, results indicated that PA schools had significantly higher math (p < 0.05) and reading (p < 0.05) HCPS II scores and significantly lower absenteeism (p < 0.001), with marginally fewer suspensions (p = 0.056). After completion of the randomized trial, at one-year post trial, as PA schools continued to implement the PA program, reading TerraNova (p < 0.05) and math (p < 0.01) and reading (p < 0.05) HCPS II scores were significantly higher among PA schools, and absenteeism (p < 0.001) and suspensions (p < 0.05) were significantly lower for PA schools. Overall, results indicated higher achievement and lower absenteeism and suspension outcomes for the PA schools. The permutation models provided statistically significant results similar to those of the matched paired t-tests at both posttests. That is, permutation tests at posttest indicated statistically significant results for math (marginal p = 0.054) and reading (p < 0.01) HCPS II and absenteeism (p < 0.01); and at one-year post trial reading (p < 0.05) TerraNova, math (p < 0.001) and reading (p < 0.05) HCPS II, absenteeism (p < 0.001), and suspensions (p < 0.05) were significantly different for PA schools as compared to control schools.
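To make the two tests above concrete, the following is a minimal sketch of a paired t statistic on difference scores and an exact sign-flip permutation test for matched pairs. The text does not describe the exact permutation scheme used, so flipping the sign of each within-pair difference is an assumption (a common choice for matched-pair designs), and the difference scores are hypothetical values, not the trial data.

```python
import itertools
import math
import statistics


def paired_t_statistic(diffs):
    """t statistic for H0: the mean within-pair difference is zero."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean / (sd / math.sqrt(n))


def sign_flip_permutation_p(diffs):
    """Exact two-sided p-value over all 2^n sign flips of the differences.

    Under H0, which school in a pair received the intervention is arbitrary,
    so each within-pair difference is equally likely to be + or -.
    """
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=n):
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical PA-minus-control differences in change scores, 10 matched pairs.
diffs = [4.0, 6.5, -1.0, 3.2, 5.1, 2.4, 7.3, 0.8, 3.9, 4.6]

t_stat = paired_t_statistic(diffs)
p_perm = sign_flip_permutation_p(diffs)
print(f"t = {t_stat:.2f} (df = {len(diffs) - 1}), permutation p = {p_perm:.4f}")
```

With 10 matched pairs there are only 2^10 = 1,024 possible sign patterns, so the permutation distribution can be enumerated exactly rather than sampled.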
In order to provide a basis for comparing the magnitude of the intervention effects we found with effects found in other trials, effect sizes were calculated. As shown in Table 4, all of the effect sizes were moderate to large, regardless of the level of significance. Corresponding effect size calculations demonstrated moderate to large treatment effects for the academic achievement, absenteeism, and disciplinary outcomes at posttest, with larger effects at one-year post trial. Similarly, RIs were larger at one-year post trial.
The estimates for the intervention effect on academic achievement scores (random-intercept models) from baseline through posttest and one-year post trial are presented in Table 5. At posttest, the intraclass correlation coefficients (ICCs; expressed as the proportion of the total outcome variation that is attributable to differences among schools) for the unconditional means models (Singer & Willett, 2003) were .72, .67, .87, and .72 for math SAT and HCPS II and reading SAT and HCPS II, respectively. At one-year post trial, the ICCs for the unconditional means models were .68, .46, .87, and .66 for math TerraNova and HCPS II and reading TerraNova and HCPS II, respectively, indicating that most of the variation in academic achievement lies between schools, rather than within schools over time. Overall, through both posttest and one-year post trial, the random-intercept models’ year by condition interactions substantiated results of the matched paired t-tests and permutation models, indicating higher achievement increases in PA schools. For change from baseline through one-year post trial, the time by condition interactions for math TerraNova (B = 1.34, p < .05) and HCPS II (B = 2.69, p < .001) and reading TerraNova (B = 1.35, p < .01) and HCPS II (B = 2.10, p < .05) were all statistically significant. These effects indicate about a 2 percentage point advantage per year for the PA group compared to the control group due to the intervention, or about a 12 percentage point advantage across the six-year period.
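As a rough illustration of the quantities in the paragraph above, the sketch below computes an ICC from two variance components and shows one possible reading of the "about 2 points per year" summary. The variance components are hypothetical values chosen to give an ICC of .72. Averaging the four interaction coefficients is an assumption about how the summary figure could be approximated, not a procedure stated in the text.

```python
def intraclass_correlation(between_var, within_var):
    """ICC = between-school variance / (between-school + within-school variance)."""
    total = between_var + within_var
    if between_var < 0 or within_var < 0 or total == 0:
        raise ValueError("variance components must be non-negative with a positive sum")
    return between_var / total


# Hypothetical variance components from an unconditional means model.
icc = intraclass_correlation(between_var=72.0, within_var=28.0)
print(f"ICC = {icc:.2f}")

# Year-by-condition coefficients reported in the text (baseline to one-year post trial).
interaction_b = {
    "math TerraNova": 1.34,
    "math HCPS II": 2.69,
    "reading TerraNova": 1.35,
    "reading HCPS II": 2.10,
}

# Assumed summary: the simple average of the four coefficients, then six years of it.
mean_b = sum(interaction_b.values()) / len(interaction_b)
print(f"mean advantage per year = {mean_b:.2f} points")
print(f"advantage over six years = {6 * mean_b:.2f} points")
```

Under this reading the average coefficient is a little under 2 points per year, which is consistent with the rounded figures of about 2 points per year and about 12 points over six years.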
The estimates for the intervention effect on the absenteeism, suspension, and retention outcomes (random-intercept and random-intercept Poisson models) from baseline through both posttest and one-year post trial are presented in Table 6. Parameter estimates and incidence rate ratios (IRRs) are each presented for the random-intercept Poisson models, as an intercept parameter is not calculated for IRR estimates and, additionally, a residual variance estimate is not part of such models (Rabe-Hesketh & Skrondal, 2008). At posttest, the ICCs for the unconditional means models were .88, .52, and .47 for absenteeism, suspensions, and retentions, respectively. The ICC values for the Poisson models are approximations and were calculated utilizing an approach similar to that used for the random-intercept models (Goldstein, Browne, & Rasbash, 2002). At one-year post trial, the ICCs for the unconditional means models were .88, .52, and .41 for absenteeism, suspensions, and retentions, respectively. Thus, much of the variation in absenteeism, nearly half of the variation in suspensions, and less than half the total variation in retentions can be attributed to differences between schools.
Regarding absenteeism, from baseline through both posttest (Year × Condition B = −0.45, p < .001) and one-year post trial (Year × Condition B = −0.36, p < .001), the random-intercept growth models substantiated results of the matched paired t-tests, demonstrating a significant reduction in absenteeism among PA schools relative to control schools. However, as compared to the matched paired t-tests, inconsistent results emerged for the suspension and retention outcomes. The random-intercept growth curves indicated a marginally significant (B = −0.20, p = .06; IRR [95% CI] = 0.82 [0.67, 1.01]) year by condition interaction for the suspension outcome from baseline to one-year post trial, where the t-tests did not. Further, inconsistent with the non-significant matched paired t-test, the retention year by condition interactions through posttest (B = −0.30, p < .05; IRR [95% CI] = 0.74 [0.54, 1.00]) and one-year post trial (B = −0.30, p < .05; IRR [95% CI] = 0.74 [0.58, 0.95]) were statistically significant. Therefore, overall, the random-intercept and random-intercept Poisson models demonstrate decreased absenteeism, disciplinary, and retention outcomes among PA schools relative to control schools.
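The B and IRR values reported above are linked by the log link of the Poisson model: the incidence rate ratio is the exponentiated coefficient. The short sketch below checks that relationship for the two reported coefficients. The function names are illustrative only, and no standard errors are used because the text does not report them.

```python
import math


def incidence_rate_ratio(b):
    """Convert a Poisson (log-link) coefficient to an incidence rate ratio."""
    return math.exp(b)


def percent_change_in_rate(b):
    """Multiplicative change in the expected count per unit of the predictor, in percent."""
    return (math.exp(b) - 1.0) * 100.0


# Year-by-condition coefficients reported in the text.
for label, b in [("suspensions", -0.20), ("retentions", -0.30)]:
    irr = incidence_rate_ratio(b)
    change = percent_change_in_rate(b)
    print(f"{label}: B = {b:.2f}, IRR = {irr:.2f}, change per year = {change:.1f}%")

# suspensions: B = -0.20, IRR = 0.82, change per year = -18.1%
# retentions: B = -0.30, IRR = 0.74, change per year = -25.9%
```

An IRR below 1 means the expected count in PA schools declined relative to control schools with each additional year. Because the reported 95% confidence interval for suspensions (0.67 to 1.01) includes 1, that interaction is only marginally significant.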
The present study extends previous research on the capabilities of school-based interventions targeting social-emotional and character development to improve academic performance and attendance and reduce disciplinary problems and grade retention in schools. This study also confirms earlier preliminary findings of beneficial results of the PA program from quasi-experimental studies (Flay & Allred, 2003; Flay et al., 2001) using a matched-pair, cluster randomized, controlled trial. Specifically, as indicated by matched paired t-tests and permutation models, PA schools scored significantly better than control schools in reading TerraNova and math and reading HCPS II, and had significantly lower absenteeism and suspensions at one-year post trial. Moreover, random-intercept growth models demonstrated that PA schools showed significantly greater growth in math and reading TerraNova and math and reading HCPS II, and had significantly lower absenteeism and retentions through one-year post trial, with suspensions showing marginal significance. Indeed, school-level means for math and reading achievement demonstrated that PA schools, which were below state averages at baseline, nearly met or exceeded state averages by posttest and one-year post trial. These findings were especially noteworthy since many of the schools were in low income areas and had a high level of racial/ethnic diversity.
The present results demonstrated moderate to large effect sizes on all of the observed outcomes and were likely the result of several notable attributes of the PA program. First, PA addresses distal influences on behavior in a multifaceted way; PA is a comprehensive approach that involves providing the curriculum to all grades in the school at once, involving all teachers and staff in the school, and involving parents and the community. The PA program assists students and adults to gain not only the knowledge, attitudes, norms and skills that they might gain from other programs, but also improved values, self-concept, family bonding, peer selection, communication, and appreciation of school, with the expected result of improvement in academic performance and a broad range of behaviors. These improved outcomes may occur because positive behaviors tend to correlate negatively with negative behaviors (Flay, 2002). More specifically, with regard to academic achievement, for example, PA increases positive behaviors and decreases disruptive behaviors which, in turn, lead to more time on task for teaching and, in turn, more opportunity for student learning (Flay & Allred, in press). Also, improvements in students’ positive behaviors, such as attention and inhibitory control, can lead to increased academic achievement throughout formal schooling (McClelland, Acock, & Morrison, 2006).
Second, PA is “interactive” in delivery, using methods that integrate teacher/student contact and communication opportunities for the exchange of ideas, and utilize feedback and constructive criticism in a non-threatening atmosphere (Tobler et al., 2000). Third, the results observed may also have been a consequence of the intensive nature of the program, with students receiving approximately 1 hour of exposure during a typical week over multiple school years. Lastly, in the present study, we believe that the beneficial effects of the PA program could have been even greater if the fidelity of implementation had been excellent.
This analysis has some limitations. First, data regarding academic achievement, absenteeism, suspensions, and retention outcomes were not available at the student or classroom level. Because of this, variation in scores within students across years, or variation between students within schools could not be examined. As a result, individual student or classroom characteristics could not be included as predictors in the models to reduce unexplained variation. However, with random assignment, student and classroom characteristics should be about the same in the intervention and control groups. In addition, random-intercept models provide some statistical control for unmeasured differences between schools. Since every student’s score contributes to a school’s mean score, the design and analysis in this study provides a good test for intervention effects (Stuart, 2007). Future work that utilizes multilevel analysis of student-level indicators of academic achievement, absenteeism, and disciplinary outcomes would be beneficial.
Second, although school-level data are useful for estimating causal effects (Stuart, 2007), there may be inconsistencies among schools regarding how data, such as disciplinary-related referrals, are reported. Furthermore, it is possible that an intervention could influence how these data are reported. For example, a negative behavior that results in a disciplinary referral after an intervention is implemented may not have been grounds for a disciplinary referral before the intervention.
A third limitation of our analyses is that only 20 schools participated in the study, with five waves of data resulting in 100 observations per random-effects growth curve model. Under conditions of small effect size and high ICC, this could result in relatively low statistical power to detect differences between treatment and control schools. This study found moderate to large effect sizes, but also large ICCs, so power was a concern. However, a successful matched-pair design can improve statistical power (Raudenbush, Martinez, & Spybrook, 2007), and our findings demonstrate a successful matched-pair design as well as its ability to detect statistical significance.
Fourth, there were a limited number of observations available for the random-effects growth curve models. With full information maximum likelihood estimation used in those models, a large sample is desirable (Hayes, 2006) to guarantee the accuracy of the estimates, although there are various viewpoints on what constitutes a large sample size (Singer & Willett, 2003). Our sample was large enough to use these models to compare the sensitivity of the matched paired t-tests and permutation tests to an alternative statistical model, with different assumptions. The random-intercept models substantiated our findings from the more basic tests.
Fifth, although we demonstrated adequate implementation of PA and realize the importance of implementation fidelity (Flay et al., 2005), we had insufficient data (i.e., insufficient variation given a sample of only 10 PA schools) to examine implementation as a covariate. Also, we did not have data to observe the change in SACD-related activities in control schools. As indicated by the data procured during the last year of the four-year trial, the widespread self-initiation of SACD-related activities, especially in control schools, can reduce the possible effect size that can be detected when evaluating school-based interventions (Hulleman & Cordray, 2009). Additionally, because implementation data were not collected after completion of the randomized trial, we could not examine implementation at one-year post trial. Future studies with larger samples of schools would be valuable to examine the effects of implementation fidelity on school-level outcomes.
Lastly, as with all other similar studies, results can only be generalized to schools that are willing to conduct such a program. Though our sample was adequate for this study, a larger representative sample of schools, or randomized trials at different locations, would allow generalization of results to a broader population.
These limitations notwithstanding, this study is the first to examine the effects of PA on school-level achievement, absenteeism, and disciplinary outcomes using a matched-pair, cluster randomized, controlled design. The study extends research on the ways that changing a child’s developmental status in non-academic areas can significantly enhance academic achievement (Catalano et al., 2004; Catalano et al., 2002; Flay, 2002) and may, in fact, be essential for it. Future research should examine the specific mechanisms, moderators and mediators of social and character development intervention effects. Such knowledge would allow adjustments to PA that might increase the beneficial effect.
Unfortunately, elementary schools, with many demands for accountability, may concentrate solely on math, reading, and science achievement; and, due to resource and time constraints, instruction regarding social and character development may be abandoned. The findings of this study provide evidence that the Positive Action program, which has demonstrated effects on improving student behavior and character (Beets et al., 2009; Li et al., 2009), can also reduce school-level absenteeism and disciplinary outcomes and, concurrently, positively influence school-level achievement. Indeed, this study makes clear that a comprehensive school-based program that addresses multiple co-occurring behaviors can positively affect both behavior and academics.
The estimated outcome, Yij, is assumed to have a Poisson distribution with expectation μij.