This study describes the process of working in partnership with community stakeholders to develop a new method for monitoring program integrity. Preliminary psychometric information provides insight into how a system to evaluate the quality of program implementation performed in the context of a school-based intervention. In addition, comparison between two integrity monitoring systems speaks to the relative informativeness of a more detailed system with a more nuanced response scale. Finally, examination of data across five intervention classrooms provides initial information on the relation between treatment integrity and program outcomes.
Item-level and factor integrity scores across classrooms support the hypothesis that an integrity monitoring system with finer distinctions (System 2) has the potential to provide a more nuanced assessment of intervention integrity. Thus, whereas almost all of the variables in System 1 had a median rating at the maximum level (fully implemented), System 2 demonstrated a larger range of integrity scores across classrooms and variables. Findings also revealed that intervention integrity (per System 2) varied across classrooms.
Surprisingly, however, findings did not support the hypothesis that students in classrooms where the program was implemented with a higher level of process integrity (quality of delivery as measured by System 2) would improve the most. Indeed, for overt and relational aggression, it appeared that the opposite might have occurred, with classrooms where the program was implemented with the lowest-rated process integrity demonstrating the greatest program impact. This finding contrasts with the expectation that greater quality of program delivery promotes stronger and more consistent effects. These unexpected findings are consistent, however, with Dane and Schneider’s (1998) review of treatment integrity studies, which found that only 4 of the 13 studies that actually tested the relation between intervention integrity and outcomes demonstrated positive effects of program exposure or adherence (the rest indicating mixed or no effects), and that none found a relation between quality of delivery and outcomes. Unfortunately, methodological limitations and the small subset of studies formally testing the effect of program integrity precluded a clear understanding of Dane and Schneider’s findings.
Although explanations for this observation remain largely speculative, there are several reasons it might have occurred. First, because all ratings of student overt and relational aggression within each of the five classrooms were completed by the same teacher, it is possible that change scores on these outcomes reflected a systematic bias in teacher ratings. Second, an additional process variable, or an alternate definition of one or more of the variables that were assessed, might have played a more significant role in shaping outcomes for this type of classroom-based prevention program than the process variables included in System 2. For example, although System 2 originally included an item evaluating teacher participation, classroom means on this variable suggested that it did not capture the role of the teacher in a way that predicted greater program effects. Future research, however, might define teacher-related process variables in a way that relates significantly to student outcomes.
Another process variable that might relate to program impact is therapist competence. Although integrity scores and program outcomes in this study did not differ based on which therapy team administered the intervention, interventions with less intensively or consistently trained therapists might produce differential effects based on therapist competence. Indeed, Dane and Schneider (1998) noted in their review that significant integrity effects appeared to occur more frequently in studies that included objective integrity raters (versus the therapists themselves), perhaps due to therapists’ inflated self-ratings. Although this study included objective integrity monitors, it is possible that higher intervention quality was not associated with stronger outcomes in this program because, even in the classrooms with the lowest process integrity scores, content was consistently and reliably covered. Thus, it might be that System 2 was sensitive enough to detect qualitative differences in intervention delivery across classrooms, but these differences were not large enough to interfere with delivery of program content. In other words, intervention quality or process factors might be critical only to the extent that they affect how much program content is delivered. Future research is necessary to test the predictive utility of content on outcomes and the incremental utility of additional integrity factors, such as teacher engagement.
Contributions to the Literature
This study makes several contributions to the literature. First, it provides a real-world example of how a partnership-based process assisted in the development of a culturally sensitive integrity monitoring system that accounts for theoretically and empirically based critical program components and responds to the unique needs of the target community. PAR methodology has the potential to serve as a bridge between empirically based manualized interventions and real-world practice (Power et al., 2005). Thus, the use of partnership-based procedures to develop a multi-faceted intervention integrity system can help ensure accurate implementation and monitoring of key program components and identify areas within an intervention that clinicians can modify or adapt to better meet the needs of the specific community.
Although ultimately critical to program success and sustainability, these procedures present unique challenges as well. For example, tension can arise if community partners feel that cultural norms are not recognized or appreciated, or when researchers feel that critical program components are being challenged. In this study, for instance, there were particular challenges around the rating of the qualitative factor of “child behavior.” Whereas the research team and community partners agreed on the operational definition of different child behaviors (e.g., out of seat, talking out of turn), they differed in the value placed on such behaviors. Thus, when researchers rated loud talking and out-of-seat behavior low on the integrity scale, feedback from a community partner suggested that the atmosphere was consistent with inner-city classrooms and did not reflect a lack of control that interfered with learning, as the research team had interpreted it. Resolution of these disagreements involved extensive discussion and articulation of important issues that ultimately resulted in a more precise and nuanced measure of child behavior in the context of an urban school.
Results of this study also provide preliminary information on specific items and response scales that might be maximally informative for understanding key components of intervention integrity. Specifically, data analyses suggested that System 2 be reduced to a 7-item scale. In addition, the 10-point response scale more effectively captured the range of intervention integrity across classrooms. It remains unclear, however, to what extent statistically significant differences in quality of delivery according to this integrity monitoring system (e.g., a factor score of 1.09 in Classroom A vs. −1.07 in Classroom C) are clinically significant.
Limitations and Future Directions
Although findings from this study are informative for future research on intervention integrity across diverse settings, there are several limitations. First, students were nested within classrooms. For example, the questionnaire assessing relational and physical aggression was completed for each individual child by his/her teacher; as such, systematic differences in ratings across the five teachers might have occurred. Although outcome data were gathered directly from the child for other variables (e.g., knowledge, hostile attribution bias), change scores on these variables did not differ significantly across classrooms. Intervention integrity was also evaluated at the classroom level, resulting in all students within each of the five classrooms having the same integrity score and precluding a formal test of integrity as a mediator of individual child outcomes.
Another potential limitation of this study is that System 2 was developed in response to a need identified during the early stages of the intervention. As such, System 2 was not introduced until approximately 4 weeks after the start of the intervention. In addition, schedules did not allow integrity monitors to be present at all intervention sessions; therefore, random assignment of monitors to sessions and classrooms resulted in intervention integrity being evaluated across different sessions for each classroom and a variable number of times (5–8 observed sessions per classroom). Finally, the decision to have integrity monitors rate items at the end of each observed session introduced potential bias due to the recency effect. Monitors were, however, encouraged to make preliminary ratings and notes throughout the sessions, potentially mitigating the concern that therapist, student, and teacher behavior at the end of the session was disproportionately emphasized. Future use of this system might explore potential differences in ratings made near the beginning, middle, and end of the session.
Future research should focus on identifying and testing other potentially important variables related to treatment adherence, quality, and participant response. Studies should also include evaluation of outcomes and integrity factors at the individual child level (e.g., observation of individual child engagement). In addition, potential covariates such as teacher skill should be assessed so as to allow statistical control for classroom effects. Further investigation is also needed to examine the utility of process integrity in predicting change in outcomes over and above delivery of program content. In other words, if content is implemented fully, to what extent does the quality of intervention delivery matter?
Finally, research should identify the optimal response scale for integrity evaluation. Although this study moved from a 3-point response scale (not implemented, implemented partially, and implemented fully) to a more nuanced 1–10 range, results suggest that evaluating the quality of delivery at this level of detail might be unnecessary. Attempts should be made to examine whether fewer items (i.e., the top three factor loadings) and/or a shortened response scale would be similarly informative. These data would be particularly relevant to establishing a valid and reliable integrity monitoring system that can be used by school-based personnel implementing PRAISE in the future. Indeed, the goal of community-based research is to develop measurement and intervention tools that are feasible, acceptable, sustainable, and informative in real-world settings.
Overall, findings from this study suggest that manual-based interventions can be implemented with varying levels of quality, even by the same therapists. As such, evaluating intervention integrity as it relates to program content, quality, and participant response is critical to understanding what occurs during the intervention (Perepletchikova & Kazdin, 2005). At the same time, findings related to differential outcomes based on quality of intervention delivery suggest that this aspect of intervention integrity must be explored further. In addition, consideration must be given to the context in which the intervention takes place. Not only should interventionists make clear the distinction between critical program components and areas for flexibility to fit participant needs, but context should also be considered in evaluating therapist competence (Waltz et al., 1993). Interventions implemented with strong integrity may yield a better understanding of how and why programs work and allow facilitators to maximize impact through adherence to critical program components in a culturally appropriate manner.