These data were collected as part of the Building Recovery by Improving Goals, Habits and Thoughts (BRIGHT) study, a quasi-experimental, community-based effectiveness trial that developed and evaluated a group CBT for depression in addiction treatment clients (K. E. Watkins et al., in press).
We adapted an existing group CBT for depression that has demonstrated effectiveness in a variety of settings and populations (Miranda et al., 2003
; R. Muñoz, Ippen, Rao, Le, & Dwyer, 2000
; R. F. Muñoz & Mendelson, 2005
; K. B. Wells et al., 2000
), but that had not been implemented in publicly funded addiction treatment settings. Development of the adapted treatment manual was guided by two primary goals: 1) to increase the likelihood that addiction counselors could implement the therapy with fidelity, and 2) to improve the existing therapy's acceptability and appropriateness for clients in residential addiction treatment.
We adapted the treatment manual through a nine-month formative assessment process that incorporated iterative feedback from multiple stakeholders, including CBT experts, addiction and co-occurring disorders (COD) experts, addiction treatment center administrators, counselors, and clients. The formative assessment process included initial manual revisions, followed by implementation of the group in two outpatient addiction treatment settings, after which additional revisions were made to the manual. Adaptations to increase the likelihood that addiction counselors could successfully deliver the therapy included adding "Leader Tips" boxes to the group leader treatment manuals, which provided specific guidance on session timing, sample language for introducing exercises, and ways to increase client engagement and participation. Adaptations to increase the therapy's acceptability to addiction treatment clients included adding specific examples of thoughts and behaviors of individuals who are using or in recovery, and adding a fourth module that focused on the connections between substance abuse and mood.
The resulting BRIGHT therapy consisted of 16 two-hour sessions divided into four modules, each focusing on a different target: Thoughts, Activities, People, and Substance Abuse. The group was designed to be semi-open, meaning that new group members could enter at the beginning of each of the four modules. Intervention materials include group leader manuals, client workbooks, and fidelity monitoring tools (described below), and are available from the first author.
All training and group implementation were conducted at Behavioral Health Services, one of the largest publicly funded addiction treatment providers in Los Angeles County. In consultation with the agency's administrators and supervisors, we interviewed and selected five addiction counselors to be trained in delivering the therapy. We provided information about the training opportunity at an organization-wide staff meeting; counselors could self-identify or be identified by their supervisors. We specified three minimum requirements for participation. First, counselors needed to express interest in learning CBT. Second, they had to have been employed as a counselor at the agency for at least one year, indicating familiarity with addiction treatment clients and some commitment to remain employed at the treatment site for the duration of the training and group implementation. Third, counselors needed to be willing to co-lead the group with another addiction counselor and to be open to leading a structured, manualized therapy. To increase the likelihood that counselors were representative of usual care counselors, advanced degrees or specialized training in mental health were not required.
We trained five addiction counselors. Four counselors were female and one was male, and they reported a variety of racial/ethnic backgrounds (two Black, two Hispanic, one White). At the time they were selected for training, the counselors had an average of 4.2 years of experience as addiction counselors, and none held an advanced degree. Three counselors were in recovery from drugs or alcohol, and four had completed an alcohol and drug addiction certification program (i.e., CADAC) from the state of California. All of the counselors reported knowing about CBT, but only one had received any prior training and supervision in CBT; this counselor had participated in an earlier pilot of a related intervention with our study team.
Counselors received two days of didactic training on 1) understanding depression and its relationship to substance abuse, 2) assessing depression symptoms, 3) group management skills, 4) understanding the CBT model for depression, and 5) introducing specific CBT exercises from the manuals. The didactic training was delivered on two consecutive full days and alternated between lectures, demonstrations of techniques, and experiential exercises (e.g., role plays) in smaller groups. In planning the didactic training, we aimed to balance didactic material with interactivity to maintain counselor interest, allow counselors to actively practice the skills being learned, and enable the trainers to observe their initial grasp of the concepts presented. Counselors then co-led one 'practice round' of the 16-session therapy in the outpatient setting where they were employed. While we delivered the therapy twice per week (i.e., as an 8-week treatment) in the residential setting, during the practice round the counselors led the group once per week (i.e., as a 16-week treatment), which gave them additional time to prepare for each session. Following this practice round, counselors received a one-day booster training that addressed the more challenging issues that arose during initial implementation (e.g., talking with clients about suicidal ideation). Counselors then delivered the therapy in a residential addiction treatment setting as part of the clinical trial for two and a half years. Two of the five counselors co-led the group at any one time, with counselors rotating every four months. Counselor pairs were planned in advance to ensure that all counselors led the group with each other, although we allowed flexibility in this plan to accommodate counselors' competing scheduling commitments.
Group Clinical Supervision
Counselors received weekly group supervision throughout the training and implementation period. Supervision was provided by one of two licensed clinical psychologists with experience in CBT and COD treatment. All counselors attended the group supervision sessions, even though only two counselors were leading the group at any one time. Supervision sessions initially lasted two hours but were shortened to 90 minutes as counselors became more experienced and required less supervision. Sessions were conducted primarily in person at the agency's administrative headquarters; only a few were conducted by phone, because of the difficulty of reviewing audiotapes of sessions during phone supervision.
The weekly supervision sessions focused on three tasks: reviewing individual client progress; reviewing the previous session(s) led, including portions of the audiotapes; and preparing for the upcoming session(s). Review of client progress was facilitated by regular administration (i.e., every other session) of the Patient Health Questionnaire (PHQ) (Spitzer, Kroenke, & Williams, 1999
). Supervision time was used to give counselors feedback on ways to improve their adherence and competence ratings. Initial supervision sessions also focused heavily on preparing group leaders to lead the upcoming one or two sessions. As session preparation became less necessary, additional time became available for session review and for deepening counselors' understanding of CBT concepts.
We enrolled clients admitted to one of Behavioral Health Services' four residential addiction treatment centers located in Los Angeles County between August 2006 and January 2009. Study inclusion criteria were: (a) able to read and speak English, (b) a Patient Health Questionnaire (Spitzer et al., 1999
) score of five or greater at two weeks post-residential treatment entry, and (c) a score of 17 or greater on the Beck Depression Inventory-II (Beck, Steer, & Brown, 1996
) 1 to 14 days after the first depression screening, indicative of persistent moderate to severe depressive symptoms. Study exclusion criteria included a positive screen for: (a) self-reported bipolar disorder (Sloan, Kivlahan, & Saxon, 2000
), (b) schizophrenia (using one item from the Healthcare for Communities (HCC) Psychoticism screener; (K. Wells, Sturm, & Burnam, 2001
)), and (c) cognitive impairment (as assessed by the Short Blessed Scale Exam; (Dennis, White, Titus, & Unsicker, 2006
)), or being a federal prisoner (as we did not have permission from the Federal Parole Board to include these clients).
Over the course of the study enrollment period, 1,262 clients were assessed for depression at two weeks post-treatment entry. Approximately 25% of those clients met study criteria, and 299 were enrolled into the study. We assigned 140 clients to the intervention condition. Fifty percent of the intervention clients were male, and the sample was ethnically diverse (24% African American, 37% Caucasian, 28% Hispanic, 11% other/mixed). The average age was 35.3 years (SD = 10.1).
Our study sample showed symptoms of both mental health and substance use problems, as assessed at the baseline interview 3-4 weeks after treatment entry. Mean BDI-II scores were in the clinically severe range (M = 32.7; SD = 8.9; range = 18-59), and almost half the sample (48.54%) met criteria for a past 12-month depressive disorder using the Composite International Diagnostic Interview, version 2.1 (Walters, Kessler, Nelson, & Mroczek, 1998
). Mental health functioning (SF-12; (Ware Jr., Kosinski, & Keller, 1996
)) scores were almost two standard deviations below the population norm (M = 31.7; SD = 10.9; range = 8.2-55.1). The most commonly reported problem substance was amphetamines (36.8%), followed by cocaine (20.4%), alcohol (15.4%), and heroin (10.4%). Sixty-seven percent of the sample reported 12-month alcohol use that met criteria for a probable disorder (AUDIT-C; Dawson, Grant, Stinson, & Zhou, 2005).
We assessed counselor fidelity to the treatment manuals over the course of the two-and-a-half-year implementation period. For clients who attended the BRIGHT group, we assessed perceptions of the helpfulness of the group, therapeutic alliance with group leaders, and perceived improvement. We also collected BRIGHT group attendance data.
We evaluated counselor fidelity to the treatment manuals using adherence and competence measures developed for the BRIGHT therapy (Hepner, Stern, Paddock, Osilla, & Watkins, in preparation).
The adherence measure was adapted from Jaycox et al. (2009) for use in this study. The measure was specific to each session; the number of adherence items varied by session and ranged from 10 to 18, depending on the number of exercises and new topics introduced. Raters scored how adequately group leaders covered each session element on a 4-point scale (ranging from 0 to 3), with a score of 2 or higher indicating adequate adherence to that element. An adherence score was computed for each session by dividing the number of items (reflecting individual session elements) that were adequately covered (scored 2 or 3) by the total number of items coded for that session. The number of items scored for each session ranged from 10 to 15 and followed the session outline (e.g., reviewing between-session homework activities, introducing the connection between thoughts and mood). Seven of the adherence items were scored for all sessions, while the remaining items assessed unique elements of individual sessions. Adherence was considered high if 85% of session elements were adequately covered.
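The per-session scoring rule can be sketched as follows. This is an illustration of the computation described above, not the study's actual scoring code, and the ratings shown are hypothetical.

```python
# A sketch of the per-session adherence scoring rule: proportion of session
# elements rated adequate (2 or 3), with a high-adherence cutoff of 85%.

def adherence_score(item_ratings):
    """Proportion of session elements covered adequately (rated 2 or 3)."""
    adequate = sum(1 for rating in item_ratings if rating >= 2)
    return adequate / len(item_ratings)

def is_high_adherence(item_ratings, threshold=0.85):
    """A session counts as high adherence if >= 85% of elements were adequate."""
    return adherence_score(item_ratings) >= threshold

# Hypothetical session with 12 coded elements, 10 of them rated 2 or 3.
ratings = [3, 2, 2, 3, 1, 2, 3, 2, 2, 0, 3, 2]
print(round(adherence_score(ratings), 3))  # → 0.833
print(is_high_adherence(ratings))          # → False (0.833 < 0.85)
```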
The competence measure was adapted from the Cognitive Therapy Adherence and Competence Scale (Barber, Liese, & Abrams, 2003
). Adaptations were guided by the unique characteristics of the BRIGHT therapy that differ from typical individual CBT. Specifically, the BRIGHT therapy is a highly structured group therapy in which session agendas are largely predetermined (rather than set with an individual client at the beginning of the session), and counselors must attend to the needs of many group members (rather than a single individual). Further, the BRIGHT therapy is modular: each portion of the CBT model is the focus of a series of sessions (e.g., the Thoughts module emphasizes cognitive restructuring while deemphasizing behavioral interventions). We also added items to assess group dynamics. The 14 items were rated on a 7-point scale (0-6), with an average score of 4.0 indicating competent CBT delivery, similar to the original Cognitive Therapy Adherence and Competence Scale (Barber et al., 2003
). The same competence items were applied across all coded sessions.
All sessions were digitally recorded, and 33% of sessions (N=80) were randomly selected for fidelity coding by at least one trained rater. Three raters (two MA-level, one PhD-level) received 16 hours of training, which included reading the treatment manuals, coding four sessions independently of the other raters, and then meeting to discuss scores after each session. Inter-rater reliability estimates were generated using the 13% of sessions (N=33) that we selected for double coding.
Although the fidelity measures were adapted from existing instruments, the adaptations were specific to the BRIGHT treatment; we therefore provide a more extensive description of inter-rater agreement for the measures. Inter-rater agreement was examined by calculating the proportion of observed agreement between raters, p0. Because some agreement is expected by chance alone, we also examined the kappa statistic as a measure of inter-rater reliability, with values closer to 1 indicating greater reliability of rater assessments. However, kappa has some limitations. First, kappa can be relatively low if the proportion of positive responses is extreme (very high or very low) across raters. The intraclass correlation (ICC), which is often used to assess reliability for scaled or continuous items, can be similarly misleading when data have a restricted range; an example of restricted range would be all therapists delivering a therapy with high adherence and competence. Second, kappa can be relatively high if the raters disagree on the overall proportion of positive assessments. Therefore, in addition to kappa, we evaluated the prevalence adjusted bias adjusted kappa (PABAK), along with the prevalence index (PI) and the bias index (BI), to help understand differences between the kappa and PABAK reliability measures (Byrt, Bishop, & Carlin, 1993). The PI is the difference between the probabilities of raters making positive versus negative assessments and addresses the first source of bias, an extreme proportion of responses across raters. The BI is the difference between raters in the proportion of positive assessments and addresses the second source of bias mentioned above. To calculate PABAK, the adherence and competence items must be dichotomized, reflecting whether each item was delivered with adherence (i.e., an item score of 2 or 3) or competence (i.e., an item score of 4 or greater). The level of inter-rater agreement is reported for each dichotomized item, using the following agreement levels based on published guidelines for kappa: poor (<0.41), moderate (0.41-0.60), substantial (0.61-0.80), and almost perfect (0.81-1.00) (Landis & Koch, 1977
). We also examined ICCs for continuous items but do not present them here, since the information they conveyed was similar to that conveyed by kappa.
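The agreement statistics described above can be sketched for a single dichotomized item from a 2x2 table of two raters' yes/no codes. The cell counts below are hypothetical, and this is an illustration rather than the study's analysis code; the example reproduces the general pattern seen for these measures, where a high prevalence of 'yes' codes yields high observed agreement, a deflated kappa, a large prevalence index, and a higher PABAK.

```python
# A sketch of p0, kappa, PI, BI, and PABAK for one dichotomized item,
# computed from a 2x2 table of two raters' codes. Counts are hypothetical.

def agreement_stats(a, b, c, d):
    """a: both raters yes; b: rater 1 yes, rater 2 no;
    c: rater 1 no, rater 2 yes; d: both raters no."""
    n = a + b + c + d
    p0 = (a + d) / n                                     # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    kappa = (p0 - pe) / (1 - pe)
    pi = (a - d) / n        # prevalence index
    bi = (b - c) / n        # bias index
    pabak = 2 * p0 - 1      # prevalence adjusted bias adjusted kappa
    return p0, kappa, pi, bi, pabak

# Mostly 'yes' codes: observed agreement is high, kappa is deflated by the
# extreme prevalence, and PABAK corrects for it.
p0, kappa, pi, bi, pabak = agreement_stats(a=40, b=3, c=4, d=3)
print(round(p0, 3), round(kappa, 3), round(pi, 3), round(bi, 3), round(pabak, 3))
```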
Reliability for 15 adherence items is reported in , along with the number of double-coded sessions (since not all items repeat across all sessions). Note that some sessions have up to 18 adherence items; the additional 1 to 3 items relate to the New Topics unique to each of the 16 sessions, and we did not have enough double-coded sessions on all of the New Topics to report results for these remaining items. The overall adherence score, indicating whether 85% of session elements were adherent, showed substantial agreement (PABAK=0.682). Observed agreement was high (p0=0.84), and the large prevalence index (PI=0.788) explains the difference between kappa and PABAK. Based on PABAK, seven adherence items had substantial agreement, six items had almost perfect agreement, and two items ('how have you been feeling' and 'key messages') had poor agreement. Given that PABAK indicated substantial rater agreement for the overall adherence score, all items were included in the overall adherence measure for analyses, despite two items having poor reliability. Results for the three items with small numbers of double codings (fewer than 10) should be interpreted with caution. An overall competence score was estimated as the mean of the 7-point competence items. For estimating PABAK, this score was dichotomized to reflect whether the average continuous competence score was 4.0 or greater, indicating competent delivery of the therapy. Its observed inter-rater agreement was p0=0.777 (). The kappa statistic was very low in comparison (κ=0.079), owing to a difference of PI=0.537 between the proportions of 'yes' and 'no' responses across both raters. There was very little bias (BI=0.019) between raters in the proportion of items each coded as 'yes.' After accounting for these effects, the PABAK score of 0.577 indicated moderate agreement, so no items were removed from the overall competence score prior to conducting analyses.
The component dichotomous competence items are also presented in , with poor agreement on four items, moderate agreement on four items, substantial agreement on one item, and almost perfect agreement on five of the 14 items.
Inter-rater agreement and reliability of dichotomized competence and adherence items
Patient perception measures
Data were collected from clients who were assigned to the 16-session CBT group and who participated in a post-treatment follow-up interview (approximately 3 months after study enrollment). Clients reported their perceptions of the helpfulness of the group using items developed for this therapy. The 13 items were rated on a 5-point scale ranging from 'strongly agree' to 'strongly disagree.' This measure demonstrated an internal consistency reliability of 0.80. Item content is included in . In addition, we assessed therapeutic alliance using the 12-item Working Alliance Inventory (WAI; (Busseri & Tyler, 2003
; Adam O. Horvath & Greenberg, 1989
)), which is a general measure of rapport and trust in the therapeutic relationship that has been found to be consistently positively related to client outcomes irrespective of type of therapy approach (A. O. Horvath & Luborsky, 1993
). Because the WAI measure has typically been applied to individual therapy, items were modified to refer to ‘group leaders.’ The WAI demonstrated an internal consistency reliability of 0.95. Clients also reported on how their life had improved since starting the BRIGHT group (“Looking back on what your life was like just before you started the group CBT/Project BRIGHT group and how it is now. How much would you say your life has improved?”). Response options were on a 5-point scale ranging from ‘Not at all improved’ to ‘Extremely improved.’ Data were also collected on the number of group CBT sessions attended.
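The internal consistency reliabilities reported above (0.80 and 0.95) presumably refer to Cronbach's alpha, the standard coefficient for multi-item scales. A generic sketch of the computation (not the study's code; the responses below are hypothetical) is:

```python
import numpy as np

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
def cronbach_alpha(item_scores):
    """item_scores: 2-D array-like, rows = respondents, columns = scale items."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()   # sum of per-item sample variances
    total_variance = x.sum(axis=1).var(ddof=1)     # variance of respondents' totals
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical responses from 4 clients on a 3-item, 5-point scale.
responses = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
print(round(cronbach_alpha(responses), 2))  # → 0.95
```

Items that covary strongly relative to their individual spread push alpha toward 1; fully redundant items give exactly 1.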
Client perceptions of helpfulness of group CBT for depression
We used descriptive statistics to describe adherence and competence over the course of the implementation period. To evaluate whether adherence and competence varied by treatment module (i.e., Thoughts, Activities, People, and Substance Abuse modules), we fit random-effects analysis of variance (ANOVA) models to the overall adherence and competence scores for each session. To account for the non-independence of scores across sessions due to counselors co-leading multiple sessions, a multiple membership modeling approach was taken, which involved adding random counselor effects to the basic ANOVA model and estimating the session-specific counselor effect for a particular session as an average of the random counselor effects for those counselors who led that session (Browne, Goldstein, & Rasbash, 2001
; Carey, 2000).
We obtained p-values adjusted for multiple hypothesis testing when identifying statistically significant pairwise differences between particular modules, using the Tukey-Kramer procedure (Kramer, 1956
) to control Type I error. We used the same approach to evaluate whether adherence and competence varied over time, but conducted only one pairwise test, of whether the first 25% of sessions significantly differed from the last 25% of sessions, since that is the most relevant comparison over time. We used descriptive statistics to describe client perceptions of the CBT group, followed by correlational analyses to evaluate the relationships between client perception variables and group attendance.
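The multiple membership weighting described above can be illustrated numerically. In the sketch below, the counselor effects and the session-to-counselor assignments are hypothetical, and in an actual model fit the counselor effects would be estimated as random effects rather than fixed numbers; the point is only that each session's counselor effect is the average of the effects of its two co-leaders.

```python
import numpy as np

# Hypothetical random effects for the five counselors.
counselor_effects = np.array([0.10, -0.05, 0.00, 0.20, -0.15])

# Multiple membership weight matrix: rows = sessions, columns = counselors.
# Each of a session's two co-leaders gets weight 1/2, so every row sums to 1
# and the session-specific effect is the co-leaders' average.
W = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],  # session co-led by counselors 1 and 2
    [0.0, 0.5, 0.5, 0.0, 0.0],  # session co-led by counselors 2 and 3
    [0.5, 0.0, 0.0, 0.5, 0.0],  # session co-led by counselors 1 and 4
])

session_effects = W @ counselor_effects  # averaged effect entering each session's model
print(session_effects)
```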