This study introduces an observational measure of fidelity in evidence-based practices for adolescent substance abuse treatment. The Therapist Behavior Rating Scale—Competence (TBRS-C) measures adherence and competence in individual cognitive-behavioral therapy and multidimensional family therapy for adolescent substance abuse. The TBRS-C assesses fidelity to the core therapeutic goals of each approach and also contains global ratings of therapist competence. Study participants were 136 clinically referred adolescents and their families observed in 437 treatment sessions. The TBRS-C demonstrated strong interrater reliability for goal-specific ratings of treatment adherence, modest reliability for goal-specific and global ratings of therapist competence, evidence of construct validity, and discriminant validity with an observational measure of therapeutic alliance. The utility of the TBRS-C for evaluating treatment fidelity in field settings is discussed.
Evaluating the feasibility and effectiveness of research-developed treatments for substance use disorders in usual care settings has become a national healthcare priority (Institute of Medicine, 2006). Substance abuse treatment programs are facing increased stakeholder demands for adherence to empirically based practice guidelines (Hayes, Barlow, & Nelson-Gray, 1999), and many programs describe their efforts to implement evidence-based practices as a cornerstone of quality care (e.g., Henderson et al., 2007; Mark et al., 2006). Also, emerging research indicates that strong fidelity is critical for successful transportation of research-based protocols (Henggeler, Melton, Brondino, Scherer, & Hanley, 1997; Henggeler, Pickrel, & Brondino, 1999). Reliable, brief, and cost-efficient evaluation tools are therefore needed to assess fidelity to multicomponent treatments that stand ready for dissemination to clinical settings (Carroll et al., 2000; Garland, Hurlburt, & Hawley, 2006).
There has been a recent surge in the number of empirically supported treatments for adolescent substance abuse (for reviews see Muck et al., 2001; Williams, Chang, & ACARG, 2000). However, only a handful of fidelity instruments have been developed to measure implementation of evidence-based practices with adolescent drug users (e.g., Henggeler et al., 1999; Hogue et al., 1998), and these assess treatment adherence but not therapist competence. Treatment adherence generally refers to the quantity or extent of specific treatment techniques used in session, whereas therapist competence refers to the quality or skill with which interventions are delivered (Waltz, Addis, Koerner, & Jacobson, 1993). Elements of therapist competence include knowledge of client issues, appropriateness and timing of interventions, and degree of responsiveness to client in-session behaviors (Stiles, Honos-Webb, & Surko, 1998). This study introduces a measure of therapist competence for adolescent drug treatments that incorporates these important elements.
Competence ratings are typically based on observer reports, inasmuch as therapist reports of their own clinical proficiency do not match observer accounts (Levin, Owen, Stinchfield, Rabinowitz, & Pace, 1999; Miller, Yahne, Moyers, Martinez, & Pirritano, 2004) and clients do not have sufficient expertise to judge treatment quality per se. In research studies with adult clinical populations, two methods have been commonly used to measure competence. In the global rating method, a single item (“How competent was the therapist in this session?”) or a few interrelated items (e.g., therapist skill, empathy, non-verbal behavior; Carroll et al., 1998) are used to rate the observed portion of treatment. Advantages of this method include high face validity and relative ease in training judges; drawbacks include a lack of specificity in describing components of the particular model being assessed. In the discrete technique method, multiple intervention techniques considered to be signature therapeutic ingredients of a particular model are rated separately. In discrete technique fidelity scales, two separate ratings are given for each technique: a “quantity” score to capture adherence and a “quality” score to capture competence. Fidelity studies of adults with depression (Barber, Crits-Christoph, & Luborsky, 1996; Shaw et al., 1999) and substance use disorders (Barber, Foltz, Crits-Christoph, & Chittams, 2004; Carroll et al., 2000) have used this method. Advantages include the ability to discriminate adherence from competence and the high degree of specificity in assessing model components.
There are two limitations to using either the global rating or discrete technique method for measuring competence. First, these approaches do not closely approximate the theory-driven processes of case formulation and treatment planning that guide clinical practice, training, and supervision. Case formulation and treatment planning tend to revolve around molar therapeutic goals that embody the model’s underlying principles of change and provide structure in clinical decision-making for each client (Kazdin, 1999; Stiles et al., 1998). Therapeutic goals are themselves comprised of multiple, integrated intervention techniques that extend across several sessions (Diamond & Diamond, 2002). Therapeutic goals are the clinical blueprint of a treatment model, whereas discrete techniques are the clinical tools. Second, global and discrete measures tend to focus on the behavior of the therapist. In clinical practice, however, competence is largely determined by the therapist’s ability to adapt continually to developments in clients’ lives as they occur in and out of session, while still adhering to the specified clinical framework (Stiles et al., 1998). Therapist responsiveness to client behaviors over time—referral problems, interpersonal manner, amenability to intervention efforts, and so forth—is central to competent model delivery (Waltz et al., 1993).
To address these two limitations in competence assessment, we developed the Therapist Behavior Rating Scale—Competence (TBRS-C), an observational measure of adherence and competence in individual and family-based therapy for adolescent substance abuse. The TBRS-C focuses on molar therapeutic goals rather than specific techniques, providing goal-specific ratings of therapist competence along with separate global ratings of therapist skill, responsiveness, and overall competence.
The TBRS was initially developed as a discrete technique adherence scale to assess fidelity in the same randomized trial of individual cognitive-behavioral therapy (CBT) versus multidimensional family therapy (MDFT) from which the current study sample was drawn. Hogue et al. (1998) used the original TBRS to confirm basic treatment adherence and differentiation in the trial: CBT focused on antecedents and consequences of drug use and behavioral skills building, whereas MDFT focused on family interactions and systemic interventions. The current study expands upon Hogue et al. in two important ways. First, it examines fidelity to the core therapeutic goals of each model—5 in CBT, 4 in MDFT—in lieu of 26 discrete techniques contained in the original TBRS. The molar therapeutic goals contained in the TBRS-C are each comprised of multiple discrete treatment techniques, including all those in the original TBRS and others. Second, the current study includes a multidimensional assessment of therapist competence in addition to treatment adherence.
The primary aims of this study were to examine interrater reliability, construct validity, and discriminant validity of the TBRS-C in measuring adherence and competence in CBT and MDFT for adolescent substance abuse. This study is among the first to assess competence as well as adherence in evidence-based treatments for adolescent drug problems. Because specific clinical expertise is needed to make valid assessments of competence in model delivery (Waltz et al., 1993), separate coding groups were recruited for each treatment condition (as in Barber et al., 2004): CBT-knowledgeable judges rated CBT sessions, and family therapy-knowledgeable judges rated MDFT sessions. Interrater reliability and variance components (therapist, client, and treatment phase effects) were calculated for individual items and averaged scales. Correlations among TBRS-C adherence and competence scores were used to examine construct validity; correlations between the TBRS-C and an observational measure of therapeutic alliance were used to examine discriminant validity. A second aim of the study was to compare reliability and validity findings for goal-specific competence ratings versus global ratings of therapist skill, responsiveness, and overall competence.
The study was conducted with approval by the governing Institutional Review Board. Active consent from caregivers and active assent from adolescents were collected in writing from all participants, and active consent to judge fidelity was collected from all study therapists.
The client sample was 136 urban, substance-abusing adolescents drawn from a larger randomized trial (N = 224) comparing individual CBT and MDFT (Liddle, 2002a). The cases selected for the current study (62 CBT, 74 MDFT) included all those that met the following criteria: completed a baseline assessment, had at least one videotaped therapy session, and completed at least one posttreatment assessment (for future fidelity-outcome studies). The sample was 81% male with an average age of 15.5 years (SD = 1.3) and a range of 13–17 years. The ethnic composition was approximately 70% African American, 20% European American, and 10% Hispanic American. Half of the adolescents were living in one-parent households, 14% with both biological parents, and 36% with various other compositions. Yearly household income was less than $10,000 for 29% of the sample. Most adolescents were enrolled in school (76%) and on juvenile probation (63%) at intake, and 32% were court-ordered to receive treatment. Eighty percent met DSM-III-R criteria for a substance use disorder (21% for alcohol abuse/dependence and 79% for marijuana abuse/dependence), 79% for an externalizing disorder, and 49% for an internalizing disorder.
The nine therapists who delivered the treatments, four in CBT and five in MDFT, ranged in age from 29 to 54 years (M = 40). The CBT therapists (two female) included two African Americans and two European Americans; one had a Master’s degree and three had doctorates. MDFT therapists (three female) included three African Americans and two European Americans; four had a Master’s and one had a doctorate.
The individual CBT model for multiproblem adolescent drug users (Turner, 1992; Waldron & Kaminer, 2004) is based on a broadly defined cognitive-behavioral framework that emphasizes a harm-reduction approach to substance use. Initial sessions focus on identifying and prioritizing adolescent problems and constructing the treatment contract. The intensive cognitive-behavioral treatment program focuses on increasing coping competence and reducing problematic behaviors using intervention modules tailored to the individual adolescent: health education, self-monitoring, problem solving and communication skills, identifying cognitive distortions, and increasing prosocial activities. Role rehearsal and homework assignments are utilized to reinforce new skills. Final sessions focus on relapse prevention and maintenance of gains.
MDFT (Liddle, 2002b) is a family-based ecological treatment for adolescent drug abuse and related behavior problems. MDFT therapists work simultaneously in four interdependent treatment domains according to the particular risk and protection profile of the adolescent and family. The adolescent domain helps teens engage in treatment, communicate effectively with parents and other adults, and develop social competence and alternative behaviors to drug use. The parent domain engages parents in therapy, increases their behavioral and emotional involvement with the adolescents, and improves parental monitoring and limit setting. The interactional domain focuses on decreasing conflict and improving emotional attachments and patterns of communication and problem solving using multiparticipant family sessions. The extrafamilial domain fosters family competency and collaborative involvement within all social systems in which the teen participates (e.g., school, juvenile justice, recreational). At various points throughout treatment therapists meet alone with the adolescent, alone with the parent(s), or conjointly with the adolescent and parent(s), depending on the treatment domain and specific problem being addressed.
Therapists were given study cases after four months of training and upon achieving satisfactory levels of fidelity in pilot cases as judged by model developers. Therapists were supervised weekly by model experts via live individual supervision, videotape feedback, and group supervision. Both treatments prescribed office-based, weekly sessions conducted over 16–24 weeks. For the study sample, cases completed an average of 12.3 sessions (SD = 8.7), with 59% of cases attending 8 or more.
The TBRS-C is an observational measure of treatment adherence and therapist competence for CBT and MDFT whose format can be adapted for other manualized treatments of adolescent substance use. Scale items represent the core therapeutic goals of the given treatment model. Items are scored using a 7-point Likert-type scale with the following anchors: “1” = Not at all, “3” = Somewhat, “5” = Considerably, and “7” = Highly. Each item receives a separate score for adherence and competence. Adherence ratings estimate the thoroughness with which goals are executed and the frequency with which they are addressed. Competence ratings estimate the technical quality of interventions (skillfulness) and their timing and appropriateness for the given client and situation (responsiveness). Each TBRS-C item assists judges in making competence assessments for the given therapeutic goal by describing important treatment context considerations (e.g., client interpersonal style, treatment phase), general keys to competent goal implementation, and guidelines for coding both adherence and competence. The measure is structured so that items are independent of one another but may co-occur in any given session segment. The TBRS-C also contains three global items for rating competence across the entire session: therapist skill, therapist responsiveness, and overall competence.
The TBRS-C describes five molar therapeutic goals for individual CBT (see Table 1): Establishing a Working Relationship, Drug Use Monitoring and Harm Reduction (exemplary techniques: Analysis of drug use behavior, Refusal skills and moderated use), Behavioral Skills Training (Communication skills, Decision making and problem solving, Anger management, Role playing, Relaxation training), Cognitive Therapy Techniques (Cognitive monitoring and change strategies, Coping with drug use thoughts), and Increasing Prosocial Behavior. It also contains four molar therapeutic goals for MDFT (see Table 2): Adolescent Interventions (exemplary techniques: Building and maintaining adolescent alliance, Mapping ecological influences on prosocial and antisocial behavior, Exploring drug use behaviors and consequences), Parent Interventions (Building and maintaining parent alliance, Reinforcing attachment and resuscitating hope, Enhancing parental monitoring and discipline), Family Interaction Interventions (Meeting individually with family members to prepare for family sessions, Resolving parent-adolescent impasses, Promoting positive family dialogue), and Extrafamilial Interventions (School and vocational interventions, Juvenile justice interventions).
The Vanderbilt Therapeutic Alliance Scale—Revised (VTAS-R) is a 22-item revision of the original VTAS (Hartley & Strupp, 1983) that defines the therapeutic alliance as a collaborative and task-oriented relationship determined by client behaviors and the therapist-client relationship. It has demonstrated strong interrater agreement (ICC range = .80–.93) and internal consistency (Cronbach’s α = .93–.96) in previous studies of alliance in family-based treatments for adolescent drug use (Diamond et al., 1999; Robbins et al., 2003; Shelef et al., 2005). In a previous study using the same clinical sample as the current study, Hogue, Dauber, Faw, Cecero, and Liddle (2006) found that VTAS-R ratings yielded strong reliability and internal consistency in both conditions for mean scores averaged across the 22 items: ICC(1,2) = .90 and Cronbach’s α = .98 for therapist-adolescent alliance in CBT, ICC(1,2) = .83 and α = .97 for therapist-adolescent alliance in MDFT, and ICC(1,2) = .62 and α = .98 for therapist-parent alliance in MDFT. The mean scores for parent and adolescent alliance in MDFT were not significantly correlated (Pearson’s r = −.08).
Videotaped sessions were selected from Phase 1 of treatment (every study case) and from Phase 2 (when available). Phase 1 contained the first two available sessions between sessions 1 and 5, so that judges could evaluate client presenting problems and early treatment developments as a context for coding later sessions. Phase 2 contained a randomly selected set of three consecutive sessions (when available) starting at session 6. Identical sampling procedures were used for both conditions. However, fewer sessions from the CBT condition were included in this study due to its somewhat higher treatment dropout rate in the original clinical trial: 36% of cases randomized to CBT dropped from treatment prior to session 6, compared to 31% in MDFT.
In CBT, 192 sessions were selected from 62 cases. Due to early treatment dropout, 36% of cases had Phase 1 tapes only. Across the 192 sessions, 54% were Phase 1 tapes, 29% were Phase 2 tapes that fell between sessions 6–12, and 17% were Phase 2 tapes between 13–25. For Phase 1 sets, 62% contained the first two sessions of treatment, 20% the first session only because no other videotape was available, and 18% some other configuration. For Phase 2 sets, 54% contained three consecutive sessions, 21% two consecutive sessions, 21% one session only, and 4% some other configuration. In MDFT, 245 sessions were selected from 74 cases. Due to dropout, 34% of cases had Phase 1 tapes only. Across the 245 sessions, 51% were Phase 1 tapes, 29% were Phase 2 tapes between sessions 6–12, and 20% were Phase 2 tapes between 13–25. For Phase 1 sets, 57% contained the first two sessions of treatment, 15% the first session only, and 28% some other configuration. For Phase 2 sets, 67% contained three consecutive sessions, 19% two consecutive sessions, 4% one session only, and 10% some other configuration. A total of 14% of sessions were with the adolescent alone, 12% with parent(s) alone, and 74% conjointly with the adolescent and parent(s).
The current study utilized VTAS-R mean ratings from Hogue et al. (2006). For CBT, only therapist-adolescent alliance was coded. For MDFT, judges completed separate alliance protocols while viewing the tape: one for the adolescent if present and one for the parent (or two, then averaged, if both parents were present). Due to resource limitations, only one session apiece from Phase 1 (session 2 for 69% of cases) and Phase 2 (randomly selected) was coded. In CBT, a total of 71 sessions (42 Phase 1 and 29 Phase 2) across 47 cases were coded for adolescent alliance. In MDFT, 73 sessions were rated for adolescent alliance (47 Phase 1 and 26 Phase 2) and 72 sessions for parent alliance (48 Phase 1 and 24 Phase 2) across 67 cases; the total number of MDFT sessions coded was 93 (58 Phase 1 and 35 Phase 2). Most MDFT tapes (n = 52; 56%) received both adolescent and parent ratings because both members participated in the session for at least 20 minutes.
Two different coding groups were recruited. CBT judges (N = 7) were recruited from a private clinic specializing in cognitive-behavioral treatments for mental health disorders: three European American women, three European American men, and one Caribbean man. CBT judges averaged 4.8 years (SD = 3.7) of postgraduate therapy experience and 4.0 years (SD = 3.4) of postgraduate CBT experience. MDFT judges (N = 8) were recruited from a community mental health clinic specializing in family-based treatment: two Hispanic American women, two Spanish women, one Hispanic-Asian American woman, one European American woman, and two Hispanic American men. They averaged 6.1 years (SD = 8.3) of postgraduate therapy experience and 4.9 years (SD = 7.9) of postgraduate experience in family therapy.
Procedures for training the CBT and MDFT coding groups were identical. Judges were trained during weekly 90-minute meetings over four months using review of the respective coding manuals, in-group coding and review of practice tapes, and exercises to increase understanding of scale items. Study coding commenced once both groups reached a threshold reliability of ICC = .65 for the preponderance of items; thereafter, groups met bi-weekly for supportive training and monitoring of rater drift until coding was completed. Judges rated all assigned sessions from a given case in chronological sequence over two weeks. Sessions were rated in their entirety; sessions ranged from 30 to 75 minutes and averaged 60 minutes. Two judges were assigned to code all tapes selected for each case, and judges were randomly paired with one another across the sample using a randomized block design (Fleiss, 1981).
Two judges rated each session, and pairs of ratings were averaged to create a final score for each scale item. Goal Average scores were then created by averaging the final scores for the therapeutic goals (5 items for CBT, 4 for MDFT) for each session. Goal-specific competence ratings were not made for scale items that received a score of 1 (“Not at all”) for adherence, which indicates non-occurrence of that therapeutic goal in the observed session. Thus, judges rated the quality of a given item only when the therapist was observed to be working on that goal (Carroll et al., 2000). In contrast, the three global competence items were scored for every session.
Model-specific clinical expertise and instrument training are needed to code therapist competence in a valid manner (Waltz et al., 1993). It seemed counterintuitive to require judges clinically trained in CBT to make competence judgments about a model, MDFT, in which they were not experienced—and vice versa. Moreover, it would have been contrived to ask CBT judges to rate competence in achieving CBT treatment goals while viewing MDFT tapes, for which such goals would rarely be applicable. Thus, CBT sessions were rated using CBT items only; likewise for MDFT. Also, TBRS-C items representing the main therapeutic goals of each model are theoretically independent of one another and not intended to represent a correlated set of interventions—that is, greater use of one goal is not correlated with greater use of another goal in any given session. For these reasons, internal consistency (Cronbach’s α) was not calculated for either set of items.
Variance components analysis of the TBRS-C scores was conducted using mixed-models procedures in SAS. Variance components analysis partitions the total variability among scores into reliable sources of variance (e.g., therapist, client). Along with regression coefficients, variance components are routinely estimated in random regression analytic techniques applied to nested research designs (e.g., mixed-model analysis of variance, hierarchical linear modeling). Variance components analysis was conducted on each individual scale item and on the Goal Average scores. Variance components were estimated using restricted maximum-likelihood estimation for the following sources of variance: therapist, client (nested within therapist), phase, and error. Each term was entered as a random effect in the analysis, and the estimates of variance were transformed into proportions of the total variance. Because two individual judges were randomly assigned to each tape (rather than assigning consistent pairs of judges, as in Barber et al., 2004), it was not possible to estimate a judge effect, that is, the variance component of the averaged scores associated with judge.
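The study estimated these components with REML in SAS on unbalanced data; as a toy illustration only (not the study's actual procedure, and omitting the phase term), the same idea can be sketched for a balanced nested design using expected mean squares, with the variance estimates then converted to proportions of total variance. All names and the data layout below are hypothetical:

```python
import numpy as np

def nested_variance_proportions(scores):
    """Partition score variance for a balanced nested design:
    clients nested within therapists, repeated sessions per client.
    `scores` has shape (therapists, clients_per_therapist, sessions_per_client)."""
    a, b, n = scores.shape
    grand = scores.mean()
    ther_means = scores.mean(axis=(1, 2))   # one mean per therapist
    cli_means = scores.mean(axis=2)         # one mean per client
    # Mean squares for each stratum of the nested ANOVA
    ms_t = b * n * ((ther_means - grand) ** 2).sum() / (a - 1)
    ms_c = n * ((cli_means - ther_means[:, None]) ** 2).sum() / (a * (b - 1))
    ms_e = ((scores - cli_means[:, :, None]) ** 2).sum() / (a * b * (n - 1))
    # Method-of-moments variance components (negative estimates clipped to 0)
    var_e = ms_e
    var_c = max((ms_c - ms_e) / n, 0.0)
    var_t = max((ms_t - ms_c) / (b * n), 0.0)
    total = var_t + var_c + var_e
    return {"therapist": var_t / total,
            "client": var_c / total,
            "error": var_e / total}
```

A pattern like the one reported in the Results (client proportions exceeding therapist proportions) would show up here as a large "client" entry relative to "therapist" in the returned dictionary.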
Both versions of the TBRS-C demonstrated good to excellent interrater reliability for adherence but only fair to poor reliability for competence as measured by the intraclass correlation coefficient (ICC(1,2); Shrout & Fleiss, 1979). According to Cicchetti’s (1994) criteria for classifying the utility of ICC magnitudes, below .40 is poor, .40 to .59 is fair, .60 to .74 is good, and .75 to 1.00 is excellent. Table 1 contains ICC data for the CBT condition. Reliability of adherence ratings for the five specific goals of CBT ranged from .56 to .83. Reliability of competence ratings was generally much lower, ranging from .01 to .63. For the Goal Average scores, ICC = .74 for adherence and .56 for competence. Table 1 also contains the proportion of variance in fidelity scores attributable to Therapist, Client, Phase, and Residual (Error). In four of the five scale items, the largest proportion of non-error variance in adherence was attributed to Phase, indicating that these interventions were utilized to a greater extent in either one treatment phase or the other. In contrast, competence scores varied little between early versus later treatment sessions. Also, Client accounted for more variance than Therapist in all fidelity scores; the difference was especially pronounced for competence.
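The ICC(1,2) reported here is the Shrout and Fleiss one-way random-effects form for the average of k = 2 judges per session. A minimal sketch of the general ICC(1,k) computation from a one-way ANOVA decomposition (function and variable names are hypothetical):

```python
import numpy as np

def icc_1k(ratings):
    """Shrout & Fleiss ICC(1,k): one-way random-effects model, reliability of
    the mean of k ratings per target. `ratings` has shape (targets, k)."""
    n, k = ratings.shape
    grand = ratings.mean()
    target_means = ratings.mean(axis=1)
    # Between-target and within-target mean squares from a one-way ANOVA
    ms_b = k * ((target_means - grand) ** 2).sum() / (n - 1)
    ms_w = ((ratings - target_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_b - ms_w) / ms_b
```

With k = 2 randomly paired judges, as in this study, each row would hold one session's two ratings; perfect agreement yields 1.0, and rating noise pulls the coefficient toward 0.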
Reliability data for MDFT are contained in Table 2. Adherence ratings were good to excellent, ranging from ICC = .64 to .79 for the four main goals. However, reliability of competence ratings was again only fair to poor, ranging from .15 to .48. For the Goal Average scores, ICC = .52 for adherence and .55 for competence. As with CBT, Client accounted for greater proportions of variance in fidelity scores than Therapist.
Interrater reliability for the global ratings of therapist competence was in the fair to good range in each condition. For CBT (Table 1), ICC = .56 for Overall Competence, .49 for Skill, and .49 for Responsiveness. For MDFT (Table 2), ICC = .63 for Overall Competence, .53 for Skill, and .56 for Responsiveness. For both conditions, the reliability of the global ratings for each session compared favorably to the reliability of averaged ratings of individual therapeutic goals for each session.
Interitem correlations among the CBT items (see Table 3) were mainly in the expected direction and support the construct validity of the scale. Adherence ratings for Establishing a Working Relationship were negatively correlated with the other four goals, as this goal is likely to be practiced during early treatment sessions in lieu of the skills-oriented goals emphasized later on. Adherence scores for two skills-oriented items, Behavioral Skills Training and Cognitive Therapy Techniques, were positively correlated (r = .44, p < .01). Also, competence ratings for all five goals were positively correlated with one another (with one exception), indicating that therapists judged to be working competently on any one goal were seen as being generally competent across the board. In MDFT, adherence ratings for the four goals were weakly or negatively correlated with one another (Table 4), in keeping with their status as largely independent domains of therapeutic focus in this model. Competence ratings among the four MDFT goals were all positively correlated.
The pattern of correlations among Goal Average Adherence and Competence scores and the global rating of Overall Competence (see Table 5) also supports the construct validity of the TBRS-C for both conditions. For CBT, Goal Average Adherence and Competence were moderately associated (r = .42, p < .01), suggesting that adherence and competence were scored as related but distinct constructs. As expected, Goal Average Competence was highly correlated with Overall Competence (r = .68, p < .01). The two dimensions of Overall Competence, Skill and Responsiveness, were also highly correlated (r = .82, p < .001; not depicted in table). Similarly, in MDFT there was a weak correlation between Goal Average Adherence and Competence (r = .17, p < .01) and a strong correlation between Goal Average Competence and Overall Competence (r = .79, p < .001) and between Skill and Responsiveness (r = .85, p < .001; not depicted).
Discriminant validity was examined by comparing TBRS-C fidelity ratings to independent ratings of therapeutic alliance from the VTAS-R (see Table 5). In CBT, adolescent alliance was correlated with Goal Average Adherence (r = .28, p < .05) and Overall Competence (r = .31, p < .01) but not related to Goal Average Competence. In MDFT, adolescent alliance was associated with both Goal Average Competence (r = .40, p < .01) and Overall Competence (r = .36, p < .01). Parent alliance in MDFT was not significantly related to any fidelity score. The small (non-significant) to medium effect sizes observed in these fidelity-alliance correlations indicate that TBRS-C ratings were suitably distinct from concurrent judgments about the therapeutic relationship. The findings also generally converge with the clinical expectation that greater competence begets stronger alliances, although this held true only for adolescents and not for parents. Other studies examining correlations between fidelity and alliance have reported similar findings (e.g., Barber & Crits-Christoph, 1996; Carroll et al., 2000).
This study presents initial reliability and validity findings for the Therapist Behavior Rating Scale—Competence, an observational measure of treatment fidelity applied to two empirically supported treatments for adolescent drug abuse, individual cognitive-behavioral therapy and multidimensional family therapy. For both treatment models, the TBRS-C exhibited strong interrater reliability for adherence items, fair-to-poor reliability for individual competence items but sufficient reliability for global competence ratings, and patterns of correlations among items that support construct validity. Ratings of adherence and competence were distinct from one another and from independent observations of a related treatment process, therapeutic alliance. Variance in adherence and especially competence scores was attributable more to client effects than to therapist effects. These results demonstrate that fidelity measures focusing on molar therapeutic goals and incorporating therapist responsiveness into the assessment of competence can demonstrate acceptable psychometric properties to complement a high level of clinical validity.
Interrater reliability for the adherence items of the TBRS-C was equivalent or superior to that reported for individual items on discrete scales (e.g., Barber, Liese, & Abrams, 2003; Morgenstern et al., 2001) and also for averaged scores comprised of multiple items (e.g., Carroll et al., 1998). The strong showing for the adherence ICCs is noteworthy given that TBRS-C reliabilities were generated using a within-condition analytic strategy, whereby judges used CBT items to rate CBT tapes only (likewise for MDFT). This is a conservative approach to estimating reliability for adherence measures that attenuates the magnitude of intraclass correlations (Startup & Shapiro, 1993).
The interrater reliability of the competence ratings for individual therapeutic goals was generally weak and well below the magnitude found for competence items on most discrete technique scales (e.g., Barber et al., 2003). Reliabilities of the global competence ratings (.56 for CBT, .63 for MDFT) and the average competence rating across therapeutic goals (.56 for CBT, .55 for MDFT) were modest but in keeping with the magnitude of competence ratings in some studies (e.g., Barber & Crits-Christoph, 1996; James, Blackburn, Milne, & Reichfelt, 2001), though decidedly lower than in others (e.g., Moyers, Martin, Manuel, Hendrickson, & Miller, 2005; Carroll et al., 2000). It is easier to achieve reliable judgments about the quantity of a well-specified intervention than about its quality, and the reliability gap between adherence and competence ratings is understandably even wider for a fidelity measure that demands judgments about molar, multicomponent intervention goals. Finally, competence ratings were especially vulnerable to rater drift: pre-study levels of acceptable reliability for individual items declined over the course of the study despite ongoing coder meetings. This did not occur for adherence. It appears that continual exposure to new clients and repeated exposure to a small group of therapists engendered a more cohesive mindset among judges regarding how extensively a therapeutic goal was implemented, but more diverse opinions about how well it was implemented.
The small-to-moderate correlations between averaged adherence and competence scores for both CBT (r = .42) and MDFT (r = .17) suggest that these constructs are related but not redundant, which can be considered a strength of the scale. These results compare favorably to adherence-competence correlations reported for discrete technique measures of drug counseling for cocaine abuse (.58; Barber et al., 2006), CBT for cocaine abuse (.96; Barber et al., 2003), and three treatments for co-occurring cocaine and alcohol problems (range .21 to .62; Carroll et al., 2000). For fidelity measures such as the TBRS-C to be maximally useful in field settings, they need to provide reliable and distinguishable data on both the kind of interventions delivered and the quality of treatment (Garland et al., 2006).
Greater variability in treatment adherence and competence was associated with clients than with therapists. The absence of therapist effects indicates that therapists did not differ consistently from one another in fidelity across clients. One caveat to this finding is that all study therapists were intensively trained and deemed competent in model implementation before treating study cases, which reduced the potential spread of fidelity scores among therapists. This is highly desirable in controlled efficacy research but unlikely to hold in real-world clinical settings. In contrast, the relatively strong client effects indicate that the TBRS-C was sensitive to client-level variations in fidelity that can reasonably be expected to emerge across a large group of adolescents and families with differences in problem severity, clinical complexity, therapist-client fit, and the like. Strong client effects also suggest that fidelity was not a therapist-centered characteristic; that is, therapists were not consistently adherent or competent across the clients on their caseloads. Thus, in this small sample of research-trained providers, fidelity was a trait of the therapist-client pairing, not the therapist.
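The therapist-versus-client comparison above rests on a variance-components logic: how much of the spread in fidelity scores lies between therapists rather than between the clients on their caseloads. The sketch below is purely illustrative, with hypothetical scores and a balanced design assumed; it uses the one-way random-effects ANOVA estimator of the therapist variance component, not the study's actual model.

```python
import numpy as np

def therapist_variance_share(caseloads):
    """Proportion of fidelity-score variance attributable to therapists,
    via the one-way random-effects ANOVA estimator (balanced design).
    `caseloads` is a list of equal-length score lists, one per therapist."""
    y = np.asarray(caseloads, dtype=float)   # shape: (therapists, clients per therapist)
    a, n = y.shape
    grand = y.mean()
    # Between-therapist and within-therapist (client-level) mean squares
    msb = n * np.sum((y.mean(axis=1) - grand) ** 2) / (a - 1)
    msw = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2) / (a * (n - 1))
    var_therapist = max(0.0, (msb - msw) / n)  # truncate negative estimates to 0
    return var_therapist / (var_therapist + msw)

# Hypothetical fidelity scores: two therapists with identical average fidelity
# but large client-to-client spread -> no therapist effect, strong client effect.
print(therapist_variance_share([[2.0, 8.0], [4.0, 6.0]]))  # → 0.0
```

A share near zero, as in the pattern this study reports, means fidelity varies with the therapist-client pairing rather than with the therapist.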
This study has an important methodological limitation with regard to competence evaluation. Judges viewed only a small number of sessions (between one and three) selected from the later phase of treatment. Judges who do not observe most or all sessions cannot track the clinical progress of a case across treatment, which hampers their ability to provide fully informed, case-specific assessments of competence (Elkin, 1999). For this reason, Waltz et al. (1993) argue that competence evaluations are better served by including fewer cases for which all sessions can be rated in sequence. Certainly the TBRS-C can be used in this manner, resources permitting. Also, the size of bivariate correlations involving competence scores may have been limited by their modest reliability; as a counterbalance, correlations between adherence and competence scores may have been inflated by common source and method variance.
The TBRS-C appears suitable for meeting fidelity assessment needs in standard clinical practice. In controlled efficacy and effectiveness research, intensive and sustained involvement by model experts in training and supervising line staff is the norm (Schoenwald & Hoagwood, 2001). In the absence of such involvement, quality assurance tools that document fidelity in real-world clinical conditions with precision and clinical sophistication are urgently needed. Whereas the current study employed resource-intensive methods in a research setting to establish initial psychometric properties, in theory the TBRS-C could be used as a supervision supplement and/or quality assurance measure in field settings. However, further development work is required to verify that observational fidelity instruments are reliable and valid when used as self-report tools by front-line clinical supervisors and perhaps clinicians themselves (Carroll, Nich, & Rounsaville, 1998; Schoenwald, Sheidow, & Letourneau, 2004). Also, the utility of the TBRS-C is limited in one important respect: it does not currently possess a "red line" score that serves as a benchmark for determining whether a given session was faithful to the treatment model. Such benchmarks can be valuable for training purposes in both research and practice contexts (Dobson & Shaw, 1993), but developing them requires a demanding process of consensus building among model experts and subsequent empirical verification that was beyond the scope of the current study.
Study results support the utility of a single global rating of therapist competence in lieu of individual ratings of several therapeutic goals. The correlation between the global competence rating and the average competence rating across goals was large in both CBT (r = .68) and MDFT (r = .79), indicating sizable overlap in the information captured by each method. Within each condition, the two methods produced an almost identical pattern of correlations with adherence and alliance variables. Other considerations favor the global rating method as well: it had acceptable interrater reliability in each condition, whereas many competence ratings for individual goals had poor reliability; it allows judges to evaluate both the occurrence (what did happen) and non-occurrence (what did not happen but should have) of therapeutic interventions; and it presents a lesser coding burden. All told, our recommendation for fidelity evaluation in field settings, and even in research settings when therapist competence is not the main focus, is the following: assess adherence to main therapeutic goals individually, but assess competence as one global dimension.
The next step in developing the TBRS-C is refining the item descriptions and scoring procedures for therapist competence in order to enhance their reliability. It is possible that interrater reliability would be stronger if the instrument were used with community therapists in routine clinical practice, who might be expected to show significantly greater variability in general therapy skills and in fidelity to manualized treatments. Another important step in examining the psychometric properties of the TBRS-C is conducting predictive validity analyses: Do adherence and competence in treatment goals predict outcome? Previous studies on the same research sample showed that adherence to discrete intervention techniques was linked to adolescent outcomes in both treatments (Hogue, Dauber, Samuolis, & Liddle, 2006; Hogue, Liddle, Dauber, & Samuolis, 2004). The TBRS-C can also be used to track changes in model fidelity over the course of training a specific cohort of therapists (e.g., Crits-Christoph et al., 1998), replicating and adapting a given model in various clinical settings (Mowbray et al., 2003), or operationalizing therapy change processes in research on mechanisms of behavioral change (Doss, 2004). But clearly, the broadest potential value of the TBRS-C lies in adapting the instrument to measure adherence and competence for other evidence-based practices in everyday care for a wide variety of mental health disorders.
Preparation of this article was supported by grants R01 DA14571 (PI: A. Hogue) and P50 DA07697 (PI: H. Liddle) from the National Institute on Drug Abuse. The authors are extremely grateful to both teams of TBRS-C coders: for MDFT, Laura Alvarez-Cienfuegos, Mary Ekwall, Maria Dolores Fatas, Karly Gilbert, Oscar Ocasio, Lisette Sanabria-Velasquez, and Miguel Vilaro-Colon from the Roberto Clemente Family Guidance Center, along with Priscilla Chinchilla; for CBT, Jayme Albin, Peter Berzins, Roland Carstedt, Sue Manin, Leslie Sadoff, and Robert Udewitz from Behavioral Associates, along with Adam Fried. The authors thank Leyla Stambaugh, Leslie Alkalay, and Crystall Matthews for research contributions to this study and Gayle Dakof and Cindy Rowe for providing feedback on the manuscript.
Aaron Hogue, Sarah Dauber, Priscilla Chinchilla, and Adam Fried, The National Center on Addiction and Substance Abuse (CASA) at Columbia University; Craig Henderson, Department of Psychology, Sam Houston State University; Jaime Inclan, Roberto Clemente Family Guidance Center, New York University School of Medicine; Robert H. Reiner, Behavioral Associates, New York City; Howard A. Liddle, Departments of Epidemiology and Public Health, and Psychology, Center for Treatment Research on Adolescent Drug Abuse, University of Miami Miller School of Medicine.