The study utilized a generalizability theory analysis of adherence and competence ratings to evaluate the number of sessions and patients needed to yield dependable scores at the patient and therapist levels. Independent judges’ ratings of supportive expressive therapy (n = 94), cognitive therapy (n = 103), and individual drug counseling (n = 98) were obtained on tapes of sessions from the NIDA Collaborative Cocaine Treatment Study. Generalizability coefficients revealed that, for all three treatments, ratings made on approximately five to 10 sessions per patient are needed to achieve sufficient dependability at the patient level. At the therapist level, four to 14 patients need to be evaluated (depending on the modality) to yield dependable scores. Many studies today sample fewer sessions and patients than this.
Therapist adherence and competence evaluations have been an important subject for psychotherapy outcome studies since treatment manuals were first developed. The term therapist competence has been variably used, denoting constructs as disparate as therapists’ self-reported increase in self-confidence or assessor rating of structured interviews or clinicians’ ability to demonstrate techniques in role play (e.g., Ekers, Lovell, & Playle, 2006; Mannix et al., 2006; Sholomskas et al., 2005). General therapist competence would also include relationship skills and other non-specific skills. Intervention competence (Kaslow, 2004) is a term more limited in its scope and more easily assessed within the framework of a specific treatment approach. This term refers to competence utilized by the therapist, in a specific type of intervention, in a specific treatment (Sharpless & Barber, 2009). In the current paper, we use the term “competence” to refer to intervention competence, not overall therapist competence.
It is also important to clarify the difference between adherence and competence, because these terms are commonly used together. Adherence refers to the extent to which a therapist delivers the prescribed techniques or interventions, and also involves not delivering components that are proscribed by the therapy. Competence refers not only to the technical accuracy with which interventions are delivered but also to the appropriateness or adequacy with which these are delivered, taking the therapeutic context into account (e.g., Waltz, Addis, Koerner, & Jacobson, 1993). Relevant aspects of the context include, but are not limited to, client variables such as degree of impairment, the particular problems manifested by a given client, the client’s life situation and stress, and factors such as stage in therapy.
From a research and a training perspective, it is important to know whether a specific therapist is competent in delivering a specific treatment. Using ratings of treatment sessions, researchers may want to certify therapists as competent in the specific treatment modality prior to initiating a clinical trial. Once a clinical trial has begun, researchers also need to evaluate whether treatments are delivered as intended and with adequate skill, and can be differentiated from comparison treatments. These applications of adherence/competence scales in research studies are important even though such ratings have often been found not to predict treatment outcome (Webb, DeRubeis, & Barber, 2010). However, the literature is inconsistent on this relationship. In cognitive therapy, adherence (DeRubeis & Feeley, 1990; Feeley, DeRubeis, & Gelfand, 1999) and competence (controlling for adherence) (Shaw et al., 1999) have been found to be related to outcome. In addition, one recent study found that cognitive therapy competence predicted next-session outcomes (Strunk, Brotman, DeRubeis, & Hollon, 2010), and another study found that competence in the delivery of supportive-expressive therapy was related to outcome (Barber, Crits-Christoph, & Luborsky, 1996). Furthermore, such adherence and competence scales continue to be used for clinical training purposes even though associations with outcome have been inconsistent (e.g., Keen & Freeston, 2008).
Despite these important potential uses of adherence and competence evaluations, it is surprising that there has been little attention to how many treatment sessions are needed to provide an adequate assessment of adherence/competence for a given therapy dyad, or how many patients are needed to provide an adequate assessment of adherence/competence for a therapist’s typical level of performance in delivering a specific psychotherapy. We are aware of only one study that has evaluated the number of sessions needed to provide an adequate assessment of competence (in this case, for cognitive behavior therapy) (Keen & Freeston, 2008). However, the evaluations in this study were not anchored to a specific treatment manual or clinical work with a specific patient population. Although the number of sessions needed for adequate assessment of adherence/competence was not provided, a recent study by Imel, Baer, Martino, Ball and Carroll (2011) indicated that the variability in adherence/competence scores within a therapist’s caseload can be large, suggesting that many patients might be needed to adequately determine a given therapist’s typical level of adherence/competence.
To answer the question of how many treatment sessions or patients are needed to create a stable (dependable) adherence and competence score, generalizability theory (Cronbach, Rajaratnam, & Gleser, 1963; Shavelson & Webb, 1991; Wasserman, Levy, & Loken, 2009) can be used. Generalizability theory tells us the adequacy with which one can generalize from the study sample to the sample’s population. Using generalizability theory, one can calculate generalizability coefficients that index the precision, or dependability, of a measure in the context of variability due to various conditions of measurement. In brief, generalizability theory extends the concept of reliability by looking at multiple sources of variability rather than only one. Generalizability coefficients can provide information, for example, on whether scores obtained from only one session are adequately stable (i.e., reliably estimate the typical level of the variable across multiple sessions) or whether more sessions are needed. Similarly, generalizability coefficients can be calculated to estimate the dependability of therapist-level scores, taking into account variability from patient to patient within a therapist. In applications where decisions are made about therapists, such as whether or not a particular therapist is competent enough in a specific treatment to participate in a randomized clinical trial evaluating the efficacy of the treatment, it is the dependability of competence scores at the therapist level of analysis that would be relevant.
Like intraclass correlation coefficients (ICCs), generalizability coefficients provide an index of adequacy of measurement ranging from 0 to 1.00. However, unlike ICCs, generalizability theory makes it possible to assess multiple sources of measurement error simultaneously in order to characterize the variance components due to each source of measurement error (Cronbach et al., 1963; Shavelson & Webb, 1991; Webb, Shavelson, & Haertel, 2006). By identifying sources of error, steps can be taken to minimize error through modification of a study design (Wasserman et al., 2009) (e.g., inclusion of more sessions or patients). Different sources of error, or facets (as they are called in generalizability theory), might include sessions, patients, therapists, and raters.
Generalizability theory is used to conduct two types of studies: (1) generalizability (G) studies that examine the characteristics of measurement within the context of the facets sampled, and (2) decision (D) studies that use the results obtained in a G study to explore and estimate how precision of a measure would change if the measurement procedures are changed (e.g., more raters, more sessions, or more therapists sampled). One important specification in any G or D study is whether the facets are fixed (only the levels of the facet included in the study are of interest) or random (levels of the facet in the study are sampled from a population or universe of interest). Further information on generalizability theory and its applications to psychotherapy research can be found in Wasserman et al. (2009) and Crits-Christoph et al. (2011) (the latter providing code for conducting such analyses).
Generalizability coefficients ≥ .80 have been described as acceptable (Cardinet, Johnson, & Pini, 2010; Wass, van der Vleuten, Shatzer, & Jones, 2001). Descriptive interpretations of various levels of ICCs have been also proposed, with ICCs in the range of .40 to .59 described as “fair,” and in the range of .60 to .74 described as “good” (Cicchetti & Sparrow, 1981). In reality, no one cut-off value for ICCs or generalizability coefficients is appropriate for all purposes. These differing interpretations relate to the purpose of a research study or clinical application of an assessment instrument. For “high stakes” decisions about individuals in which false positives would be a concern, such as professional licensing, a coefficient of at least .90 has been proposed as the standard (Downing, 2004).
To investigate how many sessions per patient and patients per therapist are used in typical psychotherapy studies that used measures of adherence/competence, a literature search was conducted. Studies in Webb, DeRubeis and Barber’s (2010) meta-analysis were inspected and supplemented with newer studies (2010 to May of 2011) using combinations of the search terms “therapy,” “outcome,” “effectiveness,” “adherence,” “competence” in PsycINFO. All included studies used a quantifiable measure of adherence and/or competence rated by independent judges on tape-recorded treatment sessions, and used treatment manuals. A total of 24 studies were included in the review (see Table I). In these 24 studies, a mean of two and a half sessions per patient (median = 2.0, SD = 2.2) and a mean of eight patients per therapist (median = 5.8, SD = 5.8) were evaluated for adherence/competence.
The aim of the present study was to conduct a generalizability theory analysis of intervention adherence and competence ratings in three different treatments for cocaine dependence: supportive expressive therapy, cognitive therapy, and individual drug counseling. Generalizability coefficients were calculated at both the patient and therapist levels. At the patient level, data are aggregated across sessions so that each patient has a single average score; at the therapist level, data are aggregated across patients so that each therapist has a single average score. After conducting a G study assessing variability in adherence/competence scores due to raters, patients, sessions, and therapists, we conducted a D study to explore the impact of increasing the number of sessions and patients on the precision of measurement at both the patient and therapist levels.
Therapist adherence and competence data for supportive expressive therapy, cognitive therapy, and individual drug counseling, drawn from the National Institute on Drug Abuse Collaborative Cocaine Treatment Study (NIDA CCTS; Crits-Christoph et al., 1999), were used for the generalizability analysis. The NIDA CCTS was a randomized multicenter clinical trial that compared different cocaine treatments. The study was approved by an Institutional Review Board and has been reported in several publications (e.g., Crits-Christoph et al., 1997, 1999).
In the NIDA CCTS, 364 patients were randomized to therapy, but only data from patients whose sessions had been rated by independent judges (n = 295) were used in the current analyses. Patients were recruited through newspaper advertisements, from substance abuse treatment centers, through referral by a friend or acquaintance, from mental health centers, and from private mental health providers. Inclusion criteria included a DSM-IV diagnosis of current (or early remission) cocaine dependence (First, Spitzer, Gibbon, & Williams, 1994) and age between 18 and 60 years (for more details of inclusion and exclusion criteria, see Crits-Christoph et al., 1999). Patients in supportive expressive therapy (n = 94) were 33.7 years old (SD = 6.2), male (n = 75 [80%]), primarily Caucasian (n = 56 [60%]), employed (n = 59 [63%]), living alone (n = 33 [35%]), and had used cocaine in the past 30 days (M = 9.7, SD = 6.9). Patients in cognitive therapy (n = 103) were 35.0 years old (SD = 6.0), male (n = 86 [83%]), primarily Caucasian (n = 59 [57%]), employed (n = 63 [61%]), living alone (n = 44 [43%]), and had used cocaine in the past 30 days (M = 9.8, SD = 8.2). Patients in individual drug counseling (n = 98) were 33.3 years old (SD = 6.7), male (n = 71 [72%]), primarily Caucasian (n = 58 [59%]), employed (n = 61 [62%]), living alone (n = 43 [44%]), and had used cocaine in the past 30 days (M = 10.4, SD = 7.6).
The 39 therapists were volunteers recruited from newspaper announcements or from the staff of the study sites. All therapists were trained in their respective treatment modality during an intensive training phase. The intensive training consisted of reading the treatment manual, participating in a 2-day workshop, and treating four training cases under supervision. Those therapists who demonstrated their ability in delivering treatment as intended and in a competent manner during training were invited to participate in the main trial. Therapists in supportive expressive therapy (n = 12) were 38.9 years old (SD = 5.1), male (n = 8 [67%]), primarily Caucasian (n = 11 [92%]), had an MD, PhD, PsyD or EdD (n = 9 [75%]) or an MSW, MA, BA, AA or RN (n = 3 [25%]), had on average 11.7 years of clinical experience (SD = 6.1), and had 8.7 years of substance use clinical experience (SD = 7.4). Therapists in cognitive therapy (n = 15) were 40.0 years old (SD = 6.3), male (n = 12 [80%]), primarily Caucasian (n = 14 [93%]), had an MD, PhD, PsyD or EdD (n = 12 [80%]) or an MSW, MA, BA, AA or RN (n = 3 [20%]), had on average 13.4 years of clinical experience (SD = 8.4), and had 6.6 years of substance use clinical experience (SD = 6.2). Therapists in individual drug counseling (n = 12) were 40.1 years old (SD = 6.4), male (n = 4 [33%]), primarily Caucasian (n = 9 [75%]), had an MSW, MA, BA, AA or RN (n = 12 [100%]), had on average 13.8 years of clinical experience (SD = 8.5), and had 10.4 years of substance use clinical experience (SD = 6.5).
The three individual therapies were supportive-expressive therapy (SE) based on Luborsky’s psychodynamic treatment manual (Luborsky, 1984) and a psychodynamic manual for cocaine abuse (Mark & Luborsky, 1992); cognitive therapy (CT) based on Beck’s cognitive model for substance abuse (Beck et al., 1993); and individual drug counseling (IDC) (Mercer & Woody, 1992) based on the 12-step model of addiction. The active phase of the treatments lasted 6 months and consisted of two sessions every week for the first 3 months, and of one session every week for the last 3 months. A more thorough description of the treatments is provided in Crits-Christoph et al. (1997, 1999).
Independent judges were used to evaluate whether each specific treatment was delivered in a competent manner. Judges rated only sessions within their own field of expertise (e.g., SE experts rated SE therapists’ skills). The independent judges (SE: n = 3; CT: n = 2; IDC: n = 2) were not otherwise involved in the research study, but were experts in their respective treatments. All judges in each treatment modality rated the same sessions from each patient. Two audiotapes were rated for each patient, randomly selected: one tape from sessions 2 to 11 and one tape from session 12 to the end of therapy.
Overall, the independent judges rated 148 tapes of SE therapy, 192 tapes of CT, and 181 tapes of IDC. The average number of sessions rated per patient by the independent judges was 1.6 (SD = .6) for SE, 1.9 (SD = 0.4) for CT, and 1.8 (SD = 0.5) for IDC. The average number of patients rated per therapist was 7.8 (SD = 3.5) for SE therapy, 6.9 (SD = 2.3) for CT, and 8.2 (SD = 3.9) for IDC.
This 44-item scale was used to measure adherence and competence in the delivery of SE for cocaine dependence. The ratings for each item ranged from 1 (not at all) to 7 (very much). In addition to ratings of frequency of use of techniques (adherence) and quality of use of techniques (competence), ratings of the appropriateness of using different interventions were also made. An intervention was judged as very much appropriate if the “ideal” frequency of that intervention was used in the context of a specific session. If adherence was rated as not at all for a given item, the competence rating for that item was assigned the same number as the appropriateness rating (thus competence could be high even if frequency was low). Because of high correlations between appropriateness and quality (r = .94), the scales were summarized into one overall competence scale. The scale contains items that focus on the therapist’s ability to establish and maintain a supportive relationship (13 items) and items that reflect the therapist’s focus on interventions to explore maladaptive relationship patterns (31 items). In the present study, the internal consistency was excellent for both adherence and competence when the patient was the unit of analysis (α = .93 and α = .98), and when the therapist was the unit of analysis (α = .95 and α = .99). The interjudge reliability for the current study sample was shown to be good for the adherence scale (ICC [2,2] = .84), but weaker for the competence scale (ICC [2,2] = .68) (Barber et al., 2008). The ACS-SEC adherence scale has also been found to discriminate SE from CT and IDC, but this was not the case for the competence scale (Barber, Foltz, Crits-Christoph, & Chittams, 2004).
This 21-item scale was developed for the NIDA CCTS and measures adherence and competence in the delivery of CT for cocaine dependence. Like the SE scale, the CTACS includes ratings of frequency, quality, and appropriateness of interventions. The scale’s items are rated separately on a 0 (low) to 6 (high) Likert-type scale. When an intervention did not occur but should have, adherence was rated as zero, as were appropriateness and quality. However, if the absence of an intervention was appropriate, the adherence scale was rated as zero but the appropriateness scale was rated higher. The appropriateness and quality scales were highly correlated (r > .90; Barber et al., 2003) and summarized into one overall competence scale. The content of the scale included cognitive therapy structure, development of a collaborative relationship, case conceptualization, and cognitive and behavioral techniques. In the present study, the internal consistency was good to excellent for both adherence and competence when the patient was the unit of analysis (judges: α = .89 and α = .96) and when the therapist was the unit of analysis (judges: α = .96 and α = .98). The interrater reliability for the independent judges in the present study’s CT sample (n = 92) was shown to be acceptable (ICC = .67 for adherence, ICC = .73 for competence) (Barber et al., 2003). The CTACS as rated by independent judges also discriminated CT from SE and IDC in the NIDA CCTS study (Barber et al., 2004).
This scale measured adherence and competence of IDC in the current study. Each item was rated separately on a scale from 1 (low) to 7 (high). Originally, the scale contained 38 items, but for the current study 34 items of main techniques were used to create an overall score (see Barber et al., 2006). For the current study, the internal consistency was good to excellent for both adherence and competence when the patient was the unit of analysis (judges: α = .84 and α = .92), and when the therapist was the unit for the analysis (judges: α = .97 and α = .97). The interjudge ICC was acceptable for both adherence and competence (Barber, Mercer et al., 1996). The scale discriminated IDC from CT and SE therapy in the NIDA CCTS study (Barber et al., 2004; Barber, Mercer et al., 1996).
Dependability of measurement can be separately examined for each of the different facets in a study. Here, we calculated generalizability coefficients for both patient and therapist levels. Estimations of generalizability coefficients were based on a random effects model including terms for sessions, patients, therapists and raters. The mixed models were calculated in the software PASW Statistics 18 (SPSS). These mixed models included additional terms to accommodate the nesting of the repeated measures across sessions within patients, patients nested within therapist, and interactions among these terms. We calculated the generalizability coefficient (GC) according to formulas given on pages 28 and 29 in Webb et al. (2006), but with the addition of the variance components due to rater. On the patient-level we used the formula:
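The formula itself did not survive reproduction here. A plausible reconstruction, consistent with the variance components defined in the text and with the Webb et al. (2006) formulas extended by a rater facet, is:

```latex
GC_p = \frac{\sigma^2_{t} + \sigma^2_{p:t}}
            {\sigma^2_{t} + \sigma^2_{p:t}
             + \dfrac{\sigma^2_{ts} + \sigma^2_{p:ts}}{n_s}
             + \dfrac{\sigma^2_{tr} + \sigma^2_{pr}}{n_r}
             + \dfrac{\sigma^2_{(p:t)sr,e}}{n_s\, n_r}}
```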
where GCp is the generalizability coefficient at the patient level, σ²t is the variance attributable to therapists, σ²p:t is the variance attributable to patients nested within therapists, ns is the number of sessions, nr is the number of raters, σ²ts is the variance attributable to the therapist by session interaction, σ²p:ts is the variance attributable to patients nested within therapists by session, σ²tr is the variance attributable to the therapist by rater interaction, σ²pr is the variance attributable to the patient by rater interaction, and σ²(p:t)sr,e is the residual error variance (i.e., variance not attributed to other factors in the design).
The therapist-level generalizability coefficients (GC) were calculated as shown in the following formula:
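This formula was likewise lost in reproduction. A plausible reconstruction, using the variance components defined in the text and dividing each error component by the number of observations averaged over, is:

```latex
GC_t = \frac{\sigma^2_{t}}
            {\sigma^2_{t}
             + \dfrac{\sigma^2_{p:t}}{n_p}
             + \dfrac{\sigma^2_{ts}}{n_s}
             + \dfrac{\sigma^2_{p:ts}}{n_p\, n_s}
             + \dfrac{\sigma^2_{tr}}{n_r}
             + \dfrac{\sigma^2_{pr}}{n_p\, n_r}
             + \dfrac{\sigma^2_{(p:t)sr,e}}{n_p\, n_s\, n_r}}
```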
where GCt is the generalizability coefficient at the therapist level, σ²t is the variance attributable to therapists, σ²ts is the variance attributable to the therapist by session interaction, σ²p:t is the variance attributable to patients nested within therapists, ns is the number of sessions, np is the number of patients, nr is the number of raters, σ²p:ts is the variance attributable to patients nested within therapists by session, σ²tr is the variance attributable to the therapist by rater interaction, σ²pr is the variance attributable to the patient by rater interaction, and σ²(p:t)sr,e is the residual error variance (i.e., variance not attributed to other factors in the design).
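As a concrete sketch, both coefficients can be computed directly from estimated variance components. The Python below assumes variance components have already been estimated (e.g., from a mixed model); the component values in the example are illustrative, not the study's estimates:

```python
def gc_patient(v, n_s, n_r):
    """Patient-level generalizability coefficient.

    v: dict of variance components (t, p_t, ts, p_ts, tr, pr, resid);
    n_s: sessions rated per patient; n_r: raters per session.
    """
    universe = v["t"] + v["p_t"]              # universe-score (true) variance
    error = ((v["ts"] + v["p_ts"]) / n_s      # session-linked error, averaged over sessions
             + (v["tr"] + v["pr"]) / n_r      # rater-linked error, averaged over raters
             + v["resid"] / (n_s * n_r))      # residual, averaged over both
    return universe / (universe + error)

def gc_therapist(v, n_p, n_s, n_r):
    """Therapist-level generalizability coefficient (n_p = patients per therapist)."""
    error = (v["p_t"] / n_p
             + v["ts"] / n_s
             + v["p_ts"] / (n_p * n_s)
             + v["tr"] / n_r
             + v["pr"] / (n_p * n_r)
             + v["resid"] / (n_p * n_s * n_r))
    return v["t"] / (v["t"] + error)

# Hypothetical variance components, for illustration only
v = {"t": 0.10, "p_t": 0.30, "ts": 0.05, "p_ts": 0.40,
     "tr": 0.02, "pr": 0.03, "resid": 0.50}
```

With these made-up components and two raters, a single rated session yields gc_patient(v, 1, 2) of roughly .36, while five sessions raise it to roughly .71, illustrating why single-session ratings are rarely dependable.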
Generalizability coefficients indexing patient-level dependability of scores were calculated using the actual numbers of sessions rated in the present study. With this number of sessions (ranging from 1.6 to 1.9, on average, for the three treatments), all conditions had inadequate GCs (SE: adherence GC = .44, competence GC = .60; CT: adherence GC = .64, competence GC = .66; IDC: adherence GC = .48, competence GC = .59).
Generalizability coefficients were also calculated to index therapist-level dependability using the actual number of patients included in the current study. All coefficients were marginally acceptable for the CT and IDC conditions (CT judges: adherence GC = .77 and competence GC = .75; IDC judges: adherence GC = .71 and competence GC = .77). For SE therapy, the coefficients were lower (SE judges: adherence GC = .58, competence GC = .65).
Variance components (expressed as percent of total variance) for each of the terms in the statistical models are presented in Table II. As can be seen, beyond the residual variation, the largest source of variation that impacts both patient and therapist GCs was consistently the Patient by Session interaction term, reflecting the fact that adherence and competence ratings varied more over sessions for some patients than for others.
Having completed a “G study” that examined sources of variability and precision of measurement using the conditions inherent in the data collection (i.e., the actual numbers of sessions and patients), we then proceeded to conduct a “D study” that estimated generalizability coefficients using hypothetical numbers of sessions and patients. Generalizability coefficients were calculated for various numbers of sessions (ranging from one to 14) per patient at the patient level. The numbers of patients per therapist in these calculations were the actual numbers used in the study (the number of patients per therapist did not influence the GC at the patient level). Results indicated that generalizability coefficients increased steadily with increasing numbers of sessions per patient until reaching an asymptote. For adherence, the largest increment occurred over the first several sessions, and after five to 10 sessions the curve flattens out (Figure 1). The same tendency, even more pronounced, was observed for competence ratings, for which the curve flattens out after five sessions (Figure 2).
The number of sessions per patient needed to achieve a GC ≥ .80 for adherence ratings at the patient level varied depending on the treatment modality. For IDC, 10 sessions were estimated to be needed, while only five sessions were estimated to be needed for CT; for SE, nine sessions were needed. Similar numbers of sessions were needed across the three treatments to achieve a patient-level GC ≥ .80 on competence ratings (SE: six sessions; CT: five sessions; IDC: seven sessions).
Generalizability coefficients were calculated for various numbers of sessions (1–6 sessions) together with various numbers of patients (1–14 patients) at the therapist level. Results showed that most of the increase in the GC occurs across the first few sessions or patients (Tables III and IV). To achieve a GC ≥ .80 at the therapist level, the numbers of patients per therapist needed for dependable adherence and competence ratings, respectively, were: SE, 13 and > 14; CT, 4 and 5; IDC, 8 and 7. If fewer than six sessions per patient are rated, even more patients per therapist are needed to achieve a therapist-level GC ≥ .80.
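A D study of this kind amounts to sweeping hypothetical n values through the therapist-level formula until the coefficient crosses the target. A minimal sketch, again with illustrative variance components rather than the study's estimates:

```python
def gc_therapist(v, n_p, n_s, n_r):
    # Therapist-level GC: therapist variance over itself plus error terms,
    # each error component divided by the number of observations averaged over.
    error = (v["p_t"] / n_p
             + v["ts"] / n_s
             + v["p_ts"] / (n_p * n_s)
             + v["tr"] / n_r
             + v["pr"] / (n_p * n_r)
             + v["resid"] / (n_p * n_s * n_r))
    return v["t"] / (v["t"] + error)

# Hypothetical variance components, for illustration only
v = {"t": 0.10, "p_t": 0.10, "ts": 0.03, "p_ts": 0.30,
     "tr": 0.01, "pr": 0.02, "resid": 0.40}

# Smallest caseload giving GC >= .80 when 6 sessions and 2 raters are rated
needed = next(n_p for n_p in range(1, 50)
              if gc_therapist(v, n_p, n_s=6, n_r=2) >= 0.80)
```

Under these made-up components, needed works out to 13 patients per therapist, which is the same order of magnitude as the SE estimate above; with different components the answer shifts, which is exactly the point of running the D study on one's own data.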
To answer the question of how many sessions per patient, and patients per therapist, are needed to achieve dependable adherence and competence ratings in the treatment of cocaine dependence, a generalizability theory analysis was performed in a large sample of cocaine-dependent patients randomized to receive supportive expressive therapy, cognitive therapy, or individual drug counseling. The most important conclusion was that achieving dependable adherence and competence scores at the patient level requires more sessions (for adherence: five in CT, nine in SE, and 10 in IDC; for competence: five to seven across the treatments) than the two and a half typically sampled in studies of other psychotherapies. At the therapist level, the number of patients per therapist sufficient to achieve good (≥ .80) dependability of measurement varied by treatment modality, with CT needing the fewest (four or five) and SE the most (13 or more).
Using a more liberal .70 or greater cut-off for adequate dependability of measurement at the patient level, five sessions for SE and IDC, and three sessions for CT, would be needed for adherence ratings, and three or four sessions for competence ratings. Similarly, at the therapist level, fewer patients per therapist (three for CT; nine or 10 for SE; five for IDC) are needed to achieve a .70 level of dependability for adherence/competence ratings (assuming six sessions are rated).
The literature review presented in the introduction of this article showed that, across 24 studies, there were typically two and a half sessions per patient and eight patients per therapist rated for adherence/competence. However, only three of these studies evaluated five or more sessions per patient (Huppert, Barlow, Gorman, Shear, & Woods, 2006; Ogrodniczuk & Piper, 1999; Shaw et al., 1999), and only 11 studies used seven or more patients per therapist (Barber, Crits-Christoph, & Luborsky, 1996; Barber et al., 2006, 2008; Carroll, Nich, & Rounsaville, 1997; Feeley et al., 1999; Gaston, Thompson, Gallagher, Cournoyer, & Gagnon, 1998; Hogue et al., 2008; Loeb et al., 2005; Luborsky, McLellan, Woody, O’Brien, & Auerbach, 1985; Newman et al., 2011; Ogrodniczuk et al., 1999). Thus, assuming the results found here are generalizable to other scales, treatments, and patient populations, many studies in the literature probably evaluated an inadequate number of sessions and/or patients per therapist to create highly dependable adherence/competence scores at the patient and therapist levels, respectively. Inadequate dependability of adherence/competence scores may in part explain why many studies have failed to find associations between adherence/competence ratings and treatment outcome (Webb et al., 2010).
In the only previous study that reports the number of sessions needed to evaluate therapist competence, it was estimated that 17 videotaped sessions are needed to achieve a .80 GC in regard to cognitive therapy competence (Keen & Freeston, 2008). However, in addition to not focusing on a specific treatment manual or patient population, this study allowed therapists to choose two sessions to be evaluated, which may have created a positive bias in competence ratings compared to our study, in which therapists did not choose the sessions to be rated. Moreover, the design of the Keen and Freeston (2008) study did not allow for examination of patient-level and therapist-level GCs. These differences aside, it is relevant to note that Keen and Freeston (2008) also concluded that there needs to be a marked increase in the number of samples of clinical work assessed to be able to make reliable judgments of therapist competence in delivering cognitive behavior therapy.
For clinical dissemination of an evidence-based treatment approach, financial constraints make it unlikely that therapists would be evaluated on, for example, as many as five sessions for each of seven patients (35 rated sessions in total) to arrive at a decision as to whether a therapist is adequately skilled in a treatment approach. The two dissemination studies we found that measured the variables of interest evaluated two and three sessions per patient, respectively (Glisson et al., 2010; Weisz et al., 2009). Furthermore, our results were in line with the Imel et al. (2011) study, which suggested that variability in adherence/competence scores within a therapist’s caseload can be large. The larger the variation in adherence and competence within a therapist, the more measurements are needed to obtain reliable scores. The current results therefore suggest caution in interpreting previous efforts at measuring treatment adherence/competence where the number of sessions or patients was low, and provide some suggestions for future studies, or clinical applications, in which therapist-level adherence/competence evaluations are planned. Our results also suggest that for both clinical trials and dissemination studies, investigators should evaluate the dependability of their adherence/competence ratings to ensure that these assessments are adequate to draw conclusions about adherence/competence at the level of the dyad or therapist. In the certification of therapists during dissemination of an evidence-based treatment, if higher levels of dependability (e.g., ≥ .80) cannot be achieved because of financial constraints on rating sessions/patients, then it should be acknowledged that the rates of false positives (certifying a therapist as competent when he or she is not) and false negatives (not certifying a therapist as competent when he or she is) may be high.
Although it is important to have dependable measurements of adherence/competence, other factors may come into play when choosing which sessions are selected in a given study. For testing certain hypotheses, it may be more important to sample early-in-treatment sessions rather than later-in-treatment sessions. Adherence/competence in late sessions, for example, might be influenced by the degree of progress made by patients thus far in therapy, and therefore if the goal of a study is to predict outcome from adherence/competence ratings, late sessions might need to be avoided (so as not to introduce a confound due to patient progress influencing adherence/competence). However, regardless of which phase of therapy is targeted in research, dependability of measurement at the patient level, or therapist level, needs to be adequately addressed (e.g., by sampling multiple early sessions).
The issue of dependability of measurement is of course broader than the focus on adherence/competence in the current manuscript. Other psychotherapy process variables probably suffer from the same lack of adequate dependability at the patient level because of patient-by-session interaction effects, and lack of adequate dependability at the therapist level because of inadequate numbers of patients per therapist included in a study. Outcome measures, too, may be less than ideally dependable. For example, investigators typically measure symptoms on a given day, yet such symptoms are likely to fluctuate across different days within a week. Assessing outcome over multiple days and calculating an aggregate outcome across days may yield greater precision of outcome measurement. Improved precision of measurement will allow for greater accuracy in the estimation of effect sizes (e.g., correlations of process with outcome) and greater ability to detect small to moderate effects that would otherwise be rendered non-significant because of attenuation due to measurement error in the independent and/or dependent measures.
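The attenuation referred to here is the classical (Spearman) relation between an observed correlation and the correlation between the underlying true scores. Writing $r_{xx}$ and $r_{yy}$ for the reliabilities of the process and outcome measures:

```latex
r_{xy}^{\mathrm{obs}} \;=\; \rho_{xy}\,\sqrt{r_{xx}\, r_{yy}}
```

For instance, if each measure has reliability .60, a true correlation of .40 is attenuated to an observed value of $.40 \times \sqrt{.60 \times .60} = .24$, which a modestly sized study could easily fail to detect.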
Several limitations of the current study should be mentioned. Other variables besides the number of sessions and patients might have affected the generalizability coefficients. For example, the type of treatment, or the specific scale/construct for each treatment, may have influenced the level of generalizability coefficients found here. In the current study, scales for supportive expressive therapy had lower dependability than those for the other modalities. Although our study does not directly address whether the problem is the therapy or the particular scale, it should be noted that extensive development work was conducted on several of the scales (Barber & Crits-Christoph, 1996; Barber et al., 1997, 2003). Moreover, interjudge reliability and internal consistency reliability for the SE adherence and competence scales were generally very good, with the exception that interjudge reliability on the SE competence scale was marginal (.68). Nevertheless, it is possible that different scales for measuring adherence/competence for SE, CT, and IDC therapies for cocaine dependence might yield dependable scores using fewer treatment sessions and patients than we estimate are needed based on the current data. Assuming, however, that the low dependability for SE therapy is not a function of inferior adherence/competence scales, it may be that more sessions and patients are needed to achieve dependable scores for this treatment because of the more inferential nature of the key clinical concepts in this form of therapy. An adequate assessment of a therapist's performance in a psychodynamic therapy may require evaluation of the therapist's handling of a range of patient clinical material (e.g., defenses, resistances, repetitive interpersonal themes, transference phenomena) that becomes evident only over many sessions and across different types of patients.
Thus, researchers and clinical trainers should be aware of the possibility that adequate judgments of therapist adherence/competence in psychodynamic therapy may require sampling many sessions and patients, or may require further delineation of the construct and further scale development.
Another shortcoming of the current study was that the judges were not randomly assigned to the different treatment modalities, and the ratings were therefore not free from potential selection bias. Raters' ability to be objective can be influenced by their characteristics, perceptions, and attitudes (e.g., Fiske, 1977). The present study limited this impact through the use of treatment manuals, training of the raters, a protocol for rater procedures, and ongoing calibration among the raters. A further limitation is that adherence and intervention competence may be particularly difficult to evaluate in a substance-dependent population because of the high frequency of disruptive life events that interfere with implementation of specific interventions. The generalizability of the findings to other types of treatments, other patient populations, and other measures of adherence and intervention competence is unknown.
In summary, the present results suggest that researchers and clinical trainers need to give greater attention to the dependability of judgments about therapist adherence/competence. Dependability may vary depending on the specific treatment, and there may be a need to evaluate a larger number of sessions and patients than typically is done.
This study used data from the National Institute on Drug Abuse Collaborative Cocaine Treatment Study (NIDA CCTS), Philadelphia, USA. The study was supported by the Psychotherapy Research Center, Philadelphia, USA; the Department of Psychology, Umeå, Sweden; and the Swedish Council for Working Life and Social Research, Stockholm, Sweden.