To answer the question of how many sessions per patient, and patients per therapist, are needed to achieve dependable adherence and competence ratings in the treatment of cocaine dependence, a generalizability theory analysis was performed in a large sample of cocaine-dependent patients randomized to receive supportive expressive therapy (SE), cognitive therapy (CT), or individual drug counseling (IDC). The most important conclusion was that more sessions (for adherence: five in CT; nine in SE; 10 in IDC; for competence: five to seven across the treatments) than are typically sampled (two and a half) in most studies of other psychotherapies are needed to achieve dependable generalizability coefficients at the patient level when measuring therapists' adherence and competence. At the therapist level, the number of patients per therapist sufficient to achieve good (≥.80) dependability of measurement varied by treatment modality, with CT needing the fewest (four or five) and SE the most (13 or 14).
Using a more liberal .70 or greater cut-off for adequate dependability of measurement at the patient level, five sessions for SE and IDC, and three sessions for CT, would be needed for adherence ratings, and three or four sessions for competence ratings. Similarly, at the therapist level, fewer patients per therapist (three for CT; nine or 10 for SE; five for IDC) are needed to achieve a .70 level of dependability for adherence/competence ratings (assuming six sessions are rated).
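The dependence of these session counts on the chosen dependability criterion follows the standard generalizability theory decision-study projection: averaging ratings over more sessions shrinks the session-linked error variance, raising the patient-level generalizability coefficient (GC). The sketch below illustrates this logic with purely hypothetical variance components; the values are not the estimates from this study.

```python
def projected_gc(var_patient, var_error, n_sessions):
    """Decision-study projection of a patient-level generalizability
    coefficient: averaging over n_sessions divides the session-linked
    error variance by n_sessions."""
    return var_patient / (var_patient + var_error / n_sessions)

def sessions_needed(var_patient, var_error, target=0.80):
    """Smallest number of rated sessions whose projected GC
    reaches the target dependability."""
    n = 1
    while projected_gc(var_patient, var_error, n) < target:
        n += 1
    return n

# Illustrative variance components (NOT the study's estimates):
# patient variance 1.0, session-linked error variance 2.0.
print(sessions_needed(1.0, 2.0, target=0.70))  # → 5
print(sessions_needed(1.0, 2.0, target=0.80))  # → 8
```

Under these illustrative components, moving from a .70 to a .80 criterion substantially increases the number of sessions that must be rated, mirroring the contrast between the two cut-offs reported above.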
The literature review presented in the introduction of this article showed that, across 24 studies, adherence/competence was typically rated for two and a half sessions per patient and eight patients per therapist. However, only three of these studies evaluated five or more sessions per patient (Huppert, Barlow, Gorman, Shear, & Woods, 2006; Ogrodniczuk & Piper, 1999; Shaw et al., 1999), and only 11 studies used seven or more patients per therapist (Barber, Crits-Christoph, & Luborsky, 1996; Barber et al., 2006; Carroll, Nich, & Rounsaville, 1997; Feeley et al., 1999; Gaston, Thompson, Gallagher, Cournoyer, & Gagnon, 1998; Hogue et al., 2008; Loeb et al., 2005; Luborsky, McLellan, Woody, O'Brien, & Auerbach, 1985; Newman et al., 2011; Ogrodniczuk et al., 1999). Thus, assuming the results found here generalize to other scales, treatments, and patient populations, many studies in the literature probably evaluated an inadequate number of sessions and/or patients per therapist to produce highly dependable adherence/competence scores at the patient and therapist levels, respectively. Inadequate dependability of adherence/competence scores may in part explain why many studies have failed to find associations between adherence/competence ratings and treatment outcome (Webb et al., 2010).
In the only previous study that reported the number of sessions needed to evaluate therapist competence, it was estimated that 17 videotaped sessions would be needed to achieve a .80 GC for cognitive therapy competence (Keen & Freeston, 2008). However, in addition to not focusing on a specific treatment manual or patient population, that study allowed therapists to choose the two sessions to be evaluated, which may have created a positive bias in competence ratings compared to our study, in which therapists did not choose the sessions to be rated. Moreover, the design of the Keen and Freeston (2008) study did not allow for examination of patient-level and therapist-level GCs. These differences aside, it is relevant to note that Keen and Freeston (2008) also concluded that a marked increase in the number of samples of clinical work assessed is needed to make reliable judgments of therapist competence in delivering cognitive behavior therapy.
For clinical dissemination of an evidence-based treatment approach, financial constraints make it unlikely that therapists would be evaluated on, for example, as many as five sessions for each of seven patients (35 sessions rated in total) to decide whether or not a therapist is adequately skilled in a treatment approach. The two dissemination studies we found with variables of interest investigated two and three sessions per patient, respectively (Glisson et al., 2010; Weisz et al., 2009). Furthermore, our results were in line with the Imel et al. (2011) study, which suggested that variability in adherence/competence scores within a therapist's caseload can be large. The larger the variation in adherence and competence within a therapist, the more measurements are needed to obtain reliable scores. The current results therefore suggest caution in interpreting previous efforts at measuring treatment adherence/competence where the number of sessions or patients was low, and they provide some suggestions for future studies, or clinical applications, in which therapist-level adherence/competence evaluations are planned. Our results also suggest that for both clinical trials and dissemination studies, investigators should evaluate the dependability of their adherence/competence ratings to ensure that these assessments are adequate to draw conclusions about adherence/competence at the level of the dyad or therapist. In the certification of therapists during dissemination of an evidence-based treatment, if higher levels of dependability (e.g., ≥.80) cannot be achieved because of financial constraints on the number of sessions/patients rated, then it should be acknowledged that rates of false positives (certifying a therapist as competent when they are not) and false negatives (not certifying a therapist as competent when they are) may be high.
Although it is important to have dependable measurements of adherence/competence, other factors may come into play when choosing which sessions to rate in a given study. For testing certain hypotheses, it may be more important to sample early-in-treatment sessions rather than later-in-treatment sessions. Adherence/competence in late sessions, for example, might be influenced by the degree of progress patients have made thus far in therapy, and therefore if the goal of a study is to predict outcome from adherence/competence ratings, late sessions might need to be avoided (so as not to introduce a confound due to patient progress influencing adherence/competence). However, regardless of which phase of therapy is targeted in research, dependability of measurement at the patient level, or therapist level, needs to be adequately addressed (e.g., by sampling multiple early sessions).
The issue of dependability of measurement is, of course, broader than the focus on adherence/competence in the current manuscript. Other psychotherapy process variables probably suffer from the same lack of adequate dependability at the patient level because of patient-by-session interaction effects, and at the therapist level because of inadequate numbers of patients per therapist included in a study. Outcome measures also may be less than ideally dependable. For example, investigators typically measure symptoms on a given day. Such symptoms, however, are likely to fluctuate across different days within a week. Assessment of outcome over multiple days and calculation of an aggregate outcome across days may yield greater precision of outcome measurement. Improvement in the precision of measurement will allow for greater accuracy in the estimation of effect sizes (e.g., correlations of process with outcome) and greater ability to detect small to moderate effects that would otherwise be rendered non-significant because of attenuation due to measurement error in the independent and/or dependent measures.
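The aggregation and attenuation arguments above can be made concrete with two classical psychometric formulas: the Spearman-Brown projection for the reliability of a mean of k parallel measurements, and the attenuation formula relating an observed correlation to the true correlation and the reliabilities of the two measures. The numbers below are purely illustrative and are not estimates from this study.

```python
import math

def aggregate_reliability(rel_single, k):
    """Spearman-Brown projection: reliability of the mean of k
    parallel measurements (e.g., symptom ratings on k days)."""
    return k * rel_single / (1 + (k - 1) * rel_single)

def attenuated_correlation(r_true, rel_x, rel_y):
    """Classical attenuation formula: the correlation observable
    between two fallible measures, given their reliabilities."""
    return r_true * math.sqrt(rel_x * rel_y)

# Illustrative values (NOT estimates from this study): a true
# process-outcome correlation of .30, process reliability .70,
# and single-day outcome reliability .50.
r_single_day = attenuated_correlation(0.30, 0.70, aggregate_reliability(0.50, 1))
r_week_mean = attenuated_correlation(0.30, 0.70, aggregate_reliability(0.50, 7))
print(round(r_single_day, 3), round(r_week_mean, 3))  # → 0.177 0.235
```

Under these hypothetical values, averaging the outcome over seven days raises its reliability from .50 to .875 and recovers a noticeably larger share of the true process-outcome correlation, illustrating how unreliability in either measure attenuates observed effects toward non-significance.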
Several limitations of the current study should be mentioned. Other variables besides the number of sessions and patients might have affected the generalizability coefficients. For example, the type of treatment, or the specific scale/construct for each treatment, may have influenced the level of generalizability coefficients found here. In the current study, scales for supportive expressive therapy had lower dependability than those for the other modalities. Although our study does not directly address whether the problem is the therapy or the particular scale, it should be noted that extensive development work was conducted on several of the scales (Barber & Crits-Christoph, 1996; Barber et al., 1997). Moreover, interjudge reliability and internal consistency reliability for the SE adherence and competence scales were generally very good, with the exception that interjudge reliability on the SE competence scale was marginal (.68). Nevertheless, it is possible that different scales for measuring adherence/competence for SE, CT, and IDC therapies for cocaine dependence might yield more dependable scores using fewer treatment sessions and patients than we estimate are needed based on the current data. Assuming, however, that the low dependability for SE therapy is not a function of inferior adherence/competence scales, it may be that more sessions and patients are needed to create dependable scores for this treatment because of the more inferential nature of the key clinical concepts in this form of therapy. An adequate assessment of a therapist's performance in a psychodynamic therapy may require evaluation of the therapist's handling of a range of patient clinical material (e.g., defenses, resistances, repetitive interpersonal themes, transference phenomena) that only becomes evident over many sessions and different types of patients. Thus, researchers and clinical trainers should be aware of the possibility that adequate judgments of therapist adherence/competence in psychodynamic therapy may require sampling many sessions and patients, or may require further delineation of the construct and further scale development.
Another shortcoming of the current study was that the judges were not randomized to the different treatment modalities, and their assignments were therefore not free from selection bias. The raters' ability to be objective can be influenced by their characteristics, perceptions, and attitudes (e.g., Fiske, 1977). The present study limited this impact through the use of treatment manuals, training of the raters, a protocol for rater procedures, and ongoing calibration among the raters. Another limitation of the current study is that adherence and intervention competence may be particularly difficult to evaluate in a substance-dependent population because of a high frequency of disruptive life events that interfere with the implementation of specific interventions. The generalizability of the findings to other types of treatments, other patient populations, and other measures of adherence and intervention competence is unknown.
In summary, the present results suggest that researchers and clinical trainers need to give greater attention to the dependability of judgments about therapist adherence/competence. Dependability may vary depending on the specific treatment, and there may be a need to evaluate a larger number of sessions and patients than is typically done.