The efficacy of cognitive therapy (CT) for depression has been well-established. Measures of the adequacy of therapists’ delivery of treatment are critical to facilitating therapist training and treatment dissemination. While some studies have found an association between CT competence and outcome, research has yet to address whether competence ratings predict subsequent outcomes.
In a sample of 60 moderately to severely depressed outpatients from a clinical trial, we examined competence ratings (using the Cognitive Therapy Scale) as a predictor of subsequent symptom change.
Competence ratings predicted session-to-session symptom change early in treatment. In analyses focused on predicting symptom change following four early sessions through the end of 16 weeks of treatment, competence was a significant predictor of evaluator-rated end of treatment depressive symptom severity, and was predictive of self-reported symptom severity at the level of a non-significant trend. To investigate whether competence is more important to clients with specific complicating features, we examined four patient characteristics as potential moderators of the competence-outcome relation. Compared to patients without these characteristics, competence was more highly related to subsequent outcome for patients with higher anxiety, an earlier age of onset, and (at a trend level) patients with a chronic form of depression (chronic depression or dysthymia). Competence ratings were not more predictive of subsequent outcomes among patients who met (vs. did not meet) criteria for a personality disorder (i.e., among personality disorders represented in the clinical trial).
These findings provide support for the potential utility of CT competence ratings in applied settings.
Assessments of therapist competence are essential to both psychotherapy research and practice (Barber, Sharpless, Klostermann, & McCarthy, 2007). In treatment outcome research, such assessments are potentially critical in interpreting the results of clinical trials (Jacobson & Hollon, 1996; Kingdon, Tyrer, Seivewright, Ferguson, & Murphy, 1996). In efforts to disseminate treatments to clinical practice, the construct of therapist competence is seen as vital (Roth & Pilling, 2007). Among agencies that accredit therapists in a particular treatment approach, assessments of therapist competence are often used as a major component of the accreditation process (for an example, see Academy of Cognitive Therapy, n.d.). Thus, therapist competence is a construct of central importance to both treatment research and clinical practice.
In this paper, we test whether ratings of therapist competence in conducting Cognitive Therapy (CT) can be used to predict subsequent clinical outcomes among depressed patients treated with CT in a recent clinical trial (DeRubeis et al., 2005). Although several groups have examined the association between competence assessed during CT and outcomes across full courses of treatment, to our knowledge no study has used competence ratings to predict patients’ subsequent outcomes specifically. We also explore whether competence is a more robust predictor of outcome among patients with specific clinical features that are believed to require greater skill on the part of therapists.
Three studies have examined the association between competence and outcome in CT for depression. In an analysis of data from the Treatment for Depression Collaborative Research Program, Shaw and colleagues (1999) failed to find a simple relationship between competence and outcome. Ratings of competence were made by Ph.D. level clinicians with expertise in CT (including some who were involved in training therapists for the study). The authors examined average competence ratings using the Cognitive Therapy Scale from nine sessions sampled throughout the course of therapy (i.e., from sessions 1, 2, 4, 6, 7, 10, 15, 18, and 19) for 36 patients who completed treatment. Treatment outcome was assessed by entering post-treatment symptoms as a dependent variable in a regression analysis and covarying pre-treatment symptoms. Competence was unrelated to symptom change on the three symptom measures examined. However, when therapist adherence to CT strategies and facilitative conditions (i.e., non-specific aspects of treatment such as warmth and rapport) were entered as additional covariates, competence predicted change on the Hamilton Rating Scale for Depression (HRSD), accounting for an additional 15% of the variance.
In two more recent studies, significant positive relations were obtained between ratings of therapist competence and outcome. Trepka and colleagues (2004) examined competence as assessed at one session randomly chosen between session 3 and the penultimate session. Among 21 patients who completed treatment, the single-item global rating of competence from the CTS was associated with Beck Depression Inventory (BDI) scores post-treatment after the BDI score from the beginning of treatment was entered as a covariate (r = −.47). Analyses of the CTS total scores were not reported, but the single item competence rating was highly correlated with CTS scores. Most recently, Kuyken and Tsivrikos (2009) examined competence among therapists in a center with a mission of providing CT. The center director’s ratings of competence (based on his general impression of therapists rather than his ratings of specific sessions) significantly predicted post-treatment BDI scores after controlling for intake BDI scores (β = .27).
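Both studies express the competence-outcome association after covarying intake severity. One standard way to summarize such an adjusted association as a single coefficient is the first-order partial correlation, sketched below. The cited studies do not state that they used this exact formula, and the input correlations in the example are hypothetical values chosen only for illustration.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical inputs (not taken from any study cited here):
# x = competence, y = post-treatment BDI, z = intake BDI.
adjusted = partial_correlation(r_xy=-0.50, r_xz=0.20, r_yz=0.60)
print(round(adjusted, 2))  # -0.79
```

In this hypothetical case, removing the variance shared with intake severity strengthens a raw correlation of −.50 to about −.79, which illustrates why covarying baseline scores can change the apparent size of the competence-outcome relation.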
Taken together, these studies provide partial support for the relationship between therapist competence and outcome. Two studies reported simple competence-outcome relationships while a third found a relationship only after treating adherence and facilitative conditions as suppressor variables. Although these studies have helped to advance our understanding of the association between competence and outcome, they also remain open to a critical alternative explanation. Specifically, it is possible (and perhaps likely) that ratings of competence, at least in part, could reflect prior symptom gains. Patients who have experienced more symptom improvement may be more likely to provide a context in which a therapist appears competent. If such an effect is operative, it may have been largely responsible for the competence-outcome relationships found to date. Thus, existing studies have yet to test whether ratings of therapist competence predict subsequent symptom change.
There is at least one other limitation of the work conducted to date that merits attention. None of the three studies discussed provided direct estimates of reliability for their judgments of competence (although Shaw et al. (1999) used the same judges who had yielded moderate reliability in a previous study). Reliability of competence ratings can be surprisingly low, with one study yielding intraclass correlation coefficients less than 0.1 (Jacobson & Gortner, 2000). Thus, it is likely important to evaluate the reliability of ratings of therapist competence in any study examining the relationship between competence and outcome.
High levels of competence may be more critical to successful treatment when patients have characteristics or comorbidities that require greater flexibility and skill on the part of the therapist. In the first study to address this issue, Kuyken and Tsivrikos (2009) failed to find that competence was more strongly related to outcome among patients with a larger number of comorbid diagnoses. However, several other patient characteristics may be important to consider as potential moderators of the relation between competence and subsequent symptom change. Because such analyses are by their nature exploratory and carry an attendant risk of Type I errors, we limited our focus to patient characteristics selected a priori on the basis of expert opinion that these characteristics may require greater therapist skill and flexibility (Whisman, 2008). We further limited our focus to variables for which our dataset was likely to provide enough variability for a reasonable test. For example, we did not examine depressive symptom severity because the study from which data were drawn was restricted to patients with moderate to severe depression.
With these considerations in mind, we identified four potential moderators of interest: comorbid personality disorder diagnosis; age of onset of depression; whether one has a chronic form of depression (i.e., dysthymia or depression with a chronic course); and severity of comorbid symptoms of anxiety. Several authors have suggested that depressed patients with chronic difficulties, as reflected by persistent depressive symptoms or comorbid personality disorders, are more likely to require a highly skilled and adaptive cognitive therapist (Riso & Newman, 2003; Garland & Scott, 2008; Freeman & Rock, 2008). In addition, there is evidence that depressed adults with earlier ages of first-episode onset have a greater familial risk for depression (Levinson, 2006), as well as higher rates of comorbidity and poorer psychosocial outcomes (Hammen, Brennan, Keenan-Miller, & Herr, 2008). Each of these features could complicate the treatment of such patients. Finally, as noted by Singer, Dobson, and Dozois (2008), there is reason to believe that comorbid symptoms of anxiety may complicate the course of CT for depression. We thus chose these four variables to examine as possible complicating factors, in that they may set a context in which therapist competence is of greatest consequence.
In this study, we examine ratings of therapist competence as a predictor of subsequent symptom change among patients who participated in the CT condition of a trial of treatments for moderate to severe depression (DeRubeis et al., 2005). We focus primarily on the beginning of treatment, when symptom change tends to occur most rapidly, by examining whether competence ratings predict session-to-session symptom change across the first four sessions of CT. This analytic strategy is well-suited to capturing the immediate consequences of competently (or less competently) delivered CT. As a secondary analysis, we tested whether the average competence rating over the first four sessions predicts the subsequent symptom change that occurs between the end of those sessions and the end of treatment. We also examine whether any association between competence and outcome is accounted for by ratings of patient difficulty made on the basis of observation of the first session. Finally, we examine whether any of the four patient characteristics we selected moderates the relationship between competence ratings and subsequent symptom change.
Patients were 60 adults with a primary Axis I diagnosis of Major Depressive Disorder (according to DSM-IV criteria) who were assigned to the CT condition of a two-site (University of Pennsylvania and Vanderbilt University), randomized controlled trial of CT, pharmacotherapy, and placebo for moderate to severe depression (see DeRubeis et al., 2005).1 Patients met criteria for a current episode of depression according to the Structured Clinical Interview for DSM-IV Diagnosis (SCID-I; First, Spitzer, Gibbon, & Williams, 2001) and scored 20 or higher for two consecutive weeks on the modified 17-item version of the Hamilton Rating Scale for Depression (Hamilton, 1960). Those with psychotic features, a history of bipolar disorder, current substance abuse, borderline personality disorder, antisocial personality disorder, schizotypal personality disorder, or any other primary nonpsychotic Axis I disorder were excluded from participation (see DeRubeis et al.). Participants gave written informed consent prior to entering the study. Patients were randomized to condition on the basis of computer-generated allocation sequences for each site stratifying on gender and number of prior episodes. Allocation sequences were generated by the project biostatistician, Robert Gallop, Ph.D. As successive participants were enrolled, project coordinators opened sealed envelopes to learn treatment assignments. Acute treatment was provided from February, 1997 through April, 2000 with continuation treatment and follow-up occurring from June, 1997 through April, 2002 (DeRubeis et al.; Hollon et al., 2005).
In the sample of CT patients, 58% were women, and ages ranged from 19 to 68 years (M = 40, SD = 12). Most patients were Caucasian (78%), 12% were African American, and 10% were of other ethnicities. One third of patients were married or cohabiting with their partners.
Four male and two female clinicians served as cognitive therapists. Five of the therapists were licensed Ph.D. psychologists, and one was a psychiatric nurse practitioner (MSN). All therapists were Caucasian, with ages ranging from 40 to 51 (M = 45; SD = 4) at the outset of the trial. Therapists were assigned approximately equal numbers of patients, with four therapists having 10 patients, one therapist having 11 patients, and one therapist having 9 patients. In addition, four of the therapists had extensive CT experience (7 to 21 years) prior to the study initiation; two of the therapists, both at Vanderbilt, began the trial with only about two years of prior CT experience and received additional training from the Beck Institute for Cognitive Therapy during the trial. All therapists followed the procedures outlined in standard texts of cognitive therapy for depression (Beck, Rush, Shaw, & Emery, 1979; Beck, 1995).
Two measures of depression severity were used. The Beck Depression Inventory-II (BDI-II; Beck, Steer, & Brown, 1996), a self-report measure, was used as the primary session-to-session indicator of depression severity. Patients completed the BDI-II at the intake interview and prior to each therapy session.
The 17-item Hamilton Rating Scale for Depression (HRSD) modified to allow scoring of atypical symptoms (Hamilton, 1960; Reimherr et al., 1998) is a clinician-administered outcome measure. It was the primary measure of depression symptom severity in the DeRubeis et al. (2005) trial, and it served as our primary indicator of depression severity for analyses involving symptom change through the end of treatment. It was administered weekly throughout the first four weeks of treatment and every other week thereafter. Pre-treatment and post-treatment evaluations were also conducted.
The Cognitive Therapy Scale (CTS) is an 11-item scale that measures cognitive therapist competence (Young & Beck, 1988). Items are rated on a 0 to 6 scale, with higher scores indicating higher levels of competence. As noted, reliability among experts using the CTS is not uniformly high (Jacobson & Gortner, 2000). However, among those trained to rate together, intraclass correlation coefficients have been substantially greater. For example, Vallis, Shaw, and Dobson (1986) reported an ICC of .77 for the total CTS score when using two raters. The items of the CTS include: Agenda, Feedback, Understanding, Interpersonal Effectiveness, Collaboration, Pacing and Efficient Use of Time, Guided Discovery, Focusing on Key Cognitions or Behaviors, Strategy for Change, Application of Cognitive-Behavioral Techniques, and Homework (a copy of the scale along with the manual is currently available online; Academy of Cognitive Therapy, n.d.).
A single item from the CTS that is not used in calculating the total score assesses patient difficulty (i.e., “How difficult did you feel this client was to work with?”). Responses range from 0 (not difficult, very receptive) to 6 (extremely difficult). The rating of this item from the first therapy session was used as an indicator of patient difficulty.
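The scoring rule described above can be sketched as follows, assuming the total is the simple sum of the 11 competence items (possible range 0 to 66) and that the patient-difficulty rating is recorded alongside the items but excluded from the total. The dictionary key used for the difficulty rating is a hypothetical name chosen for illustration.

```python
# Item labels follow the CTS items listed in the text.
CTS_ITEMS = (
    "Agenda",
    "Feedback",
    "Understanding",
    "Interpersonal Effectiveness",
    "Collaboration",
    "Pacing and Efficient Use of Time",
    "Guided Discovery",
    "Focusing on Key Cognitions or Behaviors",
    "Strategy for Change",
    "Application of Cognitive-Behavioral Techniques",
    "Homework",
)


def cts_total(ratings: dict) -> float:
    """Sum the 11 CTS competence items (each rated 0-6); any extra keys are ignored."""
    missing = [item for item in CTS_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing CTS items: {missing}")
    for item in CTS_ITEMS:
        if not 0 <= ratings[item] <= 6:
            raise ValueError(f"rating for {item!r} outside 0-6: {ratings[item]}")
    return sum(ratings[item] for item in CTS_ITEMS)


# Hypothetical session: every competence item rated 4, plus a patient-difficulty
# rating stored under an illustrative key that the total deliberately skips.
session = {item: 4 for item in CTS_ITEMS}
session["patient_difficulty"] = 5
print(cts_total(session))  # 44
```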
Personality disorder diagnoses were made at intake using the Structured Clinical Interview for DSM–III–R Personality Disorders (Spitzer, Williams, Gibbon, & First, 1990). Among patients who were study-eligible (and therefore did not meet criteria for borderline, antisocial, or schizotypal personality disorders), 27 patients (45%) met criteria for a personality disorder (for additional details, see Fournier et al., 2008). Age of onset of each patient’s first depressive episode was assessed using the SCID-I. The average age of onset was 24 (SD = 13). Patients were considered to meet criteria for a chronic form of depression if they met criteria for either dysthymia or Major Depressive Disorder with a chronic specifier as assessed by the SCID-I. A total of 34 patients (57%) met criteria for either dysthymia or chronic depression. Severity of anxiety symptoms was assessed at intake using the Hamilton Rating Scale for Anxiety (M = 16.8; SD = 6.7; Hamilton, 1959).
Raters (DRS and MAB) had completed all graduate coursework and a one year practicum focused on training in CT at the University of Pennsylvania prior to providing ratings for this study. Raters had also participated in rater training on the CTS with experts in CT, and they practiced rating CT sessions together to ensure adequate rater agreement. Each tape was rated independently by both raters, who were blind to outcome. Therapy sessions were both video- and audiotaped. Raters watched and listened to videotape whenever possible; audiotapes were used only if the videotape was missing or presented some technical difficulty, such as poor sound quality. The raters also completed consensus ratings based on discussions that took place after the independent ratings were made. The consensus ratings were used for all analyses because they are believed to be the most valid judgments. Sessions were twice weekly for the first 4 to 12 weeks and weekly thereafter. The first four sessions of CT for each patient were rated sequentially.
Intraclass Correlation Coefficients (ICCs) using random effects were calculated to assess inter-rater reliability, based on the independently-produced ratings. The ICC for total CTS scores (adjusted for the use of two raters) was .77. Because some planned analyses would use the mean CTS total score averaged across the first four sessions for each patient, the ICC for this aggregated score was also computed. This ICC was .86 for two raters. These estimates are likely to be lower-bound estimates of the reliabilities of the respective consensus ratings.
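The text does not specify how the ICCs were adjusted for the use of two raters. A common adjustment, assumed here, is the Spearman-Brown formula, which projects a single-rater ICC to the reliability of the mean of k raters. Inverting it recovers the single-rater reliabilities implied by the reported two-rater values: about .63 for the session-level total and about .75 for the four-session mean.

```python
def spearman_brown(icc_single: float, k: int) -> float:
    """Reliability of the mean of k raters, given a single-rater ICC."""
    return k * icc_single / (1 + (k - 1) * icc_single)


def single_rater_icc(icc_k: float, k: int) -> float:
    """Invert Spearman-Brown: the single-rater ICC implied by a k-rater ICC."""
    return icc_k / (k - (k - 1) * icc_k)


for label, icc_two in [("session CTS total", 0.77), ("four-session mean", 0.86)]:
    single = single_rater_icc(icc_two, k=2)
    # Stepping back up to two raters recovers the reported value.
    print(f"{label}: single rater = {single:.2f}, two raters = {spearman_brown(single, 2):.2f}")
```

Averaging across sessions raises reliability for the same reason that averaging across raters does: independent rating error partly cancels, which is why the aggregated score has the higher ICC.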
The ICC for raters’ judgment of patient difficulty at session 1 was .86 (when corrected for the use of two raters). Because the 11 primary CTS items each involve complex judgments of several attributes, consensus ratings were expected to be superior and were therefore used. As the judgment of patient difficulty is a global rating for which no additional instructions are provided, the average of the two raters’ judgments of patient difficulty was used for this item.
Our primary analytic strategy was to use repeated measures regression, implemented in SAS Proc Mixed, to examine competence ratings as predictors of session-to-session symptom change across the first four sessions of CT. In these models, BDI-II scores from sessions 2 through 5 served as the dependent variable. BDI-II scores from the prior session (1 through 4) were entered as a covariate, with each BDI-II score serving as a covariate in predicting the BDI-II of the subsequent session. In these models, a significant predictor indicates that the variable predicted BDI-II scores at the following session, after covarying the BDI-II scores from the current session. The relation of a competence score in a given session to symptom change in the following session was thus examined in all 60 CT patients.
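The data layout implied by this lagged design can be sketched in Python. Each row pairs a session's competence rating and BDI-II score with the BDI-II score collected before the next session. The function and variable names are hypothetical. The sketch only builds the rows; the model itself was fit in SAS Proc Mixed, with site as an additional covariate.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float, float, float]  # (session t, BDI-II at t, CTS at t, BDI-II at t + 1)


def lagged_rows(bdi: Dict[int, float], cts: Dict[int, float], last_rated: int = 4) -> List[Row]:
    """Pair each rated session's CTS score and BDI-II with the next session's BDI-II.

    BDI-II is completed before each session, so the score recorded at session
    t + 1 follows the session t whose competence was rated. Sessions with a
    missing value are skipped rather than imputed.
    """
    rows = []
    for t in range(1, last_rated + 1):
        if t in bdi and t in cts and (t + 1) in bdi:
            rows.append((t, bdi[t], cts[t], bdi[t + 1]))
    return rows


# Hypothetical patient: BDI-II before sessions 1-5, CTS for sessions 1-4.
bdi_scores = {1: 30, 2: 27, 3: 22, 4: 20, 5: 15}
cts_scores = {1: 38.0, 2: 41.0, 3: 40.0, 4: 44.0}
for row in lagged_rows(bdi_scores, cts_scores):
    print(row)
```

In each row, BDI-II at t + 1 is the outcome, and BDI-II at t and CTS at t are predictors. A competence effect in this layout therefore reflects change that follows the rated session rather than symptom levels that preceded it.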
Our secondary strategy involved using the average competence rating for each therapist-patient dyad (with ratings being averaged across the first four sessions). These average competence ratings were examined as a predictor of symptom change following the first four sessions through the end of treatment. We utilized longitudinal random coefficients models (with a random intercept and slope of symptom change) so that repeated measurements of symptom severity during the period between session 4 and the end of treatment could be used to more precisely estimate symptom severity at the conclusion of treatment. We considered examining whether competence as assessed early in treatment was associated with individual differences in the slope of subsequent symptom change (as indicated by a competence by time interaction) using longitudinal random coefficients models. However, because we expected the effect of competence in early sessions to be relatively immediate, we instead focused on whether any relation between competence and subsequent symptom change following these early sessions was maintained through the end of treatment.2 Therefore, our analytic strategy was to predict post-treatment symptom severity after controlling for symptom severity immediately following the early sessions. To increase power, the intervening symptom assessments were used to estimate post-treatment symptom severity more precisely. Therefore, we used longitudinal random coefficients models with time transformed and recoded. Time (measured as weeks in treatment) was transformed using the square root function so that the assumption of linear change made in the models was not violated. Time was then recoded so that post-treatment scores were coded as time zero and prior scores were at negative values reflecting the square root of time. 
This approach allowed us to use repeated symptom assessments to estimate end of treatment symptom severity with greater precision than simply using the post-treatment scores. We controlled for the symptom assessment immediately following the competence ratings in the model. Because random slopes were modeled, we also entered the interaction of this symptom assessment with time as a covariate. To provide an even more conservative test, both the symptom assessment prior to treatment and the interaction of this assessment with time were also entered as covariates. In these models, a significant effect of competence ratings would indicate a relationship between these ratings and estimated end of treatment symptom severity (estimated via random coefficients models). Models were implemented using SAS Proc Mixed (Littell, Milliken, Stroup, & Wolfinger, 1996). In these models, the sample was reduced to the 51 patients who remained in the study long enough to provide some symptom assessments after session 4.
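The time coding can be sketched as follows. The text does not give the exact recoding formula. This sketch assumes it is the square root of weeks in treatment minus the square root of the final week, with the final week set to 16 to match the 16-week treatment period (an assumption; the actual post-treatment timing may differ by patient). Under this coding, the post-treatment assessment falls at zero, earlier assessments are negative, and the model intercept corresponds to estimated end-of-treatment severity.

```python
import math
from typing import List


def recode_time(weeks: List[float], final_week: float = 16.0) -> List[float]:
    """Square-root transform weeks in treatment, then shift so final_week maps to 0.

    Assumed coding: sqrt(week) - sqrt(final_week). Earlier assessments become
    negative; the post-treatment assessment becomes exactly zero.
    """
    anchor = math.sqrt(final_week)
    return [math.sqrt(w) - anchor for w in weeks]


# Hypothetical assessment schedule (weeks since the start of treatment).
print(recode_time([4, 9, 16]))  # [-2.0, -1.0, 0.0]
```

The square root compresses later weeks relative to earlier ones. This is what allows a rapid-then-slowing decline in symptoms to be approximated by a straight line in the transformed time metric.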
We therefore used three primary models: a model of session-to-session symptom change (using the BDI-II), a model of subsequent symptom change through the end of treatment (using the HRSD), and a model of subsequent symptom change through the end of treatment (using the BDI-II). For each of these models, we considered four covariance structures (i.e., autoregressive, unstructured, compound symmetry, and Toeplitz) and in each case selected the unstructured structure as the best fitting on the basis of Akaike’s Information Criterion (AIC), Schwarz’s Bayesian Criterion, and the −2 residual log likelihood.
Because a site by treatment interaction was identified in the primary analyses of the efficacy of treatments, and this interaction was partly driven by small but nonsignificant site differences in the effect of CT (DeRubeis et al., 2005), site was entered as a covariate in both the session-to-session and longer-term models. Consistent with the recommendations of Feeley, DeRubeis, and Gelfand (1999), both analytic approaches involve competence predicting subsequent change, thereby guarding against the possibility that any observed association between competence and outcome would be due to the effect of outcome on competence.
Prior to examining our primary hypotheses, we examined whether there were significant differences in competence ratings across the six study therapists. We first calculated the average CTS scores for each patient. Therapists differed significantly in these mean ratings (F (5, 54) = 7.74, R2 = .42, p < .0001). Mean scores for the six therapists were: 49.9 (range of 41.3 to 56.6), 42.1 (range of 30.6 to 49.3), 40.3 (range of 27.1 to 49.1), 39.9 (range of 32.7 to 47.5), 34.0 (range 17.8 to 45.9), and 31.1 (range of 11.3 to 40.1). Therapists did not differ on ratings of patient difficulty (F (5,54) = 1.40, R2 = .11, p = .24). Because two therapists had less experience in CT and obtained additional training during the trial, we also examined whether these two therapists were rated as less competent than the therapists more experienced in CT. Although differences were in the expected direction, they were not significant (F(1, 58) = 2.75, R2 = .05, p = .10; high experience M = 41.1, SD = 9.5; low experience M = 37.0, SD = 8.1). The overall average of competence scores for each therapist-patient dyad was 39.7 (SD = 9.2).
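The R² values reported alongside these one-way F tests follow from a standard identity. Because F = (SS_between / df1) / (SS_within / df2), it follows that R² = SS_between / (SS_between + SS_within) = F × df1 / (F × df1 + df2). Here df2 = 60 − 6 = 54 for the six-therapist comparisons and 60 − 2 = 58 for the two-group experience comparison. The check below reproduces the three reported values. It is offered as a consistency check, not as the authors' procedure.

```python
def r_squared_from_f(f: float, df_between: int, df_within: int) -> float:
    """Proportion of variance (eta squared) implied by a one-way ANOVA F statistic."""
    return f * df_between / (f * df_between + df_within)


reported = [
    ("therapist, mean CTS", 7.74, 5, 54),
    ("therapist, patient difficulty", 1.40, 5, 54),
    ("experience group, mean CTS", 2.75, 1, 58),
]
for label, f, df1, df2 in reported:
    print(f"{label}: F({df1}, {df2}) = {f:.2f} -> R^2 = {r_squared_from_f(f, df1, df2):.2f}")
```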
We examined competence as a predictor of session-to-session symptom change across the first four sessions of CT. As shown in Table 1, competence significantly predicted subsequent symptom change in this model. For ease of interpretation, signs have been adjusted so that a positive relationship indicates that higher competence ratings predict positive outcomes in these and all subsequently reported analyses. We then conducted exploratory analyses, using the same statistical approach, in which each CTS item served as a predictor of session-to-session symptom change (see Table 1). ICC estimates of inter-rater reliability for individual items are also reported in the table. The largest effects were for the following items: Agenda, Focusing on Key Cognitions or Behaviors, Pacing, Homework, and Application of Cognitive-Behavioral Techniques.
We then conducted analyses of therapist differences and differences in patient difficulty as they might explain variation in early session-to-session symptom change. For the model examining therapist, site was not entered as a covariate (as therapists were nested within site). Therapist was not a significant predictor of session-to-session symptom change across the first four sessions (p = .3). Higher patient difficulty ratings were predictive of less session-to-session symptom change (r = −.33, t = −2.41, p = .02). Interestingly, these difficulty ratings (completed at session 1) were not strongly related to concurrent ratings of competence at session 1 (r = −.15, p = .25). We also examined whether competence ratings predicted session-to-session symptom change after controlling for patient difficulty. In this model, the effect of competence was reduced to a non-significant trend (r = .28, t = 1.97, p = .06).
In the model predicting subsequent change in HRSD severity through the end of treatment, higher competence ratings were predictive of lower HRSD scores at post treatment (r = .33, t = 2.45, p = .02). In the parallel model using BDI as the index of symptom severity, this relationship was reflected by a non-significant trend (r = .24, t = 1.72, p = .09).3 We then examined therapist (in place of competence ratings) as a predictor in the models described above. Therapist was a significant predictor in the HRSD model (F(5, 53) = 5.07, p = .0007), but not in the model for BDI (F(5, 52) = 1.61, p = .17). When we examined therapist as a covariate (rather than site), competence remained a significant predictor in the models for HRSD (r = −.37, t = −2.93, p = .005) and was now significant in the model for the BDI (r = −.34, t = −2.55, p = .01). Although there were significant differences among therapists in mean competence ratings given for each therapist-patient dyad (as noted previously), these differences did not appear to correspond to the observed differences in HRSD or BDI severity at the end of treatment. Therapist remained a significant predictor of end of treatment HRSD symptom severity after controlling for competence ratings (F(5, 53) = 5.03, p = .0001). Thus, in the long-term models, competence ratings predicted subsequent symptom change on the HRSD (with and without therapist covaried), but competence ratings only predicted subsequent symptom change on the BDI when therapist was covaried. In addition, where therapist differences on outcome were evident (i.e., on the HRSD), these differences were largely not accounted for by competence ratings.
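The text reports an effect-size r alongside each t statistic but does not state how r was derived. A common conversion is r = t / √(t² + df). With an assumed df of 49, which is close to the 51 patients in these models but is not a reported value, this conversion yields r values that match those reported for the site-covaried HRSD and BDI models. That match is consistent with, but does not establish, the use of a conversion of this kind.

```python
import math


def r_from_t(t: float, df: float) -> float:
    """Effect-size r implied by a t statistic; the sign of t is preserved."""
    return t / math.sqrt(t ** 2 + df)


ASSUMED_DF = 49  # illustrative assumption; degrees of freedom are not reported
for label, t in [("HRSD", 2.45), ("BDI", 1.72)]:
    print(f"{label}: t = {t:.2f} -> r = {r_from_t(t, ASSUMED_DF):.2f}")
```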
Neither of the two analyses of the relation between ratings of patient difficulty and symptom improvement (as indexed by the HRSD) yielded a significant effect (for the analysis of post-treatment scores: r = .18, t = 1.27, p = .2; for the analysis of the slope of change: r = .14, t = .96, p = .3). However, significant effects were obtained in both kinds of analyses when the BDI was the indicator of depressive symptoms (post-treatment symptom severity: r = .31, t = 2.41, p = .02; slope of change: r = .28, t = 2.04, p = .046). We examined competence as a predictor in the models that used the HRSD and BDI scores, respectively, with both patient difficulty and the patient difficulty by time interaction entered as additional covariates. In the model for HRSD, competence ratings remained a significant predictor of subsequent symptom change through the post-treatment assessment (r = −.31, t = −2.34, p = .02). In the model for BDI, competence ratings remained a non-significant predictor of subsequent symptom change through the post-treatment assessment (r = −.23, t = −1.62, p = .11).
Only nine patients discontinued treatment prematurely, which limited power to detect differences between completers and dropouts. However, we did compare mean CTS scores of these two patient groups. The means did not differ significantly (t(58) = .94, p = .4, d = .3; completers: M = 40.2, SD = 8.8, CI = 37.7, 42.6; dropouts: M = 37.0, SD = 11.3, CI = 28.3, 45.8). We then used logistic regression to examine whether the average competence rating for each patient predicted risk of dropout, after controlling for site and HRSD scores at intake. CTS scores were unrelated to risk of dropout in this model (β = −.32, SE = .37, Wald = .74, p = .4, OR = .73, CI = .35, 1.50).
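For a logistic regression coefficient β with standard error SE, the odds ratio is exp(β), the Wald statistic is (β / SE)², and a Wald interval is exp(β ± 1.96 × SE). The 95% level is an assumption here, because the text does not state the interval's coverage. Applied to the rounded values reported above, these standard formulas reproduce the odds ratio and its interval. The Wald statistic comes out at about .75 rather than .74, a difference attributable to rounding β and SE to two decimals.

```python
import math
from typing import Tuple


def odds_ratio_summary(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, Tuple[float, float]]:
    """Odds ratio, Wald chi-square (1 df), and Wald confidence interval for a logistic coefficient."""
    odds_ratio = math.exp(beta)
    wald = (beta / se) ** 2
    interval = (math.exp(beta - z * se), math.exp(beta + z * se))
    return odds_ratio, wald, interval


odds_ratio, wald, (lower, upper) = odds_ratio_summary(beta=-0.32, se=0.37)
print(f"OR = {odds_ratio:.2f}, Wald = {wald:.2f}, CI = {lower:.2f}, {upper:.2f}")
```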
We examined four potential moderators using our primary analytic strategy focused on session-to-session models of the early portion of CT. However, we also explored whether these variables served as moderators of the relationship between competence and outcome in the longer-term models using HRSD and BDI.
As shown on the left side of Table 2, in the session-to-session analyses, significant interactions between competence and the potential moderators emerged for two of the four variables (i.e., age of onset and anxiety), as did a non-significant trend for the interaction of chronicity (i.e., dysthymia or chronic depression) and competence. Competence did not predict outcome differentially between dyads in which the patient was versus was not given a personality disorder diagnosis.4 The significant and trend level interactions were each obtained in the context of significant main effects of the potential moderators on BDI-II scores in the next session (anxiety: r = .30, t = 2.37, p = .02; age of onset: r = −.27, t = −2.12, p = .04; and chronic/dysthymic: r = −.28, t = −2.20, p = .03). Although there was no evidence of moderation by personality disorder status, there was a trend for personality disorder status to predict a reduced magnitude of session-to-session symptom change (r = −.26, t = −2.00, p = .0503).
To better understand the significant interaction effects, models of competence as a predictor of symptom change were examined separately for those with and without chronicity; for the continuous moderators, median splits were used to divide the sample into high and low groups. As depicted in Figure 1, the two significant interactions, as well as the trend-level interaction, were driven by competence being more predictive of outcome among patients who exhibited factors that were expected to require more competently delivered CT. That is, competence predicted session-to-session symptom change more strongly for patients with higher levels of comorbid anxiety, a younger age of onset and, at the level of a non-significant trend, a more chronic course of depressive symptoms.
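The median-split probe can be illustrated with a simplified sketch that correlates competence with symptom improvement separately above and at or below the moderator's median. This is a stand-in for the reported analyses, which fit separate session-to-session mixed models within each subgroup; the sketch uses plain Pearson correlations, and the data are hypothetical. Assigning ties at the median to the low group is a choice made here that the text does not specify.

```python
import math
from statistics import median
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation (assumes neither variable is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probe_median_split(moderator, competence, improvement) -> Dict[str, Tuple[int, float]]:
    """Correlate competence with improvement above vs. at-or-below the moderator median."""
    cut = median(moderator)
    groups = {"high": ([], []), "low": ([], [])}
    for m, c, i in zip(moderator, competence, improvement):
        xs, ys = groups["high" if m > cut else "low"]
        xs.append(c)
        ys.append(i)
    return {label: (len(xs), pearson_r(xs, ys)) for label, (xs, ys) in groups.items()}


# Hypothetical dyads: intake anxiety, mean early CTS, and BDI-II improvement.
anxiety = [10, 12, 14, 16, 20, 22, 24, 26]
competence = [30, 35, 40, 45, 30, 35, 40, 45]
improvement = [5, 4, 6, 5, 2, 5, 8, 11]
print(probe_median_split(anxiety, competence, improvement))
```

Median splits discard information and reduce power. They are used here, as in the text, only to describe the direction of an interaction that was already tested with the continuous moderator.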
Evidence that these patient characteristics interact with competence to predict more distal outcomes (i.e., post-treatment symptom severity) was less robust. As shown on the right side of Table 2, the only significant interaction to emerge from these models was that of anxiety and competence in predicting post-treatment HRSD symptom severity. The same interaction yielded a non-significant trend in predicting end-of-treatment BDI symptom severity.5 Using a median split on anxiety to probe this interaction in the HRSD model, competence was more strongly related to outcome among patients with higher levels of anxiety (r = .40, t = 2.22, p = .04) than among those with lower levels of anxiety (r = .29, t = 1.38, p = .18). The non-significant trend in predicting end-of-treatment BDI symptom severity reflected a weaker version of the same pattern (high anxiety: r = .25, t = 1.32, p = .2; low anxiety: r = .19, t = .91, p = .4). Thus, although these analyses were exploratory and should be interpreted with caution, it is noteworthy that the most consistent evidence of moderation of the relationship between competence and subsequent outcome was observed for pre-treatment anxiety severity. Other evidence of moderation of the relationship between competence and outcome was limited to the session-to-session analyses focused on patients’ early responses to treatment.
To our knowledge, our findings provide the first evidence that variability in rated CT competence is associated with subsequent variability in symptom change in a design in which this association could not be driven by prior symptom improvement facilitating greater competence. Although such a demonstration is not sufficient to establish causality, it is a necessary precondition for any such claim. Our results show that competence ratings predict session-to-session symptom change early in treatment, when patients improve most rapidly. Early competence ratings also predicted end-of-treatment symptom severity, although this relationship was significant on only one of two measures of depression severity and was a non-significant trend on the other. When variations in the observed difficulty of the patients were taken into account, the effects were similar in magnitude but reduced to a non-significant trend. Taken together, these findings support the utility of ratings of therapist competence.
Although competence ratings differed across therapists, the evidence for therapist differences in outcome was less robust. Failures to find such therapist differences should be interpreted cautiously, because therapists were selected for the study on the basis of their likely competence in conducting CT (perhaps restricting the range of competence) and the total number of therapists was quite small. Nonetheless, we did find significant therapist effects on outcome in the model predicting longer-term symptom change as assessed by the HRSD. Although one might expect competence ratings to account for any effect of therapist on outcome, we found no evidence of this. Instead, therapist differences in longer-term symptom improvement on the HRSD appeared to be independent of competence ratings: therapist effects on outcome were not accounted for by competence ratings, and, likewise, the effect of competence ratings on outcome was not accounted for by therapist differences. Thus, in our data, the predictive value of competence ratings appeared to stem more from capturing variability in competence across sessions conducted by the same therapist than from capturing variability in competence across different therapists. It is important to note that patients were not randomly assigned to therapists. Instead, patients were assigned to therapists partly on the basis of scheduling considerations, with a small number of the most problematic cases at each site assigned to the therapist at that site who was most willing to take on such cases. Given this lack of randomization, therapist may have emerged as a predictor of longer-term symptom change on the HRSD either because of differences in the characteristics of patients assigned to different therapists or because of differences in therapist competence not captured by our competence ratings.
This latter possibility leads us to wonder whether our understanding of what constitutes therapist competence may need revision. Competence assessments are based on current theoretical understandings of CT; specific aspects of competence have not been subjected directly to empirical scrutiny. As a result, current models of therapist competence may fail to capture important aspects of competence. Future research could productively evaluate potential refinements of measures of therapist competence and the utility of such refinements for predicting subsequent outcomes.
In our view, several efforts would likely prove useful in improving the measurement of competence and our knowledge of the practical utility of such judgments. First, because judgments of competence appear less reliable among judges who have not trained together (Jacobson & Gortner, 1999), greater specificity in how to make these ratings would be beneficial. To this end, we are currently working to identify which specific therapist behaviors are the most important determinants of experts’ evaluations of competence. Second, refinements in the CTS should be evaluated empirically to determine which of the factors experts judge as important to competence actually predict superior patient outcomes. Finally, because competence ratings are often used to make decisions about therapists, competence measures should be evaluated for their ability to distinguish among therapists (rather than merely among dyads involving the same therapist). This will likely require larger samples and the inclusion of therapists with a greater diversity of CT experience and expertise than was present in this study.
Although our analyses of patient characteristics that might moderate the relationship between competence and outcome were exploratory, they point to some interesting possibilities. In the session-to-session analyses, we found evidence suggesting that competence is most strongly related to subsequent symptom change for patients with an early age of onset and higher levels of comorbid anxiety symptoms. There was also a trend for competence to predict session-to-session symptom change more strongly among patients with dysthymia or chronic depression. For each of these interactions, the respective patient characteristic predicted smaller session-to-session symptom improvements early in treatment. The combination of these characteristics and lower competence scores therefore predicted especially small improvements. Interestingly, there was no evidence that competence predicted outcome more strongly among patients with personality disorder diagnoses. This is not to say that patients with personality disorders are not at greater risk of poor treatment response; as has been reported using this same sample, the presence of personality disorders was associated with a poorer response to CT (Fournier et al., 2008). However, we failed to find evidence that therapist competence is particularly important to outcome among patients with personality disorders. As noted previously, diagnoses of some personality disorders (viz., borderline, antisocial, and schizotypal personality disorders) served as exclusion criteria for the trial. Our results therefore constitute a test only of whether therapist competence is of greater consequence among depressed patients with the personality disorders represented in our sample. Whether these findings would extend to samples that include patients with personality disorders not represented in this study is an important topic for future research.
The moderation analyses we conducted should be interpreted with caution, as most effects were limited to the session-to-session analyses focused on more immediate symptom change. Nonetheless, the results of analyses involving anxiety were particularly robust. Competence was a particularly strong predictor of subsequent symptom change among patients with high (vs. low) levels of anxiety. This was true both in the session-to-session analyses and in the longer-term model using the HRSD; the longer-term model using the BDI showed a non-significant trend in the same direction. Thus, of the possible moderators we examined, the evidence most strongly suggested that the relationship between competence and subsequent symptom change was largest among patients with higher levels of anxiety. Interestingly, in a subset of the current sample, our group has found that therapists’ use of strategies specifically tailored to anxiety symptoms was associated with less subsequent change in depressive symptoms (Gibbons & DeRubeis, 2008). Thus, therapists are unlikely to have earned higher competence ratings with more anxious patients by making greater use of anxiety-focused intervention strategies. Rather, overall competence in CT for depression, not a greater focus on anxiety-specific strategies, predicted greater subsequent symptom change.
Several limitations merit consideration. First, as noted previously, although the Cognitive Therapy Scale reflects the current standard for assessing competence, it was constructed largely on theoretical rather than empirical grounds, and refinements in how competence is conceptualized and measured may still be needed. Second, because our sample was composed of moderately to severely depressed outpatients, our results may not extend to milder forms of depression. In addition, given the restricted range of symptom severity, our data do not allow us to test whether competence and outcome are more strongly related among more severely depressed patients. Third, although the raters of therapist competence had been trained in CT, they were not highly expert. Our data thus suggest that competence ratings made by relatively novice cognitive therapists are useful in predicting subsequent symptom change; all else equal, the associations we observed may represent a lower bound on what might be found if more expert cognitive therapists provided the ratings.
An association between CT competence and symptom improvement was observed using a research design that ruled out the possibility that the association reflected an effect of symptom improvement on the relevant therapist behaviors. In specific subsets of depressed patients, most notably those with relatively high levels of comorbid anxiety, the prediction of symptom change from rated competence was particularly robust, suggesting that in these patients competence is more likely to be an important causal factor in the process of change.
This research was supported by National Institute of Mental Health Grants MH55877 (R10), MH55875 (R10), MH01697 (K02), and MH01741 (K24). GlaxoSmithKline (Brentford, United Kingdom) provided medications and pill placebos for the trial. No authors have relevant conflicts of interest to disclose. We thank our colleagues for making this research possible. Paula R. Young and Margaret L. Lovett served as the two study coordinators. John P. O’Reardon, Ronald M. Salomon, and the late Martin Szuba served as study pharmacotherapists (along with Jay D. Amsterdam and Richard C. Shelton). Cory P. Newman, Karl N. Jannasch, Frances Shusman, and Sandra Seidel served as the cognitive therapists (along with Robert J. DeRubeis and Steven D. Hollon). Jan Fawcett provided consultation with regard to the implementation of clinical management pharmacotherapy. Aaron T. Beck, Judith Beck, Christine Johnson, and Leslie Sokol provided consultation with respect to the implementation of cognitive therapy. Madeline M. Gladis and Kirsten L. Haman oversaw the training of the clinical interviewers. David Appelbaum, Laurel L. Brown, Richard C. Carson, Barrie Franklin, Nana A. Landenberger, Jessica Londa-Jacobs, Julie L. Pickholtz, Pamela Fawcett-Pressman, Sabine Schmid, Ellen D. Stoddard, Michael Suminski, and Dorothy Tucker served as the clinical interviewers. Joyce L. Bell, Brent B. Freeman, Cara C. Grugan, Nathaniel R. Herr, Mary B. Hooper, Miriam Hundert, Veni Linos, and Tynya Patton provided research support.
| Paper section and topic | Item | Description | Reported on page # |
|---|---|---|---|
| TITLE & ABSTRACT | 1 | How participants were allocated to interventions (e.g., “random allocation”, “randomized”, or “randomly assigned”). | 8 |
| INTRODUCTION: Background | 2 | Scientific background and explanation of rationale. | 3–8 |
| METHODS: Participants | 3 | Eligibility criteria for participants and the settings and locations where the data were collected. | 8–9 |
| Interventions | 4 | Precise details of the interventions intended for each group and how and when they were actually administered. | 8–10 |
| Objectives | 5 | Specific objectives and hypotheses. | 8 |
| Outcomes | 6 | Clearly defined primary and secondary outcome measures and, when applicable, any methods used to enhance the quality of measurements (e.g., multiple observations, training of assessors). | 10–14 |
| Sample size | 7 | How sample size was determined and, when applicable, explanation of any interim analyses and stopping rules. | 8 & 14 |
| Randomization: Sequence generation | 8 | Method used to generate the random allocation sequence, including details of any restrictions (e.g., blocking, stratification). | 9 |
| Randomization: Allocation concealment | 9 | Method used to implement the random allocation sequence (e.g., numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned. | 9 |
| Randomization: Implementation | 10 | Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups. | 9 |
| Blinding (masking) | 11 | Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. When relevant, how the success of blinding was evaluated. | 8–10 |
| Statistical methods | 12 | Statistical methods used to compare groups for primary outcome(s); methods for additional analyses, such as subgroup analyses and adjusted analyses. | 12–14; 15–20 |
| RESULTS: Participant flow | 13 | Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the study protocol, and analyzed for the primary outcome. Describe protocol deviations from study as planned, together with reasons. | 8 & 14; also see appendix |
| Recruitment | 14 | Dates defining the periods of recruitment and follow-up. | 9 |
| Baseline data | 15 | Baseline demographic and clinical characteristics of each group. | 8–9; 11 |
| Numbers analyzed | 16 | Number of participants (denominator) in each group included in each analysis and whether the analysis was by “intention-to-treat”. State the results in absolute numbers when feasible (e.g., 10/20, not 50%). | 8; 14 |
| Outcomes and estimation | 17 | For each primary and secondary outcome, a summary of results for each group, and the estimated effect size and its precision (e.g., 95% confidence interval). | 15–20 (tab. 1, 2; fig. 1) |
| Ancillary analyses | 18 | Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those pre-specified and those exploratory. | NA |
| Adverse events | 19 | All important adverse events or side effects in each intervention group. | NA |
| DISCUSSION: Interpretation | 20 | Interpretation of the results, taking into account study hypotheses, sources of potential bias or imprecision and the dangers associated with multiplicity of analyses and outcomes. | 20–24 |
| Generalizability | 21 | Generalizability (external validity) of the trial findings. | 21–22; 24 |
| Overall evidence | 22 | General interpretation of the results in the context of current evidence. | 20–24 |
NA = not reported in this paper, but please see DeRubeis et al. (2005) for additional details.
1In a separate paper, we examined ratings of therapist adherence and alliance as predictors of subsequent symptom change in this same sample of patients (Strunk, Brotman, & DeRubeis, in press). Competence and adherence are widely regarded in the literature as related but conceptually distinct (Barber et al., 2007), and additional analyses were consistent with this view. We examined whether 9 adherence items (the Cognitive Methods items, which form the factor accounting for the most variance in adherence items) and the 11 competence items reflected one factor or two. Results suggested that a two-factor solution provided a better fit to the data.
2In fact, our expectations were confirmed. When we examined whether competence ratings predicted the slope of symptom change (following the first four sessions through the end of treatment), we failed to find a significant competence by time interaction in a model examining change in HRSD scores and in a model examining change in BDI scores (ps for interaction of competence and time > .2 in each model).
3To facilitate comparison with models tested by Shaw et al. (1999), we also examined competence ratings as a predictor of subsequent outcome while including additional covariates. We tested all models with two sets of covariates: (1) the overall adherence measure used by Shaw and colleagues; and (2) the overall adherence measure and a measure of the working alliance (for more information, see Strunk et al., in press). Models of session-to-session symptom change and longer-term models (for both the BDI and HRSD) were examined. In contrast to the results of Shaw and colleagues, in all models competence ratings failed to predict subsequent outcomes once covariates were entered (all ps > .15).
4In addition to examining whether patients met criteria for a personality disorder, we also examined the total number of personality disorder criteria each patient satisfied at the intake assessment. There was no evidence that the association between competence and outcome differed as a function of the number of personality disorder criteria satisfied. In both the HRSD and BDI models, the interaction of competence and number of personality disorder criteria was non-significant (ps > .5).
5In the two models examining anxiety as a moderator of the relationship between competence and post-treatment symptom severity, higher levels of anxiety were predictive of higher post-treatment symptom severity at the level of a non-significant trend (in the HRSD model: r = .24, t = 1.78, p = .08; in the BDI model: r = .24, t = 1.71, p = .09).
Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/ccp
Daniel R. Strunk, Ohio State University.
Melissa A. Brotman, National Institute of Mental Health.
Robert J. DeRubeis, University of Pennsylvania.
Steven D. Hollon, Vanderbilt University.