Major depressive disorder (MDD) is highly prevalent and associated with disability and chronicity. Although cognitive therapy (CT) is an effective short-term treatment for MDD, a significant proportion of responders subsequently suffer relapses or recurrences.
This design prospectively evaluates: 1) a method to discriminate CT-treated responders at lower versus higher risk for relapse; and 2) the subsequent durability of 8-month continuation phase therapies in randomized higher risk responders followed for an additional 24 months. The primary prediction is: after protocol treatments are stopped, higher risk patients randomly assigned to continuation phase CT (C-CT) will have a lower risk of relapse/recurrence than those randomized to fluoxetine (FLX).
Outpatients, aged 18 to 70 years, with recurrent MDD received 12–14 weeks of CT provided by 15 experienced therapists from two sites. Responders (i.e., no MDD and 17-item Hamilton Rating Scale for Depression ≤ 12) were stratified into higher and lower risk groups based on stability of remission during the last 6 weeks of CT. The lower risk group entered follow-up for 32 months; the higher risk group was randomized to 8 months of continuation phase therapy with either C-CT or clinical management plus either double-blinded FLX or pill placebo. Following the continuation phase, higher risk patients were followed by blinded evaluators for 24 months.
The trial began in 2000. Enrollment is complete (N=523). The follow-up continues.
The trial evaluates the preventive effects and durability of acute and continuation phase treatments in the largest known sample of CT responders collected worldwide.
Although the past 20 years have witnessed the introduction of a number of new antidepressant medications, many people with depression prefer psychotherapy over pharmacotherapy [1–4]. Acute phase cognitive therapy (CT) is the most extensively researched psychological treatment of depression [5–7] as well as the most influential contemporary model of psychotherapy. Moreover, when compared to pharmacotherapy, CT may have the greatest potential to prevent depressive relapse after time-limited therapy. Nevertheless, prophylaxis following CT is not universal; in some studies, relapse and recurrence rates over 50% have been reported within one year of termination of acute phase CT.
To reduce the risk of depressive relapse following acute phase CT, Jarrett developed a continuation phase model of cognitive therapy (C-CT) [10–11] and established its efficacy compared to historical and assessment-only control groups. Jarrett et al. found that C-CT resulted in a significant reduction in the risk of depressive relapse over the 8 months that patients received it. The risk of relapse and recurrence over 24 months was moderated by risk factors, including early age of onset and the pattern of patients’ remission of depressive symptoms during the final weeks of acute phase CT. Specifically, the subset of patients who had a later age of onset or achieved a stable remission (defined below) by the end of 12 weeks of acute phase CT could terminate with a lower subsequent risk of relapse and recurrence. In contrast, those with an earlier age of onset or showing less stable or incomplete remission at the end of acute phase CT were at increased relapse and recurrence risk, unless they received C-CT [13–15]. For those at higher risk, the effect size of C-CT was as large as that observed in placebo-controlled continuation phase therapy studies of FDA-approved antidepressants [9, 16–17].
The rationale for the current 8-month, randomized trial and the linked 2-year follow-up study is to assess the risk of relapse and recurrence following shorter and longer courses of CT and to allow a full investigation of the largest series of patients treated with research-quality CT, as a solitary intervention, studied worldwide¹. The 8-month, randomized clinical trial tested the efficacy of C-CT or active pharmacotherapy (fluoxetine; FLX) plus clinical management compared to pill placebo (PBO) plus clinical management. The additional 2-year follow-up made it possible to determine the extent to which the hypothesized, prophylactic advantage of C-CT (versus fluoxetine) persisted following treatment termination. Further, the findings allow comment on the extent to which the preventive effects of C-CT, while it is received, can actually “convert” the prognosis of the higher risk patients to a level of vulnerability comparable to the lower risk cohort.
The trial and longitudinal follow-up are called the Continuation Phase Cognitive Therapy Relapse Prevention [C-CT-RP] Trial. It was registered at ClinicalTrials.gov (NCT00118404, NCT00183664, and NCT00218764). This relapse prevention trial followed an initial 12–14 week course of acute phase CT for adult outpatients with recurrent major depressive disorder (MDD). Acute phase responders (see Primary Outcomes) were stratified into higher and lower risk groups based on the extent and stability of their remission (see Stratification and Randomization). The randomization focused only on the patients at higher risk for relapse after acute phase therapy and compared the efficacy of 10 sessions within 8 months of C-CT [10, 14] versus clinical management plus pharmacotherapy (double-blinded FLX or identically appearing PBO) in preventing relapse or recurrence over 1 year after protocol treatments were discontinued. While such follow-up lasted 2 years, the statistical power is sufficient to compare higher risk cells over only the first year after protocol treatments were discontinued (see Sample Size Determination). The purpose of the trial is to conduct the first adequately controlled test of the prophylactic efficacy of C-CT among patients at higher risk of relapse and recurrence. We predict that over 20 months post randomization, C-CT will be more effective than FLX in: a) preventing relapse and recurrence and b) consolidating response into remission and recovery.
This research was approved by Institutional Review Boards (IRBs) at each site (see below) prior to initiation and will be reviewed annually through completion. Adverse events have been and will be reported to each IRB and to the project’s Data Safety and Monitoring Board (DSMB). The DSMB used no a priori stopping guidelines, and has voted annually to continue data collection. All potential participants who were seen in research clinics provided written HIPAA authorization and written informed consent for evaluation and treatment. Prior to study entry, participants viewed a standardized videotape concerning the risks of recurrent depression, its treatment, and the requirements of participation in this randomized trial and follow-up.
The patients were recruited by teams in the Department of Psychiatry, Psychosocial Research and Depression Clinic at The University of Texas Southwestern Medical Center at Dallas (Principal Investigator: Robin B. Jarrett, Ph.D.) and in the Mood Disorders Treatment Research Program at the Western Psychiatric Institute and Clinic of the University of Pittsburgh Medical Center (Principal Investigators: Michael E. Thase, M.D. and Edward Friedman, M.D.). Recruitment methods included project promotion through IRB approved ads on the internet and in newspapers, churches, hospitals, clinics, and other community settings. Patients were both self- and practitioner-referred. Patient recruitment lasted from January 3, 2000 through July 30, 2008.
The Principal Investigators (Jarrett & Thase) began meeting in late 1996 to design the study. All three Principal Investigators will continue to meet by telephone regularly (i.e., nearly weekly) until study completion. Similarly, research coordinators and data managers meet by telephone routinely. The sites share data through a protected website and share information through e-mail communication between telephone meetings. Each research procedure has been written in detail to help personnel implement the procedures in a consistent manner across sites.
Consenting male or female outpatients with recurrent MDD, as diagnosed by the Structured Clinical Interview for DSM-IV (SCID-I), were included when they: a) remitted between depressive episodes, had at least one prior episode with complete inter-episode recovery, or had antecedent dysthymic disorder; and b) scored 14 or more on the 17-item HRSD at the initial and second interview. Excluded patients: (a) had severe or poorly controlled concurrent medical disorders that may cause depression or require medication that could cause depressive symptoms; (b) suffered from the following concurrent DSM-IV psychiatric disorders: any psychotic or organic mental disorder, bipolar disorder, active alcohol or drug dependence, primary (i.e., predominant) obsessive compulsive disorder or eating disorders; (c) could not complete questionnaires written in English; (d) were an active suicide risk; (e) had previously not responded to a trial of at least 8 weeks of CT conducted by a certified therapist; (f) had previously not responded to at least 6 weeks of 40 mg of FLX; (g) were pregnant or planned to become pregnant during the first 11 months after intake; or (h) did not provide informed consent. Excluded patients, including those with active suicide risk, were referred for non-protocol treatment (e.g., hospitalization and/or pharmacotherapy).
When a review of medical history necessitated doing so, a physical examination and appropriate laboratory tests were obtained to ensure that patients were diagnostically eligible. Patients who were taking psychotropic medication were withdrawn from the medication and had to be unmedicated for at least one week prior to study entry.
Figure 1 illustrates the randomized, parallel group design in which patients in the higher risk stratum were randomized to one of three arms (1:1:1) for 8 months of continuation phase therapy. Figure 1 also shows that the lower risk stratum was assessed at the same time points but was not part of the randomized design. This study was powered to test the following Primary Hypothesis concerning the higher risk stratum: compared to FLX plus clinical management, C-CT will significantly reduce relapse and recurrence rates in higher risk patients from randomization through one year after continuation phase protocol treatments are stopped (i.e., over 20 months). Secondary hypotheses include: Secondary Hypothesis A: patients who receive either of the active continuation phase treatments (i.e., C-CT or FLX plus clinical management) will have a significantly lower risk of relapse over the 8 months than patients randomized to the PBO plus clinical management arm; Secondary Hypothesis B: over the 8 months that C-CT is in effect, it will “convert” the relapse risk of higher risk patients to a rate comparable to that of the lower risk patients for the same 8 month interval; Secondary Hypothesis C regarding the unrandomized lower risk stratum: the risk of relapse and recurrence over the full 32-month observational period will be significantly less for the lower risk group than for the higher risk comparators.
The sample size was set to test the study’s primary hypothesis. The hypothesized rates of relapse/recurrence were 30% for C-CT and 60% for FLX one year after discontinuation of therapy, based on previous studies [13, 20]. A minimum sample size of 180 randomized patients was necessary to detect a significant difference in a one-sided log rank test where alpha was 0.05 and power was 80%. In order to accommodate the original estimates that 80% of the responders would fall into the higher risk stratum and 35% would drop out during the experimental phase, we estimated that no fewer than 225 responders would need to consent to observation. We estimated that the response rate to acute phase CT would be approximately 50%, which suggested that no fewer than 450 patients would need to consent to participate in acute phase CT. The randomized controlled trial (RCT) was not powered to test Secondary Hypothesis A.
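The enrollment targets above follow from two divisions, which the short Python sketch below reproduces. The function name and the use of integer percentages are illustrative choices, not part of the protocol. The sketch also assumes that the 35% drop-out allowance is already reflected in the 180-patient figure, because the stated totals follow from the other two proportions alone.

```python
def enrollment_targets(randomized_needed, higher_risk_pct, response_pct):
    """Work backwards from the randomized sample to the enrollment targets.

    Percentages are integers (e.g. 80 for 80%) so that the ceiling
    division below is exact and free of floating-point rounding.
    """
    # Responders needed so that higher_risk_pct of them fill the randomized cells.
    responders = -(-randomized_needed * 100 // higher_risk_pct)
    # Patients entering acute phase CT so that response_pct of them respond.
    enrolled = -(-responders * 100 // response_pct)
    return responders, enrolled


# Figures quoted in the text: 180 randomized, 80% higher risk, 50% response.
responders, enrolled = enrollment_targets(180, 80, 50)
print(responders, enrolled)
```

With the figures from the text this yields 225 responders and 450 patients entering acute phase CT, matching the stated minimums.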
As can be seen in Figure 1 and Table 1, the overall study has four phases: (a) initial evaluation, (b) approximately 12–14 weeks of CT, (c) 8 months of continuation phase treatment (higher risk patients) or follow-up only (lower risk patients), and (d) 2 additional years of follow-up free of protocol treatment. Blinded evaluations are conducted every 4 months for 32 months, with the first blinded evaluation falling at the end of acute phase cognitive therapy (see Table 1).
Acute phase CT consisted of a 16–20 session individual protocol conducted according to the treatment manual of Beck et al.; sessions were videotaped. By protocol, a maximum of 14 weeks was permitted to allow missed appointments to be rescheduled. Sessions 1–8 occurred twice weekly. Thereafter, patients who experienced at least a 40% reduction in HRSD-17 scores began weekly sessions and were deemed “early responders”. To maximize the likelihood of response and subsequent participation in the continuation phase trial, patients who experienced <40% reduction in HRSD-17 scores at Session 9 (compared to HRSD-17 scores from the diagnostic assessment) were deemed “late responders” and continued twice weekly sessions for four more weeks. The 40% threshold was chosen based on our clinical experience.
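The branching rule above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the sketch assumes a nominal 12-week course in which late responders finish with four weekly sessions, an inference from the 16–20 session range rather than something the protocol text states outright.

```python
def acute_phase_plan(baseline_hrsd, session9_hrsd, nominal_weeks=12):
    """Return (label, total_sessions) for the acute phase.

    Early responders: at least a 40% drop in HRSD-17 from the diagnostic
    assessment by Session 9, so twice-weekly sessions stop after week 4.
    Late responders: twice-weekly sessions continue for four more weeks.
    nominal_weeks=12 is an assumption, not a protocol constant.
    """
    # Integer form of (baseline - current) / baseline >= 0.40.
    early = (baseline_hrsd - session9_hrsd) * 100 >= 40 * baseline_hrsd
    twice_weekly_weeks = 4 if early else 8
    weekly_weeks = nominal_weeks - twice_weekly_weeks
    total_sessions = 2 * twice_weekly_weeks + weekly_weeks
    label = "early responder" if early else "late responder"
    return label, total_sessions


print(acute_phase_plan(20, 12))  # exactly 40% reduction
print(acute_phase_plan(20, 15))  # 25% reduction
```

Under these assumptions an early responder receives 16 sessions and a late responder 20, which brackets the 16–20 session range given in the protocol.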
Before therapists treated protocol patients, they were required to demonstrate proficiency in (a) CT and C-CT, as defined by the site supervisors’ judgment and by maintaining Cognitive Therapy Scale (CTS) scores above 39 over time; and (b) their clinician ratings for the HRSD-17 and DSM-IV diagnoses of MDD. Experienced faculty led the weekly group supervision sessions at each site and provided, when needed, individual case consultation. Videotaped sessions were selected at random for review in supervisory groups. The CTS was completed by therapy supervisors and their teams to provide feedback to therapists and to assess their competence. Dr. Jarrett attended supervision at the Dallas site and was available to therapists at both sites as needed. Sander Kornblith, Ph.D. served as the on-site CT supervisor for Pittsburgh. The Principal Investigators and the DSMB monitored the quality of the CTS scores across sites and time.
Fifteen therapists participated, including 11 females and 4 males. At the Dallas site all therapists were PhDs; at the Pittsburgh site, three were PhDs, one was an MD, and four were master’s level. Patients were assigned to therapists based on geographic convenience and therapist availability.
Patients attended psychoeducational sessions before CT began and again within approximately 7 days of completing CT Session 11 (i.e., about Week 7 or 8 for early responders; about Week 6 or 7 for late responders). The purpose of these visits was to: (a) provide factual information about the risk of relapse/recurrence in recurrent MDD, (b) review the “road map” of treatment provided in the study, (c) verify continued consent for participation, and (d) collect self-report questionnaires.
On the basis of the final seven acute phase CT HRSD-17 evaluations (including the first blinded evaluation), responders were stratified into lower risk and higher risk groups. Within each site, the higher risk stratum was defined by any of the final seven consecutive HRSD-17 scores of ≥7; the lower risk group had all of the final seven HRSD-17 scores of <7. The higher risk patients were randomized to either C-CT or blinded FLX or PBO plus clinical management (see below). Previous findings by our teams had shown that patients with this trajectory of response were approximately four times more likely to relapse (e.g., 40% risk vs. 10% risk) following termination of acute phase CT than were patients with rapid, complete, and sustained remissions [13, 23]. At the randomization, treatment assignment was also stratified by site, by each patient’s number of prior depressive episodes (i.e., 1 or 2 versus ≥ 3), and by the presence versus absence of antecedent DSM-IV dysthymia.
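As a minimal sketch of the stratification rule just described, the Python below classifies a patient from the final seven HRSD-17 scores, together with the response criterion stated elsewhere in the protocol (no MDE and HRSD-17 ≤ 12). The function names and return labels are illustrative and not taken from the study's own software.

```python
def is_responder(meets_mde_criteria, hrsd17):
    """Response to acute phase CT: no DSM-IV MDE and HRSD-17 of 12 or less."""
    return (not meets_mde_criteria) and hrsd17 <= 12


def risk_stratum(final_seven_hrsd17):
    """Classify a responder from the last seven consecutive HRSD-17 scores.

    Any score of 7 or more places the patient in the higher risk stratum;
    all seven scores below 7 define the lower risk stratum.
    """
    if len(final_seven_hrsd17) != 7:
        raise ValueError("exactly seven HRSD-17 scores are required")
    return "higher" if any(s >= 7 for s in final_seven_hrsd17) else "lower"


print(risk_stratum([6, 5, 4, 6, 3, 2, 4]))  # all scores below 7
print(risk_stratum([6, 5, 7, 6, 3, 2, 4]))  # one score reaches 7
```

Only patients in the higher stratum would then proceed to randomization; the additional stratifiers (site, number of prior episodes, antecedent dysthymia) are not modeled here.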
A study statistician oversaw the design of the stratification and blocking, and used a computer-generated random number sequence to assign patients to cells. The randomization plan was housed at the Pittsburgh site and was only available to the pharmacists at each site. The research coordinators telephoned the pharmacy to obtain instructions to randomize the patient to either C-CT or “medication clinic” (i.e., clinical management plus FLX or PBO). The pharmacists could break the blind (i.e., assignment to FLX or PBO) to non-protocol staff if a medical emergency occurred. Except for the dispensing pharmacists, all research personnel have been kept blinded to assignment to FLX or PBO.
With few exceptions (e.g., therapist maternity leave), the same therapist conducted the acute and continuation phases of CT. Continuation phase CT [10, 14] consisted of 10, 60-minute sessions spread over 8 months (i.e., every other week for the first 2 months, followed by six monthly sessions). The primary aim of C-CT was to prevent relapse and to promote full remission and recovery. Patients were taught to use symptoms and emotional distress to activate their use of coping skills learned in acute phase CT. Symptom-reducing cognitive-behavioral techniques (e.g., cognitive restructuring and behavioral activation) were used; the more symptomatic the patient, the more the therapist structured the session. The therapists and patients collaborated to reduce residual symptoms, improve coping with adversity, decrease the probability of stressful events, and enhance behavioral and cognitive strengths. The therapists helped the patients identify and master the cognitive behavioral strategies that they found most effective in the acute phase for reducing distress and symptoms. They identified cognitive, interpersonal, or emotional vulnerabilities and continued to practice these self-help skills over time, applying them to new challenges as they arose or were anticipated. The patient assumed more responsibility for the session over the course of C-CT. The patient and therapist collaborated in identifying and enhancing patient strengths. The therapist concentrated on promoting generalization and maintenance of (a) compensatory skills that the patient had previously found helpful in symptom reduction and (b) coping strategies to prepare for identified or anticipated vulnerabilities.
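The C-CT session timetable can be laid out with a short Python sketch. It assumes, purely for illustration, that the first session falls two weeks (half a month) after randomization and that the biweekly block contains four sessions; the function name is hypothetical.

```python
def cct_schedule():
    """Return {session_number: months_after_randomization} for C-CT.

    Sessions 1-4 are every other week across the first 2 months
    (assumed at months 0.5, 1.0, 1.5, 2.0); sessions 5-10 are monthly
    at months 3 through 8.
    """
    schedule = {}
    for session in range(1, 5):
        schedule[session] = session * 0.5
    for session in range(5, 11):
        schedule[session] = float(session - 2)
    return schedule


for session, month in cct_schedule().items():
    print(f"Session {session:2d}: month {month}")
```

Under these assumptions the layout gives 10 sessions ending at Month 8, with Sessions 4, 6, 8, and 10 falling at Months 2, 4, 6, and 8, which agrees with the session-to-month pairings used for the assessments in this protocol.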
After an initial 30–45 minute session, patients were seen for 15–30 minute sessions at the same frequency as C-CT. Sessions were conducted by experienced clinicians according to a revised version of the manual by Fawcett et al. Clinicians were not permitted to use any C-CT intervention and instead focused on supportive contact around the signs and symptoms of the illness, beneficial and untoward effects of the medication, and information about depression. Side effect ratings were collected at each session using the Somatic Symptom Scale and scored using the UKU (Udvalg for Kliniske Undersøgelser) method. Fluoxetine was chosen because of established continuation phase efficacy and a low risk of discontinuation syndromes. Research pharmacies at both sites dispensed medication in unit-dose bottles. Patients initially received either FLX 10 mg/day or identically appearing PBO capsules. When necessary, the initial FLX dose could be decreased to every other day to lessen side effects. When there were no dose-limiting side effects, doses of 20 mg and 40 mg/day were prescribed at the second and third visit, respectively. These dosages were chosen on the basis of prior research in which FLX at 40 mg/day offered a balance of efficacy and tolerability [20, 28]. Patients who could not tolerate this dosage remained in the study on the maximally tolerated dose (minimum dose: 10 mg/day). Patients who could not take any dosage of FLX or PBO were followed for supportive care by treating clinicians. During the first six years of the project, fluoxetine and identically appearing placebo capsules were provided by Eli Lilly and Company at no cost to the project. Thereafter, fluoxetine and identically appearing placebo capsules were purchased and formulated by the research pharmacy at the Dallas site.
The integrity of the pharmacotherapy condition was ensured by using experienced clinicians and a standardized supportive clinical management protocol; Dr. Thase was available for consultation as needed. In addition, medication adherence was monitored at each visit by the attending clinician and research pharmacy (i.e., pill counts).
Seven male psychiatrists (two in Dallas and five in Pittsburgh) prescribed double-blinded FLX or PBO.
Lower risk patients were followed for 32 months after the first blinded evaluation (i.e., beginning at the date of randomization). They were seen by a blinded evaluator at 4-month intervals (i.e., Months 4, 8, 12, 16, 20, 24, 28, and 32 or exit). Throughout follow-up, patients in the lower risk stratum agreed to refrain from taking mood altering medications or drugs; the evaluator asked about such consumption at each visit.
A clinical evaluator (masked to risk and treatment assignment) collected symptom severity measures and assessed diagnostic status according to DSM-IV criteria for a major depressive episode using the SCID-I and the Psychiatric Status Ratings (PSR) within the Longitudinal Interval Follow-up Evaluation (LIFE-II). These videotaped evaluations were conducted: (a) at the end of the acute phase (i.e., within approximately 7 days of their last acute treatment session), (b) any time the patient exited the protocol, (c) at Months 4 and 8 of the continuation phase, (d) at Months 12, 16, 20, 24, 28, and 32 during the longitudinal follow-up, and (e) at any other time a relapse or recurrence was suspected. In order to promote complete data collection, patients were paid approximately $20 per blinded evaluation. The following safeguards were used to preserve the blind: (a) each site had at least two evaluators who were “housed” outside of the research clinic, (b) each evaluation began by reminding patients not to disclose treatment assignments, and (c) if the blind was broken for one evaluator, the second collected the remaining assessments.
At the end of the continuation phase, which was 8 months post-acute phase CT, both higher risk and lower risk patients were followed for 2 years (i.e., 32 months in total, with the count beginning at the first blinded evaluation). In Dallas only, higher risk patients were seen at two post-treatment clinic visits (monthly for two months) at the beginning of the longitudinal follow-up. Patients were encouraged to contact study staff if they suspected a relapse or recurrence, at which time a blinded evaluation occurred. When the DSM-IV criteria for a major depressive episode (i.e., MDE) were satisfied according to the blinded evaluator, protocol staff encouraged patients to begin appropriate extra-protocol treatment and assisted them in locating such treatment. Blinded evaluations were conducted as described above. Throughout follow-up, and similar to lower risk patients, the higher risk patients agreed to refrain from taking mood altering medications or drugs. The evaluator asked about and recorded consumption at each visit (see Non-Protocol Treatment). Patients continued in follow-up after relapse and recurrence, regardless of whether they initiated non-protocol treatment.
Relapse and recurrence were the primary dependent variables within the follow-up. The definitions of these and other dichotomous outcomes are as follows:
Response to acute phase CT is defined by an independent evaluator at the end of the acute phase as: (a) the absence of DSM-IV MDE and (b) a 17-item HRSD ≤ 12. This relatively broad definition of response determines eligibility to enter the continuation phase study (for higher risk patients) or follow-up (for lower risk patients).
Stable Remission represents a period of time during which the patient is not fully symptomatic and is the operational definition of lower risk. Stable remission is defined by: (a) the last seven consecutive HRSD-17 scores < 7 during the acute phase or (b) return to “usual self” according to the LIFE, as denoted by six Psychiatric Status Ratings (PSR) ≤ 2 over 6 weeks after randomization, according to the blinded evaluator.
Relapse denotes a continuation or exacerbation of the presenting episode. Relapse is defined by a blinded evaluator as meeting DSM-IV criteria for MDD (i.e., LIFE PSR score of 5 or 6 for 2 consecutive weeks) that occurs before the criteria for recovery (see below) have been met.
Recovery, the end of an episode, is a remission lasting ≥ 8 consecutive months.
Recurrence, a new episode, is defined by a blinded evaluator as meeting DSM-IV criteria for MDD (i.e., LIFE PSR score of 5 or 6 for 2 consecutive weeks), which occurs after recovery.
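The distinction between relapse and recurrence can be illustrated with a small Python sketch that scans a series of weekly PSR scores. Several details are assumptions made for the example: remission weeks are taken to be those with PSR ≤ 2, 8 months is approximated as 35 weeks, and the function and parameter names are hypothetical. In the trial itself these judgments were made by a blinded evaluator, not by an algorithm.

```python
def classify_first_episode(weekly_psr, recovery_weeks=35):
    """Find the first major depressive episode in a weekly PSR series.

    An episode is flagged at the second of 2 consecutive weeks with a
    PSR of 5 or 6. It is a recurrence if a remission run (PSR <= 2) of
    at least recovery_weeks has already occurred, otherwise a relapse.
    Returns (label, zero_based_week_index) or None if no episode occurs.
    recovery_weeks=35 approximates 8 months and is an assumption.
    """
    remission_run = 0
    recovered = False
    for week, psr in enumerate(weekly_psr):
        if psr <= 2:
            remission_run += 1
            if remission_run >= recovery_weeks:
                recovered = True
        else:
            remission_run = 0
        if week >= 1 and psr >= 5 and weekly_psr[week - 1] >= 5:
            return ("recurrence" if recovered else "relapse", week)
    return None


print(classify_first_episode([1] * 10 + [5, 5]))     # episode before recovery
print(classify_first_episode([1] * 35 + [3, 5, 6]))  # episode after recovery
```

The first series yields a relapse because recovery was never reached; the second yields a recurrence because 35 consecutive remission weeks precede the episode.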
To maintain a high level of diagnostic reliability within and across the sites, evaluators (i.e., diagnostic evaluators, blinded evaluators, pharmacotherapists, and cognitive therapists) participated in formal training on the use of the SCID-I. This training consisted of observing and being observed by highly reliable and experienced evaluators until the trainee achieved agreement with ratings, history of illness descriptors, and DSM-IV diagnoses.
After evaluators were trained, the sites completed inter- and intra-site reliability studies on DSM-IV Current Major Depressive Disorder (MDD) diagnoses and the Hamilton Rating Scale for Depression (HRSD-17). The reliability sessions occurred regularly for the duration of the study. Videotapes were randomly selected from blinded evaluations or treatment sessions, and the evaluators on the videotapes were rotated. During each reliability session, evaluators rated the two videotapes (one from each site) in groups; a discussion of discrepant ratings was included. If an evaluator’s score on the HRSD-17 differed by 4 or more points from the group mean (i.e., his/her site’s mean), additional training was provided to allow his or her ratings to converge with the group’s ratings, including those of the senior diagnosticians.
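The retraining rule for discrepant HRSD-17 ratings amounts to a simple deviation check, sketched below in Python. The function name, the rater labels, and the choice to include each rater's own score in the site mean are assumptions made for illustration.

```python
from statistics import mean


def raters_needing_training(site_scores, threshold=4):
    """Return raters whose HRSD-17 total differs from the site mean by
    `threshold` or more points on a single reliability videotape.

    site_scores maps a rater label to that rater's HRSD-17 total.
    """
    site_mean = mean(site_scores.values())
    return sorted(
        rater
        for rater, score in site_scores.items()
        if abs(score - site_mean) >= threshold
    )


# Hypothetical ratings of one videotape by four evaluators at one site.
scores = {"rater_a": 14, "rater_b": 15, "rater_c": 16, "rater_d": 23}
print(raters_needing_training(scores))
```

In this made-up example the site mean is 17, so only `rater_d` (6 points above the mean) would be flagged for additional training.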
Intra-class correlation coefficients for the HRSD-17 and DSM-IV Current MDD diagnoses were also calculated within and across the two sites. The Principal Investigators, study teams, and the DSMB monitored the quality of the ratings across data collection.
The National Institute of Mental Health (NIMH) issued a Certificate of Confidentiality, which protects all study participants’ data until the certificate expires.
As outlined below, 12 domains were assessed according to the frequency detailed below and in Table 2. (Note that below “Week 12” denotes the end of acute phase CT, which can occur between Weeks 12 and 14). Detailed descriptions of the measures and their psychometric properties are available in Appendix 1.
Patient demographics (e.g., gender, age, marital status, education) were collected at the diagnostic evaluation using the “Patient Background” self-report form.
The Current Major Depressive Episode (MDE) section of the Structured Clinical Interview for DSM-IV was administered at the diagnostic evaluation, once during Weeks 4, 8, and 12 of the acute phase and at all blinded evaluations. Higher risk patients received the MDE section at all continuation phase sessions and at two post-treatment clinic visits (monthly for 2 months). Details of past depressive episodes and MDE subtyping for the current episode (e.g., recurrent, atypical) were recorded on the “MDE Specifiers and Past MDE” at the diagnostic evaluation. All other diagnoses were recorded on the “Initial Evaluation Summary.”
The principal symptom severity measures include the Hamilton Rating Scale for Depression (HRSD-17) [30–31], Global Assessment of Functioning (GAF), Inventory of Depressive Symptomatology - Self Report (IDS-SR), and the Beck Depression Inventory (BDI). Sleep disturbance was assessed using the Pittsburgh Sleep Quality Index (PSQI). The reliability and validity of these measures are well established. Clinicians administered the HRSD-17 and GAF and patients completed the BDI and IDS-SR at diagnostic evaluation, follow-up interview, weekly during acute phase treatment, and at all blinded evaluations (i.e., Months 0, 4, 8, 12, 16, 20, 24, 28, and 32 or exit). Higher risk patients completed the BDI and IDS-SR and received the HRSD-17, GAF, and Somatic Symptoms Scale (SSS; to assess side effects) at all continuation phase sessions and at two post-treatment clinic visits (monthly for 2 months). The PSQI was completed at the follow-up interview, CT Session 1, and acute phase CT Weeks 4, 8, and 12. Higher risk patients completed the PSQI at continuation phase Sessions 1, 3, 5, and 6–10 and two post-treatment clinic visits. The PSQI was also completed at the blinded evaluations for Months 12–32 or exit. Higher scores on the HRSD-17, IDS-SR, BDI, PSQI, and SSS indicate more severe symptoms. The closer a GAF score is to 100, the higher the patient’s level of overall functioning. The 17-item version of the HRSD, 30-item version of the IDS-SR, 21-item version of the BDI, 11-item version of the PSQI, and the 24-item version of the SSS were used in this protocol.
Self-report measures of cognitive content included the updated Attributional Style Questionnaire (ASQ), the Dysfunctional Attitude Scale (DAS - Forms A & B), the Beck Hopelessness Scale (BHS), and the Self Control Schedule (SCS). Patients completed these measures at intake (diagnostic evaluation or follow-up interview), at the second psychoeducational visit, and at all blinded evaluations. Higher risk patients also completed the ASQ and DAS at post-randomization Months 2 and 6 during the continuation phase. Higher scores on the two ASQ subscales indicate that the respondent’s attributions for negative events are more stable and more global. Higher scores on the DAS indicate more dysfunctional attitudes. Higher scores on the BHS reflect a sense of hopelessness about the future. A higher score on the SCS connotes a higher degree of learned resourcefulness. The 12-item ASQ, 40-item DAS-A/B, 20-item BHS, and the 36-item SCS were used in this protocol.
Consenting patients participated in a standardized challenge test of cognitive-affective reactivity [40–42] performed at the end of the acute phase, at the end of the continuation phase (Month 8), and at Month 32 or exit. Responders to acute phase CT first completed a mood Visual Analogue Scale (VAS) and either Form A or B of the DAS. The VAS used here measures 100 mm from endpoint to endpoint. The descriptors ‘sad’ and ‘happy’ are located on the left and right sides respectively, with arrows indicating increasing strength of mood associated with greater distance from the center. Participants were instructed to mark an X on the line corresponding to their current mood state. Participants next listened to a piece of music (Prokofiev’s orchestral introduction to the film Alexander Nevsky, entitled “Russia Under the Mongolian Yoke,” re-mastered at half-speed), while they recalled a sad event or memory from their lives. This type of induction, combining elements of music associated with sad mood and autobiographical recall, has been found to be effective in bringing on transient dysphoric mood states [41–42, 44]. After listening to the music, participants completed a second VAS and the alternate form of the DAS. After responding to the final DAS, participants rated a third VAS and were debriefed by the evaluator. The order in which DAS-A and B were presented was randomly assigned. Patients received an additional $10 compensation for each mood induction procedure.
The Social Adjustment Scale-Self Report (SAS-SR), Dyadic Adjustment Scale (DYS; completed only by patients in a committed relationship), and Inventory of Interpersonal Problems (IIP) were completed at intake (diagnostic evaluation or follow-up interview), at Week 1 of acute phase CT, at the second psychoeducational visit, and at all blinded evaluations; higher risk patients also completed these measures at Months 2 and 6 (Sessions 4 & 8) during the continuation phase. A higher score on the SAS-SR represents less social adjustment. A higher score on the DYS reflects positive dyadic functioning from the perspective of one partner in a committed dyad. Higher scores on the IIP reflect greater interpersonal problems. All patients with significant others also completed the Perceived Criticism Scale (PCS) at the follow-up interview, the second psychoeducational visit (approximately Week 7 of acute phase CT), and at all blinded evaluations. A higher score on the PCS reflects a higher degree of perceived criticism. The Psychosocial Functioning component of the LIFE was administered at the end of the acute phase and every 4 months thereafter. Therapeutic alliance was measured with the Working Alliance Inventory (WAI), using both Client and Therapist Versions. The Therapist Version was completed at Weeks 7 and 12 of the acute phase and, for higher risk patients in C-CT, Months 4 and 8 of the continuation phase. The Client Version was completed at the second psychoeducational visit and at the end of the acute phase and, for higher risk patients in C-CT, Months 4 and 8 of the continuation phase. Higher scores on these measures reflect a stronger alliance between patient and clinician. The 56-item SAS-SR, 32-item DYS, 127-item IIP, 2-item PCS, and 36-item WAI-T/C were used in this protocol.
The Schedule for Nonadaptive and Adaptive Personality – 2nd Edition (SNAP-2) is a 390-item true-false self-report inventory developed to assess trait personality dimensions using a combination of content-based and psychometric (primarily factor-analytic) techniques. An additional 50 questions were developed by Clark to form DSM-IV diagnoses. The SNAP has a basic 3-factor structure consisting of three higher order temperament scales and 12 primary trait scales: Negative Temperament (Mistrust, Manipulativeness, Aggression, Self-Harm, Eccentric Perceptions, and Dependency), Positive Temperament (Exhibitionism, Entitlement vs. Detachment), and Disinhibition (Impulsivity vs. Propriety and Workaholism). The SNAP was completed at Weeks 1 and 12 (i.e., first or second and last acute phase CT session), the final continuation phase session (i.e., Month 8), and the final blinded evaluation (i.e., Month 32). The SNAP also contains a TRIN scale; higher scores indicate that the patient is acquiescent or yea-saying, while lower scores indicate that the patient is more denying. The TRIN midpoint is used to determine the Invalidity Index to measure inconsistency in responses. Higher scores indicate more inconsistencies.
The 93-item Quality of Life Enjoyment and Satisfaction Questionnaire (QOLESQ)  was collected at intake and at all blinded evaluations. This self-report measure successfully uncouples assessment of quality of life from syndromal level depressive symptoms. Higher level of enjoyment and satisfaction with life are reflected in higher scores.
All patients completed the 12-item, self-report version of the List of Threatening Experiences (LTE)  at intake, at the second psychoeducational visit, and at all blinded evaluations. The greater the score on this measure, the more traumatic events have occurred.
Patients completed the Skills of Cognitive Therapy-Patient Version (SoCT-P) at the second psychoeducational visit and at all blinded evaluations (i.e., Months 0, 4, 8, 12, 16, 20, 24, 28, and 32, or exit). Therapists completed the Skills of Cognitive Therapy-Observer Version (SoCT-O) at Weeks 7 and 12 of the acute phase and at Months 4 and 8 (sessions 6 and 10) during C-CT. Using the time frame “over the last month,” the patient and therapist rated the patient’s use of compensatory skills presented during CT and rated the likelihood that the patient will relapse or recur. The SoCT-O and SoCT-P measure the extent to which the patient used cognitive therapy skills from the perspective of an observer (i.e., the therapist) and of the patient, respectively. The greater the patient’s skill, the higher the score. The 35-item SoCT-P/O was used to facilitate instrument development and will subsequently be reduced to fewer items.
Beginning in the sixth year, the Patient Attitudes and Expectations form (PAE)  was obtained at the diagnostic evaluation and at Week 12 of the acute phase. Developed for use in the NIMH Treatment of Depression Collaborative Research Program, the PAE form was used to explore how the patient’s perceptions about depression and its treatment relate to changes in the course of illness. The 46-item PAE form was collected in this protocol.
The Psychiatric Treatment History component of the LIFE assesses psychopharmacological and psychosocial treatment sought by the patient and was administered at the end of the acute phase and every 4 months thereafter.
At randomization of the higher risk patients, treatment groups were stratified by (a) site, (b) each patient’s number of prior depressive episodes (i.e., 1 or 2 versus ≥3), and (c) the presence versus absence of DSM-IV antecedent dysthymia. All outcome analyses will follow the intent-to-treat principle and will make use of all available data for all patients who were randomized into the continuation phase trial. We will use Cox multiple regression models to analyze times to relapse/recurrence. (Mixed models with repeated measures will be used for the HRSD-17 and other continuous variables as secondary outcomes.) Analyses of the primary hypotheses will be done using Kaplan-Meier survival curves to investigate main effects of treatment, site, and their interaction; alpha levels will be reported. The validity of the Cox proportional hazards assumption will be assessed using standard diagnostics. If there are no violations of the proportional hazards assumption, we will fit the Cox model and will formally test the primary hypothesis using this model adjusting for important confounders (e.g., site, number of previous episodes, presence of dysthymia, severity group, and therapist effects). Continuous, binary, or “count” outcomes will be analyzed using linear, logistic, or log-linear random effects models, respectively . The random effects models used for these analyses will have fixed effects for group differences (e.g., non-responder vs. intermediate response; C-CT vs. pharmacotherapy) and will allow for random intercept and slopes for patients. Selected interaction effects will also be included and tested as needed. Where appropriate, different imputation strategies involving secondary outcomes and/or covariates will be compared and a sensitivity analysis will be done to assess the bias introduced by the missing data.
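As a rough illustration of the Kaplan-Meier step described above, the following sketch computes a product-limit survival curve from times to relapse/recurrence with right-censoring. It is not the trial's analysis code: the function name `kaplan_meier`, the input layout, and the example follow-up times are hypothetical and chosen only to show the arithmetic.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the relapse-free proportion.

    times  : months from randomization to relapse/recurrence or censoring
    events : True if relapse/recurrence was observed, False if censored
    Returns (time, estimated survival) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)          # patients still under observation
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0             # relapses/recurrences at time t
        n_leaving = 0            # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of remaining well at t
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up data for five patients (months, relapse observed?)
example = kaplan_meier([2, 4, 4, 6, 8], [True, True, False, True, False])
print([(t, round(s, 3)) for t, s in example])
# prints [(2, 0.8), (4, 0.6), (6, 0.3)]
```

Censored patients (those who exit or reach the end of follow-up without relapse) contribute to the risk set up to their last observation but never reduce the estimated survival themselves, which is why the curve changes only at observed event times.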
Model assumptions will be verified using appropriate residual diagnostic analyses. Appropriate sensitivity analyses will be performed to assess model lack-of-fit and need for alternative models.
There were no planned, unmasked, interim analyses of the primary hypothesis.
While there is no cure for depression, cognitive therapy is the most frequently studied form of psychotherapy for depression and is particularly distinguished by its potential to have a more durable or enduring effect. Whereas the therapeutic effects of antidepressants gradually dissipate after treatment is discontinued, results of a comparative meta-analysis suggest that CT may have more durable effects . However, too often prophylaxis following a time-limited course of CT is incomplete and may fade with time, leaving a significant number of CT responders at increased risk for relapse and recurrence. The field needs to better understand which parameters influence when and how long CT prophylaxis can last.
To our knowledge this study will yield the largest cohort of CT patients ever collected and will follow longitudinally the largest group of CT responders up to 2 years post-treatment. Additionally, to date this is also the only study that applies a preventive model of CT focused on relapse/recurrence [10–11] in a higher risk group of patients with recurrent MDD. Further, this study is the first to evaluate (prospectively) the efficacy of continuation phase pharmacotherapy (FLX) after incomplete remission with acute phase CT. The results have the potential to help patients and their families by identifying effective treatment sequences that prevent relapse and recurrence and restore patients with recurrent depressive disorder to full functioning. The follow-up is important because it will allow comment not only on C-CT’s preventive effect on relapse (while patients receive it) but also on recurrence (after it is discontinued).
The primary prediction is that the risks of relapse and recurrence will be lower for patients receiving C-CT than for those receiving FLX from randomization through the first year after therapies were discontinued. A secondary prediction is that relapse rates for C-CT and FLX will not differ but will be lower than that for PBO over the 8 months of continuation phase therapy. We also predict that the risk of relapse and recurrence (a) for the lower risk group will be significantly less than that of the higher risk group and (b) for the higher risk patients undergoing “active” C-CT (i.e., over the 8 months that patients receive it) will convert to a rate comparable to that of the lower risk patients for the comparable 8-month time period.
This study is unique in that it will establish indications for a continuation phase model of psychotherapy, designed to prevent relapse and improve longer-term outcomes. Such innovations in treatments and their selection are essential for improving the course of patients’ depressive illnesses, given that many people either refuse to accept and/or fill an initial prescription or rapidly discontinue use of the antidepressant medication . Such results are extremely important to the U.S. public at a time when citizens are asking pointed questions of the pharmaceutical and health care industries regarding benefit/risk appraisals, particularly in light of the FDA’s warnings about the associated risk of suicidal behavior in children, adolescents, and young adults [57–58], the recent reports about the numbers of unpublished failed or negative trials of antidepressant medications, and the relationship of such publication bias to pharmaceutical industry funding .
Once the trial is completed, the results have the potential to be used prospectively by therapists, to lead to straightforward treatment decisions that significantly improve outcomes, to yield clinically meaningful reductions in relapse and recurrence risk, and to clarify how long preventive effects endure. Such empirically based decision rules could lower the cost of care by providing, on an individualized basis, the right amount of treatment: neither “over-treating” patients at lower risk nor “under-treating” those patients at higher risk. Such data from this ongoing, randomized clinical trial and follow-up have important implications for public health.
This report was supported by Grants Number K24-MH001571, R01 MH-58397, R01 MH-69219 (to Robin B. Jarrett, Ph.D.) and R01 MH-58356 and R01 MH-69618 (to Michael E. Thase, M.D.) from the National Institute of Mental Health. We appreciate the support of our NIMH Program Officer, Jane Pearson, Ph.D., throughout this investigation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health (NIMH) or the National Institutes of Health (NIH). We also wish to acknowledge the unrestricted support of Eli Lilly and Company, who provided the fluoxetine and matched pill placebo until 2006. Thereafter, study materials were purchased and prepared to appear identical for both sites by the pharmacy at the University of Texas Southwestern Medical Center at Dallas.
We are grateful to our patients who made this trial possible. We appreciate the dedication and longstanding commitment of our research teams and our many colleagues (see Appendix II) at the University of Texas Southwestern Medical Center at Dallas, the University of Pittsburgh (where Dr. Thase was located during patient accrual), and the University of Pennsylvania. We also appreciate the diligence of the members of the Data Safety and Monitoring Board (see Appendix II).
The patient uses this form to report demographics (i.e., age, gender, ethnicity, marital status, etc.), use of medications, drugs, and alcohol, medical history, and reproductive status.
The SCID-I/P w/PSY SCREEN is a structured interview organized to assess Diagnostic and Statistical Manual (DSM-IV) criteria for each disorder, including Major Depressive Disorder (MDD), in a systematic fashion. Reliability between raters using the previous version of the SCID-I was assessed with Fleiss’ modification of the kappa statistic to account for chance agreement and variation in rater pairs. The overall kappa was 0.74 (p < 0.01), with kappa for Major Depressive Disorder being 0.72 (p < 0.05).
The LIFE is a semi-structured interview with three sections: Psychiatric Status Ratings, Psychosocial Functioning Interview, and Treatment History. Ratings are collected retrospectively in weekly (Psychiatric Status Ratings and Treatment History) or monthly (Psychosocial Functioning) intervals. The LIFE has been used in numerous studies investigating the course of Major Depressive Disorder, including the NIMH Collaborative Study on the Psychobiology of Depression. Keller et al. reported excellent reliability for week-by-week Psychiatric Status Ratings of major episodic affective illnesses with intraclass correlation coefficients (ICCs) of at least .90. Reliability for other disorders evaluated using the Research Diagnostic Criteria was in the good range with ICCs of at least .70. In general, reliability among psychosocial items was high, with quantitative items showing excellent reliability. While all items showed acceptable reliability, reliability was higher for items measuring current functioning than for items measuring past functioning. Warshaw, Keller, and Stout reported good to excellent agreement on minimum and maximum Psychiatric Status Ratings in the first 8 weeks of the follow-up interval and for the 8 weeks preceding the interview. Test-retest reliability for 12-month Psychosocial Functioning yielded ICCs from 0.57 to 0.81. Summary measures derived from Psychiatric Status Ratings were correlated with patients’ Global Assessment of Functioning Scale (GAS) and the emotional function summary scores from the Medical Outcome Survey (MOS), demonstrating good external validity (r = −.45 to −.57). For Treatment History, Warshaw, Keller, and Stout reported very good agreement on whether or not a medication was received, with kappas ranging from 0.85 to 1.0. Dosage also yielded good to excellent agreement, with ICC of 0.80 for the first week of the follow-up interval and 0.79 for the last week of the follow-up interval.
Agreement on psychosocial treatments was good with kappas from 0.68 to 0.88, depending on treatment modality. Researchers have shown that the LIFE is a reliable instrument for use in longitudinal studies, when used with an intensive training and rater monitoring system .
The LIFE Psychiatric Status Ratings track the longitudinal course of DSM-IV Axis I psychiatric disorders. Major Depressive Disorder is rated on a 6-point scale according to duration and severity with higher scores indicating greater depressive symptomatology. Scores of 5 and 6 indicate that the disorder is present according to DSM-IV criteria. Other DSM-IV diagnoses are rated on a 3-point scale with a 3 indicating the presence of a disorder.
The LIFE Psychosocial Interview assesses the longitudinal course of psychosocial functioning. Patients identify their primary roles (e.g., full-time employed, part-time employed, unemployed, homemaker) and how well they perceive themselves to be performing in the areas of work, household duties, and student activities. This measure also assesses their quality of interpersonal relationships, changes in marital status, sexual satisfaction, global satisfaction, involvement in recreational activities, overall social adjustment (as rated by the patients), and the clinician’s assessment of the patient’s level of social adjustment. Items assessing impairment or dissatisfaction are rated on a 5-point scale with higher scores indicating greater impairment or dissatisfaction. Most items assessing quality of interpersonal relationship are rated on a scale from 0 to 7. A score of 0 indicates that the relationship is not applicable. Scores of 6 and 7 indicate variability in the quality of relationship among relationships in the category (e.g., patient has a good relationship with one sibling and a poor relationship with another). Scores from 1 to 5 indicate a continuum of quality of relationship with higher scores representing poorer quality.
The LIFE Psychiatric Treatment History assesses psychopharmacological and psychosocial treatment sought by the patient. Psychopharmacological treatment is coded for the type of medication, the psychiatric or psychological diagnosis or problem for which treatment was sought, and the dosage of the medication taken per day. Psychosocial treatments are recorded for the type of treatment sought (e.g., marital therapy, individual therapy, self-help), the psychiatric or psychological diagnosis or problem for which treatment was sought, the frequency of treatment, and who was the focus of treatment (e.g., patient, patient’s child). Any psychiatric treatment that patients reported receiving outside of the protocol was recorded on the LIFE.
The HRSD  is a clinician rating scale designed to assess the severity of depressive symptoms in patients already diagnosed with MDD; higher scores reflect greater symptom severity. Each of the 17 items is rated by the clinician on either a 3- or 5-point scale, and the total score is determined by summing the item scores. Scores >24 indicate severe depression (typical of inpatients), while scores <17 suggest mild symptoms (more typical of outpatients), and scores <6 are considered to indicate an absence of depressive symptoms. With highly trained raters, the HRSD and similar specific depression symptom measures have been found to have a good inter-rater reliability (r = 0.85) . Similarly, Kobak et al.  found that raters who were experienced and calibrated had a higher inter-rater reliability (r = 0.93) than those who were inexperienced or those who were experienced but not calibrated. There are few data on the internal consistency of the measure, but Schwab, Bialow and Holzer  found that individual items correlated with total score 0.45 to 0.78. Regarding validity, Knesevich, Biggs, Clayton, and Ziegler  found HRSD change scores to be correlated 0.68 with global change scores, and numerous studies have shown significant differences in the HRSD scores of normal controls and patients with depression, supporting its criterion validity. The measure has also shown good convergent validity with other clinical self-report depression measures (rs = approximately 0.83 and 0.70, respectively) . The 17-item version of the HRSD was scored in this protocol.
Perhaps the most widely used measure of depressive symptoms is the BDI. This 21-item self-report measure was designed to assess the intensity of depressive symptoms in psychiatric patients and to detect depressive symptoms in normal populations. Items are rated on a 0- to 3-point scale with higher numbers indicating greater symptom severity. Cut-off scores are as follows: less than 10, none or minimal depression; 10–18, mild-to-moderate depression; 19–29, moderate-to-severe depression; and 30–63, severe depression. Beck, Steer and Garbin summarized 25 years of research on the BDI. Across studies using various subject populations, they found an average internal consistency of 0.87 (range = 0.76 to 0.95) and an average short-term (<1 month) test-retest reliability of 0.60 (range = 0.48 to 0.86). They also reported considerable evidence supporting the validity of the BDI. Correlations between BDI scores and clinical ratings of depression in psychiatric patients ranged from 0.55 to 0.96 (M = 0.72). The BDI also has good convergent validity with the HRSD (range = 0.61 to 0.87) and with other self-report measures of depressive symptoms. In 1996, the original BDI was updated as the BDI-II. The original 21-item BDI was used in this protocol.
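The cut-off scores listed above can be written as a small scoring routine. This is only a sketch of the arithmetic described in the text; the function names `bdi_total` and `bdi_band` and the example responses are hypothetical and are not part of the protocol.

```python
from typing import Sequence


def bdi_total(item_scores: Sequence[int]) -> int:
    """Sum the 21 BDI items, each rated 0 to 3."""
    if len(item_scores) != 21:
        raise ValueError("the original BDI has 21 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be rated 0, 1, 2, or 3")
    return sum(item_scores)


def bdi_band(total: int) -> str:
    """Map a BDI total (0 to 63) to the severity bands given in the text."""
    if not 0 <= total <= 63:
        raise ValueError("BDI totals range from 0 to 63")
    if total < 10:
        return "none or minimal"
    if total <= 18:
        return "mild-to-moderate"
    if total <= 29:
        return "moderate-to-severe"
    return "severe"


# Hypothetical respondent who rates every item 1
responses = [1] * 21
print(bdi_total(responses), bdi_band(bdi_total(responses)))
# prints 21 moderate-to-severe
```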
The IDS  was designed to measure specific signs and symptoms of depression in both inpatients and outpatients. This 28-item questionnaire was developed as both a self-report measure (IDS-SR) and a clinician rating scale (IDS-C). Each item is rated on a 0- to 3-point scale with higher scores representing greater severity. Twenty-six of the 28 items contribute to the total score, which can range from 0 to 78. The instrument was designed to assess vegetative symptoms, cognitive changes, mood disturbance, endogenous symptoms, and anxiety symptoms. Mean scores for the IDS-SR were 36.5 for patients with depression, 21.8 for other diagnostic groups, and 2.1 for normal controls. These values differ significantly and support the construct validity of the measure. The internal consistency reliability of the IDS-SR and IDS-C are high with Cronbach’s alpha (α) ratings of 0.85 and 0.88, respectively. The self-report version was highly correlated with the BDI (r = 0.78) and the HRSD (r = 0.67), as was the clinician rating, with correlations of 0.61 with the BDI and 0.92 with the HRSD. An advantage of the IDS is that it allows assessment of some atypical features. The Dallas group revised the IDS to include the previously omitted symptoms of leaden paralysis and rejection sensitivity, bringing the total number of questions to 30. Trivedi et al.  found that the IDS-SR-30 has high internal consistency at α=0.92. They also found that item-total correlations were highest for major symptoms like sad mood, anxious mood, reactivity of mood, quality of mood, concentration/decision making, self-outlook, future outlook, involvement, energy/fatigability, and pleasure/enjoyment. The 30-item IDS-SR was used in this protocol.
The GAF is a standardized assessment of overall functioning, included as Axis V in the DSM-IV. Scores may range from 1 to 100, with descriptive anchors at each 10-point interval. Depressed outpatients typically score between 40 and 65 before treatment, and 65 to 80 after responding to treatment. A high score on the GAF indicates good functioning, whereas a low score indicates poorer functioning with increased impairment. The GAF is highly correlated with its predecessor, the Global Assessment Scale. Vatnaland et al. found high inter-rater reliability, with intraclass correlation coefficients of .81 to .85 among researchers who rated patients. Recent research has confirmed that frequent training and rater calibration are important to increase reliability of GAF scores [21–22].
The PSQI is an 11-item self-report inventory that is reliable and sensitive to change. Each item is scored from 0 (not during the past month) to 3 (3 or more times each week). Carpenter and Andrykowski examined the psychometric properties of the PSQI and reported that Cronbach’s alphas were consistent across 472 patients (bone marrow transplant, n=155; renal transplant, n=56; breast cancer, n=102; benign breast problems, n=159). Specifically, the global PSQI score had a Cronbach’s alpha of .80, while the Cronbach’s alpha for the items related to sleep disturbance ranged from .70 to .78. Symptom clusters (components) were formed by grouping individual items that assessed a specific component of sleep quality. Correlations between global scores and symptom cluster (component) scores were moderate to high, suggesting good internal consistency. Global to symptom cluster (component) score correlations were lower (moderate) for sleep disturbance and daytime dysfunction (r = .42–.60). The correlations between global scores and specific cluster (component) scores, including sleep quality, sleep latency, and habitual sleep efficiency, were high, ranging between .70 and .83. A higher score indicates worse sleep quality. The 11-item PSQI was used in this protocol.
The SSS  is a 24-item checklist the clinician uses to record the patient’s self-report (i.e., perception) of side effects from medication and/or the patient’s perceived severity of somatic symptoms. The first 23 items assess somatic symptoms and include constructs such as dizziness, blurred vision, sexual difficulties, skin rash, and increased perspiration. The items are rated on a 3-point scale (0 = not present; 1 = present, but tolerable; 2 = present and causing significant distress or incapacity). Thus, higher total scores on the SSS indicate a greater number and severity of symptoms. The last item assesses drug and alcohol use since the previous visit. The 24-item SSS was used in this trial.
The original ASQ assesses the extent to which the respondent attributes the cause of six positive and six negative events to factors which are internal versus external, stable versus unstable, and global versus specific. The reliability of the three original attributional dimensions was modest, which led to the development of the Expanded Attributional Style Questionnaire (EASQ). This scale does not include any positive events and increases the number of negative events assessed from 6 to 24, which improves the reliability of the three attributional dimensions. However, the EASQ is quite lengthy, making completion time-consuming. Thus, a shortened version of the ASQ was created which also preserves the improved psychometric properties of the EASQ. The new version of the ASQ covers 12 negative events, yields scores for stability and globality, and has been shown to have good reliability and validity. The new shortened version of the ASQ with 12 items was used in this trial. While previous versions of the ASQ used scales of 1 to 7 for scoring, this 12-item version uses a scale of −3 to 3 to clearly illustrate the bipolarity of these factors. Higher scores on the two ASQ subscales indicate that respondents’ attributions are more stable and more global. The internality factor that was used on previous versions of the ASQ was deleted since it was the least reliable factor. Although the 12-item version of the ASQ has not been used frequently, internal consistencies of .80 and .76 were initially found for stability and globality factors. Similar results were found by Riso et al. High scores on the 12-item ASQ have been correlated with both acute and chronic depression and lower than average rehabilitation in cardiac patients. Many studies claim to have used the ASQ when in fact they used one of several other versions, which makes it difficult to determine the reliability of the measure across different populations [32–33].
The construct validity of the ASQ has been supported in reviews [34–35].
The Dysfunctional Attitudes Scale (DAS - Forms A and B) is a 40-item self-report measure of “silent assumptions” (i.e., a measure of severity of dysfunctional thoughts). Statements regarding perfectionist standards, concern about approval from others, requirements for being happy, and feelings of inadequacy are rated on a 7-point Likert scale ranging from “agree very much” to “disagree very much.” Higher scores indicate a greater number and severity of dysfunctional attitudes. Students with depressive symptoms (BDI>10) had an average score of 130.26 (SD = 29.60), while the average score for students without depressive symptoms was 114.46 (SD = 25.10). The mean score for women diagnosed with depression using the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) was 157.5 (SD = 38.2); women who never experienced depression had a mean score of 96.9 (SD = 24.2). The DAS successfully distinguishes depressed inpatients from nondepressed clinical controls. High internal consistency scores have been reported (e.g., .93 to .96) [39–40]. Test-retest reliability correlations of .73 over 6 weeks and .84 over 8 weeks have been obtained. Both a lack of change on DAS scores during treatment and high DAS scores have been linked to shorter time before relapse [42–43]. The validity of the DAS is supported by its correlation with BDI scores (e.g., .31 to .49) [39, 41, 44–47]. In addition, DAS scores differ significantly between psychiatric patients with and without depression and volunteers without depression. The concurrent validity of the DAS is supported by its correlations of .36 and .47 (p’s < 0.001) with two measures of depressive symptoms (D-Scale of the Profile of Mood States [POMS] and the BDI) in a student population. Although the validity and reliability of the DAS are strong, there is continued debate regarding the factor structure of the scale and whether it is consistent across different communities [50–51].
The DAS has been translated to Chinese, Japanese, Norwegian, and Dutch and is found to have good reliability and validity in communities using those languages, supporting the utility of the measure across cultures and in heterogeneous populations [52–54]. The 40-item DAS was used in this protocol.
The BHS was designed to measure an individual’s negative expectancies about the future [55–56]. It is derived from an earlier hopelessness measure called the Generalized Expectancy Scale. The scale consists of 20 true/false items. Nine of these statements are keyed false, and 11 are keyed true. A score of 0 or 1 is given to each answer; the higher the score, the greater the degree of hopelessness. A score of 3 or less indicates no hopelessness, and a score of 4–8 indicates only mild hopelessness. Scores of 9–14 points are considered moderate hopelessness, and scores of 15 points or above are considered severe hopelessness. High levels of hopelessness on the BHS are significantly associated with greater severity of depression in adults and children. Individuals with scores in the moderate to severe range have an increased potential for suicide, although a review found that the degree of that risk is smaller than originally believed. Interestingly, the correlation between high hopelessness scores and increased potential for suicide is not found among individuals with alcoholism. A population study in Finland with 1722 respondents found that individuals are more likely to have a high hopelessness score if they believe their financial situation, health, or working ability is poor. The scale is widely used and considered reliable, with high internal consistency (Kuder-Richardson-20 reliability coefficient of 0.93). In a non-clinical population, the internal consistency was 0.88. The 20-item BHS was used in this protocol.
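The keyed true/false scoring described for the BHS can be sketched as follows. The answer key in the example is a placeholder with 11 items keyed true and 9 keyed false; it is not the published BHS key, and the function names `bhs_total` and `bhs_band` are hypothetical. The upper band is taken here to run from 15 to 20.

```python
from typing import Sequence


def bhs_total(responses: Sequence[bool], keyed_true: Sequence[bool]) -> int:
    """Count answers given in the keyed (hopeless) direction.

    responses  : the respondent's true/false answer to each of 20 items
    keyed_true : for each item, True if 'true' is the keyed answer
    """
    if len(responses) != 20 or len(keyed_true) != 20:
        raise ValueError("the BHS has 20 items")
    # Each item scores 1 when the answer matches its key, otherwise 0
    return sum(1 for answer, key in zip(responses, keyed_true) if answer == key)


def bhs_band(total: int) -> str:
    """Map a BHS total (0 to 20) to the ranges described in the text."""
    if not 0 <= total <= 20:
        raise ValueError("BHS totals range from 0 to 20")
    if total <= 3:
        return "none"
    if total <= 8:
        return "mild"
    if total <= 14:
        return "moderate"
    return "severe"


# Placeholder key (NOT the real item key): 11 keyed true, 9 keyed false
placeholder_key = [True] * 11 + [False] * 9
all_true = [True] * 20          # hypothetical respondent answering 'true' throughout
score = bhs_total(all_true, placeholder_key)
print(score, bhs_band(score))
# prints 11 moderate
```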
The Self-Control Schedule assesses an individual’s tendencies to use self-control methods to solve behavioral problems . It is considered a measure of learned resourcefulness. The 36 items on the SCS assess four categories of self-control skills: the use of cognitive strategies, the use of problem-solving strategies, the ability to delay gratification, and a general belief in one’s ability to regulate internal events. Each of the items is rated on a 6-point Likert scale ranging from +3 (very characteristic of me) to −3 (very uncharacteristic of me). A higher score on the SCS connotes a higher degree of self-control. A four-week test-retest correlation (r=.86) indicated that the measure is fairly stable . Satisfactory internal consistency scores ranging from .78 to .86 have repeatedly been reported [65–69]. While SCS scores alone have not always been predictive of response to cognitive therapy , patients with high scores combined with more severe depressive symptoms did respond to cognitive therapy . Additionally, high SCS scores have been correlated with confidence in students  and with positive cognitions in elders . High scores on the SCS have also been negatively correlated with scores on the BDI . The SCS has been translated to Chinese and Thai and has been found to have good reliability and validity in communities using those languages, supporting the utility of the measure [73–74]. The 36-item SCS was used in this protocol.
Described above, the DAS was used in this protocol to assess dysfunctional attitudes before and after a mood induction procedure.
The Visual Analogue Scale (VAS) is a Likert-type scale used to assess mood. Respondents designate their current mood on a scale measuring 100 mm end to end. While the original VAS was 156 mm end to end [43, 75], many investigations, including the current protocol, have used a 100 mm long scale [76–79]. It is important to note that available psychometrics involve both lengths of scales. The descriptor ‘sad’ is located to the left of the center, and ‘happy’ is located to the right. Arrows indicate that increasing strength of mood is associated with a greater distance from the center . Patients are asked to mark their current mood with an “X” on the line. The mean mood change for formerly depressed patients who completed the VAS pre- and post-mood induction at a length of 152 mm has ranged from −24.6 to −25.48 [43, 75]. Williams et al.  found that self-described end anchors may increase the reliability of any type of linear scale, since different people may have different definitions of fixed end anchors. Results, however, suggest that fixed anchors (like the ones used in this study) are just as effective and easier to use [82–83].
Global relationship satisfaction was measured by the DYS . The DYS is a 32-item self-report measure assessing both the quality of marriages or similar dyads and satisfaction with that relationship. The DYS is likely the most widely used measure of dyadic adjustment in research . Scores on the scale ranged from 0 to 151 with mean scores of 114.8 (SD = 17.8) for married couples, 70.7 (SD = 23.8) for divorced couples, and 101.5 (SD = 28.3) for the total sample . Higher scores reflect greater dyadic adjustment. A score of less than 97 is commonly used as a criterion of dyadic discord (i.e., one standard deviation below the mean scores of married couples) . Total DYS scores have regularly differentiated distressed and non-distressed couples [86–87]. Spanier reports an internal consistency reliability of 0.96 (Cronbach’s coefficient α) and a meta-analysis of studies using the DYS found a mean internal consistency reliability of .915 across a wide range of relationship types . In these studies, reliability was not substantially influenced by marital status, ethnicity, sexual orientation, or gender. Spanier  found evidence for content, criterion-related, and construct validity of the DYS via comparisons with the Locke-Wallace Marital Adjustment Test (MAT) , factor analysis of the scale, and comparisons of normal and divorced couple groups. The 32-item DYS was used in this protocol.
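The discord criterion described above is simple arithmetic on the married-couple norms, and a short sketch makes it explicit. The Python below is illustrative only: the constant and function names are hypothetical and are not part of the protocol, and the numeric values are the norms quoted in the text.

```python
# Married-couple norms for the DYS total score, as quoted in the text.
MARRIED_MEAN = 114.8
MARRIED_SD = 17.8

# Criterion for dyadic discord: one standard deviation below the married mean.
# Rounding to one decimal avoids floating-point noise in the subtraction.
DISCORD_CUTOFF = round(MARRIED_MEAN - MARRIED_SD, 1)  # 97.0


def is_discordant(dys_total: float) -> bool:
    """Return True when a DYS total score falls below the discord cutoff.

    Hypothetical helper for illustration; totals are assumed to lie in the
    0-151 range reported for the scale.
    """
    if not 0 <= dys_total <= 151:
        raise ValueError("DYS total score must be between 0 and 151")
    return dys_total < DISCORD_CUTOFF
```

Under this sketch, a total of 96 would be flagged as discordant, whereas a total of exactly 97 would not, because the criterion is "less than 97".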
The original SAS-SR is a 42-item self-report measure of instrumental and expressive role performance. The items are rated on a 5-point scale, with higher scores indicating greater impairment (i.e., poorer adjustment). Mean internal consistency (coefficient α) of 0.74 and mean test-retest reliability of 0.80 across two different time periods have been reported. The concurrent validity of the SAS-SR has been demonstrated by Weissman and colleagues, who reported significantly different scores for psychiatric patients versus community controls, as well as for acute versus recovered depressive patients. In addition, the SAS-SR has been found to be significantly correlated with clinical ratings such as the HRS-D and the Raskin Depression Scale (RDS), as well as self-report measures such as the Center for Epidemiological Studies - Depression Scale (CES-D) and the Symptom Checklist (SCL-90). The original SAS-SR has been translated into Japanese and modified for older adults, ages 63 to 87. In addition to the original version, two other versions have been developed: a) the Social Adjustment Scale II (often completed by schizophrenic patients) [96–97], and b) an abbreviated version of the SAS II, the Social Adjustment Scale for the Severely Mentally Ill (SAS-SMI), used for patients so diagnosed. The 56-item SAS-SR was used in this study.
The IIP  is a 127-item self-report measure of distress associated with interpersonal difficulties. Respondents are presented a list of common problems and are asked to consider if each has been present in their relationships with significant others. They rate the level of distress associated with each problem on a 5-point Likert scale from 0 (not at all distressing) to 4 (extremely distressing); thus higher scores signify greater levels of distress. Interpersonal problems are divided into subscales indicating difficulties in being assertive, sociable, submissive, intimate, or too responsible or controlling. Mean scores for each subscale correspond to the level of distress associated with that area and can be compared to the norms derived from nonpsychiatric or psychiatric populations [99–100]. Ten-week test-retest reliability for the overall mean score was 0.98 (Horowitz et al., 1988). Internal consistency for the subscales ranged from .82 to .94, and their 10-week test-retest reliability ranged from 0.80 to 0.90 . Concurrent validity of the IIP is supported by predicted associations of IIP personality categories with other assessments, such as the Symptom Checklist-90R (SCL-90-R)[101–102] and therapist evaluations of patients’ personalities . The IIP is also sensitive to change resulting from psychotherapy . Lower scores on the general subscales and circumplex subscales are indicative of higher levels of that trait. For example, a low score on the assertive scale indicates higher levels of assertiveness. The 127-item IIP was used in this protocol.
The PCS is a measure of expressed emotion (EE) that consists of a two-question, 10-point, Likert-type scale which asks the subject to rate (a) “How critical is your spouse of you?” and (b) “How critical are you of your spouse?”. Its authors found that ratings of the spouse’s level of criticism were a significant predictor of relapse in depressed patients. The scale also demonstrated good 3-month test-retest reliability. PCS patient ratings were significantly correlated with total EE as assessed by the Camberwell Family Interview, although not with the criticism subscale. A rating of high perceived criticism does not appear to be a proxy measure of symptom severity, as scores are not correlated with scores on the Beck Depression Inventory (BDI), Present State Exam [104, 106], Hamilton Rating Scale for Depression (HRSD) [10, 107], Global Assessment of Functioning (GAF), Symptom Checklist 90-Revised (SCL-90-R) [108–109], or the Inventory to Diagnose Depression (IDD). A review of EE assessments recommended the PCS where a brief measure of EE is needed. A higher score on this scale reflects a higher degree of criticism. The two-item PCS was used in this study.
The WAI is a 36-item self-report measure designed to assess the quality of the alliance between a therapist and patient. There is a client version and a therapist version of the WAI. The model of alliance assessed is based on Bordin’s tripartite conceptualization of alliance: bonds, goals, and tasks. The patient or therapist is asked to rate each of the items on a 7-point Likert scale. Each questionnaire yields a composite working alliance score and task, goal, and bond subscale scores. For the two versions of the WAI, Cronbach’s alpha has ranged from 0.87 to 0.93, and the WAI has good convergent, concurrent, and predictive validity. Interestingly, scores on the client version mediated the relationship between interpersonal functioning and outcome from treatment with cognitive behavioral therapy of depression. Results from a meta-analysis suggested that the WAI has adequate reliability and predicts outcome; on that basis, its authors recommended the WAI as the measure of choice for therapeutic alliance. A higher score on this measure reflects a higher level of alliance between patient and clinician. The 36-item WAI was used in this protocol.
The SNAP-2 is a 390-item factor-analytically derived self-report inventory, with a true-false format, that assesses (a) 15 traits relevant to personality disorders (PDs), (b) the 10 DSM-IV personality disorders, and (c) six potentially invalidating response styles (i.e., validity scales) [116–117]. The original SNAP item pool targeted clusters of PD criteria, whereas later developments were guided by psychometric as well as content considerations. The scales form three higher order factors (Negative Temperament, Positive Temperament, and Disinhibition [vs. Constraint]) that have been identified repeatedly in factor analyses of self-report scales (e.g., [118–120]). In terms of the well-known Five-Factor Model (FFM) of personality, the SNAP factor structure is at a higher level of the personality hierarchy: Negative and Positive Temperament correspond to Neuroticism and Extraversion, respectively, whereas Disinhibition represents a combination of (low) Agreeableness and (low) Conscientiousness. Openness is minimally represented in the SNAP because it has not been found to play a major role in PD. Each factor contains a “core temperament” scale plus several more specific trait scales. Specifically, the higher order Negative Temperament factor is comprised of the core Negative Temperament scale, plus Mistrust, Manipulativeness, Aggression, Eccentric Perceptions, Self-harm, and Dependency. The Positive Temperament factor is comprised of the core Positive Temperament scale, plus Exhibitionism, Entitlement, and (on the low end) Detachment. The Disinhibition factor is comprised of the core Disinhibition scale plus Impulsivity, with Propriety and Workaholism on the other end.
Chapters 3 and 4 of the SNAP Manual contain extensive psychometric data on the scales. The scales are internally consistent over five samples (college student, community adult, and three patient samples; median alpha coefficient = 0.81; range = 0.63 to 0.90; Table 3.17) and have good short-term stability in a community adult sample (median retest r = 0.87 over 1 week to 4½ months; Mdn = 7 weeks; Table 3.18). The median pre-post retest rs were .70 and .68 in two treatment samples (Table 3.19). Moreover, the 12 primary scales are much more independent than those of other instruments assessing characteristics of PDs: only 5 of 264 interscale correlations were stronger than |0.50| across college student, community adult, and two patient samples (Mdn = |0.20| in both patient and non-patient samples; Tables 3.20–3.21). Conversely, supporting the construct validity of the three higher order scales, the primary scales are moderately to strongly correlated with their respective higher order scales (median r = .35, range = .13 to .70; Tables 3.20–3.21).
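The figure of 264 interscale correlations can be reproduced with a short calculation. The sketch below assumes, as one reading consistent with the text, that each of four samples (college student, community adult, and two patient samples) contributes one correlation for every unique pair of the 12 primary scales; the variable names are hypothetical.

```python
from math import comb

N_PRIMARY_SCALES = 12  # primary trait scales, as stated in the text
N_SAMPLES = 4          # assumed: college student, community adult, two patient samples

# Unique pairs among 12 scales: 12 * 11 / 2 = 66.
pairs_per_sample = comb(N_PRIMARY_SCALES, 2)

# One correlation per pair per sample: 66 * 4 = 264.
total_correlations = pairs_per_sample * N_SAMPLES

# Five correlations exceeded |0.50| according to the text.
n_strong = 5
share_strong = n_strong / total_correlations  # roughly 1.9% of all correlations
```

On this reading, fewer than 2% of the interscale correlations exceeded |0.50|, which is the basis for describing the primary scales as largely independent.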
Validity data comparing the SNAP to clinical ratings of the DSM-IV PDs using the Structured Clinical Interview for DSM-IV Personality indicate good convergence between the two methods in a heterogeneous patient sample (Tables 4.14–4.16). For example, ratings of Narcissistic PD correlated with SNAP Exhibitionism (.50) and Entitlement (.40). Detachment correlated .56 with both Schizoid and Avoidant Personality Disorder. Mistrust correlated .52 with both Paranoid PD and Borderline PD, the latter of which also correlated with Negative Temperament (.59), Self-harm (.65), and Dependency (.50). In addition, the scales correlated systematically with such measures as Tellegen’s Multidimensional Personality Questionnaire, several measures of the five-factor model of personality, and the MMPI-2. The SNAP contains a TRIN scale. A higher score indicates that the respondent is acquiescent or yea-saying, while a lower score indicates that the respondent is more denying. The TRIN midpoint is used to determine the Invalidity Index to measure inconsistency in responses. A higher score indicates more inconsistencies.
The QOLESQ is a 93-item self-report measure designed to assess the degree of enjoyment and satisfaction experienced by subjects in multiple areas of daily functioning. Responses are grouped into eight areas of functioning which include physical health, subjective feelings, leisure time activities, social relationships, general activities, work, household duties, and school/course work. Each of the questions is scored on a 5-point scale that indicates the degree of enjoyment or satisfaction achieved during the past week. Endicott, Nee, Harrison, and Blumenthal  reported good test-retest reliability (range was 0.63 to 0.89 across eight subscales) and strong alpha coefficients of internal consistency (range was 0.90 to 0.96 across eight subscales). In addition, these authors found that the QOLESQ was sensitive to changes in symptom severity of depression but that it measured important dimensions of the illness that were not reflected in severity measures. Since the development of the original QOLESQ, two abbreviated versions of the measure have been developed. A 16-item form was developed from the General Activities portion of the 93-item version . Another group  developed an 18-item abbreviated Quality of Life measure intended to identify core items from the original test which could be completed in less time. In order to increase the use of the measure, pediatric as well as Italian versions were also created [124–125]. A higher score indicates a higher level of enjoyment and satisfaction with life. The 93-item QOLESQ was used in this protocol.
The LTE-Q measures 12 categories of common life events that have been found to be threatening. Examples of categories include suffering from serious illness or injury, marital or relationship difficulties, or losing one’s employment. The respondent checks off any categories of events he/she has experienced over a specified period of time prior to the assessment (e.g., 3 or 6 months). The LTE-Q provides an assessment of threatening life events for researchers who are unable to utilize more comprehensive interview techniques (e.g., Life Events and Difficulties Scale - LEDS, [126–127]). In fact, Brugha and colleagues report that these 12 categories accounted for 77% of life events found to be related to marked or moderate long-term threats measured by more comprehensive methods [e.g., 128]. Test-retest reliability, measured with Cohen’s kappa, was strong, with the majority of values falling between 0.78 and 1.0 across life event categories. The LTE-Q has high agreement with the LEDS for long-term contextual threat and demonstrates good concurrent validity with good sensitivity (1.00) and specificity (0.88) for events 3 months prior to data collection. The greater the score on this measure, the more threatening events have occurred. The 12-item LTE-Q was used in this trial.
The 8-item Skills of Cognitive Therapy [SoCT; 130] assesses patients’ understanding and use of basic cognitive therapy skills rated from the perspectives of observers (SoCT-O) and patients (SoCT-P). Ratings of patients’ skills are made on a 5-point Likert scale ranging from 1 (“never”) to 5 (“always or when needed”). The score is the mean of the items; higher scores reflect greater skill in cognitive therapy principles and coping strategies. In this trial, we used a 35-item pool, rated by both patients and their therapists, to evaluate the reliability and validity of both versions of the SoCT and to develop the resulting shorter versions.
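The scoring rule described above (the score is the mean of the item ratings) can be sketched as follows. This is an illustration under stated assumptions, not the scoring procedure used in the trial: the function name is hypothetical, and requiring a complete set of eight ratings is an assumption, since the text does not say how missing items were handled.

```python
from typing import Sequence


def soct_score(ratings: Sequence[int], n_items: int = 8) -> float:
    """Return the mean of SoCT item ratings.

    Each rating is expected on the 5-point scale from 1 ("never") to
    5 ("always or when needed"). Complete data are assumed (hypothetical
    rule; missing-item handling is not described in the text).
    """
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 5")
    return sum(ratings) / len(ratings)
```

For example, ratings of 5, 4, 4, 3, 5, 4, 4, and 3 sum to 32, giving a score of 4.0; higher values reflect greater skill.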
The Patient Attitudes and Expectations form (PAE)  consists of 46 items designed to measure: (a) patients’ beliefs about the cause of their problems, (b) patients’ beliefs about helpful treatments, (c) patients’ expectations of their therapist, (d) patients’ expectations of treatment outcomes, and (e) patients’ beliefs about the length of time before treatment is effective. The questions are rated on Likert scales ranging from 1–4 to 1–7. The 46-item PAE form was used in this protocol.
© 2010, Robin B. Jarrett at The University of Texas Southwestern Medical Center at Dallas, all rights reserved.
1 We found no larger samples of patients with recurrent MDD treated with research-quality CT, as a solitary intervention, in our review of the literature. The review included: (a) Jarrett and Rush’s (1994) report of studies of short-term psychotherapy for depressive disorders between January 1, 1967 and November 30, 1993 and (b) a new literature search between December 1, 1993 and April 1, 2010. This search included the terms “short-term psychotherapy,” “acute phase,” “cognitive therapy,” “behavior therapy,” and “interpersonal psychotherapy” in conjunction with “depression” to search for studies of patients treated with research-quality CT in PubMed, PsycINFO, and OVID. Other large samples include (but are not limited to) the ENRICHD randomized controlled trial (RCT) of 2,481 patients who suffered myocardial infarction, of whom 498 patients with major or minor depression received cognitive behavioral therapy (CBT) alone and 427 patients with major or minor depression and low perceived social support (LPSS) received CBT and LPSS counseling; the Bristol-Myers RCT of patients with chronic major depressive disorder (MDD), which included a total of 228 patients who received the Cognitive Behavioral-Analysis System of Psychotherapy (CBASP) alone and 227 patients who received CBASP plus nefazodone; the DELTA group, who studied recurrent MDD in 172 patients, 88 of whom received preventive CBT; Craigie and Nathan (2009), who compared individual CBT to group CBT in 234 outpatients with primary mood or anxiety disorders; DeRubeis et al. (2005), who reported on 240 patients with moderate to severe MDD, 60 of whom received CBT alone; and finally the STAR*D RCT of patients with MDD who were refractory to antidepressant medications, which included 36 patients who received CBT alone and 65 patients who received citalopram plus CBT.
2 Prior to April 2002, the SNAP was not collected at the beginning of the acute phase at either site and was not collected at the Month-32 blinded evaluation in Pittsburgh.
3 In addition to the description above, to foster instrument development, the SoCT-P was collected in Dallas from all patients at blinded evaluations conducted during the longitudinal follow-up, when a relapse or recurrence was suspected, and at early exit. In Pittsburgh, the SoCT-P was only collected from patients randomized to C-CT at routine blinded evaluations during long-term follow-up.
Throughout the course of this research, Dr. Thase served as a consultant to and was a member of various advisory boards for Eli Lilly and Company and received honoraria for talks sponsored by this company. He has or has had similar relationships with the manufacturers of other medications used to treat depression. In addition to Eli Lilly and Company, during the past 2 years Dr. Thase has consulted with, served on advisory boards for, or received honoraria for talks from: AstraZeneca, Bristol-Myers Squibb Company, Forest Laboratories, GlaxoSmithKline, Janssen Pharmaceutica, MedAvante, Inc., Neuronetics, Inc., Novartis, Pfizer Pharmaceuticals, Schering-Plough (now Merck), Shire US Inc., Supernus Pharmaceuticals, Transcept Pharmaceuticals, and Wyeth Pharmaceuticals (now Pfizer). During the past 2 years, he has received grant support from Eli Lilly and Company, Forest, GlaxoSmithKline, Otsuka, and Sepracor, Inc., in addition to funding from the National Institute of Mental Health. He has equity holdings for MedAvante, Inc. and has received royalties from American Psychiatric Publishing, Inc. (APPI), Guilford Publications, Herald House, and W.W. Norton & Company, Inc. One book currently promoted by the APPI specifically pertains to cognitive therapy. Dr. Thase also discloses that his spouse is an employee of Advogent (formerly Cardinal Health), which does business with several pharmaceutical companies that market medications used to treat depression.
Dr. Jarrett’s medical center receives the fees from the cognitive therapy she provides to patients. Dr. Jarrett is a paid consultant to the NIMH.