It is unknown to what degree spontaneous improvement accounts for the large placebo response observed in antidepressant trials for Major Depressive Disorder (MDD). The purpose of this study was to estimate the spontaneous improvement observed in treatment-seeking individuals with acute MDD by determining the symptom change in depressed patients assigned to wait-list controls in psychotherapy studies.
The databases PubMed and PsycINFO were searched to identify randomized, prospective studies randomizing outpatients to psychotherapy or a wait-list control condition for the treatment of acute MDD. Standardized effect sizes calculated from each identified study were aggregated in a meta-analysis to obtain a summary statistic for the change in depression scores during participation in a wait-list control.
Ten trials enrolling 340 participants in wait-list control conditions were identified. The estimated effect size for the change in depression scores during wait-list control was 0.505 (95% CI 0.271 – 0.739, p < 0.001), representing an average improvement of 4 points on the Hamilton Rating Scale for Depression.
Patients with acute MDD experience some improvement even without treatment, but spontaneous improvement is unlikely to account for the magnitude of placebo response typically observed in antidepressant trials. These findings must be interpreted in light of the small number of wait-list control participants available for analysis, as well as the methodological heterogeneity of the psychotherapy studies analyzed.
Scientific and popular interest in placebo effects has been rising due to increased recognition of their therapeutic effectiveness in the treatment of some illnesses, such as Major Depressive Disorder (MDD). Placebo response in acute randomized controlled trials (RCTs) of antidepressant medications for MDD averages 30% (Walsh 2002), and meta-analyses suggest that placebo treatment conditions may duplicate 50–75% of the improvement observed with active treatment (Kirsch and Sapirstein 1998, Kirsch et al 2008). Appreciation of the potency of placebo effects has led clinicians to speculate that it may be possible to optimize placebo effects to improve the treatment of MDD (Andrews 2001), which remains a leading cause of disability due to illness worldwide (WHO 2004, Kessler et al 2005, Kessler et al 2003). However, rising placebo response rates also have resulted in progressively fewer trials of putative antidepressant agents demonstrating active drug to be superior to placebo (Demitrack et al 1998). Thus, addressing the problem of high placebo response is an important challenge impacting the future of psychopharmacologic drug development.
A significant difficulty in assessing the impact of placebo effects in antidepressant treatments is differentiating true placebo effects from other factors leading to symptom change among patients receiving placebo. Stewart-Williams and Podd (2004) defined a placebo effect as the psychological and physiological effect on a patient of receiving a substance or procedure that is believed by the patient to be effective for the target disorder. Other factors may influence the observed symptom change among participants in a research study, including the natural history of the patient’s condition, regression to the mean, therapeutic aspects of the health care context, and the expectations of clinicians and raters (Kienle and Kienle 1997). Surprisingly little information is available regarding the relative magnitudes and modes of interaction of placebo effects and these other non-specific factors.
A portion of the change from pre- to post-treatment in an RCT is likely caused by spontaneous improvement or worsening in a patient’s depressive illness. If unaccounted for, spontaneous improvement could be attributed to the effects of an antidepressant medication or placebo, resulting in an overestimation of their respective influence on MDD. Given the ethical difficulties associated with following the untreated course of MDD, few prospective non-intervention studies using modern diagnostic criteria exist. The National Institute of Mental Health Collaborative Program on the Psychobiology of Depression naturalistically followed the course of 431 subjects meeting Research Diagnostic Criteria (RDC) for MDD (Keller et al 1984, Keller et al 1992). Approximately 30% of participants met recovery criteria (defined as minimal symptoms on self-report measures for 8 consecutive weeks) after 10 weeks of follow up, and 64% were rated as recovered by 6 months (Keller et al 1984). Since this was a naturalistic study, many of these patients received treatment for their depression, so the extent to which these data reflect the natural course of MDD is unclear.
An alternative source of information about the natural course of MDD is studies of the effectiveness of psychotherapy for MDD, which occasionally utilize a “wait-list” control group in order to determine whether psychotherapy has an effect on depression beyond the passage of time. Examining the change occurring in patients randomized to a wait-list provides some information on the acute course of untreated MDD. One study reported the change occurring in the wait-list control groups of 11 psychotherapy studies for depressive disorders and found a mean decrease in baseline Hamilton Rating Scale for Depression (HRSD) scores of 15.0% over 4–8 weeks of follow-up (Posternak and Miller 2001). However, the conclusions that can be drawn from these data are limited, because the authors did not perform a systematic literature review, included studies enrolling patients with both MDD and Dysthymic Disorder, and did not undertake a formal meta-analysis of standardized effect sizes from the included studies.
The present study sought to estimate the acute symptom change in patients with MDD who are randomized to a wait-list control group in a psychotherapy study. We conducted a systematic literature review, extracted effect size data from the identified studies, and aggregated them in a meta-analysis. We hypothesized that the average symptom change among patients in wait-list control conditions would be significantly greater than zero, suggesting that improvement not due to the specific effects of a medication or placebo is an important contributor to the change observed in antidepressant clinical trials.
PubMed and PsycINFO were searched to identify prospective clinical studies contrasting psychotherapy to a wait-list control condition in adults with depression. To capture all available psychotherapy studies, we used the search terms “psychotherapy” OR the specific names of all known psychotherapy modalities (e.g., cognitive behavior therapy, behavior therapy, interpersonal therapy, psychodynamic psychotherapy, etc.), which we compiled by referencing standard textbooks of psychiatry (Sadock and Sadock 2009, Tasman et al 2008). These results were combined with the search term “major depressive disorder” using the AND operator, yielding 11,979 results. Next, these results were limited to 1) human studies, 2) English language articles, 3) clinical trials or comparative studies, and 4) publication date from 1985 to the present, which resulted in 2,999 journal articles. The year 1985 was chosen to select trials utilizing more rigorous methods.
One author (SM) conducted an initial review of these titles to exclude those that were not prospective studies of psychotherapy for depressive disorders, yielding 521 titles. The remaining journal articles were then sequentially reviewed by two authors (BRR and SM), proceeding from article title, to abstract, and finally full paper text, to determine whether they met inclusion and exclusion criteria. These evaluations were pooled, and any differences between judges were resolved by discussion. To further ensure all relevant papers were reviewed, the references of all meta-analyses and review articles published since the year 2000 among the 2,999 journal articles were searched for pertinent references.
Included studies were required to have at least one active treatment arm as well as a wait-list control condition. Further criteria required trials to enroll primarily patients with Major Depressive Disorder (at least 85% of total sample being solely MDD), last between 5 and 12 weeks (inclusive), and have assessments using a standardized outcome measurement. Studies were excluded if they enrolled patients who were currently taking antidepressant medications or allowed wait list patients to receive any form of active treatment. We also excluded studies enrolling patients with psychosis, bipolar disorder, or those defined to have treatment resistant depression, studies of antidepressant augmentation, and trials requiring as inclusion criteria a specific medical illness or an Axis I disorder other than depression.
For each included study, demographic characteristics of the participants (sample size, age, gender, race, clinical characteristics), details of the treatment conditions, duration of active treatment or wait-list control, and outcome data (i.e., pre-treatment mean, standard deviation (SD) of pre-treatment mean, post-treatment mean, post-treatment mean SD) were entered into a database. If studies reported data based upon multiple outcome measures, we selected one set of data for extraction according to the following priority list: HRSD (Hamilton 1960), Beck Depression Inventory (BDI) (Beck 1961), Montgomery-Asberg Depression Rating Scale (MADRS) (Montgomery and Asberg 1979), and Clinical Global Impression (CGI) (Guy 1976). Two judges (BRR and SM) extracted the data, and any differences were resolved by consensus. Study quality is typically measured by determining whether studies report critical methodological aspects such as (1) concealment of treatment allocation, (2) blinding of outcome assessment, and (3) use of intent to treat data analyses (Juni et al 2001). However, we opted not to rate study quality in this fashion: because blinding is not possible in psychotherapy studies, all studies would necessarily have been rated as poor quality.
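The outcome-selection rule above can be expressed as a simple priority lookup. The sketch below is our illustration of that rule (function and variable names are ours, not from the paper):

```python
# Outcome measures in the paper's stated order of priority
PRIORITY = ["HRSD", "BDI", "MADRS", "CGI"]

def select_outcome(reported_scales):
    """Return the highest-priority scale a study reported, or None."""
    for scale in PRIORITY:
        if scale in reported_scales:
            return scale
    return None
```

For example, a study reporting only BDI and CGI data would contribute its BDI scores to the meta-analysis.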
First, effect size estimates were computed for the change in depressive symptom severity occurring in each wait-list control condition from baseline to the endpoint of the study, which ranged from 5–12 weeks. The calculation of effect sizes (Cohen’s d) and their standard errors depends on the pre- and post-study means, the standard deviations, and the correlation between depression severity scores across time within each study. If the correlation between time points within a study is zero, then the variance of the difference between pre and post scores is:

σ²_diff = σ²_pre + σ²_post

However, given any positive correlation r between time points, the standard error of the difference will decrease, because a portion of each variance consists of redundant error. The formula for the variance of a difference between correlated scores then becomes:

σ²_diff = σ²_pre + σ²_post − 2r·σ_pre·σ_post
Since the correlation between pre- and post-treatment scores was not provided by any of the papers in our sample (and is rarely reported in any study), we conservatively estimated that the correlation would lie between 0.3 and 0.7 based on data from antidepressant clinical trials conducted by our group (SPR). In order to obtain a range of effect sizes, we therefore performed separate analyses using correlations of 0.3 and 0.7.
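A minimal sketch of the pre-post effect size calculation under an assumed correlation r, using the paired-design variance formula given in Borenstein et al (2009); this is our illustration, not the authors' code:

```python
import math

def pre_post_effect(mean_pre, mean_post, sd_pre, sd_post, n, r):
    """Standardized pre-post improvement and its approximate sampling variance.

    r is the assumed correlation between pre and post scores; the paper
    runs the analysis twice, with r = 0.3 and r = 0.7, as sensitivity bounds.
    """
    # Variance of the difference between correlated scores (shrinks as r grows)
    var_diff = sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post
    # Standardize the mean improvement by the pre-treatment SD
    d = (mean_pre - mean_post) / sd_pre
    # Paired-design sampling variance of d (Borenstein et al 2009)
    var_d = (1 / n + d**2 / (2 * n)) * 2 * (1 - r)
    return d, var_d, math.sqrt(var_diff)
```

With a hypothetical wait-list arm improving from HRSD 21 to 17 (SD 5, n = 30, r = 0.7), this yields d = 0.8.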
For each study, the corresponding 95% confidence intervals (95% CIs) were calculated to assess statistical significance. In accordance with the DerSimonian and Laird random-effects method (DerSimonian and Laird 1986), weighted individual effect sizes were calculated using the inverse of the variance, where the total variance for a study is the sum of the within-study variance and the between-studies variance. The weighted effect size estimates for the individual studies were aggregated to obtain a summary statistic.
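A minimal sketch of DerSimonian-Laird pooling (our illustration; the Comprehensive Meta-Analysis package used in the paper implements this internally):

```python
def dersimonian_laird(effects, variances):
    """Random-effects summary effect via the DerSimonian-Laird method."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sum_w = sum(w)
    mean_fixed = sum(wi * e for wi, e in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (e - mean_fixed) ** 2 for wi, e in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-studies variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    summary = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return summary, se, q, tau2
```

When the studies agree exactly, Q falls below its degrees of freedom, tau² is truncated at zero, and the result reduces to the fixed-effect estimate.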
The random effects model assumes that the true treatment effects in the individual studies may differ from each other (Borenstein et al 2009). We adopted this model based on the assumption that the true effect sizes of studies with different methodological designs may vary. The presence of statistical heterogeneity was assessed with Cochran’s Q (considered significant for p < 0.10) and quantified with I-squared (Higgins et al 2003, Ioannidis et al 2003). Cochran’s Q statistic indicates whether the observed variation is greater than would be expected from within-study error alone. The I-squared statistic adds to this the proportion of the observed variability that may be due to true differences rather than chance. In the presence of significant heterogeneity beyond chance, moderator analyses would be conducted to account for systematic variance between studies.
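The I-squared statistic follows directly from Q and its degrees of freedom (Higgins et al 2003); a one-line sketch (our illustration):

```python
def i_squared(q, k):
    """I-squared: percent of observed variability beyond within-study chance.

    q is Cochran's Q and k the number of studies (df = k - 1);
    negative values are truncated to zero.
    """
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
```

Applied to the Q values reported in the Results (Q = 16.728 and Q = 38.670 across 10 studies), this reproduces the reported I-squared values of 46.198 and 76.726.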
The failure to include unpublished studies with non-significant results may result in overestimation of an effect size. Tests for publication bias rely on the theory that studies with small sample sizes are more prone to publication bias, while large-scale studies are less likely to escape public knowledge and more likely to be published regardless of the significance of their findings (Dickersin 2006). Consistent with this theory, we assessed the symmetry of funnel plots to safeguard against publication bias (Egger et al 1997). The classic fail-safe N was computed to estimate the number of hypothetical missing studies with an effect size of 0 that would be required to raise the summary P-value above .05 (Rosenthal 1979).
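Rosenthal's classic fail-safe N can be sketched as follows (our illustration; z_values would be the per-study z-scores, and 1.645 is the one-tailed .05 cutoff):

```python
import math

def fail_safe_n(z_values, z_crit=1.645):
    """Number of hypothetical null studies (z = 0) needed to pull the
    Stouffer-combined z below the significance cutoff (Rosenthal 1979)."""
    sum_z = sum(z_values)
    # Solve sum_z / sqrt(k + n) = z_crit for n, the "file drawer" size
    n = (sum_z ** 2) / (z_crit ** 2) - len(z_values)
    return max(0, math.floor(n))
```

For instance, ten studies each with z = 2.0 would require 137 null studies to lose one-tailed significance.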
The meta-analysis was performed using the Comprehensive Meta-Analysis version 2 software package (Biostat, Inc). Continuous and categorical data regarding study characteristics, patient demographics, and clinical features were summarized using SPSS version 18.
Ten trials met the study’s inclusion and exclusion criteria (Arean et al. 1993, Bolton et al 2003, Clarke et al 1999, Cohen et al 2010, Diamond et al 2002, Mufson et al 1999, Nezu 1986, Nezu and Perri 1989, O’Leary and Beach 1990, Wright et al 2005). As shown in Table 1, the 10 trials had a mean sample size of 32.3 ± 51.6 patients, duration of 10.0 ± 3.7 weeks, and dropout rate of 18.6% ± 17.1%. The 340 participants assigned to wait-list control conditions across these 10 trials had mean age 36.9 ± 16.4, baseline HRSD score 21.0 ± 5.0, and baseline BDI 24.1 ± 4.8. In the 6 papers that provided gender distribution data, 26.7% of participants were male (2 studies enrolled couples, and 2 studies did not report this data). Pre- and post-study HRSD difference scores ranged from −0.4 to −6.8 (mean −3.5 ± 2.6).
Forest plots of effect sizes for the symptom change in each wait-list control condition are depicted in Figures 1 and 2. For the model based on a pre-post correlation in depression scores of 0.3, the calculated effect size was 0.486 (95% confidence interval [CI] 0.246 – 1.98, p < 0.001). For the model based on a pre-post correlation in depression scores of 0.7, the calculated effect size was 0.505 (95% CI 0.271 – 0.739, p < 0.001). Thus, both analyses resulted in medium effect sizes significantly different from zero, indicating improvement in depression during the wait-list control period.
Calculation of Cochran’s Q and the I-squared statistic indicated the presence of a moderate to large amount of heterogeneity among the studies in our sample (r = 0.3 model: Q = 16.728, df = 9, p = 0.053, I-squared = 46.198; r = 0.7 model: Q = 38.670, df = 9, p < 0.001, I-squared = 76.726). Moderator analyses were conducted to account for systematic variance between studies, but none were found to be significant.
There was no statistical evidence of publication bias. The funnel plots for each model were symmetrical, suggesting that there were no missing studies. This visual impression was confirmed by Egger’s test, which yielded non-significant p-values (r = 0.3: p = 0.31466; r = 0.7: p = 0.30764). The classic fail-safe N test found that 77 studies (r = 0.3) or 186 studies (r = 0.7) of similar size with an effect size of 0 would be needed to raise the summary P-value above .05.
Consistent with the hypothesis of this meta-analysis, patients with MDD randomized to wait-list control conditions experienced symptomatic improvement. The magnitude of this improvement corresponds to a medium effect size of approximately 0.5, which was significantly different from zero (p < 0.001). Left untreated, patients with MDD may expect an average improvement of approximately 4 HRSD points from the passage of time alone.
For the purpose of comparison, prior meta-analyses of medication and placebo response in antidepressant clinical trials have reported standardized effect sizes of approximately 1.5 for medication conditions and 1.2 for placebo conditions (Kirsch and Sapirstein 1998). Thus, patients in wait-list control conditions experience approximately 33% of the improvement occurring with medication treatment and 40% of the improvement seen with placebo administration. What causes this improvement in wait-list control conditions cannot be conclusively determined, particularly since it is unknown whether most patients improved a small amount or whether a subgroup of patients experienced remission while most others did not improve. Prior naturalistic studies and retrospective analyses would suggest that much of the change observed in the wait-list control group subjects is likely due to fluctuation in illness severity and spontaneous remission, but regression to the mean likely accounts for a portion of the change as well.
By the same token, these data indicate that 60% of the placebo response typically observed in antidepressant clinical trials is not due to the passage of time alone, but rather results from true placebo effects as well as other non-specific factors, such as demand characteristics, patient response biases, and rater bias (Kienle and Kienle 1997). These findings lend partial support to recent efforts aimed at identifying the active physiologic mechanisms of placebo effects in psychiatric disorders such as MDD (National Institutes of Health 2011). To the extent that placebo effects, rather than non-specific factors like spontaneous remission, are responsible for most of the change occurring in the placebo groups of RCTs, optimizing placebo effects in clinical treatment may be a means of providing relief to the approximately 120 million people worldwide suffering from MDD (WHO 2004). Currently, many patients receiving maximal treatment will not experience sustained remission of their depression (Rush et al 2006).
These results must be interpreted with several limitations in mind. The first pertains to whether participation in a wait-list control condition influences the natural course of depression. On the one hand, some investigators have reported that the diagnostic assessments and symptom measurements that are administered to patients in wait-list control conditions are therapeutic (Endicott and Endicott 1963). Patients in research studies are provided with diagnoses that conceptualize and explain their symptoms, psycho-education about the causes and course of depression, and medical work up and monitoring, all of which may influence their reported symptoms (Fava et al 2003). On the other hand, critics of the use of wait-list control groups in psychotherapy studies have raised the possibility that assignment to such a group is disappointing or demoralizing to patients who were hoping to receive treatment. Such a “nocebo” effect of being assigned to a wait-list control might result in the symptom change in such conditions representing an underestimate of the actual spontaneous remission of depression.
Secondly, it is possible that patients seeking out and choosing to participate in psychotherapy research studies differ in important ways from patients choosing to enroll in antidepressant clinical trials. If such differences exist, they might confound the extension of the results of this meta-analysis to the placebo and medication groups in antidepressant RCTs. An evaluation of the clinical and demographic characteristics of the patients enrolled in the studies making up our sample did not reveal obvious differences from the patients typically enrolling in antidepressant RCTs, but it is difficult to definitively rule out this possibility on the basis of retrospective analyses.
In summary, spontaneous improvement is an important contributor to change in prospective clinical studies of MDD, but it is unlikely to account for the magnitude of placebo response typically observed in antidepressant RCTs. Efforts to determine physiologic mechanisms of placebo effects in the treatment of MDD should continue in order to develop strategies for maximizing placebo effects in clinical contexts and minimizing them in the drug development setting.
ROLE OF THE FUNDING SOURCE
This work was supported by National Institute of Mental Health grants K23 MH085236 (BRR), K23 MH075006 (JRS), R21 MH087774 (JRS), and T32 MH15144 (SPR), and by a grant from the Hope for Depression Research Foundation (BRR).
CONFLICT OF INTEREST
Disclosures: Dr. Rutherford, Ms. Mori, Dr. Sneed, and Ms. Pimontel have no disclosures to report. Dr. Roose reports serving on a Data and Safety Monitoring Board for Medtronics, Inc. This paper has not been previously presented.
CONTRIBUTORS
Dr. Rutherford designed the study and drafted the manuscript. Ms. Mori assisted with the literature review and data extraction. Dr. Sneed, Dr. Roose, and Ms. Pimontel assisted with the statistical analyses and revision of the manuscript. All authors contributed to and have approved the final manuscript.
Bret R Rutherford, Columbia University College of Physicians and Surgeons, New York State Psychiatric Institute, 1051 Riverside Drive, Box 98, New York, NY 10032, 212 543 5746 (telephone), 212 543 6100 (fax)
Shoko Mori, New York State Psychiatric Institute.
Joel R. Sneed, Queens College of the City University of New York.
Monique A. Pimontel, Queens College of the City University of New York.
Steven P. Roose, Columbia University College of Physicians and Surgeons, New York State Psychiatric Institute.