The present findings are congruent with reviews discussed above indicating that antidepressant drug-vs-placebo differences in published reports of controlled trials are generally moderate (
Baldessarini, 2005;
Gartlehner et al, 2008;
Kirsch et al, 2008;
Tsapakis et al, 2008;
Bridge et al, 2009;
Wooley et al, 2009;
Masi et al, 2010;
Pigott et al, 2010;
Khin et al, 2011). This conclusion was reached in the previous literature despite typical reliance on initial improvement on scale ratings rather than less readily achieved clinical remission, and despite growing evidence of publication bias toward underreporting of studies without significant drug–placebo differences (
Ioannidis, 2008;
Turner et al, 2008). With nearly identical mid-range initial depression ratings across drug and placebo arms and across reporting-years, the crude response rates in the reports reviewed here averaged 54% with FDA-approved antidepressants that are used clinically to treat major depression in the United States, compared with 37% with placebo. These differences consistently favor active drugs, but by only 17 percentage points.
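As a worked illustration of the crude figures above, the following minimal Python sketch converts the two averaged response rates (54% and 37%) into an absolute difference, a crude rate ratio, and an approximate number-needed-to-treat (NNT). The function name `crude_contrast` is hypothetical, and a ratio computed this way from averaged rates is only a rough summary, not the meta-analytically pooled RR reported in this study.

```python
import math


def crude_contrast(drug_rate: float, placebo_rate: float) -> dict:
    """Simple effect measures for two response proportions given on a 0-1 scale."""
    difference = drug_rate - placebo_rate
    return {
        # Absolute difference in response proportions (0.17 = 17 percentage points)
        "rate_difference": difference,
        # Relative contrast: responders on drug per responder on placebo
        "rate_ratio": drug_rate / placebo_rate,
        # Patients treated per one additional responder; infinite if no difference
        "nnt": math.inf if difference == 0 else 1.0 / difference,
    }


# Crude averaged response rates taken from the text above
crude = crude_contrast(0.54, 0.37)
print(f"difference: {crude['rate_difference'] * 100:.0f} percentage points")
print(f"rate ratio: {crude['rate_ratio']:.2f}")
print(f"NNT:        {crude['nnt']:.1f}")
```

With the rates reported here, the difference is 17 percentage points, the crude ratio is about 1.46, and the NNT is about 6.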
The present findings also support the broad consensus that drug–placebo differences have been declining for a variety of psychotropic drugs in recent decades, making it increasingly difficult to demonstrate efficacy (
Khin et al, 2011;
Yildiz et al, 2011a,
2011b). This trend probably has encouraged increased reliance on larger trials (more subjects and collaborating sites) in order to maintain statistical power. Moreover, increasing reliance on complex trials carried out in varied geographic locations and cultures may tend to limit the reliability of research findings (
Vázquez et al, 2011).
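The link between shrinking drug–placebo differences and larger trials can be sketched with the standard normal-approximation sample-size formula for comparing two proportions. This is an illustration only, not an analysis from this study: it assumes equal arms, a two-sided alpha of 0.05, and 80% power; the drug response rate is held at 54%, and the placebo rates of 25%, 37%, and 45% are illustrative choices. The function name `subjects_per_arm` is hypothetical.

```python
import math
from statistics import NormalDist


def subjects_per_arm(p_drug: float, p_placebo: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed per arm to detect p_drug vs p_placebo.

    Uses the usual normal approximation for two independent proportions
    with equal allocation (an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_drug + p_placebo) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_drug * (1 - p_drug) + p_placebo * (1 - p_placebo))
    ) ** 2
    return math.ceil(numerator / (p_drug - p_placebo) ** 2)


# Illustrative placebo response rates; the drug rate is fixed at 54%
for p_placebo in (0.25, 0.37, 0.45):
    n = subjects_per_arm(0.54, p_placebo)
    print(f"placebo response {p_placebo:.0%}: about {n} subjects per arm")
```

Under these assumptions the required numbers rise steeply as the placebo rate approaches the drug rate, which is the conventional rationale for larger trials. The sketch does not address whether larger trials themselves raise placebo-associated responses.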
It appears to be widely held that differences in efficacy among specific drugs or types of antidepressants in the treatment of acute episodes of major depressive disorder are generally minor (
Healy, 1997;
Baldessarini, 2005,
2012;
Cipriani et al, 2007;
Gartlehner et al, 2008;
Ghaemi, 2008;
Pigott et al, 2010;
Khin et al, 2011). The present findings support the conclusion that pooling of data from placebo-controlled trials does not yield clear rankings of specific drugs or drug-types by apparent efficacy. Unexpectedly, however, there were significant differences in reported apparent efficacy between TCAs and newer antidepressants. We propose that this outcome may reflect important changes in characteristics of clinical trials for depression over the past three decades. These include increasing size and complexity, with selective increases in response rates with placebos and somewhat decreasing responses with antidepressants. It is particularly noteworthy that when placebo-response data from the generally older TCA trials were substituted for those in more recent trials of modern drugs, both types of agents yielded identical meta-analytically pooled RR values. In contrast, we did not find evidence of significant changes over the years in initial ratings of depression-severity (adjusted for variance among rating scales), in approximate IMI-eq antidepressant doses, or in several other measured characteristics of trials.
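The effect of substituting placebo-arm data on a pooled RR can be illustrated with a small sketch. It uses fixed-effect, inverse-variance pooling of log rate ratios, which is a common simplification and not necessarily the pooling model used in this study. All trial counts below are invented for illustration, and the function name `pooled_rr` is hypothetical.

```python
import math


def pooled_rr(trials):
    """Fixed-effect inverse-variance pooled rate ratio.

    Each trial is (drug_responders, drug_n, placebo_responders, placebo_n).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for drug_resp, drug_n, plc_resp, plc_n in trials:
        log_rr = math.log((drug_resp / drug_n) / (plc_resp / plc_n))
        # Large-sample variance of the log rate ratio
        variance = 1 / drug_resp - 1 / drug_n + 1 / plc_resp - 1 / plc_n
        weight = 1 / variance
        weighted_sum += weight * log_rr
        weight_total += weight
    return math.exp(weighted_sum / weight_total)


# Invented counts: "newer" trials with relatively high placebo response
newer_trials = [(52, 100, 40, 100), (105, 200, 84, 200)]
# Invented placebo arms with the lower response typical of older trials
older_placebo_arms = [(30, 100), (58, 200)]

# Keep each drug arm, swap in the older-style placebo arm
substituted = [
    (drug_resp, drug_n, plc_resp, plc_n)
    for (drug_resp, drug_n, _, _), (plc_resp, plc_n)
    in zip(newer_trials, older_placebo_arms)
]

print(f"pooled RR, original placebo arms:    {pooled_rr(newer_trials):.2f}")
print(f"pooled RR, substituted placebo arms: {pooled_rr(substituted):.2f}")
```

In this toy example the drug arms are unchanged, so any change in the pooled RR comes entirely from the placebo arms, which is the logic behind the substitution described above.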
It is increasingly clear that drug–placebo differences in trials of antidepressants and other psychotropic agents have been declining (
Gartlehner et al, 2008;
Ioannidis, 2008;
Kirsch et al, 2008;
Tsapakis et al, 2008;
Turner et al, 2008;
Bridge et al, 2009;
Masi et al, 2010;
Khin et al, 2011;
Vázquez et al, 2011;
Yildiz et al, 2011a,
2011b). In accord with recent findings in controlled treatment trials for mania (
Yildiz et al, 2011a,
2011b), a secular increase in sites and participants per trial was associated, selectively, with rising placebo-associated response rates, resulting in declining drug–placebo contrasts or effect-size. We propose that this tendency may, at least in part, reflect declining quality-control and greater heterogeneity of diagnostic and clinical assessments in large, complex, multi-site trials, particularly when dissimilar cultures are involved and local standardization of assessment methods is limited (
Yildiz et al, 2011a,
2011b;
Vázquez et al, 2011). We further propose that selective increases in response rates associated with randomized placebo-treatment might reflect ‘regression-to-mean’ effects (
Anderson, 1990;
Bland and Altman, 1994) or random outcomes. Placebo-associated responses have increased from former levels of 20 to 30% to current levels of 30 to 50%, and to as high as 59.2% in a 1997 trial involving paroxetine (
Lecrubier et al, 1997).
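The regression-to-mean mechanism proposed above can be illustrated with a small simulation. It is a sketch under stated assumptions, not a model fitted to the present data: subjects have a stable true severity, each rating adds independent measurement error, subjects enter only if the baseline rating meets a cutoff, and there is no treatment effect at all. The scale values (mean 20, SD 4, entry cutoff 22) and the function name `apparent_improvement` are hypothetical.

```python
import random
from statistics import mean


def apparent_improvement(noise_sd: float, n: int = 20000,
                         cutoff: float = 22.0, seed: int = 0) -> float:
    """Mean baseline-to-follow-up drop among enrolled subjects, with no treatment.

    noise_sd is the SD of rating error; larger values stand for less
    reliable assessments (an assumption of this sketch).
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        true_severity = rng.gauss(20.0, 4.0)                 # stable underlying severity
        baseline = true_severity + rng.gauss(0.0, noise_sd)  # screening rating
        if baseline < cutoff:
            continue                                         # fails the entry criterion
        follow_up = true_severity + rng.gauss(0.0, noise_sd) # re-rating, no treatment
        drops.append(baseline - follow_up)
    return mean(drops)


reliable = apparent_improvement(noise_sd=2.0)
unreliable = apparent_improvement(noise_sd=4.0)
print(f"mean apparent improvement, reliable ratings:      {reliable:.2f} points")
print(f"mean apparent improvement, less reliable ratings: {unreliable:.2f} points")
```

Because subjects are selected on a high baseline rating, their follow-up ratings drift back toward their true severity even without treatment, and the drift is larger when ratings are noisier. This is consistent with the suggestion that less stable assessments could inflate apparent placebo-associated responses.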
Alternative factors that may contribute to the observed secular trends include changes in the types of patients recruited into antidepressant trials, including less severely ill patients willing to accept potential randomization to a placebo, and even partially treated subjects. Levels of training and expertise of personnel providing diagnostic and symptom-rating assessments may also have declined. In addition, trials have become longer over the years sampled, requiring more clinical assessments with greater risk of measurement-variance, and providing more clinical contact and more time for spontaneous improvement, all of which may favor responses associated with placebo treatment. Additional technical factors may include less reliance on expert raters, with greater risk of less stable assessments in a very heterogeneous disorder (
Healy, 1997).
If the preceding interpretation of the present findings is correct, it suggests several practical considerations for the design and conduct of therapeutic trials for major depression and perhaps other disorders. These include seeking an optimal range of trial-sizes, with redoubled efforts to maximize quality-control, limit placebo-associated responses, and maximize drug–placebo differences. Preliminary analyses of the present data suggest that an optimal range of collaborating sites per trial may be 2–10, and of subjects per trial, about 30–75. Such conservative considerations for the design of future trials may improve outcomes. Additional potential benefits may include reduced time, complexity, and costs, as well as exposing as few acutely depressed patient-subjects as possible to placebo-treatment.
Limitations of this study include a lack of relevant details in many reports of controlled trials, including inconsistent reporting of definitions and outcomes for responder rate and percentage improvement and, in a few trials, of the number of rating-scale items and their maximum attainable scores. Also, in most trials, exposure times were estimated from nominal protocol requirements, since precise, subject-based actual weeks of treatment usually were not stated. In addition, numbers of patients with defined outcomes were usually, but not always, based on prevalent intention-to-treat methods, which can limit responses owing to early dropout. Routine reporting of such details would greatly benefit future meta-analyses. Additional limitations to generalization arise from our requirements of peer-review and publication of findings in placebo-controlled trials concerning antidepressants approved and marketed in the United States for acute major depression in adults.
In conclusion, the present meta-analytic review of outcomes of placebo-controlled trials of antidepressants for acute episodes of major depressive disorder found evidence that older antidepressants, particularly TCAs, yielded somewhat greater apparent efficacy than some modern, second-generation agents. However, such nominal differences appear to have been influenced by secular changes in the nature of such trials over the past three decades. These include rising subject- and site-numbers and increasing placebo-associated responses, leading to falling drug–placebo differences or effect-size. We hypothesize that more conservative numbers of subjects and sites, with improved quality-control of trial methods, may paradoxically yield superior results in controlled trials of some psychotropic drugs, and do so more economically. Finally, the lack of major and compelling differences in apparent efficacy among specific antidepressants, and the moderate differences among drug-types, suggest that meta-analyses of controlled trials may have limited value in efforts to develop an evidence-basis (
Sackett et al, 1996) for identifying superior treatments.