We examined symptom trajectories through 6 weeks from all double-blind, placebo-controlled RCTs of fluoxetine and venlafaxine that were conducted by the sponsors. Statistically and clinically significant benefits of treatment were found. Based on relative change in slopes, remission and response rates, and number needed to treat (NNT), the treatment effect was largest for youth, followed by adults, and more limited for geriatric patients. Similar results were found for fluoxetine and venlafaxine.
While average differences at 6 weeks are relatively small, they translate into clinically significant differences in response and remission rates. In adults treated with fluoxetine, 55.1% of treated patients achieved response (50% reduction in severity) compared with only 33.7% of controls, a result similar to previous causal inference (growth mixture modeling) findings for fluoxetine and imipramine (15). From a public health perspective this is an enormous difference and indicates that for every 5 patients treated with fluoxetine, one additional patient will respond. Similarly, remission rates were 45.8% for treated patients but only 30.2% for controls. Even stronger results were observed for children. In youth studies, 29.8% of treated children responded whereas only 5.7% of children on placebo responded. Similarly, remission rates were 46.6% for treated children but only 16.5% for controls. These rates translate into an additional child treated with fluoxetine responding and remitting for every four children treated. The higher rates of remission suggest that the remission criterion (CDRS-R=28) should be re-evaluated.
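The arithmetic linking these rates to a number needed to treat can be sketched as follows. This is a minimal illustration, assuming the NNT is taken as the reciprocal of the raw difference in rates and rounded up to the next whole patient; the function name is hypothetical, and figures derived from model-based estimates could differ from this raw-rate calculation.

```python
import math


def number_needed_to_treat(treated_rate, control_rate):
    """Return the unrounded NNT: the reciprocal of the absolute rate difference.

    Hypothetical helper for illustration; rates are proportions in [0, 1].
    """
    rate_difference = treated_rate - control_rate
    if rate_difference <= 0:
        raise ValueError("treated rate must exceed control rate")
    return 1.0 / rate_difference


# Rates quoted in the text above (adult response, youth remission).
adult_response_nnt = number_needed_to_treat(0.551, 0.337)
youth_remission_nnt = number_needed_to_treat(0.466, 0.165)

# By convention the NNT is rounded up to a whole number of patients.
print(f"adult response:  {adult_response_nnt:.2f} -> {math.ceil(adult_response_nnt)}")
print(f"youth remission: {youth_remission_nnt:.2f} -> {math.ceil(youth_remission_nnt)}")
```

Under these assumptions the adult response rates give an NNT of about 4.7, which rounds up to 5, and the youth remission rates give about 3.3, which rounds up to 4.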
By contrast, we found statistically significant (for HAM-D scores but not remission and response rates) but much less clinically significant effects for geriatric patients. Response rates were 37.3% versus 27.4%, translating to one additional patient responding on fluoxetine for every 17 patients treated. Remission rates were 26.5% versus 20.0%, which translates into one additional patient remitting on fluoxetine for every 39 patients treated. The efficacy of antidepressant treatment in geriatric patients should be studied in greater detail based on these findings. There may be a biological explanation for the age effect on response rates, since both neuroendocrine challenge studies and receptor imaging studies report poorer antidepressant responses in depressed patients with more pronounced serotonin abnormality (16). Serotonin function declines with age, potentially increasing the proportion of such patients in geriatric studies.
Venlafaxine produced similar results to fluoxetine, suggesting these results are not specific to fluoxetine. Greater efficacy was observed for the immediate-release (IR) than for the extended-release (ER) formulation, a difference that should be studied further.
Perhaps most importantly, these findings illustrate that relatively small overall mean differences can translate into relatively large patient-level differences in clinically interpretable and meaningful endpoints such as response and remission. Statistically, a small change in the mean of a distribution can translate into a much larger effect in its tails.
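The statistical point can be illustrated with a small numerical sketch. It assumes, purely for illustration, that standardized outcome scores are normally distributed with unit standard deviation, that lower scores are better, and that a patient counts as a "responder" when the score falls below a fixed threshold. The threshold of -1 and the mean shift of 0.3 standard deviations are hypothetical values, not estimates from the trials.

```python
from statistics import NormalDist


def proportion_below(threshold, mean, sd=1.0):
    """Proportion of a normal population scoring below the threshold."""
    return NormalDist(mu=mean, sigma=sd).cdf(threshold)


# Hypothetical values: responder threshold one SD below the control mean,
# and a treatment effect that shifts the mean down by 0.3 SD.
threshold = -1.0
control = proportion_below(threshold, mean=0.0)
treated = proportion_below(threshold, mean=-0.3)

print(f"control responders: {control:.3f}")
print(f"treated responders: {treated:.3f}")
print(f"relative increase:  {treated / control:.2f}x")
```

Under these assumptions a mean shift of only 0.3 standard deviations raises the proportion below the threshold from about 16% to about 24%, roughly a 1.5-fold increase in the tail.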
Most studies were designed to achieve regulatory approval and do not demonstrate the maximum effect that a drug can produce. Some studies were as short as 6 weeks in duration, whereas the maximum effect during an acute treatment episode is likely not reached until 12 weeks or longer. Few well-controlled studies, other than the long-term maintenance study of Frank and colleagues (18), have documented response rates for extended treatment with a single effective antidepressant. In that study the remission rate was 82%, with 75% achieving remission by 140 days (17). For fluoxetine, 23% of patients who were unimproved at 8 weeks showed full remission at 12 weeks (19).
The findings of this study shed light on meta-analytic results that related average study-level initial severity to the magnitude of treatment response. When examined at the patient level, baseline severity did not moderate treatment response for any endpoint, age group, or drug. Overall response rates were lower for geriatrics than adults but did not vary by baseline severity. For children, response rates were lower overall compared to adults; however, here there were substantial differences between low and high baseline severity groups for both treated and control patients (i.e., not a treatment related effect).
Results of this study raise serious questions regarding the results of meta-analyses that are now so prevalent in guiding medical decisions. In addition to the obvious issues related to publication bias (1), the use of average endpoints gleaned from studies that use a variety of different approaches to handling missing data (e.g., LOCF or completer analyses), and the loss of intermediate longitudinal measurements and their associated contribution to the overall estimate of variability, can yield biased results. Reliance on meta-regression to examine relationships that exist at the person level but are analyzed at the study level can lead to erroneous conclusions that are not supported when all person-level data are analyzed. We note, however, that the approach to research synthesis taken in this paper requires that all studies use a common endpoint. When different studies have used different endpoints, this is exactly the type of problem that meta-analysis was designed for, and for which it should be used. In these cases, however, one must take great care to use a well-chosen effect size (not a mean difference, for example) that is both statistically and clinically meaningful.
There are several limitations of the present study. First, we considered only two antidepressant medications, and other antidepressants may produce different effects. Indeed, fluoxetine is the only antidepressant that is approved for the treatment of childhood depression and it was the only antidepressant that we studied in children. Second, there were only four youth trials and we must therefore interpret the estimated efficacy observed in these trials with caution. However, the rather impressive effects on clinically interpretable outcomes such as response and remission indicate the clinical benefit that children may receive with careful pharmacologic treatment. These findings should also favor reconsideration of the risk-benefit equation that led to the black box warning for suicidal thinking and antidepressants in children (20). Third, a similar note of caution is in order for the results of the four geriatric studies, where some statistically significant but only marginally clinically significant results were observed. Fourth, the reported findings are limited to the first 6 weeks of study. Results may differ for long-term outcomes, and may be stronger as placebo benefit may degrade over time (19).
Fifth, it is possible that some selection bias remains even though our synthesis included all studies conducted by the sponsors and was not limited to the subset of studies that were in the published literature. Sixth, our study used industry-sponsored studies, which were designed to demonstrate efficacy and may have enrolled patients who were not representative of the patients seeking treatment for depression. However, two of the four youth studies were academic studies (TADS and study X065). Since the largest effect of treatment was seen for youth, and these studies are a mix of industry and academic studies, it seems unlikely that reliance on industry-sponsored studies produced biased results. Seventh, the majority of the studies had placebo lead-in periods that are designed to eliminate early placebo responders. However, an analysis of 75 RCTs involving antidepressants and placebo from 1981–2000 (21) found that the use of a placebo lead-in period did not relate to the response rate in the placebo group (p=.73). Like us, they also found that baseline severity did not predict placebo response. We note, however, that while this analysis found neither an effect of a prospective lead-in on placebo response rates nor a relationship to baseline severity, it is possible that the method of analysis (analyzing response rates as opposed to absolute magnitude of change) may have missed meaningful effects of the lead-in.
To determine if we included the majority of placebo-controlled depression studies of fluoxetine, we reviewed the published literature on placebo-controlled RCTs of fluoxetine in the acute treatment of major depressive disorder that met the following criteria: 1) not sponsored by a pharmaceutical company; 2) not associated with a specific medical illness (e.g., post myocardial infarction, AIDS); 3) not associated with comorbid substance abuse (including alcohol); 4) not associated with a specific diagnosable comorbid psychiatric disorder; 5) used the Hamilton Depression Scale; 6) had a minimum enrollment of 30 patients. Knowledge Finder™ (Aries System; North Andover, MA) was used to search the PubMed database from 1966 through October 31, 2010. The Boolean search option, with word variants, was used to search “placebo controlled trials of fluoxetine in major depression”. The search returned 329 references. Titles and abstracts of these references were reviewed to find articles potentially meeting the above criteria (n=13) as well as articles that were reviews or meta-analyses (n=7). The reference lists for the reviews and meta-analyses were inspected for additional articles potentially meeting the above criteria (n=1). Following these two manual reviews, reprints of the 14 candidate articles were obtained and reviewed. Two articles fulfilled the above criteria (22). The first study (22) was restricted exclusively to patients meeting the Columbia criteria for atypical depression (while meeting criteria for major depressive disorder). This study was partially funded by Lilly and its data were not available to us. The second study (23) was a small study conducted in Brazil comparing St. John’s wort (n=20) with fluoxetine (n=20) and placebo (n=26) and was partially funded by the company supplying the St. John’s wort. This literature search confirms that few if any academic studies of fluoxetine in the treatment of adult depression were conducted and that our data represent the majority of available published (12 studies) and unpublished (8 studies) RCTs.
In conclusion, a detailed research synthesis using patient-level longitudinal data from all available youth, adult, and geriatric placebo-controlled RCTs of fluoxetine conducted by the sponsor reveals consistent statistically significant benefits of treatment, the magnitude of which was greatest in youth and smallest in geriatric subpopulations (where differences in remission and response rates did not reach statistical significance). Analyses of venlafaxine RCTs confirmed the results for the efficacy of antidepressant treatment in adults. Baseline severity did not moderate the effect of treatment. Similar re-analyses should be conducted with other newer antidepressant medications to confirm these findings. This study also highlights many of the limitations of meta-analyses that combine evidence from multiple RCTs (e.g., meta-regression of study-level characteristics that exhibit inter-individual variability, and inconsistent and potentially biased handling of missing data). It further highlights the advantages of a more complete person-level analysis when such data are available, and the need for caution in interpreting meta-analytic results when they are not.