Characteristics of Trials and Subjects
Primary meta-analyses included 56 randomized, double-blind comparisons (13 with negative results) of 17 drugs versus placebo from 38 studies involving a total of 13,093 randomized and 12,920 ITT manic patients. Corrected for duplicate counting of placebo-arm patients who appear more than once in multiarm trials, 6988 manic patients were randomized to active agents and 3812 to placebo, with at least one follow-up assessment (total n=10,800 ITT patients). Mania symptom ratings used YMRS in 45/56 trials (80.4%), and MRS in 11/56 (19.6%). Most studies (34/38: 89.5%) involved multiple collaborating sites (mean: 29.7±18.9 sites/study; range: 1–70). Manufacturers of tested agents sponsored 89.5% of studies. Placebo-associated improvement in mean mania ratings relative to baseline varied greatly, from −19% (Zarate et al, 2007) or +0.63% (Pope et al, 1991) to +38% (McIntyre et al, 2009a). Likewise, study drop-out rates ranged from 13–15% (Kushner et al, 2006; Smulevich et al, 2005, respectively) to 82% (Hirschfeld et al, 2010) with placebo, and from 11–14% (Bowden et al, 2005; Khanna et al, 2005; Smulevich et al, 2005) to 83% (Hirschfeld et al, 2010) with drug. The impact of these sources of variance lies beyond the scope of this study and is reported separately (Yildiz et al, 2010).
Of the 11,072 randomized subjects (corrected for duplicate counting in placebo arms), 5603 (50.6%) were men, and age averaged 39.1±11.7 years. Diagnostic criteria followed DSM-IV or -IV-TR in 92.1% of 38 studies, and less often, DSM-IIIR (5.3%) or RDC criteria (2.6%). Most subjects (73.1%) were diagnosed with mania, whereas 26.5% randomized to drugs and 27.1% given placebo were considered to be in a mixed manic-depressive state. However, responses of men vs women, specific age groups, those diagnosed with mania vs mixed states, or outcomes at specific sites were rarely reported separately, precluding direct comparisons. Psychotic features at intake were noted in 29.3% of subjects (28.0% given drugs and 31.8% given placebo). Nominal trial duration was 3 weeks in 97.4% of studies (considered sufficient for regulatory approval). However, rates of protocol completion averaged 65.8% with active agents (34.2% dropout) and 57.4% with placebo (42.6% dropout), in 36/38 studies providing such data, indicating that actual treatment exposure was close to 2 weeks.
| Table 1. Characteristics of Included Randomized, Placebo-controlled Monotherapy Trials in Mania (N=37 studies with 54 comparisons) |
Secondary meta-analyses involved comparison of a test agent with an established comparison-control drug (with or without a placebo arm), assigned randomly in 31 studies with 33 comparisons (31 (93.9%) double-blind) involving 13 drugs and a total of 6710 manic patients as the ITT sample corrected for duplicate counting of placebo arms. These trials rated mania with the YMRS in 77.4%, and MRS in 22.6% of the 31 studies. Multiple sites were involved in 80.6% of these 31 trials (averaging 30.4±23.8 (1–76) sites/study), and drug manufacturers sponsored 77.4% of them. Nominal trial duration was 3 weeks in 21 studies (67.7%) and protocol completion averaged 73.4% (26.6% dropout).
| Table 2. Characteristics of Included Randomized, Monotherapy Trials Comparing Two Active Drugs for Treatment of Acute Mania (N=27) |
Comparisons of Individual Drugs vs Placebo
Meta-analysis indicated statistical superiority over placebo for 13/17 agents tested: aripiprazole (n=1662 subjects), asenapine (n=569), carbamazepine (n=427), cariprazine (n=235), haloperidol (n=1051), lithium (n=1199), olanzapine (n=1335), paliperidone (n=1001), quetiapine (n=1007), risperidone (n=823), tamoxifen (n=74), valproate (n=1046), and ziprasidone (n=663); and lack of efficacy in four others: lamotrigine (n=179), licarbazepine (n=313), topiramate (n=1074), and verapamil (n=20). For the 13 effective drugs, the pooled effect size was moderate (in 48 trials involving 11,092 patients, Hedges' g=0.42, 95% CI: 0.36–0.48; p<0.0001). In contrast, the four agents with non-significant summary effects yielded a pooled effect size of <0.10 in seven trials with 1586 subjects (Hedges' g=−0.03, CI: −0.13 to +0.08; p=0.62). For categorical responder rates, pooled RR for the 13 effective drugs was 1.52 (CI: 1.42–1.62) in 46 trials with 10,669 subjects (p<0.0001), and only 0.98 (CI: 0.82–1.19) in 7 trials of the 4 apparently ineffective agents with 1586 subjects (p=0.87).
| Table 3. Results of Random Effects Meta-analyses for the Outcomes of Response as Risk Ratio, Absolute Difference in Responder Rates, and NNT with Drug vs Placebo Comparisons |
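The effect sizes reported above are standardized mean differences with a small-sample correction. As a rough illustration only, the following sketch computes Hedges' g and its variance for a single drug-placebo comparison using the standard textbook formulas. The function name and the input numbers are hypothetical and are not taken from any trial in this analysis, and the sign convention (positive values favor drug) is an assumption.

```python
import math

def hedges_g(mean_drug, sd_drug, n_drug, mean_placebo, sd_placebo, n_placebo):
    """Return (g, variance of g) for mean improvement scores in two arms.

    Positive g means greater improvement with drug (assumed convention).
    """
    df = n_drug + n_placebo - 2
    # Pooled within-group standard deviation
    pooled_sd = math.sqrt(
        ((n_drug - 1) * sd_drug ** 2 + (n_placebo - 1) * sd_placebo ** 2) / df
    )
    d = (mean_drug - mean_placebo) / pooled_sd   # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)             # small-sample correction factor
    var_d = (n_drug + n_placebo) / (n_drug * n_placebo) + d ** 2 / (
        2.0 * (n_drug + n_placebo)
    )
    return j * d, j ** 2 * var_d

# Hypothetical trial: 12-point vs 8-point mean improvement, SD 10, 100 per arm
g, var_g = hedges_g(12.0, 10.0, 100, 8.0, 10.0, 100)
```

With these made-up inputs, Cohen's d is 0.4 and the correction shrinks it only slightly, which is typical once arms exceed a few dozen patients.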
Comparisons of Drug Classes vs Placebo
On the basis of the primary outcome measure, Hedges' g (the difference in improvement of mania ratings between drug and placebo), SGAs as a group yielded an overall effect size of 0.40 (CI: 0.32–0.47 in 29 trials involving 7295 patients; p<0.0001). For mood stabilizers (MSs, including carbamazepine, lithium, and valproate), pooled effect size was 0.38 (CI: 0.26–0.50 in 13 trials involving 2672 patients; p<0.0001). The unique central PKC-inhibiting drug tamoxifen yielded an unusually large Hedges' g of 2.32 (CI: 1.66–2.99; p<0.0001) in two small trials involving a total of 74 patients. Studies involving haloperidol as a standard active comparator (FGA), in its direct comparisons with placebo, yielded a pooled Hedges' g of 0.54 (CI: 0.34–0.74; p<0.0001) in four trials with 1051 subjects.
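Pooled class-level estimates such as these come from random-effects models. The text here does not name the estimator, so the sketch below assumes the common DerSimonian-Laird method purely for illustration. The function name and the example inputs are hypothetical.

```python
import math

def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling (assumed estimator, not confirmed by the text).

    Returns (pooled effect, 95% CI tuple, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]             # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-trial variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Hypothetical per-trial Hedges' g values and their variances
pooled, ci, tau2 = pool_random_effects([0.1, 0.5, 0.9], [0.01, 0.01, 0.01])
```

For ratio measures such as RR, the same pooling would normally be applied to log-transformed values and the result exponentiated.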
With respect to categorical responder rates, SGAs vs placebo yielded a pooled RR of 1.47 (CI: 1.36–1.59; 28 trials, 7094 patients, p<0.0001); MSs as a group yielded a pooled RR of 1.59 (CI: 1.39–1.82; 12 trials, 2450 patients, p<0.0001), again indicating similar summary effects and CIs. Tamoxifen yielded an unusually high RR of 7.46 (CI: 1.88–29.7; 2 trials, 74 patients, p=0.004). For haloperidol, RR was 1.58 (CI: 1.29–1.94; 4 trials, 1051 patients, p<0.0001). Estimates of NNT-for-benefit values (smaller NNT with greater efficacy) ranked: tamoxifen < haloperidol < MSs < SGAs.
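The three categorical measures used here (risk ratio, absolute difference in responder rates, and NNT) are simple functions of the responder counts in each arm. The sketch below shows the usual single-trial calculations. The function name and the counts are hypothetical and do not come from any included trial.

```python
import math

def responder_contrast(resp_drug, n_drug, resp_placebo, n_placebo):
    """Return risk ratio, its 95% CI, rate difference, and NNT for one trial."""
    p_drug = resp_drug / n_drug
    p_placebo = resp_placebo / n_placebo
    rr = p_drug / p_placebo
    # Standard error of log(RR); the CI is built on the log scale
    se_log_rr = math.sqrt(
        1 / resp_drug - 1 / n_drug + 1 / resp_placebo - 1 / n_placebo
    )
    ci = (
        math.exp(math.log(rr) - 1.96 * se_log_rr),
        math.exp(math.log(rr) + 1.96 * se_log_rr),
    )
    rd = p_drug - p_placebo                      # absolute rate difference
    nnt = 1 / rd if rd > 0 else math.inf         # NNT defined only when drug is better
    return rr, ci, rd, nnt

# Hypothetical trial: 50/100 responders on drug vs 30/100 on placebo
rr, ci, rd, nnt = responder_contrast(50, 100, 30, 100)
```

In this made-up case a 20-percentage-point advantage corresponds to an NNT of about 5, which illustrates why a smaller NNT indicates greater efficacy.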
Direct Comparisons
On the basis of the improvement in mania ratings (Hedges' g), SGAs as a group yielded greater effect size than MSs (in eight trials with 1464 patients, Hedges' g=0.17, CI: 0.07–0.28, p=0.001). Similarly, comparison of MSs vs all antipsychotics tested (SGAs or haloperidol) also favored the antipsychotics (Hedges' g=0.18, CI: 0.08–0.28 in 10 trials with 1530 subjects, p<0.0001), and SGAs did not differ from haloperidol (Hedges' g=−0.001, CI: −0.24 to +0.24 in six trials with 1536 subjects, p=0.99). Likewise, valproate and lithium did not differ significantly (Hedges' g=0.11, CI: −0.04 to +0.26 in four trials with 679 subjects, p=0.16).
| Table 4. Results of Random Effects Meta-analyses for the Outcomes of Hedges' g, Risk Ratio, and Rate Difference (absolute difference in responder rates) with Head-to-head Drug Comparisons |
On the basis of categorical responder rates in direct comparisons, SGAs again appeared to be somewhat more effective than MSs (RR=0.88, CI: 0.80–0.96, in six trials with 1443 subjects, p=0.006). Antipsychotics (SGAs or haloperidol) were similarly superior to, or faster acting than, MSs (RR=0.88, CI: 0.80–0.97, in seven trials with 1479 subjects, p=0.01). Direct comparisons of haloperidol (the only FGA tested) with SGAs indicated little or no difference (RR=0.93, CI: 0.79–1.10, in seven trials with 2166 patients, p=0.40), as did lithium vs valproate (RR=1.00, CI: 0.81–1.24, in four trials with 679 patients, p=1.00).
Factors Associated with Drug–Placebo Contrasts
Overall inter-study variance in effect sizes of drug–placebo contrasts was substantial (Q=47.6, df=12, p<0.0001; I²=70.4), encouraging consideration of possible explanatory factors. In regression models involving drug arms, we considered only the 13 agents found more effective than placebo, so as to avoid potential confounding by drug inefficacy, which itself would influence treatment effects (drug–placebo contrasts). We tested pre-selected covariates (study site counts, sample size, and initial manic symptom severity) for possible association with observed effect size (Hedges' g) as a measure of treatment effect (difference in improvement in mania ratings between drug and placebo), and with mean difference (change in mania scores between baseline and final rating) to indicate drug or placebo effects. With these three covariates, statistical significance was set at two-tailed α=0.016 (0.05/3).
We found significant associations between higher number of collaborating study sites and smaller treatment effects (drug versus placebo: 48 trials; slope (β)=−0.007, CI: −0.01 to −0.003, z=−3.79, p=0.00015), as well as larger placebo effects (38 trials; β=+0.11, CI: 0.06–0.15, z=4.67, p<0.0001), but not drug effects (48 trials; β=−0.02, CI: −0.06 to +0.03, z=−0.80, p=0.43). As more study sites correspond with larger patient samples, we found similar associations between larger sample sizes and smaller treatment effects (48 trials; slope (β)=−0.001, CI: −0.003 to −0.0004, z=−2.63, p=0.008), and larger placebo effects (38 trials; β=+0.06, CI: 0.04–0.08, z=6.47, p<0.0001), but not drug effects (48 trials; β=−0.003, CI: −0.02 to +0.01, z=−0.30, p=0.77).
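To make the multiplicity correction concrete, the short sketch below applies the corrected threshold of 0.05/3 to the six p-values reported in this paragraph. The dictionary keys are illustrative labels only, and p-values reported as "<0.0001" are entered at that upper bound, which is an assumption made for the comparison.

```python
# Bonferroni-style threshold for three pre-selected covariates
ALPHA = 0.05 / 3   # about 0.0167; given as 0.016 in the text

# p-values as reported above; "<0.0001" entered as 0.0001 (upper bound)
reported_p = {
    ("sites", "treatment effect"): 0.00015,
    ("sites", "placebo effect"): 0.0001,
    ("sites", "drug effect"): 0.43,
    ("sample size", "treatment effect"): 0.008,
    ("sample size", "placebo effect"): 0.0001,
    ("sample size", "drug effect"): 0.77,
}

significant = {key: p < ALPHA for key, p in reported_p.items()}
for (covariate, outcome), flag in significant.items():
    print(f"{covariate:12s} vs {outcome:17s}: {'significant' if flag else 'n.s.'}")
```

Under this threshold the treatment-effect and placebo-effect associations remain significant for both covariates, while neither drug-effect association does.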
Treatment effects were unrelated to baseline symptom ratings (as the percentage-of-maximum attainable mania scores: 100%=60 for YMRS; 100%=52 for MRS, to avoid confounding by scaling differences) across 47 trials (β=0.43, CI: −0.57 to +0.65, z=0.14, p=0.89). However, higher baseline mania ratings predicted greater improvement with drug (46 trials; β=+0.26, CI: 0.13–0.40, z=3.80, p=0.0002), but not with placebo (36 trials; β=0.02, CI: −0.18 to +0.22, z=0.18, p=0.86).
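The rescaling of baseline severity described above can be sketched as follows, using the scale maxima given in the text (60 for YMRS, 52 for MRS). The function name and the example scores are hypothetical.

```python
# Maximum attainable scores as stated in the text
SCALE_MAX = {"YMRS": 60, "MRS": 52}

def percent_of_max(score, scale):
    """Express a mania rating as a percentage of its scale's maximum score."""
    return 100.0 * score / SCALE_MAX[scale]

# Hypothetical baseline means: both map to the same relative severity
ymrs_pct = percent_of_max(30, "YMRS")   # 50.0
mrs_pct = percent_of_max(26, "MRS")     # 50.0
```

Putting both scales on a common 0–100 range lets baseline severity enter one regression without the scale itself acting as a confounder.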
Publication Bias
As studies with larger-than-average effects are more likely to be published, the studies in a meta-analysis may overestimate the true effect size because they are a biased sample of the target population of studies. As a first step in exploring any evidence of such bias in the present meta-analysis, the funnel plot of the effect size (Hedges' g) vs its standard error was plotted, which numerically (not visually) indicated some asymmetry in the distribution of the studies (Kendall's tau (τ)=0.19, z=2.02, p=0.04). As a next step in assessing publication bias, we evaluated the possibility that the entire effect is an artifact of bias by calculating Orwin's fail-safe N, which was 140, suggesting that a large number of trials with zero effect would need to be added to the analysis to make the cumulative effect trivial (defined in this study as Hedges' g<0.10). We made a concerted effort to include all available completed trials in mania, regardless of publication status, and could include only 38 studies with 56 comparisons (13 being trials with negative findings). Thus, it is very unlikely that we failed to identify such a large number of studies and that the entire effect is an artifact of bias. For the primary meta-analyses including 56 placebo-controlled comparisons, trim-and-fill analysis identified and trimmed only one aberrant small study (of tamoxifen with 16 subjects; Zarate et al, 2007) before the funnel plot became symmetric about the adjusted effect size (Hedges' g) of 0.37 (CI: 0.29–0.45), indicating only a trivial change from the observed overall effect size (Hedges' g=0.37, CI: 0.31–0.42). When we considered only the trials of effective agents, however, trim-and-fill analysis did not identify any aberrant studies, and the summary effect remained unchanged at a Hedges' g of 0.42 (CI: 0.36–0.48). Overall, these considerations indicate that the effect of publication bias in this meta-analysis was negligible.
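For readers unfamiliar with Orwin's fail-safe N, the sketch below shows the usual closed-form calculation: the number of additional trials with a given mean effect (zero here) needed to pull the combined effect down to a chosen "trivial" criterion. The function name is hypothetical. The exact summary estimate that the authors' software used is not stated in this section, so the example uses the rounded figures quoted above and is not expected to reproduce the reported value of 140.

```python
def orwin_failsafe_n(k, observed_effect, trivial_effect, missing_effect=0.0):
    """Orwin's fail-safe N: number of missing trials with mean effect
    `missing_effect` needed to reduce the combined effect to `trivial_effect`.
    """
    if trivial_effect <= missing_effect:
        raise ValueError("trivial_effect must exceed missing_effect")
    return k * (observed_effect - trivial_effect) / (trivial_effect - missing_effect)

# Rounded figures from the text: 56 comparisons, overall g = 0.37, criterion g = 0.10
n_fs = orwin_failsafe_n(k=56, observed_effect=0.37, trivial_effect=0.10)
```

With these rounded inputs the formula gives a value near 151, the same order of magnitude as the reported 140; the difference presumably reflects the precise inputs used in the original computation.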