Protocol, design, and objectives
This double blind, double dummy, randomised phase III trial examined the efficacy of hypericum extract WS 5570 compared with paroxetine in the acute treatment of moderate to severe major depression. After a screening examination participants underwent a single blind placebo run-in phase of three to seven days, during which they received three coated tablets of hypericum placebo per day plus one paroxetine placebo capsule in the morning. After that, we randomised those still meeting the selection criteria to six weeks of double blind treatment with hypericum extract or paroxetine. Those who responded to treatment (that is, their total score on the 17 item Hamilton depression scale decreased by ≥ 50%) were invited to participate in a four month double blind maintenance phase (reported elsewhere).
All patients provided written informed consent. We did not use a placebo control group because we considered it unethical to treat severely depressed patients with placebo for six weeks.
We recruited male and female outpatients in 21 psychiatric primary care centres in Germany. All participants were 18-70 years old and had single or recurrent moderate or severe episodes of unipolar major depression without psychotic features (Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV), 296.22, 296.23, 296.32, 296.33) persisting for two weeks to a year. At screening and baseline all participants had to have a total score ≥ 22 points on the 17 item Hamilton depression scale and ≥ 2 points for the item “depressive mood.” The diagnosis of depression was based on the mini-international neuropsychiatric interview.14
There were no restrictions regarding ethnic group.
We excluded anyone with a decrease in total depression score of ≥ 25% during the run-in, or with a diagnosis of schizophrenia, acute anxiety disorder, adjustment disorder, depressive disorder of any type not stated above, bipolar disorder, organic mental disorder, acute post-traumatic stress disorder, or substance abuse disorder. We also excluded patients who had an increased risk of suicide (defined by a score ≥ 4 for item 10 of the Montgomery-Åsberg depression rating scale), who had previously attempted suicide, or who had not responded to more than one adequate treatment (equivalent to 150 mg/day amitriptyline for ≥ 6 weeks) in the present episode. Participants were not allowed to take other psychotropic medication or to receive psychotherapy during the study (after previous antidepressant medication, an appropriate washout period of five half lives had to be observed).
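The numeric thresholds among these criteria can be written as a simple check. The following is a minimal sketch only: the function and parameter names are hypothetical, the run-in decrease is assumed to be measured from screening to baseline, and the diagnostic and treatment history exclusions, which require clinical judgment, are left out.

```python
def passes_scale_criteria(age, hamd_screening, hamd_baseline,
                          mood_item_screening, mood_item_baseline,
                          madrs_item10):
    """Return True if the numeric selection thresholds are met.

    Hypothetical helper: covers only the scale-based thresholds stated
    in the text, not the diagnostic or treatment history exclusions.
    """
    if not 18 <= age <= 70:
        return False
    # Hamilton total >= 22 and "depressive mood" item >= 2 at both visits
    if min(hamd_screening, hamd_baseline) < 22:
        return False
    if min(mood_item_screening, mood_item_baseline) < 2:
        return False
    # Exclude a decrease of >= 25% in total score during the run-in
    # (assumed here to mean the change from screening to baseline)
    run_in_decrease = (hamd_screening - hamd_baseline) / hamd_screening
    if run_in_decrease >= 0.25:
        return False
    # Exclude increased suicide risk: Montgomery-Asberg item 10 score >= 4
    if madrs_item10 >= 4:
        return False
    return True
```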
Interventions and blinding
We used hypericum extract WS 5570 (Dr Willmar Schwabe Pharmaceuticals, Karlsruhe, Germany), a hydroalcoholic extract from herba hyperici (drug to extract ratio 3-7:1) with standardised contents of 3-6% hyperforin and 0.12-0.28% hypericin. The coated tablets contained 300 mg or 600 mg of the extract. Paroxetine was supplied in tablets of 20 mg packed in capsules containing one or two tablets. High and low dose tablets or capsules were indistinguishable in all aspects of their outward appearance. For each drug an identically matched placebo was available (the success of blinding was evaluated by examining the drugs before distribution).
During the six weeks of randomised treatment patients allocated to hypericum always took three coated tablets of hypericum/day plus one paroxetine placebo capsule in the morning whereas those in the paroxetine group took one capsule of paroxetine in the morning and three coated tablets of hypericum placebo/day. Initially this corresponded to three doses of 300 mg/day hypericum or one dose of 20 mg/day paroxetine. For patients whose total depression score had not decreased by at least 20% after two weeks of treatment compared with baseline we increased the treatment to three doses of 600 mg/day hypericum or one dose of 40 mg/day paroxetine. The doses for paroxetine were based on published recommendations.12
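The dose rule at week two can be sketched in a few lines. This is an illustration of the rule as described above, not the trial's own software; the function name and group labels are hypothetical.

```python
def week2_daily_dose(group, hamd_baseline, hamd_week2):
    """Return the total daily dose in mg after the week two assessment."""
    starting = {"hypericum": 3 * 300, "paroxetine": 20}
    increased = {"hypericum": 3 * 600, "paroxetine": 40}
    relative_decrease = (hamd_baseline - hamd_week2) / hamd_baseline
    # The dose is increased when the Hamilton total score has not
    # fallen by at least 20% compared with baseline
    if relative_decrease < 0.20:
        return increased[group]
    return starting[group]
```

For example, a patient in the hypericum group whose score fell from 25 to 22 (a 12% decrease) would move to 1800 mg/day, while a patient on paroxetine whose score fell from 25 to 19 would stay on 20 mg/day.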
We assessed efficacy and safety at screening, baseline, and at the end of the first, second, fourth, and sixth weeks. The primary outcome measure was the absolute decrease of the Hamilton total depression score between baseline and week six. Secondary outcome measures included the Montgomery-Åsberg depression rating scale, the clinical global impressions, and the Beck depression inventory. We based assessments of safety and tolerability on spontaneous reports of adverse events, a semistructured interview exploring known side effects of the investigational treatments, physical examinations, and routine laboratory measurements.
To assure uniform diagnostic and rating standards, all assessments were performed by psychiatrists and psychologists who had participated in training before patients were included.
Random sequence generation, allocation concealment, implementation
Patients who still met the selection criteria at baseline were randomised at a ratio of 1:1 to hypericum or paroxetine. Randomisation was performed in blocks stratified by trial centre. A biometrician otherwise not involved in the trial generated the code using a validated computer program. The study drugs were dispensed to the centres in numbered containers. On inclusion of a patient into randomised treatment the local investigator allocated each participant the lowest available number. The block size was withheld from the investigators.
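Block randomisation stratified by centre can be illustrated as follows. This is a sketch under stated assumptions, not the validated program used in the trial: the block size of 4, the number of blocks, and the seeding are hypothetical (the actual block size was withheld from investigators and is not reported here). The position in each centre's list plays the role of the container number.

```python
import random

def centre_allocation_list(n_blocks, block_size, seed):
    """Build one centre's allocation list from randomly permuted blocks."""
    if block_size % 2:
        raise ValueError("block_size must be even for a 1:1 ratio")
    rng = random.Random(seed)
    allocations = []
    for _ in range(n_blocks):
        # Each block holds equal numbers of both treatments, shuffled
        block = ["hypericum", "paroxetine"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations

# Stratification by centre: each of the 21 centres gets its own list.
# Block size 4 and 3 blocks per centre are illustrative values only.
schedule = {centre: centre_allocation_list(n_blocks=3, block_size=4, seed=centre)
            for centre in range(1, 22)}
```

Because every block is balanced, the 1:1 ratio holds within each centre whenever a block is completed, and investigators who always take the lowest available number cannot influence the assignment.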
Statistical methods, sample size
Non-inferiority is usually established by showing that the true treatment difference is likely to be smaller than a prespecified non-inferiority margin that separates clinically important from clinically negligible (acceptable) differences.15
We considered that hypericum would not be relevantly inferior to paroxetine if the true decrease in total depression score (primary outcome measure) for hypericum was not more than 2.5 points16 smaller than for paroxetine (δ = -2.5).
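In symbols, this margin corresponds to the one sided hypotheses below. The notation is introduced here for illustration and is not taken from the protocol: μ_H and μ_P denote the true mean decreases in total depression score for hypericum and paroxetine, d̄_H and d̄_P the observed mean decreases, n_H and n_P the group sizes, and s_p a pooled standard deviation. The test statistic is the usual form of a shifted two sample t statistic; the use of a pooled SD is an assumption.

```latex
\[
H_0:\ \mu_H - \mu_P \le \delta
\quad\text{versus}\quad
H_1:\ \mu_H - \mu_P > \delta,
\qquad \delta = -2.5
\]
\[
T = \frac{(\bar{d}_H - \bar{d}_P) - \delta}{s_p \sqrt{1/n_H + 1/n_P}}
\]
```

Large values of T speak against the null hypothesis of relevant inferiority.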
The study was performed with an adaptive interim analysis. This design includes options for early stopping with rejection of the null hypothesis or for futility (boundaries α1 = 0.01 and α0 = 0.5, respectively) or for re-estimation of sample size in case of continuation.
For the change in total depression score we assessed non-inferiority of hypericum by a shifted t test using the prespecified non-inferiority margin of 2.5 points and a global one sided type I error of α = 0.025. We used Fisher's combination test17 in the final analysis, where the null hypothesis can be rejected when the product of the P values from both study parts falls below cα = 0.0038. An analogous approach consists of calculating the one sided repeated 97.5% confidence limit for the treatment difference adjusted for the interim analysis.18 If this confidence limit is completely above the non-inferiority margin δ = -2.5, hypericum would be judged to be not inferior to paroxetine.
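The critical value cα = 0.0038 can be recovered numerically. Under the null hypothesis the product of two independent uniform P values satisfies P(p1·p2 ≤ c) = c(1 − ln c), which is the upper tail of a chi-square distribution with 4 degrees of freedom for −2 ln(p1·p2). The sketch below solves this for α = 0.025 by bisection; the function names are hypothetical and the code is illustrative, not the analysis program used in the trial.

```python
import math

def fisher_critical_value(alpha):
    """Solve c * (1 - ln c) = alpha for c by bisection."""
    lo, hi = 1e-12, alpha  # c * (1 - ln c) is increasing on (0, 1)
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * (1 - math.log(mid)) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c_alpha = fisher_critical_value(0.025)  # rounds to 0.0038

def reject_null(p1, p2, c=c_alpha):
    """Final analysis: reject when the product of stage P values is <= c."""
    return p1 * p2 <= c
```

Dividing c_alpha by the observed first stage P value gives the local significance level that the second stage alone must reach.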
According to applicable guidance19 we reserved the option of testing for superiority after establishing non-inferiority of hypericum. If the lower one sided 97.5% confidence limit lies above 0, hypericum can be considered superior to paroxetine. We replaced missing values by carrying the last observation forward. The primary analysis was based on the intention to treat analysis to mirror clinical practice. We also performed a per protocol analysis to demonstrate robustness of the trial result to the choice of the analysis set.19
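Carrying the last observation forward can be illustrated with a short helper. The function name and the example scores are hypothetical; missing visits are represented by None.

```python
def carry_last_observation_forward(scores):
    """Replace None entries with the most recent observed value.

    Entries before the first observed value stay None.
    """
    filled, last = [], None
    for value in scores:
        if value is not None:
            last = value
        filled.append(last)
    return filled
```

For a patient with scores recorded at baseline and week one only, such as [26, 22, None, None, None], the week one value of 22 is used for weeks two, four, and six.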
All secondary efficacy and safety measures were analysed descriptively. For the Hamilton total score, we defined response as a decrease in total score of ≥ 50% from baseline and remission as a score ≤ 10 points at week six.
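The response and remission definitions translate directly into code. This is a minimal sketch with hypothetical names; integer arithmetic is used so that a decrease of exactly 50% counts as a response.

```python
def classify_outcome(hamd_baseline, hamd_week6):
    """Classify a patient by Hamilton total score at week six."""
    # Response: decrease of at least 50% from baseline
    response = 2 * (hamd_baseline - hamd_week6) >= hamd_baseline
    # Remission: total score of 10 points or fewer at week six
    remission = hamd_week6 <= 10
    return {"response": response, "remission": remission}
```

A patient going from 26 to 13 points is a responder but not in remission, whereas a patient going from 24 to 9 points meets both definitions.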
We calculated the sample size for the first stage of the study until the interim analysis by assuming equal changes in depression score in each group with a common SD of 6 points. We needed 2×50 patients to attain 90% power for a one sided P value of P1 ≤ 0.20 in the interim analysis (trend towards non-inferiority of hypericum). The interim analysis resulted in a one sided P1 = 0.084 for the primary outcome measure so that the local type I error level for the second part of the trial was determined as cα/P1 = 0.045. Assuming a common SD of 6 points and equal means in both groups, we needed 2×75 patients to attain a power of 80% for the second stage of the trial, resulting in a total sample size of 2×125 patients.
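The stated sample sizes can be checked approximately. The sketch below uses a normal approximation to the power of a one sided non-inferiority test with equal true means, a margin of 2.5 points, and a common SD of 6 points. The method actually used for the calculation is not described in the text, so this is an assumption and small deviations from the reported figures are to be expected; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def noninferiority_power(n_per_group, alpha, margin=2.5, sd=6.0):
    """Approximate power of a one sided non-inferiority test.

    Assumes equal true means in both groups and uses a normal
    approximation in place of the t distribution.
    """
    z = NormalDist()
    standard_error = sd * sqrt(2 / n_per_group)
    return z.cdf(margin / standard_error - z.inv_cdf(1 - alpha))

power_stage1 = noninferiority_power(50, alpha=0.20)    # first stage, 2x50
power_stage2 = noninferiority_power(75, alpha=0.045)   # second stage, 2x75
```

Under this approximation both stages come out close to the stated targets of 90% and 80% power.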