Objectives To investigate the effectiveness of non-benzodiazepine hypnotics (Z drugs) and associated placebo responses in adults and to evaluate potential moderators of effectiveness in a dataset used to approve these drugs.
Design Systematic review and meta-analysis.
Data source US Food and Drug Administration (FDA).
Study selection Randomised double blind parallel placebo controlled trials of currently approved Z drugs (eszopiclone, zaleplon, and zolpidem).
Data extraction Change score from baseline to post-test for drug and placebo groups; drug efficacy analysed as the difference of both change scores. Weighted raw and standardised mean differences with their confidence intervals under random effects assumptions for polysomnographic and subjective sleep latency, as primary outcomes. Secondary outcomes included waking after sleep onset, number of awakenings, total sleep time, sleep efficiency, and subjective sleep quality. Weighted least square regression analysis was used to explain heterogeneity of drug effects.
Data synthesis 13 studies containing 65 separate drug-placebo comparisons by type of outcome, type of drug, and dose were included. Studies included 4378 participants from different countries and varying drug doses, lengths of treatment, and study years. Z drugs showed significant, albeit small, improvements (reductions) in our primary outcomes: polysomnographic sleep latency (weighted standardised mean difference −0.36, 95% confidence interval −0.57 to −0.16) and subjective sleep latency (−0.33, −0.62 to −0.04) compared with placebo. Analyses of weighted mean raw differences showed that Z drugs decreased polysomnographic sleep latency by 22 minutes (−33 to −11 minutes) compared with placebo. Although no significant effects were found in secondary outcomes, there were insufficient studies reporting these outcomes to allow firm conclusions. Moderator analyses indicated that sleep latency was more likely to be reduced in studies published earlier, with larger drug doses, with longer duration of treatment, with a greater proportion of younger and/or female patients, and with zolpidem.
Conclusion Compared with placebo, Z drugs produce slight improvements in subjective and polysomnographic sleep latency, especially with larger doses and regardless of type of drug. Although the drug effect and the placebo response were rather small and of questionable clinical importance, the two together produced a reasonably large clinical response.
Hypnotic drugs are often prescribed in primary care for insomnia.1 Despite a reduction in prescribing of benzodiazepine hypnotics in the past decade, hypnotic use and costs remain high because of the introduction and increase in use of Z drugs,2 a group of non-benzodiazepine hypnotic drugs (including eszopiclone, zaleplon, and zolpidem), which act on the GABA (γ aminobutyric acid) receptor and are used in the treatment of insomnia. These are now the most commonly prescribed hypnotic agents worldwide. Prescription costs exceed $285m (£178m, €221m) in the United States3 and £25m (€31m, $40m) in the UK.4 Although widely prescribed, Z drugs are not without risks. These include adverse cognitive effects (such as memory loss), psychomotor effects (such as falls, fractures, road traffic crashes), daytime fatigue, tolerance, addiction, and excess mortality5 with no significant difference from benzodiazepines.6 These established risks need to be weighed against the benefits.
Previous meta-analyses6 7 8 9 of clinical trials of Z drugs have been prone to publication bias, such as unavailability of unpublished trials, selective or duplicate publication, and selective reporting of results in constituent studies.10 11 An example of the distorting effects of these publishing practices was shown in a study by Mattila and colleagues.12 This study compared European Public Assessment Reports of three drugs for insomnia to identify clinical trials that were performed between 1998 and 2007 for the purpose of registration of these drugs in the European Union. They found that the effect size of these drugs was 1.6 times larger when it was based on published data compared with the whole sample of studies, published and unpublished. They also found “remarkable inconsistencies in the reporting of the secondary end points, methods, results and, especially safety.” Different characteristics of included studies have not been examined as possible moderators of effects of Z drugs in previous meta-analyses.
One way of reducing the problem of publication bias is to analyse the effect of drugs that have been approved by governmental agencies with data derived from regulatory submissions.13 Drug companies are required to provide information on all sponsored trials, published or not, when applying for new drug approvals.14 Hence, the US Food and Drug Administration (FDA) files contain a complete dataset of published and unpublished trials up to the date of drug approval. We therefore undertook a meta-analysis of randomised placebo controlled parallel group studies of clinical effectiveness of Z drug hypnotics for insomnia in adults using only data provided to the FDA for drug approval.
Another concern with studies of hypnotics is the magnitude of the placebo response. We have considered the distinction between drug and placebo responses and drug and placebo effects.15 16 A drug response is the change that occurs after administration of the drug. The effect of the drug is that portion of the response that is due to the drug’s chemical composition; it is the difference between the drug response and the response to placebo. A similar distinction can be made between placebo responses and placebo effects. The placebo response is the change that occurs after administration of a placebo. It includes such factors as improvement because of the natural course of the condition and regression toward the mean, as well as the placebo effect itself.
Previous studies have shown significant improvements in placebo arms in placebo controlled trials of hypnotic drugs.17 18 Assessment of the magnitude of the placebo effect is important for understanding drug-placebo differences and their implications for clinical practice. For example, a small drug-placebo difference might lead to different treatment options if the drug and placebo are both effective rather than if neither are effective. Because change in the absence of placebo administration is rarely assessed in randomised controlled trials (and was not assessed in the trials contained in the FDA files), we could not assess the placebo effect. Therefore, we assessed changes in placebo groups, as well as those in drug groups, thus allowing us to establish the magnitude and significance of placebo responses, drug effects, and other variables that can moderate these outcomes.
For this systematic review we adhered to PRISMA guidelines.19 20 We obtained data on all currently approved (non-benzodiazepine) Z drugs: eszopiclone, zaleplon, and zolpidem from the FDA website (see appendix 1).
The criteria for inclusion were randomised double blind controlled trials, recruitment of adults with primary insomnia (transient or chronic), an intervention comparing a Z drug with a placebo control, submission to the FDA before approval, sponsored by the manufacturer, and studies from any country or reported in any language (although we found only reports in English). Studies were excluded if they were crossover designs, included healthy patients with normal sleep, were single night studies with induced insomnia, or did not report any inference test or enough descriptive information (for instance, percentages or means and a variability measure for both groups and/or both time measures) as included studies were too heterogeneous and not large enough to estimate the missing information to calculate an effect size. We excluded crossover trials because of problems associated with reactivity, learning, carry over effects, and failure of blinding. Blinding failure is more likely with crossover studies, leading to an enhanced placebo effect in the drug treatment arm, thereby increasing the likelihood of a false positive (type I) error. We did not include post-approval trials in our analysis because it is not possible to obtain access to all unpublished data for those trials.
Two independent trained raters extracted information related to the study with high inter-rater reliability: mean Cohen’s κ 0.90, for categorical variables, and mean intraclass correlation r=0.92 for continuous variables. Because of the nature of the FDA data, extractors were blind to researchers and institutions.
Methodological quality was assessed with the Jadad scale21 22 as adapted by Miller and colleagues23 (see appendix 2). For each study, we extracted statistical data for drug and placebo. We also coded sample and study characteristics and included dimensions such as study identifier, year of publication, location/s of study (country and number of sites), and study duration. Data were extracted for two primary outcomes and eight secondary outcomes. The primary outcomes were polysomnographic and subjective sleep latency. The secondary outcomes were subjective and polysomnographic total sleep time, subjective and polysomnographic number of awakenings, subjective sleep quality, sleep efficiency, and subjective and polysomnographic time awake after sleep onset. Measured characteristics of participants included proportion of women, age, and sample type (outpatients, elderly, etc). Design characteristics included design type, recruitment method, intervention drug(s), treatment duration, and statistics reported. None of the trials reported race or ethnicity.
For both measures of sleep latency (polysomnographic and subjective) and the eight other sleep related outcomes, we calculated effect sizes as the mean difference between pre-test and post-test divided by the standard deviation (SD) of the pre-test value24 for each group separately (that is, repeated measures effect sizes, correcting for sample size bias).25 The standardised mean change in the placebo group was subtracted from that of the intervention group to evaluate the drug effect with respect to the placebo effect for each comparison (that is, effect size between groups adjusted for baseline). We calculated multiple effect sizes if the study reported more than one drug group or multiple outcomes. The latter were analysed separately to investigate main effects and moderators. For multiple dose studies, with the same or different drug but with the same control group, we controlled multi-treatment dependence by estimating the covariance among them26 to analyse the effect of different drug and dose combinations. The sign of the effect size was set so that negative values signified a decrease of waking after sleep onset, sleep latency, and number of awakenings (all both polysomnographic and subjective). The sign of the effect size was positive for increases in polysomnographic and subjective total sleep time, polysomnographic sleep efficiency, and sleep quality.
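The effect size calculation described above can be illustrated with a minimal sketch. It assumes a commonly used small-sample correction factor, 1 − 3/(4(n − 1) − 1); the paper cites its sources for the correction but does not print the formula, so this choice is an assumption. The function names and all input values are hypothetical and are not taken from the FDA files.

```python
def bias_correction(n: int) -> float:
    """Small-sample correction factor (assumed form): 1 - 3 / (4*df - 1), df = n - 1."""
    df = n - 1
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def standardised_mean_change(pre_mean: float, post_mean: float,
                             pre_sd: float, n: int) -> float:
    """Post-test minus pre-test mean, scaled by the pre-test SD.

    Negative values mean the outcome (e.g. sleep latency) decreased.
    """
    return bias_correction(n) * (post_mean - pre_mean) / pre_sd


def drug_effect(drug: dict, placebo: dict) -> float:
    """Drug change minus placebo change: between-group effect adjusted for baseline."""
    return standardised_mean_change(**drug) - standardised_mean_change(**placebo)


# Hypothetical sleep latency data in minutes (illustration only).
drug_arm = dict(pre_mean=60.0, post_mean=30.0, pre_sd=30.0, n=50)
placebo_arm = dict(pre_mean=60.0, post_mean=48.0, pre_sd=30.0, n=50)

effect = drug_effect(drug_arm, placebo_arm)
print(f"drug response:    {standardised_mean_change(**drug_arm):.3f}")
print(f"placebo response: {standardised_mean_change(**placebo_arm):.3f}")
print(f"drug effect:      {effect:.3f}")
```

With this sign convention, a negative drug effect means the drug group improved more than the placebo group.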
We obtained repeated measures effects sizes for each group for those comparisons reporting means and SDs and used medians and interquartile ranges as the best approximation for those studies missing mean and SDs for drug and placebo. Sensitivity analysis was undertaken by comparing the main results, with and without those comparisons where median and interquartile range were used to obtain a standardised mean difference. Transformations were conducted to obtain effect sizes between groups for those cases where F test or P values were reported.27 As 63% (41) of the comparisons did not report SDs of each group and those provided were largely heterogeneous, we have reported repeated measures results for only six studies (24 comparisons) with the most complete statistical data; two studies were not included in the final analysis as they did not report any inference test or variability measure either in their repeated or two groups measures. We have reported effect sizes in their raw metric for the same comparisons in parallel to facilitate clinical interpretation.
We examined the effect sizes with random effects models27 28 for weighted effect sizes and publication bias. Random effects models are more robustly generalisable as they assume variability not only within studies but also between studies, a relevant assumption when studies from different populations are integrated to account for sampling error and population variance. Moderation patterns were examined under mixed and fixed effects assumptions, but we have reported results only under the latter assumptions because of the lack of power to show any significant pattern under mixed effects models.29 The homogeneity statistic, Q, determined whether each set of weighted mean effect sizes shared a common parametric effect size: a significant Q indicates a lack of homogeneity. To assess not only significance of the heterogeneity but also its size, we calculated the I² index and its corresponding 95% confidence intervals30 to determine and compare across outcomes the extent of the heterogeneity. I² varies between 0 (homogeneous) and 100% (non-homogeneous), and if the confidence interval around I² includes zero, the set of effect sizes is considered homogeneous.31 We investigated possible asymmetries in the distribution of the effect sizes, which could indicate reporting bias, using the trim and fill technique,32 Begg’s strategy,33 and Egger’s test.34 We analysed the total Jadad score as well as individual item scores to detect any possible bias effect on the overall results. Finally, we conducted sensitivity analyses with effect sizes with more than 2 SD from the average effect size.
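As an illustration of random effects pooling with the Q and I² statistics, the sketch below uses the DerSimonian-Laird estimator of between-study variance. The paper does not state which estimator was used, so that choice is an assumption, as are the function name and the input values.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes under a random effects model (DerSimonian-Laird, assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance (fixed effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Homogeneity statistic Q: weighted squared deviations from the fixed effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    # I^2: percentage of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical standardised mean differences and their variances (illustration only).
result = random_effects_pool([-0.1, -0.4, -0.7], [0.01, 0.01, 0.01])
print(result)
```

Q is compared against a χ² distribution with k − 1 degrees of freedom. The confidence interval for I² mentioned in the text is not computed here.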
Moderator analyses were conducted for the main outcomes, polysomnographic and subjective sleep latency. To explain possible moderation of the variability of the overall effect sizes, we examined the relation between sample, methodological, or condition characteristics and magnitude of effect using modified weighted least squares bivariate regression analyses, with weights equivalent to the inverse of the variance for each effect size.27 35 Because doses of different drugs are not equivalent, we also tested the drug by dose interaction. We used total score on the methodological quality scale as a moderator to analyse possible interaction with the final weighted effect sizes and have presented any significant pattern for either sleep latency or its subjective measure.
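A bivariate moderator analysis of this kind can be sketched as a weighted least squares regression of effect size on a single moderator, with inverse-variance weights. In this sketch the "modified" step is read as taking the slope's standard error from the known sampling variances rather than from the residual mean square, which is the usual fixed effect meta-regression approach; that reading, the function name, and the data are all assumptions.

```python
import math


def wls_meta_regression(effects, variances, moderator):
    """Fixed effect meta-regression of effect size on one moderator."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Weighted sums of squares and cross-products.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Standard error based on the known variances (assumed "modified" WLS).
    se_slope = math.sqrt(1.0 / sxx)
    return {"slope": slope, "intercept": intercept,
            "se_slope": se_slope, "z": slope / se_slope}


# Hypothetical doses (mg) and sleep latency effect sizes (illustration only).
doses = [5.0, 10.0, 20.0]
effects = [-0.2, -0.3, -0.5]
variances = [0.01, 0.01, 0.01]

fit = wls_meta_regression(effects, variances, doses)
print(fit)
```

A negative slope here would correspond to larger reductions in sleep latency at larger doses.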
In the data obtained from the FDA website, we identified 13 clinical trials comprising 4378 participants that examined 65 separate drug-placebo comparisons by type of outcome, type of drug, and dose and that met the inclusion criteria. Figure 1 shows the trial flow. Table 1 and appendix 3 provide descriptive features of the studies. Methodological quality of the studies ranged from 13 to 21 on the Jadad scale (mean 15.63, SD 1.8). Publication year and quality score were not significantly correlated (r=0.34, P=0.28).
Studies were conducted in North America (eight studies), North America and Europe (one study), South America (one study), or Australia (one study), with one study conducted entirely in Europe, and another study without location information. The mean duration of studies was 33.9 days (SD 33.3, range 14-180 days). Of the 4378 participants sampled, 61% were women, 61% were aged under 45, and the mean age was 49.6 (SD 13.3; range 38-72) years.
All 13 studies included comparisons of at least one of our primary outcomes. Ten studies (22 comparisons) assessed polysomnographic sleep latency and seven (11 comparisons) assessed subjective sleep latency. The eight remaining secondary outcomes appeared in fewer studies: four studies (seven comparisons) assessed subjective total sleep time, two (two comparisons) assessed total polysomnographic sleep time, four (six comparisons) assessed subjective number of awakenings, three (four comparisons) assessed polysomnographic number of awakenings, two (four comparisons) assessed subjective sleep quality, three (five comparisons) assessed sleep efficiency, three (three comparisons) assessed polysomnographic waking after sleep onset, and one (one comparison) assessed subjective waking after sleep onset.
Zolpidem was the most commonly prescribed drug (eight studies); eszopiclone and zaleplon were assessed in three studies each (one study included both zolpidem and zaleplon). Zolpidem was prescribed in eight studies (15 comparisons) measuring polysomnographic sleep latency, zaleplon in three studies (six comparisons), and eszopiclone in only one study (one comparison). Only zolpidem and eszopiclone were used in studies measuring subjective sleep latency, in five (eight comparisons) and two (three comparisons) studies, respectively.
For our primary outcomes, analyses of standardised effect sizes showed significant but small to medium differences in polysomnographic (weighted standardised mean difference −0.36, 95% confidence interval −0.57 to −0.16) and subjective sleep latency (−0.33, −0.62 to −0.04) for treatment versus control. There were significant effect sizes for the primary outcome (sleep latency) within groups separately for both placebo (−0.39, −0.54 to −0.23 (for polysomnographic); −0.33, −0.63 to −0.03 (for subjective)) and drug (−0.93, −1.32 to −0.54 (polysomnographic); −0.67, −1.30 to −0.03 (subjective)). Analyses of weighted mean raw differences indicated that drugs decreased sleep latency by 22 minutes (−33 to −11 minutes).
Tables 2 and 3 show standardised and raw effect sizes, respectively. Figures 2 and 3 show forest plots for polysomnographic and subjective sleep latency, respectively. Analysis of secondary study outcomes showed no significant drug effect. The lack of difference between groups for other sleep measures coupled with the fact that few reports included them meant there was insufficient evidence to show efficacy on these measures.
There was no evidence of asymmetry of the distribution of the effect sizes for sleep latency by the trim and fill technique, Begg’s test33 (P=0.88 (polysomnographic); P=0.22 (subjective)), or Egger’s test34 (P=0.34 (polysomnographic); P=0.77 (subjective)), which suggests that these results are not significantly affected by publication bias. One study was an outlier (eszopiclone study No 190-047, table 1 and appendix 3), with a large pooled effect size for sleep latency 0.46 (0.24 to 0.69) and >2 SD from the overall weighted effect size. When we excluded this study, the pooled effect size for sleep latency was −0.54 (−0.91 to −0.15). Sensitivity analysis showed no significant differences in overall efficacy (the overall effect size without the outlier was still significant, with a slightly lower reduction of subjective sleep latency) and the same patterns for the moderator results adjusted for the two outlier comparisons provided by this study.
Every item was evaluated through bivariate weighted regression analysis under fixed and random effects assumptions to critically and robustly appraise any included study for risk of bias in attributing outcomes to the intervention and their possible effect on the overall efficacy, but none of the results was significant. Therefore, there was no evidence of any interaction between quality/risk of bias in the included studies and the final results.
The main outcomes, polysomnographic and subjective sleep latency, were the only measures with sufficient cases to permit detailed models for moderator analyses (table 4). Sleep latency was more likely to be reduced in studies published earlier, with larger drug doses, longer treatment duration, and samples that included a greater proportion of younger patients and/or female patients (table 4). Polysomnographic and subjective sleep latency were reduced when larger doses were used, regardless of type of drug. The interaction of dose by type of drug was not significant, and all drugs (zolpidem and zaleplon for polysomnographic and subjective sleep latency and eszopiclone and zolpidem for subjective sleep latency, the latter being significantly more effective in this particular outcome) showed a pattern of greater reductions in sleep latency with larger doses. Subjective sleep latency was more likely to be reduced in studies published earlier, or with greater numbers of younger patients or women included in the sample, and with zolpidem. These patterns were obtained under fixed effect meta-regression models and these held under mixed effects assumptions.
In this meta-analysis of Z drugs using data published on the FDA website, which are less likely to be affected by selection or reporting bias, we found significant reductions in polysomnographic and subjective sleep latency in both drug and placebo groups. The difference between drug and placebo was 22 minutes for polysomnographic sleep latency and seven minutes for subjective sleep latency. Although these reductions in sleep latency might have benefits, albeit short term, for quality of life, the effect sizes corresponding to these differences were −0.36 and −0.33, both of which are conventionally considered to be small effects,36 and well below the criterion for clinical significance (0.50) suggested by the National Institute for Health and Clinical Excellence (NICE) in their guidelines for the treatment of depression.37
There were insufficient data for other drug effect end points to allow a valid analysis. The large heterogeneity in sleep latency outcomes was mainly explained by larger doses needed to obtain a greater drug than placebo effect. Z drugs were more likely to be effective in reducing sleep latency in studies published earlier, those including more younger and/or female patients, and those using zolpidem. Significant placebo responses were present in polysomnographic and subjective sleep latency. There have been several previous meta-analyses of published data on Z drugs, although none included moderator analyses and all acknowledged publication bias.4 6 8 9 38
As in previous studies, we found that data submitted for licensing enabled detailed investigation of drug efficacy.13 39 40 We included sponsored studies submitted to the FDA but did not assess whether they were subsequently published. Studies submitted to the FDA are required to report all data so are less likely to be affected by reporting bias.
Studies were subjected to the same methodological scrutiny and analytical rigour as meta-analyses of published studies. As in other meta-analyses, we did not include studies that did not report enough statistical data to calculate an effect size. Because of the small number of reports for some outcomes, and the heterogeneity of statistical data reported, we could not compare some studies directly or robustly impute missing data. There was insufficient information about sample setting characteristics, drug side effects, and other factors that might have explained heterogeneity to fully account for these. The entry criteria for studies varied, with some studies focusing just on sleep latency, particularly for shorter acting drugs such as zaleplon. This could have affected the capacity of some studies to identify effects other than on sleep latency. All the drugs are licensed for insomnia, and patients presenting for treatment have a range of symptoms, not just sleep latency, for which these drugs are commonly prescribed in general practice.
Another weakness in the present analysis is that all the trials were industry sponsored. Industry sponsorship has been shown to enhance the outcome of clinical trials.41 Thus, although we were able to include published and unpublished studies, at least for the reports used to approve these drugs, we could not avoid sponsorship bias, and our results might therefore overestimate the drug effect. Unfortunately, eliminating both sources of bias simultaneously is difficult, if not impossible. Although clinical trials now need to be registered in advance to be published in major medical journals,42 there is no requirement that the results be submitted for publication, and many failed clinical trials or clinical trials with negative results go unpublished.10 Furthermore, although many clinical trials are subject to mandatory reporting of results to the FDA, most are not, and for those that are, as many as 78% fail to comply with this requirement.42 Because sponsorship bias is in the direction of greater effects for industry sponsored trials, our results might overestimate the effects of Z drug hypnotics for treating adult insomnia.
We found evidence of a significant placebo response for sleep latency. McCall and colleagues undertook a meta-analysis of sleep changes associated with placebo in published hypnotic clinical trials and found a clinically important and statistically significant placebo response for subjective sleep latency and total sleep time.17 Belanger and colleagues undertook a meta-analysis of sleep changes in control groups of 34 hypnotic drug studies in which 23 used a pharmacological placebo, four a psychological placebo, and seven a waiting list. They found significant pre-post changes in the pharmacological placebo group on several sleep outcomes, both objectively or subjectively measured, suggesting that sleep measures might change significantly in response to a pharmacological placebo.18
The response to placebo is more than just the placebo effect. Just as the effect of a drug is estimated by the difference between the response to the drug and the response to a placebo, the placebo effect would be the difference between the placebo response and changes occurring without administration of a placebo. Belanger and colleagues assessed the response to placebo hypnotic drugs and compared it with sleep changes among patients placed on a waiting list.18 Compared with those on a waiting list, there were significantly greater improvements in subjective sleep onset latency (19.55 min v 2.43 min), subjective total sleep time (31.13 min v 7.30 min), and objective total sleep time (18.27 min v 10.34 min) in the placebo group.18 These data were based on comparisons between studies rather than comparisons within studies, and none of the trials in the FDA database included waiting list controls. Nevertheless, the results of Belanger and colleagues suggest that the placebo response observed in our meta-analysis was largely caused by a genuine placebo effect. Future clinical trials including both placebo and untreated (natural course) controls would be useful, as well as combining the results of studies using network meta-analysis.
The response to a medical treatment consists of two components: a true drug effect and a non-specific placebo response, which includes the placebo effect, regression toward the mean, and improvement because of the natural course of the condition. For that reason, it is useful, both for current clinical practice and for future treatment development, to know the effect sizes for the placebo group as well as for the drug group. For example, finding that both placebos and drugs are effective, but that the drug is more effective than the placebo, suggests that placebo characteristics can be used to amplify effectiveness of a drug. Conversely, finding improvement only in drug arms indicates that the placebo effect is not an important component of treatment, whereas finding that both are equally effective, compared with waiting list controls, suggests that non-specific aspects of patient care might be having positive effects.
We found that both the drug effect and the placebo response were small and of questionable clinical importance. The two put together, however, lead to a reasonably large clinical response. Although the drug-placebo difference in objectively measured sleep latency was only 22 minutes, the response to the Z drugs, including both drug effect and the placebo effect components, was 42 minutes. Similarly, the effect size for the drug response was −0.93 and that for the placebo response was −0.39, accounting for about half of the drug response.
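The arithmetic behind this decomposition can be checked with a short sketch. It uses only the figures quoted in the text (a 42 minute drug response, a 22 minute drug-placebo difference, and within-group effect sizes of −0.93 and −0.39); the variable names are illustrative.

```python
# Raw metric (minutes of polysomnographic sleep latency), figures from the text.
drug_response_min = 42.0        # total change in the drug group
drug_placebo_diff_min = 22.0    # drug effect: drug response minus placebo response

# Placebo response implied by the two figures above.
placebo_response_min = drug_response_min - drug_placebo_diff_min
share_raw = placebo_response_min / drug_response_min

# Standardised metric: within-group effect sizes reported for placebo and drug.
placebo_es = -0.39
drug_es = -0.93
share_std = placebo_es / drug_es

print(f"implied placebo response: {placebo_response_min:.0f} minutes")
print(f"placebo share of drug response (raw): {share_raw:.0%}")
print(f"placebo share of drug response (standardised): {share_std:.0%}")
```

Both ratios fall between roughly 40% and 50%, which is the sense in which the placebo response accounts for "about half" of the drug response.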
Insomnia is a symptom defined disorder characterised by distress about perceived poor sleep or lack of sleep. Hence, subjective sleep latency might be as important as objective sleep latency in understanding the benefits of treatments for this condition. The response to Z drugs was 25 minutes shorter for subjectively perceived sleep latency, whereas the response to placebo was an improvement of 19 minutes. Thus the benefit of Z drugs in terms of subjectively perceived sleep latency was only seven minutes and was not significant. However, this was based on only two comparisons. Effect sizes for subjective sleep latency were calculable for a larger number of trials and the drug-placebo difference (−0.33) was small but significant, with the placebo response again accounting for about half of the drug response.
Taken together, these data suggest that the placebo response is a major contributor to the effectiveness of Z drugs. The remaining effect needs to be balanced against the harms associated with these drugs. The substantial proportion of the drug response accounted for by the placebo response indicates the importance of non-specific factors in the treatment of insomnia. As the placebo effect is a psychological phenomenon, these data suggest that increased attention should be directed at psychological interventions for insomnia.
FDA data could also provide further opportunities for studying the adverse effects of Z drugs (particularly as larger effect sizes were associated with higher drug doses), as well as examining issues of publication and reporting delays and bias. We did not look at adverse effects, which can pose significant risks,43 leading to concerns about the widespread and sometimes inappropriate use of these drugs.44 45
This study of FDA data shows that Z drugs improve objective and subjective sleep latency compared with placebo, particularly in younger and female patients. The size of this effect, however, is small and needs to be balanced with concerns about adverse effects, tolerance, and potential addiction. The placebo response accounted for about half of the drug response. This suggests that increased attention should be directed at psychological interventions for insomnia.
Appendix 1: Details of how to obtain drug trial data from FDA
Appendix 2: Modified Jadad score
Appendix 3: Full details of studies included
Contributors: ANS and IK had the original idea for the study. All authors were involved in the design of the review, developed the search strategy, performed the study selection, interpreted and discussed results, and contributed to the writing and review of the various drafts of the report. JM, MK, and ANS extracted data from included studies. TBH-M, IK, and ANS were involved in data analysis. ANS is guarantor.
Funding: This study was funded by the College of Social Science Research Fund at the University of Lincoln. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data available.
Cite this as: BMJ 2012;345:e8343