This review reveals that imperfect placebos are common in low back pain trials, a finding that has implications both for the design of future trials and for the interpretation of published trials evaluating treatments for low back pain. Two common problems were identified in the design of trials: the use of placebos that are potentially not inert (judged against contemporary treatment evidence) and the uncertain success of blinding.
It could be argued that our search strategy inflated the proportion of trials with non-inert placebos, because we used the term “minimal intervention” in our search strategy. However, we only included trials in the review if the authors categorised the control intervention as a placebo intervention, or if they stated in the manuscript that the intervention was designed to control for non-specific effects of treatment [29]. The use of non-inert placebos in trials is usually a consequence of an uncritical attempt to design placebos that are indistinguishable from real interventions. For example, among non-pharmaceutical trials, we found that indistinguishability was most frequent in trials of acupuncture, but all of these indistinguishable placebos consisted of potentially genuine treatments. In acupuncture trials, the use of invasive sham acupuncture techniques has been criticised because the mechanism behind the effects of acupuncture may depend not on the depth or location of needling, but on needling itself [98]. Accordingly, the lack of a clear understanding of the mechanisms underlying specific therapeutic effects is also a challenge to the design of indistinguishable placebos for other complex interventions [66].
In pharmaceutical trials, “active placebos” are sometimes used to create intervention groups that are more closely matched. These placebos aim to mimic the side effects of drugs (e.g. dry mouth) while retaining the other characteristics of placebos [115]. However, pharmaceutical trials with improper choices of “active placebo” can also spoil their placebo comparisons. Two trials included in this review used an “active placebo” (diphenhydramine) that might have acted as a genuine treatment because of its sedative properties. Thus, the results of these trials no longer reflect a placebo-controlled comparison but instead a comparison of two genuine treatments. The decision on whether to use “active placebos” in pharmaceutical trials should be weighed against these risks. In antidepressant trials, for example, their use may not be justifiable given that the incidence of side effects in experimental and placebo groups appears similar regardless of whether an “active placebo” is used [4].
The inclusion of naïve subjects in trials is one alternative for enhancing blinding when true indistinguishability is difficult to achieve. This is illustrated by trials in which TENS therapy is provided by a functioning device and the placebo by a non-functioning device (sham TENS group). Although both interventions look the same, the electrical stimulation will only be detected by patients treated with the functioning device. To keep patients blinded in trials like this, researchers often tell them that they might or might not feel the stimulation, regardless of whether the treatment provided was a placebo [27]. However, it is unlikely that such information will prevent patients who have previously received a course of TENS therapy from recognising the sensation of true treatment and consequently becoming unblinded. For the same reason, the use of a crossover design in these trials might not be appropriate [50]. Deyo and colleagues [37] have argued for the inclusion of naïve subjects in electrotherapy trials and, consistent with this recommendation, our results showed that naïve subjects were used more frequently in electrotherapy trials than in trials of other interventions.
Of the different strategies with the potential to facilitate blinding in placebo-controlled trials, we found that structural equivalence was the most frequently used. When experimental and placebo interventions are structurally equivalent, they might not look the same, but they involve similar degrees of therapeutic contact. Provision of structurally equivalent placebo interventions may control for placebo effects without the risk of a placebo that is not inert. A meta-analysis of psychotherapy trials has provided some evidence that structural equivalence reduces bias in treatment estimates [9]. The meta-analysis showed that trials with structurally equivalent groups reported smaller effects of interventions than trials with groups that were not structurally equivalent. The “larger treatment effects” observed in the latter would reflect larger placebo effects in the experimental group due to the differential amount or quality of therapeutic contact. Nevertheless, because many factors potentially influence the magnitude of placebo effects, it seems unlikely that structural equivalence alone can control for all the factors that generate unbalanced placebo effects in trials.
The use of any strategy to facilitate blinding will be worthless if, ultimately, an acceptable level of blinding is not achieved. As noted by Schulz and colleagues [126], “blinding must succeed to reap its benefits”. Accordingly, the CONSORT statement recommends that the success of blinding be reported [105]. Blinding success was poorly documented in a sample of general medicine and psychiatry trials [44]. Likewise, our results show that disappointingly few trials of low back pain report on blinding success. However, the absence of such reporting is not sufficient to rule out successful blinding. Hill and colleagues [70] contacted the investigators of 40 rheumatology trials and found that a lack of reporting of randomisation, concealed allocation and blinding does not necessarily mean that these methods were not properly conducted. Nevertheless, although successful blinding might have been achieved in some trials where it was not reported, it would be clearer if future trials included the results of their blinding assessments in their reports.
One way of checking whether blinding is successful is to measure how often the group assignment is guessed correctly. In a two-arm trial in which blinding is successful, guesses would be accurate 50% of the time. Nevertheless, in placebo-controlled trials, the success of blinding is better assessed by comparing the proportions of patients in each group who believed a “real” treatment was provided. That is, if patients in the placebo group are more likely to believe that the intervention they received was a placebo, blinding was unsuccessful. The timing of blinding assessments also deserves special consideration. For instance, if the experimental intervention is a highly effective treatment, the difference in the proportions of patients believing that a “real” treatment was provided will tend to be higher regardless of the use of adequate strategies to secure blinding. For this reason, it is preferable that blinding success is assessed earlier rather than later in a course of treatment.
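The comparison of beliefs between arms described above can be sketched numerically. The following is a minimal illustration with hypothetical counts (the function name and all numbers are assumptions for illustration, not data from this review):

```python
def belief_imbalance(believe_real_active, believe_real_placebo,
                     n_active, n_placebo):
    """Return the per-arm proportions of patients who believed they
    received the 'real' treatment, and their difference.
    A difference near 0 is consistent with successful blinding on
    this criterion; a large positive difference suggests unblinding."""
    p_active = believe_real_active / n_active
    p_placebo = believe_real_placebo / n_placebo
    return p_active, p_placebo, p_active - p_placebo

# Hypothetical trial: 40 of 50 active-arm patients and 20 of 50
# placebo-arm patients believed they received the real treatment.
p_a, p_p, diff = belief_imbalance(40, 20, 50, 50)
print(p_a, p_p, diff)  # -> 0.8 0.4 0.4 (imbalance suggests unblinding)
```

In practice such a difference would be reported with a confidence interval or a formal blinding index rather than the raw difference alone, but the raw proportions capture the comparison the text describes.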
Some investigators supplement assessments of blinding success with measurements of expectations of treatment. Important imbalances in patients’ expectations were reported in eight of the 14 trials that assessed them; given how few trials performed such assessments, it is likely that imbalances of this kind are common across trials of this type. Health care providers may also transfer their own expectations to patients [125]. As noted by Critelli and Neumann [34], “there appears to be a tendency for experimental placebos to be in some sense weaker, less credible, or applied in a less enthusiastic manner than treatments that have been offered as actual therapies”. However, in this review we focused exclusively on investigating this concept from the patient’s perspective.
Despite the contribution of expectation measurements to the interpretability of trial results, these measurements have important limitations. Firstly, there is no consensus on how expectations should be assessed in clinical trials, as reflected in the lack of standardisation of these assessments. In addition, deciding on the best timing for these assessments is difficult, which may explain the large variation encountered among the trials included in this review. As with assessments of blinding, treatment effects may confound ratings of patients’ expectations obtained at follow-up. Thus, it is questionable whether assessments of expectations as late as 6 months after enrolment measure the same construct as assessments at baseline. If researchers choose to assess expectations at baseline, patients might find it difficult to describe their expectations of interventions with which they are unfamiliar. Optimal ways to assess expectations in trials and the standardisation of such measurements are a priority and should be addressed by future studies.