A potential source of clinical heterogeneity is variation between trials in the way in which the intervention is delivered. This problem is least likely in reviews of simple interventions, particularly some drugs. Dose finding trials are required before drugs can be used in clinical trials, so the optimal dose is usually known and administration is consistent across trials. In contrast, complex, multifaceted interventions are likely to be administered in different ways in different settings. Some examples of complex interventions relevant to our discipline (physiotherapy) are back schools,[2] programmes to prevent falls,[3] and functional restoration programmes for injured workers.[4] Other complex interventions include education programmes[5] and most surgical procedures.[6] These interventions may be administered in quite different ways across trials, and it is reasonable to expect that the way in which they are administered could influence their effectiveness. Heterogeneity of effects may occur because, all else being equal, well planned, intensive, competently administered interventions are likely to be more effective than low intensity interventions that are poorly planned and administered.
Exercise, a component of many physiotherapy interventions, provides a case in point. Exercise science provides us with some insights into how the design of an exercise programme influences physiological responses to exercise. A dose-response relation exists, as with drugs. Physiological responses to exercise are determined by the mode, frequency, intensity, and duration of exercise.[7] In practice, the dose is also determined by adherence to the exercise programme. The effects of exercise programmes observed in clinical trials are likely to vary because trials use different training doses and inspire different levels of adherence.[8]
An example of statistical heterogeneity that could be attributed to trials administering the interventions differently comes from studies of the effects of training the pelvic floor muscle. We identified four randomised trials of the effects of pelvic floor training to prevent urinary incontinence during pregnancy.[9–12] Three presented enough data to permit meta-analysis.[10–12] The studies were heterogeneous with respect to intervention. Two showed significant and clinically important effects of antenatal training,[11,12] whereas one study reported non-significant and clinically trivial effects.[10] In the two trials with positive effects, training was supervised regularly by a physiotherapist, whereas in the study with negative effects women saw the physiotherapist only once.
The pooled estimate of effect obtained from a meta-analysis of all three trials did not show a statistically significant effect of pelvic floor training on risk of urinary incontinence (odds ratio 0.67, 95% confidence interval 0.39 to 1.16; figure). When we excluded the large trial of a low intensity intervention[10] from the meta-analysis, we found a clinically worthwhile effect of antenatal training (odds ratio 0.50, 95% confidence interval 0.34 to 0.75). The largest trial may have reported a smaller effect because of its size. Resource limitations often mean that large trials provide less intensive interventions, and in large trials it may be logistically difficult to provide well supervised interventions. Yet large trials are most heavily weighted in meta-analyses. If large studies with less intense interventions show smaller effects, they will tend to dilute the effects of smaller studies that show larger effects. An uncritical synthesis of these data suggests that the intervention is ineffective, but a more accurate interpretation may be that the intervention is effective only if administered intensively.
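The dilution described above can be illustrated with a minimal sketch of inverse-variance pooling of odds ratios. The event counts below are hypothetical numbers chosen only for illustration; they are not the data from the trials cited, and the fixed-effect inverse-variance method is an assumption, as is every name in the code (`Trial`, `pool`, and so on). The sketch shows how one large trial of a low intensity intervention can carry most of the weight and pull the pooled estimate towards no effect.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    name: str
    events_treated: int   # incontinent women in the training group
    n_treated: int
    events_control: int   # incontinent women in the control group
    n_control: int
    intensive: bool       # True if training was regularly supervised


def log_odds_ratio(t: Trial) -> tuple[float, float]:
    """Return the log odds ratio and its variance from a 2x2 table."""
    a = t.events_treated
    b = t.n_treated - a
    c = t.events_control
    d = t.n_control - c
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool(trials: list[Trial]) -> dict:
    """Fixed-effect inverse-variance pooling on the log odds ratio scale."""
    estimates = [log_odds_ratio(t) for t in trials]
    weights = [1 / var for _, var in estimates]
    total = sum(weights)
    mean = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    return {
        "or": math.exp(mean),
        "lower": math.exp(mean - 1.96 * se),
        "upper": math.exp(mean + 1.96 * se),
        "weights": {t.name: w / total for t, w in zip(trials, weights)},
    }


# Hypothetical counts: two small intensive trials, one large low intensity trial.
trials = [
    Trial("small intensive A", 10, 50, 20, 50, intensive=True),
    Trial("small intensive B", 12, 60, 22, 60, intensive=True),
    Trial("large low intensity", 92, 300, 95, 300, intensive=False),
]

pooled_all = pool(trials)
pooled_intensive = pool([t for t in trials if t.intensive])
large_share = pooled_all["weights"]["large low intensity"]

for label, res in [("all trials", pooled_all), ("intensive only", pooled_intensive)]:
    print(f"{label}: OR {res['or']:.2f} "
          f"(95% CI {res['lower']:.2f} to {res['upper']:.2f})")
print(f"weight of large trial in the full analysis: {large_share:.0%}")
```

With these illustrative counts the large trial receives roughly three quarters of the total weight, so the confidence interval for all trials spans 1, whereas the interval for the intensive trials alone does not. A random-effects model would weight the trials more evenly, but it would not by itself explain why the effects differ.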
The differences in conclusions reached by analyses that do and do not consider the quality of intervention are likely to be clinically important. In our opinion, clinicians, providers of health care, and patients should be more impressed by systematic reviews that explicitly consider the quality of interventions, particularly when the interventions are complex. Systematic reviews of complex interventions should routinely examine the quality of interventions.