Most randomised trials allocate individual participants to different treatments. However, cluster randomised trials in which groups of subjects are allocated to different treatments are becoming increasingly popular.1 Cluster randomisation is often advocated to minimise treatment “contamination” between intervention and control participants. For example, in a trial of dietary change, people in the control group might learn about the experimental diet and adopt it themselves.
Contamination of control participants has two related effects. It reduces the point estimate of an intervention's effectiveness and this apparent reduction may lead to a type II error—that is, rejection of an effective intervention as ineffective because the observed effect size was neither statistically nor clinically significant.
Although the threat of contamination is an issue in some controlled trials, it may not be of much practical importance in many. Trialists should use individual randomisation if possible because of the drawbacks of cluster allocation. Cluster trials are associated with problems of recruitment bias and the need for larger samples than would be required in similar, individually randomised trials. In recruitment bias, different sorts of participants are selected into the various arms of the trial, thereby defeating the objective of randomisation, while a larger sample size may increase the cost of a trial, its length, or its complexity. This paper describes the difficulties of cluster trials and argues that the problem of contamination can often be dealt with by individual randomisation.
Members of clusters cannot be treated as independent, and the effect of this on outcomes leads to a need to increase the sample size.1,2 This problem can also be described as follows: for any given sample size, the correlation between cluster members will reduce the overall power of the study. The difficulty is well known and much has been published recently on sample size and analytical issues.1–4
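The sample size penalty described above is commonly summarised by the design effect, 1 + (m − 1)ρ, where m is the cluster size and ρ is the intracluster correlation coefficient. The sketch below is illustrative only: it assumes equal cluster sizes, and the cluster size and correlation used in the example are hypothetical values, not figures from any trial cited here.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect 1 + (m - 1) * rho, assuming equal-sized clusters."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(total_n: float, cluster_size: float, icc: float) -> float:
    """Number of independent participants that total_n clustered participants are worth."""
    return total_n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical inputs: 2000 participants in clusters of 20, ICC of 0.05.
    de = design_effect(20, 0.05)
    print(f"design effect: {de:.2f}")
    print(f"effective sample size: {effective_sample_size(2000, 20, 0.05):.0f}")
```

Read the other way round, the design effect is the factor by which an individually randomised sample size must be multiplied to keep the same power under cluster allocation.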
The randomised trial is the ideal study design in evaluative research because of its ability to deal with known and unknown confounding factors. If randomisation is successful, these factors should be balanced across the treatment groups. Balance in cluster trials can be achieved through randomisation across the trial arms at the cluster level if there are sufficient clusters. However, simple randomisation of clusters, even with relatively large numbers, can still result in an imbalance. For instance, in a randomised trial of breast screening there was an imbalance in socioeconomic groups between the two study arms even though 87 clusters (general practices) and 50000 women were included.5 This imbalance would have been extremely unlikely had the trial used individual randomisation.
Cluster trials generally use the postrandomised consent method of trial design, which is similar in concept to Zelen's method in individually randomised designs.6 Participants are not asked for their consent to randomisation; they are usually asked for consent to treatment (although this is not always done in cluster trials) and are asked whether they consent to inclusion in the study analysis. When cluster trials are analysed at the individual level using this method, balance between the randomised groups can only be guaranteed if one of two conditions is fulfilled: all members of the cluster must be included in the trial, or a random sample of participants must be included in the trial's analysis. The latter approach may be used to increase the statistical power of the study for a given sample size by increasing the number of clusters rather than the total number of patients.4 For instance, Kerse et al took a random sample of patients from each cluster rather than all randomised individuals.7 Neither of these conditions has been met in some recently published cluster trials. Often participants are asked after randomisation whether they will participate in the trial in terms of providing data for follow up. If a significant proportion refuses, there is a possibility of selection bias.
An example of selection bias was seen in a recent trial of counselling to reduce the risk of cardiovascular disease.8 Participants were allowed the choice of accepting the intervention and providing data on outcomes. Participation in the control arm was more desirable than participation in the intervention, as a result of which nearly twice as many subjects were recruited as controls. Moreover, the participants in the intervention group had some observable characteristics, such as a lower prevalence of smoking, which put them at a lower risk of coronary heart disease than the control group. Thus, the modest effect associated with the intervention could have been due to the recruitment of different types of participant rather than the intervention. Similarly, in a cluster trial of diabetes care the intervention clusters identified about 25% more eligible patients than the control clusters, although the total size of the list of patients was slightly greater in the control clusters.9
Even when recruitment between the intervention and control clusters is similar and the observed baseline characteristics between allocation groups seem balanced, if not all the members of the cluster or a random sample of them are included in the analysis, the trial may still be unbalanced because of unobserved characteristics. For example, in a trial of a behavioural intervention to prevent violence in schools, the number of pupils in the two groups was similar, but only 66% of children—those whose parents had returned consent forms—were included in the study.10 However, the intervention was still delivered to children whose parents had not returned consent forms. Although the trial recorded a creditable 82% follow up, this represented only 54% of the entire randomised sample. In contrast, a trial of accident prevention in children included all relevant members of the cluster in the analysis (children aged 3-12 months) and managed to collect and analyse outcome data on more than 92%, thus reassuring us that little, if any, selection bias had taken place.11
When Zelen's method is used in an individually randomised trial there is usually a loss of statistical power because there is incomplete penetration of the intervention. Thus, uptake of a new treatment that is less than 100% will lead to a dilution of the effect of treatment and will reduce the measurable effect size.12 Because cluster trials also use Zelen's method, this dilution of effect applies to them too. For example, in a cluster trial of injury prevention in young children the intervention was not delivered to about 25% of parents in the experimental arm.11 This problem is not generally recognised in estimating the sample size for cluster trials, and if it is taken into account it will lead to a general increase in sample sizes. Thus, some cluster trials not only need to account for the clustering effects on statistical power, they also need to boost sample sizes because of the dilution effect.
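The combined penalty can be illustrated with a rough calculation. The sketch below assumes that the intention to treat effect shrinks in proportion to uptake and that the outcome variance is unchanged, so that the sample size grows by 1/uptake²; the function name, the baseline sample size of 1000, and the design effect of 1.5 are hypothetical. Only the 75% uptake figure echoes the injury prevention example above.

```python
from math import ceil


def inflated_sample_size(n_individual: int, uptake: float,
                         design_effect: float = 1.0) -> int:
    """Inflate a sample size for incomplete uptake and, optionally, clustering.

    Assumes the measurable effect is the full effect multiplied by `uptake`
    and that the outcome variance does not change, so the sample size grows
    by 1 / uptake**2. The design effect multiplies on top of that.
    """
    if not 0.0 < uptake <= 1.0:
        raise ValueError("uptake must be in (0, 1]")
    if design_effect < 1.0:
        raise ValueError("design_effect must be at least 1")
    return ceil(n_individual * design_effect / uptake ** 2)


if __name__ == "__main__":
    base_n = 1000  # hypothetical size for an undiluted, individually randomised trial
    print("dilution only:", inflated_sample_size(base_n, uptake=0.75))
    print("dilution plus clustering:",
          inflated_sample_size(base_n, uptake=0.75, design_effect=1.5))
```

Because the two factors multiply, a modest shortfall in uptake can add substantially to a sample size that has already been inflated for clustering.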
Before considering cluster randomisation as a way of addressing contamination we need to be certain that contamination is a real rather than a theoretical possibility. Are patients who are receiving counselling really going to pass on the intervention in such a way as to alter behaviour in the control arm of a trial? This thesis could be tested empirically by undertaking a pilot trial using individual randomisation with contamination of the control group as the outcome. However, even if contamination were a problem and a sizeable proportion of the control group was affected, individual randomisation might still be best.
As an example, consider a trial of counselling to promote healthy behaviour in adults at high risk of cardiovascular disease.8 To detect a reduction in smoking prevalence of 9 percentage points, from 50% to 41%, with 90% power at a significance level of 5%, a total of 1282 participants would have been needed. However, because the trialists elected to use a cluster design a sample of 2000 was required. The trialists could have chosen individual randomisation with a sample size of 2000 and used the extra power of this increased sample to cope with any contamination. Thus, a sample of 2000 would have been able to detect a difference in smoking prevalence of 7 percentage points with 90% power and 5% significance, allowing for up to a 20% contamination of the control group. Furthermore, this estimate of the allowable contamination assumes that the intervention is as effective when received through contamination as when delivered by healthcare professionals. If we assume that it was only half as effective, contamination could be much greater. Indeed, if contamination in this particular trial had been merely theoretical, the non-significant effect size noted in the trial would have been significant if individual randomisation had been used.
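The arithmetic behind this example can be approximated as follows. This is a sketch rather than the calculation used in the paper: it applies the unpooled normal approximation for comparing two proportions, which gives a total close to, but not exactly, the 1282 quoted above, and it models contamination by assuming that a fraction of controls obtains the intervention's benefit. The function names and the `potency` parameter are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, p_treat: float,
              alpha: float = 0.05, power: float = 0.90) -> int:
    """Participants per arm to compare two proportions (two-sided test).

    Unpooled normal approximation; other formulas differ slightly.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


def contaminated_control_rate(p_control: float, p_treat: float,
                              contamination: float, potency: float = 1.0) -> float:
    """Expected control-arm rate when a fraction of controls is contaminated.

    `potency` is the assumed effectiveness of the intervention when received
    through contamination, relative to formal delivery (1.0 = equally effective).
    """
    return p_control - contamination * potency * (p_control - p_treat)


if __name__ == "__main__":
    print("no contamination, total n:", 2 * n_per_arm(0.50, 0.41))
    for contamination, potency in ((0.20, 1.0), (0.40, 0.5)):
        p_c = contaminated_control_rate(0.50, 0.41, contamination, potency)
        print(f"contamination {contamination:.0%}, potency {potency}: "
              f"control rate {p_c:.3f}, total n {2 * n_per_arm(p_c, 0.41)}")
```

Under these assumptions, 20% contamination at full potency and 40% contamination at half potency dilute the effect equally, which is the sense in which contamination "could be much greater" if the contaminating exposure is weaker.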
The probable scale of contamination will clearly differ with the intervention, and I am not aware of any published estimates in trials that might have used a cluster design to avoid contamination. However, a review of Zelen's method in cancer trials that did not use cluster randomisation showed that the mean proportion of patients who crossed from one treatment to another, which is analogous to contamination, was 18% (range 10%-36%).12 The tables show the impact that different intracluster correlation coefficients and contamination rates have on estimates of sample size or contamination. Around 30% contamination can be sustained before the sample size has to be doubled to take into account the reduced effect size from such contamination. However, use of cluster randomisation rapidly leads to a doubling of the sample size. Even if the intracluster coefficient of correlation is very low, the clusters are small, and there are relatively high levels of contamination, individual randomisation can still result in a smaller sample size.
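The trade-off described here can be sketched numerically. The code below assumes that contamination of a fraction c of controls shrinks the effect by a factor of (1 − c) with unchanged variance, so the sample size is inflated by 1/(1 − c)², and that clustering inflates it by the usual design effect 1 + (m − 1)ρ. The function names and the cluster sizes and correlations in the loop are hypothetical and are not taken from the paper's tables.

```python
from math import sqrt


def contamination_inflation(contamination: float) -> float:
    """Sample size multiplier under individual randomisation with contamination."""
    if not 0.0 <= contamination < 1.0:
        raise ValueError("contamination must be in [0, 1)")
    return 1.0 / (1.0 - contamination) ** 2


def break_even_contamination(design_effect: float) -> float:
    """Contamination at which individual and cluster designs need the same n."""
    if design_effect < 1.0:
        raise ValueError("design_effect must be at least 1")
    return 1.0 - 1.0 / sqrt(design_effect)


if __name__ == "__main__":
    print(f"inflation at 30% contamination: {contamination_inflation(0.30):.2f}")
    for cluster_size in (5, 20):
        for icc in (0.01, 0.05):
            # Design effect for equal-sized clusters (hypothetical inputs).
            de = 1 + (cluster_size - 1) * icc
            print(f"m={cluster_size}, icc={icc}: design effect {de:.2f}, "
                  f"break-even contamination {break_even_contamination(de):.1%}")
```

Under these simplifying assumptions, a design effect of 2 corresponds to a break-even contamination of roughly 29%, in line with the statement that about 30% contamination can be sustained before the sample size has to be doubled.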
Although some contamination can be dealt with by increasing the sample size, a trial might still show a statistically significant effect but one that is too small to be of clinical relevance. Trials are rarely powered on the basis of a minimum clinical significance. Rather, they are powered on the likely effect that other trials have detected or on the basis of observational work or on logistical factors. Thus, the reduced effect size caused by contamination could be used to calculate sample sizes.
Often a cluster design is the correct design for a trial. For instance, a randomised trial of training teachers to promote smoking cessation demanded a cluster design as it would not have been practical to randomise children to different teachers.13 In these instances one must be aware of the potential problems of selection bias creeping into the study. However, there is uncertainty with many trials. Thus, recent cluster trials which probably could have used an individually randomised design include a vitamin A supplementation study,14 an accident prevention trial,11 and a lifestyle behaviour changes study.8 Because of the problems outlined in this paper, trialists ought to consider very carefully the cluster approach to trial design. Substantial contamination can be tolerated within the usual individual randomised trial before a cluster design is better in terms of total sample size.
I thank Hazel Inskip, the BMJ's referee, for helpful comments on the original manuscript.
Competing interests: None declared.
A statistical appendix appears on the BMJ's website.