We have described the calculation of sample size when subjects are randomised in groups or clusters in terms of two variances—the variance of observations taken from individuals in the same cluster, *s _{w}*

This sum of two components of variance is analogous to what happens with measurement error, where we have the variance within the subject, also denoted by *s _{w}*

For cholesterol concentration in the Medical Research Council thrombosis prevention trial the two components of variance were *s _{w}*

The design effect is the ratio of the total number of subjects required using cluster randomisation to the number required using individual randomisation.^{1} It can be presented neatly in terms of the intracluster correlation, *r _{I}*, and the number of subjects in a single cluster, *m*: *D* = 1 + (*m* − 1)*r _{I}*. If there is only one observation per cluster, *m* = 1 and the design effect is 1, so the two designs require the same number of subjects.
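As a rough illustration of the design effect formula, the following sketch inflates an individually randomised sample size to its cluster randomised equivalent. The function names and the numbers in the usage note are hypothetical and are not taken from the trial discussed here; the sketch also assumes that all clusters have the same size *m*.

```python
import math


def design_effect(m, icc):
    """Design effect D = 1 + (m - 1) * r_I for clusters of equal size m."""
    if m < 1:
        raise ValueError("cluster size m must be at least 1")
    return 1.0 + (m - 1) * icc


def cluster_trial_size(n_individual, m, icc):
    """Return (clusters, subjects) needed under cluster randomisation.

    n_individual is the total number of subjects that an individually
    randomised trial would need (an input assumed to be known already).
    """
    total = n_individual * design_effect(m, icc)
    clusters = math.ceil(total / m)  # round up to whole clusters
    return clusters, clusters * m
```

For example, with a hypothetical individually randomised requirement of 200 subjects, clusters of 50, and an intracluster correlation of 0.02, `design_effect(50, 0.02)` is about 1.98 and `cluster_trial_size(200, 50, 0.02)` returns 8 clusters, or 400 subjects.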

The main difficulty in calculating sample size for cluster randomised studies is obtaining an estimate of the between cluster variation or intracluster correlation. Estimates of variation between individuals can often be obtained from the literature, but even studies that use the cluster as the unit of analysis may not publish their results in such a way that the between cluster variation can be estimated. Recognising this problem, Donner recommended that authors should publish the cluster specific event rates observed in their trial, so that other workers can use this information to plan further studies.
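When cluster level data are available, one common way to estimate the two components of variance is a one way analysis of variance. The sketch below is only an illustration under stated assumptions: it expects the individual observations for each cluster (for event data, values of 0 and 1), requires clusters of equal size, and sets a negative estimate of the between cluster variance to zero. The function name `anova_icc` is hypothetical, and the intracluster correlation is taken to be the between cluster variance as a proportion of the total variance.

```python
from statistics import mean


def anova_icc(clusters):
    """Estimate the intracluster correlation from equal-sized clusters.

    clusters is a list of lists, one inner list of observations per cluster.
    """
    k = len(clusters)
    m = len(clusters[0]) if k else 0
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")

    grand_mean = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]

    # Between cluster and within cluster mean squares.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (x - cm) ** 2 for c, cm in zip(clusters, cluster_means) for x in c
    ) / (k * (m - 1))

    s_w2 = msw                          # within cluster variance
    s_c2 = max((msb - msw) / m, 0.0)    # between cluster variance, floored at 0
    total = s_c2 + s_w2
    if total == 0:
        raise ValueError("no variation in the data")
    return s_c2 / total
```

With few clusters the estimate will be imprecise, so it is best treated as a planning value and not as a firm result.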

In some trials, where the intervention is directed at the individual subjects and the number of subjects per cluster is small, we may judge that the design effect can be ignored. On the other hand, where the number of subjects per cluster is large, an estimate of the variability between clusters will be important.
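The effect of cluster size can be seen by evaluating *D* = 1 + (*m* − 1)*r _{I}* over a range of cluster sizes. The sketch below uses a hypothetical intracluster correlation of 0.01, chosen only for illustration, and hypothetical function names.

```python
def design_effect(m, icc):
    """Design effect D = 1 + (m - 1) * r_I for clusters of equal size m."""
    return 1.0 + (m - 1) * icc


def design_effect_table(cluster_sizes, icc):
    """Return (m, D) pairs for each cluster size at a fixed correlation."""
    return [(m, design_effect(m, icc)) for m in cluster_sizes]


if __name__ == "__main__":
    icc = 0.01  # hypothetical value, not an estimate from any trial
    for m, d in design_effect_table([2, 5, 50, 500], icc):
        print(f"m = {m:3d}  design effect = {d:.2f}")
```

Under this assumed correlation the design effect is 1.01 for clusters of 2 and 1.04 for clusters of 5, but 1.49 for clusters of 50 and 5.99 for clusters of 500, so even a small intracluster correlation matters when clusters are large.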

