A total of 929 research protocols were identified by the initial search. Of these, 446 met the inclusion criteria (see supplementary figure 5). Table 2 lists the main characteristics of the 446 research protocols (also see supplementary table 5). The most common therapeutic areas were oncology (94; 21%) and endocrinology (49; 11%). Most studies were sponsored by industry (314; 70%), were in phase III (251; 56%), had a parallel group design (319; 72%), and had superiority of the test over control medicinal product as the primary objective (375; 84%). Six (1%) protocols included sample size re-estimation in the study design.
Table 2 Main characteristics of the 446 research protocols
Reporting of sample size components
The individual core components of the sample size were generally reported in the 446 protocols, with the exception of withdrawals (269; 60%, fig 1) (also see supplementary table 6). Of the 446 protocols, 240 (54%) reported all the core components; withdrawal rate was the only element missing in 143 out of 206 (69%) protocols that did not report all core components.
Fig 1 Reporting of core sample size components
When we restricted attention to protocols that reported all core components plus the additional information needed to accurately recalculate the sample size, such as adjustments for multiple testing (complete reporting), the number fell to 188 protocols (42%).
Reporting design assumptions
Less than half of the 446 protocols (190; 43%) reported the data on which the treatment difference (or margin) was based. Of the 190 protocols that did report the basis of the treatment difference, 92 (48%) cited previous studies with the product or a product in the same class and 38 (20%) cited a literature search (fig 2 and supplementary table 7). In only four (2%) protocols was the estimated treatment difference based on a meta-analysis. Reporting the basis for the treatment difference was lowest in studies on oncology (28/94; 30%) and cardiovascular disease (12/36; 33%) and highest in those on pain and anaesthesia (16/27; 59%) (see supplementary table 8).
Fig 2 Reporting the design assumptions
Overall, 55 out of 446 (12%) protocols reported both the basis of the treatment effect and its clinical importance, 135 (30%) protocols reported the basis only, and 256 (57%) reported neither. Limited information on the nature of the data underpinning the treatment effect was usually given, and just 13 (3%) protocols gave a reasoned explanation why the value chosen was plausible for the planned study.
The same pattern was observed with population variability or survival, with less than half (213/446; 48%) of the protocols reporting the basis of the variable used in the calculation (fig 2 and supplementary table 9). Previous studies, a literature search, or both, were again most commonly cited. The variability or survival estimate was based on a meta-analysis in only two of the 213 (1%) protocols. Again, limited information was usually given, and just 17 (4%) protocols explained the plausibility of the value chosen.
Only 11 out of the 446 (3%) protocols reported analyses investigating the sensitivity of the sample size to deviations from the assumptions used in the calculation.
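The sensitivity analyses referred to here ask how the calculated sample size reacts when a design assumption deviates from its planned value. As a minimal sketch, assuming a two-sample comparison of means under the normal approximation (the effect size of 5, standard deviation of 10, and the deviation values are illustrative, not drawn from the protocols):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2)

# Sensitivity: how much does the required size grow if the assumed
# standard deviation turns out to be optimistic?
for sd in (10, 11, 12):
    print(sd, n_per_group(delta=5, sd=sd))
```

Tabulating the output over a plausible range of assumptions is one simple way a protocol could demonstrate the robustness (or fragility) of its planned sample size.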
Reporting of strategies to control type I (false positive) and type II (false negative) error
Adjustments for multiple comparisons (81/144; 56%) or interim analyses (56/95; 59%) were reported in just over half of the research protocols with these design features (see supplementary table 10). The potential for increasing the type II error was not considered in any study with multiple comparisons. If all co-primary variables must be significant to declare success then the type II error rate can be inflated, resulting in reduction in the overall study power.1
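The power loss with co-primary endpoints is easy to quantify in the simplest case. Assuming the endpoints are statistically independent (correlated endpoints would attenuate the loss), the overall power is the product of the per-endpoint powers:

```python
# If success requires ALL co-primary endpoints to be significant, and each
# is independently powered at 90%, overall power shrinks multiplicatively.
per_endpoint_power = 0.90
for k in (1, 2, 3):
    overall = per_endpoint_power ** k
    print(k, round(overall, 3))   # e.g. two endpoints -> 0.81
```

With two independent co-primary endpoints each powered at 90%, the study's overall power is only 81%, which is the inflation of the type II error rate the text describes.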
Recalculation of the original sample size determination
When all protocols were considered, applying the rules for imputing missing information, 262 out of 446 (59%) sample size determinations could be reproduced, with 51 (11%) under-estimated and 103 (23%) over-estimated. Thirty (7%) of the original sample size calculations could not be recalculated (see supplementary table 11). Figure 3 shows a box plot of the relative differences between the reported and recalculated sample sizes.
Fig 3 Difference between reported and calculated sample size. *Ratio of number of evaluable patients or events reported in protocol to that calculated. †All calculations (n=416) with missing data imputed. Observations below 2.5th (0.61) or above …
A total of 134 of the 188 (71%) sample size calculations from protocols with complete reporting could be reproduced, with 20 (11%) under-estimated and 34 (18%) over-estimated. The reproducibility of the sample size increased with more comprehensive reporting, primarily of withdrawal rates and adjustments for multiple testing. None the less, both analyses showed a tendency for over-estimation, and in total only 134 of the 446 (30%) original sample size calculations could be accurately reproduced.
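The recalculation exercise above can be sketched in miniature: take the core components reported in a protocol, recompute the sample size, inflate it for the expected withdrawal rate, and compare the result with the reported figure. All numerical values below are hypothetical, and the normal-approximation formula is one common convention, not necessarily the one each protocol used:

```python
import math
from statistics import NormalDist

def recalculated_total(delta, sd, alpha, power, withdrawal_rate):
    """Recompute a two-arm total sample size from the core components
    (normal approximation), then inflate for expected withdrawals."""
    z = NormalDist().inv_cdf
    n_group = math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2)
    return math.ceil(2 * n_group / (1 - withdrawal_rate))

# Hypothetical protocol: difference 5, SD 10, alpha 0.05, power 90%,
# 10% withdrawals; reported total of 200 is also made up.
recalc = recalculated_total(5, 10, 0.05, 0.90, 0.10)
reported = 200
ratio = reported / recalc   # ratio > 1 suggests over-estimation
```

A ratio of reported to recalculated size above 1 corresponds to over-estimation in Figure 3; a ratio below 1 corresponds to under-estimation.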
Supplementary figure 6 shows a Bland-Altman plot comparing reported and calculated sample sizes.
Commercial versus non-commercial sponsors
The reporting of the core components of the sample size determination did not differ noticeably between studies with commercial and non-commercial sponsors (fig 4 and supplementary table 12). Studies with non-commercial sponsors were more likely than those with commercial sponsors to report the basis for design assumptions (relative risk 1.69, 95% confidence interval 1.38 to 2.08 for treatment difference and 1.29, 1.07 to 1.56 for variance and survival). Conversely, studies with non-commercial sponsors were less likely than those with commercial sponsors to report adjustments for multiple comparisons (0.26, 0.13 to 0.50) and interim analyses (0.54, 0.31 to 0.93) and to provide complete reporting (0.60, 0.45 to 0.81); the sample size calculation from protocols of studies with non-commercial sponsors was also less likely to be reproduced (0.72, 0.59 to 0.88).
Fig 4 Reporting by commercial status