Inadequate sample size and power in randomized trials can result in misleading findings. This study demonstrates the effect of sample size in a large clinical trial by evaluating the results of the SPRINT (Study to Prospectively Evaluate Reamed Intramedullary Nails in Patients with Tibial Fractures) trial as it progressed.
The SPRINT trial evaluated reamed versus unreamed nailing of the tibia in 1226 patients, as well as in open and closed fracture subgroups (N=400 and N=826, respectively). We analyzed re-operation rates and the relative risk of re-operation between treatment groups at enrollments of 50 and 100 patients, and then in increments of 100 patients up to the final sample size. Results at each enrollment were compared with the final SPRINT findings.
In the final analysis, there was a statistically significant decrease in the risk of re-operation with reamed nails for closed fractures (relative risk reduction 35%). Results for the first 35 patients with closed fractures suggested that reamed nails increased the risk of re-operation in this subgroup by 165%. Only after 543 patients with closed fractures had been enrolled did the results reflect the final advantage for reamed nails in this subgroup. Similarly, the final trend towards a 23% increased risk of re-operation with reamed nails for open fractures was not seen until 62 patients with open fractures had been enrolled.
Our findings highlight the risk of conducting a trial with insufficient sample size and power. Such studies are not only at risk of missing true effects, but also of giving misleading results.
Sample size is a key determinant of a study's ability to detect differences between treatments. Trials in emergency medicine, cardiovascular research, nursing, internal medicine, general practice, rehabilitation, and hand surgery have all been shown to use sample sizes too small to detect differences that may be clinically important4. Orthopaedic research has been no exception. Trials in orthopaedic surgery are typically single-centre initiatives that are severely limited by small sample sizes and thus lack adequate power to inform clinical decision making2. Lochner et al1 conducted a systematic review of 117 articles in the orthopaedic trauma literature to examine the rate of type II (beta) errors in clinical trials with negative outcomes. The majority of studies (95%) did not meet the accepted standard for beta error rates (β ≤0.20, study power ≥80%) with regard to the primary outcome1.
Failure to ensure adequate sample size in orthopaedic randomized trials leads to the publication of findings from small, inadequately powered trials. These studies carry an unacceptably high risk of false-negative results1. Further, their findings may be inconsistent with those that would have been obtained with a larger sample size and appropriate power. The situation is further complicated when the treatment effect assumed in the a priori sample size calculation is larger than the effect actually observed, so that the planned power is insufficient to detect it. These concerns are particularly relevant in orthopaedic surgery, where trials are typically single-centre initiatives2, and the findings of small trials should therefore be interpreted with caution. The purpose of this study is to demonstrate the effect of sample size in a large clinical trial, using the actual data from a recently completed trial comparing reamed versus unreamed intramedullary nailing2,3.
This investigation was part of a multi-centre endeavor called the Study to Prospectively Evaluate Reamed Intramedullary Nails in Patients with Tibial Fractures (SPRINT)2,3. The standardized protocol was approved by the human subjects committee at each clinical centre. SPRINT was a randomized controlled trial that evaluated the effect of reamed versus unreamed nailing of the tibia in 1226 patients across 29 clinical centres in the United States, Canada, and the Netherlands2,3.
The current analysis uses the relative risk of re-operation, the SPRINT primary outcome, comparing reamed with unreamed intramedullary nailing. To demonstrate the effect of sample size, we analyzed the trial data at successive enrollments: the first 50 patients, the first 100 patients, and then increments of 100 patients up to the final sample size of 1226 patients (N=400 open fracture patients and N=826 closed fracture patients). Increments of 100 were chosen for ease of reporting. At each enrollment we calculated the relative risk of re-operation between treatment groups, with its 95% confidence interval, for all fractures and for the open and closed fracture subgroups. All analyses were two-tailed. We also calculated power at each enrollment4. Power was calculated assuming a control event risk of 13% for the total group5 and of 10% and 20% for the closed and open subgroups, respectively, and a relative risk reduction (RRR) of 35%, as per the parameters of the two-sided sample size calculation for the SPRINT study5.
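The relative risks and 95% confidence intervals described above can be computed from the event counts in each arm. The paper does not state which interval method was used; the sketch below assumes the common log-scale (Katz) interval, and the helper names `relative_risk_ci` and `excludes_unity` are hypothetical, not part of the SPRINT analysis code.

```python
import math


def relative_risk_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A versus group B with a log-scale 95% CI.

    Uses the Katz log method; requires at least one event in each group
    (a continuity correction would be needed otherwise).
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("the log-scale CI needs at least one event per group")
    rr = (events_a / n_a) / (events_b / n_b)
    # Large-sample standard error of ln(RR)
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


def excludes_unity(lower, upper):
    """True when the whole CI lies on one side of RR = 1."""
    return upper < 1 or lower > 1


# Hypothetical counts: 8/25 re-operations with reamed nails, 4/25 unreamed
rr, lower, upper = relative_risk_ci(8, 25, 4, 25)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}, "
      f"excludes 1: {excludes_unity(lower, upper)}")
```

With these made-up counts the relative risk is 2.0 but the interval runs from roughly 0.69 to 5.80 and includes unity: a large apparent effect that is fully compatible with chance, much like the early SPRINT estimates.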
In the final analysis of 1226 patients, reamed intramedullary nails showed a trend towards a reduced risk of re-operation in patients with tibial shaft fractures (relative risk = 0.89, 95% CI 0.70–1.14). In contrast, after the first 50 patients had been enrolled there was a substantial trend towards an 85% increased risk with reamed nails (relative risk = 1.85, CI 0.64–5.35) (Figure 1). The final overall trend of a reduced risk of re-operation with reamed nails was not seen until enrollment reached 700 patients (Figure 1). Additionally, when only the first 50 patients were evaluated, the overall event rate was 34% higher than the final figure (24.0% versus 17.9%) (Figure 1). Even at the final enrollment, the CI still included unity. Figure 1 also shows the power of the study at each enrollment. As expected, power increases progressively with sample size, slightly exceeding its nominal value of 80% for all fractures combined when the final sample size is reached. Note that the final number of 1226 patients was slightly larger than the planned sample size of 1200. As stated above, the final estimated relative risk of 0.89 is equivalent to an 11% RRR, which is smaller than the corresponding design parameter of 35%. The study was therefore not powered to detect this smaller treatment effect, and the corresponding confidence interval includes the null value of 1, indicating no statistically significant difference between the two treatments.
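A power curve like the one in Figure 1 can be approximated with the textbook normal-approximation formula for comparing two proportions, using the design parameters given in the Methods (assumed control event risk and a 35% RRR). This is a sketch under the assumptions of equal allocation and no continuity correction; the paper does not specify its power formula, so these values need not match those plotted in the figures. The function name `power_two_proportions` is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(control_risk, rrr, n_total, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Assumes equal allocation (n_total split evenly between arms) and the
    normal approximation without continuity correction.
    """
    p1 = control_risk
    p2 = control_risk * (1 - rrr)  # experimental-arm risk implied by the RRR
    n = n_total / 2                # patients per arm
    p_bar = (p1 + p2) / 2          # pooled risk under the null hypothesis
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n)
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    z = (abs(p1 - p2) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)


# Power at each cumulative enrollment, all fractures combined
# (assumed 13% control risk and 35% RRR, as in the design parameters)
enrollments = [50] + list(range(100, 1300, 100)) + [1226]
for n_total in enrollments:
    print(n_total, round(power_two_proportions(0.13, 0.35, n_total), 3))
```

Because both standard errors shrink in proportion to 1/√n, power rises steadily with enrollment. For the subgroups, the same function can be called with the assumed control risks of 10% (closed) or 20% (open) and the subgroup enrollments.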
In patients with closed tibial shaft fractures, reamed intramedullary nails led to a statistically significant reduction in the risk of re-operation in the final SPRINT analysis (relative risk = 0.65, CI 0.46–0.93). Again, the results for the first 35 patients with closed fractures revealed a trend towards a substantially increased risk with reamed nails (relative risk = 2.65, CI 0.59–11.86) (Figure 2). It was not until enrollment reached 543 patients with closed fractures that the results reflected the final finding of an advantage for reamed nails (Figure 2); it was at this point that the confidence interval no longer included unity. At the 50-patient enrollment, the event rate for closed fractures was 46% higher than in the final analysis (20% versus 13.7%) (Figure 2). Further, the 95% confidence interval narrowed considerably from the initial enrollment to the final sample size (from 0.59–11.86 to 0.46–0.93). The power for the closed fracture subgroup is lower than for the whole sample, reflecting the smaller number of patients in this subgroup at any given total sample size. On completion of the study, the observed RRR in closed fractures was 35%, equal to our planned RRR, and was statistically significant. Note also that the result achieved statistical significance (with the upper confidence limit for the relative risk below 1) when 65.7% of the final closed fracture sample size (543 of 826 patients) had been attained; this occurred because of statistical fluctuations, with slightly higher values of the event risk and the RRR compared with their final values.
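The early reversal in the closed fracture subgroup is the kind of result chance alone can produce when few events have accrued. The Monte Carlo sketch below is not a re-analysis of SPRINT data: it simulates hypothetical two-arm trials in which the experimental treatment truly reduces risk, using the closed-subgroup design assumptions from the Methods (10% control risk, 35% RRR), and counts how often the point estimate fails to favour the better treatment. The function `wrong_direction_rate` and its defaults are illustrative assumptions.

```python
import random


def wrong_direction_rate(n_total, control_risk=0.10, rrr=0.35,
                         replicates=500, seed=1):
    """Share of simulated trials whose estimate fails to favour the better arm.

    Each hypothetical trial randomizes n_total patients equally to two arms;
    the experimental arm has a truly lower risk, control_risk * (1 - rrr).
    A trial "points the wrong way" when the experimental arm has as many
    events as, or more events than, the control arm.
    """
    rng = random.Random(seed)
    n = n_total // 2
    p_exp = control_risk * (1 - rrr)
    wrong = 0
    for _ in range(replicates):
        control_events = sum(rng.random() < control_risk for _ in range(n))
        exp_events = sum(rng.random() < p_exp for _ in range(n))
        if exp_events >= control_events:  # equal arm sizes, so counts compare risks
            wrong += 1
    return wrong / replicates


# Cumulative closed-fracture enrollments, including the 35, 543 and 826 milestones
for n_total in (35, 100, 300, 543, 826):
    print(n_total, wrong_direction_rate(n_total))
```

At small enrollments a sizeable share of the simulated trials show no advantage, or an apparent disadvantage, for the truly better treatment, and that share falls steadily as enrollment grows.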
Finally, within the open fracture subgroup, the final analysis showed a trend towards an increased risk of re-operation with reamed intramedullary nails (relative risk = 1.23, CI 0.88–1.71). This trend favouring unreamed nails was not seen until 62 patients with open fractures had been enrolled (Figure 3). At the 50-patient enrollment, the open fracture event rate was 28% higher than in the final analysis (33.3% versus 26.0%) (Figure 3). The open fracture subgroup is approximately half the size of the closed fracture subgroup, but its event risk is higher. These two factors have approximately offsetting effects on power, so the power is very similar in the two subgroups. Furthermore, the 23% relative risk increase for reamed nails equates to a 19% RRR for unreamed nails, which is again smaller in magnitude than the design value of a 35% RRR.
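The two conversions in this paragraph follow from simple arithmetic. Inverting a relative risk swaps the reference group, and the relative excess in an event rate is the difference divided by the final rate:

```latex
\begin{aligned}
\mathrm{RR}_{\text{unreamed vs. reamed}} &= \frac{1}{\mathrm{RR}_{\text{reamed vs. unreamed}}} = \frac{1}{1.23} \approx 0.813,
  \qquad \mathrm{RRR} = 1 - 0.813 \approx 0.19 \\
\frac{33.3\% - 26.0\%}{26.0\%} &= \frac{7.3}{26.0} \approx 0.28
\end{aligned}
```

The same calculation gives the 34% excess reported for all fractures (6.1/17.9 ≈ 0.34) and the 46% excess for closed fractures (6.3/13.7 ≈ 0.46).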
Evaluating the SPRINT trial at the smaller enrollments that existed at earlier points in time shows that a smaller sample size would have produced misleading results. The substantially increased risk with reamed nailing initially seen after 35 patients had been enrolled in the closed fracture subgroup demonstrates this. The significant final effect of reamed nails was reflected only after enrollment reached 65.7% of the final closed fracture sample size. The best explanation for this early apparent benefit of unreamed nails is the play of chance7. For example, consider a comparison of a new treatment with a standard treatment in which 6 of 10 people improved with the new treatment and 4 of 10 improved with the standard treatment. The rates of improvement (60% and 40%) are subject to substantial sampling variability, and the difference between them is far from statistically significant. It would therefore be wrong to conclude confidently that the new treatment was better than the standard treatment, because the results might simply reflect the play of chance. Collins (2005) likens this to a target with the truth at the bull’s eye10: large trials produce results that cluster closer to the “truth”, whereas the results of smaller trials are likely to be more scattered, so small trials may not reveal where the true centre lies10. Because it is necessary to know whether valid conclusions can be drawn from the sample being assessed6, sample size and power need to be adequately addressed before the start of any randomized controlled trial.
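The claim that 6 of 10 versus 4 of 10 is far from statistically significant can be checked with Fisher's exact test, a standard choice for such small counts (the paper does not name a test for this example). The sketch below implements the two-sided version directly from hypergeometric probabilities; `fisher_exact_two_sided` is a hypothetical name, and in practice `scipy.stats.fisher_exact` performs the same test.

```python
from math import comb


def fisher_exact_two_sided(a, n1, c, n2):
    """Two-sided Fisher exact p-value comparing a/n1 with c/n2 successes.

    With all margins fixed, the number of successes in group 1 follows a
    hypergeometric distribution; the p-value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    successes = a + c
    denom = comb(n1 + n2, successes)

    def prob(k):
        # Probability that group 1 receives k of the successes
        return comb(n1, k) * comb(n2, successes - k) / denom

    observed = prob(a)
    k_values = range(max(0, successes - n2), min(n1, successes) + 1)
    # Small relative tolerance so tables tied with the observed one are counted
    return sum(p for p in map(prob, k_values) if p <= observed * (1 + 1e-9))


# 6 of 10 improved with the new treatment, 4 of 10 with the standard one
print(round(fisher_exact_two_sided(6, 10, 4, 10), 3))
```

The p-value is about 0.66, so a 20-percentage-point difference in improvement rates is entirely compatible with chance at this sample size.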
The overall results follow the same trend as the closed fracture subgroup, but without statistical significance. In this instance, the power was insufficient to detect the smaller treatment effect. Here we highlight another challenge associated with small studies that present large but non-significant effects. Such studies are all too often disseminated as important ‘trends’, with the implication that a larger sample size would have confirmed the findings. In this case, knowledge of the final results shows that the large, non-significant early effects were not simply genuine effects awaiting confirmation by a larger sample: as enrollment continued, the early trend reversed direction.
This also highlights the need to report confidence intervals with study results. As sample size increases, the standard error of an estimate decreases, and its confidence interval therefore narrows6. Accordingly, an inadequate sample size will be revealed by wide confidence intervals6. An inadequate sample size also increases the probability of a type II error6, that is, concluding that there is no difference between treatments when in fact there is one.
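The narrowing of the confidence interval with enrollment follows from the large-sample variance of the log relative risk, which is inversely proportional to the number of patients per arm. The sketch below assumes expected (rather than observed) event counts and uses a hypothetical helper, `log_rr_ci_width`; it shows that quadrupling the sample size halves the interval width on the log scale.

```python
import math


def log_rr_ci_width(control_risk, rr, n_per_arm, z=1.96):
    """Width of the 95% CI for ln(RR), using expected event counts.

    Var[ln(RR)] ~= (1 - p1) / (n * p1) + (1 - p2) / (n * p2), where p1 is the
    control risk, p2 = p1 * rr, and n is the number of patients per arm.
    """
    p1 = control_risk
    p2 = control_risk * rr
    variance = (1 - p1) / (n_per_arm * p1) + (1 - p2) / (n_per_arm * p2)
    return 2 * z * math.sqrt(variance)


# Illustrative closed-fracture design values: 10% control risk, RR = 0.65
for n in (25, 100, 400):
    print(n, round(log_rr_ci_width(0.10, 0.65, n), 3))
```

Because the width scales as 1/√n, early enrollments with few events necessarily give very wide intervals, such as the 0.59–11.86 interval seen after 35 closed fractures.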
Further, upon observing an insufficient level of power, some investigators might wish to conduct post hoc power analyses before discounting non-significant results11. While this remains controversial in the literature, we believe it would be inappropriate and have therefore chosen not to include such an analysis. Once a study has been completed, the result is either statistically significant or it is not; it is then meaningless to consider the probability of an event that has already either occurred or not12,13. At best, we believe the results of a completed study could be used to calculate the power of a future study on the same topic: the observed relative risk reduction and event rates might be used to update the design considerations, including the power, of a new study, but they should not be used to retrospectively calculate the power of the existing study.
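Used prospectively, observed event rates and effect sizes feed into the sample size calculation for the next study. A minimal sketch of the standard per-arm sample size formula for two proportions (normal approximation, equal allocation, no continuity correction) follows; the function name `n_per_arm` and the example inputs are assumptions for illustration, not SPRINT's actual calculation.

```python
import math
from statistics import NormalDist


def n_per_arm(control_risk, rrr, alpha=0.05, power=0.80):
    """Patients needed per arm for a two-sided comparison of two proportions.

    Normal approximation with equal allocation and no continuity correction.
    """
    p1 = control_risk
    p2 = control_risk * (1 - rrr)  # experimental-arm risk implied by the RRR
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Design RRR of 35% versus a smaller 11% RRR, assuming a 13% control risk
for rrr in (0.35, 0.11):
    print(rrr, n_per_arm(0.13, rrr))
```

Because the required sample size scales with the inverse square of the risk difference, planning around an 11% RRR rather than a 35% RRR increases the required enrollment more than tenfold under these assumptions.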
Randomized trials often have a strong influence on changes in practice, as they are considered to provide the highest quality of evidence, based on their methodological strengths of randomization and allocation concealment8. Because their findings carry this weight, underpowered randomized trials can lead to conclusions that mistakenly favour inferior treatments1.
Lastly, failing to conclusively resolve an important research question represents an ethical failure towards participating patients if the chance of drawing valid conclusions is already diminished at the start of a study because of an inadequately planned sample size6. Underpowered trials waste valuable patient and investigator time and resources.
Of course, the results of this study should be interpreted in light of its limitations. Most notably, we examined only the data from the SPRINT trial, so we cannot comment on whether the findings are generalizable to other trials.
Our findings suggest that a smaller sample size for the SPRINT trial would have led to misleading estimates of the relative risk of re-operation between reamed and unreamed nails in the management of closed tibial shaft fractures. These data highlight the risk of conducting a trial with inadequate sample size and power. Further, this analysis demonstrates that studies with small sample sizes run the risk not only of concluding that there is no treatment benefit when in fact one exists, but also of producing incorrect results.
This study was funded by research grants from the Canadian Institutes of Health Research # MCT-38140 [PI: G. Guyatt], National Institutes of Health NIAMS-072; R01 AR48529 [PI: M. Swiontkowski], Orthopaedic Research and Education Foundation (American Academy of Orthopaedic Surgeons) [PI: P. Tornetta III], and Orthopaedic Trauma Association [PI: M. Bhandari]. Smaller site-specific grants were also obtained from Hamilton Health Sciences Research Grant [PI: M. Bhandari] and Zimmer [PI: M. Bhandari].
Dr. Bhandari is funded, in part, by a Canada Research Chair in Musculoskeletal Trauma, McMaster University.
Role of Sponsors/Funders
The funding sources had no role in design or conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.
The following persons participated in the SPRINT Study:
Writing Committee: Mohit Bhandari (Co-Chair); Paul Tornetta III (Co-Chair); Shelly-Ann Rampersad, BSc; Sheila Sprague, MSc; Diane Heels-Ansdell, MSc; David W. Sanders; Emil H. Schemitsch; Marc Swiontkowski; and Stephen Walter. Study Trial Co-Principal Investigators: Mohit Bhandari; Gordon Guyatt; Steering Committee: Gordon Guyatt (Chair); Mohit Bhandari; David W. Sanders; Emil H. Schemitsch; Marc Swiontkowski; Paul Tornetta III; Stephen Walter; Central Adjudication Committee: Gordon Guyatt (Chair); Mohit Bhandari; David W. Sanders; Emil H. Schemitsch; Marc Swiontkowski; Paul Tornetta III; Stephen Walter; SPRINT Methods Centre Staff: McMaster University, Hamilton, Ontario: Sheila Sprague; Diane Heels-Ansdell; Lisa Buckingham; Pamela Leece; Helena Viveiros; Tashay Mignott; Natalie Ansell; Natalie Sidorkewicz; University of Minnesota, Minneapolis, Minnesota: Julie Agel; Data Safety and Monitoring Board (DSMB): Claire Bombardier (Chair); Jesse A. Berlin; Michael Bosse; Bruce Browner; Brenda Gillespie; Peter O’Brien; Site Audit Committee: Julie Agel; Sheila Sprague; Rudolf Poolman; Mohit Bhandari.
Investigators: London Health Sciences Centre / University of Western Ontario, London, Ontario: David W. Sanders; Mark D. Macleod; Timothy Carey; Kellie Leitch; Stuart Bailey; Kevin Gurr; Ken Konito; Charlene Bartha; Isolina Low; Leila V. MacBean; Mala Ramu; Susan Reiber; Ruth Strapp; Christina Tieszer; Sunnybrook Health Sciences Centre / University of Toronto, Toronto, Ontario: Hans Kreder; David J. G. Stephen; Terry S. Axelrod; Albert J.M. Yee; Robin R. Richards; Joel Finkelstein; Richard M. Holtby; Hugh Cameron; John Cameron; Wade Gofton; John Murnaghan; Joseph Schatztker; Beverly Bulmer; Lisa Conlan; Hospital du Sacre Coeur de Montreal, Montreal, Quebec: Yves Laflamme; Gregory Berry; Pierre Beaumont; Pierre Ranger; Georges-Henri Laflamme; Alain Jodoin; Eric Renaud; Sylvain Gagnon; Gilles Maurais; Michel Malo; Julio Fernandes; Kim Latendresse; Marie-France Poirier; Gina Daigneault; St. Michael’s Hospital / University of Toronto, Toronto, Ontario: Emil H. Schemitsch; Michael M. McKee; James P. Waddell; Earl R. Bogoch; Timothy R. Daniels; Robert R. McBroom; Robin R. Richards; Milena R. Vicente; Wendy Storey; Lisa M. Wild; Royal Columbian Hospital / University of British Columbia, Vancouver, British Columbia: Robert McCormack; Bertrand Perey; Thomas J. Goetz; Graham Pate; Murray J. Penner; Kostas Panagiotopoulos; Shafique Pirani; Ian G. Dommisse; Richard L. Loomer; Trevor Stone; Karyn Moon; Mauri Zomar; Wake Forest Medical Center / Wake Forest University Health Sciences, Winston-Salem, North Carolina: Lawrence X. Webb; Robert D. Teasdall; John Peter Birkedal; David Franklin Martin; David S. Ruch; Douglas J. Kilgus; David C. Pollock; Mitchel Brion Harris; Ethan Ron Wiesler; William G. Ward; Jeffrey Scott Shilt; Andrew L. Koman; Gary G. Poehling; Brenda Kulp; Boston Medical Center / Boston University School of Medicine, Boston, Massachusetts: Paul Tornetta III; William R. Creevy; Andrew B. Stein; Christopher T. Bono; Thomas A. Einhorn; T. Desmond Brown; Donna Pacicca; John B. Sledge III; Timothy E. Foster; Ilva Voloshin; Jill Bolton; Hope Carlisle; Lisa Shaughnessy; Wake Medical Center, Raleigh, North Carolina: William T. Ombremsky; C. Michael LeCroy; Eric G. Meinberg; Terry M. Messer; William L. Craig III; Douglas R. Dirschl; Robert Caudle; Tim Harris; Kurt Elhert; William Hage; Robert Jones; Luis Piedrahita; Paul O. Schricker; Robin Driver; Jean Godwin; Gloria Hansley; Vanderbilt University Medical Center, Nashville, Tennessee: William Todd Obremskey; Philip James Kregor; Gregory Tennent; Lisa M. Truchan; Marcus Sciadini; Franklin D. Shuler; Robin E. Driver; Mary Alice Nading; Jacky Neiderstadt; Alexander R. Vap; MetroHealth Medical Center, Cleveland, Ohio: Heather A. Vallier; Brendan M. Patterson; John H. Wilber; Roger G. Wilber; John K. Sontich; Timothy Alan Moore; Drew Brady; Daniel R. Cooperman; John A. Davis; Beth Ann Cureton; Hamilton Health Sciences, Hamilton, Ontario: Scott Mandel; R. Douglas Orr; John T.S. Sadler; Tousief Hussain; Krishan Rajaratnam; Bradley Petrisor; Mohit Bhandari; Brian Drew; Drew A. Bednar; Desmond C.H. Kwok; Shirley Pettit; Jill Hancock; Natalie Sidorkewicz; Regions Hospital, Saint Paul, Minnesota: Peter A. Cole; Joel J. Smith; Gregory A. Brown; Thomas A. Lange; John G. Stark; Bruce Levy; Marc F. Swiontkowski; Julie Agel; Mary J. Garaghty; Joshua G. Salzman; Carol A. Schutte; Linda (Toddie) Tastad; Sandy Vang; University of Louisville School of Medicine, Louisville, Kentucky: David Seligson; Craig S. Roberts; Arthur L. Malkani; Laura Sanders; Sharon Allen Gregory; Carmen Dyer; Jessica Heinsen; Langan Smith; Sudhakar Madanagopal; Memorial Hermann Hospital, Houston, Texas: Kevin J. Coupe; Jeffrey J. Tucker; Allen R. Criswell; Rosemary Buckle; Alan Jeffrey Rechter; Dhiren Shaskikant Sheth; Brad Urquart; Thea Trotscher; Erie County Medical Center / University of Buffalo, Buffalo, New York: Mark J. Anders; Joseph M. Kowalski; Marc S. Fineberg; Lawrence B. Bone; Matthew J. Phillips; Bernard Rohrbacher; Philip Stegemann; William M. Mihalko; Cathy Buyea; University of Florida – Jacksonville, Jacksonville, Florida: Stephen J. Augustine; William Thomas Jackson; Gregory Solis; Sunday U. Ero; Daniel N. Segina; Hudson B. Berrey; Samuel G. Agnew; Michael Fitzpatrick; Lakina C. Campbell; Lynn Derting; June McAdams; Academic Medical Center, Amsterdam, The Netherlands: J. Carel Goslings; Kees Jan Ponsen; Jan Luitse; Peter Kloen; Pieter Joosse; Jasper Winkelhagen; Raphaël Duivenvoorden; University of Oklahoma Health Science Center, Oklahoma City, Oklahoma: David C. Teague; Joseph Davey; J. Andy Sullivan; William J. J. Ertl; Timothy A. Puckett; Charles B. Pasque; John F. Tompkins II; Curtis R. Gruel; Paul Kammerlocher; Thomas P. Lehman; William R. Puffinbarger; Kathy L. Carl; University of Alberta / University of Alberta Hospital, Edmonton, Alberta: Donald W. Weber; Nadr M. Jomha; Gordon R. Goplen; Edward Masson; Lauren A. Beaupre; Karen E. Greaves; Lori N. Schaump; Greenville Hospital System, Greenville, South Carolina: Kyle J. Jeray; David R. Goetz; Davd E. Westberry; J. Scott Broderick; Bryan S. Moon; Stephanie L. Tanner; Foothills General Hospital, Calgary, Alberta: James N. Powell; Richard E. Buckley; Leslie Elves; Saint John Regional Hospital, Saint John, New Brunswick: Stephen Connolly; Edward P. Abraham; Donna Eastwood; Trudy Steele; Oregon Health & Sciences University, Portland, Oregon: Thomas Ellis; Alex Herzberg; George A. Brown; Dennis E. Crawford; Robert Hart; James Hayden; Robert M. Orfaly; Theodore Vigland; Maharani Vivekaraj; Gina L. Bundy; San Francisco General Hospital, San Francisco, California: Theodore Miclau III; Amir Matityahu; R. Richard Coughlin; Utku Kandemir; R. Trigg McClellan; Cindy Hsin-Hua Lin; Detroit Receiving Hospital, Detroit, Michigan: David Karges; Kathryn Cramer; J. Tracy Watson; Berton Moed; Barbara Scott; Deaconess Hospital Regional Trauma Center and Orthopaedic Associates, Evansville, Indiana: Dennis J. Beck; Carolyn Orth; Thunder Bay Regional Health Science Centre, Thunder Bay, Ontario: David Puskas; Russell Clark; Jennifer Jones; Jamaica Hospital, Jamaica, New York: Kenneth A. Egol; Nader Paksima; Monet France; Ottawa Hospital – Civic Campus, Ottawa, Ontario: Eugene K. Wai; Garth Johnson; Ross Wilkinson; Adam T. Gruszczynski; Liisa Vexler.