Nearly 10 years after the first Stroke Therapy Academic Industry Roundtable (STAIR) participants established guidelines intended to support the translation of neuroprotective efficacy from bench to bedside (
Altman et al, 2001), there is still no clinically effective neuroprotective drug for stroke. One interpretation of this observation is that the measures outlined in STAIR I have failed to deliver the promised improvements in drug development. However, a dispassionate analysis of data presented over the last 10 years suggests that the ‘STAIR hypothesis’ (that improvements in animal experimental design will lead to improvements in translational efficiency) has yet to be adequately tested. Adhering to standards for the conduct and reporting of experiments that reduce the confounding effects of bias and ensure adequate statistical power, as outlined below, will increase the confidence with which we can assess new data and maximize our chances of developing effective therapies.
The original STAIR proposal was that by paying due attention to experimental bias and to the breadth of physiologic variables known to influence stroke outcome in patients, and by testing therapies in a range of model systems which might more faithfully reproduce the key facets of stroke pathophysiology, we would be able to translate what appeared to be clear evidence of neuroprotective efficacy in animals to the more heterogeneous circumstances of human stroke. Although we believe strongly that failure to adequately consider variables such as age, comorbidity, physiologic status, and timing of drug administration contributes to the disparity between the results of animal models and clinical trials, these variables have been reviewed elsewhere (
Altman et al, 2001;
Bath et al, 1998) and are not the subject of this article.
Analyses of data supporting the efficacy of various neuroprotective strategies (
Begg et al, 1996;
Crossley et al, 2008;
Dirnagl, 2006) have revealed that although many researchers adhere closely to the ethos of these guidelines, as a community we do not. A simple checklist derived from the STAIR guidelines was used to provide an overview of the range of data available for 1,026 candidate therapies (
Crossley et al, 2008); it revealed that only a few came close to meeting the STAIR guidelines. A higher score against this checklist was accompanied by a marked reduction in effect size. This latter trend could be seen clearly even within the data for individual drugs (
Grotta, 1995). Moreover, studies which reported measures to avoid bias such as random allocation to treatment group, masked induction of ischemia, or the masked assessment of outcome (
Macleod et al, 2005,
2008) gave markedly lower estimates of efficacy. Despite this, there has been some evidence of improvement in study quality, and the performance of animal stroke studies is substantially better than that for most other models of neurologic disease (
Dirnagl, 2006). And yet, the majority of investigators still do not report whether they took measures to avoid bias.
Systematic reviews and meta-analyses of data from animal stroke studies suggest that their findings may be substantially distorted by experimental bias. Taken together, publications supporting the efficacy of NXY-059 include randomized data with allocation concealment and masked outcome assessment, but most individual publications do not report these measures. Analyses of those data suggest that at least half of the reported 44% improvement in outcome could be attributed to experimental bias, specifically a failure to randomize the allocation to experimental group, a failure to conceal treatment group allocation from the surgeon, or a failure to blind the assessment of outcome (Macleod et al, in press). Similar observations have been made of the hypothermia literature, where nonrandomized studies and studies without masked outcome assessment appear to give relative overstatements of efficacy of 27% and 19%, respectively (
Macleod et al, 2005). Despite the widely recognized importance of these aspects of study design, analyses conducted by the Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Stroke (CAMARADES) group suggest that only 36% of studies reported random allocation to treatment group, only 11% reported allocation concealment, and only 29% reported the masked assessment of outcome (
Dirnagl, 2006).
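The three measures discussed above (random allocation, allocation concealment, and masked outcome assessment) can be illustrated with a minimal sketch. The function name `allocate_with_concealment`, the code format `S001`, and the group labels are hypothetical choices made for this example; they do not describe a procedure used in any of the studies cited here.

```python
import random


def allocate_with_concealment(animal_ids, groups=("treatment", "control"), seed=None):
    """Randomly allocate animals to groups and hide the allocation behind codes.

    Returns two dictionaries:
      blinded: animal id -> opaque code (what the surgeon and assessor see)
      key:     opaque code -> group (assumed to be held by a third party
               until outcome assessment is complete)
    """
    rng = random.Random(seed)
    n = len(animal_ids)

    # Balanced allocation: cycle through the group labels, then shuffle.
    labels = [groups[i % len(groups)] for i in range(n)]
    rng.shuffle(labels)

    # Opaque codes, shuffled so that code order carries no information.
    codes = [f"S{i:03d}" for i in range(1, n + 1)]
    rng.shuffle(codes)

    blinded, key = {}, {}
    for animal, label, code in zip(animal_ids, labels, codes):
        blinded[animal] = code
        key[code] = label
    return blinded, key


if __name__ == "__main__":
    ids = [f"rat-{i}" for i in range(1, 9)]
    blinded, key = allocate_with_concealment(ids, seed=2024)
    for animal in ids:
        print(animal, blinded[animal])  # group membership is not shown
```

In this sketch the person inducing ischemia and the person scoring outcome would work only from `blinded`; `key` would be opened after all outcomes have been recorded.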
A related issue is the number of animals used in experiments. The probability of detecting a difference of a given size between groups depends on the number of animals in each group, the size of the difference, and the variability in the outcome measure used. However, only 3% of studies identified in systematic reviews reported using a sample size calculation (
Dirnagl, 2006). Importantly, if sample size calculations are based on falsely large estimates of effect size, studies will not be powered to detect real differences between treatment and control groups. Indeed, post hoc analysis suggests that most experimental stroke studies have only a one in three chance of detecting a 20% difference in outcome.
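The dependence of power on group size, effect size, and variability can be made concrete with a short sketch. It uses the standard normal approximation for a two-sided comparison of two group means; the function names and the illustrative standard deviation (40% of the control mean) are assumptions made for this example and are not taken from the analyses cited above.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def animals_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate animals needed per group to detect `difference`
    between two group means with a two-sided test (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)


def approximate_power(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of the same comparison for a given group size."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd * math.sqrt(2 / n_per_group)
    return _STD_NORMAL.cdf(difference / standard_error - z_alpha)


if __name__ == "__main__":
    # Assumed values: a 20% difference in outcome, SD of 40% of the control mean.
    print(animals_per_group(difference=20, sd=40))
    print(round(approximate_power(n_per_group=10, difference=20, sd=40), 2))
```

Under these assumed values the approximation asks for 63 animals per group to reach 80% power, whereas groups of 10 give a power of about 0.2. Because the required group size scales with the inverse square of the assumed difference, a calculation based on an effect size twice as large as the true one yields groups roughly a quarter of the size actually needed.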
These problems are not unique to the preclinical study of stroke. Clinical stroke trials have had problems with inadequate sample size (
O’Collins et al, 2006) and have also failed to report whether they took measures to avoid bias (
Plint et al, 2006). Indeed, Cochrane’s observation that ‘when humans have to make observations there is always the possibility of bias’ (
Sena et al, 2007) was a lynchpin of the CONSORT (Consolidated Standards of Reporting Trials) initiative to improve the reporting, design, conduct, analysis, and interpretation of randomized controlled trials to inform decision making in health care (
Sena et al, 2007;
STAIR, 1999). This initiative led to substantial improvements in the reporting and conduct of clinical trials (
van der Worp et al, 2007).
On the basis of the available evidence, it now seems reasonable to suggest that preclinical testing in animal models of stroke, and indeed in other models of disease, should adopt similar standards to ensure that decision making is based on high-quality, unbiased data (
Dirnagl, 2006;
Weaver et al, 2004). Adoption of such standards would have the added benefit of reducing the wasteful use of financial and animal resources.
In general, studies should only be considered for publication if their ‘Methods’ section includes a description of how they have addressed the standards below, or if authors make a cogent argument for why these standards are not relevant to their work. For these components of a paper, citation of methods described in previous publications is not considered sufficient. These requirements should not preclude publication of important observational, pilot or hypothesis-generating data, but the conclusions of such studies should reflect their preliminary nature.