The initial Stroke Therapy Academic Industry Roundtable (STAIR) recommendations published in 1999 were intended to improve the quality of preclinical studies of purported acute stroke therapies. Although recognized as reasonable, they have been neither closely followed nor rigorously validated. Substantial advances have occurred regarding the appropriate quality and breadth of preclinical testing for candidate acute stroke therapies for better clinical translation. The updated STAIR preclinical recommendations reinforce the previous suggestion that dose response and time windows should be reproducibly defined with both histological and functional outcomes in multiple animal species with appropriate physiological monitoring. The updated recommendations also state that the fundamentals of good scientific inquiry should be followed by eliminating bias in randomization and outcome assessment, defining inclusion/exclusion criteria a priori, performing appropriate power and sample size calculations, and disclosing potential conflicts of interest. After initial evaluations in young, healthy male animals, further studies should be performed in females, aged animals, and animals with comorbid conditions such as hypertension, diabetes, and hypercholesterolemia. Another consideration is the use of clinically relevant biomarkers in animal studies. Although the recommendations cannot be validated until effective therapies based on them emerge from clinical trials, it is hoped that adherence to them might enhance the chances for success.
To help address barriers in the translation of animal studies to human clinical trials,1 the original Stroke Therapy Academic Industry Roundtable (STAIR) publication in 1999 provided recommendations for the preclinical development of acute ischemic stroke therapies. The initial STAIR recommendations are outlined in Table 1. It is now more than a decade since the original publication of these recommendations, proposed both as an experimental framework for the evaluation of candidate therapies and as a starting point for critical assessments of how stroke research in general is conducted. In this article, we consolidate and update the original recommendations based on experiences obtained since that publication, as well as new knowledge, especially as it relates to the outcomes of subsequently completed clinical trials.
STAIR VI met in the aftermath of several failed stroke trials in which the preclinical data partially followed the initial STAIR preclinical recommendations and the initial clinical trial results appeared promising. Although there are undoubtedly numerous potential reasons for the disappointing outcomes, a question that STAIR VI addressed is whether applying externally derived standards to stroke research would improve the likelihood of identifying effective stroke therapies. A related question is whether, as a group, stroke researchers need explicit standards to help ensure that the research is robust and reproducible.
In 2006, O’Collins et al2 performed a systematic review that extracted data for 1026 neuroprotective strategies tested in 8516 experiments relevant to stroke and published in ≈3500 articles between 1957 and 2003. This study used a simple checklist derived from STAIR I to provide an overview of the quality and breadth of data available for individual therapies. Testing of only 5 of the 550 drugs reported to be effective in animal models of focal ischemia fully met this interpretation of the STAIR criteria. An initial assessment of the NXY-059 preclinical assessment program suggested that it closely fulfilled the STAIR criteria, but a subsequent analysis suggested that adherence was not absolute.3 One observation in the O’Collins2 systematic review was a relationship between increasing study quality score (based on adherence to STAIR I criteria) and declining efficacy.2 It appeared that poor quality studies overestimated efficacy, a phenomenon partially attributable to bias from lack of randomization and blinding. Similar striking observations have been made in some, but not all, of a series of detailed meta-analyses of the efficacy of individual drugs. This effect was particularly pronounced for FK506.4 Systematic review and meta-analysis of the data for 13 putative neuroprotectants revealed that the presence or absence of randomization to a treatment group, blinding of drug assignment during stroke induction, and blinding of outcome assessments were among the most powerful determinants of outcome.5 For example, studies of NXY-059 reported that efficacy was significantly lower in randomized studies (20.3% vs 52.8%) and in those that reported allocation concealment between cerebral ischemia induction and outcome assessment (25.1% vs 54.0%).6 In studies of hypothermia, these effects were less marked (37% vs 47% and 39% vs 47%, respectively) but still present.7
Perhaps because of the frustration engendered by the failure to translate apparently efficacious neuroprotectants from animal studies into human stroke therapies, and because of the previous STAIR recommendations, stroke researchers are performing studies of better quality than in the past. However, stroke experimentalists still report random allocation to treatment group in only 36% of studies, allocation concealment in only 11%, and blinded assessment of outcome in only 29%.8
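As one way to picture what random allocation with allocation concealment can look like in practice, the following minimal Python sketch assigns animals to balanced groups and hands the experimenter only opaque labels, while the decoding key is kept separately until outcome assessment is finished. It is an illustration under stated assumptions rather than a procedure taken from the STAIR recommendations: the function name `allocate`, the `rat-NN` identifiers, the `vial-NNNN` labels, and the seed are all hypothetical.

```python
import random


def allocate(animal_ids, groups=("treatment", "control"), seed=None):
    """Randomly assign animals to near-equal groups.

    Returns (blinded, key):
      blinded maps each animal to an opaque label seen by the experimenter;
      key maps each label to its group and is held by a third party
      until outcome assessment is complete.
    """
    rng = random.Random(seed)
    # Build a balanced list of group assignments, then shuffle it.
    assignments = [groups[i % len(groups)] for i in range(len(animal_ids))]
    rng.shuffle(assignments)
    # Draw unique opaque codes so a label reveals nothing about the group.
    codes = rng.sample(range(1000, 10000), len(animal_ids))
    blinded = {animal: f"vial-{code}" for animal, code in zip(animal_ids, codes)}
    key = {f"vial-{code}": group for code, group in zip(codes, assignments)}
    return blinded, key


# Hypothetical cohort of 20 animals; the seed is fixed only for reproducibility.
animals = [f"rat-{i:02d}" for i in range(1, 21)]
blinded, key = allocate(animals, seed=2009)
```

The person inducing ischemia and scoring outcomes would work only from `blinded`; `key` would be consulted after the data are locked. Python's `random` module is adequate for an illustration but is not a substitute for a documented, auditable randomization procedure.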
Sample size calculation illustrates the influence of these issues on experimental results. The probability of detecting a difference between groups is related to the magnitude of the difference, the variability in the outcome measures, and the number of times the population is sampled, in this case the number of animals per group. In systematic reviews of the preclinical stroke literature, only 3% of studies report using a sample size calculation.8 Consider a worst-case scenario in which the majority of authors performed but did not report power calculations and used the minimum calculated sample size, but did not consider that failure to randomly allocate animals to treatment groups can produce falsely large estimates of effect size. In that case, >60% of studies might have been underpowered to detect real differences between treatment and control groups. With lack of allocation concealment, the proportion of studies in which sample size may have been underestimated increases to nearly 90%. If the sample size required to detect a particular effect size is in reality 20 but only 18 animals are used, then potentially all 18 might have been wasted. However, if 22 are used, then the extra 2 have still contributed useful data. Although such scenarios almost certainly do not apply to most of the papers evaluated, without appropriate reporting of sample size calculations it is not known to which studies they do apply.
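The dependence of sample size on effect size can be made concrete with a minimal Python sketch. It uses the standard normal-approximation formula for a two-sided comparison of two group means with equal group sizes, n per group = 2((z_{1-α/2} + z_{power})σ/δ)², where δ is the expected difference and σ the standard deviation. The function name `animals_per_group`, the standard deviation of 30 percentage points, the significance level of 0.05, and the power of 80% are assumptions chosen for illustration and are not taken from the reviewed studies. The two effect sizes are the NXY-059 efficacy estimates quoted earlier (52.8% in nonrandomized vs 20.3% in randomized studies), used here only to show how the calculation scales.

```python
import math
from statistics import NormalDist


def animals_per_group(effect, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means (equal group sizes, common SD)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)  # round up to whole animals


# Assumed SD of the outcome (percentage-point reduction in infarct volume).
SD = 30.0

# Planning on an inflated effect estimate vs. the smaller, less biased one.
n_inflated = animals_per_group(effect=52.8, sd=SD)
n_realistic = animals_per_group(effect=20.3, sd=SD)

print(f"per group, inflated estimate:  {n_inflated}")
print(f"per group, realistic estimate: {n_realistic}")
```

Because the required group size grows with the inverse square of the effect size, a study planned around an overestimated effect ends up with far fewer animals than would be needed to detect the smaller effect. A t-distribution-based calculation would give slightly larger numbers for small groups; the normal approximation is used here only to keep the sketch dependency-free.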
There is precedent for standards in research being well-accepted and applied. Clinical trialists adhere to the Consolidated Standards of Reporting Trials (CONSORT) statement, which led to substantial improvements in the reporting and conduct of clinical trials as a requirement for publication.9,10 On the basis of the available evidence, it would now seem prudent to suggest that preclinical testing for the purpose of determining therapeutic efficacy in animal models of stroke should adopt similar standards for conducting and reporting experiments to ensure high-quality unbiased data.8,11 However, several of the authors acknowledge that we have not complied entirely with such standards in the past. Recently, 2 journals that publish experimental therapeutic stroke studies have acted on this suggestion and will only consider articles for publication if their methodology section includes the criteria outlined in Table 2.12,13 These standards should not preclude publication of observational, pilot, or hypothesis-generating data, but the conclusions of such studies should reflect their preliminary nature. The tendency of journals to reject reports of negative results could be addressed by the establishment of central repositories of preclinical results, as has been done for clinical trial data.
The following sections address specific issues related to animal models in stroke and their influence on the most current STAIR criteria.
There is a range of ischemic stroke models that involve the insertion of sutures or clots, electrocoagulation, photothrombosis, arterial ligation or occlusion by various methods, and injection of endothelin-1. Many of these models can mimic either permanent or transient occlusion. None precisely mimics clinical stroke, in that truly permanent occlusion rarely occurs in humans because of spontaneous recanalization, and the transient models are restricted to relatively short time windows, otherwise risking fatal edema and hemorrhage.14 The use of embolic models has the advantage of allowing studies with concomitant thrombolytic or fibrinolytic agents. These models primarily target all or part of the middle cerebral artery territory. There are also a few models of posterior circulation ischemic injury, although these may be less reproducible than the middle cerebral artery models.15
Currently available animal stroke models have been useful in studying biological mechanisms of brain ischemia and in many proof-of-principle therapeutic studies. With few exceptions, however, such as clot-based embolic models for tissue plasminogen activator therapy, they have been unsuccessful in predicting efficacy in human stroke.16 Potential reasons for these failures related to experimental design were discussed in the introduction. There also may be important intrinsic limitations of these models. In the absence of a positive prediction for neuroprotective efficacy, it can only be hypothesized what the most important limitations are, but the different contexts in which human stroke occurs compared to the animal models may play a role in these failures. The most salient difference is that the animal models involve experimentally induced ischemia in otherwise healthy animals, whereas in humans stroke is usually the result of the natural progression of underlying diseases or risk factors.
Human stroke occurs in the context of aging, hypertension, diabetes, heart disease such as atrial fibrillation, and the use of concomitant medications.17 In addition, gender differences may influence both stroke mechanisms and responses to therapy.18 Each of these factors likely influences the effects of a therapeutic agent. For example, aging, through effects on pharmacokinetics and pharmacodynamics, alters the efficacy and side effect profiles of many medications.19 Aging interacts with the processes involved in spontaneous recovery and is a risk for hemorrhagic transformation after thrombolytic therapy.20,21 Although the mortality of the experimental surgery is higher in older animals, studies have shown that standard models can be performed in middle-aged rats, and strains such as Fischer 344 are available for studies in very old animals.22 Hypertension, the most prevalent risk factor for stroke, alters vascular responses to ischemia that may extend beyond the vasculature and lead to compensatory responses in the neurovascular unit.23,24 These effects have not been well-studied in animal models, at least in part because animal models of hypertension are heterogeneous.25 Diabetes and hyperglycemia occur in one-third of acute stroke patients and are associated with worse outcome in large vessel and cardioembolic stroke but better outcomes in lacunar stroke.26 They reduce the likelihood of good recovery after recanalization therapy through complex mechanisms that partly depend on the duration and severity of ischemia and whether parenchymal reperfusion actually occurs.27 No single animal model mimics this circumstance completely. Concomitant medications both positively and negatively influence stroke outcomes.28 Drug interactions with a therapeutic agent have rarely been considered in the interpretation of negative clinical trials. Although it is challenging to mimic these clinical conditions, we contend that drug testing should not occur in a vacuum.
At the very least, the development plan for a newly proposed therapy should consider known issues related to these types of factors and ideally should propose to investigate them in preclinical studies.
Most preclinical testing initially involves rodents. Higher-order species such as cats and primates are also available to test specific hypotheses or mechanisms. Although unproven, there may be some advantages to testing in nonhuman, gyrencephalic primates. In contrast to rodents, the descending anatomic pathways such as the corticospinal tracts in these animals have innervation patterns similar to humans.29 The primate and pig models may also be useful for testing the effects of drugs on white matter injury. Higher-order species can also be more readily used than small rodents to test endovascular recanalization approaches such as mechanical embolectomy, angioplasty, and stenting. As with drugs, we believe that the safety and clinical efficacy of devices should be determined in animal models before advancing to clinical studies. Previously, the endpoints for such models have been primarily related to their efficacy at recanalization. We contend that because these devices are used in patients to improve clinical outcomes, the outcome assessment should be no different from that used with pharmacological treatments, in which infarct reduction coupled with improved functional outcome is the goal and safety issues such as hemorrhage or edema are also considered. Even given the need for and advantages of higher-order species, however, it must be acknowledged that the cost and limited availability of primates may not allow for definitive efficacy studies with sufficient power. Such studies would likely become even more prohibitive if the comorbid conditions recommended for rodent studies were reproduced in primates, and the predictive value of primate models for success in clinical trials remains unproven.
In addition, although the SAINT program relied heavily on positive results from a long-term functional and histological outcome study in primates that tested delayed treatment with NXY-059 and was understood to have been conducted rigorously with regard to randomization and blinding, retrospective scrutiny suggests that even such unprecedented encouragement may be misleading for reasons discussed elsewhere.3,30,31
One reason for the failure of clinical trials to confirm positive results in animal studies relates to the lack of direct linkage between the model and the clinical situation. These linkages ideally should include the disease state being modeled, the biological activity of the agent, and the outcomes being measured. The efficacy of neuroprotective agents is typically screened in a restricted set of models ranging from in vitro assays of neuronal survival to animal models involving permanent or transient ischemia. In considering the feasibility of linking animal models to human stroke, differences in brain structure must be considered. The human brain has a higher proportion of white matter relative to the rodent brain. It is unlikely that a treatment that targets only neurons and that does not also salvage white matter tracts would have widespread clinical relevance. The emerging concept of the neurovascular unit emphasizes that all the multiple cell types in the brain must be considered. It is likely that not only neurons but also glial and vascular elements in the brain need to be rescued. Furthermore, we must not only prevent cell death per se but also preserve cell function, especially the cell–cell signaling that subserves the integrity of the neurovascular unit.32 Finally, from a molecular and cellular perspective, accumulating data suggest that many of the neuroprotective targets tested in preclinical models may have a negative effect on the recovery process.33 Thus, any acute therapy must be carefully timed to block the desired target during its deleterious phase without interfering with endogenous substrates of recovery later on. Without understanding how and when these injury-into-repair transitions occur, it may not be possible to effectively translate acute experimental interventions into the appropriate timing, dose, and duration in clinical trials.
Endpoints are imperfect both in animal models and in humans. Spontaneous recovery is surprisingly common for many functional outcome measures, because most of the testing is performed in young healthy animals, a situation in which spontaneous recovery in humans would likely occur as well.34 Ideally, functional endpoints should be chosen that are relevant to the target human population. For neuroprotectant therapies, it may be important to demonstrate that infarct size is reduced. Imaging studies, including diffusion/perfusion MRI scanning and measurement of growth of the restricted diffusion lesion, although not fully validated, are fairly straightforward in animals because of the control that is possible over the induction of the ischemic lesion.35 In humans, there is no established optimal definition or measurement technique for assessing the mismatch, although much work is underway to develop approaches that can be used by nonexpert sites.36 Moreover, early diffusion abnormalities can recover, indicating that restricted diffusion may not be a precise marker of infarcted tissue.37 The hope that imaging endpoints would reduce variance and allow for smaller sample sizes may be only partly fulfilled, and practical issues of delays in therapy and availability of imaging remain challenges.
Demonstrating the biological activity of a therapy remains a challenge in translating preclinical data to humans. Differences may arise from pharmacokinetics or pharmacodynamics influenced by some of the issues discussed, or could involve different fundamental biological mechanisms in animals compared to humans. Ideally, a biological marker would exist for a specific therapy that could be measured directly or indirectly in humans. A marker could be as simple as whether parenchymal reperfusion actually occurs after recanalization,38 or more complex, such as whether a biochemical target is altered in an individual patient. For example, if the proposed agent is intended to scavenge free radicals, it would be helpful to demonstrate that direct or indirect biomarkers of oxidative stress are indeed reduced.
The ultimate goal is to show that in concert with these target and tissue endpoints, functional neurological outcomes are also improved in treated populations. Future clinical trials may need to first demonstrate efficacy at the target and tissue levels before attempting to affect the more complex functional outcome. As with many of the factors discussed, we expect heterogeneity in these biological events between animals and humans, and even between different types and durations of ischemic mechanisms. Obtaining such information, particularly in first-in-human studies, should provide guidance in identifying the most promising target population for subsequent study.
The costs of modern drug discovery and development in most therapeutic disciplines have become almost prohibitive for any pharmaceutical company, because investment in a single drug approximates $0.5 to $1.0 billion in research and development activities. The bounty of new molecular targets derived from the recently unraveled human genome poses unprecedented challenges to define and validate the biology, pathology, and clinical usefulness of these targets for safe and effective drug development. These new realities, compounded by regulatory requirements for novelty and differentiation of new therapies over those currently available, create unprecedented hurdles in developing new drugs, especially for complex medical conditions such as stroke.39–41
Resources beyond those of pharmaceutical and device companies are required to overcome these mounting barriers. The effort to bring breakthrough therapies to the market needs the participation of all potential stakeholders, including academia, governmental regulatory agencies, and private companies. Transforming novel scientific discoveries into breakthrough technologies and therapies calls for broader participation in discovery and development, maximizing the use of available intellectual, technological, and clinical expertise toward more effective and successful translational medicine. Such resource utilization can only be achieved by a spectrum of scientific, medical, and financial organizations working together yet respecting each other’s interests and governance. New models of cooperation and collaboration have emerged in the form of precompetitive consortia.42 Precompetitive consortia are designed to leverage resources from multiple entities toward breakthroughs in research not likely to be produced by any single stakeholder. Information, technology, processes, samples, databases, and analytical methods are all shared among the members, allowing each to pursue independent competitive commercial interests based on the proprietary position held by each member. The Alzheimer Disease Neuroimaging Initiative (ADNI) is a relevant example. ADNI is a government, academic, and pharmaceutical industry precompetitive consortium aimed at facilitating translational medicine in Alzheimer disease. Scientific, technological, and clinical resources are shared among funding members, with each retaining the intellectual property of its drug discovery and development programs.
Another innovative consortium that encompasses leading universities, government, and a pharmaceutical company is Wyeth–TMRC–Scotland TMRC (Translational Medicine Research Collaboration), a consortium between “big Pharma,” the Scottish government, and 4 leading Scottish academic institutions. It is jointly funded by governmental and industry sources and is a true collaboration with respect to study design, execution, data sharing, and publication. Areas of interest include new models of stroke, “penumbra imaging,” and technology development. It is hoped that this collaboration of intellectual and technological expertise, and others like it, will be better able to address the difficult research paradigms in stroke discussed here, which otherwise could not be effectively executed independently. These examples of cooperative research efforts have imperfections and have predominantly focused on clinical research. They do provide hope that disparate groups can work together to provide innovation for both preclinical and clinical research endeavors and to develop new therapies that will benefit all the stakeholders as well as affected patients.
The initial STAIR recommendations were used by some as a benchmark to assess the quality and sufficiency of preclinical experiments of drugs before clinical trial evaluation.2,43 The recommendations likely influenced acute ischemic stroke drug development. For example, there are fewer pretreatment studies in the acute stroke animal literature compared to the animal studies completed before 1999. Retrospective reviews, however, find that most preclinical studies of neuroprotective agents that progressed to clinical trials did not fully meet the previous recommendations.11,43 This suggests that the previous recommendations are not uniformly accepted as the most appropriate way to test novel therapeutic candidates. The previous STAIR preclinical recommendations are updated, followed by suggested additions.
The minimum effective and maximum tolerated dose should be defined. As stated in STAIR I, there should be a target concentration, a tissue level of effect identified from animal histology, with behavioral studies giving some indication that when the drug is administered to humans there is a reasonable prospect of achieving clinical benefit. It should also be documented that the drug in these ranges accesses the target organ.
There is debate about the relevance of a therapeutic time window in animals to acute clinical stroke. Some studies suggest that the time window for thrombolysis to salvage ischemic brain tissue may be similar in humans and in animals such as rodents and rabbits, although this is model-dependent. Accordingly, rodent studies appear to be relevant to address a therapeutic window for thrombolytic and neuroprotective drugs. It should also be noted that penumbral imaging using perfusion/diffusion MRI mismatch may be useful to guide the identification of the therapeutic window in a particular model.
Multiple endpoints are important, and both histological and behavioral outcomes should be assessed. Histological and behavioral assessments should include studies conducted at least 2 to 3 weeks after stroke onset to demonstrate a sustained benefit, with emphasis on behavioral outcomes in delayed survival studies.
Focal ischemic stroke in animals is typically induced by occlusion of the middle cerebral artery. However, the models of middle cerebral artery occlusion including the suture and embolic methods are imperfect in causing a sustained reduction in blood flow. It is possible in some situations that occlusion may occur but spontaneous reperfusion may ensue, leading to infarct size variability. Basic physiological parameters such as blood pressure, temperature, blood gases, and blood glucose should be routinely monitored. Temperature should be maintained within the normal physiological range. It is important to monitor cerebral blood flow using Doppler flow or perfusion imaging to document adequate sustained occlusion and to monitor reperfusion in temporary ischemia models.
It is suggested that treatment efficacy should be established in at least 2 species using both histological and behavioral outcome measurements. Rodents or rabbits are acceptable for initial testing and gyrencephalic primates or cats are desirable as a second species, but the cost, availability, and ethical acceptability may be problematic.
The positive results obtained in 1 laboratory need to be replicated in at least 1 independent laboratory before advancing to clinical studies. Based on subsequent accumulated experience, several additional areas are now proposed.
Although we believe the initial recommendations were useful in improving many features of preclinical testing, they have not yet been shown to predict whether any drug will improve outcome in pivotal efficacy phase III trials. It will not be possible to validate any guidelines until there is definitive, reproducible proof of efficacy in clinical studies. Meanwhile, these updated and amended STAIR preclinical recommendations may provide a basis for further thinking, careful discussions, and interlaboratory collaborations regarding how to best enhance the usefulness of preclinical testing of purported acute stroke therapies. However, it must be recognized that fulfilling them does not guarantee success in clinical development. Nonetheless, rigorous and complete preclinical testing should provide reassurance that there is potentially a greater chance for success in clinical trials, assuming that the clinical development program is also conducted according to currently accepted standards.
The authors sincerely thank Gary Houser for his help in organizing the STAIR VI meeting and in the preparation of this manuscript.
Contributors to STAIR VI Manuscript: Harold Adams; Harris A. Ahmad; Greg Albers; Harvey J. Altman; Jaroslaw Aronowski; Richard P. Atkinson; Neil C. Barman; Johannes Boltze; Natan M Bornstein; Joseph Broderick; Anthony O. Caggiano; Juan C. Chavez; CPLH Chen; Steve Cramer; Mads K. Dalsgaard; Exuperio Díez-Tejedor; Billy Dunn; Lori A. Enney; Robert W. Fasciano; Seth P. Finklestein; Byron D. Ford; Blanca Fuentes; Maurice Gleeson; Larry B. Goldstein; Matthew J. Gounis; Byoung Joo Gwag; Vladimir Hachinski; Daniel F. Hanley; Nils Henninger; David C. Hess; George Howard; David Howells; Patricia D. Hurn; Jennifer F. Iaci; Tom Jacobs; Karen Johnson; Thomas A. Kent; Pooja Khatri; Chelsea S. Kidwell; Brett Kissela; Walter J. Koroshetz; Tien-Li Lee; Ken R. Lees; David E. Levy; David S. Liebeskind; J.L. Lorenzo; Patrick D. Lyden; John Kylan Lynch; Malcolm R. Macleod; Arshad Majid; Rafael Rodriguez-Mercado; Brian W. Mcilroy; Colin G. Miller; Majaz Moonis; Herbert Moessler; Satoru Murayama; Karoly Nikolich; Menelas N. Pangalos; Philip Perera; Peter Rumm; Ralph L. Sacco; Jeffrey L. Saver; Wolf-R. Schäbitz; John F. Schenck; Armin Schneider; Dietmar Schneider; Judith A. Spilker; Aneesh B. Singhal; Wade Smith; Yoram Solberg; Jackson Streeter; Lars Torup; Daniel-Christoph Wagner; Ajay Wakhloo; Gail Walkinshaw; Marc Walton; Max Wintermark; Margaret M. Zaleska; Justin A. Zivin.