Progress in regenerative medicine seems likely to produce new treatments for neurologic conditions that use human cells as therapeutic agents; at least one trial for such an intervention is already under way. The development of cell-based interventions for neurologic conditions (CBI-NCs) will likely include preclinical studies using animals as models for humans with conditions of interest. This paper explores predictive validity challenges and the proper role for animal models in developing CBI-NCs. In spite of limitations, animal models are and will remain an essential tool for gathering data in advance of first-in-human clinical trials. The goal of this paper is to provide a realistic lens for viewing the role of animal models in the context of CBI-NCs and to provide recommendations for moving forward through this challenging terrain.
Progress in regenerative medicine seems likely to produce new treatments for neurologic conditions using human cells as therapeutic agents; a trial for one such intervention is currently under way (Taupin, 2006; see also http://www.clinicaltrials.gov). As with most new clinical interventions, the development of cell-based interventions (CBIs) for neurologic conditions (CBI-NCs) will likely include evidence from preclinical studies using animals as models for humans with conditions of interest. The recent US Food and Drug Administration (FDA) decision to place on clinical hold the first application for a trial of a CBI-NC using cells derived from embryonic stem cells suggests that there is substantial uncertainty about the ability of these animal models to accurately predict safety and efficacy of CBI-NCs in human trials (http://www.nature.com/news/2008/080519/full/news.2008.842.html). Although the specific reasons for the hold are not yet known, the decision comes despite 4 years of communications between the FDA and Geron, and an application of some 21,000 pages (Fox, 2008). Responding to this sort of uncertainty, this paper will explore predictive validity challenges and the proper role for animal models in developing CBI-NCs. In spite of limitations, animal models are and will remain an essential tool for gathering data in advance of first-in-human clinical trials. The goal of this paper is to provide a realistic lens for viewing the role of animal models.
CBIs are regulated by the FDA Center for Biologics Evaluation and Research. For the purposes of FDA review and approval, most CBIs will be treated similarly to drugs (http://www.fda.gov/CbER/rules/gtp.htm). The FDA requires the completion of an investigational new drug (IND) application before conducting clinical trials of novel drugs or biologics (http://www.fda.gov/cder/regulatory/applications/ind_page_1.htm). Approvals of IND applications and initiation of human clinical trials depend upon the submission of pharmacologic and toxicologic data from preclinical studies to establish reasonable evidence of safety and efficacy (http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=312). Preclinical studies rely heavily upon animal models of human disease. To provide a sound basis for making determinations about reasonable safety and efficacy, these animal models must provide accurate information about how a medical intervention will perform in human clinical trials. The ability of animal model studies to accurately predict clinical trial outcomes can be termed predictive validity. Animal models for human disease never have perfect predictive validity. Poor predictive validity can result in outcomes for human clinical trials that differ significantly from the results of preclinical data.
Numerous CBIs are currently in preclinical and early clinical stages of development. Some examples include efforts to regenerate heart muscle (Behfar and Terzic, 2007; Kocher et al, 2007; Boyle et al, 2006), repair vision loss (Limb et al, 2006; Adler, 2008; Vemuganti et al, 2007), and methods for cell-based insulin replacement strategies to treat diabetes (D’Amour et al, 2006; Kroon et al, 2008).
CBI-NCs are being considered for many neurologic diseases that currently have few or no available medical interventions (Lindvall and Kokaia, 2006; Imitola, 2007; Joannides and Chandran, 2008). These are of particular interest here, as neurologic diseases affect uniquely human traits that may be particularly difficult to replicate using animal models.
There are several different strategies for employing CBI-NCs. Transplanted stem cells or stem cell derivatives may offer trophic support to host cells, improving endogenous cell function; they may secrete chemicals such as enzymes or neurotransmitters, compensating for deficiencies in the patient; and they may migrate within the nervous system and replace damaged endogenous cells (Nayak et al, 2006).
At the time of writing this paper, a phase II randomized trial to evaluate neurotransplantation of cultured neuronal cells for patients with subcortical motor stroke has been completed (Kondziolka et al, 2005), and one clinical trial of a CBI-NC, for Batten disease, has been approved by the FDA and is currently being conducted (Taupin, 2006; see also http://www.clinicaltrials.gov). Numerous CBI-NCs are being developed and additional clinical trials of several CBI-NCs appear imminent, as shown in Table 1.
CBI-NCs are also currently being administered to patients without internationally accepted peer-reviewed scientific evidence. Most of these interventions appear to be delivered in a manner that does not generate useful scientific evidence, and are thus of little or no value in establishing evidence of safety and efficacy in human trials of CBI-NCs (Dobkin et al, 2006; Enserink, 2006).
With respect to preclinical evidence in support of clinical trials, should CBI-NCs be treated in the same way as traditional drugs? Or, is there something about the challenges presented by CBI-NCs that requires rethinking typical approaches? To address these questions, this paper will first review general problems with the predictive validity of animal models, then explore the special problems presented by using animal models for research on neurologic conditions, and those presented by using animal models to evaluate CBIs. We will describe the range of challenges that animal models will face in this new context and explore the implications that this full set of challenges will have for the predictive validity of animal models and their use to provide evidence of safety and efficacy in support of clinical trials of CBI-NCs. Ultimately, we argue that although the individual challenges to the predictive validity of animal models are not completely novel, when combined, these many challenges present a unique obstacle, requiring careful consideration, for predicting the safety and efficacy of human trials for CBI-NCs, and we provide recommendations for moving forward.
The predictive validity of animal models is compromised by interspecific differences with humans, subject sample homogeneity and imperfect outcome measures. The most obvious problems stem from differences between species. Interspecific anatomic differences affect, among other things, morphology (e.g., heart size, motor neuron length) and composition (e.g., presence of certain structures, the amount of white matter and glial cells present, interspecific biochemical differences). As a result, animal models never fully recapitulate the human disorder (Scheffler et al, 2006). Disease symptoms, courses, outcomes, and the effects of various interventions vary radically across species. Thus, investigators may need to choose between animal models that match the cellular pathology of human disease and animal models that reproduce the course and symptomatology of human disease, or just certain important aspects of the target disease.
For example, all animal models of tuberculosis have limitations (Gupta and Katoch, 2005), as the causative mycobacterium can affect animal species differently. For some species, the tools to study the progression of the disease are not available; for example, researchers lack the immunologic reagents required to evaluate immune response in guinea pigs and rabbits. Studies using nonhuman primates are not widely conducted due to cost. In spite of these limitations, researchers have been able to make significant progress in understanding the pathogenesis and treatment of tuberculosis with the careful use of available animal models.
Another example is the congenital deficiency of hypoxanthine-guanine phosphoribosyl transferase (HPRT) that causes Lesch–Nyhan disease in humans. One of the striking features associated with this deficiency is severe and recurrent self-injurious behavior. The HPRT-deficient knockout mouse model, however, does not display this unusual behavior (Kasim and Jinnah, 2002).
Interspecific variations in lifespan lead to time-related problems for animal models. Animals used to model human disease typically have much shorter lifespans than humans. The shorter lifespan may prevent an animal model from following the human disease trajectory. Disease characteristics of particular interest must occur within the shorter lifespan of the animal being tested. In addition, shorter lifespans may require the use of methods to accelerate the appearance of active symptoms, so that therapeutic interventions can be tested; such methods may come at the cost of predictive validity.
Animal studies typically use young, healthy populations of animals that are homogeneous for sex and age. Such optimal demographics may decrease predictive validity, as these study populations are often a poor match for the heterogeneity of human patients for whom the interventions are being developed. In some cases, particularly in studies involving genetically distinct strains of mice, the extreme genetic homogeneity may even lead to intraspecific failures of predictive validity (i.e., the results are confounded by a unique genetic variant within a particular mouse strain and cannot be generalized beyond that strain).
Sex-related differences also become apparent in the context of the homogeneous populations typically used for animal studies. Studies have shown that cultured cells of the central nervous system, such as neurons or astrocytes, exposed to different levels of oxygen–glucose deprivation to simulate a stroke, exhibit a significant sex difference in response, and this occurs even in the absence of sex steroids (Liu et al, 2006). These data have implications for findings from any study, animal or human, that does not allow for comparison across sexes.
In addition to choosing appropriate study populations, outcome measure selection poses challenges for translating research using animal models of human diseases. For example, the use of animals allows researchers to rely heavily upon histologic findings. Animal subjects can be killed and cellular-level data can be readily gathered. Human trials, even if tissue samples are available, may rely more upon functional than histologic outcomes because functional outcomes are more directly relevant to investigators and patients. Interspecific differences, including behavioral differences, pose difficulties for developing valid functional outcome measures for use with animal models. Furthermore, comparing studies that rely upon different outcome measures may be problematic, as functional outcomes may not be directly correlated to histologic findings.
Even seemingly simple animal models, such as those for tuberculosis or for Lesch–Nyhan disease, with a single enzyme deficiency, can present challenges, as noted above. Many of the diseases that are garnering the most attention as potential targets for CBI-NCs are complex neurodegenerative conditions. Although animal models of human neurologic conditions can recreate key features of disease, there are numerous examples of limitations. Multiple sclerosis (MS) in humans is a complex disease that is incompletely understood. Animal studies of MS rely upon the induction of experimental allergic (autoimmune) encephalomyelitis (EAE) in mice and rats. Although EAE shares many clinical and neuropathologic features with MS, there are important differences. EAE is an induced condition, and cannot provide information about the spontaneous onset of MS. EAE also does not capture the heterogeneity or complexity of MS. In addition, experimental interventions have been more successful in treating EAE than MS. Compounding this further, the pathogenesis of EAE itself is poorly understood (Steinman and Zamvil, 2006; Friese et al, 2006).
Animal models for Parkinson’s disease (PD) primarily rely on toxins to recreate the loss of dopaminergic neurons in the nigrostriatal pathway. Although this successfully mimics important symptoms of PD, and allows testing of interventions targeting these symptoms, it does not capture the full pathology and, more importantly, it does not model the chronic progression of PD (Soderstrom et al, 2006; Fleming et al, 2005; Bove et al, 2005).
Many murine models of Huntington’s disease (HD) use genetic knock-in methods to recreate important features of the human disease. The mouse huntingtin gene differs somewhat from the human version and the promoters differ significantly (Li et al, 2005; Menalled and Chesselet, 2002; Rubinsztein, 2002). In addition, the knock-in models do not share the marked cell loss common in the human disease (Levine et al, 2004). Similar shortcomings can be found with models for a variety of additional neurologic conditions such as Alzheimer’s disease (AD; Spires and Hyman, 2005), traumatic brain injury (Cernak, 2005), and adrenoleukodystrophy (Powers et al, 2005; Lu et al, 1997; Berger et al, 2001).
Additional challenges beset the interpretation of data from animal models of neurologic conditions. Although these include the same sorts of interspecific-phylogenetic/morphologic differences mentioned previously, such challenges may be particularly pronounced in the nervous system. For example, research with paralyzed rats testing an intervention that used transplanted, embryonic stem cell-derived axons along with key factors and inhibitors resulted in partial recovery from paralysis (Deshpande et al, 2006). Although this exciting finding suggests cause for optimism about the development of a human intervention, enthusiasm should be tempered by the fact that rodent axons are measured in inches, whereas the analogous human axons are measured in feet. It remains to be seen whether this simple morphologic difference will present an insurmountable challenge for translation, but it shows how phylogenetic differences should inspire caution when using animal models of neurologic conditions to extrapolate to human conditions.
The challenges related to the shorter lifespan of the animals typically used to model human conditions become somewhat amplified for neurologic conditions, because so many are slow degenerative processes, for example, PD, AD, HD, or Kennedy’s disease (Merry, 2005), which can remain undiagnosed in humans until later life. Furthermore, it is often expensive and medically challenging to keep alive animals used to model neurologic conditions, after intervention, to allow for collection of long-term follow-up data. This limitation may have been involved in the failure of preclinical trials of fetal tissue implants for the treatment of PD to generate the dyskinesia-related adverse events that were found in subsequent clinical testing (Olanow et al, 2003; Lindvall and Bjorklund, 2004).
Devising appropriate outcome measures to study uniquely human traits and behaviors presents familiar but amplified challenges to the predictive validity of animal models for neurologic conditions. In many cases, these are the very traits that are of central concern. Traits and behaviors such as personal identity, cognition, executive functioning, speech, depression, and even headaches defy animal modeling (Davidson et al, 2002; Pryce et al, 2005). That said, many seemingly complex human behaviors are being studied using animal models; for example, social interactions and nesting behaviors in mice have been used to create models for autism (Kwon et al, 2006; Crawley, 2004; Moy et al, 2006). Mouse models of schizophrenia rely upon the observation of defects in working memory tasks and behavioral flexibility that are consistent with prefrontal cortical function (Kellendonk et al, 2006; van den Buuse et al, 2005). Similar efforts to match target symptoms with reproducible behaviors in animal models have been made for other psychiatric disorders (Lijam et al, 1997; Korff and Harvey, 2006).
One particularly noteworthy example of repeated predictive validity failures of animal models for neurologic conditions can be found in neuroprotective pharmacologic interventions for acute ischemic stroke. In spite of dozens of agents showing neuroprotective efficacy in animal studies, thus far no drugs have proven to be effective in clinical trials (Green et al, 2003; Gilman, 2006). This stark lack of success has raised concerns about the models’ predictive validity and their use in this area. In 1999, the Stroke Therapy Academic Industry Roundtable (STAIR) published recommendations and guidelines for improving the predictive validity of preclinical animal studies (STAIR, 1999), and in 2001 they published recommendations for clinical trial evaluation of acute stroke therapies (STAIR-II, 2001). However, the SAINT II trial (Feuerstein et al, 2008), which adhered to many of the STAIR guidelines, still failed to show the expected efficacy in humans. Because of this failure and in the context of the promise of CBIs, the new Stem cell Therapeutics as an Emerging Paradigm in Stroke Consortium was formed, and recently released their criteria for designing laboratory studies on cell therapy for stroke (Borlongan et al, 2008).
It is important to note that some attribute the failures not to flaws in the animal models, but to inappropriate generalizations and breaches of the scientific method by the investigators conducting the trials (Carmichael, 2005; DeKeyser et al, 1999; Traystman, 2006), many of which have been detailed in this paper. For example, the human subjects’ strokes involved significantly smaller percentages of total brain volume than those induced in the animal models, which were more consistent with fatal human strokes. There were discrepancies in the outcome measures between the preclinical animal trials and the clinical trials. Strokes in humans involve a substantial degree of reperfusion. Although some animal models do allow for reperfusion (Sola et al, 2008), others do not (Traystman, 2003). In animal models, strokes are induced in otherwise healthy animals, whereas human strokes typically occur in the context of (or as a result of) significant medical comorbidities (Lippoldt et al, 2005). The clinical trial study population was heterogeneous with respect to many important variables such as age, stroke location, and stroke severity, whereas the animals used in preclinical studies are fairly homogeneous in these respects. Finally, human subjects were recruited as they appeared for treatment after stroke; thus, the time between the ischemic event and intervention varied significantly among the human subjects (Cheng et al, 2004). This contrasts with animal studies of stroke, which use a standardized methodology in which the intervention is delivered with precise timing.
In spite of these sorts of methodological problems, many scientists maintain confidence in the predictive validity of animal models for stroke for properly designed human trials, and continue to work with and publish results from promising preclinical trials (Borlongan et al, 2008; Lindvall and Kokaia, 2006). New studies should be evaluated in the context of the methodological critiques of earlier work. In addition, further research should be conducted to better elucidate the underlying reasons for these predictive validity failures.
CBIs are not completely novel or untested. Bone marrow transplants have been conducted for decades (Buckley, 2004). Nonetheless, several notable challenges arise when using animal models during the development of CBIs. The most fundamental of these are raised by the special nature of crossing species boundaries in this context. To conduct a CBI preclinical trial, the intervention must be derived either from human cells or from cells of the animal model’s species. If animal cells are transplanted, an expert interdisciplinary working group on safety issues in CBI trials concluded, ‘(g)iven that the processes of culturing and differentiating cells are idiosyncratic and successful methods vary from one species to the next, the extent to which it is reasonable to extrapolate from the results with mouse cell lines to human cell lines is unclear’ (Dawson et al, 2003).
If human cells are transplanted, there may be concerns about species-specific cell signaling leading transplanted human cells to behave differently than in their native environment (Dawson et al, 2003). In addition, placing human cells in nonhuman species could elicit an immune response (Ginis and Rao, 2003; Magnus et al, 2008). To combat this, investigators may need to induce immunosuppression or use modified animals (e.g., non-obese diabetic/severe combined immunodeficient mice). Depending on the cell source, this immune response may or may not be involved in subsequent clinical trials and may threaten predictive validity. Further complexities for data interpretation are suggested by evidence that the growth of primitive tumors from human embryonic stem cells (hESCs), a primary safety concern, is host dependent (Shih et al, 2007). This study found aggressive tumor growth after hESC injection into human fetal tissue engrafted in SCID mice. It also found differences among fetal tissue types in their ability to support tumor growth. It seems likely that the various challenges raised by xenotransplantation in the context of CBIs will be compounded when cells are placed into the nervous system or more specifically into the brain.
Evaluating CBI-NCs will likely require rethinking other typical aspects of drug testing procedures and trial design. For example, selecting appropriate control groups for CBI trials will require careful consideration in this context (Mathews et al, 2008). Issues such as the use of placebos and sham surgeries often present interesting and unique challenges, and, in the context of CBI-NCs, these issues become even more difficult to resolve.
The lack of dose–response curves has been cited as a potential factor leading to predictive validity failure for animal models of stroke. Dosing and dose–response curves may be more difficult to determine and interpret in the context of CBIs (Ginis and Rao, 2003). The notion of dosing makes sense when considering cells that are being used to deliver chemicals, but investigators will need to both determine the dose of chemical delivered by the transplanted cells and account for potential future flourishing of these cells within the transplant recipient to ensure that the transplanted cells themselves do not cause any harm, such as tumor formation. For many CBIs, dose–response curves may not be feasible.
The cells themselves will change after transplantation and the biologic readouts and parameters typically used for drug testing will not apply. Also, allelic variability may cause cells to have a context-dependent response that is more variable than that of drugs (Ginis and Rao, 2003). Greater variability will make results more difficult to interpret and require larger numbers of subjects for both animal studies and clinical trials.
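The link between greater variability and larger study sizes can be illustrated with a standard textbook approximation for comparing two group means. The sketch below is not drawn from the cited sources; the function name, the outcome standard deviation `sigma`, the target difference `delta`, and the default significance level and power are all hypothetical inputs chosen for illustration, and the normal approximation it uses is only a rough guide for small animal studies.

```python
import math
from statistics import NormalDist


def subjects_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided comparison
    of two means (normal approximation, equal group sizes).

    sigma: assumed standard deviation of the outcome (hypothetical input)
    delta: smallest difference in means worth detecting (hypothetical input)
    """
    # Critical value for the two-sided significance level.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired statistical power.
    z_beta = NormalDist().inv_cdf(power)
    # n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2, rounded up.
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative values only: the same target difference with
    # progressively more variable outcomes.
    for sigma in (1.0, 1.5, 2.0):
        print(sigma, subjects_per_group(sigma, delta=1.0))
```

Under these assumptions the required group size grows with the square of the standard deviation, so doubling the variability of responses roughly quadruples the number of subjects needed to detect the same effect.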
Unlike drugs, introduced cells may survive for the remainder of the recipient’s life. Once transplanted, cells may migrate and differentiate. These attributes are part of what fuels the optimism about the therapeutic potential of CBIs, but are also associated with risks of adverse events such as tumor formation (Bjorklund et al, 2002). Although drugs can be discontinued or electrodes turned off, there is currently no easy way to stop undesirable effects of transplanted cells, as evidenced by the dyskinesia resulting from fetal nigral transplants for PD (Olanow et al, 2003). It should be noted that capacity for in vivo replication varies by cell type (Cai et al, 2004). Furthermore, CBIs are likely to involve differentiated, rather than undifferentiated cells, due to the risk of uncontrolled growth of undifferentiated cells, particularly for interventions developed using embryonic stem cells (Laflamme and Murry, 2005). In this case, the presence of residual undifferentiated cells is a significant cause for concern.
Another potential risk that may not be readily apparent if animal subjects are not followed for an appropriate duration relates to cranial volume. Particularly for nonlesion brain conditions (i.e., there is no space available for the transplanted cells) it is theoretically possible that, over time, transplanted cells could multiply sufficiently (Vescovi et al, 2006; Chaichana et al, 2006) to increase cranial pressure, leading to undesirable outcomes (Allen and Ward, 1998). Data from preclinical studies may not be predictive of these sorts of adverse events due to the lack of long-term follow-up data and challenges associated with capturing the full range of human cognitive functional outcomes using animal models.
It is medically challenging and expensive to gather long-term follow-up data for animal models of neurologic conditions, and, at present, there is no agreement on an appropriate timeline for follow-up in animal CBI trials.
Challenges presented by research into the development of CBI-NCs are not completely novel. These challenges, however, exist at the highly charged end of a number of spectra of traditional concerns. This suggests that great care must be taken to accurately assess the predictive validity of the models used. This does not, however, invalidate all results from preclinical trials utilizing animal models. Important progress has been made, and can continue, using animal models of neurologic conditions to assess CBIs. Animal models can teach us a great deal about pathogenesis, mechanism, cell migration, etc., which would not otherwise be ascertainable. It is imperative, however, that the ongoing use of animal models for CBI-NCs be informed by the challenges and notable predictive validity failures in prior related work. These should be carefully analyzed and used to develop and refine an accurate assessment of the predictive validity for various animal models in different contexts, and investigators should take great care to ensure that claims do not exceed the scope of the available data.
Many animal models, although incomplete, can capture aspects of the pathogenesis of human disease. When possible, results from studies utilizing different animal models should be combined to capture a fuller picture of a particular condition. It will be important, as noted in the STAIR recommendations (STAIR, 1999), to repeat successful studies using animals that are more similar to humans; ideally these would use a gyrencephalic primate species, such as macaque monkeys. Scientific consensus should be reached regarding the standardization of animal models for use in studying CBI-NCs.
CBI-NCs hold great promise. In the absence of animal models that more fully recapitulate human disease, a relatively high level of uncertainty will have to be tolerated for approval of human trials. It is possible that gathering evidence of safety for CBI-NCs will be less difficult than gathering evidence of efficacy, as one trial could provide evidence of safety that will be applicable to numerous CBI-NCs. However, establishing that CBI-NCs are safe requires reaching consensus about the length of follow-up that would be sufficient to reveal long-term effects. The more serious problem is establishing efficacy, and here we may need to proceed without good evidence. When possible, strategies should be developed and used that help to manage this increased risk to early subjects (Mathews et al, 2008; Taupin, 2006).
This is a difficult topic with no simple answers; the number, complexity and serious nature of the challenges associated with the use of animal models as a mechanism to provide reasonable evidence of safety and efficacy for CBI-NCs suggest that both public education and debate should occur. The recent ISSCR announcement of a task force to establish international guidelines for the clinical translation of stem cells and their direct derivatives should provide one suitable forum for addressing these challenges (Daley et al, 2008). In addition, the FDA and the National Institute of Neurological Disorders and Stroke should hold a consensus conference to clarify the appropriate role for animal models in such translational research. Owing to the likely willingness of desperate subjects to tolerate greater risks in human trials of CBI-NCs, informed consent procedures for these trials ought to be performed under close scrutiny, and Institutional Review Boards should single out CBI-NC protocols for particularly close review, informed by the unique challenges of this research.
The authors acknowledge the contribution of Dr Ira Black, who died before the completion of the manuscript.