The decision to initiate invasive, first-in-human trials involving Parkinson’s disease presents a vexing ethical challenge. Such studies present significant surgical risks, and high degrees of uncertainty about intervention risks and biological effects. We argue that maintaining a favorable risk-benefit balance in such circumstances requires a higher than usual degree of confidence that protocols will lead to significant direct and/or social benefits. One critical way of promoting such confidence is through the application of stringent evidentiary standards for preclinical studies. We close with a series of recommendations for strengthening the internal and external validity of preclinical studies, reducing their tendency toward optimism and publication biases, and improving the knowledge base used to design and evaluate preclinical studies.
A major objective in Parkinson’s disease (PD) research is the development of strategies to halt degeneration and/or restore the function of dopaminergic and other neurons. Among the approaches being tested are various small molecule drugs,i trophic factors,ii deep brain stimulation,iii cell transplantation, and gene transfer.iv
Ethical discussion of translational PD trials has tended to center on sham surgery,vvivii quality of informed consent,viii subject selection,ix and the use of fetal tissues.x Nevertheless, a perhaps more fundamental question remains uncharted: when is it ethical to initiate invasive first-in-human (FIH) PD trials? Decisions to launch such trials are often marked by controversy. To date, three gene transfer strategies have been reviewed by the Recombinant DNA Advisory Committee. All received full public review, signaling novelty and ethical concerns.xi The first received a particularly skeptical hearing,xii and when this trial was initiated, several researchers castigated the investigators as “crazy,” venturing into “terra incognita,”xiii and “raising hopes in people with minimal evidence of benefits.”xiv
In November 2007, the Novel Neurotechnologies Ethics Research Group convened a workshop at McGill University to discuss the ethics of initiating invasive FIH PD trials. The analysis and recommendations that follow emerged from this meeting.
Two of us (JK and AJL, both ethicists) drafted a statement outlining objectives of the workshop. We then sought input from a preclinical researcher (MEE). After refining our statement, we invited another preclinical researcher (AF), a neurologist clinician / researcher (BR), an epidemiologist (TR), a neurosurgeon / researcher (MB), and a neuroscience historian (FS). Participants were selected on the basis of their interest in neurology translational research ethics. To ensure focused discussions, the pre-workshop statement was refined by invitees. The workshop was held on November 9, 2007; after a series of presentations and discussion, areas of consensus were identified. Except where noted, participants were in agreement about the analysis and recommendations presented here.
The core ethical challenges surrounding the initiation of invasive PD studies derive from the nature, probability, and indeterminacy of study risks. In particular, PD trials target the brain, “the organ of our personhood,”xv and adverse events therefore have the potential to disrupt those features that make us who we are: language, memory, cognition, and identity.
PD trials are somewhat unusual because they involve a relatively high baseline level of risk, prior to the delivery of investigational agents. Whereas administration of small molecule drugs involves minimal risk, invasive PD studies involve inoculations to the brain. Based on studies examining complication rates associated with analogous surgical procedures in similar brain regions, the risk of surgery-related permanent neurological deficits is on the order of 0.5 – 1% per inoculation.xvixviixviiixixxx This might appear modest when compared against phase 1 oncology studies (these involve 14% risk of grade 4 adverse events),xxi but two additional factors must be considered. First, many PD protocols involve multiple injections; assuming somewhat conservatively that permanent neurological deficit risk is 0.5% per inoculation, four injections present almost 2.0% risk (in comparison, risk of irreversible adverse events in phase 1 oncology studies is approximately 1.5%xxi ). Second, these represent baseline levels of risk that are present before investigational agents are even received (contrast this with phase 1 cancer studies, where study risk normally derives from the study drug itself).
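The cumulative figure for multiple injections can be checked with a short calculation. The sketch below is illustrative only: it assumes that each inoculation carries the same risk and that events are independent across injections, neither of which is established by the surgical literature cited above. The function name is ours.

```python
def cumulative_risk(per_injection_risk: float, injections: int) -> float:
    """Probability of at least one permanent deficit across several injections.

    Assumes (hypothetically) identical, independent risk per injection.
    """
    return 1.0 - (1.0 - per_injection_risk) ** injections


# 0.5% per inoculation, four injections
print(f"{cumulative_risk(0.005, 4):.2%}")  # 1.99%
```

Under these assumptions the result sits just below the simple sum of 4 × 0.5%, which is why the text describes it as almost 2.0%.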
Uncertainty presents another important challenge for PD FIH studies. Experimental agents employed in invasive PD studies often have characteristics (e.g. potential vector immunogenicity) that limit the reliability of animal models for risk determination.xxiixxiii Impairments in brain processes such as cognition are also difficult to model preclinically. Furthermore, though nonhuman primate toxicology studies play a crucial role in risk assessment, sample sizes and time horizons enable the detection of only high frequency, immediate, and catastrophic events. Our review of five well-known gene transfer and neurotrophic factor trialsxxivxxvxxvixxviixxviii found that, on average, investigators used 20 nonhuman primates in preclinical studies; two of these trials appear to have been based on preclinical studies for which the last time point was eight months (Table 1).xxix
We believe that this high uncertainty should affect the appraisal of risk. It should translate to a presumption that study risks exceed numeric “best-estimates” provided by preclinical studies. This position is justified on the grounds that risk estimates from preclinical studies will tend, at best, to have wide confidence intervals and, at worst, will have limited predictive value. This proposal reflects the fact that the full range of adverse events is not known, and that their probability might be higher than anticipated.xxx xxxi
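A simple worked example suggests how wide such confidence intervals can be. The sketch below computes the exact one-sided upper confidence bound on an event probability when no events are observed in a sample. The sample of 20 mirrors the average number of nonhuman primates in the trials we reviewed; the scenario of zero observed adverse events, the binomial model, and the function name are assumptions made for illustration.

```python
def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the true event probability
    when 0 events were observed among n independent subjects.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# Hypothetical: 20 animals, no adverse events observed
bound = upper_bound_zero_events(20)
print(f"95% upper bound: {bound:.1%}")  # roughly 13.9%
```

In other words, a clean result in 20 animals is statistically compatible with a true event rate well above 10%, even before differences between species are taken into account.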
Numerous influential statements on research ethics underscore the importance of preclinical studies.xxxiixxxiii For example, the CIOMS guidelines state “clinical testing must be preceded by adequate laboratory or animal experimentation to demonstrate a reasonable probability of success without undue risk.”xxxiv This suggests that safety is an insufficient condition for launching human studies; study interventions must also have shown evidence of promise in laboratory studies. However, there is no consensus on either the quantity or quality of preclinical evidence necessary to justify a human study.
We believe that the risks of invasive PD FIH studies create exacting demands on the quantity and quality of preclinical evidence. This position is rooted in the requirement that risks and benefits must be favorably balanced in clinical research. In contrast, where study risks are moderate or minor, evidentiary standards need not be as stringent.
Many would question whether interventions used in FIH PD trials can be plausibly viewed as presenting a prospect of direct medical benefit. Granting for the moment that they do, it is nevertheless axiomatic within medicine that potentially harmful and difficult to reverse interventions should be undertaken only where there is a reasonably high degree of confidence that they offer therapeutic benefit. This favors stringent evidentiary standards for preclinical studies.
Risks in clinical trials are also ethically justified by the prospect of producing generalizable knowledge. With greater risk comes the obligation that studies present proportionately greater probability of producing knowledge benefits. We contend that trials founded on a sound evidence base will tend to have stronger claims to scientific value than those supported by weak evidence.
This claim rests on the following logic: the value of an experiment is partly a function of the confidence in the veracity of a hypothesis that an experiment engenders in the expert community. Experiments that can exclude competing explanations for results will produce greater confidence in the veracity of a study hypothesis than those that exclude fewer alternatives. For example, clinicians interpreting randomized controlled trial results tend to form stronger beliefs about a drug’s efficacy if treatment allocation during the trial is concealed, because this excludes a competing explanation—that observer bias explains a “positive” trial result.xxxv
Novel interventions rarely translate smoothly into clinical applications, especially for CNS disordersxxxvixxxviixxxviii. This is documented by the negative results of multiple randomized trials testing disease-modifying strategies for PDxxxixxl, strokexlixlii and other neurodegenerative diseasesxliiixliv. A central question guiding the clinical translation of an invasive PD trial is whether negative outcomes will be informative. A critical ingredient in appraising the value of a null result is whether obvious competing explanations can be excluded. For example, if observer bias may have produced exaggerated treatment effects in preclinical studies, a null human result will be less informative.
In conclusion, preclinical researchers should reduce uncertainty where it is feasible to do so. Where preclinical studies can either limit uncertainty surrounding the properties of an intervention or weaken the plausibility of competing explanations for observations, FIH studies can make a stronger claim to producing scientifically meaningful outcomes. Preclinical data thus become an essential feature in deciding the risk-benefit balance of FIH trials.
What practices should investigators use in preclinical studies to enhance the scientific and/or therapeutic promise of FIH studies? When reviewing proposals, what elements should IRBs, funding agencies, and policy-makers expect? The remainder of our analysis is directed towards describing methodologies and practices in preclinical research that we believe would greatly enhance the ethical justification for initiating invasive FIH studies (Table 2).
Clinical research has evolved a series of methodological practices aimed at maximizing internal validity. As extensively documented in stroke research, these practices have permeated preclinical research to only a limited degree.xlvxlvi The first is the a priori power calculation, which is almost never reported in PD preclinical studies. Though expense and burden severely constrain sample size for non-human primate studies, the absence of power calculations seems harder to defend where studies involve lesioned rats. Stating an a priori hypothesis, and powering a study accordingly, helps discourage researchers from expanding their sample sizes or manipulating definitions of efficacy to attain “significant” results.
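To illustrate what an a priori power calculation involves, the sketch below estimates the number of animals per group needed for a two-group comparison of means. It uses the standard normal approximation, which slightly understates the sample size a t-test would require in small samples; the effect sizes, significance level, and power shown are hypothetical inputs rather than values drawn from any PD study.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Animals per group for a two-sided, two-group comparison of means.

    effect_size is the standardized difference (difference in means
    divided by the common standard deviation). Normal approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


for d in (0.5, 0.8, 1.0):
    print(d, sample_size_per_group(d))  # 63, 25, 16 per group
```

Fixing these inputs before the experiment begins is what makes the calculation a check on post hoc expansion of sample sizes.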
PD preclinical studies also rarely report the use of random treatment allocation. Of 22 published preclinical studies used to support human GDNF and/or gene transfer studies against Parkinson’s, only four reported randomization. This is lower than reported in other preclinical literatures.xlviixlvi In a systematic review of 290 animal studies, non-randomized studies had 3.4 times higher odds of showing a positive treatment effect than studies that used randomization.xlviii We acknowledge that simple randomized treatment allocation may not always be the best strategy in PD: it will often be more useful to balance groups on the basis of pre-treatment performance on key outcome measures, and then assign treatment randomly within these groups.
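One way to combine balancing with random assignment is blocked (stratified) randomization on a baseline score. The sketch below is a minimal illustration, not a prescribed procedure: it ranks animals by a hypothetical pre-treatment score, groups neighbors into blocks the size of the number of arms, and shuffles the arm labels within each block. The animal identifiers, scores, and function name are invented for the example.

```python
import random


def stratified_allocation(baseline_scores: dict,
                          arms=("treatment", "control"),
                          seed=None) -> dict:
    """Assign animals to arms at random within blocks of similar baseline score."""
    rng = random.Random(seed)
    # Rank animals so each block contains animals with similar baseline performance
    ranked = sorted(baseline_scores, key=baseline_scores.get)
    allocation = {}
    for start in range(0, len(ranked), len(arms)):
        block = ranked[start:start + len(arms)]
        labels = list(arms)
        rng.shuffle(labels)  # random order of arms within this block
        for animal, arm in zip(block, labels):
            allocation[animal] = arm
    return allocation


# Hypothetical baseline scores on a key outcome measure
scores = {"rat1": 12.0, "rat2": 30.5, "rat3": 14.2, "rat4": 28.9,
          "rat5": 21.0, "rat6": 19.4, "rat7": 35.1, "rat8": 9.8}
print(stratified_allocation(scores, seed=1))
```

Each block contributes one animal to every arm, so the arms end up matched on baseline performance while the assignment within a block remains outside the investigator's control.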
Blinded treatment allocation is also necessary to avoid subtle differences in the handling of animals. In one meta-epidemiological study involving preclinical studies of stroke interventions, studies that did not mask investigators to allocation produced significantly larger treatment effects than those that did.xlix However, another procedure—blinded or automated outcome assessment—seems almost universally accepted in PD preclinical research (only three of 22 preclinical studies did not report using either).
Another crucial variable is the treatment of missing data. Given the small sample size in many preclinical studies, how missing data are managed can significantly alter the outcome of statistical tests.
External validity issues are related to the models used in PD preclinical studies. Although spontaneous rodent models of PD are now available,lli researchers have typically relied on various injury-induced models, which mimic nigrostriatal dopamine deficiency but do not recapitulate the slow, progressive degenerative or pathophysiological nature of PD. Thus, a typical PD neuroprotection study administers a putative neuroprotective agent before or at the same time as inducing an acute PD-like lesion. In contrast, human trials administer interventions in the context of a disease that is progressed and chronic.lii The difference in disease states severely constrains the external validity of preclinical studies. A number of different animal models and clinical outcome measures are available for PD preclinical researchers,liii and transgenic models should be given serious consideration when designing preclinical studies. At a minimum, investigators should justify their selection of models and outcome measures.
Preclinical studies should also correspond as much as possible with the methods used in clinical studies. For example, clinical studies should be performed without departing from delivery techniques, agent composition, or inoculation sites validated in preclinical studies.
Optimism bias refers to an often unconscious tendency for researchers to present or interpret their data in a favorable light.liv Optimism bias might pose particular challenges to translational research, because at the point where preclinical studies are underway, investigators have devoted many years to a research program. Though personal identification with a therapeutic strategy can fuel perseverance, it can also interfere with dispassionate appraisal of study findings.lv
Publication bias (that is, a tendency to withhold reporting of unfavorable findings) presents yet another challenge to the credibility of preclinical study claims. By showing strong relationships between small sample sizes and large treatment effects, several meta-analyses of neurological preclinical studies show high rates of publication bias.xlvilvilvii
Preclinical research should therefore build mechanisms to check optimism and publication biases. Trial protocols should include critical reviews that place animal studies within a broader clinical context. Similar to a systematic review but broader in scope, a critical review should comprehensively summarize all existing literature that may be relevant to the results being reported. In contrast to a systematic review, the search strategy should be open-ended rather than rigorously pre-specified. This is because preclinical animal studies that replicate a given study exactly are relatively rare. Rather, researchers should search for all possible studies that may be relevant, whether they are based on the same treatment in different animal models, related treatments in the same animal model, or even related treatments in different animal models. Given a propensity for optimism bias, critical review should aim at finding dissenting evidence (as opposed to supporting evidence). Preclinical reports and trial proposals should discuss competing explanations for observed treatment effects. Reports should disclose limitations of a study by discussing the weaknesses of animal models and rating scales.
One last counter to optimism bias is transparency. Publication of preclinical toxicology studies is hardly the norm. Such non-publication might have understandable commercial motivations, but it frustrates the ability of assessors to form independent judgments about the safety and promise of an intervention. Researchers should make good faith efforts to publish all preclinical studies before proposing human trials; public and private funding agencies might also establish public databases of preclinical studies. One promising model that might be employed would be the National Gene Vector Laboratory toxicology database.lviii
Though we recognize that each of the proposed practices entails burdens, budgets, and in some instances, proprietary liabilities, we find the counterargument—that these sources of bias remain unchecked—untenable. Our recommendations only address what we consider to be the most tractable problems confronting individual PD FIH studies.
Nevertheless, just as clinical practice is best judged on the basis of community standards, so too should research practices be evaluated. To that end, we suggest two further measures designed to foster the development and articulation of quality standards and practices surrounding FIH studies.
Decisions concerning trial initiation would be greatly enhanced if the field were to articulate clear standards for producing and evaluating preclinical evidence. Questions include the types of cellular and molecular evidence that should be available before initiating studies, effect sizes and consistency of effects in animal models, length of follow-up, clinical and subclinical toxicity testing in animals, and appropriate functional rating scales or other outcome measures with clinical impact. Given the great difficulty encountered in translating PD interventions into clinical applications, guidelines might have limited utility for predicting if a candidate intervention is likely to prove clinically useful. Nevertheless, guidelines might be used to cull agents that do not meet minimum criteria. The Stroke Therapy Academic Industry Roundtable established guidelines in 1999 for the design of preclinical studies.lix Although these guidelines are not “fail-proof,”lxlxi they provide a framework for designing and evaluating preclinical studies. PD research sponsors might consider whether similar substantive standards might be established for preclinical researchers and referees.
As noted above, several models are available for PD preclinical testing. Some members of our workshop questioned whether PD preclinical models had any predictive value whatsoever. This view echoed concerns recently expressed by some Amyotrophic Lateral Sclerosis, Alzheimer’s disease, and Huntington’s disease researchers.lxii Accumulated evidence supports the value of preclinical research for neurological disorders when appropriate models and hypothesis-driven experimental designs are utilized.lxiii A thorough understanding of the models, their differences and their limitations is nevertheless needed to match the model to the question at hand. This knowledge is also needed to select appropriate outcome measures (functional, anatomical, cellular or molecular) that will provide accumulated evidence of the effects of a therapy.liiilxiii Recently, Alzforum hosted a discussion of shortcomings in mouse models used for neurodegenerative disease research; representatives from the Michael J. Fox Foundation participated.lxiv A similar forum might be established for non-human primates, and for articulating research needs with respect to alternative models. PD preclinical models will be among the many topics discussed in detail on PD Online Research, which is being built by the Michael J. Fox Foundation as a web-based, “large-scale self-organizing community of basic and clinical scientists, industry professionals, grantmakers, and financial investors involved in PD research.”lxv
In summary, the nature and degree of risk for invasive PD FIH studies puts particular pressure on the requirement that risks be favorably balanced against benefits for human studies. A critical factor in assuring favorable benefit profiles—whether this involves direct, therapeutic benefits or knowledge benefits—is the requirement that preclinical studies be designed and executed with scientific rigor. Toward that end, we have offered several recommendations that center on strengthening internal and external validity, while managing or reducing optimism and publication biases.
Though our workshop centered on PD, our analysis and recommendations could conceivably extend to any research area involving novel and complex agents delivered to the brain. Our recommendations are thus consistent with others aimed at improving the scientific utility and predictive value of experimental neurological interventions. In 1992, for example, PD researchers established the core assessment program for intracerebral transplantation, which aimed at enhancing the interpretability and value of open-label transplant studies.lxvi This framework was subsequently extended to Huntington’s Disease,lxvii and an analogous framework has been devised for surgical treatments of PD.lxviii In 1999, the Stroke Therapy Academic Industry Roundtable convened a working group to “optimally preclinically assess neuroprotective and restorative drugs for acute ischemic stroke.” The group issued a series of recommendations on preclinicallxix and clinicallxx study design.
We nevertheless acknowledge that several additional issues will need to be addressed, perhaps in future workshops, to assure that FIH studies have a favorable risk-benefit balance. These include standards for study design, best practices for reporting and disseminating findings, and how barriers to high quality preclinical research might be overcome. Our recommendations on preclinical study methodology would also be complemented by the development of substantive criteria for initiating studies and procedural guidelines for establishing acceptable risk. At the minimum, our recommendations should stimulate reflection and debate within the wider PD research community.
This work was funded by Canadian Institutes of Health Research, States of Mind: Emerging Issues in Neuroethics (NNF 80045). We wish to thank other participants at the workshop, including Andrew Fenton, Eric Racine, Lynette Reid, and Mary Sunderland. We also thank two anonymous referees for their comments on our manuscript, and Kat Duckworth for research assistance.
Financial Support: All financial and material support for this manuscript, regardless of date, was from the Canadian Institutes of Health Research (NNF 80045). Support unrelated to this research held by various authors comes from the following sources: CIHR, NINDS, Andrew W. Mellon Foundation, CDC, Kinetics Foundation, M. J. Fox Foundation, Parkinson's Disease Foundation, NIH -RARC base grant RR000167, Health Canada, US EPA, Heart and Stroke Foundation of Canada.
Research Project: (Initial Conception: JK; Design: JK, AJL, MEE, BR, TR; Organization: JK; Data Collection: JK, MEE; Execution and Content of Workshop: All)
Statistical Analysis: Not applicable
Manuscript: (Writing of the first draft: JK; Review, Critique, and Revision: All)