Recognition of limited economic resources, as well as potential adverse effects of ‘over testing,’ has increased interest in ‘evidence-based’ assessment of new medical technology. This creates a particular problem for evaluation and treatment of epilepsy, increasingly dependent on advanced imaging and electrophysiology, since there is a marked paucity of epilepsy diagnostic and prognostic studies that meet rigorous standards for evidence classification. The lack of high quality data reflects fundamental weaknesses in many imaging studies but also limitations in the assumptions underlying evidence classification schemes as they relate to epilepsy, and to the practicalities of conducting adequately powered studies of rapidly evolving technologies. We review the limitations of current guidelines and propose elements for imaging studies that can contribute meaningfully to the epilepsy literature.
There is now a bewildering variety of ‘guidelines’ and practice parameters, and a burgeoning literature devoted to them. Current trends emphasize applying strict criteria to diagnostic and therapeutic studies in order to assess the strength of evidence presented. Typically a series of studies is reviewed in order to determine to what extent available evidence may address specific practice questions, the ‘quality’ of evidence is rated, and conclusions of varying ‘strength’ are drawn, often quite weak (for a recent epilepsy imaging example see Harden et al., 2007), often followed by recommendations for further research to fill gaps in knowledge.
The American Academy of Neurology (AAN) has a formal guidelines procedure that allows studies to be considered in broad distinct categories: therapeutics, diagnosis, prognosis, screening, and causation. Each of these is relevant to the role of technology in the evaluation and care of patients with epilepsy. Therapeutic guidelines are the clearest, and are based on a long history of medication and intervention trials (see Table 1 for the AAN classification of evidence). Other examples include the approach of the GRADE Working Group (Atkins et al., 2004), employed by the National Institute for Health and Clinical Excellence (NICE) and by the World Health Organization. Some but not all guidelines processes use prospective, double-blind, randomized, controlled trials (RCTs) as the highest standard of evidence for therapeutic or diagnostic efficacy (French, 2009). A recent AED monotherapy guideline published by the ILAE commission on therapeutic strategies adopted stricter criteria than the AAN, adding duration and power criteria for study classification (Glauser et al., 2006).
Others, such as the GRADE method, appear more open to evidence from ‘well-designed observational studies’ or cohort studies than the AAN process. For example, observational studies may be considered to have the same level of evidence as RCTs if there is high relative risk in two or more studies, and no plausible confounders (Atkins et al., 2004). The UK National Institute for Health and Clinical Excellence (NICE) explicitly considers social and economic criteria, and includes a wide range of ‘stakeholders,’ such as patient groups, in the process (National Institute for Health and Clinical Excellence 2009). It uses “expert consensus to make decisions if evidence is poor or lacking.” Note also that whilst therapeutic studies, especially of medication, lend themselves well to a prospective double-blind study design, this is not the case for diagnostic studies.
Despite clear advantages, including standardization, reduced bias, and reasonable objectivity, applying rigorous approaches to technology based studies aimed at diagnosis and prognosis may lead to difficulties: the classification criteria have important limitations (see Tables 1 and 2). Technological approaches, including imaging (CT, MRI, radioisotope based) and neurophysiologic studies (EEG and MEG now routinely are co-registered with MRI data, making the term ‘imaging’ a useful shorthand), have several applications, including diagnosing etiology, syndrome classification for clinical trials, prognosis in long term outcome studies (see population based, epidemiologic studies (Harvey et al., 1997, Spooner et al., 2006, Shinnar et al., 1994) and the proposed new ILAE classification (Berg 2010)) and, perhaps most prominently, focus localization, to plan epilepsy surgery and predict surgical outcome.
Epilepsy imaging investigators may add to the problem by failing to report complete data, and by failing to understand the criteria and processes for assessing the strength of clinical evidence. However, existing guidelines may in part be inappropriate for imaging and neurophysiologic studies. This review will examine controversies and challenges that confront investigators in study design and conduct. The review also provides suggestions for how best to organize and conduct a study that will provide optimal information and contribute meaningfully to the literature and to improved practice.
Guideline reviews of diagnostic literature -- structural imaging, functional imaging, and neurophysiological studies in epilepsy -- seem to raise particular problems leading to ‘low’ evidence ratings. Sample sizes are small, and randomization and blinding are uncommon. Most classification criteria are designed to assess investigative procedures on fairly narrow ‘diagnostic’ grounds, rather than the more fluid localization and prognostic questions important for intractable epilepsy (AAN; Center for Evidence-based Medicine (CEBM), http://www.cebm.net/index.aspx?o=1157, accessed July 27 2009). Given limitations in the available data, and disagreement about the process, it is challenging to develop guidelines, based on data of satisfactory quality, that would appear generally acceptable for a number of important questions to help guide clinical practice.
The classification of evidence for diagnostic and outcome studies of technology derives primarily from therapeutic trials (see Table 1) which outline clear study populations, control populations, intervention, measures, and outcomes. Technology does not readily lend itself to classification in this model format. Devices are usually evaluated in terms of their accuracy, reliability, therapeutic potential, and cost effectiveness. In epilepsy studies devices and techniques usually are directed at diagnosis and prognosis for seizure control. There are several aspects of epilepsy that make application of evidence classification schema problematic.
The course of epilepsy is irregular, with remissions and exacerbations. It may take as long as ten years after seizure onset for patients to develop persistent ‘intractable’ epilepsy (Spooner et al., 2006, Berg, 2009). Imaging modalities used early in prospective studies may be obsolete by the time the data are analyzed, and thus irrelevant to current practice.
For surgical planning, identifying -- or confirming -- the area responsible for seizures and therefore for surgical resection is considered to be paramount, based on the data showing that patients with focal findings on imaging or neurophysiology do better than those with normal studies (e.g. McIntosh et al., 2004). These data themselves, however, generally would receive low ratings in the AAN scheme (due to lack of blinding and randomization, among other issues), perhaps doing slightly better in the GRADE classification. To complicate matters, patients may have a restricted zone of epileptogenicity within a structural lesion, a wider zone beyond it, multiple lesions, or a more broadly defined ‘epileptogenic network’ that is not evident on imaging studies.
Imaging studies are predicated on the assumption that a visualized abnormality is linked to cause, pathology, seizure focus, and outcome. MRI evidence of hippocampal sclerosis is usually taken to have pathophysiological significance. However, this presumption is based on the observation that such MRI findings have been rare in the large number of normal volunteers scanned for neuropsychological studies. Some investigators suggest that hippocampal sclerosis is not always associated with intractable epilepsy (Kobayashi et al., 2002, Stephen et al., 2001). Moreover, hippocampal sclerosis in the setting of refractory epilepsy may have different significance than when found in new onset seizure populations (Spooner et al., 2006) or asymptomatic people. The lesion, which has been shown to progress over time (Theodore et al., 1999, Mathern et al., 2002), may be a consequence as well as a cause of seizures.
Not all MRI abnormalities, including hippocampal sclerosis, cavernomas, gliomas and malformations, cause seizures, and not all seizures originate from identified structural cerebral abnormalities. It is necessary to establish with clinical and neurophysiological data whether a given lesion is likely to be responsible for the seizures. Nevertheless, the consensus that identifying clear imaging abnormalities (hippocampal sclerosis, MCD, tumor; not gliosis or encephalomalacia) is associated with good surgical outcome would make it very difficult to perform a prospective study (see also the large scale retrospective ILAE 2004 pediatric surgery outcome data; Harvey et al., 2008).
Both diagnostic and prognostic classification schemes are based on some variety of a ‘final common criterion,’ often referred to, with unintended irony, as a “gold” standard. The criterion itself may be elusive or flawed; in some instances there is no standard. The standard for identification of a seizure focus may be based on video-scalp ictal EEG, intracranial ictal EEG, pathology, or post operative seizure freedom. For diagnostic purposes the standard usually means the seizure focus, initially defined electrophysiologically, with supporting evidence from imaging and sometimes pathology. This approach of course runs the risk of creating circular arguments, although new imaging approaches can be evaluated in comparison to “established” ones.
Linking imaging standards to pathology can be difficult as well: changes may be subtle, or missed due to limited tissue availability and quality for review or insufficient expertise. Moreover, the relation between underlying pathology and clinical seizures is inexact. Pathological classification schemes are subject to debate and reconsideration; changing pathological classification schemes, like changing MR technology, can make comparison of new and old data difficult (Palmini et al., 2004).
Many factors may affect clinical outcome. For surgical studies the ideal measure, seizure freedom, is problematic. Surgical outcome depends on the surgeon, the approach, and functional/anatomic constraints. A success rate of less than 100% may not mean that imaging was incorrect. Sometimes the abnormality or the focus cannot be entirely removed for technical reasons (e.g. vascular), pathological reasons (e.g. gliomas), or functional reasons (e.g. overlap with eloquent cortex). A reduction in seizures may suggest that the imaging data were correct, but the resection was incomplete. A further difficulty is the variability in time at which post-operative outcome is assessed. Post-operative seizure frequency fluctuates, as may patient compliance with postoperative AED treatment. A patient could be seizure-free for several years, then experience one or more seizures, followed by another extended remission or a longer relapse. These confounds will affect sensitivity and specificity measures, leading to underestimation or overestimation of the value of diagnostic and prognostic testing.
For language and memory lateralization the intracarotid amobarbital test (IAT) is often considered a “gold” standard. Yet there are clearly flaws: the IAT includes measurable risk, limited time for cognitive assessment of variables of interest, poor validation of memory, inaccurate results of IAT, as well as technical and vascular reasons for failure. Electro-cortical stimulation (ECS) is considered the “gold” standard for functional localization but is limited in time for assessment, and sampling can only be performed at sites of implanted electrodes. Post-operative cognitive assessment could be considered a standard, but no study will randomize patients to removal of areas where language or memory are thought to reside on the basis of an imaging procedure -- one can only examine unintended adverse surgical consequences.
While there are flaws in current guideline criteria, the current imaging literature commonly lacks the study designs necessary to provide meaningful contributions to clinical practice. A limitation that plagues epilepsy surgery investigations is the size of study populations, especially for new or limited availability technology, and in pediatrics. Initial reports on imaging and physiology studies are usually small (15-30 patients), with follow up studies rarely including more than 100, and fewer when ionizing radiation is involved. With these limited numbers it is often impossible to generalize findings because of the heterogeneity of patient populations and limited statistical power. Imaging technology also changes rapidly, with annual upgrades and major changes of equipment every five years commonplace. Even at the most active epilepsy sites it takes several years to obtain homogeneous patient populations, with a minimum of 12 months post-op follow up, that have sufficient power to support meaningful conclusions. Meanwhile new PET ligands or MR sequences may have been introduced.
Only a minority of epilepsy imaging studies have control populations. Exceptions include some adult PET studies, fMRI language studies, DTI, and structural MRI VBM based approaches to data analysis (primarily structural, DTI, magnetization transfer, FLAIR) (Cook et al., 1992, Rugg-Gunn et al., 2001, Rugg-Gunn et al., 2003, Salmenpera et al., 2007, Focke et al., 2008b, Focke et al., 2008a, Focke et al., 2009, Gaillard et al., 2002). Ionizing radiation used for PET and SPECT precludes obtaining normal data in children (Chugani et al., 1987, Gaillard et al., 2002). Even when controls are available, the set may not be large enough to ensure data accurately reflect population age related norms; the control population must be appropriately powered for experimental comparisons (e.g. MRI VBM methods require 30 or more subjects; Focke et al., 2009). Defining control populations for imaging studies in epilepsy with regard to outcome is also problematic. In therapeutic trials one can more readily randomize patient populations, and then move to open label or cross over design (see below). The usual approach in imaging is to choose a more or less homogeneous sample of subjects with an epilepsy syndrome of interest (usually TLE), perform an imaging study, and compare clinical characteristics and surgical outcome between patients with positive and negative imaging findings.
Therapeutic trials are facilitated by an infrastructure for multi-site trials and strict government criteria for approval (e.g. the EMEA and FDA). There is no mechanism for conducting comparable multi-site imaging studies that would be the equivalent of ‘pivotal’ medication trials. Other impediments to multi-site technology studies include expense, limited availability, and expertise. Perceived technical differences in machines and sequences are viewed as impediments to studies though these differences are less than patient heterogeneity.
Diagnostic data are, with rare exception, used in the decision making process (for the exception, see Theodore 1992, where 18FDG PET data were obtained but not provided for surgical planning and intervention). Sometimes imaging data identify an abnormality that leads to intracranial EEG and subsequent resection in a patient previously considered not to be a surgical candidate (Salmenpera et al., 2007, Focke et al., 2009). It is difficult, in these circumstances, to test the data independently and without compromising good clinical practice in the use of accepted techniques. For example, it would be difficult to do such studies with MRI, SPECT, language fMRI, or MEG; one should be able to do so with new MRI sequences (diffusion/perfusion).
Recent alternative study designs advocate presentation of novel image or neurophysiological data, after a case conference decision has been made using standard clinical and imaging material, in order to assess how reconsideration with the new information alters decision making (Medina et al., 2005, Knowlton et al., 2008b, Knowlton et al., 2008a). Here one does not know what would happen with those patients who do not undergo the procedure, or how this would affect ultimate outcome. It is not clear how this can be avoided without compromising good clinical practice. The practice introduces a selection bias: TLE patients with normal MRI may be less likely to have surgery, and the effect is greater for extra-temporal lobe epilepsy. Studies often do not evaluate how novel imaging changes practice.
There is a general failure to collect data prospectively. Ideally all the imaging analysis should be done before surgery, unless results of analysis may bias study conduct (e.g. pre-operative fMRI to predict post operative memory outcome). Imaging and physiologic data, inherently objective, lend themselves to independent review; but retrospective analysis may introduce several sources of bias. Many studies do not interpret data or assess outcomes blindly. Data need to be analyzed by a person blinded to patient identity and without a vested interest in the outcome. Most centers do not have special expertise in all imaging modalities, complicating multi-modal comparisons.
Another major limitation is the continuing and rapid evolution in technology. While there are no Class 1 studies on 1.5T MRI, imaging has moved to 3T, and 7T studies are commencing. New MRI sequences and changes in scanner hardware and software are introduced every few years, but their application and proper place in epilepsy evaluation are not well established. In short, the technology does not stand still long enough to enable adequately powered studies with adequate follow-up to be carried out.
There is also an issue of sensitivity and specificity. Subtle focal cortical malformations are considered to be the likely cause of many cases of non-lesional focal epilepsy. With higher resolution scanners and sequences it will be difficult to be certain that increasingly subtle findings are clinically relevant unless adequate numbers of healthy controls are studied. Last, there is the issue of how to pay for assessment of the efficacy of new technology; this is most problematic for new PET ligands, less an issue for new MRI sequences that can be added to a clinical series. Studies of new data do not test whether a given technology is equivalent and, more importantly, do not test when a test may be redundant (e.g. FDG-PET when MRI and vEEG are concordant, or IAT when fMRI language laterality is clear).
For an ethical clinical trial there must be equipoise between the two arms of the study in terms of patient benefit. This may not be possible with many imaging studies. It would not now be considered ethical to withhold fMRI language lateralization results from a surgical team to determine whether the study could predict post-operative dysphasia. This would however be feasible at this time for fMRI studies of memory, not yet generally accepted. Here, there is reasonable equipoise as to whether and how the data should influence surgical decision making.
While all the current systems of evidence classification have flaws, they all emphasize essential features of a study that could contribute meaningfully to evaluation and care of patients with epilepsy. This section outlines items that can, and should, be incorporated in imaging studies (Table 2). STROBE (http://www.strobe-statement.org) and CONSORT (http://www.bmj.com/content/340/bmj.c869.full) are efforts to help standardize and improve presentation of data from observational studies and randomized trials, elements of which may also help inform planning and reporting of imaging studies. It may be possible to conduct a “Class 1” epidemiological study on prognosis for developing intractable epilepsy based on standardized imaging if given enough time (Berg, 2009). However, it is not likely that broad population, randomized imaging trials will be conducted with control populations for epilepsy surgery. We propose below study designs and elements that address many of the current difficulties in the epilepsy imaging literature. Studies that contain these essential elements should be strongly considered as meeting best clinical research practice that informs clinical care.
Investigators must clearly define the clinical or pathophysiological question (e.g. comparison with EEG, pathology, surgical outcome, IAT, other imaging) and design a study to answer it. The patients and data should be prospectively obtained with clearly defined populations and study selection criteria, in agreed diagnostic categories. As patients in imaging/neurophysiology epilepsy studies are unlikely to be randomized, the imaging modality should be applied to all patients with the caregiver blinded, when equipoise is present, to study result. The image analysis methods and measures should be clearly defined. Preferably the image data should be assessed by objective, quantitative measures, or where not possible, by expert blinded raters, with a separate image set used to assess interpretative reliability. All assessments need to be blinded to patient identity from the rater, and when equipoise is present, from the caregiver.
Studies need to contain a sufficiently large patient and control population and be powered to accommodate heterogeneity and allow statistically-valid subgroup analyses of more homogeneous sub-populations. Where diagnostic considerations are paramount, pathological confirmation should be provided in surgical series; these data should be analyzed on image findings not pathology findings. Where outcome is paramount a prolonged (at least one year) and complete follow-up should be made; outcomes should be defined and ascertained by a person without a vested interest in the outcome.
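To make the point about powering concrete, the following is a minimal sketch of a sample size estimate for comparing seizure-freedom rates between patients with positive and negative imaging findings. It is an illustration only, not a method proposed by the Task Force: it assumes equal group sizes, a two-sided test, and the standard normal approximation for two proportions without continuity correction. The function name `patients_per_group` and the example proportions are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_group(p_pos: float, p_neg: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of patients needed in each group to detect a
    difference between two outcome proportions.

    p_pos, p_neg: assumed (hypothetical) seizure-freedom proportions in
    imaging-positive and imaging-negative patients.
    Uses the normal approximation, equal group sizes, two-sided alpha.
    """
    if not (0 < p_pos < 1 and 0 < p_neg < 1) or p_pos == p_neg:
        raise ValueError("proportions must lie in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    p_bar = (p_pos + p_neg) / 2                    # pooled proportion under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p_pos * (1 - p_pos) + p_neg * (1 - p_neg))) ** 2
    return ceil(numerator / (p_pos - p_neg) ** 2)
```

Calling `patients_per_group(0.8, 0.5)` and then `patients_per_group(0.7, 0.6)` shows how quickly the required numbers grow as the expected difference shrinks, and each planned subgroup analysis would need to be powered in the same way.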
Control populations are the hallmark of any clinical study yet remain problematic for epilepsy. Some studies more readily lend themselves to normal control populations, and these need to be used whenever possible. Other studies will only be conducted in patient populations, where the next best option is to compare the data between those who undergo the procedure in question and those who do not. In this setting comment cannot be made, particularly regarding outcomes, on those who did not have the procedure.
Ideally the experimental data will not be used in the decision process. Where ethical restraints prohibit such a design one can make a decision without the data, then reconsider the clinical decision with the data provided (change in practice model). In these circumstances meticulous documentation of how the information altered decision making would need to be provided. For example, one scenario to establish the utility of a new test is to apply the new test to cases in which a clinical answer is not clear (e.g. non-lesional) and then to determine if new information is provided that changes the plan (proceed vs. not proceed to surgery) and then whether it leads to a good outcome. Other possible models are to set up a sham committee with the two sets of data, or to set up a study where one center employs the new technology and the other does not in order to see if the new technology influences outcomes, presuming comparable patient populations and, where relevant, surgical approach and expertise. In this circumstance data would need to be examined to assure the patient populations are comparable.
The investigators should provide a data table showing results for each subject explicitly. The presentation of data allows independent assessment, facilitates comparison of data, and facilitates future meta-analysis. The data should be analyzed with the appropriate statistical test, which will usually be some variant of ‘validity:’ sensitivity and specificity, and positive predictive value. It is also important to acknowledge limitations including potential sources of referral bias. Methods should be clear, and when possible with standardization protocols, in order to facilitate study replication and pooling of data across specialty centers. A broad range and spectrum of patients necessary for class one diagnostic and outcome studies are unlikely to derive from any single center. If a different method is used, comparison to more common methods should be included with a determination of positive contribution and redundancy made.
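As an illustration of how a per-subject data table supports these measures, the sketch below computes sensitivity, specificity, and positive predictive value from a list of subjects. It is a minimal example under stated assumptions: the `Subject` fields and the function `diagnostic_metrics` are hypothetical names, the reference standard is whatever the study has defined (e.g. pathology or post-operative seizure freedom), and negative predictive value is added here for completeness although it is not named in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Subject:
    test_positive: bool       # imaging/neurophysiology finding present
    reference_positive: bool  # study-defined reference standard met


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None rather than dividing by zero when a cell is empty.
    return numerator / denominator if denominator else None


def diagnostic_metrics(subjects: Iterable[Subject]) -> Dict[str, Optional[float]]:
    """Build the 2x2 table from per-subject rows and derive the measures."""
    rows = list(subjects)
    tp = sum(s.test_positive and s.reference_positive for s in rows)
    fp = sum(s.test_positive and not s.reference_positive for s in rows)
    fn = sum(not s.test_positive and s.reference_positive for s in rows)
    tn = sum(not s.test_positive and not s.reference_positive for s in rows)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positives among reference positives
        "specificity": _ratio(tn, tn + fp),  # true negatives among reference negatives
        "ppv": _ratio(tp, tp + fp),          # reference positives among test positives
        "npv": _ratio(tn, tn + fn),          # reference negatives among test negatives
    }
```

Because the function works from individual rows rather than summary counts, the same published table can be re-analyzed by other groups or pooled across centers, which is the purpose of reporting results for each subject.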
Potential conflict of interest needs to be addressed in guideline development. In addition to relationships with industry, it is important to consider that investigators may have substantial clinical income, grant support, or academic publications and prestige related to particular techniques.
The US Institute of Medicine recently issued a report (http://www.iom.edu/Reports/2011/Finding-What-Works-in-Health-Care-Standards-for-Systematic-Reviews.aspx) with a set of standards already generally adhered to by most organizations, and in particular designed to evaluate comparative effectiveness data, little of which exists, as yet, for epilepsy imaging. The guideline process is in flux, with a desire to achieve at least some degree of international harmonization. One risk is that guideline processes with the most rigorous evidence classification schemes will be diluted in the interests of compromise. However, objective and rational assessments and procedures are necessary that meet the demands and constraints of what is practicable and achievable.
The care of patients with epilepsy will be improved when those who care for them have a clear sense of the quality and integrity of the data we draw upon to make decisions for our patients. Ideally a standardized approach, with standardized assessments, will be adopted. With standardized assessment and data collection, large repositories may be established. Such approaches will allow for converging evidence from small studies and facilitate meta-analyses based on good data in the absence of large scale studies. Large repositories allow discernment, within a heterogeneous population, based on multiple clinical variables (such as the ILAE pediatric epilepsy surgery outcome project and the NINDS common measures initiative). Care in acquisition of image data and clinical variables using methods proposed above will improve the quality of data and clinical care. Moreover, in a field evolving as rapidly as epilepsy imaging, guidelines must be reviewed frequently.
We thank Mr Alexander Zeitchick for assistance in manuscript preparation.
Task Force on Practice Parameter Imaging Guidelines for the International League Against Epilepsy, Commission for Diagnostics
Disclosure of Conflicts of Interest.
William D Gaillard MD has served on an educational course supported by Lundbeck Inc. His department derives clinical income from the evaluation and management of children with epilepsy, and receives research support from Lundbeck Inc., King Pharmaceuticals, PRA International, Eisai Inc., and Marinus Pharmaceuticals, Inc. He is supported by federal funding from the NIH [NINDS 1R01NS44280-01 (PI) and NICHD 1P30HD40677-01 (IDDRC, core director), NCRR 1K12RR17613-01 (mentor), NIMH 1 R01 MH065395-01A2 (Co-I), and CDC-APTR R-03 (Paid consultant)].
J Helen Cross MD PhD reports no conflict of interest.
John S Duncan FRCP has received Institutional support from Eisai, GSK, Janssen-Cilag, UCB Pharma, GE Healthcare, MedTronic, and research grants from Wellcome Trust, Medical Research Council, Action Medical Research, European Union.
Hermann Stefan MD has received honoraria and travel support for talks and consultations from Desistin Arzneimittel GmbH, Electa, Eisai GmbH, GlaxoSmithKline, Novartis Pharma, and UCB Pharma. In addition he was supported by federal funding from the German Ministry of Health, DFG, OTAN.
William H Theodore MD received honoraria from serving as an editor of Epilepsy Research, receives research support and salary from NINDS DIR, and holds stock options in GE.
We confirm that we have read the Journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.