Correct outcome prediction after cardiac arrest in children may improve clinical decision making and family counseling. Various investigators have used EEG to predict outcome with varying success, but one limiting issue is the potential lack of reproducibility of EEG interpretation. Therefore, we aimed to evaluate interobserver agreement using standardized terminology in the interpretation of EEG tracings obtained from critically ill children following cardiac arrest.
Three pediatric neurophysiologists scored 74 EEG samples using standardized categories, terminology, and interpretation rules. Interobserver agreement was evaluated using kappa and intraclass correlation coefficients.
Agreement was substantial for the categories of continuity, burst suppression, sleep architecture, and overall rating. Agreement was moderate for seizure occurrence and inter-ictal epileptiform discharge type. Agreement was fair for inter-ictal epileptiform discharge presence, beta activity, predominant frequency, and fastest frequency. Agreement was slight for maximum voltage and focal slowing presence.
The variability of inter-rater agreement suggests that some EEG features are superior to others for use in a predictive algorithm. Only reproducible EEG features should be used, to ensure the most accurate and consistent predictions. Since even seizure identification had only moderate agreement, studies of non-convulsive seizures in critically ill patients must be conducted and interpreted cautiously.
Multiple clinical, laboratory, imaging, and electrographic features have been used to attempt outcome prediction in children following acute hypoxic ischemic brain injury after cardiac arrest, but none are perfect prognostic tools (Abend and Licht 2008). Utilization of EEG features is appealing since EEG can be performed non-invasively at bedside and provides unique information about functional brain status. Further, some critically ill children already undergo EEG monitoring to detect non-convulsive seizures, which are common in this clinical scenario (Abend, et al. 2009, Jette, et al. 2006). Several studies have attempted to predict outcome in comatose adults after cardiac arrest. These were summarized in the American Academy of Neurology practice parameter, which concluded that myoclonic status epilepticus on the first day predicted poor outcome. The EEG features of generalized suppression (less than 20 microvolts), burst-suppression, and generalized periodic complexes were strongly but not invariably associated with poor outcome (Wijdicks, et al. 2006). Although there has been less study in children, investigators have reported that EEG features of low amplitude (less than 10 microvolts), lack of reactivity, and inter-ictal epileptiform discharges were predictive of poor outcome, although with wide confidence intervals (Mandel, et al. 2002). Prediction algorithms combining multiple EEG characteristics may have greater predictive value in children (Nishisaki, et al. 2007).
The widespread utility of EEG patterns for prognostication is dependent on the degree to which EEG characteristics can be reproduced among different readers (Houfek and Ellingson 1959). One of the fundamental limitations of studies addressing the prognostic value of EEG is the unclear reproducibility of EEG scoring given the highly variable and often ambiguous EEG patterns observed in comatose patients (Husain 2006). Several classification systems have been developed in adults (Roest, et al. 2009, Synek 1988, Young, et al. 1997). However, in critically ill adults, inter-rater agreement using standardized terminology was moderate at best for identification of rhythmic and periodic discharges (Gerber, et al. 2008, Hirsch, et al. 2005) and seizures (Ronner, et al. 2009). Only one pediatric classification system has been developed (Nishisaki, et al. 2007) and it did not include any measure of inter-rater agreement. In both children and adults, higher reproducibility has been reported for broad interpretive categories than more specific narrow EEG features (Azuma, et al. 2003, Gerber, et al. 2008, Little and Raffel 1962, Piccinelli, et al. 2005, Stroink, et al. 2006, Synek 1988, Williams, et al. 1985, Young, et al. 1997). Consensus guidelines based on direct discussion among readers may improve reliability (Azuma, et al. 2003).
EEG interpretation is a subjective process and thus the problem of inter-rater reproducibility must be addressed before generalizable conclusions about the prognostic value of EEG can be drawn. This study evaluated the interobserver reproducibility of EEG interpretation in critically ill children with hypoxic ischemic brain injury following cardiac arrest using predefined categories, terminology, and interpretation rules.
This study was conducted as a component of a larger, single institution, prospective study of children undergoing therapeutic hypothermia after cardiac arrest. This study was approved by the Children’s Hospital of Philadelphia Institutional Review Board.
Children undergoing therapeutic hypothermia underwent continuous EEG monitoring to detect seizures as part of our clinical therapeutic hypothermia protocol. If parents/guardians consented to participate in this study, their EEG tracings were saved. Some EEG samples used in this study were drawn from this cohort which has been partially described previously (Abend, et al. 2009). Two 30-minute tracings were clipped from the full EEG record. The first tracing was the initial 30 minutes of EEG recorded during the initial 6 hours of hypothermia (34.5°C rectal). The second tracing was the initial 30 minutes after return to normothermia (>36.5°C rectal). If there was substantial artifact present at those times, then an alternate clip was selected within 1 hour of the specified time.
The EEG tracings were reviewed independently by three board-certified pediatric neurophysiologists (EM, RC, DD) using Twin-Telefactor software. The readers were informed of the purpose of the study and were given specific guidelines regarding terminology and interpretation. They were permitted to change montages and filters and to review time-locked video if desired, but did not use any quantitative analytic tools. Interpretation was completed using a standardized score sheet. Readers were informed there were no neonates in the study. However, readers did not have access to any other clinical data, including age or medications, since they were asked to describe the EEG tracing and not determine whether it was normal or abnormal given a specific age or pharmacologic exposure. General EEG characteristics for this study included descriptors of EEG background (e.g. continuous versus discontinuous versus burst suppression, amplitude, and predominant frequencies), presence or absence of electrographic seizures, presence or absence of inter-ictal epileptiform discharges, and overall characterization as severely abnormal or not severely abnormal. It was decided a priori that agreement on predominant or fastest cerebral EEG frequency would be defined as ± 1 Hz. EEG features scored and related rules are shown in Table 1.
Interobserver reproducibility analysis occurred in a stepwise manner (Table 2), since certain overarching patterns made scoring of other features illogical. For example, only tracings that were not scored as burst suppression or flat were evaluated for sleep architecture, predominant and fastest frequency, voltage, and beta activity.
Interobserver agreement was assessed using kappa coefficients for categorical variables and intraclass correlation coefficients (ICC) for continuous variables. The kappa coefficient is a measure of agreement for categorical data while controlling for agreement by chance. Kappa values range from 0 (inter-rater agreement does not differ from chance) to +1 (total agreement). However, if the observed agreement is less than chance agreement, kappa could be negative. The ICC is used to measure inter-rater reliability when data are continuous. It may be conceptualized as the ratio of variance that is associated with differences among measured subjects to total variance. ICC values range from 0 to 1, and ICC is high when any given subject tends to have the same score across the raters. Kappa coefficients were calculated using the statistical package SAS 9.1. ICCs were calculated using the statistical package SPSS 16.0 (Green). The level of agreement measured by kappa and ICC coefficients was classified as follows: 0–0.20 slight agreement; 0.21–0.40 fair agreement; 0.41–0.60 moderate agreement; 0.61–0.80 substantial agreement; 0.81–1.00 almost perfect agreement.
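For readers unfamiliar with these statistics, the following is a minimal illustrative sketch in Python, not the SAS or SPSS procedures used in this study. It assumes a multi-rater (Fleiss) form of kappa and a one-way random-effects, single-measure ICC, since the exact variants are not specified above. The function names, the example ratings, and the label returned for negative values are hypothetical and serve only to show how the chance correction, the variance ratio, and the agreement scale fit together.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Multi-rater kappa. `ratings` holds one list per tracing,
    each containing one category label per rater (assumed variant)."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]
    # Observed agreement: proportion of agreeing rater pairs per tracing.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall proportion of each category.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters)
           for cat in categories]
    p_e = sum(p ** 2 for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


def icc_oneway(scores):
    """One-way random-effects, single-measure ICC (assumed variant).
    `scores` holds one list per tracing, one numeric score per rater."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def agreement_level(value):
    """Map a kappa or ICC value onto the agreement scale given in the text."""
    if value < 0:
        return "less than chance"  # label is an assumption; not on the scale above
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical example: three raters scoring four tracings.
example_ratings = [
    ["present", "present", "present"],
    ["absent", "absent", "absent"],
    ["present", "present", "absent"],
    ["absent", "absent", "absent"],
]
example_frequencies = [[4, 4, 5], [8, 9, 8], [6, 6, 6], [10, 9, 10]]

kappa = fleiss_kappa(example_ratings)
icc = icc_oneway(example_frequencies)
print(round(kappa, 2), agreement_level(kappa))  # 0.66 substantial
print(round(icc, 2), agreement_level(icc))      # 0.96 almost perfect
```

In the hypothetical example, a single disagreement among four tracings lowers kappa from 1 to about 0.66, which illustrates how strongly the chance correction penalizes disagreement when the number of tracings is small.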
Following initial analysis, the three neurophysiologists re-reviewed and adjudicated 10 tracings in which there was initial disagreement in major categories including presence/absence of seizure, continuity, and burst suppression. Inter-rater agreement was re-assessed after group discussion, and key issues were identified that contributed to initial disagreement.
Thirty-seven children underwent therapeutic hypothermia after cardiac arrest and had EEG tracings retained in the research database. The mean age of the children in this study was 4.5 ± 6.0 years, including 20 males and 17 females. Prior to cardiac arrest, 22 were normal, 9 had chronic static encephalopathy or other neurodevelopmental problems, and 6 had medical problems but were neuro-developmentally normal. Twenty-two survived to ICU discharge. Two 30 minute EEG samples were saved for each patient, leading to 74 tracings scored in this study.
Interobserver agreement in EEG interpretation is shown by kappa and ICC scores in Table 2. Agreement was substantial for burst suppression (present or absent, kappa=0.73), continuity (continuous or discontinuous or flat, kappa=0.69), sleep architecture (present or absent, kappa=0.8), and overall rating (severely abnormal or not, kappa=0.65). Seizure occurrence (present or absent, kappa=0.46) and inter-ictal epileptiform discharge type (kappa=0.4) had moderate interobserver agreement. Inter-ictal epileptiform discharges (present or absent, kappa=0.36) and beta activity (present or absent, kappa=0.38) had fair interobserver agreement. Focal slowing had only slight interobserver agreement (kappa=0.1). Among the continuous variables, both fastest (ICC 0.40) and predominant frequency (ICC 0.39) had fair agreement.
With group re-review and adjudication of 10 tracings, readers agreed on continuity and burst suppression scoring in all tracings (kappa = 1). However, disagreement persisted for two tracings without clinical seizure activity regarding the presence or absence of seizures even after review. Two readers considered the tracings to contain frequent inter-ictal periodic epileptiform discharges, but one opined that there was sufficient evolution to consider those discharges as seizures (kappa = 0.66). Furthermore, disagreement on the presence or absence of seizures automatically degrades agreement on inter-ictal epileptiform discharges.
Several major obstacles to agreement were identified during discussion. First, the availability of video can be important in diagnosing seizures versus inter-ictal periodic epileptiform discharges. When video was reviewed more extensively, some discharges that were initially considered inter-ictal by some were found to have associated movements and were subsequently scored as seizures due to myoclonic status epilepticus. Reviewing the video associated with more epileptiform bursts identified more myoclonic seizures, since in some children only some bursts were associated with a clinical change. Second, it was realized how difficult it can be to decide in an isolated 30 minute epoch of EEG whether the periodic discharges represented inter-ictal periodic epileptiform discharges or an ictal pattern of prolonged status epilepticus. Third, background interpretation was difficult or impossible when very frequent regular periodic epileptiform discharges were present.
We evaluated the interobserver agreement in the interpretation of EEG samples in critically ill children who were comatose or obtunded after cardiac arrest. The greatest agreement was found for continuity state (continuous, discontinuous, or flat), burst suppression (present or absent), overall rating (severely abnormal or not), and sleep architecture (present or absent). Agreement was lower but still fair for seizure detection and more subtle EEG features such as fastest and predominant frequency, voltage, beta activity, and inter-ictal epileptiform discharge presence and type.
The issue of agreement in EEG interpretation has a long history. Investigators have examined EEG interpretation in critically ill patients (Gerber, et al. 2008, Ronner, et al. 2009), patients with new onset seizures (Stroink, et al. 2006), and children with idiopathic epilepsy (Piccinelli, et al. 2005). All described large variability in EEG interpretation. For example, a study of interobserver agreement in tracings from critically ill adults reported that agreement was substantial for the presence or absence and localization of rhythmic discharges but lower for more subtle features such as rhythmic discharge duration, persistence, and onset type (Gerber, et al. 2008). In another study of interobserver agreement in seizure detection in critically ill adults, moderate agreement was demonstrated, but agreement was lower for more subtle seizure descriptors including their frequency, onset, and offset (Ronner, et al. 2009). On the other hand, EEG scoring systems of broad categories, each composed of multiple interpretive features, have been studied in comatose adults and found to have nearly perfect or substantial agreement (Young, et al. 1997). Our data fit well with these past studies. Following group discussion and review, improved agreement could be obtained for most tracings. This confirms a prior study that demonstrated agreement improved from the fair-substantial level to the almost perfect range after group discussion and implementation of rules (Azuma, et al. 2003).
Following consensus discussion, several points emerged that might lead to better agreement for some of the major EEG features studied. Careful use of time-locked video was important in distinguishing whether periodic epileptiform discharges were inter-ictal or ictal. Subtle myoclonic movements time-locked to the discharges establish an ictal diagnosis. In addition to time-locked video, adding other physiologic information such as EMG or respiratory pattern could further enhance the ability to differentiate between ictal and inter-ictal discharges, but this information is currently not used in our ICU. Furthermore, within the limitations of an isolated 30 minute EEG snapshot, it was often unclear whether periodic epileptiform discharges represented an inter-ictal pattern or the middle of non-convulsive status epilepticus. Longer tracings and anticonvulsant response information may allow more accurate categorization of these tracings. However, controversy persists regarding the management of periodic epileptiform discharges (Chong and Hirsch 2005). Further study is needed to better categorize and understand the importance of periodic epileptiform discharges.
Providing accurate prognosis in comatose children after cardiac arrest is clinically important but difficult (Abend and Licht 2008). Historical and current clinical information can be unreliable. For example, arrest duration may be unknown and clinical examination findings may be confounded by pharmacologic intervention. Thus, utilizing non-invasive neurophysiologic testing at the bedside is appealing. EEG features reported to be predictive of poor outcome in children with hypoxic ischemic encephalopathy include low amplitude and electrocerebral silence (Tasker, et al. 1988), discontinuity and lack of reactivity and epileptiform discharges (Mandel, et al. 2002), and lack of reactivity and lack of normal sleep architecture (Cheliout-Heraut, et al. 1991). Another study that created an EEG grade utilizing continuity, frequency, and voltage found an association between worse EEG grades and poor outcome (Nishisaki, et al. 2007). While these studies suggest a prognostic role for EEG, our data suggest some of the features utilized for outcome prediction may not be interpreted in a standard manner. Future studies focused on prognostication may have improved accuracy and generalizability by focusing on EEG features with high inter-rater agreement. This will require clear and unambiguous definitions which can be reproducibly applied by many EEG readers and possibly be supplemented by quantitative analysis (Wennervirta, et al. 2009).
Recent studies have observed that non-convulsive seizures are common in critically ill children (Abend and Dlugos 2007, Abend, et al. 2009, Alehan, et al. 2001, Hosain, et al. 2005, Hyllienmark and Amark 2007, Jette, et al. 2006, Saengpattrachai, et al. 2006, Tay, et al. 2006) and non-convulsive seizures may impact outcome in critically ill adults (Carrera, et al. 2008, Oddo, et al. 2009, Young, et al. 1996). In addition to identifying EEG features with prognostic significance, attempts have also been made to identify EEG features that are predictive of seizures which would allow limited EEG monitoring resources to be directed to the highest risk patients. A study of critically ill children found that lateralized but not generalized or bilateral periodic epileptiform discharges predicted non-convulsive seizures (Jette, et al. 2006). In another study of children undergoing therapeutic hypothermia after cardiac arrest, burst suppression or excessive discontinuity, inter-ictal spike/sharp waves, or the absence of expected pharmacologic beta activity were predictive of seizures (Abend, et al. 2009). However, the current data suggest there may be wide variability in identifying some EEG features, and further definition of these terms or quantitative analysis may be needed. Further, the fact that inter-reader agreement for seizure detection was only moderate in our analysis suggests that these studies along with our clinical efforts to detect and treat non-convulsive seizures must be viewed in the context of variable detection since interpretation of the “gold standard” may not be perfectly standardized.
This study has several limitations. First, only 30 minute tracings were studied and it is possible that interpretation of some features of longer tracings may have worse agreement. On the other hand, in instances of periodic discharges, longer samples may have allowed interpreters to distinguish between the middle of status epilepticus and an inter-ictal periodic pattern. Second, this was a single institution study in which the participating neurophysiologists frequently review EEGs together, and agreement may have been higher than if readers were from different institutions. Third, we exclusively studied EEGs obtained from children with diffuse brain injury after cardiac arrest, so there was little opportunity to evaluate focal features. Fourth, readers were not provided data regarding subject age or medication exposure since they were asked to describe the EEG tracing and not to determine whether it was normal or not given a specific age or pharmacologic exposure. However, this may impact the generalizability of our findings since in clinical practice readers have at least this basic clinical information, which could impact clinical EEG interpretation.
In conclusion, we have demonstrated that in critically ill children certain inter-ictal EEG features (continuity, burst suppression) have high inter-reader agreement while other features have much lower agreement. The identification of EEG features that have high inter-rater reliability lays the ground-work to evaluate which EEG features are accurate and generalizable predictors of outcome.
Dr. Abend has received funding from NIH K12-NS049453 (Neurological Sciences Academic Development Award).