Stroke is an important cause of death and disability among children. Clinical trials for childhood stroke require a valid and reliable acute clinical stroke scale. We evaluated inter-rater reliability (IRR) of a pediatric adaptation of the NIH Stroke Scale (PedNIHSS).
The PedNIHSS was developed by pediatric and adult stroke experts by modifying each item of the adult NIH Stroke Scale (NIHSS) for children, retaining all exam items and scoring ranges of the NIHSS. Children 2–18 years with acute arterial ischemic stroke were enrolled in a prospective cohort study at 15 North American sites from January 2007 through October 2009. Examiners were child neurologists certified in the adult NIHSS. Each subject was examined daily for 7 days or until discharge. A subset of patients at 3 sites was scored simultaneously and independently by 2 study neurologists.
IRR testing was performed in 25 of 113 patients at a median of 3 days (interquartile range [IQR] 2–4 days) after symptom onset. Patient demographics, total initial PedNIHSS scores, risk factors and infarct characteristics in the IRR subset were similar to the non-IRR subset. The two raters’ total scores were identical in 60% and within one point in 84%. IRR was excellent as measured by concordance correlation coefficient of 0.97 (95% CI 0.94–0.99); intraclass correlation coefficient (ICC) of 0.99 (95% CI 0.97–0.99); precision measured by Pearson ρ of 0.97; and accuracy measured by the bias correction factor (Cb) of 1.0.
There was excellent IRR of the PedNIHSS in a multicenter prospective cohort performed by trained child neurologists.
Ischemic stroke affects 1.2–7.9 per 100,000 children age 1 month to 18 years annually in Europe and North America and ranks among the top 10 causes of death 1, 2. Long-term motor and cognitive deficits interfering with activities of daily life and academic attainment affect 40–60% of survivors 3. There are no proven strategies for acute management or prevention of childhood stroke other than blood transfusion for children with sickle cell anemia. Progress in defining factors which determine outcome and in designing clinical trials is hindered by the lack of a validated and reliable clinical stroke scale.
Previously published cohort studies of acute clinical presentation or long-term outcome in childhood stroke have not used standardized, validated, and reliable measures of clinical stroke severity at stroke onset. The National Institutes of Health Stroke Scale (NIHSS) is a quantitative measure of stroke-related acute neurologic deficit which has proven intra- and inter-rater reliability (IRR) and predictive validity for outcome among adults 4–6. Neurological examination of children requires adjustment according to the maturation of the child’s neurological and cognitive function and ability to comprehend instructions. The objective of this study was to evaluate IRR of a pediatric modification of the NIHSS, the Pediatric NIH Stroke Scale (PedNIHSS), administered by child neurologists in children with acute arterial ischemic stroke (AIS).
This was a multicenter, prospective consecutive cohort study designed to evaluate the PedNIHSS for IRR, relationship to acute infarct volume, and predictive validity for functional outcome at 3 and 12 months after stroke onset. Outcome prediction and relationship to infarct volume will be reported separately. Patients were enrolled from 15 sites in the United States and Canada from January 2007 through October 2009. This study was approved by the institutional review board or institutional ethics board at all sites.
Patients were identified as potential study subjects and enrolled by the site neurologist or his/her designee at each site. Informed consent was obtained from the parent or guardian. Children enrolled met the following criteria: age at stroke onset ≥ 2 years and < 19 years, presentation within 96 hours after symptom onset with acute neurologic deficit of any duration consistent with focal brain ischemia in an arterial distribution, and MRI or CT performed within 96 hours of symptom onset demonstrating acute infarction in an arterial territory corresponding to the clinical deficit. Exclusion criteria included: acute traumatic brain injury; primary intracerebral or intraventricular hemorrhage; meningitis or encephalitis; status epilepticus (continuous clinical or electrographic seizure activity for ≥ 30 minutes); severe metabolic, toxic, or global hypoxic-ischemic encephalopathy; pre-existing severe neurologic deficit due to malignancy, congenital brain malformation, neurodegenerative disease, metabolic encephalopathy, severe residual deficits from perinatal or postnatal acquired encephalopathy; or stroke after craniotomy for any neurosurgical procedure.
The PedNIHSS was developed by a panel of pediatric and adult stroke experts by a consensus review process in which each item of the NIHSS was reviewed and modified for age-dependent variations in comprehension and participation in the exam item and age-appropriateness of testing materials (language items, picture, commands). All items were adapted to an age-appropriate format, while the scoring strategy and scoring ranges for all items administered in the adult NIHSS were retained in the PedNIHSS. See Online Supplement for details of modifications for each item, and item-by-item guide for administration in children age 2–18 years used in this study. Pilot testing of the pediatric modifications was conducted by four study neurologists in 15 patients at two study sites (CHOP and Toronto Hospital for Sick Children) and supported the validity and reliability of the PedNIHSS in a larger multicenter study.
We collected data on demographics, past medical history, clinical presentation, treatment, stroke risk factors, stroke-related diagnostic studies, and hospital course. All data were entered in a central database at the clinical and data coordinating center (CDCC) at Children’s Hospital of Philadelphia (CHOP). Neuroimaging studies including head CT, brain MRI, brain and cervical MRA, CT angiograms, and catheter angiograms were de-identified and sent to a central imaging repository at the CDCC. Admission case report forms were reviewed by the CDCC staff (principal investigator [PI] and study coordinator) for completeness and adherence to inclusion/exclusion criteria. Admission neuroimaging was reviewed by the study PI (RI) to confirm the diagnosis of AIS. All cases in which study inclusion was questioned were adjudicated by a rotating panel of three site neurologists and a study neuroradiologist. Acute diagnostic and treatment decisions were made according to standard clinical care protocols at each site.
Study examiners included the site neurologists, who are board certified child neurologists, and child neurology trainees supervised by the site neurologist, all experienced in the care of children with acute ischemic stroke. Each examiner was certified in use of the NIHSS using the standard online training program of the American Stroke Association (www.asatrainingcampus.org). Each site neurologist was given additional training by the study PI and provided with a detailed set of instructions regarding modifications of the NIHSS for the PedNIHSS. The site neurologist performed the PedNIHSS once daily from the day of admission through day 7 after admission or discharge, whichever occurred first. Evaluation of IRR was based on simultaneous scoring by two examiners for a subset of patients at each of three sites (CHOP, Children’s Hospital of Pittsburgh, and Toronto Hospital for Sick Children). The hospital day chosen for the IRR exam was determined by the site neurologist as the earliest time after hospital admission that two examiners were available to perform simultaneous examination. The primary examiner was the site neurologist, and the secondary examiner was designated by the site neurologist from among child neurologists or child neurology trainees at that site who regularly participated in the care of stroke patients. Each secondary examiner was trained using the online NIHSS training and certification and underwent comparable orientation and instruction in the use of the PedNIHSS as the primary site neurologist. The primary and secondary examiners each independently and simultaneously assigned scores to each item of the PedNIHSS at the time the primary examiner conducted any of that patient’s study examinations during the first week.
The sample size needed for the assessment of IRR was based on obtaining an intraclass correlation coefficient (ICC) of at least 0.80 for the PedNIHSS total scores. A sample size of 25 subjects, each rated independently by two neurologists using the PedNIHSS, was estimated to have 85% power to reject the null hypothesis that ICC = 0.50 using an F-test with a two-sided α of 0.05. PASS was used for sample size estimation (NCSS, Kaysville, Utah). Statistical analyses were performed using SAS, SPSS, and MedCalc statistical packages (SAS Institute, Cary, NC; SPSS Science, Chicago, Ill; MedCalc Software, Mariakerke, Belgium). Descriptive analyses were performed using means with standard deviations or medians with interquartile ranges for continuous variables and frequency distributions and proportions for categorical variables. All continuous measures were evaluated for normality, and those variables differing markedly from normality were considered candidates for Box-Cox transformations. For the PedNIHSS scoring data, the mean, median, and frequency distribution for responses were computed for each individual item. A summed total score was computed, and its distribution examined for this sample. A two-tailed p-value < 0.05 was considered statistically significant. For the analysis of PedNIHSS reliability, internal consistency (Cronbach’s alpha) of the total scale was computed. IRR was assessed via the ICC calculated by analysis of variance (ANOVA) between total summed scores. We analyzed agreement on each item of the PedNIHSS for the two raters using weighted kappa (k) statistics. A k ≤ 0.40 defined poor reliability, 0.40 < k < 0.75 defined moderate reliability, and k ≥ 0.75 defined excellent reliability 7. The measures of precision ρ and accuracy Cb for the total score pairs obtained from both raters were estimated 8, 9.
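The relationship among the concordance correlation coefficient, the precision ρ, and the accuracy Cb can be illustrated with a minimal sketch. It assumes Lin's standard formulation, in which the concordance coefficient is the product of Pearson ρ and Cb, and it reads the moderate kappa band as values strictly between 0.40 and 0.75. The function names and the example scores are hypothetical and are not study data; the study itself used SAS, SPSS, and MedCalc.

```python
from math import sqrt


def lin_ccc(x, y):
    """Return (rho, cb, ccc) for paired scores from two raters.

    Assumes Lin's formulation with 1/n (population) moments:
    rho is Pearson's correlation (precision), cb is the bias
    correction factor (accuracy), and ccc = rho * cb.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    rho = cov_xy / sqrt(var_x * var_y)
    # Cb falls below 1 when the raters differ in mean (location shift)
    # or in spread (scale shift).
    cb = 2 * sqrt(var_x * var_y) / (var_x + var_y + (mean_x - mean_y) ** 2)
    return rho, cb, rho * cb


def kappa_category(k):
    """Map a weighted kappa to the reliability bands described in the text."""
    if k <= 0.40:
        return "poor"
    if k < 0.75:
        return "moderate"
    return "excellent"


# Hypothetical total scores: rater B is always one point higher than rater A.
rater_a = [1, 2, 3, 4]
rater_b = [2, 3, 4, 5]
rho, cb, ccc = lin_ccc(rater_a, rater_b)
print(f"rho={rho:.2f} Cb={cb:.2f} CCC={ccc:.2f}")
```

In this toy case the two raters are perfectly correlated, yet the constant one-point offset lowers Cb and therefore the concordance coefficient. This is why precision and accuracy are reported separately.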
Scoring bias and the limits of agreement of the two raters’ total PedNIHSS scores were estimated using Bland and Altman methods 10. In this method, score differences between the two raters (d) were plotted against the average score of the two raters. The mean difference in the total scores represents the scoring bias between the two raters. The 95% limits of agreement between the two raters were estimated by the mean of d ± 1.96 SD of the differences.
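The Bland and Altman calculation can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: it uses the conventional 1.96 multiplier for 95% limits and the sample standard deviation of the differences, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(rater_1, rater_2, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired total scores.

    bias is the mean of the per-patient differences d = rater_1 - rater_2;
    the limits of agreement are bias -/+ z * SD(d), with z = 1.96 assumed
    here as the conventional multiplier for 95% limits.
    """
    if len(rater_1) != len(rater_2) or len(rater_1) < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    diffs = [a - b for a, b in zip(rater_1, rater_2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical total scores for five patients (not study data).
rater_1 = [4, 7, 10, 12, 3]
rater_2 = [4, 8, 10, 11, 3]
bias, lower, upper = bland_altman(rater_1, rater_2)
print(f"bias={bias:.2f}, 95% limits of agreement=({lower:.2f}, {upper:.2f})")
```

Plotting each difference against the pair's average score, with horizontal lines at the bias and the two limits, gives the usual Bland and Altman plot.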
A total of 113 patients were enrolled, of whom 25 (22%) underwent simultaneous exams by two investigators and were included in the inter-rater reliability (IRR) analysis. Simultaneous exams for IRR were performed at a median of 3 days after symptom onset (IQR 2–4 days), and were completed in 4 patients (16%) on hospital day 1, 6 (24%) on day 2, 6 (24%) on day 3, 3 (12%) on day 4, 2 (8%) on day 5, and 4 (16%) on day 6. Patients in the IRR subgroup were compared to non-IRR patients with respect to age, sex, time from symptom onset to presentation to health care providers, median total PedNIHSS score on initial exam, infarct location, and primary stroke risk factor, and were found to be similar (Table 1).
There was excellent association between scores from both raters, as seen in Figures 1 and 2, with a Spearman correlation coefficient of 0.97 (p < .001). The distribution of differences between the two raters’ total scores is shown in Table 2. Scores were identical in 60% of cases and within one point in 84%. Bias in scoring estimated using Bland and Altman methods was very small at −0.1 (Figure 3), and disagreement between raters was random. The concordance correlation coefficient (ρc) was 0.97 (95% CI 0.94–0.99). The precision measured by Pearson ρ was 0.97, and accuracy measured by the bias correction factor (Cb) was 1.0. Reliability measured by ICC was excellent (0.99; 95% CI 0.97–0.99). Internal consistency measured by Cronbach’s alpha was estimated to be 0.99. Analysis of IRR for each exam item of the PedNIHSS demonstrated that agreement was excellent or moderate for all items (Table 3).
We found excellent IRR of the PedNIHSS administered by child neurologists in a prospective, multicenter study of children age 2–18 years with acute arterial ischemic stroke. Moreover, our finding that IRR was good to excellent for all items of the scale suggests that all items contribute to the overall excellent IRR of total PedNIHSS scores. Subjects examined for IRR were representative of the larger cohort with respect to demographic and clinical factors, as well as initial PedNIHSS score. The generalizability of our findings is supported by the fact that subjects in our study were representative of the age and sex distribution, time to initial presentation and primary stroke risk factors reported in other pediatric stroke cohort studies 11–15.
The PedNIHSS closely resembles the adult NIHSS and is the first clinical stroke scale developed and evaluated for reliability in children. As with the adult NIHSS, we found the PedNIHSS is readily performed in the acute hospital setting in children of a broad range of ages and stroke severity. The excellent IRR demonstrated in our study compares favorably to that seen in adults examined with the NIHSS. Brott et al 4 demonstrated good IRR in a cohort of 24 adult patients examined by neurologists and neurology house officers with k values of 0.70–0.77 for the total score. Inter-rater agreement for individual exam items in Brott’s study was similar to our findings, with the strongest inter-rater agreement seen in adults for level of consciousness as assessed by LOC questions, best gaze, and motor tasks, and least agreement for sensory and language exam items. Goldstein et al further evaluated IRR of a refined version of Brott’s stroke scale in 20 adult patients with acute ischemic stroke and found good agreement, with k values for language and motor items of 0.77–0.79, and lower k values (0.4–0.6) for LOC commands, neglect, and sensory items 16. In contrast to the adult studies, we found good IRR for facial weakness, dysarthria, and ataxia. The reasons for this difference are uncertain. It is possible that the more observational nature of the examination in children actually enhances reliability compared to the more prescribed and confrontational examination in adults.
There are several limitations in our study. The small sample size relative to age range of subjects precluded separate analysis of IRR in different age groups. We did not evaluate PedNIHSS in children under age 2 years due to the importance we ascribed to the inclusion of language assessment, as the typical child under age 2 years has limited language ability. Additionally, children younger than age 2 and neonates often fail to present with focal deficits, potentially requiring a scale with less emphasis on focal sensorimotor deficits. It is possible that interactions between raters during examination may have occurred in order to achieve clarity of findings, resulting in an overestimate of inter-rater reliability. This effect is likely small because investigator training addressed this possible confounder at study start-up. Examiners in our study were child neurologists and child neurology trainees. This limits the generalizability of our findings on IRR with respect to the examining health care provider. As neurological assessment of the acutely ill child is often challenging, evaluation of the reliability of the PedNIHSS performed by non-neurologists will be needed before this scale can be used by non-neurologists in clinical trials.
Evaluation of the PedNIHSS in a multicenter cohort study revealed excellent inter-rater reliability. This is an important first step in developing a valid pediatric acute stroke scale, and is of fundamental importance for planning and executing future clinical trials in childhood stroke. Analysis of the relationship of PedNIHSS scores with infarct volume and 3- and 12-month functional outcomes obtained in our study is in progress, and will further characterize the validity and utility of this instrument.
We thank Stefanie Mason and Charlene Jones for database development and data management, and Jorina Elbers MD, Lori Billinghurst MD, and Nomazulu Dlamini MBBS for performing IRR examinations.
Rebecca Ichord: NIH-R01-NS050488, K23-NS062110
Timothy Bernard: NIH-K23-HL096895
Lauren Beslow: NIH-T32-NS007413, L. Morton Morley Funds of The Philadelphia Foundation
Gabrielle deVeber: NIH-R01-NS062820
Michael Dowling: NIH-KL2-RR024983, The First American Real Estate Services, Inc.
Heather Fullerton: NIH-R01-NS062820, K02-NS053883
Lori Jordan: NIH-K23-NS062110
Adam Kirton: Heart/Stroke Foundation Canada, Heart/Stroke Foundation Alberta, Alberta Children’s Hospital Foundation and Research Institute, Hotchkiss Brain Institute
Daniel Licht: NIH-K23-NS52380, Dana Foundation, June and Steve Wolfson Family Fund for Neurological Research
Sabrina Smith: NIH-K12-NS049453
Disclosures: R.N. Ichord: Consultant/Advisory Board, Modest (Berlin Heart Clinical Event Committee).
L. Jordan: Consultant/Advisory Board, Modest (Berlin Heart Clinical Event Committee).
None for: R. Bastian, L. Abraham, R. Askalan, S. Benedict, T. Bernard, L.A. Beslow, G. deVeber, M. Dowling, N. Friedman, H. Fullerton, L. Kan, A. Kirton, C. Amlie-Lefond, D. Licht, W. Lo, C. McClure, S. Pavlakis, S.E. Smith, M. Tan, S.E. Kasner, A.F. Jawad