Stroke. Author manuscript; available in PMC 2012 March 1.
Published in final edited form as:
PMCID: PMC3065389

Inter-rater Reliability of the Pediatric NIH Stroke Scale (PedNIHSS) in a Multicenter Study

Rebecca N Ichord, MD,1 Rachel Bastian, BA,1 Lisa Abraham, MD,2 Rand Askalan, MD, PhD,3 Susan Benedict, MD,4 Timothy J Bernard, MD,5 Lauren Beslow, MD,1 Gabrielle deVeber, MD,3 Michael Dowling, MD, PhD, MSCS,6 Neil Friedman, MBChB,7 Heather Fullerton, MD, MAS,8 Lori Jordan, MD, PhD,9 Li Kan, MD,10 Adam Kirton, MD,11 Catherine Amlie-Lefond, MD,12 Daniel Licht, MD,1 Warren Lo, MD,13 Chalmer McClure, MD, PhD,14 Steve Pavlakis, MD,15 Sabrina E Smith, MD, PhD,1 Marilyn Tan, MD,16 Scott Kasner, MD, MSCE,17 and Abbas F Jawad, PhD18


Background and Purpose

Stroke is an important cause of death and disability among children. Clinical trials for childhood stroke require a valid and reliable acute clinical stroke scale. We evaluated inter-rater reliability (IRR) of a pediatric adaptation of the NIH Stroke Scale (PedNIHSS).


The PedNIHSS was developed by pediatric and adult stroke experts by modifying each item of the adult NIH Stroke Scale (NIHSS) for children, retaining all exam items and scoring ranges of the NIHSS. Children 2–18 years of age with acute arterial ischemic stroke were enrolled in a prospective cohort study at 15 North American sites from January 2007 through October 2009. Examiners were child neurologists certified in the adult NIHSS. Each subject was examined daily for 7 days or until discharge. A subset of patients at 3 sites was scored simultaneously and independently by 2 study neurologists.


IRR testing was performed in 25 of 113 patients at a median of 3 days (interquartile range [IQR] 2–4 days) after symptom onset. Patient demographics, total initial PedNIHSS scores, risk factors and infarct characteristics in the IRR subset were similar to the non-IRR subset. The two raters’ total scores were identical in 60% and within one point in 84%. IRR was excellent as measured by concordance correlation coefficient of 0.97 (95% CI 0.94–0.99); ICC of 0.99 (95% CI 0.97–0.99); precision measured by Pearson ρ of 0.97; and accuracy measured by the bias correction factor (Cb) of 1.0.


The PedNIHSS showed excellent IRR when administered by trained child neurologists in a multicenter prospective cohort.

Keywords: ischemic stroke, childhood, outcome, validation, stroke scale


Ischemic stroke affects 1.2–7.9 per 100,000 children age 1 month to 18 years annually in Europe and North America and ranks among the top 10 causes of death 1, 2. Long-term motor and cognitive deficits interfering with activities of daily life and academic attainment affect 40–60% of survivors 3. There are no proven strategies for acute management or prevention of childhood stroke other than blood transfusion for children with sickle cell anemia. Progress in defining factors which determine outcome and in designing clinical trials is hindered by the lack of a validated and reliable clinical stroke scale.

Previously published cohort studies of acute clinical presentation or long-term outcome in childhood stroke have not used standardized, validated, and reliable measures of clinical stroke severity at stroke onset. The National Institutes of Health Stroke Scale (NIHSS) is a quantitative measure of stroke-related acute neurologic deficit which has proven intra- and inter-rater reliability (IRR) and predictive validity for outcome among adults 4–6. Neurological examination of children requires adjustment according to the maturation of the child’s neurological and cognitive function and ability to comprehend instructions. The objective of this study was to evaluate IRR of a pediatric modification of the NIHSS, the Pediatric NIH Stroke Scale (PedNIHSS), administered by child neurologists in children with acute arterial ischemic stroke (AIS).


Study Design

This was a multicenter, prospective consecutive cohort study designed to evaluate the PedNIHSS for IRR, relationship to acute infarct volume, and predictive validity for functional outcome at 3 and 12 months after stroke onset. Outcome prediction and relationship to infarct volume will be reported separately. Patients were enrolled from 15 sites in the United States and Canada from January 2007 through October 2009. This study was approved by the institutional review board or institutional ethics board at all sites.


Patients were identified as potential study subjects and enrolled by the site neurologist or his/her designee at each site. Informed consent was obtained from the parent or guardian. Children enrolled met the following criteria: age at stroke onset ≥ 2 years and < 19 years, presentation within 96 hours after symptom onset with acute neurologic deficit of any duration consistent with focal brain ischemia in an arterial distribution, and MRI or CT performed within 96 hours of symptom onset demonstrating acute infarction in an arterial territory corresponding to the clinical deficit. Exclusion criteria included: acute traumatic brain injury; primary intracerebral or intraventricular hemorrhage; meningitis or encephalitis; status epilepticus (continuous clinical or electrographic seizure activity for ≥ 30 minutes); severe metabolic, toxic, or global hypoxic-ischemic encephalopathy; pre-existing severe neurologic deficit due to malignancy, congenital brain malformation, neurodegenerative disease, metabolic encephalopathy, severe residual deficits from perinatal or postnatal acquired encephalopathy; or stroke after craniotomy for any neurosurgical procedure.

Development of PedNIHSS

The PedNIHSS was developed by a panel of pediatric and adult stroke experts by a consensus review process in which each item of the NIHSS was reviewed and modified for age-dependent variations in comprehension and participation in the exam item and age-appropriateness of testing materials (language items, picture, commands). All items were adapted to an age-appropriate format, while the scoring strategy and scoring ranges for all items administered in the adult NIHSS were retained in the PedNIHSS. See Online Supplement for details of modifications for each item, and item-by-item guide for administration in children age 2–18 years used in this study. Pilot testing of the pediatric modifications was conducted by four study neurologists in 15 patients at two study sites (CHOP and Toronto Hospital for Sick Children) and supported the validity and reliability of the PedNIHSS in a larger multicenter study.

Data collection

We collected data on demographics, past medical history, clinical presentation, treatment, stroke risk factors, stroke-related diagnostic studies, and hospital course. All data were entered in a central database at the clinical and data coordinating center (CDCC) at Children’s Hospital of Philadelphia (CHOP). Neuroimaging studies including head CT, brain MRI, brain and cervical MRA, CT angiograms, and catheter angiograms were de-identified and sent to a central imaging repository at the CDCC. Admission case report forms were reviewed by the CDCC staff (principal investigator [PI] and study coordinator) for completeness and adherence to inclusion/exclusion criteria. Admission neuroimaging was reviewed by the study PI (RI) to confirm the diagnosis of AIS. All cases in which study inclusion was questioned were adjudicated by a rotating panel of three site neurologists and a study neuroradiologist. Acute diagnostic and treatment decisions were made according to standard clinical care protocols at each site.

PedNIHSS Scoring and Reliability Testing

Study examiners included the site neurologists, who are board certified child neurologists, and child neurology trainees supervised by the site neurologist, all experienced in the care of children with acute ischemic stroke. Each examiner was certified in use of the NIHSS using the standard online training program of the American Stroke Association. Each site neurologist was given additional training by the study PI and provided with a detailed set of instructions regarding modifications of the NIHSS for the PedNIHSS. The site neurologist performed the PedNIHSS once daily from the day of admission through day 7 after admission or discharge, whichever occurred first. Evaluation of IRR was based on simultaneous scoring by two examiners for a subset of patients at each of three sites (CHOP, Children’s Hospital of Pittsburgh, and Toronto Hospital for Sick Children). The hospital day chosen for the IRR exam was determined by the site neurologist as the earliest time after hospital admission that two examiners were available to perform simultaneous examination. The primary examiner was the site neurologist, and the secondary examiner was designated by the site neurologist from among child neurologists or child neurology trainees at that site who regularly participated in the care of stroke patients. Each secondary examiner was trained using the online NIHSS training and certification and underwent orientation and instruction in the use of the PedNIHSS comparable to that of the primary site neurologist. The primary and secondary examiners each independently and simultaneously assigned scores to each item of the PedNIHSS at the time the primary examiner conducted any of that patient’s study examinations during the first week.

Statistical analysis

The sample size needed for the assessment of IRR was based on obtaining an intraclass correlation coefficient (ICC) of at least 0.80 for the PedNIHSS total scores. A sample size of 25 subjects, each rated independently by two neurologists using the PedNIHSS, was estimated to have 85% power to reject the null hypothesis that ICC = 0.50 using an F-test with a two-sided α of 0.05. PASS was used for sample size estimation (NCSS, Kaysville, Utah). Statistical analyses were performed using SAS, SPSS, and MedCalc statistical packages (SAS Institute, Cary, NC; SPSS Science, Chicago, Ill; MedCalc Software, Mariakerke, Belgium). Descriptive analyses were performed using means with standard deviations or medians with interquartile ranges for continuous variables and frequency distributions and proportions for categorical variables. All continuous measures were evaluated for normality, and those variables differing markedly from normality were considered candidates for Box-Cox transformations. For the PedNIHSS scoring data, the mean, median, and frequency distribution for responses were computed for each individual item. A summed total score was computed, and its distribution examined for this sample. A two-tailed p-value < 0.05 was considered statistically significant. For the analysis of PedNIHSS reliability, internal consistency (Cronbach’s alpha) of the total scale was computed. IRR was assessed via the ICC calculated by analysis of variance (ANOVA) of the total summed scores. We analyzed agreement on each item of the PedNIHSS for the two raters using weighted kappa (k) statistics. A k ≤ 0.40 defined poor reliability, 0.40 < k < 0.75 defined moderate reliability, and k ≥ 0.75 defined excellent reliability 7. The measures of precision ρ and accuracy Cb for the total score pairs obtained from both raters were estimated 8, 9.
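The item-level agreement statistic described above can be illustrated with a minimal sketch of Cohen's weighted kappa and the reliability bands quoted from Fleiss. The article does not state which weighting scheme was used, so the linear weights here are an assumption, and the function names `weighted_kappa` and `classify_reliability` and the example scores are hypothetical rather than taken from the study.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same subjects.

    categories: ordered list of every possible score for the item.
    weights: "linear" or "quadratic" disagreement weights (assumed; the
    article does not say which scheme was applied).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of subjects")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two score categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Cross-tabulate the paired scores.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1

    # Marginal proportions for each rater.
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * counts[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected


def classify_reliability(kappa):
    """Bands as stated in the text: <=0.40 poor, 0.40-0.75 moderate, >=0.75 excellent."""
    if kappa <= 0.40:
        return "poor"
    if kappa < 0.75:
        return "moderate"
    return "excellent"


if __name__ == "__main__":
    # Hypothetical item scores (0-2) for four subjects; not study data.
    primary = [0, 1, 2, 2]
    secondary = [0, 1, 2, 1]
    kappa = weighted_kappa(primary, secondary, categories=[0, 1, 2])
    print(round(kappa, 3), classify_reliability(kappa))
```

Weighted rather than unweighted kappa suits ordinal items because a one-point disagreement is penalized less than a two-point disagreement.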
Scoring bias and the limits of agreement of the two raters’ total PedNIHSS score were estimated using Bland and Altman methods 10. In this method, score differences between the two raters (d) were plotted against the average reading of the two raters. The mean difference in the total scores represents the scoring bias between the two raters. The 95% limits of agreement between the two raters were estimated by the mean difference ± 1.96 × SD of the differences.
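The total-score agreement measures named above can be sketched in a few lines. This is a minimal illustration, assuming Lin's moment estimators (variances and covariance divided by n) for the concordance correlation coefficient and the conventional 1.96 × SD multiplier for the Bland and Altman limits. The function names and the example scores are hypothetical, and the study itself used SAS, SPSS, and MedCalc rather than this code.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def concordance(x, y):
    """Return (precision rho, accuracy Cb, concordance rho_c) for paired scores.

    Uses 1/n moments as in Lin's estimator; rho_c = rho * Cb.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired scores")
    n = len(x)
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a rater's scores are constant")
    rho = sxy / math.sqrt(sxx * syy)               # precision (Pearson)
    ccc = 2 * sxy / (sxx + syy + (mx - my) ** 2)   # concordance
    # rho == 0 implies sxy == 0 and hence ccc == 0, so Cb is indeterminate.
    cb = ccc / rho if rho != 0 else float("nan")   # accuracy (bias correction)
    return rho, cb, ccc


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for the differences x - y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(x, y)]
    bias = _mean(diffs)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical total scores for six subjects; not study data.
    primary = [2, 5, 9, 12, 0, 7]
    secondary = [2, 6, 9, 11, 0, 7]
    print(concordance(primary, secondary))
    print(bland_altman(primary, secondary))
```

Reporting ρ and Cb separately shows whether imperfect concordance comes from scatter around the best-fit line (low ρ) or from a systematic shift between raters (low Cb).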


Subject characteristics

A total of 113 patients were enrolled, of whom 25 (22%) underwent simultaneous exams by two investigators and were included in the inter-rater reliability (IRR) analysis. Simultaneous exams for IRR were performed at a median interval of 3 days after symptom onset (IQR 2–4 days), and were completed in 4 patients (16%) on hospital day 1, 6 (24%) on day 2, 6 (24%) on day 3, 3 (12%) on day 4, 2 (8%) on day 5, and 4 (16%) on day 6. Patients in the IRR subgroup were compared to non-IRR patients with respect to age, sex, time from symptom onset to presentation to health care providers, median total PedNIHSS score on initial exam, infarct location, and primary stroke risk factor, and were found to be similar (Table 1).

Table 1
Demographic and clinical characteristics of study subjects

Inter-rater reliability

There was excellent association between scores from both raters, as seen in Figures 1 and 2, with Spearman correlation coefficient of 0.97 (p < 0.001). The distribution of differences between the two raters’ total scores is shown in Table 2. Scores were identical in 60% of cases and within one point in 84%. Bias in scoring estimated using Bland and Altman methods was very small at −0.1 (Figure 3), and disagreement between raters was random. The concordance correlation coefficient (ρc) was 0.97 (95% CI 0.94–0.99). The precision measured by Pearson ρ was 0.97, and accuracy measured by the bias correction factor (Cb) was 1.0. Reliability measured by ICC was excellent (0.99; 95% CI 0.97–0.99). Internal consistency measured by Cronbach’s alpha was estimated to be 0.99. Analysis of IRR for each exam item of the PedNIHSS demonstrated that agreement was excellent or moderate for all items (Table 3).
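The reported concordance, precision, and accuracy values are consistent with Lin's decomposition, in which the concordance correlation coefficient is the product of the Pearson correlation and the bias correction factor. In the worked check below, the symbols μ and σ for each rater's mean and standard deviation of total scores are notation assumed for illustration and do not appear in the article.

```latex
\rho_c = \rho \, C_b,
\qquad
C_b = \frac{2}{v + 1/v + u^{2}},
\quad
v = \frac{\sigma_1}{\sigma_2},
\quad
u = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1 \sigma_2}}

\rho_c = 0.97 \times 1.0 = 0.97
```

A Cb of 1.0 corresponds to v ≈ 1 and u ≈ 0, that is, the two raters' score distributions have nearly equal spread and nearly equal means, which agrees with the small Bland and Altman bias of −0.1.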

Figure 1
Scatter plot demonstrating association between total PedNIHSS scores of primary rater (Y-axis) and secondary rater (X-axis). Open symbols depict data for two subjects; closed symbols for single subjects.
Figure 2
Box plot demonstrating total PedNIHSS scores of each rater: Mean (symbol), lower line of box = 25%ile, upper line of box = 75%ile, line in box = median, upper and lower whiskers = largest and smallest non-outlying values, respectively.
Figure 3
Bland and Altman plot of scoring bias. Open symbols depict data for two subjects; closed symbols for single subjects.
Table 2
Distribution of differences in total PedNIHSS scores between the two examiners
Table 3
Item Reliability of PedNIHSS


We found excellent IRR of the PedNIHSS administered by child neurologists in a prospective, multicenter study of children age 2–18 years with acute arterial ischemic stroke. Moreover, our finding that IRR was good to excellent for all items of the scale suggests that all items contribute to the overall excellent IRR of total PedNIHSS scores. Subjects examined for IRR were representative of the larger cohort with respect to demographic and clinical factors, as well as initial PedNIHSS score. The generalizability of our findings is supported by the fact that subjects in our study were representative of the age and sex distribution, time to initial presentation and primary stroke risk factors reported in other pediatric stroke cohort studies 11–15.

The PedNIHSS closely resembles the adult NIHSS and is the first clinical stroke scale developed and evaluated for reliability in children. As with the adult NIHSS, we found the PedNIHSS is readily performed in the acute hospital setting in children of a broad range of ages and stroke severity. The excellent IRR demonstrated in our study compares favorably to that seen in adults examined with the NIHSS. Brott et al 4 demonstrated good IRR in a cohort of 24 adult patients examined by neurologists and neurology house officers with k values of 0.70–0.77 for the total score. Inter-rater agreement for individual exam items in Brott’s study was similar to our findings, with the strongest inter-rater agreement seen in adults for level of consciousness as assessed by LOC questions, best gaze, and motor tasks, and least agreement for sensory and language exam items. Goldstein et al further evaluated IRR of a refined version of Brott’s stroke scale in 20 adult patients with acute ischemic stroke and found good agreement, with k values for language and motor items of 0.77–0.79, and lower k values (0.4–0.6) for LOC commands, neglect, and sensory items 16. In contrast to the adult studies, we found good IRR for facial weakness, dysarthria, and ataxia. The reasons for this difference are uncertain. It is possible that the more observational nature of the examination in children actually enhances reliability compared to the more prescribed and confrontational examination in adults.

There are several limitations in our study. The small sample size relative to age range of subjects precluded separate analysis of IRR in different age groups. We did not evaluate PedNIHSS in children under age 2 years due to the importance we ascribed to the inclusion of language assessment, as the typical child under age 2 years has limited language ability. Additionally, children younger than age 2 and neonates often fail to present with focal deficits, potentially requiring a scale with less emphasis on focal sensorimotor deficits. It is possible that interactions between raters occurred during examination in order to achieve clarity of findings, resulting in an overestimate of inter-rater reliability. This effect is likely small because investigator training addressed this possible confounder at study start-up. Examiners in our study were child neurologists and child neurology trainees. This limits the generalizability of our findings on IRR with respect to the examining health care provider. As neurological assessment of the acutely ill child is often challenging, evaluation of the reliability of the PedNIHSS performed by non-neurologists will be needed before this scale can be used by non-neurologists in clinical trials.


Evaluation of the PedNIHSS in a multicenter cohort study revealed excellent inter-rater reliability. This is an important first step in developing a valid pediatric acute stroke scale, and is of fundamental importance for planning and executing future clinical trials in childhood stroke. Analysis of the relationship of PedNIHSS scores with infarct volume and 3- and 12-month functional outcomes obtained in our study is in progress, and will further characterize the validity and utility of this instrument.

Supplementary Material


We thank Stefanie Mason and Charlene Jones for database development and data management, and Jorina Elbers MD, Lori Billinghurst MD, and Nomazulu Dlamini MBBS for performing IRR examinations.

Funding sources:

Rebecca Ichord: NIH-R01-NS050488, K23-NS062110

Timothy Bernard: NIH-K23-HL096895

Lauren Beslow: NIH-T32-NS007413, L. Morton Morley Funds of The Philadelphia Foundation

Gabrielle deVeber: NIH-R01-NS062820

Michael Dowling: NIH-KL2-RR024983, The First American Real Estate Services, Inc.

Heather Fullerton: NIH-R01-NS062820; K02-NS053883

Lori Jordan: NIH K23-NS062110

Adam Kirton: Heart/Stroke Foundation Canada, Heart/Stroke Foundation Alberta, Alberta Children’s Hospital Foundation and Research Institute, Hotchkiss Brain Institute

Daniel Licht: NIH-K23-NS52380, Dana Foundation, June and Steve Wolfson Family Fund for Neurological Research

Sabrina Smith: NIH-K12-NS049453


Disclosures: R.N. Ichord: Consultant/Advisory Board, Modest (Berlin Heart Clinical Event Committee).

L. Jordan: Consultant/Advisory Board, Modest (Berlin Heart Clinical Event Committee).

None for: R. Bastian, L. Abraham, R. Askalan, S. Benedict, T. Bernard, L.A. Beslow, G. deVeber, M. Dowling, N. Friedman, H. Fullerton, L. Kan, A. Kirton, C. Amlie-Lefond, D. Licht, W. Lo, C. McClure, S. Pavlakis, S.E. Smith, M. Tan, S.E. Kasner, A.F. Jawad


1. Fullerton HJ, Wu YW, Zhao S, Johnston SC. Risk of stroke in children: Ethnic and gender disparities. Neurology. 2003;61:189–194.
2. Giroud M, Lemesle M, Gouyon JB, Nivelon JL, Milan C, Dumas R. Cerebrovascular disease in children under 16 years of age in the city of Dijon, France: A study of incidence and clinical features from 1985 to 1993. J Clin Epidemiol. 1995;48:1343–1348.
3. deVeber GA, MacGregor D, Curtis R, Mayank S. Neurologic outcome in survivors of childhood arterial ischemic stroke and sinovenous thrombosis. J Child Neurol. 2000;15:316–324.
4. Brott T, Adams HP Jr, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg VS, Rorick M, Moomaw CJ, Walker M. Measurements of acute cerebral infarction: A clinical examination scale. Stroke. 1989;20:864–870.
5. Lyden P, Brott T, Tilley B, Welch KM, Mascha EJ, Levine S, Haley EC, Grotta J, Marler J. Improved reliability of the NIH Stroke Scale using video training. NINDS TPA Stroke Study Group. Stroke. 1994;25:2220–2226.
6. Demchuk AM, Tanne D, Hill MD, Kasner SE, Hanson S, Grond M, Levine SR. Predictors of good outcome after intravenous tPA for acute ischemic stroke. Neurology. 2001;57:474–480.
7. Fleiss JL. Statistical methods for rates and proportions. New York: Wiley & Sons; 1981.
8. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45:255–268.
9. Lin LI. A note on the concordance correlation coefficient. Biometrics. 2000;56:324–325.
10. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–160.
11. Amlie-Lefond C, Bernard TJ, Sebire G, Friedman NR, Heyer GL, Lerner NB, deVeber G, Fullerton HJ. Predictors of cerebral arteriopathy in children with arterial ischemic stroke: Results of the International Pediatric Stroke Study. Circulation. 2009;119:1417–1423.
12. Ganesan V, Prengler M, McShane MA, Wade AM, Kirkham FJ. Investigation of risk factors in children with arterial ischemic stroke. Ann Neurol. 2003;53:167–173.
13. Golomb MR, Fullerton HJ, Nowak-Gottl U, deVeber G. Male predominance in childhood ischemic stroke: Findings from the International Pediatric Stroke Study. Stroke. 2009;40:52–57.
14. Rafay MF, Pontigon AM, Chiang J, Adams M, Jarvis DA, Silver F, MacGregor D, deVeber GA. Delay to diagnosis in acute pediatric arterial ischemic stroke. Stroke. 2009;40:58–64.
15. Srinivasan J, Miller SP, Phan TG, Mackay MT. Delayed recognition of initial stroke in children: Need for increased awareness. Pediatrics. 2009.
16. Goldstein LB, Samsa GP. Reliability of the National Institutes of Health Stroke Scale: Extension to non-neurologists in the context of a clinical trial. Stroke. 1997;28:307–310.