The NICHD Neonatal Research Network maintains a study manual with standard definitions. The diagnosis of neurodevelopmental impairment (NDI) is used in both randomized trials and observational follow-up studies. NDI is defined as the presence of any of the following: moderate to severe CP, bilateral blindness, bilateral hearing impairment requiring amplification, a gross motor function classification score (GMFCS)6,7
greater than or equal to level 2 reflecting moderate to severe motor impairment or a Bayley Scales-III cognitive or motor score less than 70. The NRN Follow-up Study neurologic examination5
includes a standard assessment of reflexes, muscle tone and strength, functional motor skills, hearing, and vision. Assessment of vision and hearing is completed by (1) reviewing the history with the family, (2) reviewing records from subspecialty ophthalmology and audiology services, (3) determining whether the child is receiving services for the blind, from a deaf educator, or for speech/language or sign language, or has an amplification/FM system, (4) examining the eyes, including tracking, strabismus, glasses, and surgical procedures, and (5) determining whether the child can follow a simple verbal direction provided by the examiner. Hospital records, for the most part, contain this information, and parents are usually aware of a formal diagnosis of blindness or permanent hearing loss.
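The NDI definition above is an "any of" rule, which can be sketched in a few lines of Python. This is an illustrative sketch only, not the NRN data system: the class and field names are hypothetical, GMFCS level is assumed to be stored as an integer, and a missing Bayley score is assumed (for illustration) not to count toward NDI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUpFindings:
    """Hypothetical record of one child's 18-22 month follow-up findings."""
    moderate_severe_cp: bool
    bilateral_blindness: bool
    bilateral_hearing_impairment_amplified: bool
    gmfcs_level: int                       # assumed integer coding of GMFCS level
    bayley_cognitive: Optional[int] = None  # Bayley-III cognitive score, if tested
    bayley_motor: Optional[int] = None      # Bayley-III motor score, if tested


def below_70(score: Optional[int]) -> bool:
    # Assumption: a missing score does not meet the "less than 70" criterion.
    return score is not None and score < 70


def has_ndi(f: FollowUpFindings) -> bool:
    """Return True if any one component of the NDI definition is present."""
    return (
        f.moderate_severe_cp
        or f.bilateral_blindness
        or f.bilateral_hearing_impairment_amplified
        or f.gmfcs_level >= 2
        or below_70(f.bayley_cognitive)
        or below_70(f.bayley_motor)
    )


# Hypothetical example: no CP or sensory impairment, but a Bayley motor score of 65.
child = FollowUpFindings(False, False, False, gmfcs_level=1,
                         bayley_cognitive=85, bayley_motor=65)
print(has_ndi(child))
```

Because the components are combined with "or", a single qualifying finding is enough to classify a child as having NDI.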
NICHD Neonatal Research Network 2009–2010 Definitions for the 18–22 Month Neurologic Exam
Based on the neurologic examination, the examiner determines whether the child is normal or abnormal, whether the child has CP, and, if so, the severity of CP. CP is defined by the following three criteria: (1) Definite abnormalities are observed in the classic neuromotor exam, which includes assessment of tone, deep tendon reflexes, coordination, and movement. Any one definite abnormality in the classic neuromotor exam, as defined, is sufficient, except for isolated low tone (hypotonia) or toe walking without tight ankles. (2) A delay in motor milestones with some disorder of motor function must be present. This may or may not be reflected in a motor quotient less than 70. In mild cases, the difference in hand function may be subtle, such as a fine pincer grasp in one hand and a raking grasp in the other. (3) Aberrations in primitive reflexes and postural reactions may be present.
The hierarchical classification of CP subtypes10
is included in the annual examiner training, with the aim of achieving inter-rater reliability in all subtypes including spastic, dyskinetic, and hypotonic CP. An assessment of gross motor function level is completed based on the Palisano et al.6,7
Gross Motor Function Classification System to provide a standardized classification of the severity of motor disability. This method is useful for grading the severity of CP.6,7,11
Bimanual hand function is assessed in two ways: first, with a 3-level system comprising (1) no problems with bimanual tasks, (2) some difficulty with bimanual tasks, and (3) no functional bimanual tasks; and second, with a 5-level assessment of each hand that ranges from 1 (fine pincer grasp) to 5 (unable to grasp/makes no attempt to grasp).
Since 2005, there have been 16 participating NRN centers. Each center designates a primary neurologic examiner, who is generally the Follow-up Study Principal Investigator (PI) of the center, based on their expertise in the field of newborn follow-up, neurologic examination of children, and developmental assessment. The primary neurologic examiner is responsible for fulfilling annual certification requirements and for training and certifying other examiners at their center to conduct the neurologic exam.
The annual certification process for the NRN has evolved over time (table available at www.jpeds.com). The 1994 workshop protocol for the 1993 cohort was in place for five years (1994 to 1999). The training consisted of bringing six to seven children, at 18–22 months corrected age and with a variety of neurologic diagnoses, to the workshop. Initially, there were 12 examiners, who split into groups of three and rotated through the exam rooms to evaluate the children and score the neurologic exam sheets. The drawback of this approach was that by the time the third or fourth group entered the exam room, the child was intolerant of further examiners. Although this method of certification provided an opportunity to discuss the various diagnoses of each child, it was considered sub-optimal because of patient fatigue. In addition, not every examiner had a “hands-on” opportunity.
Methodology for Neurologic Exam Training on Former-Preterm Children at a Corrected Age of 18–22 Months between 1994 and 2010
In 2000, the training protocol was modified to use videos of neurologic exams. Each center submitted a video of its primary neurologic examiner completing the neurologic examination on a former extremely low birth weight infant at 18–22 months corrected age. All videos were reviewed by two NRN Gold-Standard examiners, and feedback was provided to each center examiner. Six high-quality exams with a variety of diagnoses were identified for the training workshop. During the workshop, the center examiners viewed these six exams, scored them independently on the neurologic exam forms, and discussed them. Examiners reached consensus on the diagnoses after discussion and were certified as their center’s primary neurologic examiner. Following the workshop, an annual certification DVD was made from the selected exams so that the certified center examiners could certify additional developmentalists at their centers. During the 2008 workshop, however, it was proposed that the certification process be formalized further to train and clearly document inter-rater agreement on all test items, including the NRN primary outcome of NDI, and to provide focused training during the workshops on areas of identified examiner weakness. Certification itself would be determined by inter-rater agreement on four specific outcomes: normal neurologic exam, CP, GMFCS greater than or equal to level 2, and NDI.
Starting in 2009, although the initial submission of videos by the primary examiners to the NRN Gold-Standard examiners remained the same, the scoring of certification videos by each of the centers was formalized. Five to six selected exam videos representing a spectrum of neurologic findings were collated on a certification DVD and sent to the primary examiners to view and score, using the NRN neurologic exam forms, prior to the workshop. The forms were then keyed into the center’s NRN data management system and transmitted to the data coordinating center at RTI International for analysis. The data from each center were analyzed both for the center examiner’s agreement with the correct response and for the group’s consensus with the video findings. The NRN Gold-Standard examiners were then able to identify strengths and weaknesses in the center examiners’ performance. With this process, the annual certification workshop in October 2009 began to focus on scoring discrepancies and inter-rater agreement, allowing for targeted training on problematic items. The workshop concluded with a “test” video, which examiners had not viewed before the annual certification workshop. Examiners viewed and scored the “test” video using the NRN neurologic exam forms, which were then entered into the NRN database for analysis. Examiners received feedback on the scoring of both their training DVD and their “test” video.
All analyses were performed using SAS 9.1 for Windows.12
Statistical analyses focused on four primary outcomes of the neurologic assessment: (1) whether the neurologic exam was considered normal, (2) a diagnosis of moderate to severe CP, (3) a GMFCS greater than or equal to level 2 indicating moderate to severe gross motor impairment, and (4) whether the child met the neurologic or neurosensory component of the NRN definition of neurodevelopmental impairment (moderate/severe CP or GMFCS greater than or equal to level 2 or bilateral vision or hearing impairment). The percentage of examiners who agreed with the NRN Gold Standard raters on each video was computed. Consistency among examiners (i.e., inter-rater reliability) was assessed by computing the first-order agreement coefficient (AC1) statistic, using the AC1AC2 SAS macro.13
Even though the kappa statistic has traditionally been used to assess inter-rater reliability, researchers have noted several limitations of kappa, including an inadequate correction for chance agreement when the prevalence of a finding is very high or very low, and the resulting kappa paradox, wherein kappa can be low even though agreement is high.14
The AC1 statistic has been developed to address these limitations.15
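As an illustration of the two measures described above, the following Python sketch computes the percentage of examiners agreeing with the gold standard on a video and a multi-rater AC1 coefficient. It is not the AC1AC2 SAS macro used in the analysis. The function names and the example data are hypothetical, and the sketch assumes every video is scored by at least two examiners. AC1 is computed as (p_a − p_e)/(1 − p_e), where p_a is the average pairwise agreement per video and p_e = Σ_k π_k(1 − π_k)/(q − 1), with π_k the average proportion of examiners choosing category k and q the number of categories.

```python
from collections import Counter


def percent_agreement(examiner_calls, gold_call):
    """Percentage of examiners whose call on one video matches the gold standard."""
    matches = sum(1 for call in examiner_calls if call == gold_call)
    return 100.0 * matches / len(examiner_calls)


def gwet_ac1(ratings, categories=(0, 1)):
    """First-order agreement coefficient (AC1) for several raters.

    ratings: one list per video, holding each examiner's category for that video.
    categories: all possible categories (default: binary outcome, 0 = no, 1 = yes).
    """
    n = len(ratings)
    q = len(categories)
    observed = 0.0
    proportion_sum = {k: 0.0 for k in categories}
    for calls in ratings:
        r = len(calls)
        if r < 2:
            raise ValueError("each video needs at least two examiners")
        if any(call not in proportion_sum for call in calls):
            raise ValueError("rating outside the declared categories")
        counts = Counter(calls)
        # Share of examiner pairs that agree on this video.
        observed += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        for k in categories:
            proportion_sum[k] += counts[k] / r
    p_a = observed / n
    # Chance agreement built from the average category proportions.
    p_e = sum((s / n) * (1 - s / n) for s in proportion_sum.values()) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical data: 5 videos, 4 examiners each, outcome "CP present" (1) or not (0).
videos = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
gold = [1, 0, 0, 0, 0]
for calls, truth in zip(videos, gold):
    print(percent_agreement(calls, truth))
print(round(gwet_ac1(videos), 2))
```

For these made-up ratings, one examiner disagrees on the fourth video, giving 75% agreement with the gold standard on that video and an AC1 of 0.84 across the five videos.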
To help with interpretation, we used the general guidelines for reliability coefficients suggested by Fleiss16
with values less than 0.40 indicating poor agreement, 0.40–0.75 indicating good agreement, and greater than 0.75 indicating excellent agreement.
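These cut-points can be written as a small lookup, sketched here in Python. The function name is hypothetical, and the handling of the boundary values (0.40 and 0.75 both fall in the middle band) follows the ranges as stated above.

```python
def interpret_agreement(coefficient):
    """Map a reliability coefficient to the agreement bands stated in the text."""
    if coefficient < 0.40:
        return "poor"
    if coefficient <= 0.75:
        return "good"
    return "excellent"


# Hypothetical coefficients for illustration.
for value in (0.35, 0.60, 0.84):
    print(value, interpret_agreement(value))
```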