To assess interobserver reliability between two central readers of cranial ultrasound (CUS) and accuracy of local compared with central interpretations.
A retrospective analysis of CUS data from the NICHD trial of inhaled nitric oxide for premature infants. Interobserver reliability of two central readers was assessed by kappa or weighted kappa. Accuracy of local compared with central interpretations was assessed by sensitivity and specificity.
Cranial US from 326 infants had both central reader and local interpretations. Central reader agreement for grade 3/4 IVH, grade 3/4 IVH or PVL, grade of IVH, and degree of ventriculomegaly was very good (kappa=0.84, 0.81, 0.79, and 0.75, respectively). Agreement was poor for lower grade IVH and for PVL alone. Local interpretations were highly accurate for grade 3/4 IVH or PVL (sensitivity 87–90%, specificity 92–93%), but sensitivity was poor to fair for grade 1/2 IVH (48–68%) and PVL (20–44%).
Our findings demonstrate reliability and accuracy of highly unfavorable CUS findings, but suggest caution when interpreting mild to moderate IVH or white matter injury.
Cranial ultrasound (CUS) is one of the most important diagnostic procedures performed in the neonatal intensive care unit (NICU). After Pape et al. (1) reported the use of bedside ultrasound to detect intraventricular hemorrhage (IVH) in premature infants, it quickly became the neuroimaging standard of care (2–4). Periventricular leukomalacia (PVL) can be detected by CUS as cystic lesions or, in evolution, as periventricular echodensities (5). Outcome studies of preterm infants have revealed strong associations of severe CUS abnormalities with later major neurodevelopmental disabilities (6–10).
Despite the apparently important role of accurate CUS interpretation, few studies have investigated reliability or accuracy. Randomized controlled trials and interventional studies involving premature infants often include severe CUS abnormalities among study outcomes, only occasionally using central reader interpretations. Observational studies routinely report frequencies of abnormal CUS findings based solely on local interpretations. However, although intraobserver reliability analyses of CUS interpretations have been previously reported (11–14), little attention has been given to central reader reliability or to accuracy of local compared with central reader interpretations.
Substantial central and local reader CUS data are available from the multicenter NICHD Neonatal Research Network randomized, controlled, double-masked trial of inhaled nitric oxide (iNO) for severe respiratory failure in premature infants (PiNO trial) (15). We undertook an analysis of these data to assess interobserver reliability between two central readers, and to assess accuracy of local interpretations compared with central readers.
This was a retrospective reliability and accuracy study of CUS data from the PiNO trial (15) (infants less than 34 weeks gestation and 401–1500 g), and from the concurrently enrolled larger premature infant pilot (infants less than 34 weeks but greater than 1500 g). A secondary hypothesis of the trial was that surviving infants in the iNO group would have no increase in grade 3 or 4 IVH or PVL compared with those in the placebo group. A single CUS was originally required at 28±3 days of age among survivors, which was to be read by two central readers. However, after interim analysis, the Data Safety and Monitoring Committee recommended that copies of all CUS performed during hospitalization be requested from sites for central reading. Data collection regarding local interpretations was nevertheless required only for the 28±3 day CUS, if the patient survived to that time. Data collection of local interpretations of other CUS was requested, but not required, for three other specifically timed CUS, if they were performed: before study gas administration, during study gas administration, and after 28±3 days of age.
Thus, for the current analysis, a single CUS was included from each patient for whom a CUS was submitted and for which both local and central readings were documented. We structured our analysis in this way in part to limit potential bias due to central reader misclassifications carried through an entire patient CUS series. The 28-day CUS was used if it existed; however, because approximately 50% of these critically ill infants died, the CUS closest to that time point was used if a 28-day study was not obtained.
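The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not the trial's actual procedure: the function name `select_study`, the treatment of any study within 28±3 days as "the 28-day CUS", and the tie-breaking behavior (the first listed study wins) are all assumptions made for the example.

```python
def select_study(ages_days, target=28, window=3):
    """Pick the age (in days) of the single CUS to analyze for one infant.

    ages_days: ages at which a CUS with BOTH local and central readings exists
               (hypothetical input format, assumed for this sketch).
    Returns None when the infant has no eligible study.
    """
    if not ages_days:
        return None
    # Prefer a study inside the protocol window (28 +/- 3 days) if one exists.
    in_window = [a for a in ages_days if abs(a - target) <= window]
    pool = in_window if in_window else ages_days
    # Within the chosen pool, take the study closest to day 28.
    # On ties, min() keeps the first study in list order (an assumption).
    return min(pool, key=lambda a: abs(a - target))


# Hypothetical infants: one with a study in the window, one who died early.
survivor = select_study([3, 10, 27, 40])
early_death = select_study([2, 9, 14])
```

Under these assumptions, the first infant contributes the day-27 study and the second contributes the day-14 study, the closest available to the 28-day time point.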
Trained research staff at each institution collected data regarding local CUS interpretations directly from local radiologists’ reports. The PiNO trial did not limit the number of local radiologists or technologists involved. Cranial US were obtained and read per local institution clinical protocol; the local site radiology departments were not required to adhere to policy with respect to CUS view acquisition, nor did local radiologists implement any changes to their interpretation approach due to the PiNO trial. The PiNO trial manual of operations provided suggested definitions for grades of IVH (Grade 1: Hemorrhage confined to the germinal matrix/subependymal area; IVH Grade 2: Hemorrhage in the lateral ventricle(s) without distension; Grade 3: Hemorrhage in the lateral ventricle(s) with distension; Grade 4: Hemorrhage extending to the brain parenchyma), but no special training was undertaken for local site radiologists or technologists with respect to the PiNO trial. Echodense PVL was not distinguished from echolucent PVL on local interpretation data queries; PVL was coded as present if the local radiologist final clinical report documented PVL on CUS. Research staff at each institution completed standardized data collection forms from the local radiologist final reports, and transferred data by secure computer network to the data center (RTI International, Research Triangle Park, N.C.). No queries with respect to side of lesion were included; thus, IVH on either or both sides was coded as “Yes”. If IVH was present bilaterally, the most severe IVH was coded. Data with respect to ventriculomegaly were not collected on the local interpretation data collection instrument.
Copies of CUS were sent by centers to RTI International in film or digital format. Two central readers reviewed CUS during a two-day period after the trial closed. The central readers were board-certified pediatric radiologists from different academic institutions, and had special expertise in CUS. Prior to the trial, the central readers collaborated with the PiNO trial subcommittee to create the central reader data instrument (Figure; available at www.jpeds.com), which collected detailed, hemisphere-specific radiologic observations and diagnostic classifications. They briefly reviewed the instrument together before the central reading session, but interpreted CUS independently. For the PiNO trial (15), a third reader later adjudicated discrepant interpretations between the two central readers, but this was limited only to those studies in which the global CUS diagnosis of grade 3 or 4 IVH or PVL on either hemisphere was not agreed upon.
Given differences between the central and local reader data collection instruments, reliability and accuracy analyses focused on important clinical diagnostic and prognostic category queries common to both. Because local data collection queries did not differentiate PVL type, we limited both reliability and accuracy analyses to any PVL. Only central reader reliability analyses could be performed with respect to ventriculomegaly because local data were not collected. SAS 9.1.2 (SAS Institute, Inc., Cary, NC, USA) was used for all analyses.
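Assessment of reliability between central readers was by kappa statistic. Kappa measures the extent of agreement beyond that which would be expected by chance alone under the assumption of independence, described as:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance.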
We tested central reader agreement by kappa on dichotomous CUS observations, including normal reading, grades of IVH, any PVL, grade 3 or 4 IVH or PVL, and any ventricular size increase. Multicategory weighted kappa analysis (18, 20, 21) was performed for grade of IVH and severity of ventriculomegaly. The weighted kappa allows for some credit to be given for minor misclassifications (20, 21). Because kappa is limited by its dependence on prevalence (22, 23), use of accompanying measurements of agreement such as percent positive agreement (PPA) has been recommended (24).
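As a minimal sketch of how kappa and weighted kappa are computed for two readers, the following Python function builds the cross-classification table and applies the standard chance-corrected agreement formula. The analyses in this study were run in SAS; the function name `cohen_kappa` and the choice of linear agreement weights are assumptions for illustration, since the weighting scheme is not stated in the text.

```python
def cohen_kappa(reader1, reader2, categories, weighted=False):
    """Cohen's kappa for two readers rating the same items.

    categories: ordered list of possible ratings (order matters when weighted).
    weighted=True applies linear agreement weights 1 - |i - j| / (k - 1),
    an assumed scheme; it needs at least two categories.
    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(reader1)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Joint counts: table[i][j] = items rated i by reader 1 and j by reader 2.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(reader1, reader2):
        table[index[a]][index[b]] += 1

    # Marginal proportions for each reader.
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1 - abs(i - j) / (k - 1)  # partial credit for near misses
        return 1.0 if i == j else 0.0        # exact agreement only

    # Observed and chance-expected (weighted) agreement.
    p_o = sum(weight(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    p_e = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_o - p_e) / (1 - p_e)


# Hypothetical IVH grades (0 = none) assigned to six hemispheres by two readers.
cr1 = [0, 0, 1, 2, 3, 4]
cr2 = [0, 1, 1, 1, 3, 4]
grades = [0, 1, 2, 3, 4]
unweighted = cohen_kappa(cr1, cr2, grades)
linear_weighted = cohen_kappa(cr1, cr2, grades, weighted=True)
```

In this hypothetical example the two disagreements are each off by a single grade, so the weighted statistic gives partial credit that the unweighted statistic does not.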
Assessment of accuracy was by sensitivity and specificity analyses, using central reader interpretation as the “gold standard” reference. Analyses separately compared local interpretations to central reader 1 (CR #1) and central reader 2 (CR #2). Sensitivity was defined as the proportion “positive” by local among all “positive” by central reader; specificity was defined as the proportion “negative” by local among all “negative” by central reader. Sensitivity and specificity analyses evaluated diagnostic categories common to both the local reader queries and central reader forms. Because instruments differed with respect to whether CUS findings were recorded as global or hemisphere-specific, the grade of IVH recorded on the local reader form was considered “correct” if the highest grade IVH recorded on either hemisphere by the central reader was the same as the local reader result.
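The accuracy definitions above, together with the rule for reconciling hemisphere-specific central readings with global local readings, can be sketched as follows. The function names and the example data are hypothetical; grades are encoded as 0 (no IVH) through 4 for the purposes of this sketch.

```python
def global_grade(left, right):
    """Highest IVH grade (0 = none, 1-4) recorded on either hemisphere."""
    return max(left, right)


def sensitivity_specificity(local_positive, central_positive):
    """Sensitivity and specificity of local readings against a central reader.

    Both arguments are equal-length sequences of booleans, one per study.
    Returns (sensitivity, specificity); a value is None when its
    denominator is zero (no central positives or no central negatives).
    """
    tp = fn = fp = tn = 0
    for local, central in zip(local_positive, central_positive):
        if central and local:
            tp += 1
        elif central:
            fn += 1
        elif local:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical data: central grades per hemisphere, local global grades.
central_hemispheres = [(0, 3), (1, 0), (0, 0), (4, 2)]
local_grades = [3, 0, 3, 2]

central_grades = [global_grade(l, r) for l, r in central_hemispheres]
# Dichotomize into the severe category (grade 3 or 4 IVH).
central_severe = [g >= 3 for g in central_grades]
local_severe = [g >= 3 for g in local_grades]
sens, spec = sensitivity_specificity(local_severe, central_severe)
```

Running the same comparison once against CR #1 and once against CR #2 mirrors the separate analyses described above.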
The Institutional Review Boards (IRB) of all centers reviewed and approved the study. Informed consent was obtained from the patient’s parent or legal guardian prior to randomization to iNO or placebo. Patient identifiers, including name and hospital medical record number, were removed by hand or digitally from CUS studies prior to being sent to RTI for central reading.
INO Therapeutics (Clinton, New Jersey) provided the study gas and gas delivery systems for all hospitals, and capitation funding for the hospitals outside the NICHD Neonatal Research Network that participated in the PiNO trial. The company was not involved in the trial design, data analysis or interpretation, or secondary or ancillary data analysis.
449 infants were enrolled in the PiNO trial and larger premature infant pilot; 70 had no CUS performed. Of those remaining, 361 patients had at least one CUS of adequate quality to result in completed central readings, and 326 CUS had both central and local readings. Because of missing or uninterpretable data entries, observations were not available for every diagnostic category in every study.
326 CUS studies and 650 hemispheres were read by both central readers. The diagnostic assignments made by the central readers are shown in Table I.
Central reader reliability for major diagnostic categories is shown in Table II. The best central reader agreement was achieved for the major adverse prognostic categories, IVH grade 3 or 4 and IVH grade 3 or 4 or PVL. Agreement between central readers was poor for lower grade IVH, and for PVL alone.
When the complete IVH grade classification data (Table III; available at www.jpeds.com) are viewed with respect to prognostic categorization, the results reflect very good agreement (multilevel weighted kappa=0.79, 95%CI 0.75–0.82). Of 426 hemispheres coded by CR #1 as no IVH, 3 were coded by CR #2 as grade 4 IVH; of 391 hemispheres coded by CR #2 as no IVH, 3 were coded by CR #1 as grade 4 IVH. In 2 of the 6 cases, central readers agreed that intracerebral, but not intraventricular or periventricular echodensities, were present. In 3 cases, the central readers agreed that grade 3 or 4 IVH or PVL existed on the contralateral hemisphere, so the third reader did not adjudicate those discrepancies. In one case, the adjudicator considered the study to be of poor quality and uninterpretable.
The complete ventricular enlargement classification data (Table IV; available at www.jpeds.com) are also reassuring. Only 2 of the 516 ventricles coded by CR#1 as no or mild enlargement were coded as severely enlarged by CR#2. None of the 488 ventricles coded by CR#2 as no or mild enlargement was coded as severely enlarged by CR#1.
Table V summarizes sensitivity and specificity for local interpretations compared with CR #1 and CR #2 for major diagnostic and prognostic categories. These results demonstrate excellent sensitivity and specificity for local interpretations for broad diagnostic categories, and for the major adverse prognostic category of grade 3 or 4 IVH or PVL. However, sensitivity was poor for lower grades of IVH and for PVL. Complete classification tables for IVH grades and major diagnostic category interpretations for local compared with central reader interpretations are shown online (Table VI–Table VIII; available at www.jpeds.com).
In this analysis of CUS interpretations of PiNO trial subjects, we found that reliability between central readers was excellent for severe IVH, the major prognostic category of grade 3 or 4 IVH or PVL, and degree of ventriculomegaly, but was substantially poorer for grade 1 or 2 IVH. Local CUS interpretations were highly sensitive and specific for severe hemorrhage and for global presence of IVH, but sensitivity was poorer for lower grades of IVH. Our results also reflect significant variability in interpretation of PVL by CUS. These findings are reassuring with respect to highly unfavorable hemorrhagic CUS findings associated with adverse neurodevelopmental outcome, but suggest caution when interpreting reports of mild to moderate hemorrhage. Advanced neuroimaging methods such as MRI would better determine the extent of subtle brain injury and white matter injury.
Our interobserver analyses compare favorably with the few previous detailed studies. Pinto et al. (12) also found that agreement for severe findings was excellent, but much worse for less severe categories. Although the Pinto study reported generally better agreement than ours, the radiologists and technologists had training sessions prior to and during the study. The study by Corbett et al. (13) also showed very good interobserver agreement for severe findings, but poor agreement for lower IVH grades and ventricular enlargement. No special training or certification process was undertaken as part of the Corbett study; thus, our analysis and the Corbett study may better reflect the “true” level of agreement between independent radiologists at different institutions.
Our reliability analyses reveal poor agreement on PVL between central readers. The prevalence of PVL was low, thus agreement analyses by either kappa or PPA may not be particularly informative. The clinical importance of the poor agreement in this scenario is also unclear. Echodense PVL is often transient, and the prognostic significance of a single CUS with echodense PVL may be minimal (25). Nevertheless, this disagreement on CUS interpretations of white matter injury, as well as the low rate of PVL in this study compared with that recognized by MRI, reinforces the potential substantial diagnostic and predictive benefit afforded by modalities such as MRI (26).
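The dependence of kappa on prevalence can be illustrated with a small worked example. The counts below are invented for illustration and are not data from this study; the positive agreement formula used here, 2a/(2a+b+c), is one common definition and is an assumption, since the text does not give the formula used for PPA.

```python
def kappa_2x2(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table.

    a: both readers positive      b: reader 1 positive only
    c: reader 2 positive only     d: both readers negative
    """
    n = a + b + c + d
    p_o = (a + d) / n                      # observed agreement
    p1 = (a + b) / n                       # reader 1 positive rate
    p2 = (a + c) / n                       # reader 2 positive rate
    p_e = p1 * p2 + (1 - p1) * (1 - p2)    # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)


def positive_agreement(a, b, c, d):
    """Proportion of positive agreement, 2a / (2a + b + c) (assumed definition)."""
    return 2 * a / (2 * a + b + c)


# Two hypothetical tables with identical raw agreement (94 of 100 studies).
low_prevalence = (2, 3, 3, 92)    # finding present in about 5% of studies
balanced = (47, 3, 3, 47)         # finding present in about 50% of studies

kappa_low = kappa_2x2(*low_prevalence)
kappa_balanced = kappa_2x2(*balanced)
ppa_low = positive_agreement(*low_prevalence)
```

Both tables have the same six discordant readings, yet kappa is far lower in the low-prevalence table because nearly all of the agreement there is on negative studies, which chance alone would largely produce.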
Very few hemispheres interpreted by one central reader as no IVH were interpreted as grade 4 by the other. In half of these cases, the global CUS categorization to an adverse prognostic category would not have changed because the contralateral hemisphere was affected by severe IVH or PVL. In some instances, pure parenchymal hemorrhages appear to have been variably coded as no IVH and grade 4 by the two central readers. The quality of CUS film or digital media copies from multiple sites may have limited consistent interpretation; undoubtedly, static images of film media were associated with challenges to accurate diagnoses.
Previous multicenter studies have not focused on local reader accuracy of CUS interpretations. Numerous prospective neurodevelopmental outcome studies of preterm infants have demonstrated strong associations of severe CUS abnormalities with neuromotor and neurocognitive deficits (6–8, 27, 28), but those analyses have been based solely on local reader interpretations. Similarly, reports from multicenter registries have consistently reported CUS data only from local radiologist interpretation (29–31). Our findings thus represent an important and unique opportunity to better understand the validity of the conclusions we reach with respect to those CUS data. The cumbersome logistics of central reading make it unlikely that any further analyses on this scale will be forthcoming. However, our study can offer some reassurance to future investigators using local CUS interpretations. For instance, if a planned study outcome is severe hemorrhagic abnormality, results of local interpretation could be expected to be reasonably accurate. Conversely, sensitivity and specificity of lower grades of IVH and PVL alone in our analysis are disappointing. Although infants with lower grades of IVH have traditionally been considered to have a favorable neurodevelopmental prognosis (3,32,33), a recent study disputes those assumptions. Patra et al found that ELBW infants with grades 1 or 2 IVH had significantly lower Bayley MDI scores, higher rates of major neurologic abnormalities, and higher overall rates of neurodevelopmental impairment at 20 months corrected age than those with no IVH (34). The question remains whether grade 1 or 2 IVH, if accurately and consistently diagnosed, could be a risk factor for later neuromotor impairments. 
This uncertainty coupled with the observed poor sensitivity for PVL again suggests that imaging techniques such as MRI would be helpful in identifying important patterns of brain injury, notably subtle white matter injury, which could be missed by CUS alone.
There are limitations to this study, some of which are inherent to any retrospective analysis. The data collection instruments for local reader interpretations differed from the central reader data instruments, particularly with respect to the level of detail. Information pertaining to ventriculomegaly, a finding known to be associated with adverse neurodevelopmental outcome (9,10), was not collected on local data instruments. A “rolling” central reading approach, with periodic intraobserver reliability checks, was not part of the PiNO trial because the original scope of the central read was anticipated to be quite small. Furthermore, local reader accuracy analysis was not an aim of the PiNO trial, and the opportunity for a large-scale accuracy analysis was unanticipated; therefore, our findings provide a “real-life” view rather than a “best-case scenario”.
In conclusion, our analysis of CUS data from the PiNO trial demonstrates excellent central reader agreement and local reader accuracy for severe IVH and major prognostic categories. For lower grade IVH and for PVL, reliability and accuracy were poor. These results suggest the validity of reports of major CUS diagnoses, but reports of lower grade IVH and PVL alone should be interpreted with circumspection. These data also reinforce the need to urgently consider expansion of our routine imaging armamentarium to include modalities such as MRI, which would allow for substantially improved discrimination of white matter and other brain injury.
Brown University Women & Infant's Hospital Principal Investigator: William Oh, MD; Study Coordinator Angelita Hensman, BSN, RNC; Respiratory Therapist: Daniel Gingras, RRT. Emory University Principal Investigators: Barbara J. Stoll, MD and Lucky Jain, MD; Study Coordinator: Ellen Hale, RN, BS; Respiratory Therapist: Irma Seabrook, BS, RRT-NPS. Indiana University Riley Hospital for Children and Methodist Hospital Principal Investigators Greg Sokol, MD and Dianne Lorant, MD; Study Coordinators: Diana Dawn Appel, RN BSN and Lucy Miller, RN BSN; Respiratory Therapists: Dale Chriscinske, BS, RRT, NPS and Jeff Attwood, RRT. Northwestern University Principal Investigator Robin Steinhorn, MD; Study Coordinator and Respiratory Therapist: Michael Sautel, RRT. Stanford University Principal Investigator: Krisa VanMeurs, MD; Study Coordinator Bethany Ball, BS, CCRC; Respiratory Therapist: Dan Proud, RCP. University of Alabama at Birmingham University Hospital-UAB Principal Investigator: Waldemar A. Carlo, MD; Study Coordinator: Shirley S. Cosby, RN, BSN; Respiratory Therapist: Robert B. Johnson RRT. University of Cincinnati University Hospital, Cincinnati Children’s Hospital Medical Center and Good Samaritan Principal Investigators Jon Fridriksson, MD and Barb Warner MD; Study Coordinators: Marcia Mersmann, RN, Barb Alexander, RN, Jody Shively, RN, Holly Mincey, RN; Respiratory Therapists: Mary Hoover, RRT, Sharon Sapienz, RRT, Eric Stephenson, RRT. University of California-San Diego UCSD Medical Center and Sharp Mary Birch Hospital for Women Principal Investigators: Neil N. Finer, MD and Maynard R. Rasmussen, MD; Study Coordinators: Chris Henderson, CRTT and Clarence Demetrio, RN; Respiratory Therapists: Wade Rich, RRT-NPS and Christine Joseph, RRT-NPS. 
University of Florida Wolfson Children's Hospital at Baptist Medical Center and Shands Jacksonville Medical Center Principal Investigator: Mark Hudak, MD; Study Coordinators: Shannon Osbeck, RN, BSN and Elizabeth Case, RN, BSN, CCRC; Respiratory Therapists: Amanda Kellum, RRT and Lamont Hogans, RRT. University of Rochester Golisano Children's Hospital at Strong Principal Investigator: Carl T. D’Angio, MD; Study Coordinator: Linda Reubens, RN; Respiratory Therapist: Greg Hutton, RRT. University of Texas – Dallas Parkland Hospital Principal Investigator: Abbot Laptook, MD; Study Coordinators: Susie Madison, RN, Gay Hensley, RN and Nancy Miller, RN; Respiratory Therapist: Glenn Metoyer, RRT. University of Texas – Houston Memorial Hermann Children’s Hospital Principal Investigator: Kathleen Kennedy, MD, MPH; Study Coordinator: Georgia McDavid, RN; Respiratory Therapist: Danny Emerson, BA, RRT, RCP. Medical College of Wisconsin Principal Investigator: Ganesh Konduri, MD; Study Coordinator: Mike Paquette, RCP/CRT; Respiratory Therapists: Steven Wong, Mike Paquette, RCP/CRT. Wake Forest University Wake Forest University Baptist Medical Center, Forsyth Medical Center and Brenner Children’s Hospital Principal Investigators: Judy Aschner, MD and T. Michael O’Shea, MD, MPH; Study Coordinator: Nancy Peters, RN and B.J. Hansell, RRT, CCRC; Respiratory Therapists: Jennifer Griffin, RRT, RCP and Clay Adams, RRT. RCP. Wayne State University Hutzel Women's Hospital & Children's Hospital of Michigan Principal Investigator: Seetha Shankaran, MD; Study Coordinators: Rebecca Bara, RN, BSN and Geraldine Muran, RN, BSN; Respiratory Therapist: Wonder Weekfall, RRT. Yale University New Haven Children's Hospital Principal Investigator: Richard A. Ehrenkranz, M.D. Study Coordinator: Patricia Gettner, RN; Respiratory Therapist: Art Caldwell, AS, RRT. RTI Central Reading staff Deborah Schwartz; Kim Doeden; Carolyn Petrie; Neha Patel
|PI||Center, Location||NICHD Grant #||GCRC #|
|Waldemar A. Carlo, MD||University of Alabama at|| || |
|Edward F. Donovan, MD||University of Cincinnati||U10 HD27853||M01 RR 08084|
|Richard A. Ehrenkranz, MD||Yale University, New Haven, Connecticut||U10 HD27871||M01 RR 06022|
|Neil N. Finer, MD||University of California at San, San Diego, CA|| || |
|Abbot R. Laptook, MD||University of Texas Southwestern Medical Center at Dallas|| || |
|James A. Lemons, MD||Indiana University||U10 HD27856||M01 RR 00750|
|William Oh, MD||Women and Infants’ Hospital of, Providence, Rhode Island|| || |
|T. Michael O’Shea, MD||Wake Forest University School of|| || |
|Dale L. Phelps, MD||University of Rochester, Rochester, New York||U10 HD40521||5 M01 RR00044|
|W. Kenneth Poole, PhD||Research Triangle Institute||U01 HD36790|| |
|Seetha Shankaran, MD||Wayne State University|| || |
|David K. Stevenson, MD||Stanford University||U10 HD27880||M01 RR 00070|
|Barbara J. Stoll, MD||Emory University|| || |
|Jon E. Tyson, MD, MPH||The University of Texas Health Science Center at Houston||U10 HD 21373|| |
|Rosemary D. Higgins, MD||National Institute of Child Health and Human Development|| || |
Disclosure: INO Therapeutics provided the study gas and gas delivery systems for all hospitals, and capitation funding for the hospitals outside the NICHD Neonatal Research Network that participated in the PiNO trial. The company was not involved in the trial design, data analysis or interpretation, or secondary or ancillary data analysis. The authors have no financial agreement with INO Therapeutics. For funding information, see Appendix II at www.jpeds.com.