To evaluate reader variability of white matter lesions seen on cranial sonographic scans of extreme low gestational age neonates (ELGANs).
In 1,452 ELGANs, cranial sonographic scans were obtained in the first and second postnatal weeks, and between the third postnatal week and term. All sets of scans were read independently by two sonologists. We reviewed the use of four diagnostic labels: early periventricular leucomalacia, cystic periventricular leucomalacia, periventricular hemorrhagic infarction (PVHI), and other white matter diagnosis, by 16 sonologists at 14 institutions. We evaluated the association of these labels with location and laterality of hyperechoic and hypoechoic lesions, location of intraventricular hemorrhage, and characteristics of ventricular enlargement.
Experienced sonologists differed substantially in their application of the diagnostic labels. Three readers applied early periventricular leucomalacia to more than one fourth of all the scans they read, whereas eight applied this label to ≤5% of scans. Five applied PVHI to ≥10% of scans, while three applied this label to ≤5% of scans. More than one third of scans labeled cystic periventricular leucomalacia had unilateral hypoechoic lesions. White matter abnormalities in PVHI were more extensive than in periventricular leucomalacia and were more anteriorly located. Hypoechoic lesions on late scans tended to be in the same locations, regardless of the diagnostic label applied.
Experienced sonologists differ considerably in their tendency to apply diagnostic labels for white matter lesions. This is due to lack of universally agreed-upon definitions. We recommend reducing this variability to improve the validity of large multicenter studies.
Preterm infants are physiologically unstable in the first weeks of life; thus, bedside ultrasonography is the only feasible technique to monitor the appearance of the brain. Valuable confirmatory tests such as CT and (in particular) MRI are less easily obtained given the difficulty of transporting these patients outside of the neonatal unit. Performance and interpretation of cranial sonographic scans vary among even highly competent sonographers and sonologists.1–3
To improve communication with referring physicians and to facilitate correlation with neurodevelopmental outcomes, various grading or severity scoring systems have been proposed.4–6 Since the introduction of the Papile classification of intraventricular hemorrhage (IVH) in premature infants,7 diagnostic criteria of IVH have been controversial. IVH and ipsilateral periventricular hyperechoic lesions, including periventricular hemorrhagic infarction (PVHI), often co-exist.4,8–11 Whereas some sonologists continue to use the Papile IVH classification scheme (four grades), others discourage its use and prefer the Deeg classification (three grades and a descriptive approach to periventricular hyperechoic lesions6), or have abandoned classification schemes completely and prefer to be merely descriptive in their reporting of abnormalities.12
Our multicenter study provided us the opportunity to assess the variability of experienced sonologists at 14 different centers in their application of diagnostic labels to cranial sonographic images of severely premature infants, and to see what characteristics of the scans influence diagnostic decisions. It also allowed us to identify scan characteristics associated with interpretation variability.
The extreme low gestational age neonates (ELGAN) study was designed to identify characteristics and exposures that increase the risk of structural and functional neurologic disorders in ELGANs. During the years 2002–2004, women delivering before 28 weeks gestation at 1 of 14 participating institutions in 11 cities in five states were asked to enroll in the study. The enrollment and consent processes were approved by the individual institutional review boards.
Mothers were approached for consent either on antenatal admission or shortly after delivery, depending on clinical circumstance and institutional preference. One thousand two hundred forty-nine mothers of 1,506 infants consented. Approximately 260 women were either not approached or did not consent to participate. The neurodevelopmental outcomes of this cohort at 2 years of age as a function of the neonatal cranial sonographic findings were described separately by our group.13–15 The current study and its companion paper3 address, respectively, the overall diagnostic and the merely descriptive aspects of operator variability in cranial sonographic interpretation. Both aspects of operator variability substantially affect the validity of the above-referenced outcomes studies.
Routine scans were obtained by technologists at all of the hospitals using high-frequency transducers (7.5 and 10 MHz). Sonograms always included the six standard paracoronal views and five parasagittal views using the anterior fontanel as the sonographic window.16 No attempt was made to standardize equipment and scanning technique across the study institutions.
Of the 1,506 infants enrolled, 1,452 had had at least one set of protocol sonographic scans. The three sets of protocol scans were defined by the postnatal day on which they were obtained. Protocol 1 scans were obtained between the first and fourth postnatal day (N = 1134); protocol 2 scans were obtained between the fifth and fourteenth day (N = 1344), and protocol 3 scans were obtained between the fifteenth day and the 40th postmenstrual week (N = 1252). Of all 1,452 infants in our study sample, 895 had all three scans, 441 had two scans, and 116 had only one scan.
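The protocol windows and the breakdown of infants by number of scan sets can be written out as a short sketch. The function name `protocol_for_day` is hypothetical. The sketch assumes that day 1 is the first postnatal day, and it does not check the upper limit of the third window (the 40th postmenstrual week), since that would require gestational age at birth.

```python
def protocol_for_day(postnatal_day: int) -> int:
    """Return the protocol window (1, 2, or 3) for a postnatal day.

    Assumption: day 1 is the first postnatal day. The upper bound of
    protocol 3 (the 40th postmenstrual week) is not checked here.
    """
    if postnatal_day < 1:
        raise ValueError("postnatal day must be 1 or greater")
    if postnatal_day <= 4:
        return 1  # first through fourth postnatal day
    if postnatal_day <= 14:
        return 2  # fifth through fourteenth postnatal day
    return 3      # fifteenth postnatal day onward


# Infants grouped by how many of the three protocol scan sets they had.
infants_by_scan_count = {3: 895, 2: 441, 1: 116}
infants_with_any_scan = sum(infants_by_scan_count.values())
```

The three groups add up to the 1,452 infants in the study sample.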
All of the readers had at least 5 years of experience in interpreting neonatal cranial sonographic studies. After creation of a manual and data collection form (see Supporting Information on the online version of this article, part of which is also illustrated in Figure 1), observer variability minimization efforts included conference calls discussing aspects of images prone to different interpretations.3 To investigate the presumed interrelationship between periventricular white matter lesions, IVH, and ventriculomegaly, the following four major abnormalities were rated independently: IVH, ventriculomegaly, and hyperechoic and hypoechoic (or “cystic”) white matter lesions. Templates of various degrees of ventriculomegaly were included in the manual.
All sonographic scans were initially read by two independent readers who were not provided clinical information. Each set of scans was first read by the study sonologist at the institution of the infant's birth. The images, usually as electronic images on a CD embedded in the software eFilm Workstation (Merge Healthcare/Merge eMed, Milwaukee, WI), were sent for a second reading to a sonologist at another ELGAN study institution, who was blinded to the clinical information and to the first reader's interpretation. If the two sonologists did not agree about the presence/absence of the four major abnormalities, the CD was sent to a third “tie-breaker” reader, who was similarly blinded and not informed about the nature of the discrepancy, but was asked to complete the entire data collection form independently. The study investigators deemed any agreement between the third and either the first or second reader with regard to the four major abnormalities to represent the final (reference standard) interpretation, and the dissenting interpretation was discarded. The eFilm program allowed the second and third readers to see what the first reader saw and provided the same options to adjust and enhance the studies that were available to the original reader, including the ability to zoom and alter image contrast and brightness. With these measures, we attempted to correct for differences in scanning techniques between institutions that may affect interpretation.
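The reading protocol described above can be sketched as a small resolution rule. This is only an illustration under stated assumptions, not the study's actual software. It assumes that each of the four major abnormalities is recorded as present or absent and that disagreements are resolved one abnormality at a time. The names `Reading`, `needs_tie_breaker`, and `reference_reading` are hypothetical.

```python
from typing import Dict, Optional

# The four major abnormalities rated independently in the study.
ABNORMALITIES = ("ivh", "ventriculomegaly", "hyperechoic_lesion", "hypoechoic_lesion")

# Assumption: each finding is recorded as present (True) or absent (False).
Reading = Dict[str, bool]


def needs_tie_breaker(first: Reading, second: Reading) -> bool:
    """True if the two readers disagree on any of the four abnormalities."""
    return any(first[a] != second[a] for a in ABNORMALITIES)


def reference_reading(first: Reading, second: Reading,
                      third: Optional[Reading] = None) -> Reading:
    """Build the reference interpretation from two or three readings.

    Assumption: agreement is evaluated separately for each abnormality.
    """
    if not needs_tie_breaker(first, second):
        return {a: first[a] for a in ABNORMALITIES}
    if third is None:
        raise ValueError("readers disagree; a third reading is required")
    # For a present/absent finding the third reader necessarily sides with
    # one of the first two, so the dissenting value is dropped.
    return {a: first[a] if first[a] == second[a] else third[a]
            for a in ABNORMALITIES}
```

Under these assumptions the third reading only affects the abnormalities on which the first two readers disagreed.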
The data collection forms required that information about hyperechoic and hypoechoic lesions be recorded for each of 16 white matter zones on each side seen on coronal imaging (Figure 1). The data collection did not require a diagnosis to accompany these lesions, nor were criteria provided for any diagnosis. Rather, the sonologist was free to apply the labels of early periventricular leucomalacia (PVL), cystic PVL, and PVHI as she/he felt appropriate (Figures 2–4). A fourth diagnostic label, “other white matter diagnosis,” was created to categorize scans that demonstrated white matter abnormalities with features that the reader felt were not consistent with PVL or PVHI. Multiple diagnoses were acceptable. For the purpose of analysis, we combined “other white matter diagnosis” with blank entries, to form the category “No-dx WMD.” This allowed us to identify all scans with hyperechoic and/or hypoechoic/“cystic” lesions that were not given a specific diagnosis.
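The pooling of labels into analysis categories can be illustrated with a short sketch. The function `analysis_category`, the string labels, and the fallback category "no white matter lesion" are assumptions made for illustration. The study's actual coding scheme is not described in this detail.

```python
from typing import Iterable

# The three specific diagnostic labels a reader could apply.
SPECIFIC_LABELS = {"early PVL", "cystic PVL", "PVHI"}


def analysis_category(labels: Iterable[str], has_lesion: bool) -> str:
    """Map one reader's labels for a set of scans to an analysis category.

    labels: diagnostic labels the reader applied (may be empty).
    has_lesion: whether any hyperechoic or hypoechoic lesion was recorded.
    """
    specific = sorted(SPECIFIC_LABELS.intersection(labels))
    if specific:
        # Multiple diagnoses were acceptable, so all of them are kept.
        return " + ".join(specific)
    if has_lesion:
        # "Other white matter diagnosis" and blank entries are pooled.
        return "No-dx WMD"
    return "no white matter lesion"  # hypothetical fallback category
```

For example, a reading with a recorded lesion and either a blank entry or only the "other" label falls into "No-dx WMD".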
The unit of measurement for some of the tables in this article is the number of study-readings. With 1,452 sets of scans read twice, the total sample is 2,904 study-readings. Each of the sonologist authors evaluated the sets of scans of more than 200 infants, representing a combination of studies from his/her own institution, and studies from other institutions.
Correlation with other neuro-imaging studies (CT and/or MRI) was infrequently available and therefore was not part of our study design, but for illustrative purposes is included in this report (Figures 2 and 3).
To assess reader variability in the diagnostic labeling of the white matter lesions and to investigate whether associated abnormalities and lesion characteristics influenced this variability, we evaluated the following hypotheses:
We did not calculate p values because we did not plan specific comparisons a priori. In the absence of a priori comparisons, p values can be calculated, but they are then merely tests that the contents of a table are not random. Consequently, a “significant p value” does not inform us about which of the many comparisons within the row by column array makes the p value significant. This problem is especially daunting when the tables list 16 locations by the four diagnostic options. Consequently, p values are not provided for any of these arrays.
In this section, we list each of the hypotheses by what was found, followed by support for the inference.
Of the 2,904 study-readings, 329 were given a diagnosis of early PVL, 134 a diagnosis of cystic PVL, 224 a diagnosis of PVHI, and 206 a diagnosis of “other white matter damage.” One sonologist never used the “early PVL” diagnosis; eight others used it in single-digit percentages, while three applied it to more than 25% of the sets of scans. No sonologist gave the diagnosis of cystic PVL to more than 9% of the studies he/she read.
Overall, the diagnosis of PVHI was applied to 8% of the scans. Nine of the 14 sonologists applied this diagnosis to between 8 and 11% of the sets of scans.
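As a quick arithmetic check, the label counts reported above can be converted into shares of all study-readings. The counts and the total come from the text; the helper `share_percent` and the dictionary keys are illustrative only.

```python
# Each of the 1,452 sets of scans was read twice.
STUDY_READINGS = 1452 * 2  # 2,904 study-readings

# Counts of study-readings given each label, as reported in the text.
label_counts = {
    "early PVL": 329,
    "cystic PVL": 134,
    "PVHI": 224,
    "other": 206,
}


def share_percent(count: int, total: int = STUDY_READINGS) -> float:
    """Percentage of study-readings given a label, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {label: share_percent(n) for label, n in label_counts.items()}
```

With these inputs the PVHI share works out to about 7.7%, in line with the roughly 8% reported.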
Five sonologists identified a white matter abnormality in 20% or less of the sets of scans. On the other hand, four others identified a white matter abnormality in 48% or more of the sets of scans.
Of all 329 sets of scans given a diagnosis of early PVL, 13% were also given a diagnosis of PVHI, which is in agreement with the 15% overall prevalence of PVHI scans in our sample, suggesting that these two diagnoses are used independently of each other. However, in the early PVL group, 14% were also given a diagnosis of cystic PVL, which is substantially higher than the 9% expected prevalence of cystic PVL, suggesting that these two diagnoses are related. Similarly, of all 134 sets of scans given a diagnosis of cystic PVL, 22% were also given a diagnosis of PVHI (Figure 3), which is higher than would be expected (15%) if these two diagnoses were unrelated. Since the observed distribution of diagnostic categories is highly unlikely to have occurred by chance, it appears that (at least in part) they are interrelated.
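The comparison in this paragraph can be expressed as a ratio of observed to expected co-occurrence. This ratio is not a statistic reported by the study; it is one simple way to read the percentages quoted above, with a value near 1 suggesting that two labels are applied independently. The function name and the dictionary keys are hypothetical.

```python
def observed_to_expected(observed_pct: float, expected_pct: float) -> float:
    """Ratio of the observed co-occurrence to the overall prevalence.

    A ratio above 1 means the second label appears more often alongside the
    first than its overall prevalence would predict if the two were unrelated.
    """
    return observed_pct / expected_pct


# (observed % within the first label, overall prevalence % of the second label),
# all taken from the percentages quoted in the text.
comparisons = {
    "early PVL -> PVHI": (13, 15),
    "early PVL -> cystic PVL": (14, 9),
    "cystic PVL -> PVHI": (22, 15),
}

ratios = {pair: round(observed_to_expected(obs, exp), 2)
          for pair, (obs, exp) in comparisons.items()}
```

The first ratio is slightly below 1, while the other two are roughly 1.5, which matches the qualitative reading given in the text.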
Of the sets of scans given a diagnosis of PVHI, 83% had blood in the lateral ventricles, with blood seen bilaterally in 60% of all of these sets of scans. One indicator of the magnitude of IVH in scans with a PVHI diagnosis is the relatively high frequency of blood in the third (38%) and fourth (14%) ventricles. These frequencies are three or more times higher than for early and cystic PVL.
Forty-six percent (14 + 32) of the sets of scans given a PVHI diagnosis had moderate/severe enlargement of the body of the lateral ventricle. In contrast, moderate/severe ventriculomegaly was seen on 26% of sets of scans given a diagnosis of cystic PVL.
Although some view PVL as a bilateral disorder, 32% (28 + 4) of scans given an early PVL diagnosis and 54% (19 + 36) of scans given a cystic PVL diagnosis had visible damage limited to one hemisphere (Figure 4).
In keeping with the view that PVHI tends to be a predominantly unilateral disorder, 77% of those who applied this diagnosis did so when the echogenic lesion was unilateral (Figures 2 and 4C, D). When involvement was bilateral, asymmetry as defined by more than three boxes difference between sides was reported for 38% of sets of scans given a PVHI diagnosis.
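The laterality and asymmetry definitions used here can be written as a minimal sketch, assuming each reading is summarized by the number of zones ("boxes") checked on the left and on the right side. The function names are hypothetical.

```python
def laterality(left_boxes: int, right_boxes: int) -> str:
    """Classify a reading by which sides have at least one zone checked."""
    if left_boxes and right_boxes:
        return "bilateral"
    if left_boxes or right_boxes:
        return "unilateral"
    return "none"


def is_asymmetric(left_boxes: int, right_boxes: int) -> bool:
    """Bilateral involvement with more than three boxes difference between sides."""
    return (laterality(left_boxes, right_boxes) == "bilateral"
            and abs(left_boxes - right_boxes) > 3)
```

Under this definition a reading with six boxes on one side and two on the other counts as asymmetric, whereas five versus two does not.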
The size/extent of hyperechoic and hypoechoic lesions (as estimated by the numbers of boxes checked on the data collection form) differs among scans with different diagnostic labels (data not shown).
For both early and cystic PVL, approximately 75% of scans with a hyperechoic lesion limited to one side had only one or two boxes checked off. On the other hand, 69% of PVHI-labeled scans with unilateral white matter damage had four or more boxes checked off. Among scans with bilateral white matter damage, those given a diagnosis of PVHI were more likely than scans given a diagnosis of early or cystic PVL to have hyperechoic lesions seen in at least six zones (49% versus 18% for early PVL and 28% for cystic PVL).
Fully 98% of scans that had a unilateral hypoechoic lesion and a diagnosis of early PVL had only one box checked off. On the other hand, almost 40% of all other scans with unilateral hypoechoic lesions had two or more boxes checked off.
These differences are less prominent when considering only scans with bilateral hypoechoic lesions. Approximately one third of scans given a diagnosis of cystic PVL or PVHI had six or more hypoechoic boxes checked off, whereas no set of scans given a diagnosis of early PVL had that many boxes checked off.
Scans given a PVHI diagnosis on the first protocol scan were more likely than other scans with a hyperechoic lesion to be located anteriorly (in zones 2, 6, 9, and 12 of Figure 1). This is expected, as the primary hemorrhage focus in PVHI is located anteriorly (in the germinal matrix).
The most prominent finding displayed in Figure 1 is how similar the locations of the hypoechoic lesions are in sets of scans given the three specific diagnoses and nondiagnostic white matter damage.
Our main finding is that experienced readers of cranial sonographic scans differ in their tendency to use accepted diagnostic terms. This probably reflects interobserver differences in applying criteria for a diagnosis. For example, although some view PVL as a bilateral disorder,17,18 54% of scans given a cystic PVL diagnosis had damage limited to one hemisphere. Similarly, although PVHI is viewed as a unilateral or at least prominently asymmetric disorder,4,8 only 38% of the scans with bilateral hyperechoic lesions given a PVHI diagnosis had appreciable asymmetry. Another example is that as many as 7% of scans given a PVHI diagnosis did not have an adjacent IVH. The frequent co-existence of IVH and ipsilateral PVHI4,8–11 has prompted some19 to view them as related (perhaps via the pressure IVH places on venous drainage), while others are less convinced of this causative mechanism.20,21 As a consequence, some require the presence of IVH graded 2 or higher as a condition to diagnose PVHI, whereas others do not.
The diagnosis that varied most among the ELGAN study readers is “early” PVL. Some have claimed that ultrasound is relatively insensitive for identifying this condition.22 Perhaps this insensitivity prompted some sonologists to lower their diagnostic threshold from more echogenic than the choroid plexus (the original criterion) to isoechogenic with the choroid. This is most likely to have happened when the perceived lesions were irregular in appearance, did not follow normal anatomic boundaries of the white matter (the normal periventricular halo), and/or were asymmetrical in distribution. In addition, our lack of standardization of equipment and scanning techniques across our study institutions might have contributed to this variability.
The variability we observed can be expected in light of evidence that radiologists, other clinicians, and pathologists can vary substantially in their tendency to use diagnostic labels.23–26 A number of studies deal with operator variability in the interpretation of cranial sonographic studies.1–3 Except for the identification of periventricular hyperechoic lesions, where interobserver variability is most prominent,3 our overall interobserver variability identifying cranial sonographic abnormalities is within the range seen with other medical recognition tasks.23–26 Our current study more specifically addresses operator variability as it pertains to a more global, “gestalt”-like application of descriptive criteria, as is done in formulating an imaging diagnosis. Diagnostic criteria were not included in the ELGAN study sonographic instruction manual. This reflects the intent of the designers of this study, who wanted to emphasize a descriptive rather than a diagnostic approach. Providing readers with the opportunity to apply conventional diagnostic labels allowed them to read the scan as they normally would, in addition to providing the descriptive details needed for the ELGAN study. Therefore, the observed diagnostic variability reflects readers' perception of what constitutes appropriate diagnostic criteria that remain clouded in controversy. We used experienced sonologists as readers, who would have developed their own schemata of image interpretation, which may not comply with established guidelines, and since we did not make any training efforts to accomplish diagnostic uniformity, the diagnostic variability that we found is not unexpected. We therefore think that if specific diagnostic criteria had been included in our manual, variability would have been reduced somewhat, but not completely eliminated.
We have the perception that clinicians at each of our institutions understand how the radiologists at that institution apply diagnostic labels. Consequently, the variability we describe probably has little import locally. The variability becomes important when comparing the experience at different institutions, especially for clinical research. Even though past attempts at grading white matter abnormalities and IVH have not worked well, we recommend that national organizations work toward better standardization of reporting, which should serve to enhance the validity of large multicenter studies such as the ELGAN study.
Clinical outcomes studies, including our own work, have shown that diagnostic labels do not perform better than simple descriptive characteristics to predict developmental dysfunctions.13–15,22,27–32 Diagnoses also do not perform better than clinical information to predict developmental dysfunctions.30 Because half the preterm children who develop cerebral palsy have no identifiable sonographic lesion,14 the limited value of sonographic diagnoses is likely in part due to limitations of early identification of cerebral white matter damage with ultrasound. In addition, we found that regardless of the initial diagnosis, the geographic distributions of hypoechoic lesions at around term are similar. Perhaps a common denominator underlies white matter damage, regardless of the initial sonographic presentation. Thus, it might be more important to focus on what these white matter diagnoses have in common, rather than what separates them.
An important limitation of our study is that our interpretations were based on consensus readings. We did not have the opportunity to perform MRI as an independent imaging reference standard.33 However, continuing clinical follow-up of the ELGAN cohort has provided us with a wealth of outcomes data with regard to neurodevelopment,13 late microcephaly,15 and cerebral palsy,14 which are all thought to be related to perinatally acquired brain lesions, whether or not detected with ultrasound. In future cohort studies, the addition of an MRI performed at near term (and MRI follow-up of any abnormalities noted) would be optimal to address these unresolved issues that are attributable to the lack of having an independent imaging gold standard available.
In summary, we have documented that experienced sonologists differ in their application of diagnostic labels to sonographic scans of extremely low gestational age newborns. They sometimes apply different labels to scans with similar findings, and sometimes the same label to scans with considerably different characteristics. Some imaging specialists seem more inclined to make specific diagnoses, whereas others prefer to be descriptive. This inconsistency of reading criteria can contribute to uncertainty about clinically useful prognostic information and has the potential to add noise to observational studies of large cohorts.
This study was supported by a cooperative agreement with the National Institute of Neurological Disorders and Stroke (5U01NS040069-05) and a program project grant from the National Institute of Child Health and Human Development (NIH-P30-HD-18655). The authors gratefully acknowledge the contributions of our subjects and their families, as well as those of our colleagues.
Additional Supporting Information may be found in the online version of this article.