Our main finding is that experienced readers of cranial sonographic scans differ in their tendency to use accepted diagnostic terms. This probably reflects interobserver differences in applying criteria for a diagnosis. For example, although some view PVL as a bilateral disorder,17,18
54% of scans given a cystic PVL diagnosis had damage limited to one hemisphere. Similarly, although PVHI is viewed as a unilateral or at least prominently asymmetric disorder,4,8
only 38% of the scans with bilateral hyperechoic lesions given a PVHI diagnosis had appreciable asymmetry. Another example is that as many as 7% of scans given a PVHI diagnosis did not have an adjacent IVH. The frequent co-existence of IVH and ipsilateral PVHI4,8–11
has prompted some19
to view them as related (perhaps via the pressure IVH places on venous drainage), while others are less convinced of this causative mechanism.20,21
As a consequence, some require the presence of IVH of grade 2 or higher as a condition for diagnosing PVHI, whereas others do not.
The diagnosis that varied most among the ELGAN study readers was “early” PVL. Some have claimed that ultrasound is relatively insensitive for identifying this condition.22
Perhaps this insensitivity prompted some sonologists to lower their diagnostic threshold from more echogenic than the choroid plexus (the original criterion) to isoechogenic with the choroid plexus. This is most likely to have happened when the perceived lesions were irregular in appearance, did not follow the normal anatomic boundaries of the white matter (the normal periventricular halo), and/or were asymmetrically distributed. In addition, the lack of standardization of equipment and scanning techniques across our study institutions might have contributed to this variability.
The variability we observed can be expected in light of evidence that radiologists, other clinicians, and pathologists can vary substantially in their tendency to use diagnostic labels.23–26
A number of studies deal with operator variability in the interpretation of cranial sonographic studies.1–3
Except for the identification of periventricular hyperechoic lesions, where interobserver variability is most prominent,3
our overall interobserver variability in identifying cranial sonographic abnormalities is within the range seen with other medical recognition tasks.23–26
Our current study more specifically addresses operator variability as it pertains to the global, “gestalt”-like application of descriptive criteria that occurs when formulating an imaging diagnosis. Diagnostic criteria were not included in the ELGAN study sonographic instruction manual. This reflects the intent of the study's designers, who wanted to emphasize a descriptive rather than a diagnostic approach. Giving readers the opportunity to apply conventional diagnostic labels allowed them to read each scan as they normally would, in addition to providing the descriptive details needed for the ELGAN study. The observed diagnostic variability therefore reflects readers' perceptions of what constitutes appropriate diagnostic criteria, which remain mired in controversy. Our readers were experienced sonologists who had developed their own schemata of image interpretation, which may not comply with established guidelines; because we made no training efforts to achieve diagnostic uniformity, the variability we found is not unexpected. We therefore think that if specific diagnostic criteria had been included in our manual, variability would have been reduced somewhat, but not completely eliminated.
We believe that clinicians at each of our institutions understand how the radiologists at that institution apply diagnostic labels. Consequently, the variability we describe probably has little import locally. It becomes important, however, when comparing the experience of different institutions, especially in clinical research. Even though past attempts at grading white matter abnormalities and IVH have not worked well, we recommend that national organizations work toward better standardization of reporting, which should enhance the validity of large multicenter studies such as the ELGAN study.
Clinical outcomes studies, including our own work, have shown that diagnostic labels perform no better than simple descriptive characteristics in predicting developmental dysfunctions.13–15,22,27–32
Diagnoses also perform no better than clinical information in predicting developmental dysfunctions.30
Because half the preterm children who develop cerebral palsy have no identifiable sonographic lesion,14
the limited value of sonographic diagnoses is likely due in part to limitations in the early identification of cerebral white matter damage with ultrasound. In addition, we found that, regardless of the initial diagnosis, the geographic distributions of hypoechoic lesions at around term are similar. Perhaps a common denominator underlies white matter damage, regardless of the initial sonographic presentation. Thus, it might be more important to focus on what these white matter diagnoses have in common rather than on what separates them.
An important limitation of our study is that our interpretations were based on consensus readings. We did not have the opportunity to perform MRI as an independent imaging reference standard.33
However, continuing clinical follow-up of the ELGAN cohort has provided us with a wealth of outcomes data on neurodevelopment13
and cerebral palsy,14
both of which are thought to be related to perinatally acquired brain lesions, whether or not these are detected with ultrasound. In future cohort studies, the addition of an MRI performed near term (with MRI follow-up of any abnormalities noted) would be optimal for addressing the unresolved issues that arise from the lack of an independent imaging gold standard.
In summary, we have documented that experienced sonologists differ in their application of diagnostic labels to sonographic scans of extremely low gestational age newborns. They sometimes apply different labels to scans with similar findings, and sometimes the same label to scans with considerably different characteristics. Some imaging specialists seem more inclined to make specific diagnoses, whereas others prefer to be descriptive. This inconsistency of reading criteria can contribute to uncertainty about clinically useful prognostic information and has the potential to add noise to observational studies of large cohorts.