In 1991, Marshall and associates proposed a CT classification system that identifies six groups of patients with TBI, based on the type and severity of abnormalities seen on CT (Marshall et al., 1991). To improve outcome prediction, Maas and colleagues developed the Rotterdam CT scoring system in 2005 by adding individual CT characteristics to the Marshall classification. These characteristics include IVH and traumatic SAH (tSAH), as well as a more detailed differentiation of mass lesions and basal cisterns (Maas et al., 2005). Adding tSAH improves the ability to stratify TBI patients, because this characteristic has been shown to be a strong independent predictor of outcome and mortality (Eisenberg et al., 1990; Greene et al., 1996; Kakarieka et al., 1994; Ono et al., 2001; Selladurai et al., 1992; Servadei et al., 2002). Furthermore, because studies have shown that patients with an epidural hematoma have a much better clinical prognosis than those with a subdural or intracerebral hematoma, the Rotterdam CT classification system distinguishes the type of mass lesion to provide additional discriminative value (Chestnut et al., 2000; Gennarelli et al., 1982). Both the Marshall and Rotterdam classification systems have been found to be predictive of clinical outcome, and are included in the international guidelines for clinical management and prognosis of TBI (Chestnut et al., 2000
The primary purpose of this study was to determine the agreement between independent observers of varying backgrounds when evaluating CT imaging features of TBI, including those used to categorize patients according to the Marshall and Rotterdam CT scoring systems. Our data show no significant difference between readers of varying backgrounds in assessing typical CT imaging features of TBI. Regardless of background, the reviewers achieved an average kappa coefficient corresponding to fair or better agreement for all CT characteristics. Furthermore, the diagnostic accuracy achieved by all five readers was consistently high and did not differ significantly across the various CT imaging features.
The Marshall score showed better reproducibility than the Rotterdam score. This is likely because the Marshall score is more of a summary score: its features are not independent of one another but interconnected. In contrast, each feature in the Rotterdam scoring system stands alone and is counted individually. This makes the Rotterdam score a more reliable predictor of outcome (Maas et al., 2005), but also makes it more prone to interobserver variability.
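As a rough illustration of why individually counted features can raise variability, the sketch below sums a Rotterdam-style score for two hypothetical readers who disagree on a single feature. The point values follow the commonly cited description of the score by Maas et al. (2005) and are not given in this text, so they should be checked against the original publication. The names `CTFindings` and `rotterdam_score` and both sets of readings are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed point values, as commonly reported for the Rotterdam CT score;
# verify against Maas et al. (2005) before any real use.
CISTERN_POINTS = {"normal": 0, "compressed": 1, "absent": 2}


@dataclass
class CTFindings:
    """One reader's interpretation of a single scan (hypothetical structure)."""
    basal_cisterns: str          # "normal", "compressed", or "absent"
    midline_shift_mm: float      # measured midline shift in millimetres
    epidural_mass_lesion: bool   # True if an epidural mass lesion is present
    ivh_or_tsah: bool            # True if IVH or tSAH is present


def rotterdam_score(f: CTFindings) -> int:
    """Sum the individually scored features; the total ranges from 1 to 6."""
    if f.basal_cisterns not in CISTERN_POINTS:
        raise ValueError(f"unknown cistern status: {f.basal_cisterns!r}")
    score = 1                                    # constant offset
    score += CISTERN_POINTS[f.basal_cisterns]    # 0, 1, or 2 points
    score += 1 if f.midline_shift_mm > 5 else 0  # shift > 5 mm adds a point
    score += 0 if f.epidural_mass_lesion else 1  # absence of epidural adds a point
    score += 1 if f.ivh_or_tsah else 0           # IVH or tSAH adds a point
    return score


# Two hypothetical readers who differ only on the basal cisterns.
reader_a = CTFindings("compressed", 3.0, False, True)
reader_b = CTFindings("normal", 3.0, False, True)

print(rotterdam_score(reader_a))  # 4
print(rotterdam_score(reader_b))  # 3
```

Because every feature contributes its own points, a disagreement on any single feature changes the total, which is one plausible mechanism for the lower reproducibility noted above.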
We acknowledge several limitations to our study. One limitation of kappa statistics is that they are influenced by prevalence (Viera and Garrett, 2005). The kappa coefficient compares the agreement actually observed with the agreement expected by chance alone. If the prevalence of a certain feature is low, readers are likely to agree on the absence of that feature just by chance, resulting in a lower kappa coefficient. Also, our study sample was limited in size.
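The prevalence effect on kappa described above can be illustrated with a minimal Python sketch of Cohen's kappa, restricted here to two readers and one binary feature. The counts are invented for illustration and are not data from this study, and the function names `cohen_kappa` and `ratings_from_counts` are assumptions. Both scenarios have 90% raw agreement, yet kappa is much lower when the feature is rare.

```python
import math


def cohen_kappa(a, b):
    """Cohen's kappa for two readers' binary ratings (1 = feature present)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    p_o = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    pa = sum(a) / n                               # reader A's rate of "present"
    pb = sum(b) / n                               # reader B's rate of "present"
    p_e = pa * pb + (1 - pa) * (1 - pb)           # agreement expected by chance
    if p_e == 1.0:
        return math.nan                           # kappa is undefined here
    return (p_o - p_e) / (1 - p_e)


def ratings_from_counts(both, a_only, b_only, neither):
    """Expand a 2x2 agreement table into two parallel rating lists."""
    a = [1] * both + [1] * a_only + [0] * b_only + [0] * neither
    b = [1] * both + [0] * a_only + [1] * b_only + [0] * neither
    return a, b


# Hypothetical counts for 100 scans; raw agreement is 90% in both cases.
common = ratings_from_counts(both=45, a_only=5, b_only=5, neither=45)
rare = ratings_from_counts(both=2, a_only=5, b_only=5, neither=88)

print(f"common feature: kappa = {cohen_kappa(*common):.2f}")  # kappa = 0.80
print(f"rare feature:   kappa = {cohen_kappa(*rare):.2f}")    # kappa = 0.23
```

In the rare-feature case, chance agreement on absence alone is already about 87%, so the same 90% observed agreement earns far less credit.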
This was a single-center study, and patterns of CT interpretation may have been influenced by the fact that the reviewers have worked together. Further validation across multiple institutions will be important in future studies.
In conclusion, interobserver reproducibility is very good among multiple readers with varying backgrounds when interpreting CT imaging features of TBI and when calculating the Marshall and Rotterdam scores. Thus, NCT can be considered a reliable tool for stratifying TBI patients into categories of varying severity and type of injury, and might be used to improve triage to treatments that more effectively target a patient's specific injuries.