J Neurotrauma. Feb 2010; 27(2): 325–330.
PMCID: PMC2834438
Interobserver Variability in the Assessment of CT Imaging Features of Traumatic Brain Injury
Kimberly A. Chun,1 Geoffrey T. Manley,2,3 Shirley I. Stiver,2,3 Ashley H. Aiken,1 Nicholas Phan,2,3 Vincent Wang,2,3 Michele Meeker,2,3 Su-Chun Cheng,4 A.D. Gean,1,3 and Max Wintermark1
1Department of Radiology, Neuroradiology Section, University of California–San Francisco, San Francisco, California.
2Department of Neurological Surgery, University of California–San Francisco, and San Francisco General Hospital, San Francisco, California.
3Brain and Spinal Injury Center, University of California–San Francisco, San Francisco, California.
4Department of Epidemiology and Biostatistics, University of California–San Francisco, San Francisco, California.
Address correspondence to: Max Wintermark, M.D., M.A.S., University of California–San Francisco, Department of Radiology, Neuroradiology Section, 505 Parnassus Avenue, Box 0628, San Francisco, CA 94143-0628. E-mail: Max.Wintermark/at/radiology.ucsf.edu
The goal of our study was to determine the interobserver variability among observers with different backgrounds and experience when interpreting computed tomography (CT) imaging features of traumatic brain injury (TBI). We retrospectively identified a consecutive series of 50 adult patients admitted to our institution with suspected TBI and a Glasgow Coma Scale score ≤12. Noncontrast CT (NCT) studies were anonymized and sent to five reviewers with different backgrounds and levels of experience, who independently reviewed each NCT scan. Each reviewer assessed multiple CT imaging features of TBI and assigned every NCT scan a Marshall and a Rotterdam grading score. The interobserver agreement and coefficient of variation were calculated for individual CT imaging features of TBI as well as for the two scores. Our results indicated that the imaging reviews by neuroradiologists and neurosurgeons were consistent with each other. The kappa coefficient of agreement for all CT characteristics showed no significant difference in interpretation between the neurosurgeons and neuroradiologists. The average Bland and Altman coefficients of variation for the Marshall and Rotterdam classification systems were 12.7% and 21.9%, respectively, which indicates acceptable agreement among all five reviewers. In conclusion, there is good interobserver reproducibility between neuroradiologists and neurosurgeons in the interpretation of CT imaging features of TBI and calculation of Marshall and Rotterdam scores.
Key words: accuracy, computed tomography, imaging scores, interobserver agreement, traumatic brain injury
Every year, traumatic brain injury (TBI) leaves more than 5 million people with significant disabilities that require lifelong medical care at an annual cost of more than $56 billion for the United States alone. As a consequence of the wars in Iraq and Afghanistan, about 20% of U.S. soldiers deployed to the Middle East are suffering from TBI, many of whom will require long-term medical care.
The identification and classification of TBI patients most likely to benefit from different types of medical and surgical treatment is a difficult challenge (Doppenberg et al., 1997; Marshall et al., 1992; Narayan et al., 2002). Currently, the Glasgow Coma Scale (GCS), a clinical scale that reflects level of consciousness, is the main tool for classification of patients in TBI clinical trials (Saatman et al., 2008). Although the GCS has been found to be extremely useful in the clinical management and prognosis of TBI, it grades only the severity of clinical symptoms and does not distinguish between different types of injury. For instance, a GCS score of 8 or less is typically classified as severe TBI, but different lesions on CT may be responsible for the patient's GCS score. It is difficult to effectively treat multiple types of injury based only on the clinical severity or GCS score (Saatman et al., 2008). Another limitation of the GCS score is that it is difficult to measure on admission because TBI patients are frequently intubated and/or sedated. Moreover, the GCS score has been shown to reliably predict mortality at 24 h after injury, but has only limited ability to predict longer-term outcome (Steyerberg et al., 2008).
Imaging characteristics from noncontrast head computed tomography (NCT) scans in TBI patients have become pivotal in the acute evaluation of TBI, and NCT scanning can identify the presence and extent of structural damage in these patients (Maas et al., 2007). CT imaging features of TBI have been combined into scores such as the Marshall (Marshall et al., 1991) and the Rotterdam scores (Maas et al., 2005). Marshall and Rotterdam scores determined at the time of admission to the hospital have been shown to correlate with long-term clinical outcome following TBI. By stratifying TBI patients into categories of varying severity and types of injuries, NCT imaging might allow improved triaging to appropriate treatments that more effectively target a patient's specific injuries (Borg et al., 2004).
An important issue that has been raised in the use of CT characteristics in the prediction of clinical outcome is how reliably different observers can interpret CT imaging features of TBI, when considered individually or combined into the Marshall and Rotterdam scores. In October 2007, the National Institute of Neurological Disorders and Stroke, together with the Brain Injury Association of America, the Defense and Veterans Brain Injury Center, and the National Institute of Disability and Rehabilitation Research, put forth a consensus opinion that the Marshall and Rotterdam classification systems needed additional validation of their reliability and standardization (Saatman et al., 2008).
The purpose of our study was to determine the interobserver variability between observers with different backgrounds and levels of experience when interpreting CT characteristics of TBI.
Study design
Imaging data obtained as part of standard clinical care at our institution were retrospectively reviewed with the approval of the institutional review board. At our institution, an NCT scan of the brain is a routine component of care for all patients with suspected TBI.
From the San Francisco General Hospital Trauma Registry, which identifies all trauma patients admitted to our hospital, we identified 50 consecutive patients with TBI and an admission GCS score ≤12 admitted between November 2004 and June 2005. Victims of penetrating trauma were excluded from this study. Patient charts were reviewed and demographic data were recorded.
Imaging reviews
The NCT studies were anonymized and sent to five reviewers with different backgrounds and levels of experience, who independently reviewed all 50 NCT scans. Two of the readers were neuroradiologists (A.A. and M.W.), with 7 and 11 years of experience, respectively. The remaining three readers were neurosurgeons (S.S., N.P., and V.W.), with 10, 12, and 4 years of experience, respectively.
Each reviewer evaluated the NCT exams for standard imaging features of TBI, including the presence or absence of skull fracture, epidural hematoma (EDH), subdural hematoma (SDH), subarachnoid hemorrhage (SAH), intraventricular hemorrhage (IVH), contusion, brain parenchymal hematoma, traumatic axonal injury (TAI), and mass effect manifested as a midline shift >5 mm or effacement/absence of the basal cisterns. The reviewers each assigned two numerical scores to every NCT scan based on the guidelines set by the Marshall and Rotterdam classification scales, as shown in Tables 1 and 2 (Maas et al., 2005, 2007). The midline shift was measured quantitatively. The intracranial hematoma volume was calculated by multiplying the area of hemorrhage on each slice by the slice thickness and summing the results over all slices.
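The slice-summation volume estimate described above can be sketched in a few lines of Python. This is only an illustration: the function name, the units (mm² for areas, mm for thickness), and the example values are assumptions and are not taken from the study.

```python
def hematoma_volume_ml(slice_areas_mm2, slice_thickness_mm):
    """Approximate lesion volume as the sum over slices of area x thickness."""
    volume_mm3 = sum(area * slice_thickness_mm for area in slice_areas_mm2)
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Hypothetical hemorrhage areas traced on four consecutive 5-mm slices.
areas = [120.0, 340.0, 410.0, 150.0]
print(f"{hematoma_volume_ml(areas, 5.0):.1f} mL")
```

Because each reader traces the hemorrhage area separately on every slice, small differences in tracing add up across slices.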
Table 1.
Marshall Computed Tomography (CT) Classification
Table 2.
Rotterdam Computed Tomography Classification
Statistical analysis
Sensitivity, specificity, positive and negative predictive values, and accuracy were calculated to assess the diagnostic performance of each reviewer for each of the CT features of TBI. The reference standard against which each reviewer's interpretation was gauged was the consensus opinion reached by the reviewers after they had completed their individual reviews. The prevalence of each individual CT feature of TBI was calculated and compared against the consensus interpretation.
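As a minimal sketch of these diagnostic performance measures, the following Python function compares one reader's binary calls (feature present or absent) with the consensus calls. The function name and the ten example cases are hypothetical, and this is not the software the authors used.

```python
def diagnostic_performance(reader_calls, consensus_calls):
    """Compare one reader's present/absent calls against the consensus."""
    pairs = list(zip(reader_calls, consensus_calls))
    tp = sum(1 for r, c in pairs if r and c)          # true positives
    tn = sum(1 for r, c in pairs if not r and not c)  # true negatives
    fp = sum(1 for r, c in pairs if r and not c)      # false positives
    fn = sum(1 for r, c in pairs if not r and c)      # false negatives

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical calls for ten scans (1 = feature present, 0 = absent).
reader = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
consensus = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
for name, value in diagnostic_performance(reader, consensus).items():
    print(f"{name}: {value:.3f}")
```

Because the consensus is derived from the same readers, these figures measure closeness to the group opinion and do not measure accuracy against an independent ground truth.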
The interobserver agreement was determined using Cohen's kappa statistics for qualitative features, and Bland and Altman's coefficient of variation for quantitative and ordinal features. To evaluate the overall agreement between reviewers in their interpretation of individual CT imaging features, the kappa coefficients for all pairs of readers were averaged and graded as follows: values below 0.2 indicate poor agreement; between 0.21 and 0.4, fair agreement; between 0.41 and 0.6, moderate agreement; between 0.61 and 0.8, good agreement; and >0.8, near perfect agreement (Altman, 1991).
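The pairwise kappa averaging and grading described above can be sketched as follows. The implementation uses the standard definition of Cohen's kappa (observed agreement corrected for chance agreement); the function names, reader labels, and ratings are hypothetical.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two readers rating the same cases."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    p_chance = sum((a.count(k) / n) * (b.count(k) / n) for k in categories)
    if p_chance == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def mean_pairwise_kappa(ratings_by_reader):
    """Average kappa over all pairs of readers."""
    pairs = list(combinations(ratings_by_reader.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def grade_kappa(kappa):
    """Grade agreement with the cutoffs quoted in the text (Altman, 1991)."""
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "near perfect"


# Hypothetical present/absent calls from three readers on ten scans.
ratings = {
    "reader_1": [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    "reader_2": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0],
    "reader_3": [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
}
average = mean_pairwise_kappa(ratings)
print(f"mean kappa = {average:.3f} ({grade_kappa(average)})")
```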
Bland and Altman's coefficient of variation for quantitative and ordinal measurements was calculated as the ratio of the absolute difference between the measurements made by two reviewers to the mean of those two measurements (Rosner, 1995). A coefficient of variation below 30% indicates acceptable reliability, and one above 30% indicates unacceptable variability (Rosner, 1995).
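One possible reading of this coefficient of variation is sketched below in Python. Two details are assumptions, not statements from the text: the ratio is computed case by case and then averaged over cases, and cases in which both readers measured zero are skipped because the ratio is undefined. The function names and scores are hypothetical.

```python
def coefficient_of_variation(reader_a, reader_b):
    """Mean of |a - b| / mean(a, b) over cases, expressed as a percentage."""
    ratios = []
    for x, y in zip(reader_a, reader_b):
        pair_mean = (x + y) / 2.0
        if pair_mean == 0:
            continue  # assumption: skip cases where both measurements are zero
        ratios.append(abs(x - y) / pair_mean)
    if not ratios:
        raise ValueError("no case with a nonzero mean measurement")
    return 100.0 * sum(ratios) / len(ratios)


def is_acceptable(cv_percent, threshold=30.0):
    """Apply the 30% cutoff quoted in the text (Rosner, 1995)."""
    return cv_percent < threshold


# Hypothetical Marshall scores assigned by two readers to five scans.
reader_a = [2, 3, 5, 2, 6]
reader_b = [2, 3, 6, 2, 5]
cv = coefficient_of_variation(reader_a, reader_b)
print(f"CV = {cv:.1f}% (acceptable: {is_acceptable(cv)})")
```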
Study patients
Fifty consecutive patients were retrospectively identified who met our inclusion criteria. The median age of the patients was 40 years, the interquartile range (IQR) was 32–51 years, and the range was 18–65 years. Thirty-two patients were males and 18 were females. The median admission GCS score was 8, the IQR was 5–10, and the range was 3–12.
By consensus, 23 patients (46.0%) were found to have a skull fracture, 2 patients (4.0%) had EDH, 23 patients (46.0%) had an SDH, 30 patients (60.0%) had SAH, 7 patients (14.0%) had IVH, 17 patients (33.3%) had contusions, 11 patients (22.0%) had TAI, 2 patients (3.9%) presented with parenchymal hematomas, and 21 patients (42.0%) displayed a midline shift >5 mm or had effacement of the basal cisterns.
Accuracy of reader interpretations of CT imaging features of TBI
The accuracy, sensitivity, specificity, and positive and negative predictive values for each reader in their ability to differentiate an abnormal from a normal CT in TBI patients are shown in Table 3. Table 4 further describes accuracy, sensitivity, specificity, and positive and negative predictive values, averaged over all readers, for detection of any CT abnormality, skull fracture, EDH, SDH, SAH, IVH, contusions, parenchymal hematomas, midline shift >5 mm, and patent basal cisterns. Results from this diagnostic performance analysis indicate that the imaging interpretations by neuroradiologists and neurosurgeons were consistent with each other and accurate. All the reviewers had comparable performance, with some more sensitive and others more specific. Sensitivity was highest for detection of the presence of any CT abnormality, IVH, and SAH. Contusions, parenchymal hematomas, and TAI were identified with the lowest rates of sensitivity. The readers demonstrated highest specificity in their evaluation of EDH, IVH, and parenchymal hematomas. These same CT features (EDH, IVH, and parenchymal hematomas) were also associated with the highest accuracy among the readers. There was no significant difference in terms of accuracy of the readers depending on the GCS (mean accuracy for detection of any CT abnormality: 92.3% in the group with GCS scores 0–8, and 91.2% in the group with GCS scores 9–12; p = 0.998).
Table 3.
Average Diagnostic Performance of All Five Reviewers When Evaluating for Individual CT Imaging Features of TBI Patients
Table 4.
Average Diagnostic Performance of All Five Reviewers When Evaluating for Individual CT Imaging Features of TBI Patients
Interobserver agreement
The average interobserver agreement (Cohen's kappa coefficient) between all readers for detection of any abnormality in the CT imaging of TBI was 0.568, indicating moderate agreement between all readers (Table 5). There was no significant difference in terms of kappa coefficient depending on the GCS (mean kappa coefficient for any abnormality: 0.547 in the group with GCS scores 0–8, 0.567 in the group with GCS scores 9–12; p = 0.392).
Table 5.
Interobserver Agreement (Cohen's Kappa Coefficients) between Pairs of Readers When Evaluating Any Abnormal CT Feature of TBI
The kappa coefficients between different pairs of reviewers for each CT feature of TBI are displayed in Table 6. The kappa coefficients ranged from 0.411 (Neuroradiologist #1 and Neurosurgeon #3) to 0.754 (Neurosurgeon #2 and Neuroradiologist #2). The kappa coefficient of agreement for detection of any CT abnormality demonstrated fair to good agreement between reviewers, with no significant difference between the neurosurgeons and neuroradiologists. Interobserver agreement was fair for detection of contusions and parenchymal hematomas; moderate for skull fractures, EDH, SDH, and TAI; and good agreement between reviewers was achieved for identification of SAH, IVH, midline shift >5 mm, and patent basal cisterns.
Table 6.
Average Interobserver Agreement (Cohen's Kappa Coefficients) between All Readers in Comparison to the Interobserver Agreement between Only Neuroradiologists, Only Neurosurgeons, and Neuroradiologists and Neurosurgeons When Evaluating for Each CT Feature
The average Bland and Altman coefficients of variation between all reviewers in their assignments of Marshall and Rotterdam classification scores were 12.7% and 21.9%, respectively (p < 0.001) (Table 7). This indicates acceptable agreement among all five reviewers for both the Marshall and Rotterdam scores.
Table 7.
Bland and Altman Coefficient of Variation between Pairs of Reviewers When Assessing the Marshall and Rotterdam Scores
The average Bland and Altman coefficients of variation between reviewers' calculations and measurements of EDH, SDH, parenchymal hematoma, and contusion volumes, as well as midline shift, are shown in Table 8. Acceptable agreement between all five readers was demonstrated for EDH volume, parenchymal hematoma volume, and midline shift measurements, with average Bland and Altman coefficients of 7.2%, 9.6%, and 28.2%, respectively, for these features. The average Bland and Altman coefficients of variation for SDH and contusion volumes were 51.0% and 51.4%, respectively, indicating poor agreement between the five readers for these measurements.
Table 8.
Bland and Altman Coefficients of Variation Averages Across All Reviewers for Marshall and Rotterdam Scores, As Well As for EDH Volume, SDH Volume, Parenchymal Hematoma Volume, Contusion Volume, and Midline Shift Measurement
In 1991, Marshall and associates proposed a CT classification system that identifies six different groups of patients with TBI, based on the type and severity of abnormalities seen on CT scan (Marshall et al., 1991). To improve outcome prediction, the Rotterdam CT scoring system was developed in 2005 by Maas and colleagues, and is based on adding individual CT characteristics to the Marshall classification. These individual CT characteristics include: IVH and traumatic SAH (tSAH), and a detailed differentiation of mass lesions and basal cisterns (Maas et al., 2005). The ability to stratify TBI patients is improved by adding tSAH, because this characteristic has been shown to be a strong independent predictor of outcome and mortality (Eisenberg et al., 1990; Greene et al., 1996, 1995; Kakarieka et al., 1994; Ono et al., 2001; Selladurai et al., 1992; Servadei et al., 2002). Furthermore, based on studies showing that patients with an epidural hematoma have a much better clinical prognosis than those with a subdural or intracerebral hematoma, the Rotterdam CT classification system distinguishes the type of mass lesion to provide additional discriminative value (Chestnut et al., 2000; Gennarelli et al., 1982). Both the Marshall and Rotterdam classification systems have been found to be predictive of clinical outcome, and are included in the international guidelines for clinical management and prognosis of TBI (Chestnut et al., 2000).
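To make the structure of the Rotterdam score concrete, the sketch below adds up its components in Python. Table 2 is not reproduced in this text, so the point values are an assumption based on the commonly cited scheme from Maas et al. (2005) and should be checked against that table. The function and parameter names are hypothetical.

```python
def rotterdam_score(basal_cisterns, midline_shift_mm,
                    epidural_mass_lesion, ivh_or_tsah):
    """Sum the Rotterdam CT components; totals range from 1 to 6.

    Point values are assumed from the commonly cited scheme
    (Maas et al., 2005), not read from Table 2 of this article.
    """
    cistern_points = {"normal": 0, "compressed": 1, "absent": 2}[basal_cisterns]
    shift_points = 1 if midline_shift_mm > 5 else 0
    epidural_points = 0 if epidural_mass_lesion else 1  # EDH carries a better prognosis
    blood_points = 1 if ivh_or_tsah else 0
    return cistern_points + shift_points + epidural_points + blood_points + 1


# Hypothetical scan: compressed cisterns, 7-mm shift, no EDH, tSAH present.
print(rotterdam_score("compressed", 7.0, False, True))
```

Each component is scored separately, so a disagreement between two readers on any single feature changes the total.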
The primary purpose of this study was to determine the agreement between independent observers of varying backgrounds when evaluating CT imaging features of TBI, including those used to categorize patients according to the Marshall and Rotterdam CT scoring systems. Our data show that there is no significant difference between multiple readers from varying backgrounds when assessing typical CT imaging features of TBI. Regardless of background, the reviewers scored an average kappa coefficient corresponding to an agreement of fair or better for all CT characteristics. Furthermore, the diagnostic accuracy achieved by all five readers was consistently high and did not significantly differ across the various CT imaging features.
The Marshall score showed better reproducibility than the Rotterdam score. This is likely because the Marshall score is more of a summary score. Indeed, features in the Marshall system are not independent of one another; rather, they are connected. In contrast, in the Rotterdam scoring system, each feature stands alone and is counted individually. This makes the Rotterdam score a better predictor of outcome (Maas et al., 2005), but also makes it more prone to interobserver variability.
We acknowledge several limitations of our study. A limitation of kappa statistics is that they are influenced by prevalence (Viera and Garrett, 2005). The kappa coefficient compares the agreement actually observed with the agreement expected by chance alone. If the prevalence of a certain feature is low, it is likely that different readers agree on the absence of that feature just by chance, resulting in a lower kappa coefficient. Also, our study sample was limited in size.
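The prevalence effect can be illustrated with two hypothetical 2 × 2 agreement tables for 50 scans. Both have the same raw agreement (46 of 50 cases, 92%), but the feature is common in one and rare in the other. The counts are invented for illustration and are not study data.

```python
def kappa_from_counts(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa from the four cells of a two-reader agreement table."""
    n = both_pos + a_only + b_only + both_neg
    p_observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n  # fraction called positive by reader A
    b_pos = (both_pos + b_only) / n  # fraction called positive by reader B
    p_chance = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical tables, each with 46/50 concordant readings.
common_feature = kappa_from_counts(both_pos=22, a_only=2, b_only=2, both_neg=24)
rare_feature = kappa_from_counts(both_pos=1, a_only=2, b_only=2, both_neg=45)
print(f"common feature: kappa = {common_feature:.2f}")
print(f"rare feature:   kappa = {rare_feature:.2f}")
```

With these counts, kappa is about 0.84 for the common feature and about 0.29 for the rare one. Chance agreement on absence is about 0.89 for the rare feature, which leaves little room for agreement beyond chance.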
This was a single-center study, and patterns of CT interpretation may be influenced by the fact that the reviewers have worked together. Further validation across multiple institutions would be important for future studies.
In conclusion, interobserver reproducibility is very good between multiple readers with varying backgrounds when interpreting CT imaging features of TBI, and in calculating the Marshall and Rotterdam scores. Thus NCT can be considered a reliable tool to stratify TBI patients into categories of varying severity and types of injuries, and might be used to improve triaging to appropriate treatments that more effectively target a patient's specific injuries.
Acknowledgments
Max Wintermark receives funding from the National Center for Research Resources (grant KL2 RR024130). The content of the article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Neurological Disorders and Stroke, the National Center for Research Resources, the National Institutes of Health, or the other sponsors.
Author Disclosure Statement
No competing financial interests exist.
  • Altman D.G. Practical Statistics for Medical Research. Chapman and Hall; London: 1991. pp. 404–408.
  • Borg J. Holm L. Cassidy J.D. Peloso P.M. Carroll L.J. von Holst H. Ericson K. Diagnostic procedures in mild traumatic brain injury: results of the WHO Collaborating Centre Task Force on Mild Traumatic Brain Injury. J. Rehabil. Med. 2004;(43 Suppl.):61–75.
  • Chestnut R. Ghajar J. Maas A.I. Management and prognosis of severe traumatic brain injury. Part 2: Early indicators of prognosis in severe traumatic brain injury. J. Neurotrauma. 2000;17:557–627.
  • Doppenberg E.M. Choi S.C. Bullock R. Clinical trials in traumatic brain injury. What can we learn from previous studies? Ann. N.Y. Acad. Sci. 1997;825:305–322.
  • Eisenberg H.M. Gary H.E., Jr. Aldrich E.F. Saydjari C. Turner B. Foulkes M.A. Jane J.A. Marmarou A. Marshall L.F. Young H.F. Initial CT findings in 753 patients with severe head injury. A report from the NIH Traumatic Coma Data Bank. J. Neurosurg. 1990;73:688–698.
  • Gennarelli T.A. Spielman G.M. Langfitt T.W. Gildenberg P.L. Harrington T. Jane J.A. Marshall L.F. Miller J.D. Pitts L.H. Influence of the type of intracranial lesion on outcome from severe head injury. J. Neurosurg. 1982;56:26–32.
  • Greene K.A. Jacobowitz R. Marciano F.F. Johnson B.A. Spetzler R.F. Harrington T.R. Impact of traumatic subarachnoid hemorrhage on outcome in nonpenetrating head injury. Part II: Relationship to clinical course and outcome variables during acute hospitalization. J. Trauma. 1996;41:964–971.
  • Greene K.A. Marciano F.F. Johnson B.A. Jacobowitz R. Spetzler R.F. Harrington T.R. Impact of traumatic subarachnoid hemorrhage on outcome in nonpenetrating head injury. Part I: A proposed computerized tomography grading scale. J. Neurosurg. 1995;83:445–452.
  • Kakarieka A. Braakman R. Schakel E.H. Clinical significance of the finding of subarachnoid blood on CT scan after head injury. Acta Neurochir. (Wien.) 1994;129:1–5.
  • Maas A.I. Hukkelhoven C.W. Marshall L.F. Steyerberg E.W. Prediction of outcome in traumatic brain injury with computed tomographic characteristics: a comparison between the computed tomographic classification and combinations of computed tomographic predictors. Neurosurgery. 2005;57:1173–1182. discussion 1173–1182.
  • Maas A.I. Steyerberg E.W. Butcher I. Dammers R. Lu J. Marmarou A. Mushkudiani N.A. McHugh G.S. Murray G.D. Prognostic value of computerized tomography scan characteristics in traumatic brain injury: results from the IMPACT study. J. Neurotrauma. 2007;24:303–314.
  • Marshall L.F. Eisenberg H. Jane J.A. Marshall S.B. Klauber M.R. A new classification of head injury based on computerized tomography. J. Neurosurg. 1991;75:S14–S20.
  • Marshall L.F. Marshall S.B. Klauber M.R. Van Berkum Clark M. Eisenberg H. Jane J.A. Luerssen T.G. Marmarou A. Foulkes M.A. The diagnosis of head injury requires a classification based on computed axial tomography. J. Neurotrauma. 1992;9(Suppl. 1):S287–S292.
  • Narayan R.K. Michel M.E. Ansell B. Baethmann A. Biegon A. Bracken M.B. Bullock M.R. Choi S.C. Clifton G.L. Contant C.F. Coplin W.M. Dietrich W.D. Ghajar J. Grady S.M. Grossman R.G. Hall E.D. Heetderks W. Hovda D.A. Jallo J. Katz R.L. Knoller N. Kochanek P.M. Maas A.I. Majde J. Marion D.W. Marmarou A. Marshall L.F. McIntosh T.K. Miller E. Mohberg N. Muizelaar J.P. Pitts L.H. Quinn P. Riesenfeld G. Robertson C.S. Strauss K.I. Teasdale G. Temkin N. Tuma R. Wade C. Walker M.D. Weinrich M. Whyte J. Wilberger J. Young A.B. Yurkewicz L. Clinical trials in head injury. J. Neurotrauma. 2002;19:503–557.
  • Ono J. Yamaura A. Kubota M. Okimura Y. Isobe K. Outcome prediction in severe head injury: analyses of clinical prognostic factors. J. Clin. Neurosci. 2001;8:120–123.
  • Rosner B. Fundamentals of Biostatistics. Duxbury Press; Belmont, CA; 1995. p. 327.
  • Saatman K.E. Duhaime A.C. Bullock R. Maas A.I. Valadka A. Manley G.T. Classification of traumatic brain injury for targeted therapies. J. Neurotrauma. 2008;25:719–738.
  • Selladurai B.M. Jayakumar R. Tan Y.Y. Low H.C. Outcome prediction in early management of severe head injury: an experience in Malaysia. Br. J. Neurosurg. 1992;6:549–557.
  • Servadei F. Murray G.D. Teasdale G.M. Dearden M. Iannotti F. Lapierre F. Maas A.J. Karimi A. Ohman J. Persson L. Stocchetti N. Trojanowski T. Unterberg A. Traumatic subarachnoid hemorrhage: demographic and clinical study of 750 patients from the European brain injury consortium survey of head injuries. Neurosurgery. 2002;50:261–267. discussion 267–269.
  • Steyerberg E.W. Mushkudiani N. Perel P. Butcher I. Lu J. McHugh G.S. Murray G.D. Marmarou A. Roberts I. Habbema J.D. Maas A.I. Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics. PLoS Med. 2008;5:e165. discussion e165.
  • Viera A.J. Garrett J.M. Understanding interobserver agreement: the kappa statistic. Fam. Med. 2005;37:360–363.