TraumaSCAN-Web (TSW) is a computerized decision support system for assessing chest and abdominal penetrating trauma which utilizes 3D geometric reasoning and a Bayesian network with subjective probabilities obtained from an expert. The goal of the present study is to determine whether a trauma risk prediction approach using a Bayesian network with a predefined structure and probabilities learned from penetrating trauma data is comparable in diagnostic accuracy to TSW.
Parameters for two Bayesian networks with expert-defined structures were learned from 637 gunshot and stab wound cases from three hospitals, and diagnostic accuracy was assessed using 10-fold cross validation. The first network included information on external wound locations, while the second network did not. Diagnostic accuracy of learned networks was compared to that of TSW on 194 previously evaluated cases.
Of the 24 conditions modeled by TraumaSCAN-Web, 23 could be evaluated. For the first network, 16 of these conditions had areas under the ROC curve (AUCs) greater than 0.90 and 21 had AUCs greater than 0.75. For the second network, 16 and 20 conditions had AUCs greater than 0.90 and 0.75, respectively. AUC results for the learned networks on the 194 previously evaluated cases were better than or equal to AUC results for TSW for all diagnoses evaluated except diaphragm and heart injuries.
For 23 of the 24 penetrating trauma conditions studied, a trauma diagnosis approach using Bayesian networks with predefined structure and probabilities learned from penetrating trauma data was better than or equal in diagnostic accuracy to TSW. In many cases, information on wound location in the first network did not significantly add to predictive accuracy. The study suggests that a decision support approach that uses parameter-learned Bayesian networks may be sufficient for assessing some penetrating trauma conditions.
Probabilistic methods for computerized diagnostic decision support have been demonstrated to be useful in addressing medical problems such as pneumonia diagnosis, mammographic diagnosis of breast cancer, lymph-node disease, abdominal pain, and hypercalcemia (1–5). In the penetrating trauma domain, approaches to providing computerized decision support have included rule-based systems(6), methods that incorporate 3D models of anatomy(7–10), and methods that utilize machine learning with neural networks(11).
The assessment of injury to vital organs as a result of penetrating trauma is a complex clinical problem that requires fusing knowledge of human anatomy, physiology, and patient signs and symptoms. Complicated spatial and physiologic relationships need to be assessed rapidly in the setting of incomplete clinical information. Simultaneous injury to multiple organ systems resulting in several conditions adds to the complexity of this task. Computerized decision support systems that encode the anatomic and physiological relationships necessary for penetrating trauma assessment may help to increase diagnostic accuracy during the early part of trauma resuscitation, before a patient is stabilized and definitive imaging studies can be performed. Well-established databases that provide detailed information linking external wound locations, organ injuries, detailed patient findings and injury outcomes for the penetrating trauma domain are not widely available. While the National Trauma Data Bank and similar trauma registries collate demographic information(12), Glasgow Coma scores, and diagnostic codes, they lack detailed information on patient findings (such as external wound locations, breath sounds, heart sounds, jugular vein distension, bullet locations from xray, hemoptysis, stridor, tenderness, guarding, rebound tenderness, ileus, hematuria, arm pulse strength, leg pulse strength, guaiac status) that facilitate deeper reasoning about trauma consequences. As a result, most of the existing computerized models for penetrating trauma decision support are not derived directly from such data registries.
Several studies have shown the utility of computerized decision support in predicting outcomes in the penetrating trauma domain. Hirshberg et al studied neural networks as a means of predicting the need for damage control in abdominal gunshot injuries(11). Their study shows that bullet trajectory information and blood pressure findings are important predictors of outcomes for patients with abdominal gunshot wounds. To reason about penetrating trauma for the Virtual Soldier project, Rubin et al at Stanford utilize patient-specific 3D models of anatomy from CT-data(9–10). They link these geometric models to the foundational model of anatomy(13), a comprehensive anatomical ontology developed at the University of Washington, Seattle, and also develop a reference ontology of regional perfusion for the heart in order to deduce injuries to anatomic structures that may not be visible in the segmented CT images. A related effort by Pouchard and Dickson also creates ontology-enriched visualization of human anatomy using the foundational model(14). The Stanford project is an impressive effort to utilize canonical knowledge to reason from first principles about the effects of penetrating trauma. As the authors state, their approach is a deterministic one that does not account for uncertainty associated with injury.
In other related works, Walczak utilized a back-propagation artificial neural network model to predict the blood transfusion requirements of trauma patients in an emergency room(15). Shoemaker et al applied a probabilistic decision support system to predict outcomes and the effects of therapy in 396 patients with severe thoracic and thoraco-abdominal injuries(16). Between 2002 and 2006, we created a detailed thoraco-abdominal penetrating trauma dataset containing gunshot and stab injury related information on 637 patients from three hospitals: Brigham & Women’s Hospital, Massachusetts General Hospital, and MCP-Hahnemann Hospital. In this paper, we compare Bayesian network machine learning approaches for penetrating trauma diagnosis derived from this dataset to TraumaSCAN-Web, which combines 3D geometric reasoning and expert-derived Bayesian networks (network structure and probabilities are specified by experts).
TraumaSCAN-Web (TSW) (7) is a computerized decision support system for assessing chest and abdominal penetrating trauma. It computes the probability of injury to anatomic structures and the probability of consequent conditions, such as pneumothoraces and hemothoraces, using geometric reasoning about anatomic structure involvement and probabilistic reasoning with Bayesian networks. Given the absence of detailed datasets for reasoning about penetrating trauma consequences at the time it was being developed, TSW’s Bayesian network was derived using subjective probabilities from a trauma expert (JRC).
The goal of the present study is to determine whether Bayesian networks with predefined structure and probabilities learned from penetrating trauma data are comparable in diagnostic accuracy to TSW, which utilizes 3D geometric reasoning and subjective probabilities obtained from an expert traumatologist. It is well known that accurate structure learning is difficult, especially in the presence of sparse/missing data (17). In Bayesian networks, the network structure is represented visually by a directed acyclic graph that encodes the causal and associational relationships among domain variables. This visual structure can be readily critiqued by experts, especially in domains where relationships are well-defined and documented in the literature. Often, in such domains, learned structures are assessed as flawed by experts: containing spurious connections that have no real-world analogue or missing connections between nodes that have a known association. Penetrating trauma is a domain in which the consequences of disruption to the normal functioning of organs and organ-systems by projectiles/stabbing are well-understood and documented in the literature. As demonstrated by the Stanford study previously referenced (9–10), using this knowledge, models that reason from first-principles about such disruptions have been developed. Since we had access to experts from this domain, we opted to take advantage of their deep knowledge in developing the network structure.
Although TSW showed good diagnostic accuracy results for the 80 gunshot and 114 stab cases previously evaluated (18), it has some limitations. To evaluate gunshot injuries, TSW’s geometric reasoner requires a pairing of entry and exit wounds or of entry wounds and bullets; thus, the sum of external wounds and bullets must be even. This means that TSW is of limited use when the locations of internal bullets are unknown. TSW was originally designed for use within an ED where access to radiological information about bullet locations is readily available. In order to adapt it for other settings, such as pre-hospital use, we investigate a machine learning approach for trauma diagnosis.
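The pairing requirement described above amounts to a parity check on the counts of external wounds and located bullets. The following is a minimal sketch of that check only; the function name and its inputs are hypothetical and are not taken from TSW.

```python
def pairing_is_possible(external_wounds: int, located_bullets: int) -> bool:
    """Necessary condition for pairing every wound with a wound or a bullet.

    Hypothetical helper, not TSW code: it only checks the parity rule
    stated in the text (external wounds + bullets must be even).
    """
    if external_wounds < 0 or located_bullets < 0:
        raise ValueError("counts must be non-negative")
    total = external_wounds + located_bullets
    return total > 0 and total % 2 == 0


# One entry wound with one located bullet satisfies the rule.
print(pairing_is_possible(1, 1))
# One entry wound with no located bullet does not.
print(pairing_is_possible(1, 0))
```

A case that fails this check is one for which the geometric reasoner cannot form complete wound paths, which is the situation that motivates the learned networks.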
Our hypothesis is that Bayesian networks with structures defined by an expert and probabilities learned from patient data can perform as well as or better than TraumaSCAN-Web using the area under the ROC curve as a measure of diagnostic discrimination.
TraumaSCAN-Web’s Bayesian network relates organ injury to consequent conditions such as a pneumothorax (collapsed lung) or hemothorax, and links these conditions to patient symptoms such as distended neck veins or muffled heart sounds (see Table 1 and Figure 1; the appendix maps Bayesian network node labels to explanations). The root node of the original Bayesian network (Hyp) has as its values the various hypotheses identified by TSW’s geometric reasoner. Diagnostic tests modeled in the Bayesian network reflect the variety of diagnostic tools utilized in trauma between 1988 and the present.
In order to capture information about how external wound locations are related to anatomic structure injury directly from the new trauma dataset, the original BN structure used in TSW was modified. The hypothesis node (Hyp) was replaced by nodes representing external wounds to the chest, back, flank and abdomen (Bayesian network 1; Figure 2). Abdominal injuries due to a wound penetrating the buttock or thigh region were classified as wounds to the left or right lower quadrant (LLQ or RLQ). The remaining nodes represent common, clinically used anatomical divisions. To limit the complexity of the Bayesian network by keeping the number of root nodes small, midline wounds were marked as both left and right of the affected structure, and wounds at the centre of the abdomen were similarly recorded to include all four abdominal quadrants. Other changes include node deletions and rearrangements that were made to accommodate the structure of the available data. These changes maintain the same causal relationships among the various organ injuries, conditions, test results, and signs and symptoms of the original Bayesian network.
A second Bayesian network was developed to investigate the impact of information about external wound locations on reasoning about penetrating trauma consequences (Bayesian Network 2; Figure 3). To accomplish this, external wound nodes were deleted from Bayesian Network 1, causing the various conditions themselves to be the root nodes of the network.
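The derivation of the second network from the first can be illustrated on a toy graph. The sketch below is not the study's network: the node names and edges are hypothetical and far fewer than in the real model. It only shows that deleting the wound nodes turns their children into root nodes.

```python
# Toy DAG stored as child -> list of parents. All node names are hypothetical.
network_1 = {
    "LeftChestWound": [],
    "RightChestWound": [],
    "LeftLungInjury": ["LeftChestWound"],
    "HeartInjury": ["LeftChestWound", "RightChestWound"],
    "LeftPneumothorax": ["LeftLungInjury"],
    "DecreasedLeftBreathSounds": ["LeftPneumothorax"],
    "MuffledHeartSounds": ["HeartInjury"],
}


def drop_nodes(parents, removed):
    """Return a copy of the graph without the removed nodes or edges from them."""
    removed = set(removed)
    return {
        node: [p for p in node_parents if p not in removed]
        for node, node_parents in parents.items()
        if node not in removed
    }


def root_nodes(parents):
    """Nodes with no parents, in alphabetical order."""
    return sorted(node for node, node_parents in parents.items() if not node_parents)


wound_nodes = [node for node in network_1 if node.endswith("Wound")]
network_2 = drop_nodes(network_1, wound_nodes)
print(root_nodes(network_1))
print(root_nodes(network_2))
```

In this toy graph the injury nodes that previously depended on wound locations become the roots, so after parameter learning they carry prior probabilities rather than probabilities conditioned on wound location.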
A total of 718 cases of penetrating trauma to the thoraco-abdominal region were collected from various sources. Of these, 471 cases were previously obtained from MCP Hospital (MCP) and Brigham & Women’s Hospital (BWH) and included the 194 cases used to previously evaluate TSW(18). This data was checked for accuracy. In 2006, an additional 210 cases were obtained from Massachusetts General Hospital (MGH) using relevant ICD9-CM diagnosis codes that have been previously described.
A number of data items were collected for each case from electronic discharge summaries, operative notes, and case wound diagrams with textual descriptions from chart reviews. A total of 71 clinical variables that were known at the time of initial assessment of the trauma patient by a physician were used to populate the Bayesian network. These largely included physical exam findings, laboratory results, and X-ray findings. Anatomic information about external wound locations was captured from case wound diagrams and used to populate the relevant nodes of the Bayesian network. Many of the data fields were missing (either because they were unknown or were simply not recorded) and were marked accordingly.
Of these data, a total of 81 cases inappropriate for the study were excluded for a variety of reasons. These include:
The modified Bayesian network structures were stored in the Bayesian Interchange Format for use with Weka, an open-source machine learning toolkit (19). Two distinct learning studies were performed. The first study involved parameter learning and assessing the performance of the modified Bayesian networks on penetrating trauma data from three hospitals. In the second study, we performed parameter learning by partitioning the data into training sets using data from three hospitals and test sets using data from two hospitals, in order to facilitate direct comparison with results from a previously published study.
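Parameter learning with a fixed structure reduces to estimating one conditional probability table per node from the case records. The sketch below illustrates the idea with smoothed frequency counts; it is not Weka's implementation. The variable names, the smoothing constant `alpha`, and the choice to skip records with a missing value in the node or its parents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product


def learn_cpt(records, node, parents, states, alpha=0.5):
    """Estimate P(node | parents) by smoothed frequency counts.

    records: list of dicts mapping variable name -> state (None = missing).
    states:  dict mapping variable name -> list of possible states.
    alpha:   additive smoothing constant (an assumed value).
    """
    counts = defaultdict(Counter)
    for record in records:
        values = [record.get(name) for name in [node, *parents]]
        if any(value is None for value in values):
            continue  # assumption: ignore records with missing values here
        counts[tuple(values[1:])][values[0]] += 1

    cpt = {}
    for config in product(*(states[p] for p in parents)):
        total = sum(counts[config].values()) + alpha * len(states[node])
        cpt[config] = {
            state: (counts[config][state] + alpha) / total
            for state in states[node]
        }
    return cpt


# Hypothetical two-node example: LungInjury -> Pneumothorax.
states = {"LungInjury": ["yes", "no"], "Pneumothorax": ["yes", "no"]}
records = [
    {"LungInjury": "yes", "Pneumothorax": "yes"},
    {"LungInjury": "yes", "Pneumothorax": "yes"},
    {"LungInjury": "yes", "Pneumothorax": "no"},
    {"LungInjury": "no", "Pneumothorax": "no"},
    {"LungInjury": "no", "Pneumothorax": None},  # missing finding
]
cpt = learn_cpt(records, "Pneumothorax", ["LungInjury"], states)
print(cpt[("yes",)])
```

A root node is handled by passing an empty parent list, in which case the table has a single entry keyed by the empty tuple and holds the node's prior.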
In the first study, we combined data collected from a new hospital (MGH) with existing data from two hospitals (BWH, MCP). Using the two modified Bayesian network structures, we learned prior or conditional probabilities for each network node. The resulting networks were used to evaluate each of the 24 diagnoses listed in Table 2 using 10-fold cross-validation. For the second study, we performed parameter learning and diagnostic accuracy estimation with a focus on evaluating gunshot and stab cases that had been previously assessed using TSW. Since TSW utilizes subjective probabilities supplied by an expert within its Bayesian network, this allows for a comparison of results derived in part from subjective probabilities with results from Bayesian networks that utilize parameter learning. To achieve this, we split the data into training and test sets: for gunshot injuries, data was split into 557 training cases and 80 test cases, the same 80 as were published in Matheny et al (18), and for stab injuries, data was split into 523 training cases and 114 test cases, the same 114 as published previously. Weka was used to learn parameters and the AUC for each diagnosis was calculated. Finally, pair-wise comparisons were made between the empiric models and TraumaSCAN-Web using the Hanley-McNeil test (20). A Bonferroni correction was applied to account for the multiple comparisons, giving significance thresholds of p<0.0031 and p<0.0028 for stab and gunshot cases, respectively.
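The significance thresholds and the pairwise AUC comparison can be sketched as follows. Several things here are assumptions rather than details from the study: a family-wise alpha of 0.05, 18 comparisons for gunshot cases (inferred from the reported threshold; 16 for stab cases matches the number of outcomes compared), the use of the Hanley-McNeil standard error formula, and the correlation `r` between the two AUCs, which defaults to 0 and therefore treats them as independent.

```python
import math


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC from the Hanley-McNeil approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def compare_aucs(auc_a, auc_b, n_pos, n_neg, r=0.0):
    """z statistic and two-sided p-value for two AUCs on the same test cases.

    r is the assumed correlation between the two AUC estimates; r = 0
    treats them as independent, which is conservative for paired data.
    """
    se_a = hanley_mcneil_se(auc_a, n_pos, n_neg)
    se_b = hanley_mcneil_se(auc_b, n_pos, n_neg)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)
    if se_diff == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (auc_a - auc_b) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


stab_threshold = bonferroni_threshold(0.05, 16)
gunshot_threshold = bonferroni_threshold(0.05, 18)  # 18 is inferred
print(stab_threshold, gunshot_threshold)

# Illustrative numbers only, not results from the study.
z, p = compare_aucs(0.95, 0.80, n_pos=20, n_neg=94)
print(z, p, p < stab_threshold)
```

Because every threshold is alpha divided by the number of comparisons, a difference has to be much larger to reach significance here than it would under an uncorrected 0.05 cutoff.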
The results from the 10-fold cross validation were used to calculate AUCs for the 24 possible diagnoses modeled by the two Bayesian networks (see Table 2). Esophageal injury could not be evaluated due to a lack of occurrences of that injury in the new dataset. For each condition, AUCs with the 95 percent confidence intervals (95% CI) are reported in Table 2. Discrimination estimates for thoracic injuries ranged from 0.953 to 0.994 for lung injuries, 0.767 for tracheobronchial tree injuries, and 0.543 to 0.989 for heart-related injuries. Injuries to the diaphragm and descending aorta ranged from 0.503 to 0.550 and 0.525 to 0.582, respectively. Non-specific intra-abdominal injuries were 0.917 while other intra-abdominal solid organ injury estimates ranged from 0.772 to 0.928. There were no significant differences in AUCs between the two networks, with the exception of heart injury, for which Network 1, which includes external wound locations, performed considerably better than Network 2, which does not.
Pairwise comparisons of the area under the ROC curve for stab injury data show that diagnostic accuracies of the learned models were greater than TSW for 3 out of 16 outcomes (p<0.0031 indicating statistical significance, based on the Bonferroni correction), namely right and left lung injuries and right hemothorax. TSW’s predictions for diaphragm and heart injuries were superior to the learned networks’ estimates; however, these differences were not statistically significant. The learned models performed similarly to or better than TSW for all other estimates, without statistically significant differences. Six conditions that did not occur in the previously collected dataset were not compared.
For gunshot injury data, the learned networks performed better than or equal to TSW for the majority of outcomes, while TSW had superior performance for diaphragm, heart and descending aorta injuries. However, none of these differences were statistically significant (threshold p<0.0028 based on the Bonferroni correction). Four conditions that did not occur in the previously collected dataset were not compared.
In this study, we demonstrate that a trauma diagnosis approach using Bayesian networks with predefined structures and probabilities learned from penetrating trauma data is comparable to, and in some instances better than, TraumaSCAN-Web in diagnostic accuracy based on areas under the ROC curve. The Bayesian networks performed very well in discriminating between the various thoracoabdominal injuries evaluated using 10-fold cross validation. For the 23 conditions evaluated, 16 conditions had AUCs over 0.90 while 21 conditions had AUCs over 0.75 for Network 1. For Network 2, 16 conditions had AUCs over 0.90 and 20 conditions had AUCs over 0.75. Lung and renal injuries were amongst the best estimations while diaphragm and descending aorta injury estimations were the poorest. Thoracic and retroperitoneal diagnoses generally outperformed abdominal diagnoses.
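The evaluation procedure can be sketched in two parts: assigning cases to ten folds, and computing an AUC from predicted probabilities and true labels. The sketch below is an illustration under assumptions, not the procedure used by Weka: the fold assignment is a simple interleaved split without stratification, and the AUC is computed from its pairwise (Mann-Whitney) definition. It also shows why a diagnosis with no positive cases, such as esophageal injury in this dataset, cannot be given an AUC.

```python
def k_fold_indices(n_cases, k=10):
    """Yield (train, test) index lists; fold i tests every k-th case from i."""
    for fold in range(k):
        test = list(range(fold, n_cases, k))
        train = [i for i in range(n_cases) if i % k != fold]
        yield train, test


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Returns None when either class is absent, since the AUC is undefined.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Fold sizes for 637 cases.
fold_sizes = [len(test) for _, test in k_fold_indices(637)]
print(fold_sizes)

# Illustrative scores only, not study data.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
print(auc([0.3, 0.4], [0, 0]))
```

With a single positive case in a test set, the AUC is determined entirely by where that one case ranks, which is why a perfect value on such a set says little about discrimination in general.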
The comparison of the learned Bayesian networks to TSW shows that the learned models significantly outperformed TSW consistently for lung injuries, including hemothoraces. Other thoracic and abdominal diagnoses were also more accurately discriminated by the learned models, but the differences were not statistically significant.
The predictive accuracy of Network 1 (the graph with nodes representing external wound locations) is generally very similar to Network 2 (the graph without external wound location nodes), suggesting that for many diagnoses, information about patient signs, symptoms, and test results is sufficient for reasoning about penetrating trauma consequences. Diagnoses for which there appears to be a significant advantage to having external wound location information are heart injury, and to a lesser extent, descending aorta injury (see Table 2). No broad conclusions on diagnostic accuracy can be drawn from results in Tables 3 that show a perfect AUC – for many of these cases, there was just one instance of a particular type of injury in the test set.
Heart and diaphragm injuries are two diagnoses that were discriminated better by TraumaSCAN-Web than the learned Bayesian network models for stab or gunshot wounds. Diaphragm injury was significantly better discriminated by TSW for gunshot injury cases, although the improvement in discrimination was not statistically significant for stab injury cases. Accurately diagnosing diaphragmatic injury is generally considered to be difficult given that patients may remain asymptomatic with this type of injury(21–25). Furthermore, imaging studies (CT scans, in particular) often miss diaphragmatic rupture (26). While magnetic resonance imaging is shown to be more accurate (27), this imaging modality has less utility in acute trauma settings. It is therefore not surprising that a machine learning approach that relies only on clinical features and some imaging investigations (specifically x-rays for diaphragmatic rupture) and did not utilize any anatomical model would perform worse for this type of injury.
There were a number of limitations of the data collected that resulted in some missing data and may therefore have reduced the predictive accuracy of the learned network. The data used for machine learning was collected from three different hospitals in two states, recorded by a large number of health care professionals, and collected by three different individuals. This potentially introduces sources of error and inconsistency. First, there is an inherent variability in how different physicians record medical information on discharge summaries and case wound diagrams. Some physicians may not explicitly mention negative signs and symptoms, even if they are pertinent negatives and the patient was examined for them, assuming that the absence of a record can safely imply that the sign or symptom is negative.
Cases of extensive trauma, especially those that resulted in the patient’s death, were another source of missing data. For example, in cases where the heart was hit, there is often little explicit information on less pertinent matters like weakened peripheral pulses. Some of these cases had such extensive missing data (almost nothing recorded beyond the fact that the patient expired) that they were inappropriate for the study and had to be excluded. Thus the learned models could not benefit from some of these interesting cases. This experience shows that there are instances in which expert physicians can significantly contribute to the knowledge base based on the knowledge they have acquired from handling such cases.
Several of the trauma cases involved injury to other parts of the body in addition to the thoracoabdomen. While this did not affect most of the signs and symptoms, some cases inevitably introduced “noise” in the data as a result of this. For example, a case that involved a gunshot wound to the back and the wrist resulted in a weak right radial pulse, causing an incorrect association between a back wound and radial pulses in the absence of the contextual knowledge of a gunshot wound to the wrist.
A distinct advantage of the new Bayesian network models over TraumaSCAN-Web is that they can handle cases in which the internal locations of any bullets are not yet known. This is important because the 3D modeling approach used by TSW requires pairing up entry and exit wounds or entry wounds with bullets (the sum of external wounds and bullets must therefore be even), excluding cases where external wounds are known but the internal locations of bullets are not. The ability to make relatively accurate penetrating trauma predictions using just those patient findings that can be obtained without in-hospital tests (e.g., decreased breath sounds, muffled heart sounds, distended neck veins, etc.) is a key step for creating a decision support tool that could be used by Emergency Medical Technicians (EMTs) in a pre-hospital setting. Of course, the accuracy of a Bayesian network-only solution depends a great deal on the number of patient findings provided, while TSW’s does not. However, the results obtained using Network 2 suggest that predictive power may still be adequate for certain traumatic conditions in the absence of information on external wound location. If these results are replicated in further studies, they will increase the number of viable strategies that can be employed for developing a pre-hospital decision support solution.
An advantage of TSW is that it provides greater control in specifying wound locations and can more accurately specify the 3D space traversed by the wound. This not only avoids the lack of spatial specificity encountered in the machine learning approach, but also overcomes the difficulty of describing whether a wound is superficial or deep (and how this affects the probability of organ injury). Furthermore, by providing a wound diagram for the user to mark, TSW avoids the fuzzy boundaries of the anatomical structures that were encountered during data collection.
In this study, we have shown that for many trauma diagnoses, Bayesian Networks with expert-defined structures and parameter learning performed as well as or better than an approach that combines 3D geometric reasoning with a Bayesian network derived from subjective expert probabilities. To the extent that this helps to address the problem of requiring wound-wound or bullet-wound pairing in TSW, these results demonstrate a potential solution for penetrating trauma decision support in the pre-hospital setting, which we will investigate further. A parameter learning approach to Bayesian networks is promising in the domain of penetrating trauma injury and future studies are warranted to explore the combination of expert knowledge with knowledge learned from data in other domains of medical decision support.
This study was supported by the National Library of Medicine under grants 1R01LM07167 and 1T15LM07092, and the CREMS Summer Program Award at the University of Toronto.