National benchmarking initiatives such as the National Surgical Quality Improvement Program (NSQIP) have highlighted the need for accurate risk stratification in both surgical and trauma patients.11
Historically, the usefulness of severity scores has been in their ability to stratify patients for research and prognostic applications, but they now have implications for stratifying outcomes for both reimbursement and credentialing purposes. In short, the stakes for accurate stratification have been raised. A possible implication of the increased precision needed in risk stratification is the very real likelihood that there is no “one size fits all” score, and this concept may be particularly true for the trauma population. Even within the trauma population, specific subgroups may exist that require their own severity scoring for accurate risk stratification. Our objective was to evaluate the ability of commonly used trauma scores, along with a more general ICU scoring system (APACHE II), to accurately predict death in critically injured patients requiring extensive care in an ICU.
In this cohort of critically injured patients requiring greater than 48 hours of ICU care, APACHE II was the superior score in predicting mortality. This difference undoubtedly lies in the greater incorporation of physiologic and biochemical data into APACHE II. While TRISS has improved predictive accuracy over the strictly anatomic ISS, it still falls short of the predictive accuracy of APACHE II in this cohort. Others have demonstrated similar findings in this subset of trauma patients. In 691 helicopter-transported injured patients, APACHE II was a good predictor of mortality, with an AUROC larger than that of ISS or the Trauma Score (TS).23 These data contradict previous reports that declared APACHE II to be invalid in the trauma patient.18 McKenea et al. have previously reported a very low correlation between APACHE II and length of stay, and also noted that APACHE II did not correlate well with either ISS or TRISS. While these results were interpreted as an invalidation of APACHE II in trauma patients, it is important to note that the primary purpose of that study was to evaluate whether APACHE II could accurately predict resource allocation, not clinical outcomes. In fact, the ability of APACHE II to predict clinical outcomes such as death was not reported. It is not surprising that APACHE II does not correlate well with hospital length of stay (LOS) in a linear model, since patients with very low APACHE II scores (early discharge) and very high APACHE II scores (early death) will have a short hospital LOS, while patients with intermediate APACHE II scores will have a longer hospital LOS. The poor correlation between all of the severity scores and length of stay in this study is consistent with this relationship.
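The argument about length of stay can be illustrated with a small numerical sketch. The values below are hypothetical toy data, not data from this study: length of stay is assumed to peak at intermediate scores and to fall toward both extremes. Under that assumption, the linear (Pearson) correlation between score and length of stay is essentially zero, even though length of stay is fully determined by the score.

```python
from math import fsum
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = fsum((x - mx) ** 2 for x in xs)
    var_y = fsum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical severity scores (assumption: evenly spread from 0 to 40).
scores = list(range(0, 41, 5))

# Hypothetical inverted-U length of stay in days: short at both extremes
# (early discharge or early death), longest at intermediate scores.
los_days = [2 + 0.05 * (s * (40 - s)) for s in scores]

# Linear correlation between score and length of stay.
r_linear = pearson_r(scores, los_days)

# Correlation with squared distance from the mid-range score, which
# captures the curved relationship that the linear model misses.
distance_sq = [(s - 20) ** 2 for s in scores]
r_curved = pearson_r(distance_sq, los_days)

print(f"linear r = {r_linear:.3f}, curved r = {r_curved:.3f}")
```

In this toy example the near-zero linear correlation reflects the shape of the relationship and not an absence of association, which is why a low correlation with length of stay says little about how well a score predicts death.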
Evaluation of the multiple score subcomponents reveals important information regarding the physiologic and biochemical parameters that are useful for severity scoring in this population. Traumatic brain injury (TBI) is clearly an important contributor to death in the trauma patient, and the subcomponent scores that account for head injury (GCS and AIS head or neck) were both independently associated with mortality. The bimodal nature of the relationship between GCS and AIS head and subsequent mortality is likely due to the requirement for 48 hours of ICU care for study inclusion. Multiple-trauma patients with GCS scores >10 who required ICU admission for at least 48 hours are likely to have severe chest, abdomen or extremity injury with associated exsanguination. Patients with less severe chest or abdominal trauma in combination with a lower GCS representing even a mild head injury will also require greater than 48 hours of ICU care. In comparing these patient populations, both requiring greater than 48 hours in the ICU, multiple bodily injuries in the absence of TBI may confer a higher risk of death than a mild brain injury with less severe bodily injury, which would account for the second peak in the relationship between GCS and AIS head and mortality. It is expected that patients with a severe TBI (GCS <5) will require admission to the ICU regardless of other injuries and will have a higher likelihood of subsequent mortality.
Interestingly, the lowest hemoglobin within the first 24 hours was not associated with mortality, but this could be explained by the aggressive use of blood product transfusions. Variables that account for blood product transfusions may improve the accuracy of severity scoring in trauma. Unlike medical ICU patients, trauma patients infrequently present with derangements in common blood chemistries (serum sodium, potassium and creatinine). However, when these derangements are present, they are associated with mortality. Other physiologic parameters that are used as predictors of mortality in trauma patients (lactate, blood glucose, mixed venous oxygen saturation) are notably missing from general ICU scoring systems such as APACHE II.
The strengths of this study include its prospective nature, large sample size, and evaluation of several severity scores and their subcomponents. To our knowledge, this represents the largest analysis of scoring systems in critically injured patients using data collected prospectively by dedicated research personnel. Despite these strengths, there are several important limitations. One limitation is that patients who died or were discharged from the ICU prior to 48 hours were not a part of the original study cohort. It is unclear whether patients who die within 48 hours, or those who are not ill enough to require 48 hours of ICU care, differ from those in this cohort in a way that would invalidate our conclusions. In patients who die early, ISS may be artificially low if not all injuries are discovered prior to death. Regardless of this uncertainty in the ability to generalize these data, we believe they are important for two reasons:
- The difficulty of outcome prediction, and the area where severity scores have the most value, lies with patients who are neither so severely injured that they die within 48 hours nor admitted to the ICU only for short observational stays. For this reason, it is the performance of the various scores in these “in-between” patients that is of interest. We believe patients remaining in the ICU for at least 48 hours accurately represent this population.
- Besides outcome prediction, severity scores are used for risk stratification during research. This is perhaps their most important use. As with most research protocols, our original study did not include “all-comers”. Most ICU-based research protocols restrict study enrollment to some minimal illness severity and exclude patients who are predicted to die within a short period of time. In other words, this cohort is exactly the population to whom these scores are most commonly applied. These data suggest that studies of critically ill trauma patients should use APACHE II as the most accurate assessment of risk. This suggestion holds irrespective of the ability of ISS or TRISS to accurately predict the outcome of “all-comers.”
A second limitation is the variability in the frequency of data collection within our patient population. More frequent parameter checks would lead to more accurate APACHE II scores. Additionally, TRISS and ISS are by definition determined only from data collected at admission and do not reflect a patient’s condition over 24 hours.
Using APACHE II as a tool to assess quality of care or physician performance has some noted limitations. By definition, the APACHE II score includes the most deranged physiologic values from the first 24 hours of hospital admission. Physicians or ICUs providing poor care could be inappropriately rewarded if a patient’s condition worsened severely in the first 24 hours leading to a higher APACHE II score. This is a valid grievance against the use of APACHE II as a national quality evaluation tool, but the complaint assumes that poor medical care would be the only cause of a decline in the patient’s health in the first 24 hours. Underlying medical conditions as well as demographic factors can lead to a delayed physiologic response to an anatomic insult. In these circumstances, APACHE II would be a more accurate representation of the patient’s clinical status than either ISS (anatomic only) or TRISS (physiologic variables collected at admission). The differences in TRISS and APACHE II could be mostly explained by this difference in timing of collection, but regardless, the evolving physiologic response to injury in the first 24 hours is noted to be an important determinant of outcome. Therefore, this study supports APACHE II as an example of a physiologic severity score with applicability to critically ill trauma patients, but not as the ideal score for a national benchmarking tool.
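The difference between an admission-based value and the most deranged value in the first 24 hours can be made concrete with a minimal sketch for a single variable. The patient readings are hypothetical, the function names are introduced here only for illustration, and the heart-rate point bands follow the APACHE II table as it is commonly reproduced; they should be checked against the original publication before any real use. The full score combines many such variables with age and chronic health points, none of which is modeled here.

```python
def heart_rate_points(bpm):
    """Points for one heart-rate reading (illustrative bands; verify before use)."""
    if bpm >= 180 or bpm <= 39:
        return 4
    if bpm >= 140 or bpm <= 54:
        return 3
    if bpm >= 110 or bpm <= 69:
        return 2
    return 0  # 70-109 beats/min is treated as the normal band


def admission_points(readings):
    """Score only the earliest reading, as an admission-based score would."""
    _, first_bpm = min(readings, key=lambda r: r[0])
    return heart_rate_points(first_bpm)


def worst_24h_points(readings):
    """Score the most deranged reading taken within the first 24 hours."""
    return max(heart_rate_points(bpm) for hours, bpm in readings if hours <= 24)


# Hypothetical patient: (hours since admission, heart rate in beats/min).
readings = [(0, 96), (6, 118), (14, 146), (22, 104), (30, 185)]

# The same patient with vital signs recorded only at 0 and 22 hours.
sparse_readings = [r for r in readings if r[0] in (0, 22)]

print("admission:", admission_points(readings))
print("worst in 24 h:", worst_24h_points(readings))
print("worst in 24 h, sparse sampling:", worst_24h_points(sparse_readings))
```

In this sketch the admission reading falls in the normal band while the worst reading within 24 hours does not, and the sparsely sampled record misses the deterioration entirely. This mirrors both the timing difference between TRISS and APACHE II and the dependence of APACHE II on the frequency of data collection.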
In conclusion, this study adds to the available literature regarding severity scoring in critically injured patients. It demonstrates that general ICU scoring systems with a strong physiologic basis, such as APACHE II, are beneficial in trauma patients requiring greater than 48 hours of intensive care; in fact, they may be better predictors of clinical outcomes than traditional trauma scores. As benchmarking initiatives become more prevalent, the need for accurate risk stratification will become more important. These initiatives should not focus on “one size fits all” scores, but should instead search for population- and subpopulation-specific scores that allow for the most accurate risk stratification. Further research should focus on the development of a scoring system specifically for critically injured patients, which would ideally be easy to use and widely adopted.