To systematically review studies evaluating the performance of Sequential Organ Failure Assessment (SOFA)-based models for predicting mortality in patients in the intensive care unit (ICU).
Medline, EMBASE and other databases were searched for English-language articles whose primary objective was to evaluate the prognostic performance of SOFA-based models in predicting mortality in surgical and/or medical ICU admissions. The quality of each study was assessed using a quality framework for prognostic models.
Eighteen articles met all inclusion criteria. The studies differed widely in the SOFA derivatives used and in their methods of evaluation. Ten studies reported developing a probabilistic prognostic model, only five of which used an independent validation data set. The other studies used the SOFA-based score directly to discriminate between survivors and non-survivors without fitting a probabilistic model. In five of the six studies comparing SOFA-based models at admission with admission-based models, the Acute Physiology and Chronic Health Evaluation (APACHE) II/III models were reported to have slightly better discrimination than the SOFA-based models (the area under the receiver operating characteristic curve (AUC) of SOFA-based models ranged from 0.61 to 0.88), and in one study a SOFA model had a higher AUC than the Simplified Acute Physiology Score (SAPS) II model. Four of these studies used the Hosmer-Lemeshow test to assess calibration, and none reported a lack of fit for the SOFA models. Eleven studies described models based on sequential SOFA scores, including the maximum SOFA score and the sum of the maximum scores of the individual SOFA components (AUC range: 0.69 to 0.92), and delta SOFA (AUC range: 0.51 to 0.83). Studies comparing SOFA with other organ failure scores did not consistently show one scoring system to be superior to another. Four studies combined SOFA-based derivatives with admission severity-of-illness scores, and all reported improved predictions for the combination. Study quality scores ranged from 11.5 to 19.5 points on a 20-point scale.
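The SOFA derivatives compared in these studies can be illustrated with a short sketch. It assumes that daily SOFA component scores are available as six integers (0 to 4) per ICU day, in the order respiratory, coagulation, liver, cardiovascular, central nervous system and renal, and computes admission SOFA, maximum SOFA, the sum of each component's maximum over the stay, and delta SOFA. Delta SOFA is taken here as maximum SOFA minus admission SOFA; this is one common definition and an assumption, since the reviewed studies used several variants. The `auc` helper measures discrimination directly from a raw score, as in the studies that did not fit a probabilistic model, by counting concordant non-survivor/survivor pairs (ties count one half). All function names are hypothetical.

```python
from typing import Sequence

# Assumed order of the six SOFA organ-system components (each scored 0-4 per day).
ORGANS = ("respiratory", "coagulation", "liver", "cardiovascular", "cns", "renal")

# One inner sequence per ICU day, holding the six component scores in ORGANS order.
DailyComponents = Sequence[Sequence[int]]


def _validate(days: DailyComponents) -> None:
    """Reject empty stays and days without six component scores in 0..4."""
    if not days:
        raise ValueError("need at least one day of SOFA scores")
    for day in days:
        if len(day) != len(ORGANS) or any(not 0 <= s <= 4 for s in day):
            raise ValueError("each day needs six component scores in 0..4")


def admission_sofa(days: DailyComponents) -> int:
    """Total SOFA score on the first ICU day."""
    _validate(days)
    return sum(days[0])


def max_sofa(days: DailyComponents) -> int:
    """Highest daily total SOFA score during the stay."""
    _validate(days)
    return max(sum(day) for day in days)


def total_max_sofa(days: DailyComponents) -> int:
    """Sum over organs of each component's worst (maximum) score during the stay."""
    _validate(days)
    return sum(max(day[i] for day in days) for i in range(len(ORGANS)))


def delta_sofa(days: DailyComponents) -> int:
    """Assumed definition: maximum SOFA minus admission SOFA."""
    return max_sofa(days) - admission_sofa(days)


def auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """Probability that a random non-survivor scores higher than a random survivor (ties count 0.5)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one survivor and one non-survivor")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    stay = [[3, 1, 0, 3, 0, 1], [2, 2, 1, 4, 1, 2], [1, 1, 2, 2, 0, 1]]
    print(admission_sofa(stay), max_sofa(stay), total_max_sofa(stay), delta_sofa(stay))
    print(auc([1, 2, 2, 3], [False, True, False, True]))
```

For the three-day example stay, admission SOFA is 8, maximum SOFA is 12, the component-maximum sum is 14 and delta SOFA is 4. The component-maximum sum exceeds maximum SOFA because different organs reach their worst score on different days, which is why the two derivatives can rank patients differently.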
Models based on SOFA scores at admission performed only slightly worse than APACHE II/III models and were competitive with SAPS II models in predicting mortality in patients in general medical and/or surgical ICUs. Models based on sequential SOFA scores seem to perform comparably to other organ failure scores. Combining sequential SOFA derivatives with APACHE II/III or SAPS II models clearly improved prognostic performance compared with either model alone. Because of the heterogeneity of the studies, no general conclusions can be drawn about the optimal mathematical model or the optimal SOFA derivatives. Future studies should use a standard evaluation methodology with a standard set of outcome measures covering discrimination, calibration and accuracy.
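Discrimination is commonly summarized by the AUC; calibration and accuracy can be computed from a model's predicted death probabilities and the observed outcomes, as in the minimal sketch below. It uses the Brier score as the accuracy measure and a Hosmer-Lemeshow goodness-of-fit statistic over groups of roughly equal size formed after sorting patients by predicted risk, with g - 2 degrees of freedom as is usual when the model was fitted on the same data. Equal-sized risk groups are one common variant, not necessarily the one used in the reviewed studies, and the function names are hypothetical. To stay within the standard library, the p-value uses the closed-form chi-square tail, which is valid only for an even number of degrees of freedom, so `groups` must be even; predicted probabilities are assumed to lie strictly between 0 and 1.

```python
import math
from typing import Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and observed outcome (1 = died); lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and p-value over equal-sized groups of sorted predicted risk."""
    if len(probs) < groups:
        raise ValueError("need at least one patient per group")
    if any(not 0.0 < p < 1.0 for p in probs):
        raise ValueError("predicted probabilities must lie strictly between 0 and 1")
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in this risk group
        observed = sum(y for _, y in chunk)  # observed deaths in this risk group
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


if __name__ == "__main__":
    print(brier_score([0.2, 0.8], [0, 1]))
    print(hosmer_lemeshow([0.5] * 8, [0, 1] * 4, groups=4))
```

In the second example, eight patients all receive a predicted risk of 0.5 and each of the four groups contains exactly one death, so observed and expected deaths agree, the statistic is 0 and the p-value is 1. A small p-value would instead indicate lack of fit, which is what the calibration analyses summarized above tested for.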