To determine if a prediction rule for hospital mortality using dynamic variables in response to treatment of hypotension in patients with sepsis performs better than current models
Retrospective cohort study
All intensive care units at a tertiary care hospital
Adult patients admitted to intensive care units between 2001 and 2007 of whom 2,113 met inclusion criteria and had sufficient data
We developed a prediction algorithm for hospital mortality in patients with sepsis and hypotension requiring medical intervention using data from the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database. We extracted 189 candidate variables, including treatments, physiologic variables and laboratory values collected before, during and after a hypotensive episode. Thirty predictors were identified using a genetic algorithm on a training set (n=1500), and validated with a logistic regression model on an independent validation set (n=613). The final prediction algorithm included dynamic information and had good discrimination (AUC = 82.0%) and calibration (Hosmer-Lemeshow C statistic = 10.43, p=0.06). This model was compared to APACHE IV using reclassification indices and was found to be superior with an NRI of 0.19 (p<0.001) and an IDI of 0.09 (p<0.001).
Hospital mortality prediction based on dynamic variables surrounding a hypotensive event is a new approach to predicting prognosis. A model using these variables has good discrimination and calibration, and offers additional prognostic information beyond established models.
A number of scoring systems have been developed for use in critically ill patients to determine disease severity and predict mortality. Commonly used outcome prediction scores include the Acute Physiology and Chronic Health Evaluation (APACHE) scores (1), Simplified Acute Physiology Scores (SAPS) (2), and the Mortality Probability Models (MPM)(3). These are based on a combination of variables that reflect pre-existing health as well as variables that reflect physiologic derangement due to acute illness. The first two systems rely on the worst physiologic variables collected within 24 hours of intensive care unit (ICU) admission (4). While these scoring systems have the potential to inform prognosis and resource allocation retrospectively at a cohort level in the ICU (5), their use has been mostly restricted to clinical trials (6), for case-mix determination in retrospective data analyses (7) and benchmarking ICU performance (8). This is largely due to the observation that these scoring systems perform well in predicting outcome at the group level, but continue to perform poorly when predicting survival in individual patients.
There are a number of reasons for the limited predictive ability of current systems. Pertinent causal factors such as genetic factors may be excluded (4). Recently, it has been shown that identification of “worst” values over a day by clinicians is biased (9), which may partially contribute to lower prediction performance than should be possible. In addition to this, scores used to benchmark ICU performance can only use information that is not influenced by local practice (admission and following 24 hours) and therefore do not benefit from the potential prognostic value of later observations.
Additional factors that are currently not well understood may also exist. Recent research has emphasized the importance of early goal directed therapy in reducing mortality from septic shock (10), increasing the need for accurate early warning systems. Changes in the physiologic variables measured within hours of this early critical period may be more predictive of outcome than focusing on the worst values measured within the day after ICU admission. While current scoring systems such as APACHE (1, 11, 12) try to capture the severity of the initial insult, they are likely limited in their ability to capture the “physiologic reserve” of a patient to respond to this insult because they tend to focus on the worst recorded value over 24 hours, and not on the variability in an individual’s immediate response to the physiologic insult (for example hypotension) and its treatment. Additionally, current severity scoring systems have been developed using a knowledge-driven approach where predictors are chosen based on known clinical variables associated with poor outcome. Recent developments in the field of genetic epidemiology (the study of the role of genetic factors in determining health in populations) have demonstrated that study designs which use a heuristic and data-driven approach to select predictors (13) have the potential to discover new causal factors of disease.
Unfortunately, most ICU databases lack sufficient information to fully characterize critical events such as the development of hypotension in sepsis. These variables are captured in MIMIC II (14), an open-access ICU research database that contains highly granular data including minute-by-minute changes in hemodynamic and other physiologic data as well as time-stamped treatments and their dosage, e.g. fluids, blood products, medications. Databases such as these can offer an extremely large number of potential predictive variables, and dimensionality reduction procedures such as genetic algorithms (15) should be used to select candidate variables for predictive modeling.
In this study, we set out to determine if dynamic variables that change with the onset and treatment of hypotension in septic shock patients can provide prognostic information for mortality beyond the standard variables used in current severity scoring systems. Variables used for inclusion in our final prediction rule were selected via a combined heuristic and automated approach, with the latter approach employing a genetic algorithm, in order to discover new predictors of mortality.
MIMIC II is an open-access research database that encompasses 32,075 patients (in version 2.6) admitted to the medical and surgical intensive care units at the Beth Israel Deaconess Medical Center (BIDMC, Boston, MA) since 2002 (14) and is freely available on PhysioNet (16). Institutional Review Board (IRB) approval was obtained from both the Massachusetts Institute of Technology (MIT) and BIDMC for the development, maintenance and public use of a de-identified ICU database. This database contains high-temporal resolution data including lab results, electronic documentation, bedside monitor trends and waveforms.
Using the MIMIC II database, we identified 6,970 patients (21.7%) that matched the definition of sepsis and severe sepsis proposed by Angus et al. (17) of whom 2,155 (6.7%):
42 patients (2.7%) with more than 50% missing data were excluded from further study leaving 2,113 patients in the final dataset (Figure 1).
For each patient record, we extracted at admission or over three different time windows (before, during and after the onset of the hypotensive episode) the following available variables from the database, (Figure 2):
Clinically meaningful non-linear transforms of raw physiological variables were derived: the PaO2/FiO2 ratio (mmHg/torr), heart rate to systolic blood pressure ratio (bpm/mmHg) (also known as the “shock index” (22, 23)) and the BUN to creatinine ratio. Variables known to follow an exponential distribution such as urine output, time from admission to hypotensive episode, length of hypotensive episode and SpO2, were log-transformed. For variables typically sampled at a rate of more than one per day, the minimum, median and maximum values were extracted for each time window (before, during and after the hypotensive episode). The standard deviation was also computed for hemodynamic variables, which have higher temporal resolution. Finally, the algebraic difference between “post” and “pre” measurements was computed resulting in a total of 179 variables. Observations outside a physiologically feasible range were excluded. Missing values were imputed by the mean over the training set (see below).
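The derived variables described above can be illustrated with a minimal sketch. It is not the authors' extraction code: the function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator was used), and the log transform assumes strictly positive inputs.

```python
import math
from statistics import median, pstdev

def summarize_window(values):
    """Min, median, max and standard deviation of one variable in one time window."""
    return {
        "min": min(values),
        "median": median(values),
        "max": max(values),
        "std": pstdev(values),  # assumption: population SD; the paper does not specify
    }

def shock_index(heart_rate, systolic_bp):
    """Heart rate to systolic blood pressure ratio (bpm/mmHg)."""
    return heart_rate / systolic_bp

def log_transform(value):
    """Natural log for skewed variables; assumes value > 0."""
    return math.log(value)

def delta(post, pre):
    """Algebraic difference between the 'post' and 'pre' measurements."""
    return post - pre

def impute_with_training_mean(train_column, column):
    """Replace missing entries (None) with the mean observed in the training set."""
    observed = [v for v in train_column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]
```

Computing the imputation mean from the training column only, and then applying it to any other column, keeps information from the validation set out of the preprocessing step.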
The dataset was split into a training set with the first 1,500 patients (ordered by a randomly allocated ICU identification number) and a validation set with the last 613 patients (29.0%). The training set was used to select variables and train the model while the validation set was kept for external validation of performance.
Given the large number of potential predictors available, care must be taken to prevent overfitting of the model to the training observations. As a general rule, the maximum number of predictors to include in a model should be no greater than the number of events (i.e. deaths) in a sample divided by ten (24). With 1,500 training samples and a mortality rate in this sample of 30%, up to 40 variables could potentially be included in the final model. Trying all possible combinations of 40 (or fewer) variables from a total of 179 potential variables is computationally prohibitive. Therefore, we used a genetic algorithm (GA) to find the best combination of variables to be included in our model.
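The arithmetic behind this argument can be checked directly. The short sketch below uses only the figures quoted in the text (1,500 training patients, roughly 30% mortality, 179 candidate variables); the variable names are illustrative. Applied literally, the events-per-ten rule gives 45, so the figure of 40 is read here as a slightly more conservative cap, which is an assumption of this sketch.

```python
import math

n_train = 1500        # training patients (from the text)
mortality = 0.30      # approximate training-set mortality (from the text)
n_candidates = 179    # candidate variables (from the text)

events = round(n_train * mortality)   # number of deaths in the training set
rule_of_ten_cap = events // 10        # upper bound from the events/10 rule

# Number of distinct subsets of exactly 40 variables out of 179.
subsets_of_40 = math.comb(n_candidates, 40)

print(events, rule_of_ten_cap)
print(f"subsets of size 40: about 10^{len(str(subsets_of_40)) - 1}")
```

Even restricted to subsets of exactly 40 variables, the search space is far too large to enumerate, which motivates a heuristic search.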
A GA is a search heuristic that mimics the mechanisms of DNA replication and natural selection. It was applied on the training set (n=1,500) to identify the optimal combination of variables to be included in our model. In the first iteration of the algorithm, different subsets of all potential predictors are randomly generated and the performance of each is estimated. At each iteration, the subsets of potential predictors showing the best performance are recombined to generate new subsets of the potential predictors. This process is repeated until the performance stops progressing or when the maximum number of iterations has been reached; this evolutionary process therefore selects the most adapted set of predictors with respect to the given performance (15, 25, 26). The GA has been successfully applied to variable selection (27) and in particular on biomedical datasets (28). Technical aspects of the GA are further described in the Supplemental Digital Content 1, where a link to the open-source code developed for this work is also provided.
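The loop described above can be sketched generically as follows. This is not the authors' implementation (theirs is described in the Supplemental Digital Content): the truncation selection, one-point crossover, bit-flip mutation, parameter defaults and the toy fitness function are all assumptions chosen for illustration. In the study, the fitness would instead be a cross-validated performance estimate of a model built on the selected variables.

```python
import random

def run_ga(n_features, fitness, pop_size=30, n_generations=40,
           mutation_rate=0.02, patience=10, seed=0):
    """Search for a boolean feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    # First iteration: random subsets of all potential predictors.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best, best_score, stale = None, float("-inf"), 0
    for _ in range(n_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        top_score = fitness(ranked[0])
        if top_score > best_score:
            best, best_score, stale = ranked[0][:], top_score, 0
        else:
            stale += 1
            if stale >= patience:   # performance stopped progressing
                break
        # Keep the better half as parents and recombine them.
        parents = ranked[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)          # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [(not gene) if rng.random() < mutation_rate else gene
                     for gene in child]                  # bit-flip mutation
            children.append(child)
        population = children
    return best, best_score

# Toy fitness (hypothetical): reward three "informative" features,
# lightly penalize every extra feature selected.
INFORMATIVE = {1, 4, 7}

def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    extras = sum(mask) - hits
    return hits - 0.1 * extras

best_mask, best_score = run_ga(12, toy_fitness)
selected = [i for i, keep in enumerate(best_mask) if keep]
print(selected, best_score)
```

Because a single run is stochastic, repeating it with different seeds and counting how often each feature is selected is one way to obtain a stable ranking of predictors.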
Parameters that are selected most often with the GA in the training set were subsequently used to fit a multivariate logistic regression model to predict hospital mortality, and model performance (29, 30) was subsequently evaluated in a completely independent test set (n=613). The area under the receiver operating characteristic curve (31) was estimated using the Wilcoxon statistic (32). Model calibration was assessed by calculation of the Hosmer-Lemeshow C-statistic (33) and calibration plots are provided as recommended by Kramer et al. (34) and can be found in the appendix.
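For readers unfamiliar with these two measures, the sketch below shows one common way to compute them. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the AUC is computed as the Wilcoxon (Mann-Whitney) probability that a non-survivor receives a higher predicted risk than a survivor, with ties counted as one half, and the Hosmer-Lemeshow statistic uses equal-sized groups of sorted predicted risk (ten by default).

```python
def auc_wilcoxon(labels, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def hosmer_lemeshow(labels, probs, n_groups=10):
    """Hosmer-Lemeshow C statistic over groups of increasing predicted risk."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        observed = sum(y for _, y in group)      # observed deaths
        expected = sum(p for p, _ in group)      # expected deaths
        variance = expected * (1 - expected / len(group))
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat
```

The resulting statistic is conventionally compared with a chi-square distribution to obtain a p-value; a large p-value indicates no evidence of miscalibration.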
The baseline predicted mortality was obtained with APACHE-IV (1) in the test set. Measures of statistical significance for the difference with baseline predicted mortality were computed with a conservative comparison of AUCs derived from the same cases (35). Recently, investigators have suggested that a more useful comparative metric of model performance is risk reclassification (36). Therefore, we calculated the Net Reclassification Improvement (NRI) (37, 38), which measures the ability of a new model to reclassify a high risk individual as higher risk, and a low risk individual as lower risk, for our model as compared to APACHE-IV. Finally, the Integrated Discrimination Improvement (IDI) (36), which takes into account the overall joint improvement in sensitivity and specificity of the new model, was also computed in comparison to APACHE-IV.
We also compared our model to traditional severity scores such as SAPS-I, APS, SOFA, APACHE-III and IV and the Van Walraven co-morbidity score. The Van Walraven score is a modification of the Elixhauser comorbidity score (20), equivalent to the Charlson score (39, 40), which provides a weighting for 30 comorbidities: it is a validated and easily obtainable proxy for comorbid conditions. Scores designed to be measured at admission such as APACHE-IV were computed only at admission, while others were also evaluated for the day following the hypotensive episode. Finally, the Complete Septic Shock Score (CSSS) (41), which is a severity score for septic shock patients based on APACHE-III variables (11), was computed.
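The two reclassification measures can be written down compactly. The sketch below uses the category-free ("continuous") form of the NRI and the standard definition of the IDI as the difference in mean predicted-risk gain between patients who died and patients who survived. The function names and the toy data are hypothetical, and this is not the authors' code.

```python
def continuous_nri(labels, p_old, p_new):
    """Net proportion of events moved up plus net proportion of non-events moved down."""
    def net_up(outcome):
        diffs = [new - old for y, old, new in zip(labels, p_old, p_new)
                 if y == outcome]
        up = sum(d > 0 for d in diffs)
        down = sum(d < 0 for d in diffs)
        return (up - down) / len(diffs)
    return net_up(1) - net_up(0)

def idi(labels, p_old, p_new):
    """Mean risk gain among events minus mean risk gain among non-events."""
    def mean_gain(outcome):
        gains = [new - old for y, old, new in zip(labels, p_old, p_new)
                 if y == outcome]
        return sum(gains) / len(gains)
    return mean_gain(1) - mean_gain(0)

# Toy example (made-up numbers): two deaths (1) and two survivors (0).
labels = [1, 1, 0, 0]
p_old = [0.5, 0.5, 0.5, 0.5]
p_new = [0.75, 0.25, 0.25, 0.25]
print(continuous_nri(labels, p_old, p_new), idi(labels, p_old, p_new))
```

In this form the NRI ranges from -2 to 2, with positive values indicating that the new model moves predicted risks in the correct direction more often than in the wrong one.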
A total of 2,155 patients met the inclusion criteria of sepsis or severe sepsis with documented hypotension requiring medical intervention, of whom 2,113 had enough data. The demographic and clinical characteristics of these patients are shown in Table 1. Median age was 70.3 (57.2–80.3) years and the overall in-hospital mortality was 28.6%. The median amount of crystalloid administered during the hypotensive event was 1.85 L (0.9 – 3.7). Vasopressors were administered to 1,107 patients (52.4%) during this event. A total of 1,486 patients (70.3%) were mechanically ventilated, and 249 (11.8%) underwent renal replacement therapy over the time period considered.
The genetic algorithm was run 500 times and the most frequently selected variables identified by the genetic algorithm were selected for inclusion in the final model (see Table 2). The model dimensionality was set by the GA as explained in the Supplemental Digital Content 1.
Our final model had good discrimination with an AUC of 82.0% as well as good calibration with a Hosmer-Lemeshow C statistic of 10.4 (p=0.064). The performance of our model was compared to multiple other models for mortality prediction in Table 3. When using the AUC as a performance metric, our model had the best performance with a statistically significant (p<0.001) improvement of 12.4 percentage points in AUC over APACHE IV, which had an AUC of 69.6%. The NRI based on continuous measures was 0.19 (p<0.001), indicating that on average, 19% of subjects had their hospital mortality predictions from APACHE IV accurately reclassified with our model designed with a genetic algorithm. Similarly, the IDI was 0.09 (p<0.001) indicating that an aggregate measure of sensitivity and specificity was superior for our model when compared to APACHE IV.
In this study, we took a novel approach to the development of a hospital mortality prediction algorithm by focusing on dynamic variables surrounding a hypotensive event in patients with sepsis and hypotension. Additionally, we used a combined heuristic and algorithm-driven approach to variable selection. When compared to mortality predictions from APACHE IV, our model had a significantly higher AUC and superior risk reclassification. Direct comparison of our results against APACHE-IV is however not straightforward. First, in terms of discriminative power, APACHE-IV was designed for benchmarking and does not use values recorded after the first day of admission that potentially contain discriminative information, whereas our approach does. Second, in terms of calibration, APACHE-IV does not benefit from a re-calibration on our data since chronic health conditions and admission were only extracted for patients in our validation set. Third, without comparing the performance of our model against APACHE IV in an external cohort, we cannot accurately say that our model outperforms APACHE IV under all circumstances, since the ability of a model to discriminate and calibrate decreases when applied to new populations (42). Thus it is not entirely unexpected that our GA-based model outperforms APACHE IV when evaluated using our test set, since our training set was drawn from the same population as the test set. However, the robust performance of our model in predicting hospital mortality is notable, and this new approach to predictive modeling in sepsis using dynamic information in conjunction with a heuristic search algorithm such as the GA is promising.
We believe that focusing on dynamic variables surrounding a hypotensive event allows us to capture the individual variation in the response to both a physiologic insult as well as the response to treatment. Current prognostic scoring systems often predict similar outcomes for patients with the same comorbidities, severity of physiologic injury, and degree of organ dysfunction. In clinical practice, there is often wide inter-individual variability in outcome even when subjects fall within the same risk strata according to these scoring systems. This may be because an important predictor of outcome, the individual’s physiologic reserve (43), has not been captured in these scoring systems. Physiologic reserve may account for the difference in clinical outcome that two patients with identical mortality risks (as traditionally defined by age, severity of illness and co-morbidities) and treatment may have. Bion places a large emphasis on the importance of cellular processes in response to stress and oxygen delivery as the major determinant of this physiologic reserve, which is thought to vary between patients because of genetic differences (44).
Prior studies have attempted to measure aspects of the physiologic reserve. For example, Vallet et al demonstrated that in a uniform population of patients with sepsis and normal lactate levels, survivors have an increase in oxygen delivery in response to dobutamine (45); this finding was subsequently validated by Rhodes et al (46). Identification of subjects with relative adrenal insufficiency with the corticotropin stimulation test may capture another aspect of the physiologic reserve (47). The physiologic reserve is likely to be dependent on the complex interplay between an individual’s genetic background (48) and the physiologic insult. We suspect that after controlling for comorbidities, severity of insult and treatment, the dynamic variables surrounding a hypotensive event allow us to determine the contribution of an individual’s physiologic reserve to prognosis, thus allowing better individual (as opposed to group) predictions of hospital mortality in patients with septic shock.
Dynamic information was included in the model in two ways: inclusion of the “delta” variables (the difference between value after the hypotensive episode and before it) for chloride, GCS, creatinine and PaO2; and the selection of a variable at two different time windows as for SOFA, INR, temperature and SpO2 standard deviation, which altogether accounted for nearly half the variables in the model (46.7%). Variables after the hypotensive episode, while closer in time to the hypotensive episode and therefore believed to have a greater predictive power, only summed up to a third of the selected features. Finally, variables before the hypotensive episode (including at admission) represented half the model’s features.
Interestingly, while the genetic algorithm selected previously known predictors of mortality such as age, urine output, shock index, SOFA score and comorbid conditions as measured by the Van Walraven score (21), we also identified the change in serum chloride levels spanning the 24 hour interval before and after the hypotensive event as a significant predictor of hospital mortality. While this finding is interesting, at this point the association of changes in chloride in response to hypotension and sepsis remains speculative and is currently being investigated within our research group.
Strengths of our study include the novel focus on the dynamic events surrounding a hypotensive event in patients with sepsis, in order to capture the inter-individual variability in the response to septic shock and treatments; this may address why prior prediction rules have been useful at the group level but performed poorly when applied clinically to individual patients. Modeling patient-specific physiologic responses to a specific dose of treatment, e.g. blood pressure rise in response to a certain volume of crystalloids, urine output after a certain dose of diuretic, or level of sedation after a certain dose of benzodiazepine, has the potential to personalize treatment guidelines to a degree never achieved before.
Furthermore, a data-driven approach using a genetic algorithm, which is not dependent on prior known biology, allowed us to select the best predictors for inclusion in our model. The adequate use of a cross-validation technique within the fitness function, in addition to an early-termination criterion during the process of feature selection, also prevented overfitting of our model on the available data, which showed good generalization properties on the test set. Finally, this study also demonstrates the significant potential of electronic health records to contribute to scientific research (49, 50).
There are several limitations to this study. Given that this is a retrospective cohort study, and one that involves a single center, further validation of this prediction algorithm either through a prospective study, or in an independent patient population, is required. Furthermore, calculation of mortality using this prediction algorithm may be burdensome for the busy clinician. However, with the increased use of paperless records and digitalization of ICU data, such algorithms can be embedded in the electronic medical record and automatically calculated to provide real-time mortality predictions with immediate application at the bedside. Ultimately, outcome prediction algorithms can be best fine-tuned using local or regional databases that reflect the patient population and physician practices at each center. Finally, the variables selected for inclusion in our model depended on what was available in MIMIC II. While MIMIC II is a highly granular database, in this version some important predictive variables such as the presence or absence of certain co-morbidities and the likely source of infection (pulmonary vs. intra-abdominal vs. bloodstream) are currently not easily obtained, although future versions of the MIMIC database will be more comprehensive. It is possible that with even higher-dimensional data, the variables ultimately selected for inclusion in our predictive model would be different. Thus the strength of our model may lie in the approach, and not specifically in the exact variables chosen for the final model.
In summary, we have demonstrated that dynamic variables measured at the time of hypotension, and in response to fluid and vasopressor treatment, can strongly predict hospital mortality from septic shock. Additionally, we showed that use of a sophisticated algorithm combined with a data-driven approach to predictor selection is a viable approach to outcomes modeling in patients with sepsis and hypotension. We also identified the additional interesting association between dynamic change in chloride during hypotension and hospital mortality, which may deserve further investigation. While further studies in additional ICU populations are needed to validate this approach and these findings, this study is the first to demonstrate that such an approach has the potential to provide better predictions for hospital mortality, highlighting the role that clinical data mining will increasingly play in both knowledge generation and the way we practice medicine.
This document describes the algorithm used to select features for prediction of mortality during the first hypotensive episode requiring medical intervention in a population of patients with sepsis and severe sepsis. Results are compared with state-of-the-art feature selection techniques and show equivalent performance. The potential of the Genetic Algorithm as an optimization technique is briefly discussed. The code on which this research is based is freely available on Google Code.
This work was supported by the Oxford RCUK Centre for Doctoral Training in Healthcare Innovation funded by the RCUK Digital Economy Program to LM.
The National Institute of Health (NIH), and its National Institute of Biomedical Imaging and Bioengineering (NIBIB) under Grant 2R01 EB001659 funded the MIMIC II database.
The authors have not disclosed any potential conflicts of interest.
Address for reprints: Louis Mayaud, Institute of Biomedical Engineering (IBME), Department of Engineering Science, University of Oxford, Old Road Campus Research Building (ORCRB), Off Roosevelt Drive, OX3 7DQ, Oxford United Kingdom