Despite improvements in the care of critically-ill patients, hospital mortality for acute lung injury remains high at approximately 40%. We developed a classification rule to stratify mechanically ventilated patients with acute lung injury according to hospital mortality and compared this rule to the APACHE III prediction.
We used data of 2022 participants in ARDS Network trials to build a classification rule based on 54 variables collected prior to randomization.
We used a classification tree approach to stratify patients according to hospital mortality using a training subset of 1800 participants, and estimated expected prediction errors using tenfold cross validation. We validated our classification tree using a subset of 222 participants not included in model building, and calculated areas under the receiver operating characteristic curves (AUCs).
We identified combinations of age (>63 years), BUN (>15 mg/dl), shock, respiratory rate (>21 breaths/minute), and minute ventilation (>13.9 L/minute) as important predictors of hospital mortality at 90 days. The classification tree had a similar expected prediction error in the training set (28% versus 26%; p=0.18) and AUC in the validation set (0.71 versus 0.73; p=0.71) as did a model based on APACHE III.
Our tree-based classification rule performed similarly to APACHE III in stratifying patients according to hospital mortality, is simpler to use, contains risk factors that may be specific to acute lung injury, and identified minute ventilation as a potential novel predictor of death in patients with acute lung injury.
Acute lung injury (ALI) affects approximately 190,000 patients and is responsible for two million critical care days and 75,000 deaths in the United States each year (1). Seven percent of patients admitted to an intensive care unit and 11% to 23% of patients with respiratory failure have ALI (2). Despite improvements in critical care over the last several decades (3–6), mortality for this syndrome remains high at approximately 40% (1).
No accurate classification system specific for ALI exists to date. Several studies have attempted to identify predictors of mortality in patients with ALI to improve prognostication (7–18). Age, multiple organ failure, comorbidities, and comprehensive scoring systems such as APACHE III and SAPS 3 (19–20) have been frequently used to predict mortality. These approaches, however, were designed for use in the general population of critically ill patients and not specifically for patients with ALI. Some researchers have identified measures of lung injury severity such as minute ventilation (7), oxygenation index (11, 21), dead-space fraction (14), PaO2/FiO2 (7, 12, 15) and plateau pressure (22, 23) as independent predictors of mortality, whereas others have identified only non-pulmonary predictors (8–10, 16). In addition, scoring systems use numerous predictors to calculate a composite score to predict mortality (19–20). These scoring systems, while relatively accurate, require many data inputs and are cumbersome.
All of the previously mentioned studies used logistic regression methods for classification (7–18). Given the complexity of critically ill patients, there are many potential predictors of mortality, and numerous interactions may exist between these predictors. When using logistic regression, investigators must decide which predictors and interactions to include in the model. Even for experts, it can be difficult to conceptualize which interactions may be clinically important.
A classification tree is a statistical learning approach that can be used to develop prediction models (24). This approach considers all potential predictors and their interactions to classify participants according to an outcome. The predictors are used to recursively partition the data into subsets according to a set of decision criteria. Classification trees may provide an advantage over traditional regression approaches by identifying hard to detect interactions. They have been successfully used to predict a number of different outcomes including the risk of major complications in patients who present to the emergency department with chest pain (25), the risk of mortality in patients hospitalized with heart failure (26), and the use of coronary angiography and revascularization procedures after thrombolysis for acute myocardial infarction (27). We sought to build a classification tree that stratified a cohort of ALI patients according to their risk of hospital mortality.
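The core step of recursive partitioning, choosing a cut-point for one predictor, can be illustrated with a minimal sketch. It uses Gini impurity as the split criterion purely for illustration, which is an assumption and not the conditional inference procedure used in this study; the function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 outcomes (0 = alive, 1 = dead)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_cut_point(values: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Find the threshold t (split: value <= t versus value > t) that
    minimises the size-weighted Gini impurity of the two subsets.
    Returns (None, parent impurity) if no split improves on the parent."""
    n = len(values)
    best_t, best_score = None, gini(labels)
    # The largest observed value cannot separate the data, so skip it.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data (not study data): ages and 0/1 mortality indicators.
ages = [34, 45, 52, 58, 61, 66, 70, 74, 79, 83]
died = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
threshold, impurity = best_cut_point(ages, died)
print(threshold, round(impurity, 2))
```

A full tree repeats this search over every candidate predictor within each resulting subset until a stopping criterion is met.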
We used data from all participants enrolled in clinical trials of the Acute Respiratory Distress Syndrome (ARDS) Network between 1996 and 2005 (3, 4, 28). The same inclusion criteria and similar exclusion criteria were used across the included trials (Online Supplement). Our primary outcome was hospital mortality at 90 days. We chose this outcome because, in critically ill patients, the risk of mortality has been shown to extend well beyond 30 days (29). We included 54 predictors for which data were collected prior to randomization (Online Supplement). We defined shock as use of vasopressors. We did not include year of enrollment in our analysis because it was not an important predictor of hospital mortality at 90 days.
Our objective was to build a classification tree according to the risk of hospital mortality at 90 days (24). We then determined how our model performed against a logistic regression model of APACHE III score on hospital mortality at 90 days.
We used a conditional inference tree to develop our prediction rule (30, 31). To build our classification tree, we used a three-step approach. First, we examined pairwise correlations to identify highly correlated predictors. In this first step, we reduced the number of predictors to 49. Second, we used a random forest with 500 classification trees to identify the ten most important predictors (32). By creating numerous classification trees, the random forest indicates which predictors are used most frequently to partition the data with the goal of optimizing discrimination between participants who died in the hospital at 90 days and those who were alive. The random forest produces a statistical measure known as variable importance for each predictor included in the model, which is scaled against the variable with the highest degree of discrimination. Third, we built a classification tree using the ten predictors with the highest variable importance. We classified participants as dead in the hospital at 90 days if the predicted mortality was ≥0.5 and alive if otherwise.
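The first step, screening for highly correlated predictors, might look like the sketch below. The correlation threshold of 0.9, the greedy keep-first rule, the predictor names, and the values are all hypothetical assumptions made for illustration; the criteria actually used are described in the Online Supplement.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (assumes neither column is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(data: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Greedy filter: keep a predictor only if its absolute correlation with
    every predictor already kept is below the (hypothetical) threshold."""
    kept: List[str] = []
    for name, column in data.items():
        if all(abs(pearson(column, data[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: two nearly collinear respiratory measures plus age.
predictors = {
    "total_respiratory_rate": [18, 22, 25, 30, 35],
    "set_respiratory_rate": [18, 21, 25, 29, 35],
    "age": [70, 40, 55, 62, 48],
}
kept = drop_correlated(predictors)
print(kept)
```

Which member of a correlated pair is retained depends on column order in this sketch; in practice that choice would normally rest on clinical judgment.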
We used 1800 participants for the training set (approximately 90% of all the data) and 222 participants (approximately 10%) as a test set. We used ten-fold cross-validation to identify the stopping criterion that minimized the expected prediction error (EPE) of our model. The EPE is a statistical measure of how well the predicted outcomes agree with the observed outcomes. We compared the EPE of the tree model with the EPE for the logistic regression model of APACHE III score using a t-test. We validated our classification tree using the test set and compared the areas under the ROC curve with that of the APACHE III score using a Mann-Whitney test (33). See Online Supplement for additional details.
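As a rough illustration of how ten-fold cross-validation estimates the expected prediction error, the sketch below averages the misclassification rate over ten held-out folds. The majority-class model stands in for the tree only to keep the example short; it, the function names, and the toy outcomes are assumptions, not the study's code.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_error(X: Sequence, y: Sequence[int],
                          fit: Callable, k: int = 10, seed: int = 0) -> float:
    """Mean misclassification rate over k held-out folds.
    `fit(X_train, y_train)` must return a function mapping one x to 0 or 1."""
    fold_errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(X[i]) != y[i] for i in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k


def fit_majority(X_train, y_train):
    """Placeholder model: always predict the more common training outcome."""
    majority = int(2 * sum(y_train) >= len(y_train))
    return lambda x: majority


# Hypothetical toy outcomes: 30 survivors (0) and 10 deaths (1).
y = [0] * 30 + [1] * 10
X = list(range(len(y)))  # features are ignored by the placeholder model
epe = cross_validated_error(X, y, fit_majority, k=10)
print(epe)
```

Repeating this calculation for each candidate stopping criterion and keeping the one with the lowest average error is the tuning step described above.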
We used the Cochran-Armitage test for trend (34) and logistic regression to determine if the proportion of patients who died in the hospital at 90 days decreased over time between 1996 and 2005. We used R (www.r-project.org) for statistical analysis.
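For readers unfamiliar with the Cochran-Armitage test, a minimal sketch of the standard two-sided statistic follows. The counts are hypothetical and are not the study data, and the equally spaced scores are an assumption; in a time-trend analysis the scores would typically be the enrollment years.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage(events: Sequence[int], totals: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Cochran-Armitage test for a linear trend in proportions.
    Returns (z statistic, two-sided p-value from the normal approximation)."""
    n_all = sum(totals)
    p = sum(events) / n_all  # pooled proportion under the null hypothesis
    # Score-weighted sum of observed minus expected events in each group.
    t_stat = sum(s * (r - n * p) for s, r, n in zip(scores, events, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p * (1 - p) * (s2 - s1 * s1 / n_all)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for three consecutive periods (not study data).
deaths = [10, 20, 30]
enrolled = [100, 100, 100]
z, p_value = cochran_armitage(deaths, enrolled, scores=[0, 1, 2])
print(round(z, 3), p_value)
```

A positive z indicates that the proportion rises with the score and a negative z that it falls.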
This study was approved by the Institutional Review Board of the Johns Hopkins School of Medicine, Baltimore, MD.
A total of 2022 participants received lung protective ventilation. We provide summary statistics for all 49 predictors in Table 1. Median age was 49 years and 50% were male. The most common causes of ALI were pneumonia (41%), sepsis (24%) and aspiration (15%). Thirty-three percent of participants required vasopressors before randomization. Hospital mortality at 90 days did not change between 1996 and 2005 (Figure 1; p=0.31). We summarized differences in baseline characteristics across trials in Table 2.
The ten most important variables, in decreasing order of importance, were age, blood urea nitrogen (BUN), barotrauma, albumin, cancer, bilirubin, total respiratory rate, minute ventilation, shock and urine output (Figure 2). We assigned age a relative variable importance of 100% and compared all other predictors with age.
The EPE for our classification tree was 27.8%. That is, our classification tree misclassified participants who died in the hospital at 90 days as survivors and vice versa approximately one quarter of the time. However, the EPE for our classification tree was similar to that of a model based on the APACHE III score (Table 3, p=0.18, t-test).
Our final classification tree identified five predictors to stratify participants according to hospital mortality at 90 days (Figure 3). Age was the first split criterion. Participants aged >63 years had a higher hospital mortality than those who were aged ≤63 years (48% versus 22%, respectively). Total respiratory rate and minute ventilation further split participants after age. For patients aged ≤63 years, the absolute difference in hospital mortality for participants with a minute ventilation >13.9 L/minute versus ≤13.9 L/minute was 15% (32% versus 17%, respectively). Among those ≤63 years of age and with a minute ventilation ≤13.9 L/minute, the difference in hospital mortality with a BUN >15 mg/dl versus ≤15 mg/dl was 13% (23% versus 10%, respectively). The presence of shock provided further discrimination in two instances. At those splits, hospital mortality was approximately two- to threefold higher in patients with shock than in those without shock. Among those >63 years of age, total respiratory rate was better than minute ventilation at partitioning participants according to hospital mortality.
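The splits reported in the text can be written as a small decision rule. The sketch below is deliberately partial: it encodes only the thresholds and node mortalities stated in this article (the 61% figure for age >63 years with respiratory rate >21 breaths/minute is taken from the Discussion), it omits the two shock splits because their positions are given only in Figure 3, and it returns no estimate where the text reports none. The function name and return format are hypothetical.

```python
from typing import Optional, Tuple


def stratify(age: float, minute_ventilation: float, bun: float,
             respiratory_rate: float) -> Tuple[str, Optional[float]]:
    """Return (node description, reported hospital mortality at 90 days).
    Mortality is None where the article text does not report a value.
    Shock splits are omitted, so these are node-level, not leaf-level, rates."""
    if age > 63:
        if respiratory_rate > 21:
            return ("age >63, respiratory rate >21", 0.61)
        return ("age >63, respiratory rate <=21", None)
    if minute_ventilation > 13.9:
        return ("age <=63, minute ventilation >13.9", 0.32)
    if bun > 15:
        return ("age <=63, minute ventilation <=13.9, BUN >15", 0.23)
    return ("age <=63, minute ventilation <=13.9, BUN <=15", 0.10)


def predicted_dead(mortality: Optional[float]) -> Optional[bool]:
    """Apply the study's classification threshold (predicted mortality >= 0.5)."""
    return None if mortality is None else mortality >= 0.5


# Hypothetical patient: 70 years old, breathing 25 times per minute.
node, risk = stratify(age=70, minute_ventilation=11.0, bun=20, respiratory_rate=25)
print(node, risk, predicted_dead(risk))
```

Because the rule needs only comparisons against fixed thresholds, it can be applied at the bedside without computing a score.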
We then applied our classification tree to the 222 participants who were placed aside as a test set. We did not find a difference between the areas under the ROC curves for the prediction model based on the classification tree and the logistic regression model based on the APACHE III score (Figure 4; p=0.71; Mann-Whitney test).
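The area under the ROC curve is equivalent to a scaled Mann-Whitney U statistic: the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. A minimal sketch of that calculation follows; the predicted risks and outcomes are hypothetical, and the sketch computes a single AUC rather than the comparison test reported above.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability that a positive case (label 1)
    outranks a negative case (label 0); ties contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = died).
predicted = [0.10, 0.23, 0.32, 0.61, 0.23, 0.61]
observed = [0, 0, 1, 1, 1, 0]
area = auc(predicted, observed)
print(round(area, 3))
```

Handling ties matters for a tree, because all participants in the same terminal node share one predicted risk.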
Using a classification tree approach, we derived a simple rule to stratify ALI patients according to the risk of hospital mortality by age, minute ventilation or respiratory rate, BUN and shock. Our classification tree performed as well at discriminating between survivors and non-survivors as did a model based on APACHE III, but is simpler to use and does not require any calculations. It also identified minute ventilation as a potential novel marker of disease severity in ALI.
Age was the most important predictor of mortality. This finding is consistent with previous studies of mortality in ALI patients (7–8, 10, 12–13, 17, 35). Age is also a component of comprehensive scoring systems for critically-ill patients (19, 20). Minute ventilation and respiratory rate were also important predictors. Minute ventilation incorporates both tidal volume and respiratory rate; however, since our participants received lung protective ventilation with low tidal volumes, respiratory rate may have contributed more than tidal volume to the importance of minute ventilation for predicting mortality. A high minute ventilation or respiratory rate may indicate either severe respiratory failure or an attempt to compensate for acidosis secondary to hypoperfusion or to an elevated pulmonary dead space. Severe respiratory failure is the cause of death in <20% of ALI patients (35, 36), whereas sepsis with multiple organ failure is the most common cause of death (35, 36). Therefore, the importance of minute ventilation or respiratory rate for predicting mortality may be associated with hypoperfusion, metabolic acidosis and elevated pulmonary dead space. However, we cannot ascertain which mechanisms led to an increased minute ventilation nor ascertain the prognostic importance of these mechanisms. Additional mechanisms that may have led to increased minute ventilation in ALI patients include anxiety, pain, and setting the respiratory rate at a higher level than needed.
Shock before randomization was also an important predictor of death. This is consistent with a recent finding by our group in which mortality was 31% in those with shock versus 12% in those without shock (37). The most common cause of elevated BUN is decreased renal function. One study demonstrated that the mortality rate of those who develop acute kidney injury (defined as a 50% rise in creatinine) was 58% compared with 28% in those who do not develop acute kidney injury (38). However, BUN may also be elevated if renal excretion of urea is temporarily impaired due to hypoperfusion or shock and highlights the importance of shock as a predictor of ALI mortality. This is consistent with prior studies that have demonstrated higher mortality in patients with ALI from sepsis compared to ALI from other causes (35, 36, 41, 42).
Our analytical approach identified a different set of predictors from that of a previously proposed model (39). In that study, the investigators conducted a retrospective analysis of three multicenter clinical studies in patients with ALI, and identified predictors of death or ventilator dependence from variables prospectively recorded during the first three days of mechanical ventilation. They found that age, oxygenation index, and cardiovascular failure three days after intubation predicted death or prolonged mechanical ventilation (39). Additional markers of nonpulmonary organ failures such as creatinine, platelet count, bilirubin, and Glasgow Coma Scale score were not important. In contrast, our model identified both blood urea nitrogen and bilirubin as important predictors of mortality. There are several reasons why our findings may differ. First, the previous study used observational data to derive its predictive model whereas we used data from clinical trials collected prior to randomization. Baseline differences persist when using data of patients enrolled into clinical trials because once randomization occurs patients are treated according to a standardized protocol. On the other hand, observational studies are subject to variation in clinical practice that affects patient outcomes. Therefore, when predicting mortality using baseline patient data, data from clinical trials may lead to an unbiased estimation when compared to data from observational studies. Second, the previous investigators used logistic regression, whereas we used classification trees. Third, our analysis used five times more patients in the derivation cohort to build the model than did the previous investigators, but our validation cohort was substantially smaller. Finally, the previous study only used data of ALI patients who were alive and mechanically ventilated on day three, whereas we used baseline data of all enrolled patients. By excluding those patients who died prior to day three, the previous investigators may have overlooked important predictors of early mortality.
Our classification model may also be used to stratify ALI patients according to the risk of hospital mortality. A recent study developed a simple prediction score for mortality in patients with ALI that included age, bilirubin, 24-hour fluid balance, and hematocrit (8). While the ability of this previous score to discriminate between survivors and non-survivors was not as good as APACHE III, the score was well calibrated; the predicted and observed mortalities were reasonably similar. Calibration may be more important than discrimination because calibration involves the absolute risk of death for a patient rather than determining whether that patient is more or less likely to die when compared with other patients. Similarly, the expected prediction error in our model was no different from that of APACHE III.
Another important use of classification models is risk stratification for enrollment into clinical trials. For the results of a trial to be applicable to a general population of patients with ALI, it is important to include as many patients as possible. With this approach, the population of enrolled patients is heterogeneous and includes subsets of patients with highly variable risks of death. Without this approach, increasing numbers of patients will need to be enrolled into future clinical trials to demonstrate a reduction in mortality due to new therapies. In addition, mortality in ALI patients may be attributable to non-pulmonary organ failure rather than just ALI. One strategy for reducing sample size would be to selectively recruit patients with a high risk of mortality attributable to ALI and a high likelihood of response to therapy (40). If the mortality attributable to ALI is low and a therapy that is directed towards reducing mortality from ALI is being tested, a large sample would be required to demonstrate a difference in mortality between the control group and the treatment group (40). Although we validated our classification tree in a test set, it needs to be tested prospectively in another group of patients to see how well it can predict hospital mortality. If accurate, our classification tree would be simple to use to screen ALI patients for enrollment into clinical trials.
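The dependence of sample size on baseline risk can be illustrated with the usual normal-approximation formula for comparing two proportions. The mortality rates, the 25% relative reduction, the two-sided alpha of 0.05, and the power of 80% are hypothetical assumptions chosen for illustration; they do not come from this study or from reference 40.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_treated: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm to detect a difference
    between two mortality proportions (two-sided test, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treated * (1 - p_treated))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treated) ** 2)


# Same hypothetical 25% relative reduction at two baseline mortality rates.
n_high_risk = n_per_group(0.40, 0.30)
n_low_risk = n_per_group(0.20, 0.15)
print(n_high_risk, n_low_risk)
```

Under these assumptions the lower-risk population needs more than twice as many participants per arm, which is the motivation for enriching enrollment with higher-risk patients.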
We used classification trees to predict hospital mortality because these methods can identify both important predictors and complex interactions between them. The automatic ability to identify interactions is an important advantage over traditional methods. Using standard regression methods, an interaction must first be considered by the investigator and then tested in the model to determine whether it is significant. For example, in our classification tree, the combination of age greater than 63 years and total respiratory rate greater than 21 breaths/minute yielded a predicted hospital mortality of 61%. This is one example of an interaction that may not be considered by an investigator because the combination of age and total respiratory rate may not seem like a clinically important interaction. Interestingly, some of the most important predictors identified by the random forest were not present in the final classification tree. This may be because the interactions between some of the other predictors are more effective at predicting mortality than some of the individual variables. One disadvantage to this approach is that there may be an important loss of statistical power when using continuous variables because the tree selects the optimal cut-point for each continuous predictor. This allows for a classification rule that is easy to use, at the cost of statistical power (43). Classification trees have been used successfully to predict clinical outcomes in cardiology (25–27), but are just starting to be applied to critically-ill patients (44).
Our model performed as well as a model based on the APACHE III score; however, both models had expected prediction errors of approximately 25%. The classification tree may perform better at predicting ICU mortality, rather than hospital mortality, because the time of death is closer to when the predictor data were collected. Furthermore, our model and APACHE III use only baseline data to predict hospital mortality. Several studies have indicated that longitudinal scores may be more accurate for predicting mortality than baseline scores (45–48). One study demonstrated that only 6% of patients with a higher acute physiology score on Day 3 compared to Day 1 survived to hospital discharge (45). This was in contrast to 43% survival to hospital discharge for those patients whose acute physiology scores were lower on Day 3 compared to Day 1. In addition, the change in acute physiology score had a positive predictive value of 97% for predicting 100-day mortality, whereas that of APACHE III was 80%. If change in predictors is more accurate for predicting mortality, our classification tree would be much easier to use on a daily basis than calculating an APACHE III score.
Our study has some limitations. First, we used data from randomized controlled trials conducted in tertiary medical centers. Moreover, participants had to meet eligibility criteria to be enrolled, and the trials excluded patients with chronic liver disease (Child-Pugh Class C) and patients with malignancy or any other irreversible condition for which six-month mortality was estimated to be 50% or greater. Since patients with these conditions are not represented in our cohort, we could not determine if these conditions were more or less important for predicting mortality than the predictors used to create our classification tree. Thus, the generalizability of our prediction model may be limited to patients at tertiary medical centers who meet the eligibility criteria of ARDS Network trials, and our prediction model requires additional external validation. A second limitation of our study is sample size. When using tree-based methods, it is optimal to have a large number of observations. However, this is one of the largest cohorts of ALI patients and we included more participants than most other studies (8–17) that attempted to predict mortality in patients with ALI. Finally, APACHE has recently undergone a revision (APACHE IV) due to poor calibration of APACHE III; however, APACHE III is still widely used.
In conclusion, we created a simple classification rule that stratified patients according to risk of hospital mortality that performed as well as the APACHE III score. In addition, our classification tree is simple to use and contains risk factors for mortality that may be specific to patients with ALI. Therefore, it may be valuable to use for risk stratification and screening ALI patients for enrollment into clinical trials.
Funding: Supported by NHLBI Contracts NO1-HR-46054 through 46064 and NO1-HR 56165 through 56179 with the National Institutes of Health, National Heart, Lung, and Blood Institute. Lisa Brown was supported by a National Institutes of Health Training Grant (T32GM008258-21). Carolyn Calfee was supported by NHLBI grant HL090833 and by a grant of the Flight Attendant Medical Research Institute. Dr. Matthay was supported by NHLBI HL51856. William Checkley was supported by a Clinician Scientist Award of the Johns Hopkins University and a Pathway to Independence Award from the National Institutes of Health, National Heart, Lung, and Blood Institute (K99HL096955).
The authors would like to acknowledge Dr. Hector Corrada-Bravo, Assistant Professor, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, and Dr. Jerome Friedman, Professor in the Department of Statistics, Stanford University, Palo Alto, CA, for helpful suggestions in the statistical analyses and software code.
Contributions: Conception and design (LB, WC); Analysis and interpretation (LB, WC); Drafting the manuscript for important intellectual content (LB, CC, MM, TT, RB, WC).
Dr. Calfee consulted as a member on the Medical Advisory Boards of Ikaria and GlaxoSmithKline. Dr. Thompson holds a National Institutes of Health contract to support ARDS Net Research. The remaining authors have not disclosed any potential conflicts of interest.