|Home | About | Journals | Submit | Contact Us | Français|
Objective Hospitals are challenged to provide timely patient care while maintaining high resource utilization. This has prompted hospital initiatives to increase patient flow and minimize nonvalue added care time. Real-time demand capacity management (RTDC) is one such initiative whereby clinicians convene each morning to predict patients able to leave the same day and prioritize their remaining tasks for early discharge. Our objective is to automate and improve these discharge predictions by applying supervised machine learning methods to readily available health information.
Materials and Methods The authors use supervised machine learning methods to predict patients’ likelihood of discharge by 2p.m. and by midnight each day for an inpatient medical unit. Using data collected over 8000 patient stays and 20000 patient days, the predictive performance of the model is compared to clinicians using sensitivity, specificity, Youden’s Index (i.e., sensitivity + specificity – 1), and aggregate accuracy measures.
Results The model compared to clinician predictions demonstrated significantly higher sensitivity (P<.01), lower specificity (P<.01), and a comparable Youden Index (P>.10). Early discharges were less predictable than midnight discharges. The model was more accurate than clinicians in predicting the total number of daily discharges and capable of ranking patients closest to future discharge.
Conclusions There is potential to use readily available health information to predict daily patient discharges with accuracies comparable to clinician predictions. This approach may be used to automate and support daily RTDC predictions aimed at improving patient flow.
Hospitals and clinician objectives require balance in treating each patient’s condition effectively while efficiently distributing healthcare resources to patient populations over time.1 A key determinant of hospital capacity and resource management is linked to patient flow, a common indicator of patient safety, satisfaction, and access.2–5 Optimal patient flow facilitates beneficial treatment, minimal waiting, minimal exposure to risks associated with hospitalization, and efficient use of resources (e.g., of beds, clinical staff, and medical equipment). Patient flow is also a determinant of access to specialized inpatient services. These patients request admission from external sources (e.g., other health services) and internal sources such as the emergency department (ED), procedural areas, or peri-anesthesia care units (PACUs).6,7
Evidence supporting the benefits of improved flow has been mounting.8 For patients, prolonged hospital stays increase the risk of adverse events, such as hospital-acquired infections, adverse drug events, poor nutritional levels, and other complications.9–12 Extended stays have also been associated with poor patient satisfaction.11,13–17 For hospitals, economic pressures to deliver efficient and accessible care are at unprecedented highs. Healthcare costs as a percentage of gross domestic product (GDP) (17.9% in 2012) have been rising faster than anticipated,18 and approximately 30–40% of these expenditures have been attributed to “overuse, underuse, misuse, duplication, system failures, unnecessary repetition, poor communication, and inefficiency.”19 These factors impede patient flow, prolong patient stays, and increase the cost of care per patient.14,20–22
Patient flow, and by association, bed and capacity management, is a common focus area for operations management methods applied to healthcare. Discrete-event simulation, optimization, and Lean Six Sigma approaches have been applied successfully in various settings to improve patient flow by either redesigning care delivery processes or more efficiently matching staff and other resources (e.g., operating rooms, medical equipment) to demand.23–35 These patient flow evaluations are data-driven and inform long-term operational decision-making. Outcomes of these studies include improved patient and staff scheduling strategies, new bed management policies, and reduction in care process variability.
A more recent advancement in patient flow management alternatively focuses on short-term operational decisions. Real-time demand capacity management (RTDC) is a new method developed by the Institute for Healthcare Improvement that has shown promising but variable results when pilot tested in hospitals. The RTDC process involves 4 steps: 1) predicting capacity, 2) predicting demand, 3) developing a plan, and 4) evaluating the plan.36 The RTDC process centers around a morning clinician huddle to predict which and how many patients will be discharged that same day. Given daily predictions for demand, the group then attempts to match their supply of beds to the demand from new patients (i.e., admissions) by prioritizing current patients able to be discharged. RTDC implementation signifies a culture change as the hospital staff dedicates time each morning to coordinate and focus on patient flow. The developers of RTDC have demonstrated that this approach, after an initial learning period, may reduce key indicators of patient flow across hospitals. These indicators include ED boarding time, patients who left without being seen, and overnight holds in the PACU.
RTDC does have limitations that we strive to address in this work. First, RTDC requires a daily clinician huddle that demands dedicated time that may be expedited or even eliminated through automated prediction. Second, the prediction process is subjective and thus susceptible to high variability. In addition, the most successful implementations of RTDC have occurred in surgical units, where patient conditions and clinical pathways are commonly well prescribed.37,38 Thus, previous results on RTDC predictive performance for surgical patients may not be generalized to patients in medical units where clinical pathways are less defined and length of stay (LOS) is more variable.
Our objective for this study is to support automation of the RTDC process by producing daily predictions of patient discharge times for a single inpatient medical unit. We hypothesize that a predictive model using readily available health information may perform comparable or better than predictions made by clinicians during their daily huddle. We apply supervised machine learning methods to predict the probability of patient discharge by 2p.m. and by the end of each day (i.e., midnight). By discharging a patient earlier in the day (e.g., prior to 2p.m.), there is an increased likelihood of admitting a new patient to the same bed, thereby facilitating increased bed utilization and patient flow.36
Forecasting models have been leveraged to predict hospital occupancy, patient arrivals and discharges, and other unit-specific operational metrics. These models have been derived from variations of autoregressive moving average approaches,39–44 exponential smoothing,39,40,42 Poisson regression,45 neural networks,42 and discrete-event simulation methods.46–48 More recent studies have employed logistic regression49 and survival analysis50 methods to predict patient LOS. Through accurate prediction of patients’ LOS, clinical staff may efficiently schedule future appointments or admissions to avoid backlog or denials.44,51–53 Our approach builds on this research by focusing on short-term predictions (i.e., daily discharges) for patients.
Tree-based supervised machine learning algorithms have been applied in healthcare to 1) predict a continuous-valued outcome or 2) classify patients into one or more clinical subgroups. Example applications for the former include using linear regression or regression trees to predict the range of motion for orthopedic patients,54 costs,55,56 and utilization.57 An example of the latter typically involves logistic regression or classification trees, and has been used to differentiate between benign and malignant tumors,58 identify patients most likely to benefit from screening procedures,59,60 identify high risk patients,61 and predict specific clinical outcomes.62–64 Recently, more advanced machine learning techniques such as bagging, boosting, random forests, and support vector machines have been applied to healthcare problems such as classifying heart failure patients65 and predicting healthcare costs.66 We build upon this research by adding a new application area for these methods and leveraging the benefits of ensemble learning techniques such as bagging and boosting.
The study was conducted within a single, 36-bed medical unit in a large, mid-Atlantic academic medical center serving an urban population. The unit is staffed with hospitalist physicians with no teaching responsibilities. Patient flow data (i.e., admission and discharge times), demographics, and basic admission diagnoses data was collected for 9636 patient visits over a 34-month study period from January 1, 2011 to November 1, 2013. After excluding incomplete and erroneous records, we retained data for 8852 patient visits and converted these visits to N=20243 individual patient days. These data are summarized in Table 1.
These data are standardized and readily accessible in most hospital information systems, thus reproducible in other hospitals. The demographic and clinical predictors are static model inputs that are known at the time of admission and do not change during a patient’s stay. Other predictors such as patient census, day of the week, and elapsed length of stay are dynamic and are continuously updated during a patient’s stay. Elapsed length of stay, age, and patient census are numerical variables, whereas the remaining predictors are modeled using binary indicator variables (i.e., 0 or 1 indicating the absence or presence of a specific category).
The reason for visit is determined according to the International Classification of Diseases-9 diagnoses structure.67 A physician identifies this condition (via an electronic pick list) at the time of admission to the unit, thereby documenting the primary reason for hospitalization. Observation status is assigned to patients who are cared for within the unit and must be monitored and evaluated before they are eligible for safe discharge.68 Administratively, hospitals may use this designation to bill Medicare for the patient under the outpatient service category. However, a large study by the Department of Health and Human Services found that observation patients have the same health conditions as those who are fully admitted.69
In addition to the patient data listed in Table 1, a novel aspect of this study is our collection of clinician predictions of patients to be discharged at 2p.m. and midnight. We recorded predictions from daily morning huddles for 8 overlapping months between March 18, 2013 and November 1, 2013. The prediction team was comprised of a charge nurse, case manager, and physicians. Members of this team were directly involved with the administrative or clinical aspects of a subset of patients in the unit each day. These team members had access to substantially more information for their predictions than what was accessible to our prediction models. This unique data facilitated a comparative study between the machine-learning and clinician predictions. In order to compare performance directly to the RTDC process, the machine-learning models were designed to produce predictions based on data available at 7a.m. each day, which is analogous to the clinician huddle times.
We applied and evaluated several supervised machine learning algorithms to predict patient discharge and compare with clinician predictions. These algorithms are considered ‘supervised’ because they are fit to labeled training data (i.e., the outcomes are known in retrospect) and then independently evaluated on separate labeled test data to estimate their performance in practice. For each patient day, these algorithms produced two predictions representing the probability of a patient being discharged by either 2p.m. or midnight that same day. These predictions were generated for every patient in the unit for each day of their stay, which simulated the clinician-based RTDC prediction process. This model is designed to support real-time predictions of expected bed capacity and could be adapted to predict patient discharges for any time interval (e.g., hourly instead of specifically 2p.m. and the end of the day).
Systematic experiments applying common supervised machine-learning methods to predict individual patient discharges were performed. These methods included logistic regression (i.e., reference method), classification and regression trees, and tree-based ensemble learning methods. Model parameters and thresholds were tuned for optimal performance with respect to the predictive measures described in the following section.70 We compared the results among these predictive methods, and then selected the best method to compare to the clinician predictions.
Logistic regression is a commonly used classification method for clinical applications, and it is an effective approach for providing a baseline for how effectively the available data can be leveraged to predict the primary outcome. However, logistic regression is sometimes limited for predictive applications, especially with large, highly dimensional data where interactions may exist.70 Despite this, logistic regression is a well-established method that will serve as a baseline for comparison with more robust supervised machine learning algorithms.
Next, we applied tree-based methods to predict patient discharge outcomes, which iteratively partition the data into groups with similar characteristics and outcomes.71–73 These methods are adept at recognizing predictor variable interactions and identifying sub-cohorts of patients that are more likely to have a positive outcome.71 Tree-based models have the distinct advantage of providing more practical insight than regression-based methods. The highest levels of these trees can be translated into effective decision rules that may be interpreted by clinicians, and embedded within existing clinical information technology infrastructure to facilitate implementation and uptake in practice.68
Figure 1 shows a visualization of the classification tree for end-of-day discharge predictions, which provides an example of how our results could be shared with clinicians. In this figure, we observe that observation status is the most critical predictor relative to predicting the outcome. Patients who are not on observation status (i.e., an intermittent care status to evaluate whether they need to be admitted) tend to stay, regardless of any of their other conditions. Patients who are on observation status are more likely to be discharged only when their elapsed LOS exceeds approximately 12h or if they reported chest pain as their chief complaint.
Despite these benefits, simple tree-based learning methods have the potential to over-fit training data – causing the in-sample performance to exceed the out-of-sample predictive performance, which occurs when the model is overly complex and mistakes noise for key underlying relationships.72,74 Tree-based ensemble learning methods, such as bagging and boosting, address over-fitting by training large numbers of “weak learners” and leveraging the diversity across learners (i.e., individual trees) to produce stable out-of-sample predictions. Diverse trees are aggregated in some form (e.g., by selecting majority votes or averaging) to classify or estimate risk for an outcome.72,74–76 Bagging utilizes a bootstrapping process (i.e., sampling with replacement) to train numerous trees to produce model-averaged predictions.74 The random forest is a common bagging method that increases diversity across each individual tree.72,75,76 The random forest approach also facilitates evaluation of weak learners by selecting a random subset of predictors at each candidate split in the learning process for predictor variable importance in classifying outcomes. Boosting implements optimization algorithms to re-weight misclassified observations in an attempt to improve overall prediction accuracy.72
Our ultimate goal was to implement the most parsimonious model with high accuracy that utilizes data that can be automatically extracted from electronic medical record systems. Predictor variables (see Table 1) for each patient day (collected at 7a.m.) were linked to binary outcomes indicating whether the patient was discharged by 2p.m. or the end of day, respectively. We trained each model on data collected over 26.6 months from the start of the study (January 1, 2011) until the date when the clinician predictions began (18 March, 2013). Model predictions were generated for the following 9.4 months (19 March 2013 to 31 December, 2013) out-of-sample; purposefully overlapping with the same period clinician predictions were collected. This facilitated cross-validation (78% training set and 22% testing set) of the model and direct comparison to the clinician predictions. Predictive performance was then estimated for model-based and clinician predictions using binary classification statistics. Measures from the confusion matrix (i.e., true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)) were used to calculate:
These measures, along with the number of positive predictions (i.e., discharges), captured different aspects of model performance. Youden’s Index77 is a global accuracy measure whereas sensitivity and specificity capture how well the predictions perform with respect to a specific outcome (i.e., discharge or stay). Youden’s Index is a common metric used to evaluate the performance of diagnostic tests,78–80 and similar to these studies, we used it to optimize the cutoff thresholds (i.e., to determine a discharge or stay prediction) for each algorithm. We also calculated these metrics for near future outcomes (i.e., outcomes for the next time period). For example, how many patients predicted to be discharged by 2p.m. were discharged by the end of the day? Similarly, how many patients predicted to be discharged by the end of the day were discharged by the end of the next day? These measures include correct predictions but also incorrect predictions that were correct within one day, which provides some insight as to the magnitude of predictive errors.
We conducted hypothesis testing (α=0.05) to detect statistically significant differences between the predictive capabilities for the model and the clinicians. Each of these tests can be applied readily to paired samples, which is required for comparing predictions for the same patient day(s) We used McNemar’s test with Yates correction for continuity81–82 to test the null hypothesis H0: pmodel=pclinicians, where p represents the proportion of correct predictions relative to the total number of positive (sensitivity) or negative (specificity) outcomes. For Youden’s Index J, we apply the method devised in Chen et al.83 for paired samples to test the null hypothesis H0: Jmodel=Jclinicians. For both tests, we used two-sided alternative hypotheses.
In addition, we evaluated the results in two additional measures that are relevant to model implementation. First, we aggregated (i.e., summed) the number of expected daily discharges from the model and clinician predictions, and then compared these results to the actual number of discharged patients for each day using paired hypothesis tests at the α=0.05 level. This aggregate measure is useful for proactive patient flow management by anticipating available (i.e., unoccupied) beds in advance and facilitating accept/reject admission decisions related to capacity constraints. Separately, we used model predictions to rank the patients in order of their likelihood for discharge each day. Spearman rank correlation coefficients were computed to evaluate the accuracy of these rankings with observed patients remaining length of stay.84 This latter approach provided some indication as to whether this model could be used to accurately prioritize patients for discharge.
We applied tree-based supervised machine learning methods to predict discharge by 2p.m. and the end of the day for each patient day. The regression random forest (RRF) method proved most accurate from systematic experiments of several tree-based algorithms with parameters tuned for optimal predictive performance. Figure 2, summarizing variable importance from the training of the RRF models, provides evidence that elapsed LOS and observation status are substantially more important than all of the other predictors, followed by Sunday, chest pain, disposition, age, and syncope. Predictors such as gender, ethnicity, weekdays (Monday through Thursday), and less common reasons for visit (e.g., abdominal pain, COPD, congestive heart failure) had little to no predictive power for this patient population.
We compared the RRF model predictions to the clinician predictions in order to evaluate the potential of this approach in practice. We have also included comparisons to logistic regression as a reference to better understand the efficacy of the ensemble learning approach. Results for both 2p.m. and end-of-day outcomes are summarized in Table 2 for the specific days when clinician predictions were available (19 March, 2013 to 31 December, 2013), which included 4833 patient days.
Overall, the logistic regression and RRF model were more aggressive in predicting discharge than the clinicians for both the 2p.m. and end-of-day outcomes. This resulted in the automated models predicting discharges with higher sensitivity (P<.01) and lower specificity (P<.01) compared to the clinicians. Thus the models predicted a higher proportion of discharges, but at the cost of producing more false positives. The logistic regression baseline model consistently demonstrated more sensitive behavior than the RRF model.
Despite differences in sensitivity and specificity, the RRF model and clinicians predictive performance were comparable for Youden’s Index, our global accuracy measure. There were no significant differences for same-day predictions (2p.m. RRF model: 26.0%, clinicians 26.5%, P=.81; end-of-day RRF model: 34.0%, clinicians 34.0%, P=.84) or the near-future end-of-day outcome (RRF model: 26.6%, clinicians 25.2%, P=.62). However, the RRF model did perform significantly better for the near-future 2p.m. outcome (RRF model: 31.3%, clinicians 19.0%, P<.01). Predictions across models and clinicians were more accurate for the end of the day than for 2p.m.
We also compared the aggregated RRF model and clinician predictions for the total number of discharges each day. For the model predictions, we summed the predicted probabilities across all patients to calculate the expected value of discharged patients. The results for 2p.m. and the end of day for the overlapping date range are summarized in Figures 3 and and4,4, which show that the model outperforms clinician estimates of the average number of patients to be discharged early and by the end of the day.
In Figure 3, we plot the distribution of the actual number of discharges per day against the distributions for the RRF model and clinician predictions for each outcome. The actual number of discharges per day averaged 2.36 by 2p.m. and 8.29 patients by the end of the day. Discharges predicted each day by the RRF model were 2.45 (P=.37) and 8.51 (P=.13), respectively, demonstrating no statistically significant difference from the actual number of discharges. In comparison, clinician predictions were accurate for 2p.m. discharges (2.16, P=.19), but significantly deviated from the actual number of discharges for the end of the day (6.54, P<.01). The model predicted the total number of discharges within 2 patients 82% of days for 2 p.m. and 105 63% of days for the end of the day, whereas the clinicians predicted the same for 52% and 32% of days, respectively (see Figure 4). In addition, large residuals (i.e., errors of 5 discharges or more) were much more frequent for the clinician predictions.
In practice, our prediction model could be used to rank patients daily—based on their likelihood of being discharged—in order to prioritize the remaining tasks for the most likely patients. We computed Spearman’s rank correlation between the remaining LOS for each patient and their respective RRF model predicted probabilities for each day (trained on the full data set). Figure 5 shows a histogram of these correlations (in intervals of 0.05) for the 2p.m. and end-of-day models. The mean rank correlation for 2p.m. was 0.4816 and for the end of the day was 0.4489. Both plots show that the correlations are almost exclusively positive and moderately large, which suggests that the rank of the model predictions was moderately correlated with the actual discharge order. For some days, the model nearly predicted the exact order in which the patients were discharged.
Improving patient flow continues to be a top priority in the acute-care setting, where patients with longer lengths of stay are less satisfied and exposed to the risk of adverse events (e.g., hospital-acquired infections, complications). Hospitals have aligned incentives to improve patient flow because of the rising demand for services and the economic pressures to reduce costs and improve resource utilization. Our approach is designed to empower clinicians and hospital administrators with analytical tools to increase their collective efficiency.
Our approach could be operationalized in three distinct ways. First, it could be used to identify individual patients who are most likely to be discharged on a given day. Hospital staff could prioritize these patients in order to discharge them as early as possible—without negatively affecting their care—so that other patients can be admitted in their place. Supervised machine learning methods may be used to rank patients concurrently in a hospital (or specific unit) according to their discharge probabilities. We have shown that our models perform well for prediction or ranking. Alternatively, patients who are most likely to be discharged may not be impacted significantly by an effort to prioritize their remaining tasks. Therefore, another approach could be to identify a second tier of patients who are only mildly to moderately likely to be discharged. Prioritizing the remaining tasks for these patients may have a more significant global impact on the number of patients discharged over a given time period. Setting a range of predicted probabilities immediately below the classification threshold would identify these patients. The last approach is to aggregate the predicted discharge probabilities into daily discharge predictions. This approach does not facilitate prioritization for patients likely to be discharged, but it supports bed capacity planning for the unit. We have shown that these types of predictions are very accurate and would provide the staff with a good idea of how many in-use beds are likely to be available by a specific time of day.
There are three important limitations to highlight for this study. First, the performance of the models may improve with a larger data set, in terms of the number of patient days used to train the models and also with respect to being collected from multiple sites. Increasing the size of the training data set would improve our ability to detect more complex patterns in patient length of stay. Similarly, the patterns detected in our training data set may not be generalizable to other hospitals or units. Our intention is for each hospital and/or unit to replicate our prediction model and train it on its own data. Finally, we compared the performance of continuous (i.e., probability-based) predictions from our models to binary predictions (i.e., exit or stay) from the clinicians. Ideally, the fairest comparison would be between continuous predictions from both methods, however these predictions would be difficult to generate and collect in practice.
We applied supervised machine learning algorithms to readily available health information to predict daily discharge outcomes as part of the RTDC process. We directly compared model predictions to clinician predictions using several performance metrics. The model predicted discharges with higher sensitivity and lower specificity compared to the clinicians, and the two methods were comparable (i.e., not statistically significantly different) for our global accuracy measure (Youden’s Index). However, the model did outperform the clinicians for some near-future and aggregate prediction metrics. Thus there is high potential for these models to automate and expedite the RTDC prediction process, thereby eliminating the need for daily clinician huddles or supporting more accurate clinician predictions. Furthermore, these models were applied to simple and readily accessible information that can be quite powerful, and easily replicated across acute-care environments using electronic information systems.
This work was supported in part by the National Science Foundation grant number 0927207.
S.B., S.L., and E.H. made significant contributions to the conception and design of the work. M.T., E.H., and S.L. made significant contributions to the acquisition and processing of the data used for the analysis. S.B., S.L., and S.S. made significant contributions to the analysis and interpretation of the data for the work. All authors have contributed to either drafting or revising the article, have approved the version to be published, and agree to be accountable for all aspects of the work.