|Home | About | Journals | Submit | Contact Us | Français|
Avoidable hospital readmissions not only contribute to the high costs of healthcare in the US, but also have an impact on the quality of care for patients. Large scale adoption of Electronic Health Records (EHR) has created the opportunity to proactively identify patients with high risk of hospital readmission, and apply effective interventions to mitigate that risk. To that end, in the past, numerous machine-learning models have been employed to predict the risk of 30-day hospital readmission. However, the need for an accurate and real-time predictive model, suitable for hospital setting applications still exists. Here, using data from more than 300,000 hospital stays in California from Sutter Health’s EHR system, we built and tested an artificial neural network (NN) model based on Google’s TensorFlow library. Through comparison with other traditional and non-traditional models, we demonstrated that neural networks are great candidates to capture the complexity and interdependency of various data fields in EHRs. LACE, the current industry standard, showed a precision (PPV) of 0.20 in identifying high-risk patients in our database. In contrast, our NN model yielded a PPV of 0.24, which is a 20% improvement over LACE. Additionally, we discussed the predictive power of Social Determinants of Health (SDoH) data, and presented a simple cost analysis to assist hospitalists in implementing helpful and cost-effective post-discharge interventions.
Since the Affordable Care Act (ACA) was signed into law in 2010, hospital readmission rates have received increasing attention as both a metric for the quality of care and a savings opportunity for the American healthcare system . Per American Hospital Association, the national readmission rate finally fell to 17.5% in 2013 after holding at approximately 19% for several years . Hospital readmissions cost more than $17 billion annually . According to the Medicare Payment Advisory Committee (MedPAC), 76% of hospital readmissions are potentially avoidable .
In response, ACA has required the Center for Medicare and Medicaid Services (CMS) to reduce payments to hospitals with excess readmissions . These penalties should be put in the context of a larger shift in healthcare from the current fee-for-service payment model to a more patient-centered value-based payment model. Formation of Accountable Care Organizations (ACO) and CMS’ Quality Payment Program are examples of this trend that has created financial incentives for hospitals and care providers to address the readmission problem more systematically.
Before establishing targeted intervention programs, it is important to first identify those patients with a high risk of readmission. Fortunately, the widespread adoption of EHR systems has produced a vast amount of data that could help predict patients’ risk of future readmissions. Numerous attempts to build such predictive models have been made [6–12]. However, the majority of them suffer from at least one of the following shortcomings: (1) the model is not predictive enough compared to LACE , the industry-standard scoring model , (2) the model uses insurance claim data, which would not be available in a real-time clinical setting [6,7], (3) the model does not consider social determinants of health (SDoH) [13,8], which have proven to be predictive , (4) the model is limited to a particular medical condition, and thus, limited in scope [9,10].
To address these shortcomings, we built a model to predict all-cause 30-day readmission risk, and added block-level census data as proxies for social determinants of health. Additionally, instead of using insurance claims data, which could take up to a month to process, we built our model on the data available during the inpatient stay or at the time of discharge. Generally, using real-time EHR data allows models to be employed in hospital setting applications. Particularly, the authors are interested in applications of this predictive model in supporting data-driven post-discharge interventions to mitigate the risk of hospital readmission.
This study was conducted using health record data (without patient names) taken from 20 hospitals across Sutter Health, a large nonprofit hospital network serving Northern California. The Institutional Review Board (IRB) of Sutter Health (SH IRB # 2015.084EXP RDD) approved the study.
Electronic health records corresponding to 323,813 inpatient stays were extracted from Sutter Health’s EPIC electronic record system. Table 1 shows a summary of the population under study. We had access to all Sutter EHR data, beginning in 2009 and going through the end of 2015. Since many hospitals only recently completed their EHR integration, some 80% of the data comes from 2013–2015 (Fig 1). To ensure data consistency, we limited our hospitals of study to those with over 3,000 inpatient records and excluded Skilled Nursing and other specialty facilities. Fig 2 shows the total number of records for each hospital, and their respective readmission rates.
We studied all inpatient visits to all Sutter hospitals. Hospital transfers and elective admissions were excluded. With this method, a 30-day boolean readmission label was created for each hospital admission.
In the current version of their EHR system, Sutter Health captures a few SDoH data fields, such as history of alcohol and tobacco use. We supplemented those data with block-level 2010 census data  by matching patients’ addresses. The Google Geocoding API was used to determine the coordinates of each patient’s home address, and a spatial join was performed with the open-source QGIS platform  to find respective census tract and block IDs.
The data was transferred from Sutter to a HIPAA-compliant cloud service, where it was stored in a PostgreSQL database. An open-source framework , written in Python, was built to systematically extract features from the dataset. In total, 335,815 patient records with 1667 distinct features, comprising 15 feature sets, were extracted from the database, as summarized in Table 2.
Each type of feature (age, length of stay, etc) was independently studied using Jupyter Notebook, an interactive Python tool for data exploration and analysis. Using the pandas  library, we explored the quality and completeness of the data for each feature, identified quirks, and came to a holistic understanding of the feature, before using it in our models. Each feature-study notebook provided a readable document mixing code and results, allowing the research team to share findings with one another in a clear and technically reproducible way.
Initially, we experimented with several classic and modern classifiers, including logistic regression, random forests , and neural networks. In each case, a 5-fold cross validation, with 20% of the data kept hidden from the model, was performed. We found that the neural network models heavily outperformed other models in performance and recall, with the neural network model being about 10 times faster to train than the random forest model, the second best performing model. Therefore, we focused on optimizing the neural network model.
After evaluating a variety of neural network architectures, we found the best-performing model to be a two-layer neural network, containing one dense hidden layer with half the size of the input layer, and dropout nodes between all layers to prevent overfitting. Our model architecture can be seen in Fig 3. To train the neural network, we used the keras framework  on top of Google’s TensorFlow  algorithm. We trained in batches of 64 samples using the Adam optimizer , limiting our training to 5 epochs because we found that any further training tended to result in overfitting, as indicated by validation accuracy decreasing with each epoch while training loss continued to improve.
Initially, we trained the model on 1667 features extracted from the dataset. We then retrained the model using the top N features most correlated with 30-day readmission, for different values of N. As shown in Fig 4, the model achieved over 95% of the optimal precision when limited to the top 100 features, suggesting that 100 features is a reasonable cutoff for achieving near-optimal performance at a fraction of the training time and model size required for the full model. Table 3 summarizes the features most correlated with readmission risk.
Measuring a model’s performance cannot be completely separated from its intended use. While one metric, AUC, is designed to measure model behavior across the full range of possible uses, in practice risk models are only ever used to flag a minority patient population, and so the statistic is not fully relevant. Metrics like precision and recall require a yes/no intervention threshold before they can even be computed, something that we lack as this model is not slated for a specific clinical program. For simplification, we assumed the model would be used in an intervention on the 25% of patients with the highest predicted risk. We chose 25% because this is the fraction of patients that LACE naturally flags as high-risk, so we conservatively compare to LACE on its best terms. Additionally, we wanted to understand the predictive power of each set of features. To achieve that, we removed individual feature sets, one at a time, and compared the performance (in terms of AUC) with the best performing model.
Providers often want to focus their interventions on a specific patient population based on their age, geography or medical condition. Therefore, it is important to measure how well the model performs in each of those subpopulations. In addition, so far, CMS has penalized hospitals for excessive readmission of patients with heart failure (HF), chronic obstructive pulmonary disease (COPD), acute myocardial infarction (AMI), or pneumonia5. We compared the performance of our model against LACE in each of those subpopulations.
The main objective of this research study is to build and pilot a predictive model to accurately identify high-risk patients, and support the implementation of valuable and cost-effective post-discharge interventions. Therefore, a cost-saving analysis could assist decision makers to effectively plan and optimize hospital resources.
The optimal intervention threshold for maximizing cost savings depends on (1) the average cost of a readmission, (2) the expected cost of intervention(s), and (3) the expected effectiveness of intervention(s). Then, we can calculate the expected savings from each given intervention strategy as follows:
Table 4 compares the performance (assuming a 25% intervention rate) of our models and that of LACE when run on all data with 5-fold validation, using the metrics of precision (PPV), recall (sensitivity), and AUC (c-statistic).
Any model trained on present data will always perform slightly worse on future data, as the world changes and the model’s assumptions become less accurate. To evaluate performance on future data, we trained our best-performing model, the two-layer neural network, on all patients’ data with a hospitalization event prior to 2015, and measured the performance of the model in predicting 30-day readmissions in 2015. As seen in Table 5, a slight performance reduction in precision (from 24% to 23%), relative to the model’s performance on all data, is observed.
Fig 5 compares our model with LACE in four different age brackets. From this graph, the discriminatory power of the model decreases in older patients. However, it still outperforms LACE (+0.02 precision, +0.11 recall). Fig 6 compares the performance of the model in the top five Sutter Health hospitals by number of inpatient records. As seen in this graph, performance varies depending on the hospital location and the population it serves. Lastly, Fig 7 compares our model’s performance among subgroups with varying medical conditions. While the result suggests that the model performs slightly worse in those conditions, it is still superior to LACE (+ 0.03–0.05 precision, + 0.02–0.12 recall).
Due to the nonlinear relationship of different feature sets, it is virtually impossible to calculate the absolute contribution of individual feature sets on the model. However, we can approximate their effect by measuring the model performance using all feature sets except one. The result of this experiment is shown in Table 6. As seen in this table, removing any single feature set, except Medications, Utilization or Vitals, does not have a significant effect on the model performance.
For the cost savings analysis, while the actual values may be difficult (or, in some cases, even impossible) to predict, we will use the following values as an example: Readmission cost: $5000, Intervention Cost: $250, Intervention success rate: 20%.
Fig 8 shows the projected saving values as a function of the intervention rate (percentage of patients subjected to readmission-prevention interventions).
The factors behind hospital readmission are numerous, complex and interdependent. Although some factors, such as prior utilization, comorbidities, and age, are very predictive by themselves, improving the predictive power beyond LACE requires models that capture the interdependencies and non-linearity of those factors more efficiently. Artificial neural networks (ANN), by modeling nonlinear interactions between factors, provide an opportunity to capture those complexities. This nonlinear nature of ANNs enables us to harness more predicitive power from the additional extracted EHR data fields beyond LACE’s four parameters.
Furthermore, neural networks are compact and can be incrementally retrained on new data to avoid the “model drift” that occurs when a model trained on data too far back in the past performs progressively worse on future data that follows a different pattern.
The TensorFlow framework provides several added benefits for training a readmission model. First, TensorFlow can run in a variety of environments, whether on CPUs, GPUs, or distributed clusters. This means that the same kind of model can be trained in a variety of different hospital IT architectures, and achieve optimal performance in each. Secondly, with the aid of high-level interfaces, such as keras, TensorFlow can model neural network architectures in a very natural way. This enabled us to quickly experiment with different neural network setups to find the ideal configuration for the problem. Finally, TensorFlow is an actively maintained open-source project, and its performance improves continually through contributions from the open-source machine-learning community.
A fair comparison of our model with results in existing literature is not feasible, because the performance of readmission risk models varies tremendously between different patient populations, and no previous readmission prediction work has been done on the Sutter Health patient population. Even the LACE model’s performance varies in the literature from 0.596 AUC  to 0.684 AUC , which illustrates the impact of patient population on the accuracy of readmission prediction.
The performance of our model (as measured by precision, recall, and AUC) within patient subgroups tends to be worse than the performance of the same model within the whole patient population. Some of this performance drop can be explained by the fact that each subgroup represents a reduced feature set to our model—for example, age is no longer as predictive a feature to when every patient in a subgroup has a similar age. Furthermore, our model tends to perform on subgroups that LACE also has the worst performance on, such as patients aged 85+ (Fig 5) or patients with heart failure (Fig 7), suggesting that certain patient subpopulations have significantly less predictable readmission patterns than the general patient population.
We used two sources of SDoH features: health history questions (regarding tobacco, alcohol, and drug use) and block-level census data based on patient address. The health history features had some predictive value, two of them (“no alcohol use” and “quit smoking”) being in the top 100 features most linearly correlated with readmission risk. However, the census features were less predictive, with no features in the top 100 and only a few in the top 200 (such as poverty rate and household income). Both feature sources suffered from drawbacks: the health surveys were both brief and incomplete for ~25% of patients, while the block-level census data only provided information about a patient’s neighborhood but not about the patient themselves. For SDoH features to provide significant predictive value, they would have to be both comprehensive and individualized.
Since this study was conducted on EHR data from Sutter Health network of hospitals in California, it does not capture potential out-of-network hospital readmissions. To address this limitation, the dataset could be supplemented by state or national index hospital admissions to build a more comprehensive dataset.
In this study, we successfully trained and tested a neural network model to predict the risk of patients’ rehospitalization within 30 days of their discharge. This model has several advantages over LACE, the current industry standard, and other proposed models in the literature including (1) significantly better performance in predicting the readmission risk, (2) being based on real-time data from EHR, and thus applicable at the time discharge from hospital, and (3) being compact and immune to model drift. Furthermore, to determine the classifier’s labeling threshold, we suggested a simple cost-saving optimization analysis.
Further research is required to study the effect of more granular and structured social determinants of health data on the model’s predictive power. Some studies  have shown that natural language processing (NLP) techniques could be used to extract SDoH data from patient’s case notes. However, the most systematic method is to gather such data from SDoH screeners. Currently, multiple initiatives  are underway to standardize SDoH screeners, and integrate them into EHR systems.
The importance of reducing hospital readmissions, and therefore risk assessment, is likely to only grow in importance in the years to come. We believe that predictive analytics in general, and modern machine-learning techniques in particular, are powerful tools that have to be fully exploited in this field.
The neural network model described in the paper, as well as the code to run it on EMR data, is available (under the Apache license) at https://github.com/bayesimpact/readmission-risk.
The authors would like to thank the Robert Wood Johnson Foundation (RWJF) for supporting this research study. The Research, Development and Dissemination (RD&D) team at Sutter Health was instrumental in providing the required datasets and clinical expertise. In addition, Dr. Andrew Auerbach from University of California San Francisco, and Dr. Aravind Mani from California Pacific Medical Center provided valuable advisory support and guidance over the course of the project.
Robert Wood Johnson Foundation (www.rwjf.org) funded this study through a grant. Eric Liu was awarded grant #33676. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data are available from the Sutter Health Institutional Review Board for researchers who meet the criteria for access to confidential data. Through an internal review board process (SH IRB # 2015.084EXP RDD), Sutter Health provided Bayes Impact with its patients anonymized electronic medical records. Due to the existence of personally identifiable information (PII) in the dataset, the authors cannot release this dataset. Interested researchers could contact Sylvia Sudat (gro.htlaehrettus@sretuek), a senior biostatistician, or Nicole Kwon (gro.htlaehrettus@1NnowK), a Research Project Manager at Research, Development & Dissemination of Sutter Health for more information and access requirements.