Pain Med. Author manuscript; available in PMC 2016 July 1.
Published in final edited form as:
PMCID: PMC4504764

Teaching a Machine to Feel Postoperative Pain: Combining High-Dimensional Clinical Data with Machine Learning Algorithms to Forecast Acute Postoperative Pain

Patrick J. Tighe, MD, MS,* Christopher A. Harle, PhD, Robert W. Hurley, MD, PhD,* Haldun Aytug, PhD,# Andre P. Boezaart, MD, PhD,*§ and Roger B. Fillingim, PhD**



Given their ability to process highly dimensional datasets with hundreds of variables, machine learning algorithms may offer one solution to the vexing challenge of predicting postoperative pain.


Here, we report on the application of machine learning algorithms to predict postoperative pain outcomes in a retrospective cohort of 8071 surgical patients using 796 clinical variables. Five algorithms were compared in terms of their ability to forecast moderate to severe postoperative pain: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine, neural network, and k-nearest neighbor, with logistic regression included for baseline comparison.


In forecasting moderate to severe postoperative pain for postoperative day (POD) 1, the LASSO algorithm, using all 796 variables, had the highest accuracy with an area under the receiver-operating curve (ROC) of 0.704. Next, the gradient-boosted decision tree had an ROC of 0.665 and the k-nearest neighbor algorithm had an ROC of 0.643. For POD 3, the LASSO algorithm, using all variables, again had the highest accuracy, with an ROC of 0.727. Logistic regression had a lower ROC of 0.5 for predicting pain outcomes on POD 1 and 3.


Machine learning algorithms, when combined with complex and heterogeneous data from electronic medical record systems, can forecast acute postoperative pain outcomes with accuracies similar to methods that rely only on variables specifically collected for pain outcome prediction.

Keywords: Machine Learning, Algorithm, Postoperative Pain, Pain Prediction


Over 60% of surgical patients suffer from moderate to severe acute postoperative pain, and this pain has been associated with the development of chronic postsurgical pain [1,2]. Mounting evidence points to the importance of establishing preemptive, and even preventative, analgesia whenever possible before the onset of surgical stimulus [3,4]. However, many preemptive and preventative analgesic interventions can carry considerable side effects, such as bleeding or major adverse cardiac events with nonsteroidal anti-inflammatories and sedation with gabapentinoids. Therefore, the ability to predict which patients are more likely to suffer from moderate to severe acute postoperative pain would permit targeting of perioperative analgesic therapies in a manner that optimizes the risk to benefit ratio.

Accurate postoperative pain prediction has been a topic of research for over a century [5]. Although previous efforts using logistic regression have highlighted potential risk factors for severe postoperative pain, these approaches are limited [6,7]. For instance, logistic regression approaches cannot readily incorporate the rapidly expanding set of available clinical data, let alone the genetic, proteomic, and metabolomic data expected to be available to clinical decision support systems in the near future [6–13]. Pragmatically, such approaches also require regular updating to remain relevant to modern practice. Thus, new methods are needed that harness the potential predictive power of the myriad data elements now routinely collected, and that can automatically select the most useful variables and develop and validate prediction algorithms that stay current with clinical practice.

Machine learning classifiers are algorithms that can autonomously integrate and learn from complex datasets with many hundreds of variables. Therefore, machine learning classifiers may offer a solution to the vexing challenge of predicting postoperative pain [14]. These algorithms employ a variety of mathematical approaches and are often more computationally efficient and accurate when using very large datasets with complex distributions that do not conform to the assumptions of parametric methods like logistic regression [15–18]. Machine learning classifiers have already been successfully applied to many prediction problems, including crime prevention, handwriting recognition, fraud detection, and email spam filtering [19–22]. Furthermore, the recent focus on the adoption and meaningful use of electronic medical records (EMR) has led to massive clinical datasets comprising variables collected by healthcare providers during the course of a patient’s hospitalization [23–25]. Machine learning approaches have the potential to leverage this clinical “Big Data” to create more accurate and automated predictions of postoperative pain.

Here, we explore the application of machine learning algorithms to analyzing the highly complex data available in the preoperative period to accurately predict acute postoperative pain. The primary goal was to test the feasibility of an automated machine learning process to collect, prepare, and classify preoperative patient data from an EMR and determine whether a patient was at risk for moderate to severe postoperative pain. The secondary goal was to determine the proportion of at-risk patients that could be reliably identified with machine learning algorithms. Together, these aims lay a foundation for the future incorporation of highly complex clinical features into a clinical decision support system that predicts which patients are at risk of postoperative pain and guides clinicians toward the safest and most effective preemptive and preventative analgesic interventions.

Materials and Methods

Study Design

This study was approved by the Institutional Review Board (IRB 354-2012) at the University of Florida and was a retrospective cohort study of surgical patients undergoing non-obstetric, non-ambulatory surgical procedures over a 1-year time period from May 2011 to May 2012 at a large tertiary-care teaching hospital.

Description of Dataset

Surgical case data were obtained from the University of Florida’s Integrated Data Repository, a large database of patient demographic characteristics and care data drawn from the university health system’s (UF Health) EMR system. Subjects were patients aged 21 and over undergoing non-ambulatory surgery at UF Health over a 1-year period beginning May 2011. Exclusion criteria were obstetric surgery and multiple separate surgeries within the study period, the latter to avoid contamination of pain scores by the effects of surgeries preceding or following the case of interest. Results were reported in accordance with the STROBE criteria for cohort studies.

Description of Outcomes

All pain scores were documented by clinical staff using the numeric rating scale (NRS) on an 11-point system ranging from 0 to 10, where zero represents no pain and 10 the worst pain imaginable. Pain scores were recorded every 4 h per nursing protocol, with a repeat query within 1 h after administration of analgesic medications for breakthrough pain. When the clinical staff documented a pain score as “patient asleep,” the pain score was converted to a missing value rather than 0/10 to account for the fact that some patients had received additional sedatives that may have facilitated sleep despite ongoing pain. All pain scores were recorded with a corresponding date/time stamp, as were the start and end times of the related surgical procedure. End of surgery times generally reflected the closure of skin and emergence from anesthesia.
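The “patient asleep” conversion described above can be sketched with pandas; the table schema and values here are illustrative, not the study’s actual EMR layout:

```python
import numpy as np
import pandas as pd

# Hypothetical pain-score records; column names are illustrative only.
scores = pd.DataFrame({
    "patient_id": [1, 1, 2, 2],
    "nrs": ["7", "patient asleep", "3", "5"],
})

# Convert "patient asleep" to a missing value rather than 0/10, since some
# patients may have received sedatives that facilitated sleep despite pain.
scores["nrs"] = pd.to_numeric(
    scores["nrs"].replace("patient asleep", np.nan), errors="coerce"
)
```

Downstream imputation then treats these entries like any other missing value rather than counting them as pain-free observations.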

Two outcomes were defined: the presence or absence of a moderate (NRS score of 4–6) to severe (NRS score of 7–10) maximum pain score on POD 1 and on POD 3. POD 1 and 3 were selected to address challenges with the early adaptive response of the healthcare system to address patient needs on POD1, as well as patients with refractory pain on POD3, despite the theoretical escalation of pain therapies for at least 48 h after surgery [26].
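The outcome definition collapses the 11-point NRS into a single binary flag per postoperative day. A minimal sketch, assuming a hypothetical per-patient table of daily maximum scores:

```python
import pandas as pd

# Hypothetical per-patient maximum NRS scores for a given postoperative day.
daily = pd.DataFrame({"patient_id": [1, 2, 3], "max_nrs_pod1": [2, 5, 9]})

# Moderate (NRS 4-6) and severe (NRS 7-10) pain collapse into one binary outcome.
daily["mod_severe_pod1"] = (daily["max_nrs_pod1"] >= 4).astype(int)
```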

Description of Variables

Predictions were rendered based on 796 variables (Figure 1), compared with the 24 or fewer variables used in previous work [26,27]. Demographic data included age, gender, body mass index, ethnicity, insurance/payer, and marital status. Binary variables were defined based on the presence or absence of home use of opioids, non-steroidal anti-inflammatory drugs (NSAIDs), muscle relaxants, benzodiazepines, and amine reuptake inhibitors. Medications were extracted using the World Health Organization pharmaceutical ontology.

Figure 1
Loading of variables into machine learning classifier pipeline. Variables were included using a staged approach for demographics, comorbidities, home medications, surgical procedure, and the circumstances of surgery.

Patient comorbidity data were prepared by first extracting up to 50 comorbid diagnoses per patient. Diagnoses were recorded using the International Classification of Diseases, 9th edition, Clinical Modification (ICD-9-CM). Each diagnostic code was also associated with a “present on admission” flag, denoting that the diagnosis was explicitly documented as occurring prior to hospital admission. The ICD-9-CM codes were then converted into a Charlson Comorbidity Index [28]. Separate from the Charlson Comorbidity Index, the total number of comorbid conditions was also calculated. Next, comorbid diagnoses were included in the analysis using 30 binary variables, defined by the presence or absence of 1 of 30 predefined Agency for Healthcare Research and Quality (AHRQ) comorbidity codes. Additionally, a parallel and corresponding variable was assigned to each comorbid diagnosis: each ICD-9-CM diagnosis was recoded according to the Clinical Classifications Software (CCS) system. Finally, for each of the 288 separate CCS diagnoses, the presence or absence of the diagnosis was arrayed as a binary variable, irrespective of order of entry. Ultimately, an array of 48,787 variables pertaining to established comorbidities was loaded into the machine learning process.
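Arraying diagnoses as presence/absence flags, irrespective of entry order, can be illustrated with a small pandas sketch; the CCS codes and table layout below are hypothetical:

```python
import pandas as pd

# Hypothetical long-format table: one row per (patient, CCS diagnosis group).
dx = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "ccs_dx": ["101", "108", "101"],
})

# Presence/absence of each CCS diagnosis becomes a binary column,
# irrespective of the order in which diagnoses were entered.
flags = pd.crosstab(dx["patient_id"], dx["ccs_dx"]).clip(upper=1)
```

Applied to 288 CCS diagnosis groups (plus parallel per-slot variables and flags), this style of expansion is how a few dozen raw diagnosis fields grow into tens of thousands of model inputs.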

The identities of the surgeon, anesthesiologist, and nurse, the time of surgery (day of week, weekday versus weekend, normal versus off-hours), postoperative admission versus inpatient status, nerve block status, and emergent versus elective status of the procedure were organized into 16 separate variables describing the circumstances of the surgery. Types of surgery were identified using Current Procedural Terminology (CPT) codes published by the American Medical Association. Up to 10 CPT codes were included for each patient, and a count of the number of concurrent CPT codes was also included as a covariate. Given the large number of CPT codes, surgeries were grouped into 245 separate categories according to the CCS system, as well as into a broader grouping by anatomic location of surgery based on the first one to three digits of the CPT code. The CCS grouping was performed using a ranked parallel listing of CCS procedure groups as well as a wide array of CCS groups represented as binary flags. Ultimately, 275 variables were included to describe and categorize the type of procedures performed.

Machine Learning Process: Data Preparation

Figure 2 outlines the overall experimental design. First, data were imported as two discrete tables, one including all cases with an outcome (i.e., a valid pain score) on POD1, and a subset of this table for patients who also had an outcome on POD3. The next step in data cleansing was imputation of missing data. Because several of the algorithms would not function if missing values were present, we used a protocol for automated entry of missing data. While this approach inevitably leads to information loss, this step improves the clinical feasibility for implementing an automated clinical decision support system with real-world hospital administrative datasets, which frequently contain missing data. Additionally, this step tested the ability of the analysis to function automatically, such as in a setting where manual cleaning and imputation would be infeasible. For nominal variables, missing entries were imputed using the distribution method, whereby replacement values for a given variable were based on the normalized random percentiles of that variable’s distribution. For continuous variables, the median value for a given variable was used for imputation.
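The two imputation rules above can be sketched as follows. This is an illustrative re-implementation, not the SAS Enterprise Miner code the study actually used:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values: median for continuous columns, a draw from the
    observed category distribution for nominal columns."""
    out = df.copy()
    for col in out.columns:
        missing = out[col].isna()
        if not missing.any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Continuous: impute the observed median.
            out.loc[missing, col] = out[col].median()
        else:
            # Nominal: sample replacements in proportion to the observed
            # distribution (a "distribution method" imputation).
            probs = out[col].value_counts(normalize=True)
            out.loc[missing, col] = rng.choice(
                probs.index.to_numpy(), size=int(missing.sum()), p=probs.to_numpy()
            )
    return out
```

Distribution-based draws for nominal variables avoid collapsing every missing entry onto the single modal category, at the cost of injecting some randomness into the prepared dataset.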

Figure 2
Overview of machine learning classifier pipeline. Separate experiments were conducted for outcomes occurring on POD 1 and 3. Data replacement, imputation, and partitioning were performed using an algorithmic approach. Five machine learning classifiers ...

Next, we used three levels of interventions to address the risk of overfitting, whereby the model is over-customized to existing data and less useful for predicting future patient outcomes [27–31]. First, data were partitioned into training (40% of observations), validation (30% of observations), and hold-out testing (30% of observations) partitions. Each partition was stratified on the target outcome so that roughly equivalent proportions of moderate to severe pain outcomes were present in each partition. Second, we included an experiment branch that included an automated variable selection algorithm that selected a subset of variables for use by the algorithms. Third, several of the algorithms tested incorporated regularization features and/or additional cross-validation in their modeling process.
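The stratified 40/30/30 partition can be reproduced with scikit-learn; the following is a sketch on synthetic data, not the study’s partitioning code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic outcome vector standing in for the moderate-to-severe pain flag.
rng = np.random.default_rng(1)
X = np.arange(1000).reshape(-1, 1)
y = rng.integers(0, 2, size=1000)

# Carve off the 40% training partition, stratified on the outcome...
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.4, stratify=y, random_state=0
)
# ...then split the remainder evenly into validation and hold-out test
# partitions (30% of the total each), again stratified.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)
```

Stratifying both splits keeps the outcome prevalence roughly equal across all three partitions, matching the design described above.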

Description of Algorithms

Five machine learning algorithms were tested in the classification array: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine (SVM), neural network, and k-nearest neighbor (k-NN), with logistic regression included as a baseline. These algorithms were chosen to represent a wide variety of classification approaches ranging from the classic (logistic regression) to those specifically designed to accommodate highly dimensional data (SVM, LASSO). Details of the selected algorithms, and their implementation, can be found in the technical supplement.
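For readers outside the SAS ecosystem, a rough scikit-learn analogue of this algorithm suite might look like the following. Note that LASSO is approximated here as L1-penalized logistic regression (its usual classification form), the data are synthetic, and none of the hyperparameters reflect the study’s actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the clinical dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, y_train = X[:350], y[:350]
X_test, y_test = X[350:], y[350:]

# Hypothetical model suite; hyperparameters are placeholders.
models = {
    "LASSO (L1 logistic)": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    "gradient-boosted tree": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "neural network": MLPClassifier(max_iter=500, random_state=0),
    "k-NN": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
}

aucs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    # Area under the receiver-operating curve on the held-out split.
    aucs[name] = roc_auc_score(y_test, prob)
```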


Following the training and validation of each algorithm with the full and reduced variable sets, algorithms were compared by their accuracy in classifying moderate to severe pain in the hold-out test data partition [32]. The primary endpoint for comparing model accuracy was the area under the receiver-operating curve (ROC) [33,34]. Misclassification rates and the number of wrong classifications were reported to offer clinical context for the ROC [35]. Additionally, we computed error matrices to determine in which direction errors were made.
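The error-matrix and misclassification-rate endpoints can be sketched on hypothetical hold-out predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical binary predictions on a hold-out partition.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# scikit-learn's convention: rows are true classes, columns predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Misclassification rate: fraction of wrong classifications.
misclassification_rate = (fp + fn) / len(y_true)
```

Separating false positives from false negatives is what reveals the “direction” of the errors, i.e., whether a model tends to miss at-risk patients or to over-flag patients who will not develop moderate to severe pain.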

As a secondary endpoint of classifier performance, we reported the cumulative lift for each model [36–38]. Lift measures how many times more likely an algorithm is to include instances of interest (here, patients with pain) than pure chance when only a small subsample can be chosen (i.e., we want the subsample to include as many patients with pain as possible). It is the ratio of the percentage of patients with pain among those the model scores as higher probability to the percentage of patients with pain in the overall dataset. For example, suppose an acute pain service could offer only a limited number of nerve blocks each day, such that only 20% of eligible surgical patients could receive a block. Given a predictive model, we assume, rightly or wrongly, that our best chance of identifying the subpopulation most likely to otherwise suffer from severe pain is to examine the model’s prediction probabilities and select the 20% of the general surgical population with the highest predicted probability of severe acute postoperative pain. A perfect model would fill that entire sample with patients who actually will suffer from severe acute postoperative pain. If present versus absent acute pain outcomes were split 50:50, the maximum top-decile lift for a perfect model would be 2. In comparison, if present versus absent acute pain outcomes were split 20:80, the maximum theoretical lift for a perfect model would be 5. A value of one or less signals an uninformative model (i.e., the percentage of patients with pain in the model’s top-scored subset of the test set does not exceed the percentage of patients with pain in the full test set). The value reported in this work is the maximum cumulative lift.
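The lift calculation, and the 50:50 worked example above, can be sketched as:

```python
import numpy as np

def lift_at(y_true: np.ndarray, scores: np.ndarray, fraction: float = 0.1) -> float:
    """Lift at a top fraction: outcome prevalence among the top-scored
    patients divided by prevalence in the whole sample."""
    order = np.argsort(scores)[::-1]          # rank patients by predicted risk
    n_top = max(1, int(round(fraction * len(scores))))
    return y_true[order[:n_top]].mean() / y_true.mean()

# Worked example from the text: a 50:50 outcome split and a perfect model
# give a maximum top-decile lift of 1 / 0.5 = 2.
y = np.array([1] * 50 + [0] * 50)
perfect_scores = np.concatenate([np.linspace(1.0, 0.6, 50), np.linspace(0.4, 0.0, 50)])
```

With a 20% prevalence instead, the same perfect ranking would yield a top-decile lift of 1 / 0.2 = 5, matching the second example.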

All analyses were conducted using SAS Enterprise Miner 12.1 (SAS Institute, Cary, NC).


Results

A total of 8071 subjects were included in this study, reflecting a convenience sample of patients available with pain scores on POD 1. A 5031-patient subset of this sample also had documented pain scores on POD 3 due to continued hospitalization. For POD 1 outcomes, all 8071 subjects were included. Table 1 provides an overview of the demographic and procedural characteristics of the POD 1 and 3 samples.

Table 1
Subject demographics

Pain Outcomes on POD 1 and 3

Of the 8071 subjects included in the POD 1 dataset, 4267 (53%) reported moderate to severe pain on the first day after surgery (Table 2). In the POD 3 dataset, 2256 (45%) reported moderate to severe pain on the third day after surgery, an absolute reduction of 8 percentage points. Of the 4267 patients who reported moderate to severe pain on POD 1, 2676 remained hospitalized on POD 3, and 1786 (79%) of these also reported moderate to severe postoperative pain on POD 3. Of the 3804 patients with no reports of moderate to severe pain on POD 1, 2335 remained hospitalized on POD 3, 1885 (81%) of whom also reported no episodes of moderate to severe pain on POD 3.

Table 2
Associations between POD 1 and 3 outcomes

Imputation of Missing Variables

The majority of missing value imputations were due to absence of “present on admission” flag data for the fifth (1983 imputations), sixth (2575), seventh (3189), and eighth (3779) listed comorbid conditions, followed by features pertaining to home medication use (1402 imputations for each home medication) and the identities of the attending surgeon or anesthesiologist. A summary of the imputations for the POD1 and POD3 datasets can be found in Appendix A.

Data Partition

As noted above, to avoid overfitting, algorithms were trained on the training set, tuned on the validation set, and then tested on the hold-out partition. For the POD 1 data, 3227 subjects were partitioned to the training set, 2421 to the validation set, and 2423 to the hold-out test set. By design, 53% of patients suffered from moderate to severe pain in each of the three partitions. For the POD 3 data, 2011 subjects were partitioned to the training set, 1509 to the validation set, and 1511 to the test set. Again by design, 45% of patients suffered from moderate to severe pain in each of the three POD 3 partitions.

Feature Selection

Separate sets of features were selected for the POD 1 and 3 outcomes (Table 3). Details concerning patient age, type of surgery, and comorbidities grouped using the CCS array featured prominently in the POD 1 and 3 outcomes. Home opioid use carried a much higher relative importance for POD 1 outcomes (relative importance 0.54) than POD 3 (relative importance 0.26) outcomes.

Table 3
Results of automated feature selection for pain prediction outcomes on POD 1 and 3

Model Comparison

Each algorithm was compared on the hold-out test set using the full and reduced feature set, and then against the outcome of moderate to severe pain on POD 1 and 3, yielding a total of four experimental branches (Table 4). Overall, the LASSO algorithm, using the entire feature set to predict the occurrence of moderate to severe pain on POD 3, had the highest accuracy, with an area under the ROC of 0.727.

Table 4
Comparison test outcomes of machine learning algorithms

For POD 1, the LASSO algorithm, using the full feature set, had the highest accuracy with an ROC of 0.704. This was followed by the gradient-boosted decision tree algorithm, with an ROC of 0.665 and the k-NN algorithm, with an ROC of 0.643. In this branch of the experiment, the LASSO algorithm suffered 844 misclassifications for a misclassification rate of 0.35 (Fig. 3a). Using the full feature dataset, the LASSO algorithm exhibited a cumulative lift of 1.49 given the 53% incidence of postoperative pain, suggesting that at the top decile, 78% of that decile’s patients actually did suffer from severe acute postoperative pain (Fig. 4a). On POD 1 using the full feature set, LASSO exhibited a sensitivity of 0.69, a specificity of 0.61, and a likelihood ratio of 1.77 (Table 5). Table 6 demonstrates those parameter estimates with the greatest weights when tested using the entire feature set via LASSO.
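As a quick arithmetic check, the reported likelihood ratio follows from the reported sensitivity and specificity, since the positive likelihood ratio is sensitivity / (1 − specificity):

```python
# POD 1 full-feature LASSO values as reported in the text.
sensitivity = 0.69
specificity = 0.61

# Positive likelihood ratio: true-positive rate over false-positive rate.
lr_positive = sensitivity / (1 - specificity)  # approximately 1.77
```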

Figure 3Figure 3
ROC for pain outcomes on POD 1 and 3. The ROC for each tested classifier are shown at the training, validation, and testing stages for POD 1 (A) and POD 3 (B). For POD 1, the LASSO algorithm, using the full feature set, had the highest accuracy, with ...
Figure 4
Cumulative lift curves for pain outcomes on POD 1 and 3. The LASSO algorithm exhibited a cumulative lift of 1.49 given the 53% incidence of moderate to severe postoperative pain on POD 1, suggesting that at the top decile, 78% of that decile’s ...
Table 5
Confusion matrix for LASSO with full feature set: POD 1 and 3
Table 6
Parameter estimates for LASSO on POD1 and 3

When using the full feature set on POD 1, the neural network and logistic regression algorithms had the lowest accuracy, with an ROC of 0.5 each. This suggested negligible improvement in classification accuracy over that offered by chance.

When the feature set was reduced using the pre-algorithm variable selection step, the LASSO algorithm again had the highest accuracy, with an ROC of 0.704, followed by the gradient-boosted decision tree algorithm, with an ROC of 0.698, and the autoneural algorithm, with an ROC of 0.688. Here, accuracy of the LASSO algorithm remained grossly unchanged, committing 848 misclassifications versus 844 with the full feature set. However, the gradient-boosted decision tree algorithm had increased accuracy with the reduced feature set, increasing in ROC from 0.665 with the full feature set to 0.698 with the reduced. Using the reduced-feature dataset, the LASSO algorithm exhibited a slightly lower cumulative lift of 1.44, suggesting that the top decile is 1.44 times more likely to include patients with severe acute postoperative pain than would a model based upon random sampling.

As mentioned previously, for POD 3, the LASSO algorithm using the full feature set again had the highest accuracy, with an ROC of 0.727. In this branch of the experiment, the LASSO algorithm suffered 483 misclassifications for a misclassification rate of 0.32. This was again followed by the gradient-boosted decision tree, with an ROC of 0.682, and the k-NN algorithm, with an ROC of 0.637 (Fig. 3b). Using the full feature dataset, the LASSO algorithm exhibited a cumulative lift of 1.61, suggesting that the top decile is 1.61 times more likely to include patients with severe acute postoperative pain than would a model based on random sampling (Fig. 4b). On POD 3, with the full feature set, LASSO exhibited a sensitivity of 0.59, a specificity of 0.75, and a likelihood ratio of 2.4.

When using the full feature set on POD 3, the neural network and logistic regression had the lowest accuracy, each with an ROC of 0.5. This suggested negligible improvement in classification accuracy over that offered by chance.

When the feature set was reduced using the pre-algorithm variable selection step on POD 3 outcome data, the LASSO algorithm had the highest accuracy, with an ROC of 0.717, followed by the gradient-boosted decision tree algorithm, with an ROC of 0.702, and the neural network algorithm, with an ROC of 0.691. Using the reduced-feature dataset, the LASSO algorithm exhibited a cumulative lift of 1.6, suggesting that the top decile was 1.6 times more likely to include patients with moderate to severe postoperative pain than random sampling would capture.


Discussion

Our results demonstrate that machine learning algorithms, when applied to highly dimensional datasets developed from clinical data repositories, offer substantial improvements in accuracy over the tested logistic regression-based approaches to classifying acute postoperative pain outcomes. Most algorithms were slightly more accurate in predicting moderate to severe postoperative pain on POD 3 than on POD 1. Reducing the number of predictor variables with an automated approach improved the accuracy of many of the algorithms tested; however, LASSO performed equally well with the complete and reduced feature sets.

Our analysis included multiple metrics of algorithm performance to more fully delineate the differences in prediction capability afforded by each machine learning approach. Although ROC is a widely accepted metric of model accuracy, it provides little insight into what portion of the population is likely to benefit from the accuracy offered by the model [35]. This is partially because the proportion of patients who will suffer from moderate to severe postoperative pain is not equal to the proportion who will not. Consequently, misclassifications may be biased toward, or against, the detection of patients likely to suffer from moderate to severe pain after surgery. Indeed, the results presented here suggest that the LASSO algorithm may capture a larger proportion of patients expected to have an adverse acute pain outcome on POD 1 than on POD 3, despite similar ROC values. This information may be helpful in developing future iterations of a postoperative pain prediction pipeline: modifying the costs associated with particular misclassifications could steer the direction of misclassification toward the detection of at-risk patients.

Using only routinely collected clinical data, our results compare favorably to the models derived from prior studies in which predictive models were prospectively developed using datasets designed a priori for research purposes [26,27]. Kalkman and others [27] prospectively examined 1416 patients undergoing a mix of surgical procedures, excluding cardiac and neurosurgical cases, and developed a logistic regression model incorporating the following features: age, gender, type of surgery, intended incision size, blood pressure, heart rate, body mass index, preoperative pain intensity, and health-related quality of life as measured by the SF-36, the State-Trait Anxiety Inventory, and the Amsterdam Preoperative Anxiety and Information Scale. The bootstrapped model had an ROC of 0.73, and the authors concluded that pain scores within the first hour after surgery can be predicted using a set of variables collectible during a preoperative evaluation. Although this represented a significant contribution toward the prediction of postoperative pain, it should be noted that early postoperative pain scores do not correlate well with pain scores reported on POD 1 through 5 [39]. Furthermore, the work by Kalkman incorporated variables that were collected solely for the purpose of postoperative pain prediction; such tools are not universally applied in clinical preoperative evaluations. Our model accuracy of 0.7 to 0.73, using routinely available clinical data not prescreened for inclusion into the model, thus compares favorably to the dedicated prospective efforts by Kalkman.

Similarly, Sommer et al. collected postoperative pain scores on 1490 patients undergoing a mix of surgical procedures [26]. Preoperative variables included demographics, type of anesthesia, type of surgery, American Society of Anesthesiologists score, duration of procedure, and multiple psychometric scales. ROC ranged from 0.74 on the day of surgery to 0.78 on POD 4, a trend similar to our results, suggesting an increase in model accuracy with each POD. Notably, there is no report of any type of validation step used by Sommer et al., raising the possibility that their results suffered from model overfitting. For comparison, our own results offered an ROC of 1, 0.89, and 0.79 for the unvalidated training-set models developed by the full feature set logistic regression, reduced feature set logistic regression, and full feature set SVM algorithms on POD 1.

Our pragmatic approach to postoperative pain prediction thus offers classification accuracy that, although less than ideal, compares quite favorably to prior published work. Moreover, while these prior groundbreaking reports are quite laudable in their scope and results, they nevertheless employed approaches that lacked the ability to include additional variables. For instance, the inclusion of genomic data alone may result in the addition of tens of thousands of features for any given patient. Our approach suggests that pragmatic, autonomous forecasting of postoperative acute pain outcomes may be feasible for individual healthcare systems, thus permitting customization of models to the patients and practices that are particular to a given hospital and population.

Altogether, the risks and benefits associated with the assortment of pharmacologic and needle-based therapies offered by modern acute pain medicine services point to the need for accurate decision support systems capable of determining which patients are likely to benefit from such analgesic interventions. Simultaneously, such forecasts may spare patients not at risk for severe acute postoperative pain the risks and costs inherent to regional anesthetics. Our results offer a specificity of 0.755 on POD 3, thus providing a moderate capability to spare those who would otherwise be scheduled for a nerve block from the associated risks and costs.

We also demonstrated a pragmatic application of advanced analytic methods to automatically process existing EMR data, select relevant variables, and then forecast severe acute postoperative pain [31]. Manual review of records to organize and “clean” data is no longer a feasible modeling approach given the massive amount of clinical data accumulated for each patient [40]. When using large administrative datasets, many patient characteristics that may be associated with poor postoperative pain outcomes, such as anxiety, catastrophizing, and socioeconomic status, may not be readily available in forms usable within the experimental paradigm. Furthermore, the often short interval between OR case scheduling and surgery means records would need to be reviewed in a time-sensitive fashion, making manual review even more impractical. This presents a realistic challenge in converting experimentally derived models into clinically applicable ones, but it may be overcome with automated methods for processing EMR data, such as those presented here. Also, with increasing structured clinical collection of social and behavioral characteristics, such as socioeconomic status, these automated methods may become even more powerful in predicting postoperative pain.

Our data suffered from several limitations inherent to retrospective cohort studies. First, our study used static aggregate measures of pain by looking at median pain scores. This represented a tradeoff: specificity of the targeted outcome, such as would have been offered by selecting the number of severe pain events or focusing on severe pain events, was exchanged for a more generalized clinical applicability affecting a larger proportion of patients. Second, this study did not incorporate information pertaining to analgesic use or functional capacity. Interestingly, we found that data pertaining to opioid administration via patient-controlled analgesia devices were not readily incorporated into the standard clinical EMR system. Although beyond the scope of this project (the use of machine learning classifiers), the simultaneous prediction of pain, analgesic requirement, and functional capacity remains an important goal for clinical decision support systems designed to forecast acute postoperative pain outcomes. Our model for POD 3 also suffered from a censoring effect, in that we offered no information on why patients were discharged between POD 1 and 3. Discharges in this interval may reflect low postoperative pain, whereas patients remaining in the hospital may be there strictly due to poor pain control. This shortcoming points to the importance of supplemental data, as mentioned above, as well as the incorporation of time-domain information regarding resolution of acute pain, as explored preliminarily by Chapman et al. [41,42]. Perhaps the most important shortcoming of this study was the overall lack of model accuracy despite the use of advanced algorithms and a highly dimensional dataset.
Our results compare favorably with those reported by Kalkman et al. and Sommer et al., despite their inclusion of additional psychometric data selected specifically to enhance prediction of postoperative pain, and despite the lack of model validation in one of the studies [26]. Nevertheless, a large proportion of the observed variance in postoperative pain outcomes remains unexplained by our model. Fortunately, the machine learning approach tested here is well positioned to incorporate even higher dimensional data, including genetic, text, and social network variables, in future studies.
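For readers unfamiliar with the accuracy metric used in these comparisons, the area under the ROC curve is equivalent to the Mann–Whitney statistic: the probability that a randomly chosen positive case is scored above a randomly chosen negative case. A self-contained sketch, using toy scores rather than study data:

```python
# AUC as the probability that a random positive case outranks a random
# negative case (ties count half) -- the Mann-Whitney view of ROC area.
def auc(y_true, y_score):
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: two positives, two negatives.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(auc([0, 1], [0.5, 0.5]))                   # 0.5 (chance level)
```

On this view, the logistic regression baseline's ROC of 0.5 corresponds to chance-level discrimination, while the LASSO models' 0.70–0.73 means roughly a 70% chance of correctly ranking a high-pain patient above a low-pain one.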

In summary, our results suggest the feasibility of an autonomous “analytic pipeline” as follows: upon scheduling for surgery, the entire set of variables contained within a patient’s EMR could be sent to a machine learning classification system that has previously been trained, validated, and tested using historical data from many patients who have recently undergone surgery in the health system. Next, the system would automatically clean the patient’s data and forecast whether that patient is likely to suffer from moderate or severe pain after surgery. Those predictions could then be forwarded to the perioperative teams that would care for the patient on the day of surgery. Such an early-warning system may provide valuable information that allows a perioperative team to go beyond simple heuristics, such as basing therapy only on the type of surgery, in choosing anesthesia therapies. Notably, while an analytic pipeline based on the classification methods in this paper would provide a clinically valuable prediction of pain risk, it would not suggest specific preventive, preemptive, or rescue analgesia for a given patient. However, future work could migrate our general analytic pipeline approach from simply forecasting postoperative pain to simultaneously considering the clinical context of the postoperative pain experience and recommending therapies.
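The clean-then-forecast steps of such a pipeline could be prototyped in a few lines. The following is a hypothetical sketch only: the variable names, weights, bias, and zero-imputation rule are invented for illustration, and a real system would instead load the trained LASSO model and the institution's own EMR schema.

```python
import math

# Hypothetical stand-ins for a previously trained classifier; none of these
# names or values come from the study's actual model.
EXPECTED_VARS = ["age", "asa_class", "preop_pain_score"]
WEIGHTS = [0.02, 0.5, 0.3]
BIAS = -3.0

def clean(record):
    # Placeholder cleaning step: impute any missing EMR variable with 0.
    return [record.get(v, 0.0) for v in EXPECTED_VARS]

def forecast_pain(record, threshold=0.5):
    """Return (probability, flag) for moderate-to-severe postoperative pain."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, clean(record)))
    prob = 1.0 / (1.0 + math.exp(-z))
    return prob, prob >= threshold

# A newly scheduled patient pulled from the EMR (asa_class is missing).
prob, at_risk = forecast_pain({"age": 60, "preop_pain_score": 6})
print(round(prob, 2), at_risk)  # 0.5 True
```

The returned probability and risk flag correspond to the output that would be forwarded to the perioperative team ahead of the day of surgery.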

Machine learning algorithms, when combined with highly dimensional datasets, thus offer an exciting opportunity to accurately forecast severe acute postoperative pain. Although our results demonstrate feasibility, with accuracy comparable to prior efforts, future work will need to improve both the analyzed feature set and the targeted pain-related outcomes.


Sources of financial support: Patrick J. Tighe is funded by an NIH grant (no. K23GM102697).


Department/institution to which this work is attributed: Departments of Anesthesiology; Information Systems and Operations Management, Warrington College of Business Administration; and Community Dentistry and Behavioral Science, University of Florida, Gainesville, Florida.

The authors have no conflicts of interests to report.


1. Apfelbaum JL, Chen C, Mehta SS, Gan TJ. Postoperative pain experience: Results from a national survey suggest postoperative pain continues to be undermanaged. Anesth Analg. 2003;97:534–40. [PubMed]
2. Kehlet H, Jensen TS, Woolf CJ. Persistent postsurgical pain: Risk factors and prevention. Lancet. 2006;367:1618–25. [PubMed]
3. Buvanendran A, Kroin JS. Multimodal analgesia for controlling acute postoperative pain. Curr Opin Anaesthesiol. 2009;22:588–93. [PubMed]
4. Katz J, Clarke H, Seltzer Z. Preventive analgesia. Anesth Analg. 2011;113:1242–53. [PubMed]
5. Ip HYV, Abrishami A, Peng PWH, Wong J, Chung F. Predictors of postoperative pain and analgesic consumption: A qualitative systematic review. Anesthesiology. 2009;111:657–77. [PubMed]
6. Kalkman CJ, Visser K, Moen J, Bonsel GJ, Grobbee DE, Moons KGM. Preoperative prediction of severe postoperative pain. Pain. 2003;105:415–23. [PubMed]
7. Sommer M, de Rijke JM, van Kleef M, et al. The prevalence of postoperative pain in a sample of 1490 surgical inpatients. Eur J Anaesthesiol. 2008;25:267–74. [PubMed]
8. Yip KY, Cheng C, Gerstein M. Machine learning and genome annotation: a match meant to be? Genome Biol. 2013;14:205. [PMC free article] [PubMed]
9. Okser S, Pahikkala T, Aittokallio T. Genetic variants and their interactions in disease risk prediction: Machine learning and network perspectives. BioData Mining. 2013;6:1. [PMC free article] [PubMed]
10. Menden MP, Iorio F, Garnett M, et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE. 2013;8:e61318. [PMC free article] [PubMed]
11. Bessarabova M, Ishkin A, JeBailey L, Nikolskaya T, Nikolsky Y. Knowledge-based analysis of proteomics data. BMC Bioinformatics. 2012;13:S13. [PMC free article] [PubMed]
12. Pakhomov SV, Buntrock JD, Chute CG. Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques. J Am Med Inform Assoc. 2006;13:516–25. [PMC free article] [PubMed]
13. DeLisle S, Kim B, Deepak J, et al. Using the electronic medical record to identify community-acquired pneumonia: toward a replicable automated strategy. PLoS ONE. 2013;8:e70944. [PMC free article] [PubMed]
15. Breiman L. Statistical modeling: The two cultures. Stat Sci. 2001;16:199–231.
16. Hall M, Franke E, Holmes G, Pfahringer B, Reutemann P, Witten I. The WEKA data mining software: An update. SIGKDD Explorations. 2009;11:10–8.
17. Witten I, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2. San Francisco: Morgan Kaufmann; 2005.
18. Steyerberg EW, Eijkemans MJ, Habbema JD. Stepwise selection in small data sets: A simulation study of bias in logistic regression analysis. J Clin Epidemiol. 1999;52:935–42. [PubMed]
19. Li J, Huang K-Y, Jin J, Shi J. A survey on statistical methods for health care fraud detection. Health Care Manag Sci. 2008;11:275–87. [PubMed]
20. Bolton RJ, Hand DJ. Statistical fraud detection: A review. Stat Sci. 2002 doi: 10.2307/3182781. [Cross Ref]
21. Labusch K, Barth E, Martinetz T. Simple method for high-performance digit recognition based on sparse coding. IEEE Trans Neural Netw. 2008;19:1985–9. [PubMed]
22. Zorkadis V, Karras DA, Panayotou M. Efficient information theoretic strategies for classifier combination, feature extraction and performance evaluation in improving false positives and false negatives for spam e-mail filtering. Neural Netw. 2005;18:799–807. [PubMed]
23. Furukawa MF. Meaningful use: A roadmap for the advancement of health information exchange. Isr J Health Policy Res. 2013;2:1. [PMC free article] [PubMed]
24. Lai M, Kheterpal S. Creating a real return-on-investment for information system implementation: Life after HITECH. Anesthesiol Clin. 2011;29:413–38. [PubMed]
25. Blumenthal D, Tavenner M. The “meaningful use” regulation for electronic health records. N Engl J Med. 2010;363:501–4. [PubMed]
26. Sommer M, de Rijke JM, van Kleef M, et al. Predictors of acute postoperative pain after elective surgery. Clin J Pain. 2010;26:87–94. [PubMed]
27. Kalkman CJ, Visser K, Moen J, Bonsel GJ, Grobbee DE, Moons KGM. Preoperative prediction of severe postoperative pain. Pain. 2003;105:415–23. [PubMed]
28. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. J Chronic Dis. 1987;40:373–83. [PubMed]
29. Cohen PR, Jensen D. Overfitting explained. Preliminary Papers of the Sixth International. 1997
30. Toll DB, Janssen KJM, Vergouwe Y, Moons KGM. Validation, updating and impact of clinical prediction rules: A review. J Clin Epidemiol. 2008;61:1085–94. [PubMed]
31. Babyak MA. What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med. 2004;66:411–21. [PubMed]
32. Tao KM. A closer look at the radial basis function (RBF) networks. Neurocomputing. 1997;14:273–88.
33. Lee S-M, Abbott P, Johantgen M. Logistic regression and Bayesian networks to study outcomes using large data sets. Nursing Res. 2005;54:133–8. [PubMed]
34. Jain AK, Duin RPW, Mao J. Statistical pattern recognition: A review. IEEE Trans Pattern Anal Mach Intell. 2000;22(1):4–37.
35. Metz CE. Basic principles of ROC analysis. Semin Nucl Med. 1978;8:283–98. [PubMed]
36. Linden A. Measuring diagnostic and predictive accuracy in disease management: An introduction to receiver operating characteristic (ROC) analysis. J Eval Clin Pract. 2006;12:132–9. [PubMed]
37. Provost FJ, Fawcett T, Kohavi R. The case against accuracy estimation for comparing induction algorithms. ICML. 1998
38. Bhattacharyya S. Evolutionary algorithms in data mining: Multi-objective performance modeling for direct marketing. Proc Sixth ACM SIGKDD Intl Conf Knowledge Discovery Data Mining; New York. 2000; pp. 465–473.
39. Tighe PJ, Harle CA, Boezaart AP, Aytug H, Fillingim R. Of rough starts and smooth finishes: Correlations between post-anesthesia care unit and postoperative days 1–5 pain scores. Pain Med. 2014;15:306–15. [PMC free article] [PubMed]
40. Zurada J, Lonial S. Comparison of the performance of several data mining methods for bad debt recovery in the healthcare industry. J Appl Business Res. 2005:21.
41. Chapman CR, Donaldson GW, Davis JJ, Bradshaw DH. Improving individual measurement of postoperative pain: The pain trajectory. J Pain. 2011;12:257–62. [PMC free article] [PubMed]
42. Chapman CR, Donaldson G, Davis J, Ericson D, Billharz J. Postoperative pain patterns in chronic pain patients: A pilot study. Pain Med. 2009;10:481–7. [PubMed]