The aim of this study is to develop a simple and reliable hybrid decision support model by combining statistical analysis and decision tree algorithms to ensure high accuracy of early diagnosis in patients with suspected acute appendicitis and to identify useful decision rules.
We enrolled 326 patients who attended an emergency medical center complaining mainly of acute abdominal pain. Statistical analysis approaches were used as a feature selection process in the design of decision support models, including the Chi-square test, Fisher's exact test, the Mann-Whitney U-test (p < 0.01), and Wald forward logistic regression (entry and removal criteria of 0.01 and 0.05, or 0.05 and 0.10, respectively). The final decision support models were constructed using the C5.0 decision tree algorithm of Clementine 12.0 after pre-processing.
Of 55 variables, two subsets were found to be indispensable for early diagnostic knowledge discovery in acute appendicitis. The two subsets were as follows: (1) lymphocytes, urine glucose, total bilirubin, total amylase, chloride, red blood cell, neutrophils, eosinophils, white blood cell, complaints, basophils, glucose, monocytes, activated partial thromboplastin time, urine ketone, and direct bilirubin in the univariate analysis-based model; and (2) neutrophils, complaints, total bilirubin, urine glucose, and lipase in the multivariate analysis-based model. The experimental results showed that the model with univariate analysis (80.2%, 82.4%, 78.3%, 76.8%, 83.5%, and 80.3%) outperformed models using multivariate analysis (71.6%, 69.3%, 73.7%, 69.7%, 73.3%, and 71.5% with entry and removal criteria of 0.01 and 0.05; 73.5%, 66.0%, 80.0%, 74.3%, 72.9%, and 73.0% with entry and removal criteria of 0.05 and 0.10) in terms of accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under ROC curve, during a 10-fold cross validation. A statistically significant difference was detected in the pairwise comparison of ROC curves (p < 0.01, 95% CI, 3.13-14.5; p < 0.05, 95% CI, 1.54-13.1). The larger induced decision model was more effective for identifying acute appendicitis in patients with acute abdominal pain, whereas the smaller induced decision tree was less accurate with the test data.
The decision model developed in this study can be applied as an aid in the initial decision making of clinicians to increase vigilance in cases of suspected acute appendicitis.
Acute appendicitis is a common disease in emergency abdominal surgery with a lifetime occurrence of approximately 7% and perforation rates of 17-20% [1-3]. The decision to explore a patient with suspected acute appendicitis is based mainly on disease history and physical findings, but the clinical presentation is seldom typical. Unfortunately, some patients with acute appendicitis are not diagnosed until the occurrence of peritonitis or other severe complications while their surgeons are waiting for more evidence of acute appendicitis. These patients have a higher mortality and morbidity than patients who are diagnosed in a timely manner. Thus, a timely and accurate diagnosis of acute appendicitis is important for avoiding unnecessary diagnostic procedures and for identifying appropriate therapeutic measures and clinical management strategies. However, finding meaningful factors and identifying their relationships is difficult because of the numerous parameters that are routinely available, such as patient history and laboratory data.
Computer-aided diagnosis of acute abdominal pain has challenged researchers for over 40 years. Since the pioneering work of de Dombal et al., several studies have aimed to support the diagnosis of acute appendicitis on the basis of grading medical history, clinical symptoms, and signs [7-9]. Eberhart reported a comparison of appendicitis diagnosis versus non-specific abdominal pain using three different neural network paradigms: back propagation (BP), binary adaptive resonance theory (ART-1), and fuzzy resonance (Fuzzy-ART). Pesonen compared the predictive performance of four different neural network algorithms in the diagnosis of acute appendicitis with different parameter groups, i.e., ART-1, self-organizing maps (SOM), learning vector quantization (LVQ), and BP. It was found that supervised learning algorithms (LVQ and BP) performed better than unsupervised learning algorithms (ART-1 and SOM) in medical decision making problems. Prabhudesai evaluated artificial neural networks (ANNs) for the diagnosis of appendicitis in patients presenting with acute right iliac fossa (RIF) pain and compared ANN performance with assessments made by experienced clinicians and the Alvarado score. The ability of ANNs to accurately exclude the diagnosis of appendicitis in patients without true appendicitis was significantly better than clinical performance and an Alvarado score ≥ 6. All the neural network algorithms provided good performances in the diagnosis of acute appendicitis, but they had the following drawbacks: time-consuming depending on the size of training data, a black-box structure lacking transparency in the knowledge generated, and the inability to explain the decisions that were made.
Several other studies of acute abdominal pain and acute appendicitis have been performed, including decision tree models. The performance of these models ranged from 43% to 95% [5,14-16]. Ting modified the Alvarado scoring system (ASS) with a decision tree technique and constructed a convenient and accurate decision support model that consisted of RLQ tenderness, the Alvarado score, migrating pain, and a neutrophil count > 75% for acute appendicitis diagnosis and timing of laparotomy. Gaga introduced the data representation formalism ID+, which was derived from Quinlan's ID3 algorithm, to facilitate the modeling of dependencies between attributes or attribute values, with multiple values per attribute. They used this method to demonstrate a medical knowledge acquisition application for abdominal pain in children. Ohmann evaluated the performance of seven knowledge acquisition techniques, i.e., Bayes independence and rule induction techniques, ID3, NewId, PRISM, CN2, C4.5, and ITRULE. No overall differences in accuracy were observed, except with NewId, which was less accurate compared with the other algorithms. None of the algorithms produced an overall accuracy of > 50%. Zorman addressed the problem of separating acute appendicitis from other diseases causing acute abdominal pain with an improved decision tree approach based on the dynamic discretization of continuous attributes. This method was used to investigate the predictive performance of different decision trees with three prospective databases: the COMAC-BME-European Community Concerted Action on Objective Medical Decision Making in Patients with Acute Abdominal Pain project, the German MEDWIS project A70 "Expert system for acute abdominal pain," and the COPERNICUS program no. 555 project.
Most of these studies have focused on the issues, the discriminatory power of decision support models, or the decision rules derived from different decision tree algorithms without performing a statistical comparison of the significance of their results. In this study, we present a hybrid decision support model that combines statistical analysis and decision tree approaches to discover significant rules and provide highly accurate early diagnosis for patients with suspected acute appendicitis.
After obtaining the Institutional Review Board (IRB) approval (no. 11-275) from Keimyung University Dongsan Hospital, we retrospectively collected the medical records of all patients attending the emergency medical center complaining mainly of acute abdominal pain between July 2006 and June 2007. Only complete medical records with no missing clinical parameters were included, i.e., age, gender, chief complaints, and clinical laboratory findings, such as urinalysis, common blood cell and differential counts, serum electrolytes, routine admission, etc. To analyze the chief complaints, we split the abdomen areas into eight distinct regions (Table 1) based on four abdominal quadrants, i.e., the right upper quadrant (RUQ), right lower quadrant (RLQ), left upper quadrant (LUQ), and left lower quadrant (LLQ). Patients diagnosed with complaints other than appendicitis were excluded, such as acute cholecystitis or diverticulitis, appendectomy incidental to another surgical procedure, previous use of antibiotics for chronic appendicitis, and appendectomy for chronic abdominal pain. The eligibility for study group (n = 152) was defined according to the International Classification of Diseases-10 (ICD-10) codes: K35.0 (acute appendicitis with generalized peritonitis), K35.1 (acute appendicitis with peritoneal abscess), and K35.9 (acute appendicitis without generalized peritonitis). Discharged patients (n = 174) admitted to the emergency medical center who complained mainly of acute abdominal pain were defined as the control group. All data collected were reconfirmed by gastroenterologists.
Statistical analysis was performed using SPSS 12.0 for Windows (SPSS Inc., Chicago, IL, USA). Univariate associations between clinical or laboratory features and the diagnosis were evaluated using the Chi-square test or Fisher's exact test, which are appropriate for categorical data, and using the Student t-test or Mann-Whitney U-test for continuous variables, after checking for normality using the Kolmogorov-Smirnov test. A two-tailed p < 0.01 was selected as the level of statistical significance. In the multivariate analysis, the Wald forward logistic regression model, with entry and removal criteria of 0.01 and 0.05, or 0.05 and 0.10, respectively, was used to identify independent predictors of acute appendicitis. Modeling results were expressed as the odds ratios (OR) with 95% confidence intervals (95% CI). The Hosmer-Lemeshow test (H), which divides subjects into deciles based on their predicted probabilities before computing Chi-square values from the observed and expected frequencies, was used to assess the fit of the models [17-20].
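The decile-based goodness-of-fit computation described above can be illustrated with a minimal Python sketch. This is not the SPSS routine used in the study; the function name `hosmer_lemeshow_statistic` and the toy inputs are assumptions made for illustration, and the sketch returns only the Chi-square statistic (a p-value would additionally require the Chi-square distribution with, conventionally, groups minus two degrees of freedom).

```python
from typing import List


def hosmer_lemeshow_statistic(y_true: List[int], y_prob: List[float],
                              groups: int = 10) -> float:
    """Hosmer-Lemeshow Chi-square statistic (illustrative sketch only).

    Subjects are ordered by predicted probability and cut into `groups`
    near-equal bins (deciles by default). For each bin, the observed
    number of events is compared with the expected number, i.e. the sum
    of the predicted probabilities in that bin.
    """
    # Stable sort on the predicted probability only.
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Skip degenerate bins where the variance term would be zero.
        if expected <= 0.0 or expected >= size:
            continue
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return statistic


# Hypothetical toy data: every subject is assigned a probability of 0.5.
calibrated = hosmer_lemeshow_statistic([0, 1] * 10, [0.5] * 20, groups=2)
miscalibrated = hosmer_lemeshow_statistic([1] * 20, [0.5] * 20, groups=2)
```

A small statistic (and hence a large p-value) indicates that observed and expected frequencies agree, which is the sense in which a model is described as well calibrated.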
After the feature selection process, Quinlan's C5.0 decision tree algorithm [21,22] was used to design the final decision support models. This approach provides a very simple representation of accumulated knowledge and it also facilitates the derivation of an explanation for the decision, which is essential in medical applications. The model selects the best decision node that separates the different classes from the empirical data. The main induction loop of the decision tree is as follows: i) assume A as the possible "best" decision attribute for the next node; ii) assign A as the decision attribute for the node; iii) for each value of A, create a new descendent of the node; iv) count the entropies of the training examples to the leaf nodes; and v) stop searching for new leaf nodes if training examples are well-classified, or continue the new leaf nodes if they are not well-classified. The decision tree model used in this study was built with the C5.0 component using the default experimental parameters of Clementine version 12.0 (SPSS Inc., Chicago, IL, USA).
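The attribute selection step of the induction loop above can be sketched as follows. This is a simplified, ID3-style illustration based on plain information gain over categorical attributes; it is not the C5.0 implementation in Clementine, which adds refinements such as gain ratio, handling of continuous attributes, and pruning. The function names and the toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], labels: List[str],
                     attribute: str) -> float:
    """Reduction in entropy obtained by splitting on `attribute`."""
    partitions: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows: List[Dict[str, str]], labels: List[str],
                   attributes: List[str]) -> str:
    """Step i) of the loop: pick the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records, not taken from the study dataset.
toy_rows = [
    {"urine_glucose": "positive", "gender": "F"},
    {"urine_glucose": "positive", "gender": "M"},
    {"urine_glucose": "negative", "gender": "F"},
    {"urine_glucose": "negative", "gender": "M"},
]
toy_labels = ["appendicitis", "appendicitis", "control", "control"]
chosen = best_attribute(toy_rows, toy_labels, ["urine_glucose", "gender"])
```

In this toy example `urine_glucose` separates the two classes perfectly, so it is chosen as the decision attribute; the tree would then recurse on each resulting subset until the examples are well classified.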
Figure 1 shows the scheme of the decision support models, which were based on statistical tests (i.e., univariate analysis, p < 0.01) and the Wald forward logistic regression (entry and removal criteria of 0.01 and 0.05 or 0.05 and 0.10), for diagnosis of acute appendicitis. We used 10-fold cross validation experiments to provide an unbiased estimate of the generalization error. The full dataset was randomly divided into 10 subsets: nine subsets were used for training (90%), while the remaining subset was used for testing (10%). The process was then repeated 10 times. The performance of the models was evaluated using six standard measures: accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), and the area under the ROC curve (AUC). We also made a pairwise comparison [24,25] between the ROC curves of the models to test for statistically significant differences.
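The evaluation scheme above can be sketched in a few lines of Python. The sketch below only illustrates the 10-fold partitioning and the five measures that follow directly from a confusion matrix; the AUC and the pairwise ROC comparison are omitted because they require predicted scores. The function names and the example counts are assumptions for illustration, not values reported in the study.

```python
import random
from typing import Dict, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Randomly split indices 0..n-1 into k (train, test) partitions."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


def diagnostic_measures(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """ACC, SENS, SPEC, PPV, and NPV from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + fn + tn + fp),
        "SENS": tp / (tp + fn),   # true positive rate
        "SPEC": tn / (tn + fp),   # true negative rate
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# 326 is the number of enrolled patients; the counts below are hypothetical.
splits = k_fold_indices(326, k=10)
measures = diagnostic_measures(tp=40, fn=10, tn=45, fp=5)
```

Each record appears in exactly one test fold, so every patient contributes once to the cross-validated estimate.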
Of the 326 patients enrolled in this study, 152 (46.6%) had acute appendicitis, while 174 (53.4%) were discharged. Significant differences were observed in terms of age (p < 0.01), complaints (p < 0.001), urine glucose (p < 0.001), and urine ketone (p < 0.001) between the acute appendicitis patients (mean age, 36.57 years) and the discharged patients (mean age, 43.05 years). Abdominal and RLQ pains were the most common complaints presented in the emergency medical center (Table 1).
In terms of blood test findings, white blood cell (p < 0.001), red blood cell (p < 0.01), neutrophils (p < 0.001), glucose (p < 0.01), total bilirubin (p < 0.001), direct bilirubin (p < 0.001), and activated partial thromboplastin time (p < 0.01) were significantly or slightly higher in patients with acute appendicitis, whereas lymphocytes (p < 0.001), monocytes (p < 0.001), eosinophils (p < 0.001), basophils (p < 0.001), large unstained cells (p < 0.001), sodium (p < 0.001), chloride (p < 0.001), lipase (p < 0.001), and total amylase (p < 0.001) were significantly higher in discharged patients. The remaining variables could not be used to differentiate patients with acute appendicitis from the discharged patients (Table 2).
In the multivariate analysis, independent risk factors were identified using Wald forward logistic regression with entry and removal criteria of 0.01 and 0.05, or 0.05 and 0.10, respectively. Regardless of the criteria used, the two logistic models identified the same independent risk factors. We included six variables in the final logistic regression that were independently associated with acute appendicitis: complaints, urine glucose, white blood cell, neutrophils, total bilirubin, and lipase (Table 3). These variables were tested by linear regression analysis to evaluate multicollinearity among the predictors. The data did not violate the multicollinearity assumption. The tolerance of each independent variable was greater than 0.616. The variance inflation factor (VIF) values of the variables ranged from 1.005 to 1.624. The ACC, SENS, SPEC, PPV, and NPV were 79.8%, 76.3%, 82.8%, 79.5%, and 80.0%, respectively. The AUC of the models was 79.5% (95% CI, 74.7-83.8), indicating fair discriminatory power. The goodness-of-fit (H) statistic indicated that the models were well calibrated (p = 0.838).
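The tolerance and VIF values quoted above are two views of the same quantity: for a predictor regressed on the remaining predictors with coefficient of determination R², the tolerance is 1 - R² and the VIF is its reciprocal. The short sketch below, with hypothetical helper names, checks that the reported extremes are mutually consistent.

```python
def tolerance_from_r_squared(r_squared: float) -> float:
    """Tolerance of a predictor given R^2 from regressing it on the others."""
    return 1.0 - r_squared


def vif_from_tolerance(tolerance: float) -> float:
    """Variance inflation factor is the reciprocal of the tolerance."""
    return 1.0 / tolerance


# Reported VIF range: 1.005 to 1.624.
lowest_tolerance = round(1.0 / 1.624, 3)   # tolerance implied by the largest VIF
highest_tolerance = round(1.0 / 1.005, 3)  # tolerance implied by the smallest VIF
```

The largest VIF (1.624) corresponds to a tolerance of about 0.616, which matches the reported lower bound for tolerance.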
Five of the six variables (Table 3) were selected by the C5.0 decision tree model and their importance was defined in the following order: neutrophils, complaints, total bilirubin, urine glucose, and lipase. The cut-off points were determined using the C5.0 decision tree algorithm and the criteria for dichotomizing the continuous variables were all statistically significant (p < 0.05) except for LUQ pain (OR, 0.732; 95% CI, 0.014-37.307; p = 0.876). The results are summarized in Table 4. The decision support model is shown in Figure 2 and eight decision rules were generated from the full dataset. Seven decision rules (in Figure 2, leaf nodes 1, 5, 7, 8, 10, 11, and 12) were statistically significant, excluding leaf node LUQ (node 9). Three rules were associated with acute appendicitis as follows: 1) neutrophils > 73.1% and urine glucose is positive (p < 0.01); 2) neutrophils > 73.1% and urine glucose is negative and periumbilical area pain, or upper abdominal pain, or RLQ pain (p < 0.001); 3) neutrophils > 73.1% and urine glucose is negative and abdominal pain, and total bilirubin > 1.0 mg/dL, and lipase ≤ 46 U/L (p < 0.05). The ACC, SENS, SPEC, PPV, NPV, and AUC measures were 82.5%, 74.3%, 89.7%, 86.3%, 80.0%, and 82.0% (95% CI, 77.4-86.0), respectively.
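For illustration, the three appendicitis-associated rules listed above can be written as a small Python function. The thresholds are taken from the text; the function name, the parameter names, and the string labels used for the complaint categories are hypothetical. The remaining five rules of the tree are not enumerated in the text, so the function returns None whenever none of the three listed rules applies, which should not be read as a negative diagnosis.

```python
from typing import Optional

# Complaint labels are hypothetical encodings of the regions named in the text.
RULE_2_COMPLAINTS = {"periumbilical", "upper abdominal", "RLQ"}


def matching_appendicitis_rule(neutrophils_pct: float,
                               urine_glucose_positive: bool,
                               complaint: str,
                               total_bilirubin_mg_dl: float,
                               lipase_u_l: float) -> Optional[int]:
    """Return the number (1-3) of the matching rule, or None if none applies."""
    # All three rules require neutrophils > 73.1%.
    if neutrophils_pct <= 73.1:
        return None
    # Rule 1: urine glucose is positive.
    if urine_glucose_positive:
        return 1
    # Rule 2: urine glucose negative with one of three pain locations.
    if complaint in RULE_2_COMPLAINTS:
        return 2
    # Rule 3: abdominal pain, total bilirubin > 1.0 mg/dL, lipase <= 46 U/L.
    if (complaint == "abdominal" and total_bilirubin_mg_dl > 1.0
            and lipase_u_l <= 46):
        return 3
    return None


# Hypothetical patient record.
example = matching_appendicitis_rule(80.0, False, "RLQ", 0.8, 30.0)
```

Writing the rules this way makes explicit that every appendicitis-associated path in the smaller tree passes through the neutrophil threshold first.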
Sixteen of the 20 variables with p < 0.01 (Tables 1 and 2) were selected by the C5.0 decision tree algorithm and their importance was defined in the following order: lymphocytes, urine glucose, total bilirubin, total amylase, chloride, red blood cell, neutrophils, eosinophils, white blood cell, complaints, basophils, glucose, monocytes, activated partial thromboplastin time, urine ketone, and direct bilirubin. The criteria for the selected cut-off points are summarized in Table 5. The decision support model for the diagnosis of acute appendicitis is shown in Figure 3 and its ACC, SENS, SPEC, PPV, NPV, and AUC were 93.9%, 89.5%, 97.7%, 97.1%, 91.4%, and 93.6% (95% CI, 90.4-96.0), respectively. We generated 29 decision rules, i.e., 16 for acute appendicitis and 13 for the control group. Thirteen decision rules (in Figure 3: leaf nodes 6, 11, 15, 20, 22, 28, 39, 40, 41, 44, 45, 47, and 49) were statistically significant. Seven rules were associated with acute appendicitis as follows: 1) lymphocytes ≤ 20.2% and urine glucose is positive (p < 0.01); 2) lymphocytes ≤ 20.2% and urine glucose is negative and lower abdominal pain and direct bilirubin > 0.4 mg/dL (p < 0.05); 3) lymphocytes ≤ 20.2% and urine glucose is negative and RLQ pain and chloride > 104 mmol/L and urine ketone is negative and monocytes > 3.6% (p < 0.05); 4) lymphocytes ≤ 20.2% and urine glucose is negative and RLQ pain and chloride > 104 mmol/L and urine ketone is negative and monocytes ≤ 3.6% and eosinophils > 1.5% (p < 0.05); 5) lymphocytes ≤ 20.2% and urine glucose is negative and abdominal pain and total bilirubin ≤ 1.0 mg/dL and total amylase ≤ 58 U and monocytes ≤ 2.4% (p < 0.05); 6) lymphocytes ≤ 20.2% and urine glucose is negative and abdominal pain and total bilirubin > 1.0 mg/dL and activated partial thromboplastin time > 22.6 s and neutrophils ≤ 84% and lymphocytes > 13.8% (p < 0.05); 7) lymphocytes ≤ 20.2% and urine glucose is negative and abdominal pain and total bilirubin ≤ 1.0 mg/dL and total amylase ≤ 58 U and monocytes > 2.4% and eosinophils ≤ 2.4% and urine ketone is negative and glucose ≤ 124 mg/dL and chloride ≤ 107 mmol/L (p < 0.05).
The six measures were compared using a 10-fold cross validation to assess the generalization ability of these decision support models. The differences in the clinical factors selected before and after the application of the C5.0 decision tree algorithm are shown in Tables 6 and 7. This showed that the decision support model based on univariate analysis was superior to those based on multivariate analyses with different conditions (Table 8). The decision support model based on the univariate analysis was statistically superior to those based on multivariate analyses in terms of predictive power and discriminatory capacity, which was expressed by the area under the ROC curve (p < 0.01, 95% CI, 3.13-14.5; p < 0.05, 95% CI, 1.54-13.1; Table 9 and Figure 4). The decision support model based on multivariate analysis using loose criteria was also better than that using strict criteria, especially the AUC measure, although the discriminatory power between the two models was not statistically significant (p = 0.400; 95% CI, -2.0-5.02).
From a clinical viewpoint, one of the most difficult problems is distinguishing patients with suspected acute appendicitis from those with acute abdominal pain. Thus, we developed a hybrid decision support model based on a decision tree algorithm and statistical analysis to reduce the high workload of clinicians. We also investigated the different diagnostic knowledge provided by the decision support models. We extracted subsets from the univariate analysis-based model (lymphocytes, urine glucose, total bilirubin, total amylase, chloride, red blood cell, neutrophils, eosinophils, white blood cell, complaints, basophils, glucose, monocytes, activated partial thromboplastin time, urine ketone, direct bilirubin) and from the multivariate analysis-based model (neutrophils, complaints, total bilirubin, urine glucose, lipase) that were indispensable for discovering early diagnostic knowledge (i.e., relationships among these parameters) related to acute appendicitis, although several criteria did not reach statistical significance (Tables 4 and 5).
The clinical parameters included well-known risk factors for acute appendicitis described in previous studies, i.e., neutrophils (or lymphocytes), eosinophils, RLQ tenderness, amylase, and lipase. Kalan produced a modified Alvarado score by removing neutrophils from the model. However, the present study showed that the neutrophil count is a very important factor when evaluating patients with acute appendicitis, especially children [28,29]. Clark tested the eosinophil count in the diagnostic evaluation of patients presenting with acute abdominal pain who subsequently underwent appendectomy and whether eosinophilia was related to subsequent histology. Patients with abdominal pain and peripheral eosinophils appeared less likely to have acute appendicitis based on their subsequent histology. Santosh reported significant local eosinophil activation and degranulation during acute appendicitis, which was sufficient to elevate serum levels of eosinophil chemotactic protein. However, the inverse relationship between the duration of symptoms and serum eosinophil cationic protein was not statistically significant in cases of acute appendicitis. Um reported the case of a 17-year-old female, who was characterized by increased serum amylase activities combined with normal serum lipase, normal creatinine, and a low amylase/creatinine clearance ratio. She was diagnosed with macroamylasemia and acute appendicitis without apparent clinical symptoms of a pancreatic disorder.
We performed a 10-fold cross validation to estimate the diagnostic accuracy of the decision support models. The results showed that the larger induced decision model was more effective at distinguishing patients with acute appendicitis from those with acute abdominal pain, whereas the compact decision model, with a smaller induced decision tree, was less accurate for the test data. The range of diagnostic accuracy was approximately 83-94% for the full dataset and 72-80% for the 10-fold cross validation, which is similar to or better than the average accuracy reported in previous studies of acute abdominal pain [14,15] or acute appendicitis [5,16]. Possible explanations for these similar or better results can be summarized as follows.
I) In our dataset, the sample size of each decision or diagnosis had a balanced distribution. In contrast, the distribution of diagnoses reported for these studies was extremely imbalanced, e.g., one diagnosis was represented by a significantly lower number of cases than the others. The decision boundary learned by a standard machine learning algorithm, such as a decision tree algorithm, can be severely skewed toward either a positive or negative decision. Consequently, the false negative or positive rate can be excessively high. One research approach for overcoming the class imbalance problem is to resample the original training dataset, by either oversampling the minority class and/or undersampling the majority class until decisions are represented in a more balanced way.
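The resampling idea mentioned above can be sketched as simple random oversampling of the minority class. This technique was not applied in the present study, whose classes were already balanced; the function name and the toy data below are hypothetical and serve only to illustrate the approach.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(rows: Sequence, labels: Sequence[str],
                      seed: int = 0) -> Tuple[List, List[str]]:
    """Duplicate randomly chosen minority-class rows until classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        # Draw with replacement until this class reaches the majority size.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical imbalanced toy set: 8 controls and 2 appendicitis cases.
toy_rows = list(range(10))
toy_labels = ["control"] * 8 + ["appendicitis"] * 2
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
```

When resampling is combined with cross validation, it is normally applied to the training folds only, so that duplicated records do not leak into the test fold.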
II) We used the reduced clinical parameters set after applying the univariate or multivariate analysis as a feature selection approach when constructing the final decision support model. This dataset was smaller compared with the number of parameters used in previous studies. Our dataset quality may even be increased by selecting informative features from a high-dimensional dataset. This reduces the time required to perform induction and it makes the resulting rules more comprehensible, thereby increasing the resulting accuracy [34,35].
This study had the following limitations because of its retrospective study design. The number of patients with acute appendicitis and non-acute appendicitis was relatively small, which produced variations when deriving the relevant parameters and their relationships. The feasibility of using derived rules has been verified using an external validation study or prospective studies [9,15,16]. These considerations may provide fruitful directions for further research.
This study developed a simple and reliable hybrid decision support model based on statistical analyses and a decision tree algorithm to provide highly accurate early diagnosis of patients with suspected acute appendicitis. This model also facilitated diagnostic knowledge discovery using the derived rules. The experimental results show that a decision support model based on univariate analysis provided excellent discrimination and we demonstrated its feasibility for predicting acute appendicitis. Therefore, the decision model developed in our study can be applied to support the initial decision of clinicians and increase vigilance when detecting suspected acute appendicitis.
S.G: Specific gravity; O.B: Occult blood; WBC: White blood cell; RBC: Red blood cell; Ep. Cell: Epithelial cell; HGB: Hemoglobin; HCT: Hematocrit; MCV: Mean corpuscular volume; MCH: Mean corpuscular hemoglobin; MCHC: Mean corpuscular hemoglobin concentration; PLT: Platelet count; NEUT: Neutrophils; LYMP: Lymphocytes; MONO: Monocytes; EOS: Eosinophils; BASO: Basophils; LUC: Large unstained cells; MPV: Mean platelet volume; Na: Sodium; K: Potassium; Cl: Chloride; BUN: Blood urea nitrogen; ALP: Alkaline phosphatase; AST: Aspartate aminotransferase; ALT: Alanine aminotransferase; APTT: Activated partial thromboplastin time; PT: Prothrombin time.
The authors declare that they have no competing interests.
CSS conceived the study and participated in data analysis and the computation of performance values. BKJ was the consultant for the knowledge of (differential) diagnosis of acute appendicitis disease. STS and MSK interpreted the concept of the model prototype and helped revise the manuscript. YNK is the corresponding author who conceived the study and drafted the manuscript. All authors read and approved the final manuscript.
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0014736), and by Grant No. RTI04-01-01 from the Regional Technology Innovation Program of the Ministry of Knowledge Economy (MKE).