With the widespread adoption of e-Healthcare and telemedicine applications, accurate and intelligent disease diagnosis systems are in high demand. In recent years, numerous individual machine learning-based classifiers have been proposed and tested, and it is now broadly accepted that no single classifier can effectively classify and diagnose all diseases. This has prompted a number of recent research attempts to reach a consensus using ensemble classification techniques. In this paper, a hybrid system is proposed to diagnose ailments by optimizing the parameters of two individual classifiers, namely, support vector machine (SVM) and multilayer perceptron (MLP). We employ three recent evolutionary algorithms to optimize the parameters of these classifiers, leading to six alternative hybrid disease diagnosis systems, also referred to as hybrid intelligent systems (HISs). Multiple objectives, namely, prediction accuracy, sensitivity, and specificity, are used to compare the efficacy of the proposed hybrid systems with that of existing ones. The proposed model is evaluated on 11 benchmark datasets, and the results demonstrate that our hybrid diagnosis systems perform better in terms of disease prediction accuracy, sensitivity, and specificity. Pertinent statistical tests were carried out to substantiate the results.
The proliferation of computers across all aspects of life has resulted in the accumulation of large amounts of systematic and related data. This has made identifying useful patterns in raw datasets the next logical step forward. Thus, data mining, a broad discipline encompassing classification, clustering, association, prediction, estimation, and visualization tasks, has emerged as a dynamic and significant field of research that addresses theoretical challenges as well as practical issues. Data mining and knowledge engineering techniques have been successfully applied to numerous areas, such as education, pattern recognition, fraud detection, and medicine [2, 3].
The application of data mining and knowledge engineering techniques in the medical domain plays a prime role in disease diagnosis and prognostication. It helps healthcare professionals and doctors analyze and predict diseases and is often referred to as medical engineering. Over the years, numerous machine learning algorithms have been developed to extract useful patterns from raw medical data. These patterns have been utilized for disease prediction using classification and clustering strategies. Medical research employs data mining to predict a broad range of diseases, including breast cancer, heart disease, Parkinson's disease, hepatitis, and diabetes, to name only a few.
Over the years, supervised machine learning techniques such as classification, as well as unsupervised techniques such as clustering, have been applied to available medical information [10, 11]. Individual classifiers, ensembles thereof, and hybrid systems have often been used to diagnose various diseases. Several techniques have been applied to medical data to improve diagnostic efficacy in terms of performance parameters such as prediction accuracy, sensitivity, and specificity [12, 13].
This paper presents a hybrid system for the diagnosis and prediction of numerous diseases using classifiers with optimized parameters. The classifier parameters are optimized using evolutionary algorithms to enhance classification performance. By incorporating this parameter optimization step into existing classifier mechanisms, our method improves prediction accuracy. In total, 16 classifier configurations are executed: the two basic classifiers, each with and without resampling; six hybrid intelligent systems without resampling; and six hybrid intelligent systems with resampling. In summary, this paper presents a comparative analysis of parameter-optimized versions of two classifiers, namely, support vector machine (SVM) and multilayer perceptron (MLP), for medical data. The experimental results presented in this paper show that our proposed hybrid system outperforms the state of the art (single or ensemble classifiers) in classifying medical data. To carry out the parameter optimization, we employ three popular evolutionary algorithms, namely, particle swarm optimization (PSO), gravitational search algorithm (GSA), and firefly algorithm (FA), to optimize the parameters of the SVM and MLP classifiers. Accordingly, we study the performance of six alternative hybrid systems for classifying medical data towards the diagnosis of such diseases. The performance of the proposed hybrid intelligent techniques is compared with results from the recent literature (both simple and ensemble classifiers [14–16]). The hybrid intelligent systems show better performance than recently published ensemble classifiers on 11 benchmark datasets.
The rest of this paper is organized as follows. Section 2 briefly reviews existing research, focusing on the machine learning algorithms employed for processing medical datasets. Section 3 presents the problem formulation of our proposed weighted multiobjective optimization for the classification problem. Section 4 provides the basic steps and key features of the evolutionary algorithms employed for the parameter optimization of the SVM and MLP classifiers, namely, particle swarm optimization (PSO), gravitational search algorithm (GSA), and firefly algorithm (FA). Section 5 gives a basic introduction to the two classifiers employed, namely, SVM and MLP. Section 6 explains in detail the development of the proposed hybrid classification system for disease diagnosis, along with its key components and design principles. The performance of the proposed hybrid scheme is tested on 11 benchmark medical datasets; Section 7 describes the experimental setup and the experiments conducted and summarizes the results obtained. That section also presents a statistical analysis validating the obtained results. Section 8 presents the conclusions of the research.
There have been abundant attempts to analyze and diagnose ailments using machine learning algorithms. This section summarizes the efforts in this field to put the contribution of our work in perspective. These studies vary considerably in terms of the classifiers applied and the nature of the systems employed: some are simple, others are hybrid, and still others are ensemble systems. They also vary widely in the objective functions chosen, single- or multiobjective formulation, the number of datasets on which the methods have been applied, the performance parameters used for validating efficacy, and so forth.
Among the disease datasets studied in the literature, heart disease diagnosis has been very prominent within medical engineering circles, and a wide variety of machine learning techniques have been explored for it. References [17–38] include some prominent contributions towards diagnosing heart diseases from various aspects using myriad machine learning techniques; details are presented hereafter. Chitra and Seenivasagam proposed a cascaded neural network (CNN) classifier and a support vector machine (SVM) to diagnose heart diseases and compared the performance of CNN and SVM based on accuracy, sensitivity, and specificity. Pattekari and Parveen suggested an intelligent system that used a naive Bayes classifier, which was further improved by developing ensemble-based classifiers. Das et al. developed a neural network ensemble model for heart disease diagnosis. The technique used Statistical Analysis System (SAS) Enterprise Guide 4.3 programs for data preprocessing and SAS Enterprise Miner 5.2 programs for recognizing heart disease with an ensemble of three neural networks. The technique was further improved by combining other neural networks and was also applied to various datasets. Das et al. also described a method based on SAS 9.1.3 software for diagnosing valvular heart diseases. The method used a neural network ensemble and applied predicted values, posterior probabilities, and voting posterior probabilities.
Masethe and Masethe used J48, naive Bayes, REPTree, CART, and Bayes Net to diagnose heart disease; the J48 tree obtained high accuracy. Shaikh et al. evaluated the performance of three classifiers, namely, k-NN, naive Bayes, and decision tree, based on four parameters, namely, precision, recall, accuracy, and F-measure; k-NN produced higher accuracy than the other methods. Bhatla and Jyoti compared naive Bayes, decision tree, and neural networks for the same diagnosis. For the decision tree, a genetic algorithm and fuzzy logic were employed, and the results were obtained with the TANAGRA tool.
Kavitha and Christopher classified heart rate data using a hybrid of particle swarm optimization and fuzzy C-means (PSO-FCM) clustering. The method performed feature selection using PSO and combined fuzzy C-means clustering with the classifier to enhance accuracy; an enhanced SVM was used for classifying heart diseases. The hybrid system could be trained to shorten the implementation time. Alizadehsani et al. evaluated sequential minimal optimization (SMO), naive Bayes, bagging with SMO, and neural networks using the RapidMiner tool and obtained high accuracy with bagging with SMO. Abhishek employed J48, naive Bayes, and neural networks with all attributes for diagnosing heart diseases using the WEKA machine learning software and concluded that J48 outperformed the others in accuracy.
Jabbar et al. used association mining and a genetic algorithm for heart disease prediction. The method used the Gini index for the association algorithm, and crossover and mutation for the genetic algorithm. They further employed a feature selection technique to improve accuracy. Ordonez et al. presented an improved algorithm to determine constrained association rules using two techniques: mapping medical data and identifying constraints. The method used mining attributes, and constrained association rules and parameters were used for the mapping. Comparing these association rules with classification rules produced interesting results. Shenfield and Rostami introduced a multiobjective approach to the evolutionary design of artificial neural networks for predicting heart disease.
Parthiban and Subramanian developed a coactive neurofuzzy inference system (CANFIS) for predicting heart diseases. The model combined CANFIS with neural network and fuzzy logic approaches and was then integrated with a genetic algorithm (GA). Results showed that the GA was useful for autotuning the CANFIS parameters. Hedeshi and Abadeh combined the PSO algorithm with a boosting approach, using fuzzy rule extraction with PSO and enhanced particle swarm optimization 2 (En-PSO2). Karaolis et al. built models for myocardial infarction (MI), percutaneous coronary intervention (PCI), and coronary artery bypass graft surgery (CABG) using C4.5 decision tree algorithms. Results were compared based on false positives (FP), precision, and so forth. Further investigation with additional datasets and rule extraction algorithms yielded better results.
Kim et al. proposed a fuzzy rule-based adaptive coronary heart disease prediction support model with three parts: fuzzy membership functions, a decision-tree rule induction technique, and fuzzy inference based on Mamdani's method. The outcomes were compared with neural network, logistic regression, decision tree, and Bayes Net. Chaurasia and Pal compared three popular data mining algorithms, CART (classification and regression tree), ID3 (Iterative Dichotomiser 3), and decision table (DT), for diagnosing heart diseases, and their results showed that CART obtained higher accuracy in less time.
Olaniyi et al. used a neural network and a support vector machine for diagnosing heart disease. Their method used a multilayer perceptron, and their results showed that the SVM produced high accuracy. Yan et al. proposed a multilayer perceptron whose hidden layers are determined by a cascade process. To evaluate the method, they used three assessment procedures, namely, cross-validation, holdout, and five bootstrapping samples for five intervals. Yan et al. also applied a multilayer perceptron to the diagnosis of five different heart diseases. The method employed a cascade learning process to find the hidden layers and used backpropagation for training; further improvements in accuracy were achieved by parameter adjustments. Shouman et al. identified gaps in research on heart disease diagnosis. They applied both single and hybrid data mining techniques to establish baseline accuracy and compared the results; hybrid classifiers produced higher accuracy than single classifiers.
Sartakhti et al. presented a novel machine learning method for diagnosing hepatitis that hybridizes support vector machine and simulated annealing. The method tuned two hyperparameters of the radial basis function (RBF) kernel, C and gamma, computing the k-fold cross-validation score for the potential combinations of C and gamma within their intervals. Results demonstrated that tuning the SVM parameters by simulated annealing increased accuracy. Çalişir et al. developed a principal component analysis and least squares support vector machine (PCA-LSSVM) method. It was carried out in two steps: (1) feature extraction from the hepatitis disease database and feature reduction by PCA and (2) classification of the reduced features by the LSSVM classifier. Li and Wong compared the C4.5 and PCL classifiers; comparing C4.5 (bagging, boosting, and single tree) with PCL, they concluded that PCL produced higher accuracy.
Weng et al. investigated the performance of different classifiers for predicting Parkinson's disease; the proposed method used an ANN classifier chosen on the basis of the evaluation criteria. Jane et al. proposed a Q-backpropagated time delay neural network (Q-BTDNN) classifier, which builds temporal classification models that perform classification and prognostication in a clinical decision-making system. It uses a feedforward time-delay neural network (TDNN) trained by a Q-learning-induced backpropagation (Q-BP) technique. Tenfold cross-validation was employed to assess the classification model, and the comparative analysis showed high accuracy. Gürüler described a combination of the k-means clustering-based feature weighting (KMCFW) method and a complex-valued artificial neural network (CVANN). The method considered five different evaluation methods, and the cluster centers were estimated using k-means clustering (KMC). The results showed very high accuracy.
Bashir et al. presented an ensemble framework for predicting diabetes, using multilayer classification with enhanced bagging and optimized weighting. The proposed HM-BagMOOV method used a KNN approach for missing data imputation and had three layers: layer 1 contained naive Bayes (NB), quadratic discriminant analysis (QDA), linear regression (LR), instance-based learning (IBL), and SVM; layer 2 included ANN and RF; and layer 3 used multilayer weighted bagging prediction. It produced good accuracy on all datasets. Iyer et al. proposed a method to diagnose diabetes using decision tree and naive Bayes with 10-fold cross-validation; the technique was further enhanced by using other classifiers and neural network techniques. Choubey and Sanchita used a genetic algorithm and multilayer perceptron techniques for the diagnosis of diabetes. Their methodology was implemented in two levels: a genetic algorithm (GA) was used for feature selection, and a multilayer perceptron neural network (MLP NN) classified the selected features. The results showed excellent accuracy, which was further increased by considering the receiver operating characteristic (ROC).
Kharya used various data mining techniques for the diagnosis and prognosis of cancer, including neural networks, association rule mining, naive Bayes, the C4.5 decision tree algorithm, and Bayesian networks; the decision tree produced better accuracy than the other classifiers. Chaurasia and Pal investigated the performance of three classification techniques on breast cancer data, namely, SMO, the k-nearest neighbor algorithm (IBk), and the best first (BF) tree; SMO produced higher accuracy than the other two techniques. Another study proposed an expert system (ES) for clinical diagnosis to support decision making in primary health care. The ES used a rule-based system to identify several diseases based on clinical test reports.
Alzubaidi et al. studied ovarian cancer. In this work, features are selected using a hybrid global optimization technique that combines mutual information, linear discriminant analysis, and a genetic algorithm. The performance of the hybrid technique was compared with that of a support vector machine, over which it showed significant improvements.
Gwak et al. proposed an ensemble framework that combines various crossover strategies probabilistically. Its performance was tested on 27 benchmark functions, and it showed superior performance on eight difficult benchmark functions. The ensemble framework can further be used efficiently for feature selection on big datasets.
Hsieh et al. developed an ensemble machine learning model for diagnosing breast cancer, adopting information gain for feature selection. The ensemble combined neural fuzzy (NF), k-nearest neighbor (KNN), and quadratic classifier (QC) models. Its performance was compared with that of the individual classifiers, and the ensemble framework performed better than any single classifier.
A review of the existing literature on disease diagnosis with machine learning shows a plethora of individual classifiers as well as ensemble techniques. However, these studies also make it evident that no individual classifier gives high prediction accuracy across different disease datasets. This has led to abundant ensemble classifiers for disease diagnosis, at the cost of the simplicity that an individual classifier offers. To this end, this paper designs a hybrid system focused on providing generalized performance across a broad range of benchmark datasets. The most significant contribution of the proposed hybrid disease classifiers is that, unlike most of the research works mentioned above, which target a specific disease, this paper validates the efficacy of the proposed hybrid classifiers across six different diseases spanning eleven datasets. For instance, among all the heart disease diagnosis systems reviewed, only one considers five different datasets for that disease. There are also very few attempts to validate diagnosis efficacy over multiple diseases; Shen et al. and Bashir et al. are among the few exceptions, validating their results on four and five different diseases, respectively. The proposed classifiers employ novel parameter optimization approaches based on recent evolutionary algorithms, whose detailed design is presented in the subsequent sections.
In this paper, we classify data from different disease datasets using a hybrid technique that optimizes the parameters of the SVM and MLP classifiers for improved disease prediction. The objective functions targeted in solving this classification problem are (i) prediction accuracy, (ii) specificity, and (iii) sensitivity, which are commonly used for this problem in the existing literature [55–57]. Each of these objective functions captures a different aspect of the quality of disease classification. In this sense, the problem studied in this paper is a multiobjective optimization problem.
All the aforementioned measures are computed from the following values: true positive (TP), true negative (TN), false positive (FP), and false negative (FN), defined as follows: TP is the total number of positives correctly identified as positive; TN is the total number of negatives correctly identified as negative; FP is the total number of negatives incorrectly identified as positive; and FN is the total number of positives incorrectly identified as negative.
The objective functions considered for optimization in this work are prediction accuracy (PAC), specificity (SPY), and sensitivity (SEY). To model these functions, two indicator random variables, $X_{i1}$ and $X_{i2}$, are introduced for every data object to compute TP, TN, FP, and FN. They are defined as follows:
where $C^{+}$ denotes a positive (+) actual class label, $C^{-}$ denotes a negative (−) actual class label, $PC_i$ is the predicted class label of the ith data object, and $CL_i$ is its actual class label. For every data object, exactly one of the indicator variables equals 1; that is, $\sum_{j=1}^{2} X_{ij} = 1,\ \forall i$.
Let the classifier being developed for a given dataset be a binary classifier, and let the dataset have N instances, with $m_1$ positive and $m_2$ negative instances. Therefore,

$$N = m_1 + m_2, \qquad TP + FN = m_1, \qquad TN + FP = m_2.$$
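The performance parameters of the classifier can thus be obtained using the following three equations:

$$PAC = \frac{TP + TN}{N} \tag{3}$$

$$SPY = \frac{TN}{TN + FP} \tag{4}$$

$$SEY = \frac{TP}{TP + FN} \tag{5}$$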
The aim of this research is to evolve classifier parameter values that maximize PAC, SPY, and SEY. Note that different sets of classifier parameter values with the same PAC can yield different SPY and SEY values. Thus, there are tradeoffs among (3), (4), and (5).
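The counting scheme and the metrics (3)–(5) can be illustrated with a short Python sketch. It assumes one definition of the indicators that is consistent with $\sum_{j} X_{ij} = 1$: $X_{i1} = 1$ when the predicted label matches the actual label and $X_{i2} = 1$ otherwise. The label encoding (1 for $C^{+}$, 0 for $C^{-}$) and all function names are hypothetical.

```python
from typing import Sequence

POSITIVE, NEGATIVE = 1, 0  # assumed encoding of the class labels C+ and C-


def indicators(predicted: int, actual: int) -> tuple[int, int]:
    """Return (X_i1, X_i2), assumed here to flag a correct or an incorrect prediction."""
    x1 = 1 if predicted == actual else 0
    return x1, 1 - x1  # exactly one indicator is 1, so X_i1 + X_i2 = 1


def confusion_counts(predicted: Sequence[int], actual: Sequence[int]) -> dict[str, int]:
    """Accumulate TP, TN, FP, FN from the indicators of every data object."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for pc, cl in zip(predicted, actual):
        x1, x2 = indicators(pc, cl)
        if cl == POSITIVE:
            counts["TP"] += x1  # positive correctly identified as positive
            counts["FN"] += x2  # positive wrongly identified as negative
        else:
            counts["TN"] += x1  # negative correctly identified as negative
            counts["FP"] += x2  # negative wrongly identified as positive
    return counts


def metrics(c: dict[str, int]) -> tuple[float, float, float]:
    """Return (PAC, SPY, SEY) as in (3), (4), and (5)."""
    n = sum(c.values())
    pac = (c["TP"] + c["TN"]) / n
    spy = c["TN"] / (c["TN"] + c["FP"])
    sey = c["TP"] / (c["TP"] + c["FN"])
    return pac, spy, sey


actual = [1, 1, 1, 0, 0, 0, 0, 0]     # m1 = 3 positives, m2 = 5 negatives
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
counts = confusion_counts(predicted, actual)
print(counts)           # {'TP': 2, 'TN': 4, 'FP': 1, 'FN': 1}
print(metrics(counts))  # (0.75, 0.8, 0.6666666666666666)
```

SPY and SEY are undefined when a dataset has no negative or no positive instances, so a real implementation would need to guard those denominators.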
Any multiobjective optimization problem can then be solved either by converting the objective functions into a single linear or nonlinear objective function or by computing Pareto fronts using the concept of nondominance.
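The nondominance concept can be made concrete with a small Python sketch in which every objective (PAC, SPY, SEY) is maximized. The function names are hypothetical, and the objective vectors in the example are made-up values, not results from this paper. The example also illustrates the tradeoff noted above: two solutions with the same PAC can both be nondominated when they differ in SPY and SEY.

```python
from typing import Sequence

Objectives = tuple[float, ...]  # e.g. (PAC, SPY, SEY), all to be maximized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: list[Objectives]) -> list[Objectives]:
    """Return the nondominated points (simple O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [
    (0.80, 0.85, 0.70),  # same PAC as the next one, better SPY
    (0.80, 0.75, 0.90),  # same PAC, better SEY
    (0.78, 0.74, 0.88),  # worse than the second in every objective
]
print(pareto_front(candidates))  # [(0.8, 0.85, 0.7), (0.8, 0.75, 0.9)]
```

Repeating such a scan over the whole population in every iteration is the extra computational effort that a single weighted objective avoids.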
In this paper, because finding Pareto fronts in every iteration requires additional computational effort, the objective functions are linearly combined into a single compound objective function:

$$\max\; F = w_1 \cdot PAC + w_2 \cdot SPY + w_3 \cdot SEY \tag{6}$$

subject to the constraints

$$w_1 + w_2 + w_3 = 1 \tag{7}$$

$$w_k \ge 0, \quad k = 1, 2, 3 \tag{8}$$

$$LB_i \le \mathit{CLASSIFIER\_PAR}_i \le UB_i \tag{9}$$

where $\mathit{CLASSIFIER\_PAR}_i$ is the ith sensitive parameter of the considered classifier, $LB_i$ and $UB_i$ are its lower and upper bounds, (7) is the totality condition on the weights, (8) guarantees nonnegativity, and (9) ensures that each classifier parameter value lies within its specified bounds.
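A minimal Python sketch of the compound objective and its constraints (7)–(9) follows. The weight values in the example are arbitrary illustrative choices, the [0, 1] bounds mirror the parameter ranges stated later for cost, learning rate, and momentum, and `compound_objective` is a hypothetical helper name.

```python
from typing import Sequence


def compound_objective(
    pac: float,
    spy: float,
    sey: float,
    weights: Sequence[float],
    params: Sequence[float],
    bounds: Sequence[tuple[float, float]],
) -> float:
    """Weighted sum w1*PAC + w2*SPY + w3*SEY, after checking the constraints."""
    if abs(sum(weights) - 1.0) > 1e-9:               # (7) weights sum to 1
        raise ValueError("weights must sum to 1")
    if any(w < 0 for w in weights):                  # (8) weights are nonnegative
        raise ValueError("weights must be nonnegative")
    for value, (low, high) in zip(params, bounds):   # (9) parameters within bounds
        if not low <= value <= high:
            raise ValueError(f"parameter {value} outside [{low}, {high}]")
    w1, w2, w3 = weights
    return w1 * pac + w2 * spy + w3 * sey


# Example: MLP with (learning rate, momentum) = (0.3, 0.2), both bounded by [0, 1].
score = compound_objective(0.75, 0.8, 2 / 3, (0.5, 0.25, 0.25), (0.3, 0.2), [(0.0, 1.0)] * 2)
print(round(score, 4))  # 0.7417
```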
In this section, we summarize the three evolutionary algorithms employed to optimize the parameters of SVM and MLP for classifying medical datasets for disease diagnosis. The discussion is restricted to a brief overview; detailed information and possible variations of these algorithms are beyond the scope of this paper.
The gravitational search algorithm (GSA) is a population-based stochastic search method developed by Rashedi et al. in 2009. GSA is inspired by Newton's law of gravitation, according to which every particle in the universe attracts every other particle with a force directly proportional to the product of their masses and inversely proportional to the square of the distance between them. GSA has been successfully applied to several engineering optimization problems [60, 61].
In GSA, a set of masses (agents) is placed in a d-dimensional space, and the position of each mass represents a point in the solution space of the problem to be solved. The fitness value of each agent, together with worst(t) and best(t), the worst and best fitness values in the population at time t, is used to compute the masses and hence the forces (F) acting between them. Equations corresponding to these parameters are provided in
To update the position of a mass, $x_i^d(t+1)$, its velocity $v_i^d(t)$ must be updated first. The velocity of the mass at time t + 1 depends mainly on its velocity and acceleration at time t. The acceleration $a_i^d(t)$ of the ith mass at instant t depends on the total force exerted on it by the other heavy masses, given in (14); the acceleration itself is given in (15). The equations for updating the mass position and velocity are provided in (16) and (17).
where $rand_i$ and $rand_j$ are uniform random numbers in [0, 1], ε is a small constant, and $R_{ij}(t)$ is the distance between agents i and j. The set of the best k agents is denoted by kbest. G is the gravitational constant, which is initialized to $G_0$ and decreases as time progresses.
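One GSA iteration can be sketched in Python as follows, for a fitness that is maximized (as the compound objective is). The sketch follows the commonly cited formulation of Rashedi et al., but several details are assumptions rather than settings from this paper: the exponential decay $G(t) = G_0 e^{-20t/T}$ with $G_0 = 100$, the linear shrinking of kbest from all agents to one, and the function name `gsa_step`. The acceleration is accumulated directly as $\sum_j rand_j\, G\, M_j (x_j - x_i)/(R_{ij} + \varepsilon)$, which equals the total force divided by the agent's own mass and avoids dividing by a zero mass.

```python
import math
import random


def gsa_step(positions, velocities, fitness, t, max_iter, g0=100.0, decay=20.0,
             eps=1e-12, rng=random):
    """Return updated (positions, velocities) after one GSA iteration (maximization)."""
    n, dim = len(positions), len(positions[0])
    best, worst = max(fitness), min(fitness)
    # Raw masses in [0, 1]; equal masses if all agents have the same fitness.
    raw = [1.0] * n if best == worst else [(f - worst) / (best - worst) for f in fitness]
    total = sum(raw)
    mass = [m / total for m in raw]                  # normalized masses M_i
    g = g0 * math.exp(-decay * t / max_iter)         # G decreases over time
    k = max(1, round(n - (n - 1) * t / max_iter))    # kbest shrinks from n to 1
    kbest = sorted(range(n), key=lambda j: fitness[j], reverse=True)[:k]

    new_positions, new_velocities = [], []
    for i in range(n):
        acc = [0.0] * dim
        for j in kbest:
            if j == i:
                continue
            r = math.dist(positions[i], positions[j])  # Euclidean distance R_ij(t)
            coef = rng.random() * g * mass[j] / (r + eps)
            for d in range(dim):
                acc[d] += coef * (positions[j][d] - positions[i][d])
        rand_i = rng.random()
        vel = [rand_i * velocities[i][d] + acc[d] for d in range(dim)]
        new_velocities.append(vel)
        new_positions.append([positions[i][d] + vel[d] for d in range(dim)])
    return new_positions, new_velocities


rng = random.Random(0)
pos, vel = gsa_step([[0.0], [1.0]], [[0.0], [0.0]], [0.0, 1.0], t=0, max_iter=10, rng=rng)
# The worst agent (mass 0) is pulled towards the best one; the best agent does not move.
```

In the hybrid system, the updated positions would additionally be clipped to the classifier parameter bounds.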
In 1995, Kennedy and Eberhart developed particle swarm optimization (PSO), a population-based stochastic optimization procedure inspired by the social behavior of organisms such as fish schools and bird flocks. In PSO, the particles are randomly initialized, and the position and velocity of particle i are represented as $X_i$ and $V_i$, respectively. The fitness function is computed for each particle. Two important quantities in PSO are the personal best (pBest) and the global best (gBest): each particle's pBest is the best position it has individually achieved up to time t, and gBest is the best position achieved by any particle up to time t. The algorithm runs for a fixed number of iterations. At each iteration, the velocity of every particle is updated using the following velocity update scheme:

$$V_i(t+1) = w\,V_i(t) + c_1\,\mathrm{rand}()\,\bigl(pBest_i - X_i(t)\bigr) + c_2\,\mathrm{rand}()\,\bigl(gBest - X_i(t)\bigr) \tag{18}$$

where w is the inertia weight, $c_1$ and $c_2$ are the personal and global learning factors, and rand() returns a random number uniformly distributed in [0, 1].

The new position of the particle is then given by

$$X_i(t+1) = X_i(t) + V_i(t+1) \tag{19}$$
The basic steps of PSO are given in Algorithm 1.
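The steps of Algorithm 1 can be sketched in Python using the velocity and position updates described above. This is an illustrative version under stated assumptions, not the exact configuration used in this paper: the values w = 0.7 and $c_1 = c_2 = 1.5$, zero initial velocities, clipping positions to the parameter bounds, and the toy fitness function are all choices made for the example.

```python
import random
from typing import Callable, Sequence


def pso(fitness: Callable[[list[float]], float], bounds: Sequence[tuple[float, float]],
        n_particles: int = 20, n_iter: int = 100, w: float = 0.7,
        c1: float = 1.5, c2: float = 1.5, seed=None) -> tuple[list[float], float]:
    """Maximize `fitness` over a box; returns (gBest position, gBest fitness)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity update: inertia + personal (pBest) + global (gBest) terms.
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (pbest[i][d] - x[i][d])
                           + c2 * rng.random() * (gbest[d] - x[i][d]))
                lo, hi = bounds[d]
                # Position update, clipped so the parameter stays within its bounds.
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    return gbest, gbest_fit


# Toy fitness with its maximum at (0.3, 0.6), standing in for the compound objective.
best, best_fit = pso(lambda p: -((p[0] - 0.3) ** 2 + (p[1] - 0.6) ** 2),
                     bounds=[(0.0, 1.0), (0.0, 1.0)], seed=1)
print([round(c, 3) for c in best])  # should be close to [0.3, 0.6]
```

In the hybrid system, the toy fitness would be replaced by the weighted PAC/SPY/SEY score of a classifier trained with the candidate parameters.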
The firefly algorithm is a recently proposed bioinspired, evolutionary metaheuristic that mimics the social behavior of fireflies. Fireflies produce short, rhythmic flashes whose pattern characterizes a particular species. The artificial firefly-inspired algorithm makes certain assumptions: all fireflies are unisexual, so that every artificial firefly is attracted to every other, and attractiveness is proportional to brightness, which determines the relative movements of the fireflies. The brightness of a firefly is defined by the objective function of the problem being optimized; for a minimization problem, the brightness may be the reciprocal of the objective function value. The pseudocode of the basic firefly algorithm as given by Yang is shown in Algorithm 2. The equations used in the firefly algorithm are as follows:
where $\beta_0, \gamma, \alpha \in [0, 1]$.
Each firefly's position is updated based on (20). The velocity of the ith firefly depends on the attractiveness between fireflies $X_i$ and $X_j$, which decays with the distance between them in an m-dimensional space, and on α, a small random value in the range 0 to 0.2; the equations related to the velocity are given in (21) and (22).
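A compact Python sketch of the firefly search follows. It assumes the standard movement rule from Yang's algorithm, in which firefly i moves towards a brighter firefly j by $\beta_0 e^{-\gamma r_{ij}^2}(X_j - X_i) + \alpha(\mathrm{rand} - 0.5)$. Further assumptions: brightness is the objective value being maximized, positions are clipped to the bounds, α is kept fixed at 0.2 (the upper end of the range mentioned above), and `firefly` is a hypothetical function name.

```python
import math
import random
from typing import Callable, Sequence


def firefly(objective: Callable[[list[float]], float], bounds: Sequence[tuple[float, float]],
            n: int = 15, n_iter: int = 50, beta0: float = 1.0, gamma: float = 1.0,
            alpha: float = 0.2, seed=None) -> tuple[list[float], float]:
    """Maximize `objective`; the brightness of a firefly is its objective value."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    light = [objective(x) for x in xs]
    for _ in range(n_iter):
        for i in range(n):
            for j in range(n):
                if light[j] > light[i]:  # i is attracted only to brighter fireflies
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    for d, (lo, hi) in enumerate(bounds):
                        step = beta * (xs[j][d] - xs[i][d]) + alpha * (rng.random() - 0.5)
                        xs[i][d] = min(max(xs[i][d] + step, lo), hi)
                    light[i] = objective(xs[i])
    best = max(range(n), key=lambda k: light[k])
    return xs[best], light[best]


best, best_light = firefly(lambda p: -((p[0] - 0.3) ** 2 + (p[1] - 0.6) ** 2),
                           bounds=[(0.0, 1.0), (0.0, 1.0)], seed=3)
```

Because a firefly only moves towards brighter ones, the brightest firefly stays put until another overtakes it, so the best value found never decreases; many implementations also shrink α over the iterations to refine the final solution.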
Two classification techniques are used, and the basic details of these techniques are discussed in the subsequent sections.
The multilayer perceptron (MLP) is one of the most commonly used supervised feedforward artificial neural networks. It extends the linear perceptron algorithm and consists of many nodes arranged in several layers. In general, an MLP contains three or more processing layers: one input layer that receives the input features, one or more hidden layers, and an output layer that produces the classification result over the classes.
Each node is represented as an artificial neuron which converts the input features into output using a weighted sum of inputs and activation function.
The weighted input is given by

$$V = \sum_{i=1}^{n} W_i X_i + \Theta$$

where V is the weighted sum of the input features, $W_i$ are the weights, $X_i$ are the n input features, and Θ is the bias.
The activation function is denoted by f(x). The most frequently used activation functions are sigmoids. They are as follows:
The multilayer perceptron is trained using back propagation (BP). The weight update equation used in BP is given in
Backpropagation learning can become trapped in a local minimum; to overcome this, the learning rate and momentum values in this research are evolved using three evolutionary search algorithms (PSO, GSA, and FA). A three-layer neural network is used, with an input layer, one hidden layer, and an output layer. The size of the input layer equals the number of features in the data, and the size of the output layer equals the number of classes. The size of the hidden layer is the average of the input and output layer sizes. The performance of the three algorithms is compared in the simulations and discussed in the Results and Discussion section of this paper.
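The node computation and the two MLP parameters evolved here can be illustrated with a short Python sketch. It assumes the bias Θ is added to the weighted sum, uses the logistic sigmoid as the activation, and assumes the common momentum form of the backpropagation update, $\Delta W(t) = -\eta\, \partial E/\partial W + \mu\, \Delta W(t-1)$, where η is the learning rate and μ the momentum; this update is not taken from this paper. The floor used when averaging the input and output sizes is also an assumption, and all function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_input(weights: Sequence[float], inputs: Sequence[float], theta: float) -> float:
    """V = sum_i W_i * X_i + Theta (bias added to the weighted sum)."""
    return sum(w * x for w, x in zip(weights, inputs)) + theta


def logistic(v: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-v), written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def layer_sizes(n_features: int, n_classes: int) -> tuple[int, int, int]:
    """Input, hidden, and output sizes; hidden is the integer (floor) average."""
    return n_features, (n_features + n_classes) // 2, n_classes


def momentum_update(weights: Sequence[float], grads: Sequence[float],
                    prev_delta: Sequence[float], learning_rate: float,
                    momentum: float) -> tuple[list[float], list[float]]:
    """One backpropagation step with momentum; returns (new weights, weight changes)."""
    delta = [-learning_rate * g + momentum * pd for g, pd in zip(grads, prev_delta)]
    return [w + d for w, d in zip(weights, delta)], delta


print(logistic(weighted_input([0.5, -0.25], [2.0, 4.0], 0.0)))  # 0.5, since V = 0
print(layer_sizes(13, 2))                                       # (13, 7, 2)
new_w, delta = momentum_update([0.5], [0.2], [0.1], learning_rate=0.1, momentum=0.9)
# delta = -0.1 * 0.2 + 0.9 * 0.1, about 0.07, so the weight moves from 0.5 to about 0.57
```

The learning rate and momentum passed to `momentum_update` are exactly the two values that PSO, GSA, or FA would search over within [0, 1].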
The support vector machine (SVM) is a supervised machine learning algorithm often used for binary classification. It was originally developed by Vapnik in 1979. The training data take the form of instance–label pairs $(x_i, y_i)$. The SVM classifier finds an optimal hyperplane separating the positive and negative classes, represented by $F(x) = w^{T}x + b = 0$.
Based on the class labels, two margin hyperplanes are formed, which bound the two classes:

$F(x) = w^{T}x + b \ge +1$ for positive instances ($y_i = +1$) and $F(x) = w^{T}x + b \le -1$ for negative instances ($y_i = -1$), where w is the weight vector, x is the input vector, and b is the bias. A new instance x is classified according to the sign of F(x).
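The optimization problem for the soft-margin classifier is as follows:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i\,(w^{T}x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, N$$

where $\xi_i$ are slack variables and C is the cost parameter that trades off margin width against training errors.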
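For a fixed (w, b), the standard soft-margin objective $\tfrac{1}{2}\lVert w\rVert^{2} + C\sum_i \xi_i$ can be evaluated directly, because the smallest slack satisfying the constraints is $\xi_i = \max(0, 1 - y_i F(x_i))$, the hinge loss. The Python sketch below does this and classifies by the sign of F(x); it is not an SVM solver, and the function names and toy data are hypothetical. The argument C plays the role of the cost parameter evolved in this work.

```python
from typing import Sequence


def decision(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """F(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def soft_margin_objective(w, b, X, y, C: float) -> float:
    """0.5 * ||w||^2 + C * sum of hinge slacks max(0, 1 - y_i F(x_i))."""
    slacks = [max(0.0, 1.0 - yi * decision(w, xi, b)) for xi, yi in zip(X, y)]
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)


def predict(w, x, b) -> int:
    """Assign +1 or -1 according to the sign of F(x)."""
    return 1 if decision(w, x, b) >= 0 else -1


X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]  # the third point lies on the wrong side and needs slack 1.5
print(soft_margin_objective([1.0, 0.0], 0.0, X, y, C=1.0))  # 2.0
print(predict([1.0, 0.0], [0.5, 0.0], 0.0))                  # 1 (misclassified)
```

A larger C penalizes the slack more heavily, which is why the cost value chosen by the evolutionary search changes how the trained SVM trades margin width against training errors.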
Diagnosing diseases from data collected from many patients with varying degrees of a specific disease is a classification problem. In medical information systems, both single and ensemble classifiers have been studied for disease diagnosis. In this section, we present the design of hybrid systems that combine evolutionary algorithms with classification techniques to classify diseases from data. A few hybrid systems have been developed to optimize classifier parameters [69, 70]; however, they target other application domains.
The performance of any classifier depends broadly on three factors: the classification technique used; data statistics (regression, entropy, kurtosis, standard deviation, number of features considered for training, size of the training data, etc.); and classifier parameters (learning rate, tree depth, maximum number of child nodes allowed for a parent node in a decision tree, pruning, fuzzy membership functions, activation functions, etc.). In this paper, we focus on optimizing classifier parameters using evolutionary algorithms; thus, our system qualifies as a hybrid system. Figure 1 shows a schematic block diagram of the proposed hybrid system, depicting the major steps leading to disease diagnosis. The rectangle with a dotted border marks the main emphasis of this paper: how two classifiers, namely, SVM and MLP, perform in disease diagnosis. The parameters of these two classifiers are optimized using three evolutionary algorithms, namely, PSO, GSA, and FA, with the goal of maximizing diagnostic quality in terms of PAC, SPY, and SEY, that is, optimizing the three objectives explained in Section 3. This is depicted in the left half of the dotted rectangle in Figure 1. The basic steps of the hybrid system are summarized in Algorithm 4.
The preprocessing stage handles a missing value of a feature by substituting the most frequent value or an interval-estimated value for that feature. As part of preprocessing, features are also normalized using min–max normalization; this reduces the training time of the classifiers, which is otherwise considerable because feature values span widely different ranges. In step 3, two classifiers (SVM and MLP) are employed. In step 4, the parameter selected for evolution in SVM is the cost, whereas for MLP two parameters, the learning rate and the momentum, are selected. The range of all three parameters (cost, learning rate, and momentum) is set to [0, 1]. In step 5, the objective function is chosen as either single objective or multiobjective. If multiobjective optimization is chosen, then, for simplicity and uniformity, all objective functions are converted into either maximization or minimization form. The multiple objectives considered for multiobjective optimization are given in (3)–(9). In step 6, three evolutionary algorithms (particle swarm optimization, gravitational search algorithm, and firefly algorithm) are selected as optimization techniques to find the optimal parameter values of the considered classifiers with respect to the multiple objectives: prediction accuracy, sensitivity, and specificity. In step 8, the results found in step 7 are postprocessed according to the optimization model selected in step 5. If the model is single objective, statistics such as the maximum, minimum, mean, and median are computed to assess the performance of the evolutionary algorithm. If the model is multiobjective (or weighted multiobjective), the quality of the nondominated solutions must be assessed using metrics such as spacing and generational distance.
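A minimal Python sketch of the two preprocessing operations described above, assuming each feature is a list of floats with missing entries encoded as None. Only the most-frequent-value (mode) imputation is shown; the interval-estimation alternative is omitted, and the function names are illustrative rather than taken from the Java/Weka implementation.

```python
from collections import Counter
from typing import List, Optional


def impute_mode(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale a feature column linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


feature = [2.0, None, 4.0, 2.0, 6.0]
print(min_max_normalize(impute_mode(feature)))  # [0.0, 0.0, 0.5, 0.0, 1.0]
```

Imputation runs before normalization so that the filled-in values fall inside the same [0, 1] range as the observed ones.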
The hybridization process constructs the population of each evolutionary algorithm from candidate classifier parameter values that satisfy the parameter bounds. During the execution of an evolutionary algorithm, the fitness of each population member is computed from the performance measures obtained by running the classifier, configured with that member's parameter values (step 4), on the dataset.
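The sketch below illustrates how a population can be kept inside the parameter bounds while fitness is evaluated, using a basic global-best PSO over the two MLP parameters (learning rate and momentum, both in [0, 1]) with 20 agents and 50 iterations. The inertia weight and acceleration coefficients (w = 0.7, c1 = c2 = 1.5) are assumed common defaults, not values reported here, and `toy_fitness` is a hypothetical stand-in: in the hybrid system, the fitness would come from running the classifier with the candidate parameters on the dataset.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def clip(position: List[float], bounds: Bounds) -> List[float]:
    """Project a candidate parameter vector back inside its box constraints."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(position, bounds)]


def pso(fitness: Callable[[List[float]], float], bounds: Bounds,
        n_agents: int = 20, n_iter: int = 50,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Maximize `fitness` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_agents), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_agents):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            pos[i] = clip(pos[i], bounds)   # enforce parameter bounds
            f = fitness(pos[i])             # the classifier would be evaluated here
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_fitness(params: List[float]) -> float:
    """Hypothetical stand-in peaking at learning rate 0.3, momentum 0.2."""
    lr, momentum = params
    return -((lr - 0.3) ** 2 + (momentum - 0.2) ** 2)


best, score = pso(toy_fitness, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Clipping after each move is the simplest way to honor the bounds; reflecting or re-sampling out-of-range coordinates are common alternatives.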
Once each of the three EAs has been executed, optimal parameter values are obtained for each of the two classifiers (SVM and MLP), and the resulting six HISs are compared based on their fitness values. The HIS with the best fitness value for a given dataset is taken as the proposed HIS for that dataset, and its objective function and parameter values are treated as the final optimal values.
Combining the two classifiers with the three evolutionary techniques for optimizing the chosen classifier parameters yields several alternative hybrid intelligent systems (HISs): GSA-based SVM (GSVM), FA-based SVM (FSVM), PSO-based SVM (PSVM), GSA-based MLP (GMLP), FA-based MLP (FMLP), and PSO-based MLP (PMLP). These six HISs are tested on all eleven benchmark datasets considered in this work, once without and once with resampling. Hence, sixteen results are produced for each disease dataset, eight for SVM and eight for MLP. Of these eight results, one is for the basic classifier (SVM or MLP alone) without resampling, one is for the basic classifier with resampled data, and the remaining six are for the three evolutionary algorithms, each run once on the original data and once on the resampled data. The benchmark datasets were also tested with the AdaBoost versions of SVM and MLP. However, on average, the AdaBoost results are not competitive with those obtained using the instance-based supervised resampling technique in Weka; the corresponding performances are given in Table 1. We therefore continued our experiments using the instance-based supervised resampling technique.
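The count of sixteen results per dataset follows from two classifiers × four variants (the basic classifier plus three EA-optimized versions) × two data conditions. A small illustrative enumeration using the paper's naming scheme for the hybrids:

```python
from itertools import product

CLASSIFIERS = ["SVM", "MLP"]
# None = basic classifier; "P", "G", "F" = PSO-, GSA-, FA-optimized (PSVM, GMLP, ...)
VARIANTS = [None, "P", "G", "F"]
RESAMPLED = [False, True]


def configurations():
    """List every (classifier variant, data condition) pair evaluated per dataset."""
    names = []
    for clf, prefix, resampled in product(CLASSIFIERS, VARIANTS, RESAMPLED):
        system = clf if prefix is None else prefix + clf
        names.append(f"{system} ({'with' if resampled else 'without'} resampling)")
    return names


configs = configurations()
print(len(configs))  # 16
```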
To evaluate the performance of the proposed hybrid system, 11 medical datasets covering various diseases are considered. These datasets were collected from the UCI repository and form the basis of almost all performance evaluations in disease diagnosis. The datasets used in this paper are summarized in Table 2. All six hybrid system alternatives and the basic classifiers were executed on each of the 11 datasets, once on the original dataset and once on the resampled dataset.
All three evolutionary algorithms are run for 50 iterations with 20 agents each. The algorithms were implemented in Java, and the Weka 3.7.4 class libraries were used to implement SVM and MLP. Instance-based resampling, available in Weka, was used for resampling. For experimentation, each dataset is split into training and testing sets using 10-fold cross-validation. To assess the proposed hybrid system, we compared our results with those presented in three very recent papers that use the same datasets (each of these papers uses only a subset of the 11 datasets). References  through  are recent works and serve as the base papers for each dataset in our work (as marked in the legends of Figure 2). The results for breast cancer, hepatitis, BUPA liver, Pima, Cleveland, and Parkinson are compared with those of , whereas the results for Statlog, Spect, Spectf, and Eric are compared with those of BagMOOV. The thyroid results alone are compared with those of . In this work, prediction accuracy is given the highest priority; hence, the weights w1, w2, and w3 in (6) are set to 0.95, 0.05/2, and 0.05/2 (i.e., 0.025), respectively.
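Equation (6) itself is not reproduced in this section; the sketch below assumes it is a weighted sum of the three objectives, w1·PAC + w2·SEN + w3·SPE, using the weights stated above. Prediction accuracy, sensitivity, and specificity are computed from a binary confusion matrix with their standard definitions; the counts in the example are made up.

```python
from typing import Tuple

# Weights from the text: accuracy dominates, the remainder is split evenly.
WEIGHTS = (0.95, 0.05 / 2, 0.05 / 2)


def objectives(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (prediction accuracy, sensitivity, specificity) for a binary task."""
    pac = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)  # true positive rate
    spe = tn / (tn + fp)  # true negative rate
    return pac, sen, spe


def weighted_objective(pac: float, sen: float, spe: float,
                       w: Tuple[float, float, float] = WEIGHTS) -> float:
    """Assumed form of (6): a weighted sum to be maximized."""
    return w[0] * pac + w[1] * sen + w[2] * spe


pac, sen, spe = objectives(tp=40, tn=45, fp=5, fn=10)  # 0.85, 0.8, 0.9
score = weighted_objective(pac, sen, spe)              # approximately 0.85
```

With w1 = 0.95, a one-point gain in accuracy outweighs a large loss in sensitivity or specificity, which is how the stated priority on prediction accuracy shows up in the fitness.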
In this section, we present a number of statistical analyses for the results obtained from our proposed hybrid system. The following subsections provide details about how these analyses are done.
The statistical analysis was carried out using the Wilcoxon signed-rank test, which compares the performance of FMLP with that of each of the other techniques. The null hypothesis and the alternative hypothesis are set as follows:
The objective values of FMLP over all disease datasets are tested against those of each of the remaining five hybrid system alternatives, namely, FSVM, GSVM, PSVM, GMLP, and PMLP, once with and once without resampling.
The Wilcoxon signed-rank test was executed at significance levels of 0.01 and 0.05. The MATLAB function signrank() was used for the analysis, and the resulting conclusions are presented in Tables 3, 4, 5, and 6.
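For readers without MATLAB, the following is a self-contained sketch of a two-sided Wilcoxon signed-rank test. It uses the large-sample normal approximation without tie or continuity correction, whereas MATLAB's signrank() uses an exact method for small samples by default, so p-values can differ slightly. Following signrank()'s convention, h = 1 is returned when the null hypothesis is rejected at the given level.

```python
import math
from typing import Sequence, Tuple


def signed_rank_test(x: Sequence[float], y: Sequence[float],
                     alpha: float = 0.05) -> Tuple[int, float, float]:
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Returns (h, p, W): h = 1 rejects H0 (zero median difference) at level alpha,
    and W is the smaller of the positive and negative rank sums.
    """
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        return 0, 1.0, 0.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied |d| values
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return int(p < alpha), p, w


# Eleven paired objective values where the first technique always wins.
h, p, w = signed_rank_test([float(k) for k in range(1, 12)], [0.0] * 11, alpha=0.01)
```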
Student's t-test is used to test whether a sample X drawn from a normal distribution can have mean m when the standard deviation is unknown. We executed FMLP 20 times and recorded the best performance in each run. Student's t-test is applied to each of the three objectives: prediction accuracy (PAC), sensitivity (SEN), and specificity (SPE). The null hypothesis and alternative hypothesis are set as follows:
Multiobjective optimization has two distinct goals: (1) finding solutions as close to the Pareto-optimal solutions as possible and (2) finding solutions as diverse as possible in the obtained nondominated front. In this work, the first goal is assessed using generational distance (GD) and the second by computing spacing (SP). Both metrics use two sets, Q and P, where Q is the Pareto front found by the algorithm under test and P is a subset of the true Pareto-optimal members. Before these metrics are computed, the data in Q must be normalized, since the objective functions have different ranges.
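The front Q consists of mutually nondominated solutions. A minimal filter for extracting them, assuming all three objectives (PAC, sensitivity, specificity) are to be maximized, as in the uniform maximization form described for step 5; the objective vectors in the example are illustrative only.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every objective, strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (PAC, sensitivity, specificity) for three hypothetical parameter settings
candidates = [(0.85, 0.80, 0.90), (0.84, 0.79, 0.88), (0.86, 0.70, 0.95)]
front = nondominated(candidates)  # the second point is dominated by the first
```

A point never dominates itself (the strict-improvement condition fails), so no self-exclusion is needed inside the filter.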
Generational distance (GD): Veldhuizen introduced this metric in 1990. It measures the average distance between the members of Q and P as follows:
$$GD = \frac{\left(\sum_{i=1}^{|Q|} d_i^{\,p}\right)^{1/p}}{|Q|}$$
Here p = 2, and $d_i$ is the Euclidean distance between the ith member of Q and the nearest member of P:
$$d_i = \min_{k=1}^{|P|} \sqrt{\sum_{m=1}^{M} \left(f_m^{(i)} - f_m^{*(k)}\right)^2}$$
where M is the number of objectives, $f_m^{(i)}$ is the mth objective function value of the ith member of Q, and $f_m^{*(k)}$ is the mth objective function value of the kth member of P. An algorithm with a smaller GD is better. Each member of P has the maximum value for at least one objective function.
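The GD definition above translates directly into code. This sketch assumes Q and P are lists of already-normalized objective vectors and uses p = 2; `math.dist` (Python 3.8+) supplies the Euclidean distance.

```python
import math
from typing import List, Sequence


def generational_distance(Q: List[Sequence[float]], P: List[Sequence[float]],
                          p: int = 2) -> float:
    """GD = (sum_i d_i**p)**(1/p) / |Q|, where d_i is the Euclidean distance
    from the i-th member of Q to its nearest member of P."""
    d = [min(math.dist(q, ref) for ref in P) for q in Q]
    return sum(di ** p for di in d) ** (1 / p) / len(Q)


reference = [(1.0, 0.0), (0.0, 1.0)]  # members of P
found = [(1.0, 0.0), (0.0, 0.5)]      # members of Q; the second lies 0.5 from P
print(generational_distance(found, reference))  # 0.25
```

A front that coincides with P gives GD = 0, and GD grows as members of Q drift away from the reference front.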
Spacing (SP): Schott introduced this metric in 1995. It measures the standard deviation of the $d_i$ values:
$$SP = \sqrt{\frac{1}{|Q|}\sum_{i=1}^{|Q|}\left(d_i - \bar{d}\right)^2}$$
where
$$d_i = \min_{k \in Q,\; k \neq i} \sum_{m=1}^{M} \left|f_m^{(i)} - f_m^{(k)}\right|$$
and $\bar{d}$ is the mean of the $d_i$ values. A good algorithm has a small SP value. The set Q is obtained by executing FMLP for 50 iterations with 20 agents per iteration; in every iteration, the Pareto front is stored in an external memory. The GD and SP values for all three objectives, with and without resampling, are given in Table 9.
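Similarly, a sketch of the spacing metric, assuming the 1/|Q| form of the standard deviation and the city-block (L1) nearest-neighbor distances $d_i$ defined above. It also assumes normalized objectives and at least two members in Q.

```python
import math
from typing import List, Sequence


def spacing(Q: List[Sequence[float]]) -> float:
    """SP = sqrt(mean_i (d_i - d_bar)**2), where d_i is the L1 distance from
    the i-th member of Q to its nearest other member of Q."""
    if len(Q) < 2:
        raise ValueError("spacing needs at least two solutions")
    d = [min(sum(abs(a - b) for a, b in zip(Q[i], Q[k]))
             for k in range(len(Q)) if k != i)
         for i in range(len(Q))]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((di - d_bar) ** 2 for di in d) / len(d))


even = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]    # uniformly spread front
uneven = [(0.0, 1.0), (0.1, 0.9), (1.0, 0.0)]  # two points crowded together
print(spacing(even), spacing(uneven) > 0)      # 0.0 True
```

SP is zero only when every member of Q has the same nearest-neighbor distance, which is why a small value signals an evenly spread front.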
The best values obtained by the hybrid systems on each dataset are discussed below.
Table 10 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the Cleveland dataset. Without resampling, PMLP gives the best sensitivity (84.79%), whereas FMLP gives the best values for all the other performance measures: accuracy (85.8%), specificity (87.5%), F-measure (85.74%), recall (85.8%), and precision (85.91%). With resampling, PMLP gives the best accuracy (94.1%), and PMLP (GMLP, FMLP) gives the best values for all the other measures: sensitivity (93.49%), specificity (94.77%), F-measure (94.05%), recall (94.05%), and precision (94.07%). Table 11 compares the Cleveland results with the best results reported in recent literature.
Table 12 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the Statlog dataset, with and without resampling; the best values are shown in bold. Without resampling, the highest accuracy, sensitivity, specificity, F-measure, recall, and precision are achieved by FMLP, PMLP, FMLP, FMLP, FMLP, and FMLP, respectively. With resampling, they are achieved by GMLP, GMLP, FMLP, GMLP, GMLP, and GMLP, respectively. The highest prediction accuracy for Statlog is 85.9% without resampling and 90.7% with resampling, and all the considered techniques perform better on Statlog with resampling than without. Table 13 compares the Statlog results with the best results reported in recent literature.
Table 14 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the Spect dataset, with and without resampling. Without resampling, the highest accuracy, sensitivity, specificity, F-measure, recall, and precision are achieved by FMLP, FSVM, FMLP, FMLP, FMLP, and FMLP, respectively, with values of 85%, 88.4%, 74.2%, 83.3%, 85%, and 83.9%. With resampling, they are achieved by GMLP (PMLP), GMLP (FMLP, PMLP), GMLP (PMLP), PMLP (GMLP), GMLP (PMLP), and GMLP (PMLP), respectively, with values of 89.5%, 91.9%, 77.3%, 89.2%, 89.5%, and 89.1%. The highest prediction accuracy for Spect is thus 85% without resampling and 89.5% with resampling, and all the considered techniques perform better on Spect with resampling than without. Table 15 compares the Spect results with the best results reported in recent literature.
Table 16 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the Spectf dataset, with and without resampling; the best values are shown in bold. Without resampling, the highest accuracy, sensitivity, specificity, F-measure, recall, and precision are achieved by FMLP, FSVM, FMLP, PSVM, FMLP, and FMLP, respectively, with values of 82.4%, 88%, 83.3%, 80.6%, 82.4%, and 82.6%. With resampling, they are achieved by PMLP, GSVM, PMLP, PMLP, PMLP, and PMLP, respectively. The highest prediction accuracy for Spectf is 82.4% without resampling and 90.6% with resampling, and all the considered techniques perform better on Spectf with resampling than without, except in specificity. Table 17 compares the Spectf results with the best results reported in recent literature.
Table 18 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the ERIC dataset. Without resampling, FMLP gives the best accuracy (81.34%), specificity (79.1%), F-measure (81.02%), and recall (81.34%), whereas GMLP gives the best sensitivity (88.41%) and precision (82.5%). With resampling, GMLP gives the best values for all measures: accuracy (91.39%), sensitivity (88.78%), specificity (93.69%), F-measure (91.40%), recall (91.39%), and precision (91.48%). Table 19 compares the ERIC results with the best results reported in recent literature.
Table 20 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the breast cancer dataset. Without resampling, GMLP gives the best accuracy (97%) and precision (97.04%), PSVM gives the best sensitivity (95.08%), specificity (98.02%), and F-measure (97%), and GMLP and PSVM together give the best recall (97%). With resampling, FMLP gives the best accuracy (98%); PMLP (FMLP) gives the best sensitivity (96.61%), F-measure (98%), recall (98%), and precision (98%); and PSVM (GSVM and FSVM) gives the best specificity (99.55%). Table 21 compares the breast cancer results with the best results reported in recent literature.
Table 22 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the Hepatitis dataset. Without resampling, PMLP gives the best specificity (90.55%), F-measure (86.77%), recall (87.1%), and precision (86.6%), whereas GSVM (PSVM and FSVM) gives the best accuracy (87.1%) and sensitivity (73.08%). With resampling, FMLP (PMLP and GMLP) gives the best values for all measures: accuracy (92.26%), sensitivity (80.77%), specificity (94.57%), F-measure (92.14%), recall (92.26%), and precision (92.08%). Table 23 compares the Hepatitis results with the best results reported in recent literature.
Table 24 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the thyroid dataset, with and without resampling; the best values are shown in bold. Without resampling, the highest accuracy, sensitivity, specificity, F-measure, recall, and precision are achieved by FMLP, FMLP (PMLP), FMLP, FMLP, FMLP (PMLP), and FMLP (PMLP), respectively. With resampling, they are achieved by PMLP (FMLP), FMLP (PMLP), PMLP (FMLP), PMLP (FMLP), PMLP (FMLP), and PMLP (FMLP), respectively, with values of 98.6%, 98.2%, 98.74%, 98.6%, 98.6%, and 98.6%. The highest prediction accuracy for thyroid is 97.7% without resampling and 98.6% with resampling, and all the considered techniques perform better on thyroid with resampling than without. Table 25 compares the thyroid results with the best results reported in recent literature.
Table 26 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the Parkinson dataset, with and without resampling. Without resampling, the highest accuracy, sensitivity, specificity, F-measure, recall, and precision are achieved by FMLP, FMLP, PSVM (FSVM, GSVM), FMLP, FMLP, and FMLP, respectively, with values of 93.8%, 96.6%, 96.2%, 93.9%, 93.8%, and 94%. With resampling, they are achieved by GMLP, GMLP (FMLP, PMLP), PSVM (GSVM, FSVM), PMLP (GMLP), PMLP (GMLP), and PMLP (GMLP), respectively, with values of 96.9%, 97.4%, 100%, 96.9%, 96.9%, and 96.9%. The highest prediction accuracy for Parkinson is thus 93.8% without resampling and 96.9% with resampling, and all the considered techniques perform better on the Parkinson dataset with resampling than without. Table 27 compares the Parkinson results with the best results reported in recent literature.
Table 28 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the Pima dataset, with and without resampling; the best values are shown in bold. Without resampling, the highest accuracy, sensitivity, specificity, F-measure, recall, and precision are achieved by FMLP, FMLP, PSVM (FSVM, GSVM), FMLP, FMLP, and FMLP, respectively. With resampling, they are achieved by FMLP, GMLP, FMLP, FMLP, FMLP, and FMLP, respectively. The highest prediction accuracy for Pima is 78.3% without resampling and 81% with resampling, and all the considered techniques perform better on Pima with resampling than without. Table 29 compares the Pima results with the best results reported in recent literature.
Table 30 shows the performance of all eight techniques (the two basic classifiers and the six hybrid systems) on the BUPA dataset, with and without resampling. Without resampling, the highest accuracy, sensitivity, specificity, F-measure, recall, and precision are achieved by GMLP, PSVM, FMLP, FMLP, FMLP, and FMLP, respectively, with values of 73%, 72%, 74.9%, 72.8%, 73%, and 72.8%. With resampling, they are achieved by GMLP, FMLP, GMLP, GMLP, GMLP, and GMLP, respectively, with values of 73.3%, 67.5%, 78.4%, 73.3%, 73.3%, and 73.9%. The highest prediction accuracy for BUPA is thus 73% without resampling and 73.3% with resampling, and all the considered techniques perform better on BUPA with resampling than without, except in sensitivity. Table 31 compares the BUPA results with the best results reported in recent literature.
In GSA, an agent's position is updated using information learned from all other agents, whereas in PSO it is updated based on two reference positions, the personal best (pBEST) and the global best (gBEST). Hence, each iteration of these two algorithms produces at most n new solutions, where n is the number of agents. In FA, by contrast, each agent moves towards every brighter agent, so in the worst case an agent generates O(n) new solutions per iteration (O(n^2) in total). Therefore, in the worst case, FA explores the search space more thoroughly than the other two algorithms (GSA and PSO). This is borne out by the results on the 11 medical datasets.
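To make the complexity argument concrete, the sketch below runs one iteration of the standard firefly movement rule, x_i ← x_i + β0·exp(−γ·r_ij²)·(x_j − x_i) + α·(u − 0.5), where u is uniform in [0, 1), and counts the moves. The parameter values (β0 = 1, γ = 1, α = 0.1) are assumed. For simplicity, brightness is taken from a snapshot at the start of the iteration. With 20 agents of distinct brightness, there are 20·19/2 = 190 moves (the dimmest agent alone moves 19 times), versus 20 position updates per iteration in PSO or GSA.

```python
import math
import random
from typing import Callable, List


def firefly_iteration(pos: List[List[float]],
                      brightness: Callable[[List[float]], float],
                      beta0: float = 1.0, gamma: float = 1.0,
                      alpha: float = 0.1, seed: int = 0) -> int:
    """Move every firefly toward each brighter one (in place); return the number of moves."""
    rng = random.Random(seed)
    light = [brightness(p) for p in pos]  # snapshot of intensities for this iteration
    moves = 0
    for i in range(len(pos)):
        for j in range(len(pos)):
            if light[j] > light[i]:
                r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                pos[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                          for a, b in zip(pos[i], pos[j])]
                moves += 1
    return moves


agents = [[k / 19] for k in range(20)]            # 20 one-dimensional agents
print(firefly_iteration(agents, lambda p: p[0]))  # 190 = 20 * 19 / 2
```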
From these observations, we conclude that MLP without resampling improves on the latest literature results for all datasets, as depicted in Figure 2. As mentioned earlier, the blue bars in Figure 2 represent the best performance reported in the literature; the best results obtained by any of the six HISs proposed in this paper are shown alongside, without resampling (orange bars) and with resampled data (gray bars). Sensitivity and specificity values for all systems are presented in Tables 32 and 33. They show that our proposed hybrid system performs very well across all datasets, in particular the parameter-optimized MLP. Table 34 summarizes the optimal parameter values of MLP. Hence, compared with the ensemble techniques, the parameter-optimized MLP gives better results.
Table 3 gives the outcomes of the signed-rank test for the results with resampling at a level of significance (LOS) of 0.01. An h value of 1 means that H0 is rejected in favor of H1; an h value of 0 means that H0 cannot be rejected. Table 3 contains 161 ones out of a total of 165 entries; that is, the null hypothesis is rejected in 161 cases and not rejected in the remaining four.
Table 4 gives the outcomes of the signed-rank test for the results with resampling at an LOS of 0.05. Table 4 contains 163 ones out of 165 entries; that is, the null hypothesis is rejected in 163 cases and not rejected in the remaining two.
Table 5 gives the outcomes of the signed-rank test for the results without resampling at an LOS of 0.01. Table 5 contains 164 ones out of 165 entries; that is, the null hypothesis is rejected in 164 cases and not rejected in one.
Table 6 gives the outcomes of the signed-rank test for the results without resampling at an LOS of 0.05. Table 6 contains 164 ones out of 165 entries; that is, the null hypothesis is rejected in 164 cases and not rejected in one.
The outcomes of FMLP are taken as the value of m. Tables 7 and 8 give the t-test results for both the resampled and the original data at LOS 0.01 and 0.05. In these tables, h is zero for all datasets at both levels; hence, the null hypothesis cannot be rejected for any dataset.
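A sketch of the one-sample t-test over 20 runs, assuming a two-sided alternative (H1: population mean ≠ m). Rather than computing a p-value, it compares |t| with the tabulated two-sided critical values of Student's t for 19 degrees of freedom (2.093 at 0.05 and 2.861 at 0.01); h = 1 means H0 is rejected. The run values in the example are invented for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence

# Two-sided critical values of Student's t with 19 degrees of freedom (n = 20 runs).
T_CRIT_DF19 = {0.05: 2.093, 0.01: 2.861}


def t_statistic(sample: Sequence[float], m: float) -> float:
    """One-sample t statistic: (mean - m) / (s / sqrt(n)), s = sample std. dev."""
    return (mean(sample) - m) / (stdev(sample) / math.sqrt(len(sample)))


def t_test_h(sample: Sequence[float], m: float, alpha: float) -> int:
    """Return h = 1 if H0 (population mean equals m) is rejected, else 0."""
    if len(sample) != 20:
        raise ValueError("the critical values above assume exactly 20 runs")
    return int(abs(t_statistic(sample, m)) > T_CRIT_DF19[alpha])


runs = [0.85, 0.87] * 10  # hypothetical best accuracies from 20 FMLP runs
print(t_test_h(runs, m=0.86, alpha=0.05), t_test_h(runs, m=0.80, alpha=0.01))  # 0 1
```

`statistics.stdev` uses the n − 1 denominator, which is the sample standard deviation the t statistic requires.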
Because ensemble approaches have a complex framework and individual classifiers show only moderate performance, hybrid systems hold considerable promise for the diagnosis and prognosis of diseases. To address these limitations, we proposed a disease diagnosis system that combines three evolutionary algorithms with SVM and MLP classifiers. The three evolutionary algorithms optimize the parameters of the two classifiers, and the resulting enhanced classifiers are used to train and diagnose diseases. Six hybrid diagnosis alternatives were thus obtained from the combinations of classifiers and evolutionary algorithms. Based on the results presented in this paper, our hybridization approach provides higher prediction accuracy than other methods in the literature across a wide variety of disease datasets. Among the six parameter-optimized classifier systems proposed, FMLP performed best on the majority of the 11 datasets considered. On average, MLP shows 2.2% and 6.814% improvement in prediction accuracy on the 11 datasets with and without resampling, respectively. The improvements shown by MLP in sensitivity range from −2.9 to 75.13 without resampling and from −9.68 to 86.33 with resampling. The improvements shown by MLP in specificity range from −9.68 to 86.33 without resampling and from −18.93 to 36.33 with resampling. From the experimental results, we conclude that FMLP outperforms recently developed ensemble classifiers ([14, 15]). As a continuation of this research, we intend to process very high-dimensional datasets in two major phases: feature selection and parameter evolution of the classifier. For feature selection, a similarity metric-based hypergraph will be constructed, and its special properties will be used to identify important topological and geometrical features.
In phase 2, competitive and cooperative parallel hybrid intelligent systems will be employed, incorporating direct and indirect communication among the different systems so that the combined HISs converge to a single value within a guaranteed run time. This work is ongoing.
Two of the authors (MadhuSudana Rao Nalluri and Kannan K.) of this paper wish to thank the Department of Science and Technology, Government of India, for the financial sanction towards this work under FIST programme: SR/FST/MSI-107/2015.
The authors declare that there is no conflict of interest regarding the publication of this paper.